Redirect Input and Output For an Existing Process On Linux

Redirecting the input and output of an executable is standard practice in Linux operations. When launching a process, the user can trivially redirect its output away from stdout via the > operator and redirect its input away from stdin using the < operator.

But that only works if stdin or stdout are switched before the process is launched. What if we have a pre-existing process whose file handles we would like to change, but we would like to avoid restarting the process? For example, say a cron script executes sudo my_command from an environment where you can't provide input (perhaps you meant to use gksudo instead). You might be able to kill the sudo process, but perhaps when sudo exits the script will proceed with undesirable results. You could kill the script too, but assume that you very badly don't want to abort the script in a semi-completed state. (Obviously a well-written script shouldn't have this sort of behavior, but the assumption is that we are in an unusual situation.) The ideal solution would be to redirect input into the hanging sudo process, allowing it to succeed and your script to continue.

Thankfully, we can perform redirection on existing processes by explicitly manipulating their file descriptors. The method for doing so is fairly straightforward:

  1. Determine which file descriptor you need to change.
  2. Attach to the target process via gdb.
  3. Create a file (to redirect output) or named pipe (to redirect input).
  4. Use gdb to point the desired file descriptor to the file or pipe.
  5. If applicable, send the necessary content through the pipe.

In one terminal, find the PID of the target process; call it TARGET_PID. First, list the target's existing file descriptors:

$ ls -l /proc/TARGET_PID/fd/

When we are done we will double check this list to ensure we made the changes we wanted to.

Now you need to determine which file descriptor (hereon "FD") you want to change. Remember, we can only manipulate existing FDs, not add new ones. (For those who don't know: FD 0 is stdin (standard input), FD 1 is stdout (standard output), and FD 2 is stderr (standard error output). These are the base FDs that every process will have; your target process may or may not have more.) Examples:

  • To change the input for a typical terminal program you likely need to change stdin.
  • To change output file X to a different file Y, you need to find which FD on the list is linked to X.
  • For sudo, to change the input that accepts the user password you actually need to change FD 4, which should point to /dev/tty or something similar.

We'll call the FD number that you want to change TARGET_FD.

For using a named pipe: First create the pipe using

$ mkfifo /my/file/name

We'll call this path TARGET_FILE. Then provide input to the pipe, or else the open() call in gdb will block in a later step. Provide the content by, for example, running echo MyContent > TARGET_FILE from a separate terminal or as a backgrounded process, where MyContent is the content you want to send to the process.
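A minimal sketch of those two steps, assuming /tmp/sudo_input as TARGET_FILE and a password for a hung sudo as the content:

$ mkfifo /tmp/sudo_input
$ echo "MyPassword" > /tmp/sudo_input &

The backgrounded echo will block until something opens the pipe for reading, which is exactly what the open() call in gdb will do later.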

For using a normal file: Create an output file called TARGET_FILE.

Attach gdb to the process:

$ gdb -p TARGET_PID

Within gdb, close the file descriptor using the close() system call:

(gdb) p close(TARGET_FD)
$1 = 0

The right-hand side of the output is the return value of the call we just executed. A value of 0 means that all went well.

Now create a new FD using the open() system call. This must be done after close(), because the kernel assigns the lowest unused non-negative integer to a new file descriptor. We are relying on the fact that, once TARGET_FD is closed, it is the lowest unused descriptor (assuming no lower-numbered FD happens to be free), so the next descriptor created will reuse the same number.

(gdb) p open("TARGET_FILE",0600)
$2 = TARGET_FD

If the right-hand number is equal to TARGET_FD, we successfully created a new FD and it received the same number as the one we just closed, which is exactly what we want. Remember, if you are using a named pipe, this step will block until something has the pipe open for writing.
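For reference, the second argument to open() is the flags value, not a file permission mode. If the call above does not behave as expected, these hedged variants spell the flags out explicitly (using Linux's usual octal values, since gdb attached to an arbitrary process will not know the O_* macro names; the paths are illustrative):

(gdb) p open("/tmp/sudo_input", 0)
(gdb) p open("/tmp/my_output", 01101, 0644)

The first opens read-only (O_RDONLY = 0), which is what you want when feeding input from a pipe; the second uses O_WRONLY|O_CREAT|O_TRUNC (01 | 0100 | 01000 = 01101) with mode 0644, which is appropriate for redirecting output to a file.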

Now quit gdb:

(gdb) q

At this point, you should be done. If you are redirecting output, the redirection should be under way. If you are redirecting input, the first input should be consumed from the pipe and you can continue to provide input as necessary by sending it into the pipe; when you are done simply delete the pipe using rm.

We can verify that we were successful by checking the target process's FDs. Run ls -l /proc/TARGET_PID/fd/ again and compare the output against the output from the first time. If all went well, TARGET_FD should now point at TARGET_FILE.
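For example, if FD 4 was redirected into the pipe from the sketch above, the relevant line of the listing would look roughly like this (illustrative output only; the details will differ):

lr-x------ 1 user user 64 Jan  1 12:00 4 -> /tmp/sudo_input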

FoxyProxy, Firefox 3.5, and DNS Leaking

[Update: Jan. 24, 2010] The DNS leaking problem described in this article applied to FoxyProxy v2.14. On Jan. 12, FoxyProxy v2.17 fixed the problem.


FoxyProxy is a popular Firefox extension that enables users to set up, easily manage, and quickly switch between multiple proxy configurations. One of the most common uses of a proxy server is security/privacy. By establishing an encrypted connection (usually via SSH) with a proxy server on a trusted network, you can have your web traffic go through an encrypted "pipe" to that server and have that server send and receive web requests on your behalf, relaying data to and from you only through that pipe. By doing this you eliminate the risk that someone on your current network could see your HTTP traffic. Maybe you don't trust other clients on the network, maybe you don't trust the gateway -- it doesn't matter: your HTTP(S) traffic is encrypted and shielded from prying eyes. (Readers unfamiliar with the concept of using HTTP proxies through SSH tunnels are encouraged to research the matter; there are many well-written tutorials available.)

There are many other popular uses of proxy servers, but the application of encrypted web traffic is of concern in this case for the following reason: a key problem that arises when using web proxy servers is the handling of DNS requests. DNS is not part of the HTTP protocol, so even if HTTP traffic is being forwarded to a proxy, the DNS requests may not be. If DNS requests are sent out over the network like normal then eavesdroppers can still read them. So although the actual traffic may be encrypted, the DNS queries behave normally and may cause the same problems that using an encrypted tunnel was designed to avoid in the first place. A situation in which HTTP traffic is encrypted but DNS is not is referred to as "DNS leaking". When using a proxy for the benefit of security or privacy, DNS leaking may be just as bad as non-encrypted traffic.

Solving the DNS leaking problem is simple. One type of proxy, SOCKS5, can forward DNS requests as well as HTTP(S) data. A user simply needs to tell their browser to use the SOCKS5 proxy for both HTTP(S) and DNS and then both will be routed through the encrypted stream, allowing the user to surf the web with greatly strengthened security and privacy.

However, Firefox users who use FoxyProxy will currently encounter a problem when using DNS forwarding to a SOCKS5 proxy: DNS leaking occurs even when FoxyProxy is configured not to leak, which has made many users very upset. Initially many people thought the problem was with Firefox 3.5, but others confirmed it was only present with FoxyProxy installed. Unfortunately, not everyone is convinced that this is FoxyProxy-related behavior, and I have not found anyone who has presented a solution yet. I intend to do both: demonstrate that the behavior comes from FoxyProxy, and present a solution.

This is the basic setup for my tests:

  • I set up an SSH server.
  • I established an SSH connection and used SSH's built-in dynamic forwarding (SOCKS5) functionality:
    $ ssh username@myserver -D localhost:8080
    
    (For the non-SSH inclined: This command makes the SSH client listen on port 8080 of my machine as a SOCKS5 proxy; traffic sent to that port is forwarded through the SSH connection to the server, which then sends the data on to the destination.)
  • I used Wireshark to monitor all packets, specifically DNS requests, sent or received on my network interface. Note that DNS requests tunneled over the SSH connection to the SOCKS5 proxy are not visible to the packet sniffer.
  • I monitored my Firefox configuration in about:config. (All network proxy-related settings are under the filter network.proxy.)
  • I used Firefox v3.5.5 and FoxyProxy v2.14.

Using this setup I was able to monitor all DNS requests while I experimented with Firefox and FoxyProxy using a SOCKS5 proxy. I did a base test with no proxy configuration, a test using Firefox's built-in proxy management, and a test using FoxyProxy for proxy management.
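For the monitoring step, a capture filter restricted to DNS traffic is enough; here is a sketch using tshark, Wireshark's command-line counterpart (the interface name eth0 is an assumption):

$ tshark -i eth0 -f "udp port 53"

Any query that shows up here went out over the normal network rather than through the tunnel.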

Using no proxy

Starting with a default configuration (SSH SOCKS connection established but no proxy settings configured to use it) I visited several websites such as google.com, yahoo.com, and schneier.com. This was the simple base test.

I checked showmyip.com to get my IP address.

The relevant about:config settings:

network.proxy.socks               ""
network.proxy.socks_port          0
network.proxy.socks_remote_dns    false
network.proxy.type                0

Via Wireshark I watched as the websites generated normal DNS requests over the standard network.

Using Firefox to configure proxy settings

I restarted Firefox to avoid any cached DNS entries. Then, without FoxyProxy installed, I set up my SOCKS5 proxy. (Note that FoxyProxy replaces the standard Firefox proxy editor, so it is impossible not to use FoxyProxy when it is installed.)

Under Firefox's Preferences (or Tools > Options, depending on your operating system) I went to the "Advanced" tab, then the "Network" sub-tab, and opened "Settings". I chose "Manual proxy configuration" and entered "localhost" for the SOCKS Host and "8080" for the port.

Unfortunately, Firefox v3.5 does not support a GUI method of enabling DNS SOCKS5 proxying, so I had to manually go to about:config and enable it by setting network.proxy.socks_remote_dns to "true".

I checked showmyip.com to ensure that my IP address displayed as coming from the server and not my client. It did show as coming from the server, so Firefox was using the proxy.

The final about:config settings were:

network.proxy.socks               localhost
network.proxy.socks_port          8080
network.proxy.socks_remote_dns    true
network.proxy.type                1

I visited the same websites. Via Wireshark, I did not see any DNS requests sent over the standard network. Firefox channeled both the HTTP and DNS data through the SSH tunnel perfectly.

Using FoxyProxy to configure proxy settings

I reset all the about:config settings back to their defaults. Then I installed FoxyProxy Standard v2.14. I went to FoxyProxy's options and, under the "Proxies" tab, created a new proxy entry which I named "SSH SOCKS5". I set it to connect to "localhost" on port 8080, check-marked the "SOCKS proxy?" box, and selected "SOCKS v5". I then went to the "Global Options" tab and checked the box "Use SOCKS proxy for DNS lookups". To let this take effect, I had to restart Firefox.

When Firefox had restarted, I went to the Tools > FoxyProxy menu and selected "Use proxy SSH SOCKS5 for all URLs". I checked showmyip.com to ensure that my IP address displayed as coming from the server and not my client. It did, so Firefox was using the proxy.

I checked about:config:

network.proxy.socks               ""
network.proxy.socks_port          0
network.proxy.socks_remote_dns    false
network.proxy.type                0

The configuration was the same as the default, so apparently FoxyProxy does not adjust about:config to do its work.

Watching via Wireshark, I saw all the website visits generate DNS requests over the normal network. Complete and thorough DNS leaking. And I would like to emphasize that I had selected "Use SOCKS proxy for DNS lookups", which is FoxyProxy's option for addressing the DNS leaking issue.

Fixing DNS leaking

There was no question about it: FoxyProxy caused the DNS leaking in my test. I wanted to solve the problem, so I fiddled with about:config.

In about:config I manually set network.proxy.type to 1. I verified my IP address was from the server via showmyip.com.

The new about:config:

network.proxy.socks               ""
network.proxy.socks_port          0
network.proxy.socks_remote_dns    false
network.proxy.type                1

I watched for DNS requests again via Wireshark. I saw none. It seemed that just manually setting network.proxy.type to 1 fixed the FoxyProxy DNS leaking problem.

I also tried other about:config settings, such as manually changing network.proxy.socks_remote_dns to "true", but that didn't work. The above was the only change in about:config that I found that fixed the problem.
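If you want this workaround to survive without revisiting about:config, the preference can also be set in the user.js file of your Firefox profile, which Firefox re-applies on every start (a sketch; the profile directory path varies by OS and profile name):

user_pref("network.proxy.type", 1);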

Summary

I repeated the results above three times in different orders on different computers on both Linux and Windows to ensure I made no configuration mistakes and to verify that the behavior was consistent and cross-platform. All the tests yielded the same results. Here is the final summary:

  • Firefox v3.5 does not suffer from DNS leaking by itself.
  • DNS leaking occurs when FoxyProxy is managing the proxies.
  • FoxyProxy does not suffer from DNS leaking when network.proxy.type is manually set to 1.

It is obvious that FoxyProxy does not adjust about:config in order to configure proxy settings, but I do not know why. Many Firefox extensions adjust about:config in order to accomplish their goals and I know of no reason they should not. It's possible that FoxyProxy has not had a need to do so before, but in light of this serious problem that may need to change. The quickest/simplest solution for FoxyProxy may be to set network.proxy.type to 1 if the currently enabled proxy is SOCKS5 and if the global options for FoxyProxy (or the about:config for Firefox) are set to enable DNS forwarding.

However, although this seems to indicate that FoxyProxy has made a mistake, I don't know that FoxyProxy is the party at fault. Clearly FoxyProxy does not have to alter about:config in order to change the other proxy settings, so why must network.proxy.type be set in about:config in order for DNS forwarding to work? Note that network.proxy.type isn't related to DNS forwarding, it just specifies which type of proxy is enabled. For all I know someone implemented a hack in Firefox that checks about:config when it shouldn't. Of course, I don't know that and I don't know if this is expected behavior from Firefox or not. It could be that FoxyProxy isn't setting whatever hidden configuration for DNS forwarding that exists on the same plane as the other invisible proxy settings it uses. Or maybe FoxyProxy is relying on an unreliable hack in order to avoid changing about:config. I don't know about any of that. What I do know is that Firefox by itself does not have this DNS leaking problem, FoxyProxy does, and a simple solution exists.

Again, I am certainly not the first person to note this problem, but a) I have seen many people blame Firefox for this bug, and b) I have not yet seen anyone else mention the solution that I noted above.

I leave it to someone with more time and knowledge about these software projects to determine which project should have which bug report filed. This needs to be fixed permanently.

Three Tips for the Linux Terminal

  • By Brad Conte, January 11, 2009
  • Post Categories: General Tech

The power of Linux lies in the tools it uses, and the shell is an essential tool. If you spend a lot of time in a terminal, you likely value anything that makes the experience smoother. Here are a few tips to help make the terminal experience as smooth as possible.

Interact with the X clipboard

Before I discovered xclip, one of the most annoying things about being in a terminal was my lack of access to the X clipboard. Some terminal/shell combinations work well with a standard desktop environment, but "highlight-and-middle-click" a) isn't always feasible, and b) doesn't always work. Thankfully, xclip makes it easier.

xclip can read from and write to the X clipboard via standard input and output. The "-i" and "-o" arguments tell xclip whether you are inputting (setting) or outputting (printing) the clipboard contents, respectively.

Example:

$ pwd
/some/long/path/you/dont/want/to/retype
$ pwd | xclip -i

You may now "Control-V" paste that path where ever you choose. Another example:

wget `xclip -o`

This will download the file from the URL that is in the clipboard. I have found that "pasting" the contents of xclip into the shell using backticks (i.e., `xclip -o`) is a very convenient workflow. The result of `xclip -o` ends up right where you would have pasted it in the shell.

You can use xclip to read from and write to the different X clipboards, which allows you to interact with the clipboards for pasting via middle-click or Control-V. The "middle-click" clipboard is selected with the arguments -selection primary and the "Control-V" clipboard is selected with the arguments -selection clipboard.

I might suggest aliasing "xclip" to xclip -selection X, where X is your preferred selection to operate on.
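For example, a line like this in ~/.bashrc (a sketch, assuming you prefer the "Control-V" clipboard) makes that selection the default:

alias xclip='xclip -selection clipboard'

After that, pwd | xclip copies to the clipboard you can paste with Control-V, and xclip -o prints it back out.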

Use "head" and "tail" more powerfully

You probably know how to use "head" to extract the first N lines from a file and "tail" to extract the last N lines of a file. While this is useful, it's often just as useful to extract the complement of those selections, namely, everything except the first N lines or everything except the last N lines.

The head and tail utilities are powerful enough to accommodate those needs. head allows you to extract either a finite quantity of text from the top or everything but a finite quantity of text from the bottom, and tail allows you to extract a finite quantity of text from the bottom or everything but a finite quantity of text from the top.

  • head -n N - outputs the first N lines. Using -N (a leading minus) outputs all but the last N lines.
  • tail -n N - outputs the last N lines (equivalent to using -N). Using +N outputs everything starting with the Nth line.

Example:

/tmp $ cat example
1
2
3
4
5
/tmp $ head -n 2 example
1
2
/tmp $ tail -n 2 example
4
5
/tmp $ head -n -2 example
1
2
3
/tmp $ tail -n +3 example
3
4
5

Note that tail's complement mode requires you to specify the first line number to include in the output, so if you want all but the top N lines you actually specify the argument +(N+1), as in the example above, where +3 skips the first two lines.
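A common practical use of that +N form is skipping a header row, for instance (data.csv is just a hypothetical file with a one-line header):

$ tail -n +2 data.csv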

Change directories with the directory stack

The Bash shell (as well as others, like Zsh) has a built-in "back" feature for navigating directories. The built-in pushd command will put your current working path on the top of a shell-maintained stack and change to another directory. To go back to the saved path you use popd. This one is more commonly known than the first two tips, but it is worth including because it's so incredibly convenient.

Example:

$ pwd
/some/long/path/you/dont/want/to/retype
$ pushd /some/other/path/
/some/other/path /some/long/path/you/dont/want/to/retype
$ pwd
/some/other/path/
[...]
$ popd
/some/long/path/you/dont/want/to/retype
$ pwd
/some/long/path/you/dont/want/to/retype

Since "pushd" stores the directories in a stack, you can push multiple directories onto the stack and later pop them off in the reverse order you pushed them. It's basically the standard "cd" only with a "back" feature. Speaking of which, the command cd - will always take you back to the previous directory you were in.

Bash Path Hashing

  • By Brad Conte, September 5, 2008
  • Post Categories: General Tech

Bash is a Unix shell of many features. One of those features, which is both useful and potentially annoying, is the path hash table.

The environment variable we are probably most familiar with is PATH -- it contains all the paths that should be searched when we specify an executable to run. Bash starts with the first directory specified in PATH and looks to see if the executable is present; if it is not, Bash checks the second directory, and so on, until it either finds an executable with the specified name or runs out of paths to search.

The path of an executable practically never changes. "ls" is always in the same place, as is "cat", and every other executable you might have. Thus, to save time, the first time Bash runs an executable it saves the path where it found it; this way Bash can avoid searching for the executable again. The first time you execute "ls", Bash has to search for it. The second through bajillionth times you execute it, Bash will just look up the full path from the table -- much more efficient. It is important to note that every instance of Bash maintains its own path hash table -- thus if you have multiple terminals open they will not share path tables.

However, if the actual desired path of an executable changes, problems can arise. By default, Bash will not look for an executable again. If an executable changes paths, Bash will not find it. If an executable is duplicated to a new path, Bash will execute the original executable even if the new one is in a path of higher priority (namely, listed first in PATH). As can be imagined, this can cause quite a headache in these rare scenarios if one is unfamiliar with Bash path hashing. If you create a script, execute it, then edit it and copy it somewhere else (say, from your personal "bin" to a global "bin"), the old version will still execute. If you simply move an executable between paths, Bash will not find the new executable despite the fact that it seemingly should. Observe the following example, in which I create "myscript" in my home bin, run it, copy it to a global bin, run "myscript" again, and the one in my home bin executes again.

Example of Bash storing the full path of a script and not recognizing the new one.
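A hedged sketch of that scenario as a transcript (it assumes /usr/local/bin precedes ~/bin in PATH; the paths and script contents are illustrative):

$ echo 'echo version 1' > ~/bin/myscript; chmod +x ~/bin/myscript
$ myscript                      # found via a PATH search, then hashed
version 1
$ echo 'echo version 2' > /tmp/myscript
$ sudo install -m 755 /tmp/myscript /usr/local/bin/myscript
$ myscript                      # /usr/local/bin is first in PATH, but the stale hash wins
version 1
$ hash -t myscript
/home/user/bin/myscript
$ hash -d myscript              # forget the stale entry
$ myscript
version 2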

Thankfully the hash table is easily manipulated -- by the built-in shell command "hash" no less. There are four switches that are likely of the most interest to you:

  • -d [name] will delete a specific entry from the table
  • -r will remove all entries from the table
  • -t [name] will list the saved path for the given executable name
  • -l will list all existing executable -> path entries in the table

Using this tool you can view existing hashes and remove old, stale ones as you see fit. To solve an out-of-date path hash problem, simply removing the entry will suffice; the next time the command is run, the path will be looked up and hashed again. For more details, I refer you to the following key excerpts from the Bash man page:

       If the name is neither a shell function nor a builtin, and contains  no
       slashes,  bash  searches  each element of the PATH for a directory con-
       taining an executable file by that name.  Bash uses  a  hash  table  to
       remember  the  full pathnames of executable files (see hash under SHELL
       BUILTIN COMMANDS below).  A full search of the directories in  PATH  is
       performed  only  if the command is not found in the hash table.  If the
       search is unsuccessful, the shell prints an error message  and  returns
       an exit status of 127.

[...]

       hash [-lr] [-p filename] [-dt] [name]
              For  each  name, the full file name of the command is determined
              by searching the directories in $PATH and remembered.  If the -p
              option is supplied, no path search is performed, and filename is
              used as the full file name of the command.  The -r option causes
              the  shell  to  forget  all remembered locations.  The -d option
              causes the shell to forget the remembered location of each name.
              If  the  -t  option is supplied, the full pathname to which each
              name corresponds is printed.  If  multiple  name  arguments  are
              supplied  with  -t,  the  name is printed before the hashed full
              pathname.  The -l option causes output to be displayed in a for-
              mat  that may be reused as input.  If no arguments are given, or
              if only -l is supplied, information about remembered commands is
              printed.   The  return status is true unless a name is not found
              or an invalid option is supplied.

How to Review a Linux Distribution

The GNU/Linux project has begotten the most versatile operating system environment ever. It's free, open source, and designed to be modified and built upon by its users. The heart of the Linux philosophy is the concept of choice. Linux is about providing its users with the freedom to choose what they want to do and how they want to do it. Because both Linux and the software developed for it are so versatile, there exist hundreds of Linux distributions, each one being nothing more than a different collection of software and configurations running on top of the Linux kernel.

Because of the number of Linux distributions and the degree to which their styles/configurations can vary, a given user usually has the ability to choose a distribution closely suited to their specific needs. To aid the ever-prevalent distro-seeker, many Linux users have sought to provide written reviews of various Linux distros. Unfortunately, many reviews fall short. With the increasing population of Linux users, and the increasing number of Linux distributions, there has been a notable increase in unhelpful reviews.

Which leads to the point of this article, namely, how to review a Linux distro. The idea of reviewing a Linux distro is to inform the reader as to how the specific distro in question differs from the other Linux distros. To do this, a reviewer must provide an analysis of the software and configurations that set the distro in question apart from other distros, putting solid emphasis on the big differences, less emphasis on the minor differences, and no attention on the things that are inherent in all Linux distributions. (If you are writing to inform the reader about Linux in general, you should do that instead, rather than pretend you are reviewing any specific distro.) This basic principle holds for almost any review of any product. For example, movie reviews tell you why one movie, a four-star action flick, is different from another movie, a two-star romantic comedy. The movie reviewer doesn't spend time explaining the concepts of cut scenes and character development; the review merely explains the aspects of a movie that make it unique.

This article is aimed at the potential Linux distro review writer. It is split into two main sections: things one should review, and things one should not review. Most of the points have sample topics; some of the points and sample topics are more advanced/technical, some less so. No review needs to cover every point and sample topic in this article; this is just a broad list of good ideas. Suit your information to the knowledge level of your target audience.

Good Things to Review

Here's a list of things that make for a good review. This is not an exhaustive list, but it covers a lot of ground.

  • Philosophy

    Every Linux distro has a philosophy, namely, a statement of purpose and a set of guidelines for development that support that purpose. "Linux for human beings," "KISS," "Free, robust, stable," "Design for control," are some philosophies that describe various distros. Anyone using a distro with one of those philosophies would instantly recognize it.

    This topic is not optional: you must state the distro's philosophy and exemplify it throughout your review. Consequently, it is probably the best place to start your distro review, as what you write thereafter should exemplify the philosophy. Remember, your goal is to show how the distro is unique, and it is the philosophy that dictates why the distro is unique. All criticisms or negative points against the distro should be held in light of the distro's philosophy. Is the distro complicated? That might be worth mentioning, but if the distro strives to put user control first and simplicity is of no concern, then complexity is not a shortcoming of the distro but rather a side-effect of its philosophy. Rather than criticizing a distro for being too complex, instead note that it is designed for experts. Many reviews fail to evaluate a distro for what it is and criticize a distro for lacking things it does not strive to have.

    As boring as it may seem, a quick history of the foundations of the distro might prove very informative. Understanding why a distro was created in the first place can say a lot about what it's like today and where it might be tomorrow.

    Sample topics:

    • Who is the distro's intended audience? The newbie, the guru, the average home user, the developer, the server administrator?
    • Does the distro tend to value stability or bleeding edge software? (See later points.)
    • Is the distro designed for users who like to control everything?
    • What is the history of the development of the distro? Why was it created?
    • How robust is a base level install of the distro? (More on this later.)

  • The parent distro

    If the distro in question is itself based on another distro (many are) you should mention the parent distro and how this spin-off/child distro differs from its parent. Some parent-child relationships are distant, some are close. Some spin-off distros maintain their own package repositories and make their own configuration decisions, some are nothing more than the parent distro with a different default window manager and a few extra pre-installed packages. Telling the reader what the parent distro is and how closely the child is related to it is extremely informative for those either familiar with the parent or who will seek to become familiar with the parent.

  • The package manager

    The package manager is the heart of a Linux distro. It is how the user most intimately adds to and subtracts from his system. The package manager is arguably the most important and unique component of a distro. Even though most package managers offer the same rudimentary functions, you should review the package manager. And the word "review" does not imply that you should confirm that there are indeed commands for installing and removing packages. You need to provide enough information that it's clear how the package manager feels to the user. Explain what sets the package manager apart from other package managers. Different distributions make it easier/harder to eliminate orphaned packages or check dependency information, and many have different ideas of what security precautions to take.

    Sample topics:

    • Does the package manager have a GUI?
    • How easy is it to upgrade all the packages in the system?
    • Does it provide clear output informing the user what it's doing?
    • What level of security is offered? Package authenticity verification? SSL downloads?
    • Does it ever require that the user perform manual tasks to complete a package installation/integration into the system? (E.g., removing certain configuration files or backups.)
    • How is the package database stored? How easily human-readable is it? How easily backup-able is it?
    • How many tools are a part of the package manager? How many are necessary for day-to-day use?
    • How easy is it to perform a system upgrade?

  • Support/Documentation

    The level of support and/or documentation available for a distro will likely be one of the most significant influences in a user's experience with a distro.

    The interactive support offered by a forum (and the archived support offered by its search function) can be priceless as a user gets started with a distro and has questions as they encounter Linux phenomena they are not familiar with. A wiki can help the user through the install and/or basic system setup, set up common software, configure it the way they want, and fix common bugs. The absence of a wiki can mean many hours of Googling for decentralized information and exasperation, leading to unfixed problems and a poorer user experience. A documentation tree (somewhat rare outside of behemoth distros) can help a user dig deeply into the distro and modify/fix it if they are so inclined.

    Sample topics:

    • Does the distro have an official forum? How large is it?
    • Does the distro have a wiki? How expansive is it? Does it cover the configuration of sudo? What are some of the more obscure articles it has?
    • Does the distro have developer-written/contributed documentation?

  • The Packages

    While the package manager controls the addition/removal of packages to/from the system, it would be silly to overlook reviewing the packages themselves. The packages will be composed of compressed archives, arranged with a file hierarchy and package metadata that handles dependency checking, integrity verification, and the like.

    Arguably, the layout of the filesystem belongs in this topic as well. Many distros have their own ideas about how the filesystem should be laid out. If the distro has any peculiarities or quirks in how it uses the filesystem tree, it's usually related to the compilation destination of packages.

    Sample topics:

    • What type of packaging standard does the system use? How easy is it for a user to create their own packages? (Some users like to assemble their own software and use the package manager to manage their installation. See the point on compiling from source below.)
    • Is package downgrading supported?
    • How quickly are packages updated when their official upstream versions are updated? (Don't be too picky here: are we talking about weeks or months?)
    • Continuing from the Philosophy point and the above sample topic, does the distro lean towards being bleeding edge or being stable? (This has been mentioned twice. Hint.)
    • Do the developers do extensive modification of the original packages (security patches, extra functionality), or are they provided vanilla as-is? (For example, Debian developers tend to modify packages as they see fit, Arch Linux developers only apply very common patches occasionally.)
    • How are the packages compiled by default? (For example, with or without debug information?)
    • How is Gnome/KDE split up? How many packages compose a full Gnome/KDE suite? (For example: is KolourPaint an individual package, or only available as part of the larger KDE Graphics package?)
    • Are there any filesystem layout quirks, additions, or reservations? (For example, is /usr/local ever used by packages or is it reserved for the user? Is Firefox installed into /var?)

  • Support for compiling from source

    The package manager's job is (usually) only to manage pre-compiled packages. But some users like to manually compile some packages from source, such as to apply custom patches to a program or to gain speed by optimizing the program for their specific machine. Many users don't compile much (if anything) from source, but the Linux philosophy is deeply rooted in the distribution and manual compilation of source code, and there are many who like/need to do so. How an operating system lets the user compile programs and manage compiled programs is important to how the operating system works. Those who rely on it will find their experience very dependent on how manual compilation is handled.

    If there exist any distro-specific methods for handling the compilation of both supported (packages in the repositories) and/or un-supported (packages not in the repositories) software, it should definitely be included in a review. If the distro has any form of a ports-like system, a review without mention of it is by default incomplete.

    And do not ever, ever waste space explaining that you can do "./configure && make && make install", that you can use "checkinstall", or that you can use the "install" program itself -- every distro has those, they aren't worth mentioning in the slightest.

    Sample topics:

    • Do there exist any specific tools for the system that aid the user in compiling programs from source?
    • How are users supposed to recompile the kernel? Is initramfs or initrd used?
    • If a user does manually compile a program/package, how easily can it be managed by the package manager? (This has been mentioned twice. Hint.)
    • How is the presence of different kernel versions handled?
    • Is there risk of having the manually-compiled program conflict with the packages managed by the package manager, or is there a way to keep manually installed non-package programs separate?

  • Release system

    Distros have different styles of updating the end-user's installed distro to the most current version. Some distros have a new, fully-updated version released every 6 months. Some just release all package upgrades on-the-fly and have no concept of different versions. The way the distro handles version releases is significant to the end-user, as full version upgrades usually mean that the developers reserve the right to move stuff around and/or reconfigure things in a noticeable way if they want to, and push those changes through in the next major release. It also means that developers reserve the right to stop providing package upgrades for a given version, resulting in obligatory periodic upgrades.

    Sample topics:

    • What factors decide the version releases (if there are any)?
    • How big are the changes between releases?
    • How often are the releases?
    • How long is there support for old releases?
    • Does evidence seem to suggest that performing a version upgrade is smooth, or is the user better off performing a clean install of the new version?

  • Startup style

    Summarizing how a distro manages startup scripts/daemons is easy: All you have to do is say whether the distro uses System V or BSD-style init scripts. (Users unfamiliar with those can look them up.)

    How the operating system uses run-levels is a more complex subject, yet still an interesting one. Fewer users will care about this level of detail, but more advanced users might be able to intuit more about the nature of a distro from it.

  • Robustness of a default install

    Distros vary in what software they provide for a from-disk installation. Some have more, some have less. Some have roughly equal quantities but the theme of the offered software is completely different.

    After you install the distro, how much work must be done to get the system up and running with a graphical desktop and a decent collection of tools? What kind of packages are installed, and what aren't?

    Sample topics:

    • Is a default display manager installed or offered? (Which one?)
    • Does the distro rely heavily on sources from the servers, or can you obtain multiple CD/DVDs for a more robust from-disk install?
    • Does the default install include tools for compiling? (GCC, make, etc.)
    • Does the default install offer Apache for install?
    • What are some of the most specific/obscure programs installed/offered by the installer? (For example, if a distro provides Compiz and OpenOffice.org straight from disk, that implies the sort of other packages that are installed.)

The above topics cover a lot of ground. Obviously not every detail need be included in a good review, but many of these points are very critical to how the distro works and failure to address a significant number of them will result in an incomplete review of the distro. If there seem to be too many points to address, remember that you're writing a technical review, not a grand work of fiction. Wordiness can usually be eliminated. If you know what you're talking about and put some thought into what you write, you can cram a lot of information into three sentences.

One key thing to remember is that not all of the information may seem important to you, the author of the review, but it is what separates one distribution from another. Obviously the information has to be appropriate for the audience you are writing for; that is a decision faced by every writer.

If you feel unable to address many of these topics in your review then you are probably not as familiar with the distro as you should be in order to do a good review of it. Many people want to pop in a CD, install the distro, play with it for 30 minutes, and then write a review about it. Unless you did your research beforehand and made very good use of those 30 minutes, you're probably not qualified to offer a review. Spend some time learning more about the distro.

Bad Things to Review

Now a list of things you should not review.

  • Plug-n-play drivers

    Do not ever review a distro by saying "I plugged in my external wireless card X and it did(n't) work". This is the lamest review you can provide, and is my pet peeve to boot. First of all, almost all distros use the same drivers. Support for hardware is pretty much universal; the same drivers that manage your hardware in one distro probably manage it in most other distros, and this only becomes more true with time. Comparing hardware support between distros is basically pointless. It's even more pointless when your analysis consists of just one generic external wireless card. If it didn't work, odds are you didn't set something up correctly.

    Note that it is obviously acceptable to review the quantity of drivers provided by a distro if it seems to have a noticeable shortage/abundance of them, but most distros will have about the same drivers. Don't mention driver quantity unless it's very noticeable.

  • Five minute quirks

    At various times, some distros have what you might call "five minute quirks". These are one or two little problems that need to be ironed out of a fresh install of the system and are easily solvable by a user with the skills that the distro is oriented towards. Sometimes an environment variable needs to be set, sometimes a file needs to be configured, etc. These are bugs not unique to any one person's specific install of the distro, but rather common to a very large portion of (if not the entire) user base. One of the developers mis-configured something before release, or some default setting doesn't work with the majority of hardware, or something like that.

    Do not write about these little five minute hiccups in your review; they don't count. They only affect you for five minutes and will likely be fixed in one of the upcoming releases. (See the next point.) Usually the distro's forum and/or wiki has an easily accessible guide telling you how to fix the quirks, and after a day of usage the user will not remember them. At absolute most you should say that "minor quirks exist" and say where/how the site addresses these issues. This could be part of your review of the distro's documentation. Any quirks you fix within five minutes of attempting to do so should not be mentioned. Their fix in the next release will out-date your review.

    What's more, do not write about the stability of a distro based on its five-minute hiccups. Nothing says "I didn't do any actual research about this distro" like a stability evaluation based on the initial five-minute quirks the distro had as of some arbitrary date.

  • Unrelated/temporary bugs

    Do not write about temporary bugs. If the bug is the type that will be fixed within four months, don't use it as a basis for your evaluation of the distro's stability. This point bears repeating: whatever you do, do not base your assessment of a distro's stability on little quirks that will be gone in a couple of months. If your readers wanted to read a bug listing, they'd browse bugzilla.org.

    On that note, do not assess the distro, in any way, based on bugs that are not specific to the distro itself. If the developers of a software package X make a mistake, don't blame the developers of the Linux distro in question; it's outside of their control, not their fault, and not their responsibility. It's irrelevant to your review, so ignore it. If distro X has a bug but distro Y does not, it is possible that distro Y did not push the new version of the software and thus does not have the buggy version -- you can use this to exemplify how "bleeding edge" the distro is. Or it's possible that distro Y manually patched the software -- you can use this to exemplify how the developers do (or don't) manually patch packages.

    You can obviously cite the presence of bugs in your analysis of the stability of the operating system, but don't focus on petty bugs. Instead, focus on the release/testing cycle of packages, the speed of security fixes, and the frequency of distro-specific bugs. If this means that you actually have to spend some time using a distro and learning about it before you evaluate its stability, cry me a river. If you aren't willing to do that, don't comment on its stability, because you don't know anything useful about it.

  • Artwork

    No one cares what the default desktop color theme of the distro is. Or the default desktop wallpaper. Or whatever. This is my other distro-review pet peeve. As shocking as this may sound, every desktop environment and window manager in the entire universe, throughout every civilization, all of time, and every dimension, can have its color theme changed. If a user doesn't like the default wallpaper, they will change it. If they don't like the default Gnome color theme, they will change it. Defaults are irrelevant.

    Don't mention the quantity of community artwork, in the sense of backgrounds and color themes. This is directly proportional to the number of users the distro has and thus redundant information. And, realistically, nobody makes a distro decision based on it. And many users routinely become attached to a certain color theme and will take it with them from distro to distro.

  • Installation Process

    This is not a terribly important point, but I think that reviewing the installation process of a distro is almost pointless. Most installers, even text-based ones, guide the user through the installation very clearly. Sure a text interface may look a bit daunting to some, but if one just reads the instructions, one can usually just click-through (or, tab-enter-through) the installation with relative ease.

    This item might warrant a quick mention, perhaps tying into the distro's philosophy (i.e., in analyzing whether the distro is oriented towards casual or power users), but unless the installation process is exceptionally complicated or buggy, don't devote much space to it. It's a one-time experience for the user, and a determined user won't change their opinion on whether to install based on the installation graphics. Mention if it's GUI or text based, and move on.

Remember, your goal, as a reviewer, is simple: review things from the perspective of the average user you are writing for. Don't go into painstaking detail unless you're aiming to provide an in-depth guide; minor details/quirks are often best left for their own special article and are of interest to few readers. Don't waste time on things that were (likely) very unique to your experience, because almost no one else cares.

Even though this article was somewhat long, your distro review doesn't have to be. A lot of useful information can be packed into one paragraph.

The Practicality of Port Knocking

  • By Brad Conte, May 29, 2008
  • Post Categories: Security

Port knocking is one of those server security topics that seems to come up every now and then, and when it does it always sparks a bit of debate over matters of practicality.

The idea behind port knocking is simple: The administrator of a server sets up a server with an Internet-accessible service. The administrator then closes all the ports on the server, including the ports that the service uses, and starts a daemon that monitors all incoming packets to the server. The daemon then opens the port corresponding to the service when and only when the daemon receives a series of packets on a specific set of ports in the correct order. The packets can be any type -- TCP SYNs, TCP ACKs, UDPs, whatever -- but they must be sent to the server on the correct ports and in the correct order to cause the daemon to open a service's port. Thus a port remains closed until someone "knocks" on the correct ports to cause the daemon to temporarily open it.

(Example: If the required sequence is TCP ACK packets on ports 333, 4444, and 55555, then the sequence of ACKs "27, 333, 4444, 55555, 42" would open the port, whereas the sequence "333, 27, 4444, 55555, 42" would not. I would tend to use TCP ACK packets, because they look the most boring to someone sniffing a network -- more below.)
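As a concrete sketch of both sides, assuming the knockd daemon on the server and nmap on the client (the ports, hostname, and firewall rule are illustrative; knockd's configuration normally lives in /etc/knockd.conf):

[openSSH]
    sequence    = 333,4444,55555
    seq_timeout = 15
    tcpflags    = ack
    command     = /sbin/iptables -A INPUT -s %IP% -p tcp --dport 22 -j ACCEPT

The client can then knock by sending one ACK packet to each port in order:

$ for port in 333 4444 55555; do sudo nmap -sA -Pn --max-retries 0 -p "$port" server.example.com; done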

Thus port knocking adds the security equivalent of another "password" to a service, because a client has to successfully knock on a server's ports in the correct order in order to open a port for business. Given 2^16 possible ports, about 4 possible protocols, and (usually) about 4 correct port guesses required, standard port knocking comes out to a keyspace of (2^16 * 4)^4 = 2^72, the equivalent of a 72-bit key. This isn't too shabby a number, in terms of key size, especially if port knocking is just an extra layer of security for a service.

Thus begins the debate on the security practicality of using port knocking.

The main advantage of port knocking is that it conceals the very existence of a service until the port knock sequence is complete. An attacker cannot attack a service he cannot find, and until he finds out how to properly knock, every port on the host will appear closed to him. Thus port knocking is very helpful in situations where a server administrator wishes to conceal the very existence of a service. Service concealment was the primary motivation behind the development of port knocking.

Another advantage of port knocking is that it allows a port, when it is opened, to only be open for connection from a specific IP address. The port knocking daemon monitors all incoming packets and when it detects the correct "knock" it can not only open up a port but can open a port that accepts packets from the knocking IP only. Thus a WAN-side service can be told to accept connections only from certain IPs, but those IPs can be decided in real time.

Unfortunately, however, port knocking is a very fragile security policy. Since the knocking packets must always be sent in the clear, it has the equivalent security of a password sent in cleartext. Anyone sniffing your network on either the client or server end (or anyone who can trick the client into sending the knock sequence to a spoofed server, e.g., via DNS poisoning) can tell what "knock" is used and replay the packets, effectively negating the security of port knocking. The good news is that unless an attacker is actively looking for a port knock sequence, the knock will look like normal boring network traffic. But if a sniffer is on the lookout for a knock sequence -- especially if they know which server it is destined for -- it's impossible to slip the knock past them.

Also, because port knocking is effectively a shared symmetric key, it does not scale well to services that must handle many individual connections. The port knocking "key" must be distributed securely to everyone who needs access to the server's service. Any cryptographer and/or security auditor will tell you that the symmetric key distribution infrastructure for this sort of thing is both annoying and brittle -- the larger the infrastructure to be maintained, the larger the hassle and the larger the potential for a single point of failure.

Obviously, scalability does not matter to the individual wishing to SSH into his home computer, but it is of major concern to anyone operating a server with more than four or five users, especially if users must be added, revoked, and have their "knocking keys" rotated. This is a classic (annoying) symmetric key distribution problem.

Port knocking is also impractical for popular/busy servers because the popularity of a server contradicts the very goal of port knocking: to conceal the very existence of a service on a server. If an attacker is aware of the nature of his target and knows (or at least has an idea of) what services the server runs, he will not be satisfied if his port scan turns up empty. If he expects certain services to exist on the server, and if he is in any way persistent, he will investigate the server until he finds the services he knows exist, and thus he will investigate the potential use of port knocking sooner or later. This does not result in instantaneous defeat of course, but the port knock is only as secure as a password sent in cleartext, which is a bad security measure to have to fall back on. The exact situation will dictate how easy the knock will be to sniff.

This quick assessment should make it obvious that the only practical use for port knocking is on small servers. Realistically, the service itself should be secure enough in both configuration and implementation to not require the additional security of port knocking -- it is an Internet service, after all -- but port knocking does add yet another level of plausible security, and the paranoid never underestimate defense in depth.

All things considered, port knocking is not too useful, despite being a fun idea. The only place I've actually observed it in use was by people at DefCon looking for any way to add any level of security to their home computers. But, obviously, those are the "I will because I can" types. Plus, when you're at DefCon you use anything to secure your home server that you can get. But outside of the experimental world, port knocking is only an interesting notion; it never sees wide-scale usage, for a reason.

Note that port knocking is not the same thing as Single Packet Authentication. Port knocking was the initial attempt to gain security by service invisibility. SPA is the more secure successor to port knocking, developed to address key problems with port knocking.

My Sweatshirt

In just the first four days of wearing my new favorite sweatshirt on campus I got four separate compliments on it. Three people asked me where they could buy it. Since then I've received dozens of compliments, mostly from strangers who, although they don't know me, know an elegant math expression when they see it.

Unfortunately, this sweatshirt is not explicitly for sale anywhere. I had it custom made because I couldn't find anything like it.

My Euler's Identity Sweatshirt

Summary

I love my sweatshirt. I'd sell it through a store like CafePress if I could find one that sells black sweatshirts. Since I can't, the next best thing is to provide the instructions to get one.

The image: You can get the image I used for the screen-printing here. However, the company I used for the screen-printing required you to upload an inverted version of the image (at least as of the last time I used their services), which you can download here. If one doesn't work try the other.

The company: The company I used to screen-print the image is Blue Cotton. I chose the Hanes Pullover sweatshirt (Printing -> Sweats -> Hooded Sweatshirt -> F170 Hanes Pullover Hood), but they do offer other sweatshirts. The Hanes sweatshirt has the advantage of having the lowest polyester content (90% cotton) of any of their sweatshirts, which is a good thing for a screen print and has the side benefit of being very comfortable.

The image sizing: For my sweatshirt, an extra-large, I chose to make the image 9.5 inches wide (be sure to lock the width/height ratio when scaling the image on the sweatshirt) and centered it 2 inches below the collar (you'll have to eyeball this measurement). It took a long time for me to decide on those dimensions, but I finally did and I think they're perfect, which is saying something. You might want to scale the width for sweatshirts of different sizes; my rough guess would be to subtract/add 0.75 to 1 inch per size smaller/larger.

So all you have to do is go to the website, select your sweatshirt, upload the image, set the width/placement, and order it.

Background for the sweatshirt

I'm a math major, and I like elegance. Euler's Identity is my favorite mathematical expression: it's a simple expression of universal truth via constants. It's fascinating how the five most fundamental constants of math can all be united in a single, simple expression.

In addition to math, I like t-shirts. I tend to treat t-shirts as billboards for things that I like -- you won't find a blank shirt in my drawer. As it happened, in winter of '07/'08 I found myself in need of a new sweatshirt. Like the rest of my upper-body apparel, I wanted it to say something interesting. I wanted the sweatshirt to be "nice-ish" so that I could wear it to "nice-ish" events, so I didn't want a busy/complicated design. It had to be simple, yet elegant. Settling on a simple expression of Euler's Identity was easy. My second stipulation was that the sweatshirt had to be black.

I started my search at the obvious place for a weird request like that: CafePress. I found a couple of sweatshirts with Euler's Identity, but a) they had obnoxious brand logos, and b) CafePress doesn't sell black sweatshirts. After an extensive search of the Internet, it became obvious that no one offered anything like what I was looking for (shocking, considering the obvious demand). So I decided to have the shirt custom screen-printed.

The design of the sweatshirt

Thus I was faced with two tasks. The first task was creating the image I wanted to be screened onto the sweatshirt. The second task was finding someone to screen-print the image. Neither task turned out to be as simple as it might seem.

Creating the image was difficult because the only decoration for the black sweatshirt was going to be the bold, white text of Euler's Identity. Thus the styling of the text was very important. In order to truly do the equation justice, and make the sweatshirt look as simply elegant as possible, the font had to be perfect. Not fancy, but not boring. Just slightly elegant.

Getting the image screen-printed turned out to be difficult because there were two major conditions for the screen-printing: I had to be able to have a transparent background, and I was not going to order in bulk. Most custom screening companies violated the former of these two requirements, and the few that allowed transparent backgrounds required bulk orders. In addition, some companies didn't offer sweatshirts in black.

The final solution

Finally, however, a solution was reached. I designed the image using the "i" and "pi" symbols generated by a Linux-based LaTeX front-end called eqe. I generated the rest of the equation using characters from the Tahoma font in the Linux-based image editor KolourPaint. I needed a very large image to achieve a sufficient DPI, which was recommended to be at least 90. Both programs allowed me to create the images at a large size with excellent quality, but not quite large enough. So I used GIMP to expand the size, using its anti-aliasing feature to keep the fonts smooth despite being enlarged. The result is shown here. (This is a scaled-down version of the image, for bandwidth reasons. Click the image to download the full-size image, the one you would use for actual screen-printing.)

Euler's Identity
View the inverted version.

I also found a company, called Blue Cotton, that allows one to screen-print an image, choose a color to make transparent, and order without a bulk minimum. (Choose Printing -> Sweats -> Hooded Sweatshirt -> F170 Hanes Pullover Hood.) Thrilled to have a working image and a working company, all I had to do was manually place the image on the sweatshirt. I spent (no joke) probably 3 hours total moving the image around on the shirt, higher, lower, bigger, smaller, trying to get it just the right size and at just the right height. Finally I decided on a 9.5 inch width at 2 inches below the collar, which I am pleased to say is the perfect size/placement (for an XL).

Close-up picture of the screen-printed text on the sweatshirt.

Also, kudos go out to my girlfriend for her extreme patience with me. The sweatshirt was a Christmas present from her to me, but since I'm a nit-picky perfectionist, we decided that I had best do the bulk of the designing myself; I didn't even finish the design until the end of January. Giving it to me was an excellent idea, and it's one of my all-time favorite articles of clothing.

Cryptography Links

  • By Brad Conte, October 27, 2007
  • Post Categories: Security

This is a list of resources on cryptography knowledge that I've compiled. The goal of this list is to cover the fundamental spectrum of cryptography and to touch on the higher mathematical end.

Index of Links

Pre-Cryptography Concepts

Fundamental Cryptography Concepts

Simple Encryption Algorithms

Encryption Algorithms

One-Way Hashes / Checksums

Cryptanalysis

Online Resources

Encryption Programs

  • TrueCrypt -- Symmetric key, disk/virtual disk encryption.
  • GPG -- Public key, multiple encryption options.
  • PGP -- Public key, multiple encryption options.
  • AxCrypt -- Symmetric key, individual file encryption.
  • DriveCrypt -- Symmetric key, whole disk encryption.
  • dsCrypt -- Symmetric key, individual file encryption (stand-alone EXE).
  • Snake Oil Encryption Software -- This isn't an encryption program, but it's a good article on how to evaluate encryption software.

Encryption Libraries

  • Crypto++ -- A C++ library under a custom, permissive license.
  • PolarSSL -- A C library under the GNU GPL license.
  • OpenSSL -- A C library under an Apache-style license.
  • Brian Gladman -- C source code for AES, SHA, and HMAC.

Cryptographer Resources

Historical Background

Cryptography Conferences:


Page info:

  • List Started: October 27, 2007
  • Last Content Update: November 15, 2009 - Removed stale links, added more links, reorganized some existing links.

Becoming a "Hacker"

  • By Brad Conte, August 24, 2007
  • Post Categories: Security

Introduction

That word "hacker" carries with it a lot of baggage. The word intrigues teenagers, scares politicians, and causes computer geeks to endlessly debate its exact definition.

Computers have created a deep, complex, intertwined world that few people are truly familiar with, a world whose complexity and functionality most of its users can't even begin to comprehend. Even people who would be labeled relatively computer literate, in all likelihood, don't really know much about how their computer works. Average Joe hasn't a clue what his computer actually does or how it works beyond the point-and-click GUI he sees. And that's fine, because thousands of man-hours have gone into the design of everything computers do, and it's unreasonable to expect Average Joe to understand a large portion of it.

But hackers make a point of knowing what happens inside a computer. What's more, they try to manipulate what happens. The combination of their knowledge and skill sets scares a lot of people (and excites others). Hackers have a knowledge base that many others do not. And they have a skill set that many others do not.

Odds are you're reading this because the concept of "hacking" interests you and you want to learn more about it. If you are an aspiring "hacker" (we'll see if you truly are in a bit), this is for you. For others, this will offer a glimpse into the mind of the hacker. I assume that the "aspiring hacker" is somewhat familiar with computers but hasn't really touched on hacking and computer security in any depth. My goal here is to provide a first introduction to what you'll need to do to pursue "hacking". The most daunting part of any task is usually just finding a place to start, and giving you that start is my goal.

However, my goal is not to teach you how to hack; it is just to tell you where to start. First I'm going to explain what the concept of hacking entails, so that you know what you're getting into. Next I'm going to lay forth the learning mentality and effort you're going to need to expend, namely, what you're going to need to do to become a hacker. Third, I'm going to show you what your goal should be, namely, what it is you should eventually arrive at. Last, I'm going to give you a starting point so that you can go off and actually begin your studies, because that's what you probably want anyway.

I'm going to spend a lot more time elaborating on this topic than need be. A lot of good advice for the aspiring hacker could be condensed into just a couple sentences. But all of that has been said and done before; my goal is to give the exhaustive explanation that covers every relevant conceptual topic. And since every hacker, more or less, follows a similar path of development, this can also give the non-hacker an insight into the mind of a hacker.

Remember, this is only an article about where and how to start learning; it's not my goal to actually give you your first Hacking 101 lesson.

What You're Getting Into

Hacking requires a lot of knowledge and understanding. If you just want to find one computer program you can download so that you can click a couple buttons and impress your friends by breaking some stupid thing on their computer, then you aren't aspiring to be a hacker, you're aspiring to be a script kiddie. And script kiddies are to hackers what construction workers are to structural engineers. Script kiddies are usually out to amuse themselves. They act without purpose and rarely have any ethical standards.

First you'll need to understand what it is that hackers do. In the beginning, rules to govern the everyday aspects of computing were created. These rules are for networking, web design, programming, graphics design, etc. They govern everything that computers do and dictate how they do it. Most developers live by these rules.

Hackers, however, have what might be likened to a second layer of rules, a layer of rules built on the first layer. At the risk of using a cheesy analogy, the normal rules of computing can be described by the line from The Matrix where Morpheus tells Neo (in the sparring chamber): "Like any rules, they can be bent, others can be broken." That's fairly true of computing. The normal rules of computing can be manipulated, others can be completely bypassed. The realm of hacking lies on the second layer of rules, rules that are based on the first layer and can manipulate the functionality of the first layer.

Note the lack of the phrase "breaking into computers" in my definition of "hacking". Hacking is about learning, changing, and manipulating. Whether you use those skills to break into someone else's computer is a separate, unrelated issue. The mainstream media uses the buzzword "hacker" too narrowly; the roots of the term are far broader than what modern mainstream usage implies.

For example, take the ARP networking protocol. It doesn't matter if you know what ARP is or how it works, suffice it to say that it's a networking protocol. ARP is a "first level" set of rules that govern how computers communicate on a network. Every network administrator must be familiar with how it works. However, a hacker knows how ARP works and knows how to use those rules to perform a different task than the goal of the original rules. Using the ARP protocol is standard computer networking. Manipulating ARP to do something you want it to is hacking. ARP is a normal rule, but it is a rule you can build from and it is a rule you can manipulate.

Thus in computer security/hacking it should be obvious that it pays to know the normal operating rules. Everything hackers do is based on the normal rules and unless you understand those rules you won't be a decent hacker. You're going to have to study how things work, why they work, learn how they work successfully under normal conditions, learn how they fail under abnormal conditions, and then figure out how to use them abnormally and make them achieve different goals.

So to begin your hacking endeavors you're going to have to do a lot of reading and a lot of question asking. If you already assumed this then you're in the right frame of mind. If you thought you just needed to find those one or two magic programs that would let you crash Windows boxes by the second day, rethink your plans.

How to Learn

In brief, you're going to need to read. A lot. Everyone's learning style is different and there are a lot of perfectly valid learning methods, but all of them will include a lot of reading.

Let me offer this bit of advice: Do not start by reading RFCs or any other sort of extremely technical specification. (RFCs are technical papers describing many common computing standards.) Some people will give you that advice and, frankly, I don't think it helps very much when you're very new to hacking. Instead, start by reading articles/essays/tutorials that are more than a listing of facts and specifications. Read what one person has to say on a topic, then go read the technical documentation if you want to know very specific details about it. RFCs and other technical documentation can be complex, and sometimes downright unhelpful to a beginner. They make great resources to consult later, however.

To start your reading, get a browser that has tabs and makes both searching the Internet and searching a rendered web page easy (such as Firefox). Then find an online article about some security topic that interests you and seems roughly at your level of understanding.

There are two things I would like to emphasize from just that last sentence. First, I would like to stress that you find something roughly on your level. You will kill your ambition if you try to understand and dissect concepts too far over your head. Some stuff is very complicated; don't discourage yourself by trying to take it on before you're ready. You'll get frustrated trying to bench press a 500 lb weight on your third day at the gym; it may be out of your league at the moment and just isn't worth your time trying to lift. This isn't to say you should forget about any complex topic you come across; just make a note to come back to that topic later when you know more and can assimilate information on it better. Go spend some serious time researching and learning what you need to in order to understand that topic, and come back to it when you're ready. If it takes a day before you're ready, great. If it takes a month, that's fine too. The important thing is that you're learning. You don't get any sort of points for reading the articles you originally decided to; you get points for what you learn.

My second point may seem a bit obvious but I do encourage that you start your research online, as opposed to going out and buying a hacker book or magazine. The newer you are to hacking the less you'll know, and the less you know the more likely you are to find heaps of material about what you're looking for online for free anyway, so it may not be worth buying hard copy material. Plus the less you know the more you'll need to look up as you read. It's easier to look stuff up on the fly if you're on the Internet. Every new golfer likes to go out and buy their first new set of clubs, but that mentality doesn't work as well here. You will likely not need to spend a dime for anything (other than a computer and/or basic accessories) for a long time. Most stuff is available, in some form, for free.

Anyway, find an article on something that looks like material you want to learn. Read it and make note of all the words or terms you don't recognize. As you find them, highlight them, right-click them with your mouse, and search Google for them (one of the nice features of Firefox). All of them. If you have to open 20 new tabs from one article then that's fine, remember, you only get points for what you learn, not for whether you finish the initial article you started to read. Sometimes you'll come across a tutorial, paper, or audio lecture that does nothing but provide you with a long list of topics to go research. You may not be able to make much sense of it for months, but that's OK.

Read on all of those topics you just searched Google for. You don't have to read just one article per topic, if the article you read seemed short and didn't satisfy your curiosity then read another article to make sure you're not missing anything important. If those articles themselves have terms you don't know, look those up too. Make it a habit to instinctively always look up terms or concepts you don't know. The number of open tabs you have can balloon quickly but that's OK, you're out to learn.

A word of sympathy to those who find themselves with dozens of open tabs and dozens of topics that need to be researched: It can get very big very fast and you will sometimes have to call it quits on a topic. Don't be a wimp, though; call it quits only when you've truly hit a wall or are getting into a subject that truly bores you. Even if you don't learn the details about a subject, learn at least the vague idea behind it. For example, even if you don't fully understand how ARP poisoning works and how to do it yourself, understand what it allows you to do; that way you know what it means the next time you hear the term. You can learn how to perform an ARP poisoning attack some other time, and when you do, learning it will be easier if you already understand what it is. Be flexible with what you learn in depth. If something frustrates or bores you, move on to something else -- there are plenty of interesting topics to study. (If everything bores you, you might not be cut out for it after all.)

Which topics should you be pursuing more than others? This is up to you. There are dozens of topics you can research and you'll have to determine for yourself which ones you spend the bulk of your time pursuing. Almost everyone will recommend that you have at least minimal well-rounded knowledge in most fields but that you find some topic that specifically interests you and you pursue it.

As you read more and more articles, you should come across fewer and fewer terms you don't understand. You will always run across unfamiliar terms -- no one knows everything -- but the quantity of those terms should diminish with time. Even if you don't necessarily remember every acronym you read (there are a lot of them) a quick Google search for one you've forgotten and a quick glance down the results page should spark the "Duh, now I remember that," light bulb.

Remember, never hesitate to use Google (or your search engine of choice), no matter what your question is about. If you're new to hacking and seriously trying to learn it would not be unreasonable for you to be executing 15 Google queries an hour during any given period of research. (This wouldn't be unreasonable for a veteran to do either if they're delving into new territory.) Google isn't a sign of weakness. And it's free. Use it. Often.

There will be some questions you have that Google doesn't answer, or doesn't answer to your satisfaction. When this happens, ask someone who might know. The best way to do that is to post your question to a hacker / computer security forum.

At some point you're going to need to start doing some actual hands-on work. All knowledge and no experience makes Jack a condescending pseudo-guru. If you want to be a hacker you're going to have to actually do something sooner or later. Feel free to experiment on your own computers on your own network. Port scan your desktop various ways from your laptop, try an ARP attack against your laptop from your desktop, etc.

And remember, for the sake of all that is good in this world, don't attack computers you don't own. It's tempting, but don't do anything against other computers; you'd be surprised how easily people can get upset. And I've seen enough hackers more competent than you will likely be get into legal trouble to last me a lifetime. It seems tempting to try, but it's not worth it.

The obvious question is, "When do I actually start doing hands-on work?" This is up to you. Some people prefer to research to their heart's content before touching any tools, some prefer to experiment with tools as they learn everything. Do whatever helps you learn the most, but always research a concept at least a little bit before you try it out -- I'm not a big fan of "try it then learn it". Otherwise you won't know what you're doing, you might get confused, and you'll waste your time. And, most importantly, if you're really unlucky you'll break something by accident. If you don't know what your tools are doing and how they work when you use them, you're more of a script kiddie.

When you do start doing hands-on work is when you might want to start reading technical manuals. I assumed that you started very new to hacking, but by the time you're attempting to try things you should be able to read and understand RFCs and similar documentation on the subjects you're experimenting with. By then you'll be more concerned with the nitty-gritty details of how stuff works, and the mumbo jumbo in RFCs and other technical documentation should make sense.

I would also advise that you focus more on technical understanding than on technical memorization. It's important to know how you can use ARP for a man-in-the-middle attack. It's much less important to be able to construct an ARP frame from memory. If you ever need to do that, you can look it up. If you understand how something works, you can get the precise details when you need them.

Where You're Headed

In your endeavors to learn about hacking, eventually you should hit a point where your tools stop dictating what you do and you start dictating what your tools should do. You should be continuously learning and feeling more in control of what you do, and eventually you should hit a point where you know that "I need a program that explicitly allows me to perform this specific function," and you go to Google and type in a precise seven-word query looking for such a program. It may not exist, so you may have to hack something together using other tools. Or, better yet, you may find yourself writing the tool yourself.

In other words, eventually the worker has to start buying tools that will build what he wants and stop building only what his tools allow. He needs to design a building that he likes, then go find the tools that will let him build it.

Some people don't understand this concept and live in an infinite loop of never doing anything that their four favorite tools can't do. Don't let this be you. Govern what your tools do, don't let your tools govern what you do. This isn't to say you can't find some nice, multi-functional tools out there, but don't just download a couple programs with pretty interfaces and stick with just those; you'll hold yourself back.

Usually hackers will eventually learn at least one or two programming languages, so that they can write at least small programs themselves. I would advise you to learn at least some programming. Even if you don't do much with it, learn at least one or two languages semi-fluently. Most security-oriented hacking requires knowledge on the level of C, and learning a scripting language is helpful for automating tasks.

A Starting Point

Enough advice. You need to start doing something. Where do you start? My first bit of advice involved finding those first articles of interest to read and branch out from, but how do you find those articles?

Start with the following concepts and vocabulary words. What I provide is a list to get you started researching computer security/hacking. Remember, don't just read one article on each of these topics and call it quits; use these topics to start your Google research. Read on these topics and everything related to them, then everything related to those topics, and so on. This list is just to help you figure out what your first Google queries should be. The items on this list were not chosen to give you a comprehensive grasp of hacking but rather to give you a starting point in the most important fields. If you want to hack, you're going to need to branch out into all the different fields from these starting points.

  • Man in the middle
  • SQL injections
  • Packet sniffing
  • ARP poisoning
  • Buffer overflows
  • SSH
  • Public and private key cryptography (RSA, AES, DES, Blowfish)
  • Cryptographic Hashes (MD5, SHA1, SHA2)
  • Proxies
  • DoS attacks
  • Reverse engineering
  • Worm / virus / trojan
  • Router / hub / switch
  • The OSI networking model (TCP, IP, UDP, ICMP, ARP protocols)
  • Port scanning (stateful filters, half-open scans, open/closed/filtered ports)

Hackerthreads.org has a "start here" thread with a collection of links to actual articles on topics such as these for newbies. If you're looking for material to read on a topic, or material to read in general, you can start there.

When you want to start actually doing stuff, you're going to need tools. The following are interesting/handy tools of the trade. Remember, don't think that you have to stop learning when you start using tools. You can learn a lot from the tools you use. Visit a tool's homepage and read up on what it does. More importantly, if the tool offers a complex function/feature, read articles on how it does what it does. Most tools come with their own tutorials/manuals that explain what the tool does and provide some descriptive information about why what it does works. Read them.

  • wireshark
  • nmap
  • nessus
  • ettercap
  • nemesis
  • hping
  • dsniff
  • Cain & Abel

For a larger list of useful tools, see the list of tools included in Backtrack 2.0 (a Linux-based security-auditing oriented OS) and the tools included in Arudius (the parent OS of Backtrack).

Unfortunately, once you start using tools reality will kick in and you may have to decide which operating system you're going to use. Before now I've said nothing specific to any particular operating system or software, but unfortunately not all tools work on all operating systems. Most of them can be run on both Linux and Windows, and a lot of Linux programs can be run on Windows with Cygwin (which requires Linux knowledge to use effectively), but the deeper you get into actually doing stuff the more OS-specific some things are going to get. I'm not going to officially endorse one operating system over another because an operating system is itself just another tool, but I encourage you to do hands-on research and to select your favorite.

A list of recommended operating systems to try would include one or two basic systems from each major family: Linux (Ubuntu/Debian, Fedora, Arch), BSD (FreeBSD), and the popular Windows and Mac operating systems. Many more Linux and BSD distros exist, but if you're new to hacking then odds are that you're not looking for the more complex/powerful ones; this is just a beginner's list for those who don't know what to try. (If you know which distro you want to use, you don't need a recommended starting point. Use what you already like.) Feel free to hop around operating systems for a time. When you find one you like, and you know why you like it, stick with it. Practically speaking, your operating system will limit what you can do, so if you decide to stick with Windows I advise that you at least give Linux a serious try. There's a lot to be learned from it, and like all Unix-like systems it will inherently be more hacker-friendly. But the choice is yours. Use what works. Better yet, keep more than one OS around and use multiple OSs that work.

Good luck in your studies. Have fun with them. Be responsible with your knowledge. And when you can, contribute back to the global hacking community that's provided you with all the information, articles, and tools you've been able to utilize.

2Wire's Weakened WEP

  • By Brad Conte, July 25, 2007
  • Post Categories: Security
It's a well-established fact by now that the security 64-bit WEP encryption offers a WiFi network is small, in the same sense that the Pacific Ocean is big. This is especially true as of late, as recent months have unveiled multiple attacks against WEP that, figuratively, kicked it while it was down, making the popular personal network encryption scheme even more trivially broken by hackers. Various network configurations, network protocol flaws, and mathematical breakthroughs have chipped away at the effectiveness of WEP encryption to the point where it can reasonably be broken within 20 minutes by a skilled attacker.

However, while everyone is out looking for ways to use replay attacks and better mathematical algorithms to break WEP faster, I happened to notice that router manufacturer 2WIRE is making its own odd effort at rendering WEP less effective.

It's becoming a more standard practice these days for ISPs to enable WEP by default when they install a wireless router for a new Internet customer. And what with the popularity of wardriving and P2P file-sharing lawsuits you can't blame them. WEP isn't the most secure wireless encryption solution but it's the easiest one and it makes the ISPs look like they're doing something. But for as insecure as WEP is by nature, 2WIRE does something that compromises the effectiveness of WEP on their wireless routers in a new way.

When the SBC Yahoo! ISP hooks up a customer's Internet service, the customer is provided with a 2WIRE wireless router with 64-bit WEP turned on by default and an SSID that starts with "2WIRE". On the outside of the physical router there's a white label with the router's default WEP key printed on it for the convenience of the customer (and tech support). However, 2WIRE introduces a crucial flaw with both the default WEP keys and all WEP keys generated by the router. If you're used to dealing with WEP keys, one quick look at just a couple of default WEP keys for 2WIRE routers should tell you there's something wrong.

The keys have no letters. Just numbers. All 2WIRE WEP keys are composed purely of numbers.

This means that every character in the WEP key uses just 10 of its potential 16 values. If you aren't convinced of how significant that is, then let's see how large the actual keyspace (namely, the effective strength) of a 64-bit WEP key is if you exclude the "letter range" from the hexadecimal key. We're only going to examine the brute-force aspects of this.

(Note: The following will assume familiarity with binary and hexadecimal.)

Start with the full 64 bits of the WEP key (as 2WIRE uses 64-bit, rather than 128-bit, WEP by default). The first 24 bits of the key are merely the IV, which is not an effective part of the key. This was originally done in accordance with the US government's cryptography export regulations which at the time prohibited the export of encryption technology stronger than 40 bits, so this isn't 2WIRE's fault. Thus, by definition, 64-bit WEP only has 40 bits of actual effective keyspace. (Hence the reason 64-bit WEP is often referred to as 40-bit WEP.)

If the entire 40-bit range of keyspace were used, the key would still be relatively small. 40 bits is nothing by today's standards -- no one uses anything less than 128 bits if they're serious about the security of their encrypted data. But 2WIRE chips away at even those 40 bits.

The remaining 40 bits are the equivalent of 5 bytes. Each byte is represented by two hexadecimal characters (each hexadecimal character representing four bits of the byte). These 10 hexadecimal characters compose the final human-readable WEP key. Each hexadecimal character has a range of 16 values (because it is 4 bits in length). However, by restricting a WEP key to be composed of only numbers, each character only has a range of 10 possible values. It only takes log_2(10) = 3.322 bits to represent 10 possible values in binary, so each character contributes only 3.322 effective bits of key.

Now instead of having 10 * 4 = 40 bits of keyspace, we are left with 10 * 3.322 = 33.22 bits of keyspace. This roughly 17% reduction in effective keyspace may not seem that critical, but remember that each bit of keyspace doubles the strength of the key. This is because the strength of the key is expressed as the number of combinations that would be required to guess the key, given its length and value restrictions.
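For anyone who wants to verify the arithmetic, here is a minimal Python sketch of the keyspace calculation (the constant and variable names are mine, chosen purely for illustration):

import math

# 64-bit WEP = 24-bit public IV + 40 secret bits = 10 hex characters of actual key.
KEY_CHARS = 10

full_bits = KEY_CHARS * math.log2(16)    # each hex character: 16 values -> 4 bits
digits_bits = KEY_CHARS * math.log2(10)  # digits only: 10 values -> ~3.322 bits

print(round(full_bits, 2))    # 40.0
print(round(digits_bits, 2))  # 33.22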

With forty bits of keyspace, we have 2^40 = 1,099,511,627,776 possible combinations. This isn't large in the world of cryptography, but it's annoying enough that likely no one would want to spend a couple weeks breaking it on their computer -- not if the payoff was simply access to a personal network. However, contrast that number of combinations with the number of combinations we get from just 33.22 bits of keyspace, namely 2^33.22 = 10,004,985,324 possible combinations.

Yes, those are both big numbers, but count the commas in them and note that the first is over 100 times larger than the second. Now, assume that a hypothetical attacker wants to launch a brute force attack against a key of 40 bits and a key of 33.22 bits. Further assume that his computer can make 500,000 attempts per second, which is not unreasonable for a home computer. (Remember, even if the attacker has a weak laptop on site, he may still have a powerful desktop he can use remotely to do the hard work.) So the hypothetical attacker captures a few encrypted packets from your network then goes to work brute forcing them.

With his assumed computing speed, it would take 25.5 days to brute force the 40-bit key, but only 5.6 hours to brute force the 33.22-bit key. And those are the worst-case scenarios; they assume the correct key is the very last one the attacker guesses. Statistically, the attacker has a 50% chance of guessing correctly halfway through.
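The same back-of-the-envelope timing as a rough Python sketch, under the assumed speed of 500,000 guesses per second (note that the exact all-numeric keyspace is 10^10 keys; the 2^33.22 figure above is the same quantity up to rounding):

GUESSES_PER_SEC = 500_000            # assumed attacker speed

full_keyspace = 2 ** 40              # 1,099,511,627,776 possible keys
digits_keyspace = 10 ** 10           # 10,000,000,000 possible all-numeric keys

days = full_keyspace / GUESSES_PER_SEC / 86_400      # 86,400 seconds in a day
hours = digits_keyspace / GUESSES_PER_SEC / 3_600    # 3,600 seconds in an hour

print(round(days, 1))   # ~25.5 days, worst case
print(round(hours, 1))  # ~5.6 hours, worst case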

Now is that 40 - 33.22 = 6.78-bit difference in keyspace looking more important? An attacker who would have needed to devote nearly a full uninterrupted month of computer processing time to the attack now just needs to leave his computer working while he goes out to a movie.

In summary:
  • 2WIRE made a decision to only use numerals in their customers' default 64-bit WEP network setup.
  • In doing so, the worst-case time for a brute force attack against such a network is decreased from about 3.6 weeks to about 5.6 hours.
  • These networks are easily identifiable via a default SSID that starts with "2WIRE".
  • Despite the advantages this compromise of keyspace gives the attacker, there are still many faster (but more complicated) ways to break a WEP network.
In conclusion, allow me to compare this new way of attacking a 2WIRE WEP network with the traditional way that works against all WEP networks.

The traditional way:
  • is faster (20 minutes possible break times)
  • requires active attack (attacker must collect packets for a long time)
  • requires more specific hardware/firmware (requires injection mode)
  • is more complicated / less reliable
The 2WIRE-specific brute force way:
  • is slower (hours rather than minutes)
  • does not require an active attack; a few captured packets can be saved and attacked later
  • requires less specific hardware/firmware (only monitor mode required, this is very standard)
There's no question about it, 2WIRE WEP is significantly worse than standard WEP.

However, I would speculate that it's unlikely that this weakness will turn out to be of much, if any, consequence. The thing is, for as bad as 2WIRE WEP is, WEP's inherent weaknesses are worse. Serious attackers will have better ways to get into encrypted networks, so they're unlikely to care about this. The people who would benefit most from this attack are the less-serious attackers, but there exist no automated tools (yet) for launching this specific attack -- and odds are decent that there never will be. This flaw is too specific and overshadowed by too many greater flaws to receive much attention. Writing such a tool, though, would be trivial -- the aircrack-ng suite already contains all the necessary functionality.

This kind of security bungle would make any security engineer cringe. Such a poor configuration would never fly (I hope) at the U.S. DoD or the NSA, but I don't think it impacts Joe Schmoe that much.

SBC Yahoo!/2WIRE got off easy with this poor decision because the serious weakness they introduced was no worse than the weaknesses that already existed. They definitely dodged a bullet on this one. However, had WEP not been so critically broken before 2WIRE's mistake came on the scene, I guarantee that much more attention would've been focused on their flaw.

Finally, 2WIRE's decision to only use numbers in WEP keys is itself somewhat puzzling. I don't know if all 2WIRE routers are this way or if SBC Yahoo! made a deal with 2WIRE for this functionality in order to ease up on tech support calls. Regardless, my guess is that it was a tech support issue having to do with letters being in the keys. Average consumers probably naturally associate security codes with numbers and were getting confused to find letters amidst their WEP keys.

And in the end, that's the heart of security: tradeoffs. Some are good, some are bad. Depending on the resources SBC Yahoo!/2WIRE saves because of the decision, it may even be worth it. I just know they got lucky.

Another interesting factoid for the archives of 802.11 wireless security.