Bash Path Hashing

  • By Brad Conte, September 5, 2008
  • Post Categories: General Tech

Bash is a Unix shell of many features. One of those features, which is both useful and potentially annoying, is the path hash table.

The environment variable we are probably most familiar with is PATH -- it contains all the paths that should be searched when we specify an executable to run. Bash starts with the first path specified in PATH and looks to see if there is an executable present, if there is not it checks the second directory, and so on until it either finds an executable with the specified name or it runs out of paths to search.

The path of an executable practically never changes. "ls" is always in the same place, as is "cat", and every other executable you might have. Thus, to save time, the first time Bash executes an executable it saves the path it finds the executable in, this way Bash can avoid searching for the executable again. The first time you execute "ls", Bash has to search for it. The second through bajillionth times you execute it, Bash will just look up the full path from the table -- much more efficient. It is important to note that every instance of bash maintains its own path hash table -- thus if you have multiple terminals open they will not share path tables.

However, if the actual desired path of an executable changes, problems can arise. By default, Bash will not look for an executable again. If an executable changes paths, Bash will not find it. If an executable is duplicated to a new path, Bash will execute the original executable even if the new one is in a path of higher priority (namely, listed first in PATH). As can be imagined, this can cause quite a headache in these rare scenarios if one is unfamiliar with Bash path hashing. If you create a script, execute it, then edit it and copy it somewhere else (say from your personal "bin" to a global "bin"), the old version will still execute. If you simply move an executable between paths, bash will not find the new executable despite the fact that it seemingly should. Observe the following example, in which I create "myscript" in my home bin, run it, copy it to a global bin, run "myscript" again, and execute the one in my home bin again.

Example of Bash storing the full path of a script and not recognizing a new one.
(This image can serve as a stand-alone explanation -- feel free to redistribute it.)

Thankfully the hash table is easily manipulated -- by the built-in shell command "hash" no less. There are four switches that are likely of the most interest to you:

  • -d [name] will delete a specific entry from the table
  • -r will remove all entries from the table
  • -t [name] will list the save path for the given executable name
  • -l will list all existing executable -> path entries in the table

Using this tool you can view existing hashes and remove old stale ones as you see fit. To solve an out-of-date path hash problem, simple removing it will suffice. The next time the command is run the path hash will be re-updated. For more details, I refer you to the following key excerpts from the Bash man page:

       If the name is neither a shell function nor a builtin, and contains  no
       slashes,  bash  searches  each element of the PATH for a directory con-
       taining an executable file by that name.  Bash uses  a  hash  table  to
       remember  the  full pathnames of executable files (see hash under SHELL
       BUILTIN COMMANDS below).  A full search of the directories in  PATH  is
       performed  only  if the command is not found in the hash table.  If the
       search is unsuccessful, the shell prints an error message  and  returns
       an exit status of 127.

[...]

       hash [-lr] [-p filename] [-dt] [name]
              For  each  name, the full file name of the command is determined
              by searching the directories in $PATH and remembered.  If the -p
              option is supplied, no path search is performed, and filename is
              used as the full file name of the command.  The -r option causes
              the  shell  to  forget  all remembered locations.  The -d option
              causes the shell to forget the remembered location of each name.
              If  the  -t  option is supplied, the full pathname to which each
              name corresponds is printed.  If  multiple  name  arguments  are
              supplied  with  -t,  the  name is printed before the hashed full
              pathname.  The -l option causes output to be displayed in a for-
              mat  that may be reused as input.  If no arguments are given, or
              if only -l is supplied, information about remembered commands is
              printed.   The  return status is true unless a name is not found
              or an invalid option is supplied.