<< Chapter < Page | Chapter >> Page > |
Properties
Hash functions are designed to be fast and to yield few hash collisions in expected input domains. In hash tables and data processing, collisions inhibit the distinguishing of data, making records more costly to find.
A hash function must be deterministic, i.e. if two hashes generated by some hash function are different, then the two inputs were different in some way.
Hash functions are usually not injective, i.e. the computed hash value may be the same for different input values. This is because it is usually a requirement that the hash value can be stored in fewer bits than the data being hashed. It is a design goal of hash functions to minimize the likelihood of such a hash collision occurring.
A desirable property of a hash function is the mixing property: a small change in the input (e.g. one bit) should cause a large change in the output (e.g. about half of the bits). This is called the avalanche effect.
Typical hash functions have an infinite domain, such as byte strings of arbitrary length, and a finite range, such as bit sequences of some fixed length. In certain cases, hash functions can be designed with one-to-one mapping between identically sized domain and range. Hash functions that are one-to-one are also called permutations. Reversibility is achieved by using a series of reversible "mixing" operations on the function input.
Applications
Because of the variety of applications for hash functions (details below), they are often tailored to the application. For example, cryptographic hash functions assume the existence of an adversary who can deliberately try to find inputs with the same hash value. A well designed cryptographic hash function is a "one-way" operation: there is no practical way to calculate a particular data input that will result in a desired hash value, so it is also very difficult to forge. Functions intended for cryptographic hashing, such as MD5, are commonly used as stock hash functions.
Functions for error detection and correction focus on distinguishing cases in which data has been disturbed by random processes. When hash functions are used for checksums, the relatively small hash value can be used to verify that a data file of any size has not been altered.
Cryptography
A typical cryptographic one-way function is not one-to-one and makes an effective hash function; a typical cryptographic trapdoor function is one-to-one and makes an effective randomization function.
Hash tables
Hash tables, a major application for hash functions, enable fast lookup of a data record given its key. (Note: Keys are not usually secret as in cryptography, but both are used to "unlock" or access information.) For example, keys in an English dictionary would be English words, and their associated records would contain definitions. In this case, the hash function must map alphabetic strings to indexes for the hash table's internal array.
The ideal for a hash table's hash function is to map each key to a unique index, because this guarantees access to each data record in the first probe into the table. However, this is often impossible or impractical.
Notification Switch
Would you like to follow the 'Data structures and algorithms' conversation and receive update notifications?