If k entries are moved from the old table for every insertion into the new one, then to ensure that the old table is completely copied over before the new table itself needs to be enlarged, the table must be grown by a factor of at least (k + 1)/k during the resize.
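A minimal sketch of this incremental-resizing scheme, assuming open addressing with linear probing and a migration budget of k entries per insertion (all names below are illustrative, not from the original text):

# Illustrative sketch of incremental resizing: every insertion into the new
# table also migrates up to k entries from the old one. Growing by a factor
# of at least (k + 1)/k guarantees the old table is emptied before the new
# one must itself be resized; doubling satisfies this for any k >= 1.

class IncrementalTable:
    def __init__(self, capacity=8, k=1):
        self.k = k                        # entries migrated per insertion
        self.table = [None] * capacity    # the (new) active table
        self.pending = []                 # old entries awaiting migration

    def _slot(self, key):
        i = hash(key) % len(self.table)   # open addressing, linear probing
        while self.table[i] is not None and self.table[i][0] != key:
            i = (i + 1) % len(self.table)
        return i

    def _place(self, key, value):
        self.table[self._slot(key)] = (key, value)

    def insert(self, key, value):
        self._place(key, value)
        for _ in range(self.k):           # pay the migration cost now
            if self.pending:
                self._place(*self.pending.pop())
        used = sum(e is not None for e in self.table)
        if used * 2 > len(self.table):    # load factor above 1/2: grow
            self.pending += [e for e in self.table if e is not None]
            self.table = [None] * (len(self.table) * 2)

Lookups during the migration would also have to consult the entries still waiting in the old table; that is omitted here for brevity.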
Linear hashing is a hash table algorithm that permits incremental hash table expansion. It is implemented using a single hash table, but with two possible look-up functions.
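The two look-up functions can be sketched as follows; the bucket count N, the current level, and the split pointer are illustrative names for the standard linear-hashing state, not terms taken from the text above:

# Sketch of the two look-up functions used by linear hashing. Buckets below
# the split pointer have already been split, so they are addressed with the
# next level's hash function.

def bucket_index(key, N, level, split):
    b = hash(key) % (N * 2 ** level)             # current-level function
    if b < split:                                # bucket already split?
        b = hash(key) % (N * 2 ** (level + 1))   # next-level function
    return b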
Another way to decrease the cost of table resizing is to choose a hash function in such a way that the hashes of most values do not change when the table is resized. This approach, called consistent hashing, is prevalent in disk-based and distributed hash tables, where resizing is prohibitively costly.
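A minimal consistent-hashing sketch, assuming a ring of nodes addressed by SHA-1 (the class and method names are hypothetical): when a node is added, only the keys that fall between it and its predecessor on the ring change their assignment.

import bisect
import hashlib

def _point(name):
    # Stable hash onto the ring; SHA-1 is used only for illustration.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((_point(n), n) for n in nodes)

    def node_for(self, key):
        # The key is served by the first node clockwise from its hash point.
        points = [p for p, _ in self.ring]
        i = bisect.bisect(points, _point(key)) % len(self.ring)
        return self.ring[i][1]

    def add_node(self, node):
        # Reassigns only the keys between the new node and its predecessor.
        bisect.insort(self.ring, (_point(node), node))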
Ordered retrieval issue
Hash tables store data in pseudo-random locations, so accessing the data in sorted order is a time-consuming operation. Other data structures, such as self-balancing binary search trees, generally have slower lookups (O(log n) rather than expected constant time) and are somewhat more complex to implement than hash tables, but they maintain the data in sorted order at all times. See a comparison of hash tables and self-balancing binary search trees.
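For example, with Python's built-in dict (a hash table), an ordered traversal has to sort the keys on every pass, whereas a tree-based structure would yield them in order directly (the data below is just an example):

# Entries in a hash table sit in an order unrelated to the key order, so
# every ordered traversal pays an explicit O(n log n) sort.
inventory = {"pear": 3, "apple": 1, "mango": 2}
for name in sorted(inventory):
    print(name, inventory[name])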
Problems with hash tables
Although hash table lookups take constant time on average, that constant can be significant. Evaluating a good hash function can be slow; in particular, when simple array indexing can be used instead, it is usually faster.
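A small illustration of that last point (the data here is only an example): when the keys are small, dense integers, a direct array index replaces the hash computation entirely.

# Direct array indexing: one index operation, no hash function evaluated.
MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def days_in_month(month):            # month is 1..12
    return MONTH_DAYS[month - 1]

# Equivalent hash-table version: hashes the key on every lookup.
days = {m + 1: d for m, d in enumerate(MONTH_DAYS)}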
Hash tables in general exhibit poor locality of reference: the data to be accessed is distributed seemingly at random in memory. Because the access pattern jumps around, lookups can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays, searched with linear search, may be faster if the table is relatively small and keys are cheap to compare, as with simple integer keys. According to Moore's Law, cache sizes are growing exponentially, so what counts as "small" may be growing as well. The optimal crossover point varies from system to system; for example, a trial on Parrot showed that its hash tables outperform linear search in all but the most trivial cases (one to three entries).
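A sketch of the compact alternative described above, assuming a handful of integer keys (the names are illustrative): two small parallel lists scanned linearly, which stay in cache and need no hash computation.

# For a handful of entries, a linear scan over two small parallel lists is
# cache-friendly and can beat a hash table despite being O(n).
keys = [3, 17, 42, 99]
values = ["red", "green", "blue", "white"]

def small_lookup(key):
    for i, k in enumerate(keys):     # contiguous memory, predictable access
        if k == key:
            return values[i]
    return None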
More significantly, hash tables are more difficult and error-prone to write and use. Hash tables require the design of an effective hash function for each key type, which in some situations is more difficult and time-consuming to design and debug than the simple comparison function required for a self-balancing binary search tree. In open-addressed hash tables it's fairly easy to create a poor hash function.
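As an illustration of how easy it is to get this wrong (both functions below are textbook examples rather than anything from the original text): summing a string's characters collides for every pair of anagrams, while a mixing hash such as FNV-1a spreads similar keys apart.

# A poor string hash: anagrams such as "listen" and "silent" all collide.
def bad_hash(s, table_size):
    return sum(ord(c) for c in s) % table_size

# FNV-1a (64-bit), a simple mixing hash that separates similar keys.
def fnv1a(s, table_size):
    h = 0xcbf29ce484222325
    for byte in s.encode():
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h % table_size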
Additionally, in some applications, an attacker who knows the hash function may be able to supply input that produces worst-case behavior by causing excessive collisions, resulting in very poor performance (i.e., a denial-of-service attack). In critical applications, either universal hashing can be used, or a data structure with better worst-case guarantees may be preferable.
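A minimal sketch of the universal-hashing idea, using the standard multiply-add (Carter-Wegman) family as an example: the parameters a and b are drawn at random when the table is created, so an adversary who does not know them cannot precompute colliding keys.

import random

# h(x) = ((a*x + b) mod p) mod m, with a and b chosen at random per table.
# Because the adversary does not know a and b, collision patterns cannot be
# predicted in advance.
P = (1 << 61) - 1                      # a Mersenne prime larger than any key

def make_universal_hash(table_size):
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda x: ((a * x + b) % P) % table_size

h = make_universal_hash(1024)          # fresh secret parameters for this table
print(h(12345), h(54321))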