Wednesday, August 4, 2021

How does Java HashMap or LinkedHahsMap handles collisions?

Prior to Java 8, HashMap and all other hash table based Map implementation classes in Java handle collision by chaining, i.e. they use linked list to store map entries which ended in the same bucket due to a collision. If a key end up in the same bucket location where entry is already stored then this entry is just added at the head of the linked list there. In the worst case this degrades the performance of the get() method of HashMap to O(n) from O(1). In order to address this issue in the case of frequent HashMap collisions, Java8 has started using a balanced tree instead of a linked list for storing collided entries. This also means that in the worst case you will get a performance boost from O(n) to O(log n).

The threshold of switching to the balanced tree is defined as TREEIFY_THRESHOLD constant in java.util.HashMap JDK 8 code.  Currently, its values are 8, which means if there are more than 8 elements in the same bucket then HashMap will use a tree instead of a linked list to hold them in the same bucket. 

This change is a continuation of efforts to improve the most used classes. If you remember earlier in JDK 7 they have also introduced a change so that empty ArrayList and HashMap will take less memory by postponing the allocation of the underlying array until an element is added.




This is a dynamic feature which means HashMap will initially use the linked list but when the number of entries crosses a certain threshold it will replace the linked list with a balanced binary tree. Also, this feature will not available to all hash table based classes in Java e.g. Hashtable will not have this feature because of its legacy nature and given that this feature can change the traditional legacy iteration order of Hashtable. Similarly, WeakHashMap will also not include this feature.

So far (until JDK 8) only ConcurrentHashMap, LinkedHashMap and HashMap will use the balanced tree in case of a frequent collision. This is a dynamic feature which means HashMap will initially use the linked list but when the number of entries crosses a certain threshold it will replace the linked list with a balanced binary tree.
How HashMap handles Collision in Java


When does collision occur in HashMap?

There are several classes in JDK which are based upon the hash table data structure like



Underlying working of all these Maps is pretty much the same as discussed in How does HashMap internally works in Java, except some minor differences in their specific behaviors. Since the hash table data structure is subject to collision all these implementations are required to handle the collision.

A collision occurs when a hash function returns the same bucket location for two different keys. Since all hash-based Map class e.g. HashMap uses equals() and hashCode() contract to find the bucket. HashMap calls the hashCode() method to compute the hash value which is used to find the bucket location as shown in the below code snippet from the HashMap class of JDK 1.7 (jkd1.7.0_60) update.

Ignoring the first two lines, which was the performance improvement done for String keys in JDK 7, you can see that the computation of hash is totally based upon the hashCode method.

A collision will occur when two different keys have the same hashCode, which can happen because two unequal objects in Java can have the same hashCode.

How LinkedHahsMap and Map handles collision in Java


Summary

1) HashMap handles collision by using a linked list to store map entries ended up in same array location or bucket location.

2) From Java 8 onwards, HashMap, ConcurrentHashMap, and LinkedHashMap will use the balanced tree in place of linked list to handle frequently hash collisions. The idea is to switch to the balanced tree once the number of items in a hash bucket grows beyond a certain threshold. This will improve the worst-case get() method performance from O(n) to O(log n).

3) By switching from linked list to balanced tree for handling collision, the iteration order of HashMap will change. This is Ok because HashMap doesn't provide any guarantee on iteration order and any code which depends upon that are likely to break.

4) Legacy class Hashtable which exists in JDK from Java 1 will not use the balanced binary tree to handle frequent hash collision to keep its iteration order intact. This was decided to avoid breaking many legacy Java application which depends upon iteration order of Hashtable.

5) Apart from Hashtable, WeakHashMap and IdentityHashMap will also continue to use the linked list for handling collision even in the case of frequent collisions.

6) Collision in HashMap is possible because hash function uses hashCode() of key object and equals() and hashCode() contract doesn't guarantee different hashCode for different objects. Remember, they guarantee same hash code for the equal object but not the vice-versa.

7) A collision will occur on Hashtable or HashMap when hashCode() method of two different key objects will return same values.


That's all about how HashMap in Java handles collisions. In general, this method is called chaining because all objects stored in the same bucket are chained as a linked list. In general, all hash table based classes in Java e.g. HashMap, HashSet, LinkedHashSet, LinkedHashMap, ConcurrentHsahMap, Hashtable, IdentityHashMap and WeakHashMaap uses linked list to handle collisions.

From JDK 8, a balanced tree will be used to implement chaining instead of linked list to improve worst-case performance of HashMap from O(n) to O(log n) for HashMap, LinkedHashMap, and ConcurrentHashMap. Since HashSet internally uses HashMap and LinkedHashSet internally uses LinkedHashMap they will also benefit from this performance improvement.

10 comments:

  1. What is the threshold value to exceed for switching from linked list to balanced tree as data structure

    ReplyDelete
  2. In an HashMap the key is an object, that contains hashCode() and equals(Object) methods.

    When you insert a new entry on the Map, it checks whether the hashCode is already known. Then, it will iterate through all objects with this hashcode, and test their equality with .equals(). If an equal object is found, the new value replace the old one. If not, it will create a new entry in the map.

    Usually, talking about maps, you use collision when two objects have the same hashCode but they are different. They are internally stored in a list

    As we know that the hash collisions makes a huge impact on HashMap performance. When multiple hashCode() values end up in the same bucket, values are placed in the linked list. In worst case, when all keys are mapped to the same bucket, thus degenerating hash map to linked list – from O(1) to O(n) in lookup time.

    when the number of element in the bucket reaches to threshold (currently:

    TREEIFY_THRESHOLD = 8

    This is bin count threshold for using a tree rather than list for a bin.) Java 8 HashMap replaces the linked list with the balanced tree with hash code as a branching variable.So Earlier the colliding keys were simply appended to the linked list.If the two hash codes are different but ended up in the same bucket then one of them is considered bigger and goes to the right of the tree and other one to the left. But when both the hash codes are equal, then HashMap assumes that the keys are comparable and the key compares to determine the direction so that some order can be maintained. It is a good practice to make the keys comparable.

    Making keys comparable the HashMap constructs the balanced tree in case of high collision.

    ReplyDelete
  3. @Anonymous, very good question and @Saral, equally good answer. The hreshold switch from linked list to balanced tree as data structure is defined in java.util.HashMap class as TREEIFY_THRESHOLD = 8. You can check the code, especially put() method which does that conversion. You can open a class from JDK in Eclipse by using shortcut Ctrl + T and in Netbeans by using Ctrl + O open a type. For more shortcuts you can also check my post 30 essential Eclipse Shortcuts for Java Programmers

    ReplyDelete
  4. You mentioned that "By switching from linked list to balanced tree for handling collision, the iteration order of HashMap will change."
    As per my understanding LinkedHashMap maintains order. What happens in case of LinkedHashMap, when switching from linked list to balanced tree for handling collision occurs?

    ReplyDelete
  5. @Pritiviraj, LinkedHashMap's iteration order will not change because its a documented behavior and the class guaranteed that. On the other Iteration order of HashMap and Hashtable are not guaranteed, it's undocumented behavior.

    Due to legacy nature of Hashtable it's not touched and its iteration order will remain same, as many Java application rely on that, but for HashMap it will change.

    It's a good lesson on why Java programmer's should not use the un-documented behaviors or implementation details

    ReplyDelete
  6. So how it will work when key element is not implementing comparable?
    Will it fail run time?

    ReplyDelete
  7. @Parth, There is no requirement for key to implement Comparable in-order to be stored in HashMap, only equals() and hashCode() is required. Comparable is required in case of Sorted collection e.g. TreeMap or TreeSet.

    Coming back to your question if key doesn't follow equals() and hashCode() contract than Map' invariant will not hold e.g. not allowing duplicate values and retrieving values from a key which has same content but a different object.

    In the event of no overriding of equals() and hashCode() default implementation from java.lang.Object will be used.

    ReplyDelete
  8. @Javin
    I was referring to @Saral's explanation above, which read as
    "If the two hash codes are different but ended up in the same bucket then one of them is considered bigger and goes to the right of the tree and other one to the left. But when both the hash codes are equal, then HashMap assumes that the keys are comparable and the key compares to determine the direction so that some order can be maintained. It is a good practice to make the keys comparable."

    So how treenode is structured for elements with same hashcode but equals() is false? Does it rely only hashcode?

    ReplyDelete
  9. What is the difference between add() and set() method in ArrayList

    ReplyDelete
  10. Without using comparable class as key in HashMap, it convert Linked List to Tree Node. Still I don't know, how it convert it into tree. What is the comparison ground when it convert linked list to tree.

    ReplyDelete