Java Collections: Java HashMap Internals

If somebody were to compile a list of most frequently asked Java interview questions of all times, the question that would certainly appear is – how does HashMap work? This question has become so important over the past decade or so that any candidate fumbling on this one would undoubtedly have to different line of work or at the very least a different programming language to work with. For the interviewer this question serves a nice big filter. Any candidate not fine enough to prepare for this question is surely not serious about the job and is not worth spending time on. So, without further ado let us see what makes this question so important.

A Gentle Introduction to HashMap

If you are coming to Java from one of the other innovative high level programming languages such as Python or JavaScript, you might be surprised at the richness of Java Collections. Almost any conceivable data structure is already implemented there. This is in stark contrast to some of the other languages which provide only Arrays (or Lists) and Hashtables. The C programming language is even more minimalistic as it provides only arrays. This explains why coding  hashtables is a favorite time killer for C programmers. In reality, all sophisticated data structures can be built using basic arrays or hashtables.

An array is designed to store objects in a contiguous memory area. The objects are indexed by integers starting from zero. Some typed languages such as C++ and Java impose the restriction that all elements in the array be of the same type, other languages such as Python allow you to mix various data types in the array. The index based approach that array takes is very fast because calculating the memory address of the next element is a matter of incrementing an integer. The drawback of the arrays is that they require contiguous memory area and when you want to store a large amount of data this can take a severe toll on your servers’ resources.

A hashtable is pretty much like an array in that it is also designed to store and retrieve objects efficiently. But the approach that hashtable takes is remarkably different from the arrays. In fact, the amortized complexity of get and put operations in a hashtable is no worse than that of array. A hashtable does not require contiguous memory locations, making it a more valuable data structure when a large amount needs to be stored and  retrieved efficiently. Hashtable achieves this feat using the magic of hash function. A hash function has the property that its output is deterministic, meaning every time we give it a number X the output is another number Y. This property is utilized to calculate the memory address of an object.

HashMap is the implementation of hashtable and is part of the Java Collection library. For the history buffs, there is an older hashtable implementation called Hashtable. HashMap is the preferred and more modern implementation.

equals() and hashcode()

This is another all time favorite interview question. The equals and hashcode are methods defined in the Object class. Therefore, every Java object has a default implementation of these two methods. Java also has equality check in the form ==. A == checks if the LHS object has the same memory  address as the RHS object. For example
By default the equals() method relies on == to check for equality. This is obviously not always desirable. For example an Integer with value 155 is identical and practically the same as that another Integer with value 155 sitting in a different location. This is precisely the reason that many a times you would want to override the equals() method. Consider the code below and its output –

The output is

As is obvious from the above code, by default the primitive 155 are equal but their corresponding Integer objects are not.
As pointed earlier, the situation can be remedied by overriding the equals method.While most of the times it would be intuitive to override equals method you  should always remember that every time you override equals method, you should also override hashcode. Although the reasoning behind this profound statement will become clearer in the rest of the post, you need to remember that two objects which are equal as determined by equals() will have the same hascode() but two unequal objects may have the same hashcode (This is one of the prime causes for hash collisions) .

Inside HashMap

  1. A HashMap implements the Map interface.The basic components of a HashMap are an array and a linked list. A Java HashMap is an array of linked list. The linked list comprises of objects that have the same hashcode. The array is usually referred to as the table and comprises of the first Entry object of the linkedlist. The linkedlist is usually termed as a bucket and comprises of Entry objects as demonstrated in code below –

The static Entry class defines an object that knows about the next element in the bucket and is aware of its own hash value. The hash value determines the bucket in which the element is placed.All the objects in the same bucket have the same hashcode but they are not necessarily equal.

The put() method

The put method adds a new element to a HashMap. The following well commented code explains the steps in greater detail, but broadly the steps involved are –

  1. Calculate the hash value of hashcode of the incoming object (The object that needs to be added to the HashMap)
  2. Calculate the index of the array where this object must go using the hash value created in step #1 . This steps selects the bucket the incoming object must go to.
  3. Start iterating the relevant bucket.
  4. If there is already a value corresponding to the incoming key, replace the existing value with new value and return old value.
  5. For efficiency reasons add the incoming element to the beginning of the bucket.

The following code explains the above steps symbolically –

The get() method

The get method is functionally converse of the put method. The get method take the key as argument and retrieves the element corresponding to the key. If no element can be found the method returns null. Although, get() is functionally converse of put(), the steps involved are somewhat similar –

  1. Calculate using hash() method the hash value corresponding to the incoming key parameter.
  2. Calculate the index in the array (a.k.a table) corresponding to the hash value created in step #1.
  3. Do a linear search in the bucket for the key in the bucket identified in step #2. The linear search relies on the keys being equal. This equality test relies on the equals() method.
  4. Return the value corresponding to the key for the element found in step #3.

Conclusion

This article examined one of the oft-discussed topics in Java programming and interviews. The HashMap implementation requires a sound understanding of data structures, Java’s object model and Java’s memory model. A HashMap delivers constant time put and get operation which is extremely desirable. Learning HashMap internals is especially important for students having Java as their first programming language because when they move to other languages such as C, they may be required to hand craft their own hashtables pretty soon.

Leave a comment

Your email address will not be published.