Java Collections: Differences between TreeSet, LinkedHashSet and HashSet

To effectively use Java Sets a Java programmer must understand the difference between TreeSet, LinkedHashSet and HashSet in Java. All three of them implement the Set interface and therefore don’t allow duplicate elements. Still, there are a few subtle differences in performance and usage patterns that every Java programmer must be aware of.

Differences between TreeSet, LinkedHashSet and HashSet in Java

There are various aspects that we will use to point out the differences. In a particular problem solving situation some of the aspects will be more important than others. Pick the implementation which is most favorable and efficient in your particular situation.

Comparison Methodology

A Set does not allow duplicate elements. Therefore, any Set implementation needs to adopt a comparison methodology to determine whether two elements are equal or not.

HashSet and LinkedHashSet use the equals() method from Object class to determine if two objects are duplicate of (i.e. equal to) each other or not.

On the other hand TreeSet uses compareTo() method to determine whether two objects are equal to each other or not.

Performance

As indicated earlier in the introductory lesson on Java Set, the most important operations on any Set are – add, remove and contains.

The HashSet has best performance for all these operations, LinkedHashSet is only marginally but a TreeSet is much less desirable from a performance angle. LinkedHashSet is slightly worse than HashSet because it maintains an additional doubly linked list.

In terms of asymptotic complexity for the three operations add, remove and contains – HashSet and LinkedHashSet have Ο(1) and TreeSet has Ο(log(n)). This means that for HashSet and LinkedHashSet add, remove, contains take the same time immaterial of how many elements are present in them. For TreeSet it is lograthimic meaning that time taken for the critical operations increases proportional to the log of size of TreeSet.

Underlying Data Structure

All Set implementations need a mechanism to ensure that duplicates are not allowed. The HashSet uses HashMap and the LinkedHashSet uses a doubly linked list in addition to HashMap. The TreeMap has an additional constraint that it must retain the elements in sorted order and therefore it uses a TreeMap.

Ordering

HashSet is an unordered collection. Meaning it stores the elements in no particular order. This is a direct consequence of using hash table as the underlying data structure.

A LinkedHashSet is an ordered collection that maintains the insertion order of elements. This is achieved by using a doubly linked list in addition to hash table.

TreeSet is an ordered collection that maintains its elements in sorted order. The sorting order is governed either by a Comparator object or the natural ordering of elements.

null Elements

Both HashSet and LinkedHashSet allow exactly one null element to be inserted. TreeSet on the other hand does not allow any null element.

Similarities between TreeSet, LinkedHashSet and HashSet in Java

Having looked at differences between the three Set implementations, let’s quickly look at similarities.

No Duplicates

None of the three implementations allow duplicates to be inserted. This property follows directly from the definition of a Set.

Single Threaded

All three implementations of Set discussed here are intended to be used in a single threaded environment and are not safe to be used in a multi-threaded programs. The benefit that these implementations gain by being single threaded is that they don’t incur any performance overhead arising out of a locking mechanism.

Fail-Fast Iterators

The iterators returned by all three implementations discussed here are fail-fast. This implies that they may (and it is just a ‘MAY’ not a must) throw a ConcurrentModificationException if the Set is modified outside the current thread. Note that the iterators make no guarantees that this exception will always be thrown if the Set is modified concurrently.

When to Use HashSet, TreeSet and LinkedHashSet

  1. Use HashSet if you want the most efficient collection that rejects duplicates.
  2. Use LinkedHashSet if you want the most efficient collection that rejects duplicates and also allows you to iterate the elements in the order they were inserted.
  3. TreeSet works best if you want to maintain a collection that keeps elements in sorted order and rejects duplicates.

We have looked at the differences between three Set implementations. We will next turn to a real life example involving HashSet, LinkedHashSet and TreeSet.This will help you choose a Set implementation in your problems.

Leave a comment

Your email address will not be published. Required fields are marked *