Coding Raptor

Programmers' Nest

You are here: Home / java / core-java / Java Collections: Sets

January 15, 2017

Java Collections: Sets

A Set is a Java Collection that does not allow duplicate elements.  A Java Set is embodiment of a Mathematical Set, which comprises of zero or more distinct, possibly infinite distinct unordered elements. Like all other collections, Sets in Java comprise of an interface aptly named Set and a number of concrete implementations.

Differences Between Java Sets and Sets in Mathematics

A Java Set models as closely as possible a mathematical set. Nevertheless, there are a few noteworthy differences

  1. A mathematical set is by definition unordered. Therefore, it is not possible to identify elements of a mathematical set based on the index. A Java Set may or may not be unordered. A few implementations such as HashSet and TreeSet are unordered, others such as LinkedHashSet are not.
  2. A mathematical set may contain heterogeneous types of elements. For example,
    1
    { "CodingRaptor.com is #", 1 }

    is a perfectly valid mathematical set. On the other hand, Java being a typed language cannot allow variables of distinct types to be the member of the same set.
  3. A mathematical set may be an infinite set. For example, a set comprising all positive numbers is infinite and is a perfectly valid mathematical set. Of course, no programming language can accommodate infinite elements.
  4. A Java Set uses
    element1.equals(element2)
    

    to determine whether two elements are distinct or not. A mathematical notion of equality is based on the = operator.

  5. A mathematical set is allowed to have a maximum of one null element, denoted by ∅. A Java Set may or may not allow null elements. This behavior depends on the concrete implementation.

Java Set Implementations

We have already seen that the Set interface does not define any new methods. There are two sub-interfaces of java.util.Set interface viz. SortedSet and NavigableSet. It should be noted here that although Set interface does not introduce any new methods but imposes new contracts implicitly. For example, the add method of a Set will reject duplicates.

The Java Collections framework provides four concrete, ready-to-use implementations of the Set interface.

  1. HashSet
  2. LinkedHashSet
  3. TreeSet
  4. EnumSet

HashSet

HashSet is the most often used Set. Internally it uses a hash table (a HashMap) to ensure that duplicates are not allowed. The other salient features of  HashSet are –

  1. Θ(1) or constant time performance for add, remove, contains and size operations. This is the core value proposition of HashSet. Anytime you want a bag like data structure of unique elements with best add and remove performance, HashSet should be used.
  2. Ο(n) or linear time performance for iterating over all elements. Again, this is the most efficient asymptotic performance for iterating over all elements. Note that iteration performance is dependent on the capacity rather than size of HashSet.
  3. The overall performance of a HashSet is dependent on initial capacity and load factor.
  4. HashSet is unordered. Therefore, order of insertion into the data structure says nothing about the relative position of elements.
  5. HashSet permits null value to be added exactly once. By default the hashCode() returns 0 for null.
  6. HashSet is not synchronized.  This means that in a multi-threaded environment a HashSet may be altered by one thread while other threads are concurrently reading or writing the same HashSet.
  7. The iterators of HashSet are fail-fast. This means that in a multi-threaded environment if a HashSet is modified in a different thread while it is being used in the current thread, a ConcurrentModificationException is thrown. Throwing of this exception partly solves multiple reader writer without using a lock as it notifies to other threads that the underlying data structures has been modified and they need to begin their operations afresh. Note that HashSet does not provide hard guarantees around ConcurrentModificationException. Therefore, while using HashSet you should be prepared for this exception but you cannot depend on this exception for correctness of programs.

LinkedHashSet

A LinkedHashSet is a sub-class of LinkedHashSet and has just one purpose over and beyond what is served by HashSet. A LinkedHashSet preserves the insertion order of the elements. Therefore, it is possible to iterate over the elements of a LinkedHashSet in the order they were inserted in. Since is LinkedHashSet a HashSet, all other properties of HashSet mentioned in the previous section also apply to LinkedHashSet.

A LinkedHashSet maintains the insertion order of the elements by also keeping the elements in a doubly linked list in addition to being kept in HashMap. This extra data structure results in slight dip in performance for add, remove and contains operations as the changes need to be made at two places. The asymptotic complexity remains the same.

Iteration, on the other hand, is slightly more efficient in LinkedHashSet than in HashSet. This is because iterating a doubly linked list is proportional to the actual size (i.e. number of elements). Iterating a HashMap (and therefore HashSet) on the other hand is proportional to the capacity of the hashtable. Overall performance of LinkedHashSet pretty much like HashSet is dependent on initial capacity and load factor. But in LinkedHashSet the performance hit for having a high initial capacity is pretty low.

TreeSet

A TreeSet is an ordered Set and therefore has very different characteristic than a HashSet or LinkedHashSet. A TreeSet maintains the ordering (either natural or as defined by a Comparator) elements. In practical terms this translates into the elements being accessed in ascending order during iteration.

Salient features of TreeSet –

  1. A TreeSet keeps its element sorted. This sorting may be based on natural ordering or a Comparator instance.
  2. TreeSet uses a Red-Black Tree underneath the same way as a HashSet uses HashMap. (The underlying implementation is JDK dependent and is not part of any specification. Oracle’s JDK uses Red-Black Tree, other JDKs may use other Trees).
  3. lg(n) or logarithmic time performance for add, remove, contains and size operations. This complexity is the result of using a balanced tree as the underlying implementation.
  4. TreeSet is not synchronized.  This means that in a multi-threaded environment a TreeSet may be altered by one thread while other threads are concurrently reading or writing the same TreeSet.
  5. The iterators of TreeSet are fail-fast. This means that in a multi-threaded environment if a TreeSet is modified in a different thread while it is being used in the current thread, a ConcurrentModificationException is thrown. Throwing of this exception partly solves multiple reader writer without using a lock as it notifies to other threads that the underlying data structures has been modified and they need to begin their operations afresh. Note that TreeSet does not provide hard guarantees around ConcurrentModificationException. Therefore, while using TreeSet you should be prepared for this exception but you cannot depend on this exception for correctness of programs.
  6. A TreeSet does not allow null element.

EnumSet

An EnumSet, introduced in JDK 1.5, is a special kind of Set where all elements are of the same enum type.

The salient features of EnumSet are as follows –

  1. EnumSet is abstract and cannot be instantiated using new operator.
  2. All the elements of EnumSet must be of the same enum type.
  3. The elements of an EnumSet must be specified at the creation time. This is usually done by passing arguments to the of method of an EnumSet.
  4. All operations such as add, remove, contains are Ο(1). This is the biggest selling point of EnumSet. From a performance angle this is the most efficient Set, although it is not as versatile as others and cannot be used in every situation.
  5. EnumSet is internally implemented as Bit Vectors.
  6. EnumSet is not synchronized and therefore unsafe in a multi-threaded environment.
  7. The iterators of EnumSet are not fail-safe but weakly consistent. Therefore, a concurrent modification will never throw an exception.
  8. null values are not permitted in an EnumSet.

We are social

Spread the word
Facebooktwittergoogle_plusredditpinterestlinkedintumblrmailFacebooktwittergoogle_plusredditpinterestlinkedintumblrmail

Follow CodingRaptor
Facebooktwittergoogle_plusrssFacebooktwittergoogle_plusrss
Post Views: 52

Share with your friends on other avenues:

  • Twitter
  • Facebook
  • Google
  • Reddit
  • Pinterest
  • Tumblr
  • Print
  • LinkedIn
  • Email
  • More
  • Pocket
  • Skype

Related

Article by P T / core-java, java / collections, hashset, java, java set, linkedhashset, mathematical set, treeset Leave a Comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Also Check These Out!

  • Factors in R
  • Basic Data Structures V: Arrays in R
  • Basic Data Structures IV: Lists in R
  • Basic Data Structures III: Data Frames in R
  • Basic Data Structures II: Matrices in R
  • Basic Data Structures I: Vectors in R
  • Atomic or Basic Data Types and Scalars in R
  • Getting Started with R: Packages, Help & Workspace (Part 2)
  • Getting Started with R: Installing and Running R (Part 1)
  • Floating-Point in Java: Representation, Comparison, Equality & A Few Shockers
  • Java Ternary Operator a.k.a Conditional Operator
  • Arrays In JavaScript
  • Coding Raptor Turns 1
  • Java EnumMap with Example
  • LinkedHashMap Example: LRU Cache Implementation
  • LinkedHashMap with Example
  • Java TreeMap with Example
  • Java Collections: Java HashMap Internals
  • Java HashMap with Example
  • Java Map
  • Java Collections: EnumSet with Example
  • Java Collections: Problem Solving with Java Sets
  • Java Collections: Differences between TreeSet, LinkedHashSet and HashSet
  • Java Collections: Sets
  • Using Lists in Java
  • Java Collections: Using Lists
  • Java Collections: Interfaces
  • Java Collections: General Overview
  • Debugging in Netbeans
  • Learn Basics of Core Java in 49 Lessons
  • Varargs in Java
  • Exceptions in Java VIII: Creating Custom Exceptions
  • Exceptions in Java VII: throw and throws in Java
  • Exceptions in Java VI: Types and Hierarchy
  • Exceptions in Java V: try with Resources
  • Exceptions in Java IV: Summary of Rules for try-catch-finally
  • Exceptions in Java iii: Control Flow and Multiple try, catch, finally
  • Exceptions in Java II: try, catch, finally
  • Exceptions in Java I: Introduction
  • Strings in Java VI: More String Methods
  • String in Java V: The intern Method
  • String in Java IV: Important String Methods
  • Strings in Java III: StringBuilder and StringBuffer
  • Strings in Java II: Immutability
  • Strings in Java I: What is a String?
  • Arrays in Java V: Iteration
  • Arrays in Java IV: Manipulating Individual Elements
  • Arrays in Java III: Multidimensional Arrays
  • Arrays in Java II: Declaring, Initializing, Instantiating Arrays
  • Arrays in Java I: What is an Array?
  • Using Environment Variables as Inputs
  • Command Line Arguments in Java
  • static import in Java
  • The import Statement in Java
  • Packages in Java
  • Non Access Modifiers in Java III: abstract, transient, volatile, native
  • Non Access Modifiers in Java II: final, synchronized
  • Non Access Modifiers in Java I: static and strictfp
  • Access Modifiers in Java
  • Looping Construct: for Loop
  • Looping Construct: while and do-while
  • Statements, Expressions and Code Blocks
  • Operators in Java
  • Conditional Statements in Java
  • Reference Data Types
  • Literals in Java
  • Primitive Data Types in Java
  • Variables in Java
  • Compiling and Running Java from Command Line
  • First Java Program: Hello World!
  • Java IDEs and Editors III: Sublime Text and Editplus
  • Java IDEs and Editors II: Eclipse IDE
  • Java IDEs and Editors I: Netbeans
  • Setting up Java Development Environment
  • Java’s Program Execution Model and WORA: Compilation & Interpretation
  • Inside JVM 101: What does JVM do and Memory Areas
  • JVM, JRE, JDK and JIT explained
  • Features of Java and White Paper Buzzwords
  • Beginning Java: History in Brief
Facebooktwittergoogle_plusrssFacebooktwittergoogle_plusrss

Copyright © 2017 · Coding Raptor

loading Cancel
Post was not sent - check your email addresses!
Email check failed, please try again
Sorry, your blog cannot share posts by email.