Friday, July 30, 2021

Java ArrayList and HashMap Performance Improvement in Java

For a long time, one reason for me to update to the newer Java version was always bug fix and performance improvement. Apart from major changes like Generics in Java 1.5 and Lambdas in Java 8, there are so many small improvements, performance optimization which just goes under the radar, one such change is creating empty ArrayList and HashMap with size zero in JDK 1.7.0_40 update. Many Java developer doesn't even know about these changes, part of the blame lies on Java developers like me, as I hardly read release notes of minor Java updates. 

Sometimes these changes are done as part of bug fixes and other times as minor optimization, but given the popularity of ArrayList and HashMap in Java applications impact of this simple Java optimization is huge.

If you are running on Java 1.6 or an earlier version of Java 1.7, you can open the code of java.util.ArrayList and check that, currently empty ArrayList is initialized with Object array of size 10.

If you create several temporary lists in your program, which remains uninitialized, due to any reason then you are not only losing memory but also losing performance by giving your garbage collector more work.

The same is true for empty HashMap, which was initialized by the default initial capacity of 16. These changes are the result of observation made by Nathan Reynolds, an Architect at Oracle, which apparently analyzed 670 Java heap dumps from different Java programs to find out memory hogs.


Change in ArrayList on Java 7 update 40

As I said, when you create an empty ArrayList, without specifying any initial capacity i.e. by using the new ArrayList(), Java creates an Object array of default size 10 to hold objects. This memory is allocated eagerly, even before you have added any object, which means, if 100K list is created during application runtime, say for storing order details of each order in a transaction processing system, and 10% of them will remain empty then you are going to lose significant memory.



By the way, it's not just memory, it’s also an extra workload for the Garbage collectors. If you are working in high-frequency trading application development, where every ounce of performance matters or just cares enough for the performance of your Java application, you will appreciate this saving.

Now let's see the actual change :


java.util.ArrayList code from JDK 1.6.30

Here is the code snippet from java.util.ArrayList class from jdk1.6.30 to create an empty ArrayList :

/**
  * Constructs an empty list with an initial capacity of ten.
  */

public ArrayList() {
   this(10);
}

You can see that it's calling another constructor of java.util.ArrayList with initial capacity 10, which allocates array.

public ArrayList(int initialCapacity) {
  super();

  if (initialCapacity < 0)
     throw new IllegalArgumentException("Illegal Capacity: "+ initialCapacity);

  this.elementData = new Object[initialCapacity];

}

You can see the array allocate at the last line of the constructor (highlighted by amber).

Java ArrayList and Hashmap Performance Improvement in JDK 1.7

java.util.ArrayList code from JDK 1.7.0._40

This version of ArrayList has an empty shared array, a static variable that is shared by all instances of the ArrayList class.

/**
  * Shared empty array instance used for empty instances.
  */

private static final Object[] EMPTY_ELEMENTDATA = {};

and now look at the change made in no-argument constructor of java.util.ArrayList class

/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
  super();
  this.elementData = EMPTY_ELEMENTDATA;
}

You can see that, instead of constructor chaining, elementDate is assigned an empty array. This immediately saves memory hogged by an object array of size 10. By the way, how many of you have noticed the same comment "Constructs an empty list with an initial capacity of ten/" in both the version? Yes, they forget to update the comment, and that's one of the reasons, why code comments are bad? they quickly lose relevance, as no compiler is there to verify the correctness of a comment.

Change in HashMap on JDK 7 updated 40

A similar change has been made on java.util.HashMap class, earlier it was used to initialized by default size of 16, but now its initialized by an empty table.

java.util.HashMap code from jdk1.6.30

Here is the code for creating empty HashMap in Java 6, you can see that table instance variable is initialized by an Entry array of default initial size 16, highlighted by red :

   /**
     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
     */
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        threshold = (int) (DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }

java.util.HashMap code from jdk1.7.0._40

In this version a special shared empty table has created, it's static final variable, so that all instance of HashMap can share it. Initialization of table is also moved out of constructor to the same line, where table is declared. Here is code snippet from Java 1.7 update 40 :

      /**
     * An empty table instance to share when the table is not inflated.
     */

    static final Entry<?,?>[] EMPTY_TABLE = {};

    /**
     * The table, resized as necessary. Length MUST Always be a power of two.
     */

    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

This saves memory hogged by an Entry array of size 16. Actual initialization of table is now moved into put(K,V) and putAll(K,V), where inflateTable() method is called to allocate memory, as seen below :

  public V put(K key, V value) {
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);
        }
        .....
  }

These change is also documented as part of bug JDK-8011200 - (coll) Optimize empty ArrayList and HashMap, and they have also done a performance test to ensure no side effect on  JDK performance.


That's all about this optimization of empty ArrayList and HashMap in JDK 7, no doubts this is going to save a lot of memory and also reduce garbage collection. Take away from this post is to pay attention on any core library change made on minor Java updates, as you could potentially improve the performance of your Java application, just by switching to new JVM. 

Don't think that because you are not using new features of JDK 7, it's not necessary for you and your project to update to a newer Java version. In every Java release, several bugs are fixed and optimizations are done,  and everyone should take advantage of that.


3 comments:

  1. I would focus on this blog rather going through change logs :D as there is nice explanation here...

    ReplyDelete
  2. The comment on the ArrayList default constructor isn’t wrong, it still has a default capacity of ten. The only difference is that the creation of an array using the default capacity is deferred to the first add() operation. But it still holds that if you know in advance that ten is not a suitable capacity for your use case you should use a different constructor.

    ReplyDelete
  3. fOR HashMap... In JAVA 8..
    /**
    * The default initial capacity - MUST be a power of two.
    */
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

    ReplyDelete