Wednesday, July 23, 2014

Java ArrayList and HashMap Performance Improvement in JDK 7

From long time one reason for me to update to newer Java version was always bug fix and performance improvement. Apart from major changes like Generics in Java 1.5 and Lambdas in Java 8, there are so many small improvements, performance optimization which just goes under radar, one of such change is creating empty ArrayList and HashMap with size zero in JDK 1.7.0_40 update. Many Java developer doesn't even know about these changes, part of the blame lies on Java developers like me, as I hardly read release notes of minor Java updates. Some times these changes are done as part of bug fixes and other time as minor optimization, but given popularity of ArrayList and HashMap in Java application impact of this simple Java optimization is huge.

If you are running on Java 1.6 or earlier version of Java 1.7, you can open code of java.util.ArrayList and check that, currently empty ArrayList is initialized with Object array of size 10.

If you create several temporary list in your program, which remains uninitialized, due to any reason then you are not only losing memory but also losing performance by giving your garbage collector more work.

Same is true for empty HashMap, which was initialized by default initial capacity of 16. This changes are result of observation made by Nathan Reynolds, and Architect at Oracle, which apparently analysed 670 Java heap dumps from different Java programs to find out memory hogs.

Change in ArrayList on Java 7 update 40

As I said, when you create empty ArrayList, without specifying any initial capacity i.e. by using new ArrayList(), Java creates an Object array of default size 10 to hold objects. This memory is allocated eagerly, even before you have added any object, which means, if 100K list is created during application runtime, say for storing order details of each order in a transaction processing system, and 10% of them will remain empty than you are going to lose significant memory.

By the way, it's not just memory, it’s also extra work-load for Garbage collector. If you are working in high frequency trading application development, where every ounce of performance matters or just care enough for performance of your Java application, you will appreciate this saving.

Now let's see the actual change :

java.util.ArrayList code from JDK 1.6.30

Here is the code snippet from java.util.ArrayList class from jdk1.6.30 to create an empty ArrayList :

  * Constructs an empty list with an initial capacity of ten.

public ArrayList() {

You can see that it's calling another constructor of java.util.ArrayList with initial capacity 10, which allocates array.

public ArrayList(int initialCapacity) {

  if (initialCapacity < 0)
     throw new IllegalArgumentException("Illegal Capacity: "+ initialCapacity);

  this.elementData = new Object[initialCapacity];


You can see array allocate at last line of constructor (highlighted by amber).

Java ArrayList and Hashmap Performance Improvement in JDK 1.7

java.util.ArrayList code from JDK 1.7.0._40

This version of ArrayList has an empty shared array, a static variable which is shared by all instances of ArrayList class.

  * Shared empty array instance used for empty instances.

private static final Object[] EMPTY_ELEMENTDATA = {};

and now look at the change made in no-argument constructor of java.util.ArrayList class

 * Constructs an empty list with an initial capacity of ten.
public ArrayList() {
  this.elementData = EMPTY_ELEMENTDATA;

You can see that, instead of constructor chaining, elementDate is assigned an empty array. This immediately save memory hogged by an object array of size 10. By the way, how many of you have noticed the same comment "Constructs an empty list with an initial capacity of ten/" in both the version? Yes, they forget to update the comment, and that's one of the reason, why code comments are bad? they quickly loss relevance, as no compiler is there to verify correctness of a comment.

Change in HashMap on JDK 7 updated 40

Similar change has been made on java.util.HashMap class, earlier it was used to initialized by default size of 16, but now its initialized by empty table.

java.util.HashMap code from jdk1.6.30

Here is the code for creating empty HashMap in Java 6, you can see that table instance variable is initialized by an Entry array of default initial size 16, highlighted by red :

     * Constructs an empty <tt>HashMap</tt> with the default initial capacity
     * (16) and the default load factor (0.75).
    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        table = new Entry[DEFAULT_INITIAL_CAPACITY];

java.util.HashMap code from jdk1.7.0._40

In this version a special shared empty table has created, it's static final variable, so that all instance of HashMap can share it. Initialization of table is also moved out of constructor to the same line, where table is declared. Here is code snippet from Java 1.7 update 40 :

     * An empty table instance to share when the table is not inflated.

    static final Entry<?,?>[] EMPTY_TABLE = {};

     * The table, resized as necessary. Length MUST Always be a power of two.

    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

This saves memory hogged by an Entry array of size 16. Actual initialization of table is now moved into put(K,V) and putAll(K,V), where inflateTable() method is called to allocate memory, as seen below :

  public V put(K key, V value) {
        if (table == EMPTY_TABLE) {

These change is also documented as part of bug JDK-8011200 - (coll) Optimize empty ArrayList and HashMap, and they have also done a performance test to ensure no side effect on  JDK performance.

That's all about this optimization of empty ArrayList and HashMap in JDK 7, no doubts this is going to save a lot of memory and also reduce garbage collection. Take away from this post is to pay attention on any core library change made on minor Java updates, as you could potentially improve performance of your Java application, just by switching to new JVM. Don't think that because you are not using new features of JDK 7, it's not necessary for you and your project to update to newer Java version. In every Java release, several bugs are fixed and optimizations are done,  and everyone you should take advantage of that.

Further Learning
Understanding the Java Virtual Machine: Memory Management
What's New in Java 8
What's New in Java 9 By Sander Mak
Java Performance The Definitive Guide


rAeviLmAn D said...

I would focus on this blog rather going through change logs :D as there is nice explanation here...

Anonymous said...

The comment on the ArrayList default constructor isn’t wrong, it still has a default capacity of ten. The only difference is that the creation of an array using the default capacity is deferred to the first add() operation. But it still holds that if you know in advance that ten is not a suitable capacity for your use case you should use a different constructor.

mahesh loya said...

fOR HashMap... In JAVA 8..
* The default initial capacity - MUST be a power of two.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

Post a Comment