String Deduplication of G1 Garbage collector to Save Memory from Duplicate String in Java 8

You might not be aware that Java 8 update 20 introduced a new feature called "String deduplication", which can save the memory wasted on duplicate String objects in a Java application. This can improve the performance of your application and help you avoid java.lang.OutOfMemoryError if it makes heavy use of String. If you have profiled a Java application to check which objects take up the bulk of memory, you will often find char[] at the top of the list, which is nothing but the internal character array used by String objects. Some tools and profilers, e.g. Java Flight Recorder, might show this as java.lang.String[] as well, but they are essentially pointing to the same problem, i.e. a major portion of memory is occupied by String objects.

Since Java 7 update 6, String no longer shares its character array with substrings, so the memory occupied by String objects has gone up, which made the problem even worse. If you remember, earlier both the substring and the original String shared the same character array (see how Substring works in Java), a behavior that had the potential to cause a serious memory leak. That was fixed in JDK 7, but the fix created this new problem.

String deduplication tries to bridge that gap. It reduces the memory footprint of String objects on the Java heap by taking advantage of the fact that many String objects are identical. Instead of each String object pointing to its own character array, identical String objects can point to the same character array.
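To make the idea of "identical String objects" concrete, here is a minimal sketch. The class name DuplicateStringDemo and the build() helper are made up for illustration. It shows two Strings built at runtime that have the same contents but are separate objects, each created with its own character array. Deduplication would let such Strings share one array internally, but as far as I know it does not merge the String objects themselves, so the == comparison stays false either way.

```java
class DuplicateStringDemo {

    // Hypothetical helper: builds a String at runtime, so the result is
    // a fresh object and not a compile-time constant from the String pool.
    static String build(String prefix, int id) {
        return new StringBuilder(prefix).append(id).toString();
    }

    public static void main(String[] args) {
        String first = build("customer-", 42);
        String second = build("customer-", 42);

        // Same characters, so equals() is true.
        System.out.println(first.equals(second)); // prints true

        // Two distinct String objects, so == is false.
        System.out.println(first == second);      // prints false
    }
}
```

An application that parses the same values again and again, e.g. from files or network messages, can end up with thousands of such equal but separate Strings.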

Btw, this is not exactly the same as it was before Java 7 update 6, where a substring also pointed to the same character array as the original String, but it can greatly reduce the memory occupied by duplicate Strings in the JVM. Anyway, in this article, you will see how you can enable this feature in Java 8 to reduce the memory consumed by duplicate String objects.

Btw, if you are not familiar with the new features of Java 8, then I suggest you first go through a comprehensive and up-to-date Java course like The Complete Java MasterClass on Udemy. It's also very affordable, and you can buy it for just $10 during the Udemy sales which happen every now and then.

How to enable String deduplication in Java 8

String deduplication is not enabled by default in the Java 8 JVM. You can enable it by using the -XX:+UseStringDeduplication option. Unfortunately, String deduplication is only available for the G1 garbage collector, so if you are not using G1 GC then you cannot use this feature.

It means just providing -XX:+UseStringDeduplication will not work; you also need to turn on the G1 garbage collector using the -XX:+UseG1GC option.

String deduplication also doesn't consider relatively young Strings for processing. The minimum age of a processed String, i.e. the number of garbage collections it must have survived, is controlled by the -XX:StringDeduplicationAgeThreshold option. The default value of this parameter is 3.
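If you want to double-check that your application was really started with these options, something like the following sketch may help. The class name DedupFlagCheck and the method isDeduplicationRequested() are my own names, not part of the JDK. It only looks at the flags passed explicitly on the command line, so it cannot tell you what the JVM actually does internally.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Example launch command (Java 8 update 20 or later assumed):
//   java -XX:+UseG1GC -XX:+UseStringDeduplication -XX:StringDeduplicationAgeThreshold=3 DedupFlagCheck
class DedupFlagCheck {

    // Hypothetical helper: returns true only when both flags were passed explicitly.
    static boolean isDeduplicationRequested(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+UseG1GC")
                && jvmArgs.contains("-XX:+UseStringDeduplication");
    }

    public static void main(String[] args) {
        // Standard JMX call that returns the arguments given to the JVM itself.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Deduplication requested: " + isDeduplicationRequested(jvmArgs));
    }
}
```

If you start the program without the two flags, the second line should report false, which is a quick reminder that -XX:+UseStringDeduplication alone is not enough.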

Now, you might be wondering how this method compares with the traditional way of reducing the memory wasted on duplicate Strings, e.g. by using the intern() method of the java.lang.String class. Well, this approach has an advantage because you don't need to write a single line of code. Just enable this feature using JVM parameters and you are done.

If you have ever optimized your code by using the String.intern() method then you know that it's not easy. It not only compromises readability by adding lines of code that add no functionality but also increases the size of the code. Btw, if you are interested in learning more about the G1 garbage collector, I suggest reading Java Performance Companion by Charlie Hunt, which covers some good information about it.
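For comparison, the manual intern() approach looks roughly like the sketch below. The class name InternDemo and the build() helper are again just illustrative. Note the difference from deduplication: intern() returns one canonical String object, so even the == comparison becomes true, but you have to add the call yourself at every place where duplicates are created.

```java
class InternDemo {

    // Hypothetical helper: builds a fresh String at runtime.
    static String build(String prefix, int id) {
        return new StringBuilder(prefix).append(id).toString();
    }

    public static void main(String[] args) {
        // Each call to intern() returns the canonical copy from the String pool.
        String first = build("customer-", 42).intern();
        String second = build("customer-", 42).intern();

        // Both variables now refer to the same pooled object.
        System.out.println(first == second); // prints true
    }
}
```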


Important points

Here are some of the important points about the String deduplication feature of Java 8:

1) This option is only available from the Java 8 Update 20 JDK release onward.

2) This feature only works with the G1 garbage collector; it will not work with other garbage collectors, e.g. Concurrent Mark Sweep GC.

3) You need to provide both the -XX:+UseG1GC and -XX:+UseStringDeduplication JVM options to enable this feature. The first one enables the G1 garbage collector and the second one enables the String deduplication feature within G1 GC.

4) You can optionally use -XX:+PrintStringDeduplicationStatistics JVM option to analyze what is happening through the command-line.

5) Not every String is eligible for deduplication. In particular, young String objects are not considered, but you can use the -XX:StringDeduplicationAgeThreshold=3 option to change when Strings become eligible for deduplication.

6) In general, this feature has been observed to decrease heap usage by about 10%, which is very good considering you don't have to do any coding or refactoring.

7) String deduplication runs as a background task without stopping your application.
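If you want to try these options yourself, you need a program that keeps many duplicate Strings alive long enough to pass the age threshold. Here is a minimal sketch of such a workload. The class name DedupWorkload, the method createDuplicates(), and the numbers are all assumptions for illustration. Whether and when the Strings are actually deduplicated depends on the JVM and its GC activity, so treat it as a starting point for experiments and not as a benchmark.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Example launch command for Java 8 (update 20 or later assumed):
//   java -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics DedupWorkload
class DedupWorkload {

    // Hypothetical helper: creates 'total' Strings with only 'distinct' different values.
    static List<String> createDuplicates(int total, int distinct) {
        List<String> strings = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            // Built at runtime, so every element is a separate String object.
            strings.add(new StringBuilder("value-").append(i % distinct).toString());
        }
        return strings;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> strings = createDuplicates(1_000_000, 100);

        // Request a few collections so the Strings get a chance to age.
        // System.gc() is only a hint; the JVM may handle it differently.
        for (int i = 0; i < 5; i++) {
            System.gc();
            Thread.sleep(200);
        }

        // Keep the list reachable until the end and report what was created.
        System.out.println(strings.size() + " strings, "
                + new HashSet<>(strings).size() + " distinct values");
    }
}
```

Running it once with and once without -XX:+UseStringDeduplication, and comparing the heap usage in your profiler, should give you a feel for how much the feature can save on your JVM.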

That's all about how to enable String deduplication in Java 8 to reduce the memory consumed by duplicate String objects. This is one of the useful features to know about but, unfortunately, it is only available for the G1 garbage collector. You also need Java 8 Update 20 or later to enable this option. Hopefully, in Java 9, when the G1 garbage collector becomes the default collector, it can use this feature to further improve performance. If we are lucky, we may also see this feature extended to other major garbage collectors like the Concurrent Mark Sweep garbage collector.

If you want to learn more about Java performance, JVM options, and profiling Java applications, I suggest reading Java Performance: The Definitive Guide by Scott Oaks, one of the best books to learn and understand the tools and techniques of Java performance tuning.

Btw, the more up-to-date Java Performance Companion by Charlie Hunt also contains some really good information about the G1 garbage collector, which is more relevant to String deduplication.

Further Learning
The Complete Java MasterClass
From Collections to Streams in Java 8 Using Lambda Expressions
Refactoring to Java 8 Streams and Lambdas Online Self- Study Workshop
Complete Java SE 8 Developer Bootcamp - OCA Prep Included
What's New in Java 8 -List of Useful Features


Cristian Daniel Ortiz Cuellar said...

Sorry i dont get it. All the String literals are interned right? how can be a duplicate Strings? If i got String name1 = new String("JOHN");String name2 = new String("JOHN"); only 1 String will be on the heap? if i set XX:+UseG1GC -XX:+UseStringDeduplication? Sorry if the question is plain!!

javin paul said...

Hello Christian, yes, but JOHN and JOHNN will each have their own character array even though one is substring of other. In second paragraph of this article I have mentioned that "Since from Java 7 onward, String has stopped sharing character array with sub-strings, the memory occupied by String object has gone higher, which had made the problem even worse."

That's the exact reason of using this flag. Hope this clears your doubt.
