The substring method of the String class is one of the most used methods in Java, and it is also the subject of an interesting String interview question, e.g. "How does substring work in Java?", sometimes asked as "How does substring create a memory leak in Java?". To answer these questions, you need to know the implementation details. Recently a friend of mine was drilled on the substring method during a Java interview. He had been using substring() for a long time, as of course all of us have, but what surprised him was the interviewer's obsession with substring and the deep dive down to the implementation level. String is a special class in Java and the subject of many interview questions, e.g. why a char array is better than String for storing passwords, but in this case it was the substring method that took center stage. Most of us just use substring(..) and then forget about it. Not every Java programmer goes into the code to see exactly how it works. To get a feel for how his interview went, let's start.
The interview started with normal chit-chat, and then the interviewer asked, "Have you used the substring method in Java?" My friend proudly said yes, many times, which brought a smile to the interviewer's face. "Well, that's good," he said. The next question was, "Can you explain what substring does?" My friend got an opportunity to show off his talent and how much he knows about the Java API. He said that the substring method is used to get parts of a String in Java. It's defined in the java.lang.String class, and it's an overloaded method. One version takes just beginIndex and returns the part of the String from beginIndex to the end, while the other takes two parameters, beginIndex and endIndex, and returns the part of the String from beginIndex to endIndex-1. He also stressed that every time you call substring() in Java, it returns a new String, because String is immutable in Java.
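To make the two overloads concrete, here is a minimal sketch. The class name SubstringBasics and the helper names tail and slice are made up for illustration; only String.substring itself comes from the JDK.

```java
class SubstringBasics {
    // Everything from beginIndex (inclusive) to the end of the string.
    static String tail(String s, int beginIndex) {
        return s.substring(beginIndex);
    }

    // Characters from beginIndex (inclusive) to endIndex (exclusive).
    static String slice(String s, int beginIndex, int endIndex) {
        return s.substring(beginIndex, endIndex);
    }

    public static void main(String[] args) {
        String text = "substring";
        System.out.println(tail(text, 3));     // prints "string"
        System.out.println(slice(text, 0, 3)); // prints "sub"
        System.out.println(text);              // prints "substring", the original is untouched
    }
}
```

Note that endIndex is exclusive, which is why slice(text, 0, 3) returns three characters, not four.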
The next question was, what happens if beginIndex is equal to the length in substring(int beginIndex)? No, it won't throw IndexOutOfBoundsException; instead it returns an empty String. The same is the case when beginIndex and endIndex are equal in the two-argument version. It only throws StringIndexOutOfBoundsException when beginIndex is negative, larger than endIndex, or larger than the length of the String, or when endIndex is larger than the length of the String.
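These boundary cases are easy to check yourself. The sketch below uses a hypothetical helper, throwsOutOfBounds, that simply reports whether a call to the two-argument substring throws StringIndexOutOfBoundsException.

```java
class SubstringBounds {
    // Returns true if substring(beginIndex, endIndex) rejects the given indices.
    static boolean throwsOutOfBounds(String s, int beginIndex, int endIndex) {
        try {
            s.substring(beginIndex, endIndex);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "Java"; // length 4
        System.out.println(s.substring(4).isEmpty());    // true: beginIndex == length gives ""
        System.out.println(s.substring(2, 2).isEmpty()); // true: beginIndex == endIndex gives ""
        System.out.println(throwsOutOfBounds(s, -1, 2)); // true: negative beginIndex
        System.out.println(throwsOutOfBounds(s, 3, 2));  // true: beginIndex > endIndex
        System.out.println(throwsOutOfBounds(s, 0, 5));  // true: endIndex > length
    }
}
```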
So far so good. My friend was happy and the interview seemed to be going well, until the interviewer asked him, "Do you know how substring works in Java?" Most Java developers fail here, because they don't know exactly how the substring method works until they have seen the code of java.lang.String. If you look at the substring method inside the String class, you will find that it calls the String(int offset, int count, char[] value) constructor to create the new String object. What is interesting here is value, which is the same character array used to represent the original string. So what's wrong with this?
In case you have still not figured it out: if the original string is very long and has a character array of size 1GB, then no matter how small the substring is, it will hold on to that 1GB array. This also stops the array of the original string from being garbage collected, even when the original string itself no longer has any live reference. This is a clear case of a memory leak in Java, where memory is retained even though it's not required. That's how the substring method creates a memory leak.
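The sharing described above can be modelled with a small toy class. This is a simplified sketch, not the actual JDK source: SharedSlice and its methods are invented names, and the class only mimics the (offset, count, value) layout so that the retained array size becomes visible.

```java
import java.util.Arrays;

// Toy model of a string whose substring() shares the backing array (assumed layout, not JDK code).
class SharedSlice {
    private final char[] value; // backing array, possibly shared with other slices
    private final int offset;   // index of the first character of this slice
    private final int count;    // number of characters in this slice

    SharedSlice(String s) {
        this(0, s.length(), s.toCharArray());
    }

    // Same shape as the constructor discussed above: the array is NOT copied.
    private SharedSlice(int offset, int count, char[] value) {
        this.offset = offset;
        this.count = count;
        this.value = value;
    }

    SharedSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException("invalid range");
        }
        return new SharedSlice(offset + beginIndex, endIndex - beginIndex, value);
    }

    int length() { return count; }

    // Size of the array this slice keeps reachable, regardless of its own length.
    int backingLength() { return value.length; }

    boolean sharesArrayWith(SharedSlice other) { return value == other.value; }

    @Override
    public String toString() { return new String(value, offset, count); }

    public static void main(String[] args) {
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'x');
        SharedSlice big = new SharedSlice(new String(chars));
        SharedSlice small = big.substring(0, 4);
        System.out.println(small.length());             // 4
        System.out.println(small.backingLength());      // 1000000
        System.out.println(small.sharesArrayWith(big)); // true
    }
}
```

As long as small is reachable, the million-character array is reachable too, even if big itself is dropped.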
How SubString in Java works
Obviously, the next question from the interviewer would be, how do you deal with this problem? Though you cannot go and change Java's substring method, you can still apply a workaround when you are creating a substring of a significantly longer String. The simple solution is to trim the string, so that the size of the character array matches the length of the substring. Luckily, java.lang.String has a constructor that does this, as shown in the example below.
// comma separated stock symbols from NYSE
String listOfStockSymbolsOnNYSE = getStockSymbolsForNYSE();

// calling String(String) constructor
String apple = new String(listOfStockSymbolsOnNYSE.substring(appleStartIndex, appleEndIndex));
If you look at the code of the java.lang.String class, you will see that this constructor trims the array if it's bigger than the String itself.
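A self-contained variant of this workaround might look like the sketch below. The class name DetachedSubstring, the helper of, and the sample symbol list are assumptions made for illustration; the only JDK calls are substring, indexOf and the String(String) constructor.

```java
class DetachedSubstring {
    // Takes a substring and wraps it in new String(...), which is the workaround
    // described above for JDKs where substring shares the original array.
    static String of(String source, int beginIndex, int endIndex) {
        return new String(source.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a much longer comma-separated symbol list.
        String symbols = "AAPL,GOOG,MSFT";
        int start = symbols.indexOf("GOOG");
        String goog = of(symbols, start, start + "GOOG".length());
        System.out.println(goog); // prints "GOOG"
    }
}
```

The extra constructor call only pays off for small substrings of very large strings; for ordinary strings the additional allocation is not worth it.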
Another way to solve this problem is to call the intern() method on the substring, which will then fetch an existing string from the pool or add it if necessary. Since the String in the pool is a real string, it only takes as much space as it requires. It's also worth noting that substrings are not interned when you call intern() on the original String. Most developers successfully answer the first three questions, which are about the usage of substring, but they get stuck on the last two: how substring creates a memory leak, and how substring works. It's not completely their fault, because what they know is that substring() returns a new String every time, which is not exactly true, since it's backed by the same character array.
This was the only interview question that bothered my friend a little; otherwise, it was a standard Java interview at a service-based company in India. By the way, he got the call a day later, even though he struggled a little on how the substring method works in Java, and that was the reason he shared this interview experience with me.
Update: This issue was actually a bug, http://bugs.sun.com/view_bug.do?bug_id=6294060, which is fixed in the substring implementation of Java 7 (as of update 6). Now, instead of sharing the original character array, the substring method creates a copy of the part it needs. In short, a substring only retains as much data as it needs. Thanks to Yves Gillet for pointing this out.
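The copying behaviour can be modelled with a toy class as well. Again, this is a simplified sketch with invented names (CopyingSlice, backingLength), not the JDK source; it only shows the idea of copying the requested range instead of sharing the whole array.

```java
import java.util.Arrays;

// Toy model of a string whose substring() copies only the characters it needs (not JDK code).
class CopyingSlice {
    private final char[] value; // private array, never shared with another slice

    CopyingSlice(String s) {
        this.value = s.toCharArray();
    }

    private CopyingSlice(char[] value) {
        this.value = value;
    }

    CopyingSlice substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException("invalid range");
        }
        // Copy just the requested range, so the large array can be collected.
        return new CopyingSlice(Arrays.copyOfRange(value, beginIndex, endIndex));
    }

    // Size of the array this slice keeps reachable.
    int backingLength() { return value.length; }

    @Override
    public String toString() { return new String(value); }

    public static void main(String[] args) {
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'x');
        CopyingSlice small = new CopyingSlice(new String(chars)).substring(0, 4);
        System.out.println(small.backingLength()); // 4
    }
}
```

The trade-off is that each substring call now costs time and memory proportional to the length of the substring, in exchange for never pinning the larger array.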