Tuesday, September 29, 2015

How to remove all elements from ArrayList in Java - Clear vs RemoveAll

Many times we want to reset an ArrayList for the reusing purpose, by resetting we mean clearing it or removing all elements. There are two ways to reset an ArrayList in Java, by using clear() method or calling removeAll(). If your ArrayList is small enough e.g. contains only 10 or 100 elements then you can use any of these two methods without worrying too much, but, if you have a huge list of lots of objects e.g. an ArrayList containing 10M entries, then choice of clear() vs removeAll() can make a huge difference in performance of your Java application. Sometimes it's even better to create a new ArrayList instead of resetting the old one, especially if resetting takes a long time, but this also has a caveat, you need to make sure that old ArrayList is eligible for garbage collection, otherwise there is a huge risk of java.lang.OutOfMemoryError: Java Heap Space.

Coming back to clear() vs removeAll() method, you should always use clear(), because it gives you O(n) performance, while removeAll(Collection c) is worse, it gives O(n^2) performance, that's why you see huge difference in time taken by clearing a large ArrayList by these two methods.

Things will be obvious, when you will run our example program and see the code of clear() and removeAll() method from JDK API. By the way, if you are in doubt, use clear() method and if not then always prefer clear over removeAll in Java.

Clear() vs RemoveAll(Collection c)

In order to compare the performance of both these methods, it's very important to see their code. You can check the source code of the clear() method in java.util.ArrayList class, for convenience I have included it here. This code is from JDK version 1.7.0_40. If you want to learn more about performance measurement and tuning then I strongly suggest reading Java Performance The Definitive Guide By Scott Oaks. It covers Java 7 and some bits of Java 8 as well.

     * Removes all of the elements from this list.  The list will
     * be empty after this call returns.
    public void clear() {
        // clear to let GC do its work
        for (int i = 0; i < size; i++)
            elementData[i] = null;
        size = 0;

You can see that this method loop over ArrayList and assign null to every element to make them eligible for garbage collection, of course if there is no external reference. Similarly, you can check the source code of java.util.AbstractCollection class to look at how removeAll(Collection c) method works, here is snippet:

public boolean removeAll(Collection c) {
        boolean modified = false;
        Iterator it = iterator();
        while (it.hasNext()) {
            if (c.contains(it.next())) {
                modified = true;
        return modified;

This implementation iterate over the collection, checking each element returned by the iterator, in turn, to see if it's contained in the specified collection.  If it's present the it is removed from this collection by using Iterator's remove method. Because of using contains() method, removeAll() performance goes into the range of O(n^2), which is an absolutely NO, especially if you are trying to reset a large ArrayList. Now let's see their performance in action to reset an ArrayList of just 100K entries.

If you are interested more in Java Performance measurement and tuning then I also suggest you take a look at Java Performance The Definitive Guide By Scott Oaks, one of the best book on Java profiling.

clear vs removeAll two ways to remove all elements form ArrayList

Removing all elements from ArrayList with 100K Objects

I have initially tried to run this example with 10M elements but after waiting for more than half an hour to let removeAll() finish, I decided to reduce the number of objects to 100K, even then the difference between the time taken by clear() vs removeAll() is quite significant. The removeAll(Collection c) are taking 10000 times more time than clear to reset.

Actually, the purpose of clear() and removeAll(Collection c) are different in API, clear() method is meant to reset a Collection by removing all elements, while removeAll(Collection c) only removes elements which are present in supplied collection. This method is not designed to remove all elements from a Collection.

So, if your intention is to delete all elements from a Collection, then use clear(), while if you want to remove only some elements, which are present in another Collection, e.g. list of closed orders, then use removeAll() method .

import java.util.ArrayList;

 * Java Program to remove all elements from list in Java and comparing
 * performance of clearn() and removeAll() method.
 * @author Javin Paul
public class ArrayListResetTest {
    private static final int SIZE = 100_000;
    public static void main(String args[]) {
        // Two ArrayList for clear and removeAll
        ArrayList numbers = new ArrayList(SIZE);
        ArrayList integers = new ArrayList(SIZE);
        // Initialize ArrayList with 10M integers
        for (int i = 0; i &lt; SIZE; i++) {
            numbers.add(new Integer(i));
            integers.add(new Integer(i));
        // Empty ArrayList using clear method
        long startTime = System.nanoTime();
        long elapsed = System.nanoTime() - startTime;
        System.out.println("Time taken by clear to empty ArrayList of 1M elements (ns): " + elapsed);
       // Reset ArrayList using removeAll method
        startTime = System.nanoTime();
        long time = System.nanoTime() - startTime;
        System.out.println("Time taken by removeAll to reset ArrayList of 1M elements (ns): " + time);

Time taken by clear to empty ArrayList of 100000 elements (ns): 889619
Time taken by removeAll to reset ArrayList of 100000 elements (ns): 36633112126

Make sure you provide sufficient memory to run this program because it's uses two ArrayList to store Integers, especially if you want to compare the performance of clear() and removeAll() for List with 1M elements. You also need Java 7 to run this program because I am using underscore with the numeric literal feature. If you don't have JDK 7 then just remove underscores from SIZE constants, those are just for improving readability.

That's all about how to reset an ArrayList in Java. We have not only learned two ways to remove all elements from ArrayList but also learned about the difference between clear vs removeAll method. We have seen that removeAll performs poorly when the list is large and that's why you should always prefer clear() over removeAll() in Java.

By the way, if clearing ArrayList is taking significant time, consider using new ArrayList, as Java is pretty fast in creating objects.

Further Learning
Java Fundamentals: The Java Language by Jim Wilson
Java Fundamentals: Collections
Head First Java

Other Java Arraylist tutorials you may like
If you interested to learn more about ArrayList class, one of the most useful class from Java collection framework, then you might like following articles as well:
  • Java - ArrayList and HashMap Performance Improvement in JDK 7 (read more)
  • Java - How to convert ArrayList to Set? (read more)
  • How to sort an ArrayList in reverse order in Java? (solution)
  • How to remove duplicate elements from ArrayList in Java? (solution)
  • How to clone an ArrayList in Java? (solution)
  • How do you convert a Map to List in Java? (solution)
  • Java - Performance comparison of contains() vs binarySearch() (read more)
  • Java - How to initialize an ArrayList with values in Java? (example)
  • Java - The ArrayList Guide (tutorial)
  • The difference between an ArrayList and a Vector in Java? (answer)
  • How to make an ArrayList unmodifiable in Java? (solution)


Anonymous said...

I think these two methods got totally diferent meaning and purpose. Like clear() removes all elements and resets the collection where as removeAll(Collection c) takes a input parameter and removes the elements that we passed to the method. So we use clear() to remove all elements where as removeAll(Collection c) is used when we want a list of elements to be removed.

So the point what i m trying to make is clear() and removeAll() got completely different meaning to be compared. Correct me if i am wrong.

Verhás Péter said...

removeAll() is O(n^2) but not for the reason you mention. Calling 'contains()' is O(n) as an average. When you call 'al.removeAll(al)' on an array list on itself the call to 'contains()' finds the first element in one iteration, and that is O(1). That would make the it still O(n). The quadratic performance comes from the array nature of array list: each time an element is removed the rest of the elements are copied one position down. That is an O(n) operation.

Post a Comment