Monday, July 26, 2021

Binary Search vs Contains Performance in Java List - Example

There are two ways to search an element in a List class, by using contains() method or by using Collections.binarySearch() method. There are two versions of binarySearch() method, one which takes a List and Comparator and other which takes a List and Comparable. This method searches the specified list for the specified object using the binary search algorithm. The list must be sorted into ascending order according to the natural ordering of its elements (as by the sort(List) method) prior to making this call. If List is not sorted, then results are undefined. If the List contains multiple elements equal to the specified object, there is no guarantee which one will be returned. This method runs in log(n) time for a "random access" list (which provides near-constant-time positional access).

If the specified list does not implement the RandomAccess interface and is large, this method will do an iterator-based binary search that performs O(n) link traversals and O(log n) element comparisons. In the end this method returns the index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1).

The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size() if all elements in the list are less than the specified key.

This means that return value will be >= 0 if and only if the key is found. Since common implementation of List interface e.g. ArrayList, Vector, CopyOnWriteArrayList and Stack implements RandomAccess interface, they can be used for performing binary search, but there are other implementations like LinkedList, which doesn't implement java.util.RandomAccess, hence not suitable for binary search operation.

Since binary search can only be performed in sorted list, you also need to sort your collection before doing search, which may potentially impact performance, especially if your List is big and not in sorted order already.





Java Code for contains() Vs binarySearch()

Binary search vs contains in Java ListHere is our program to find objects using binary search in Java List. We have a list of Integer with 1M records and uses both contains() and binarySearch() method to search an element.


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** 
  * Java program to perform binary search in Java collection e.g List, Set
  * @author Javin Paul
  */
public class BinarySearchTest {
    public static final Logger logger = LoggerFactory.getLogger(BinarySearchTest.class);
   
    public static void main(String args[]) {       
  
        //creating List
        List<Integer> numbers = new ArrayList<Integer>(1000000); //List of 1M records
       
        //initializing List
        for(int i =0; i<numbers.size(); i++){
            numbers.add(new Integer(i));
        }
      
        //performing contains search
        long startTime = System.nanoTime();
        boolean isExist = numbers.contains(new Integer(1000000));
        long totalTime = System.nanoTime() - startTime;
      
       
        logger.info("Time to search 1Mth Record using contains() is {} 
                  nano seconds", totalTime);
       
        //performing binary search
        startTime = System.nanoTime();
        Collections.sort(numbers);  // List needs to be sorted for Binary Search
        Integer number = Collections.binarySearch(numbers, new Integer(1000000));
        totalTime = System.nanoTime() - startTime;
       
        logger.info("Time to search 1Mth Record using binary search is {} 
                 nano seconds", totalTime);
    }
  
}

Ouput:
2013-06-04 23:23:17,834 0    [main] INFO  test.BinarySearchTest  
- Time to search 1Mth Record using contains() is 51404 nano seconds
2013-06-04 23:23:17,849 15   [main] INFO  test.BinarySearchTest  
- Time to search 1Mth Record using binary search is 554261 nano seconds

From the output, you can see that contains() method is almost 10 times faster than the binary search, which means it makes sense to use contains() for searching objects in List, especially for those which implement the RandomAccess interface like ArrayList.

9 comments:

  1. I would do the sorting prior start time to be more justified.
    But yes "contains ' is faster then "binarySearch" for ArrayList,
    Below my output :
    Time to search 1Mth Record using contains() is {} nano seconds 84585
    Time to search 1Mth Record using binary search is {} nano seconds 1283703

    ReplyDelete
  2. for searching in a list., we can indexOf() also, This gives better timing than contains()..

    correct me if I am wrong...

    ReplyDelete
  3. @Anonymous: if you check the source code of ArrayList, contains method internally call indexOf method only. So time for both of them will same. I think. (other than a extra method call.)

    This is a good tutorial.
    Additional info: time complexity of contains and binarySerach depends on the given list. If it is already ordered list, i will go for BinarySearch(which will take O(logn) time). If the input list is not ordered, then do contains, which will do a liner search(take time O(n)).

    ReplyDelete
  4. Well, I think that this is not fare relatively to binarySearch() as you measure it's time in conjunction with sorting invocation which takes additional O(NlogN) time. It would be more convenient if you test these methods on a sorted list or take a timestamp not before sorting a list but just before invoking of binarySearch(). You will get antipodal results.

    ReplyDelete
  5. I don't agree "contains() method is almost 10 times faster than binary search" infact the binary search is always many times faster than contains()
    Reason : contains uses traditional linear search algorithm so the complexity is O(n) whereas the Binary search has complexity of O(logn).

    Your code is giving incorrect result because the list (numbers) you are filling is not actually filled.
    for(int i =0; i numbers = new ArrayList(capacity);
    // List of 1M records //initializing List
    for (int i = 0; i < capacity; i++) {
    numbers.add(new Integer(i));
    }

    long startTime = System.nanoTime();
    Collections.sort(numbers);
    long totalTime = System.nanoTime() - startTime;
    logger.info("Time to sort 1 Million Records is {} nano seconds", totalTime);

    // performing contains search
    startTime = System.nanoTime();
    boolean isExist = numbers.contains(new Integer(1000000));

    totalTime = System.nanoTime() - startTime;
    logger.info("Time to search 1Mth Record using contains() is {} nano seconds", totalTime);
    // performing binary search
    startTime = System.nanoTime();
    // List needs to be sorted for Binary Search
    Integer number = Collections.binarySearch(numbers, new Integer(1000000));
    totalTime = System.nanoTime() - startTime;
    logger.info("Time to search 1Mth Record using binary search is {} nano seconds", totalTime);
    }
    }

    The output of the above code is :
    [main] INFO BinarySearchTest - Time to sort 1 Million Records is 19167309 nano seconds
    [main] INFO BinarySearchTest - Time to search 1Mth Record using contains() is 4889023 nano seconds
    [main] INFO BinarySearchTest - Time to search 1Mth Record using binary search is 38720 nano seconds

    So you can clearly see the BinarySearch is more than 100 times faster than linear search in this case. Actually it depends upon the size of Array you are searching on.

    ReplyDelete
  6. Unfortunately this is a misleading article. You can not compare contains and binarySearch with just few records. Test both of them in 100k or million records then only you will see the actual difference.

    ReplyDelete
  7. Smit Jain is correct, there is no record in the list actually because list size is zero, therefore it never populates the list.
    But here is catch, record time only for Collections.binarySearch(numbers, new Integer(1000000)); then only you will see the actual difference. The sorting is taking time and therefore you see different result from binary search.
    Binary search is good only when you already have sorted list else it will take almost same amount of time what linear search is taking.

    ReplyDelete
  8. That is not correct at all, when the list is sorted binary search is much faster than contain
    when you need to sort it first, then contain method is faster.

    ReplyDelete
  9. Just check below //initializing List
    This idiot is not even adding any value to the list.

    And moreover binary search is preferred where you got to search many times for something in same list, and that's why you first sort it and search many times and then overall you will find difference here.

    ReplyDelete