Saturday, July 13, 2019

Bucket Sort in Java with Example - How Algorithm Works

In recent years, one of the questions I have increasingly seen in programming job interviews is about constant time sorting algorithms like do you know any O(n) sorting algorithm? how do they work? When I first encountered this question, I had no idea whether we can sort in constant time because even some of the fastest sorting algorithms like QuickSort or MergeSort takes O(N log N) time for sorting on their average case. After some research, mainly by going through the classic CLRS book and this DS and Algorithms course by Tim Buchalka and Goran Lochert on Udemy, I come to know that there indeed are some constant time or linear time sorting algorithms like bucket sort, counting sort, and radix sort, which can sort an array in O(n) time but they work with only a special set of input.

The idea of bucket sort is quite simple, you distribute the elements of an array into a number of buckets and then sort those individual buckets by a different sorting algorithm or by recursively applying the bucket sorting algorithm.

You might have remembered, how shopkeepers or bank cashiers used to prepare/sort bundles of notes. They usually have a bucket load of cash with different denominations like 5, 10, 20, 50, or 100. They just put all 5 Rs notes into one place, all Rs 10 notes into another place and so on.

This way all notes are sorted in O(n) time because you don't need to sort individual buckets, they are all same notes there.

Anyway, for bucket sort to work at its super fast speed, there are multiple prerequisites. First, the hash function that is used to partition the elements must be very good and must produce ordered hash: if the i < j then hash(i) < hash(j).

If this is not the case then you cannot concatenate individual buckets in O(n) time. Second, the elements to be sorted must be uniformly distributed. If you still have trouble understanding Bucket sort or want to explore other linear time sorting algorithms, I also suggest you go through the Data Structures and Algorithms: Deep Dive Using Java course on Udemy. It's fantastic and I suggest every Java programmer go through this to improve their data structure and Algorithmic skills.




Bucket Sort vs Counting Sort

If you keep these prerequisites aside, bucket sort is actually very good considering that counting sort is reasonably equal to its upper bound and counting sort is also super fast.

The particular distinction for bucket sort is that it uses a hash function to partition the keys of the input array, so that multiple keys may hash to the same bucket.  Hence each bucket must effectively be a growable list; similar to Radix sort, another O(n) sorting algorithms.

Many programmers get confused between Counting Sort and Bucket sort as they work quite similar, but the most important difference between them is that Bucket sort uses a hash function to distribute keys while Counting sort creates a bucket for each key.

In our example, I have used JDK's Collections.sort() method to sort each bucket. This is to indicate that the bucket sort algorithm does not specify which sorting algorithm should be used to sort individual buckets.

A programmer may choose to recursively use bucket sort on each bucket until the input array is sorted, similar to radix sort. Anyway, whichever sorting method you use to sort individual buckets, the time complexity of bucket sort will still tend towards O(n).

I think there are perhaps greater similarities between radix sort and bucket sort than there are between counting sort and bucket sort. You can also read Introduction to Algorithm by Thomas H. Cormen to learn more about the difference between these two constant time sorting algorithms.

How to implement Bucket Sort in Java





Bucket Sort - FAQ

Now, let's see some of the frequently asked questions about Bucket sort in programming job interviews.

1. Is the bucket sort, a stable algorithm?
Bucket sort is not a stable sorting algorithm. If you remember,  A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, or Bubble Sort, etc. In the case of Bucket sort, input sorts themselves by moving into a bucket and their order are not guaranteed to be retained.


2. Is the bucket sort in-place algorithm?
Bucket sort is also not an in-place sorting algorithm because you create buckets which can be array, linked list, or hashtable and they take additional spaces. In the worst of the good cases (sequential values, but no repetition) the additional space needed is as big as the original array.


3. What is the time complexity of Bucket sort?
The time complexity of bucket sort in the O(n) in the best and average case and O(n^2) in the worst case. The creation of bucket will take O(1) time and assume moving elements into bucket will take O(1) time e.g. in the case of hashtable and linked list.

The main step is to sort elements on individual elements. This step also takes O(n) time on average if all numbers are uniformly distributed (please refer CLRS book for more details). The last step to concatenate all bucket will also take O(n) time as there will be n items in all buckets


4. What is the time complexity of Bucket sort in the worst case?
The time complexity of bucket sort is O(n^2) in the worst case i.e. when all elements at allocated to the same bucket. Since individual buckets are sorted using another algorithm, if only a single bucket needs to be sorted, bucket sort will take on the complexity of the inner sorting algorithm. This is why bucket sort is only useful when the input is uniformly distributed in a range. This way they will not end up in the same bucket.


5. What is the space complexity of Bucket sort?
The space complexity of the bucket sort algorithm is O(n) because even in the worst of the good cases (sequential values, but no repetition) the additional space needed is as big as the original array. Please go through Algorithms and Data Structures - Part 1 and 2 courses on Pluralsight for more details on complexity analysis of various sorting algorithms in best, average, and worst case.


It's an excellent course but you would need a Plurlasight membership to access this course, which cost around $29 per month or $299 per year. This may seem a bit expensive at first but its well worth because it provides access to more than 5000 online courses on Pluralsight which you can use to learn any latest technology.

I am a Pluralsight member and I also suggest you join it if you can invest that much money in a year for your learning.  Anyway, even if you don't have Plurlasight membership, you can still access this course for free by signing up for the 10-day free trial which provides 200 minutes of access to all of their courses.



Java Program to implement Bucket Sort in Java

Problem Statement:
Given an unordered list of integers, rearrange them in the natural order.

Sample Input: {8,5,3,1,9,6,0,7,4,2,5}
Sample Output: {0,1,2,3,4,5,6,7,8,9,5}


Here is our complete Java program to implement bucket sort in Java. This program sorts an integer array using bucket sort algorithm.


Program to implement Bucket sort in Java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/*
 * Java Program sort an integer array using radix sort algorithm.
 * input: [80, 50, 30, 10, 90, 60, 0, 70, 40, 20, 50]
 * output: [0, 10, 20, 30, 40, 50, 50, 60, 70, 80, 90]
 * 
 * Time Complexity of Solution:
 *     Best Case O(n); Average Case O(n); Worst Case O(n^2).
 * 
 */

public class BuckeSort {

  public static void main(String[] args) {

    System.out.println("Bucket sort in Java");
    int[] input = { 80, 50, 30, 10, 90, 60, 0, 70, 40, 20, 50 };

    System.out.println("integer array before sorting");
    System.out.println(Arrays.toString(input));

    // sorting array using radix Sort Algorithm
    bucketSort(input);

    System.out.println("integer array after sorting using bucket sort algorithm");
    System.out.println(Arrays.toString(input));

  }

  /**
   * 
   * @param input
   */
  public static void bucketSort(int[] input) {
    // get hash codes
    final int[] code = hash(input);
    
    // create and initialize buckets to ArrayList: O(n)
    List[] buckets = new List[code[1]];
    for (int i = 0; i < code[1]; i++) {
      buckets[i] = new ArrayList();
    }
    
    // distribute data into buckets: O(n)
    for (int i : input) {
      buckets[hash(i, code)].add(i);
    }
    
    // sort each bucket O(n)
    for (List bucket : buckets) {
      Collections.sort(bucket);
    }
    
    int ndx = 0;
    // merge the buckets: O(n)
    for (int b = 0; b < buckets.length; b++) {
      for (int v : buckets[b]) {
        input[ndx++] = v;
      }
    }
  }

  /**
   * 
   * @param input
   * @return an array containing hash of input
   */
  private static int[] hash(int[] input) {
    int m = input[0];
    for (int i = 1; i < input.length; i++) {
      if (m < input[i]) {
        m = input[i];
      }
    }
    return new int[] { m, (int) Math.sqrt(input.length) };
  }

  /**
   * 
   * @param i
   * @param code
   * @return
   */
  private static int hash(int i, int[] code) {
    return (int) ((double) i / code[0] * (code[1] - 1));
  }

}


Output
Bucket sort in Java
integer array before sorting
[80, 50, 30, 10, 90, 60, 0, 70, 40, 20, 50]
integer array after sorting using bucket sort algorithm
[0, 10, 20, 30, 40, 50, 50, 60, 70, 80, 90]



Bucket Sort - Important points to remember

Here are some important points about bucket sort you should remember, this is useful from both interview and understanding point of view and Interviewer expects that you know about them when you say that you know bucket sort.

1) Bucket Sort is also known as bin sort because you create bins or buckets to sort inputs.

2) Bucket sort is only useful when the input is uniformly distributed over a range like coins, numbers 1 to 100 etc.

3) You can use a linked list or array as a bucket. The choice of data structure will affect the insertion time e.g. if you use linked list then adding on the head could take O(1) time. You can also use hash tables as buckets.

4) The bucket sort is one of the rare O(n) sorting algorithm i.e. time complexity of Bucket sort is the liner in best and average case and not NLogN like Quicksort or Mergesort.

5) Bucket sort is not a stable sorting algorithm because in a stable algorithm if two input is same they retain their place in sorted order and in the bucket it depends upon how you sort the individual bucket. Though, bucket sort can be made stable, known as radix sort, which we'll see in future articles.

6) You can sort the elements inside individual buckets either by recursively calling the bucket sort or using a separate sorting algorithm like insertion sort, bubble sort, or quicksort.

7) Is bucket sort an in-place sorting algorithm? No, it's not an in-place sorting algorithm. The whole idea is that input sorts themselves as they are moved to the buckets. In the worst of the good cases (sequential values, but no repetition) the additional space needed is as big as the original array.

8) The worst case complexity of bucket sort, when all elements will end up in the same bucket is O(n^2) because then it has to be sorted by a different sorting algorithm.

9) The space complexity of bucket sort is O(n) because even to sort sequential values, without repetition, you need an array as big as the original array.


That's all about bucket sort in Java. It's one of the interesting sorting algorithms which gives you O(n) performance in the best and average case. Bucket sort should only be used when the input is uniformly distributed over a range e.g. numbers up to 1 to 1000.

You should also remember that Bucket sort is not a stable algorithm hence it's not guaranteed that equal keys on input array will retain their places.

It is also not an in-place algorithm, as it will require additional space as big as the original array in the worst case. You can also refer following resources for more details on bucket sort and other O(n) sorting algorithms like Counting sort, Radix sort etc.


Further Learning
Data Structures and Algorithms: Deep Dive Using Java
Algorithms and Data Structures - Part 1 and 2
Data Structures in Java: An Interview Refresher


Related Data Structure and Algorithm Interview Questions from Javarevisited Blog
  • Top 15 Data Structure and Algorithm Interview Questions (see here)
  • Top 20 String coding interview questions (see here)
  • 133 core Java interview questions of the last 5 years (see here)
  • Top 30 Array Coding Interview Questions with Answers (see here)
  • Top 30 linked list coding interview questions (see here)
  • Top 50 Java Programs from Coding Interviews (see here)
  • Top 5 books on Programming/Coding Interviews (list)
  • 50+ Data Structure and Algorithms Problems from Interviews (questions)
  • Iterative PreOrder traversal in a binary tree (solution)
  • How to count the number of leaf nodes in a given binary tree in Java? (solution)
  • Recursive InOrder traversal Algorithm (solution)
  • 10 Free Data Structure and Algorithm Courses for Programmers (courses)
  • 100+ Data Structure Coding Problems from Interviews (questions)

Thanks for reading this article so far. If you like this article then please share with your friends and colleagues. If you have any question or doubt then please let us know and I'll try to find an answer for you.

P. S. - If you are looking for some Free Algorithms courses to improve your understanding of Data Structure and Algorithms, then you should also check this list of free Algorithms courses to start your journey towards becoming a better programmer.

5 comments :

financial engineer said...

nice.
for compilation,
change: List[] buckets
to: List<Integer>[] buckets

Javin Paul said...

@financial engineer, yes, you are correct, the Integer along with angle bracket is lost in formatting.

Shaffy Singla said...

or you can change this peace of code
for (int v : buckets[b]) {
input[ndx++] = v;

}
to
for(Object v : buckets[b])
input[ndx++] = (Integer) v;

as there is a type mismatch error

gurnoor singh said...

Thank you for the detailed explanation.
Can you please explain the hash functions? (or point me in the right direction to read the basics). More specifically:
- how do we know the function "hash(int i, int[] code)" satisfies the condition "hash(i) > hash(j) if i>j"
- In the first hash function, how do we know that the number of buckets we need is sqrt(input.length)?

vinay said...

can you please answer gurnoor queries . i have the same doubt . also for all numbers hash is getting calculated as 0. so all numbers are going to first bucket.

Post a Comment