Wednesday, August 17, 2016

Bucket Sort Java Implementation

Bucket Sort

Bucket sort is mainly useful when input is uniformly distributed over a range. For example, consider the following problem.
Sort a large set of floating point numbers which are in range from 0.0 to 1.0 and are uniformly distributed across the range. How do we sort the numbers efficiently?

A simple way is to apply a comparison based sorting algorithm. The lower bound for Comparison based sorting algorithm (Merge Sort, Heap Sort, Quick-Sort .. etc) is Ω(n Log n), i.e., they cannot do better than nLogn.
Can we sort the array in linear time? Counting sort can not be applied here as we use keys as index in counting sort. Here keys are floating point numbers.

The idea behind the bucket sort:

bucketSort(arr[], n)

1) Create n empty buckets (Or lists).
2) Do following for every array element arr[i].
 a) Insert arr[i] into bucket[n*array[i]]
3) Sort individual buckets using insertion sort.
4) Concatenate all sorted buckets.

So, Bucket sort or bin sort, is a sorting algorithm that works by distributing the elements of an array into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. It is a distribution sort, and is a cousin of radix sort in the most to least significant digit flavor. Bucket sort is a generalization of pigeonhole sort. Bucket sort can be implemented with comparisons and therefore can also be considered a comparison sort algorithm. The computational complexity estimates involve the number of buckets.

Bucket sort works as follows:

Set up an array of initially empty "buckets".
Scatter: Go over the original array, putting each object in its bucket.
Sort each non-empty bucket.
Gather: Visit the buckets in order and put all elements back into the original array.

Optimizations:

A common optimization is to put the unsorted elements of the buckets back in the original array first, then run insertion sort over the complete array; because insertion sort's run-time is based on how far each element is from its final position, the number of comparisons remains relatively small, and the memory hierarchy is better exploited by storing the list contiguously in memory.

Time Complexity: If we assume that insertion in a bucket takes O(1) time then steps 1 and 2 of the above algorithm clearly take O(n) time. The O(1) is easily possible if we use a linked list to represent a bucket (In the following code, C++ vector is used for simplicity). Step 4 also takes O(n) time as there will be n items in all buckets.

Sample Code :

import java.util.Arrays;
/**
 * @author Abhinaw.Tripathi
 *
 */
public class BucketSortApp
{
 public static void sort(int[] a, int maxVal)
 {
     int [] bucket=new int[maxVal+1];

     for (int i=0; i<bucket.length; i++)
     {
        bucket[i]=0;
     }

     for (int i=0; i<a.length; i++)
     {
        bucket[a[i]]++;
     }

     int outPos=0;
     for (int i=0; i<bucket.length; i++)
     {
        for (int j=0; j<bucket[i]; j++)
        {
           a[outPos++]=i;
        }
     }
  }

public static void main(String[] args)
{
 int maxVal=5;
          int [] data= {5,3,0,2,4,1,0,5,2,3,1,4};

     System.out.println("Before: " + Arrays.toString(data));
     sort(data,maxVal);
     System.out.println("After:  " + Arrays.toString(data));
}

}

Result:

Before: [5, 3, 0, 2, 4, 1, 0, 5, 2, 3, 1, 4]
After:  [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]


No comments: