Monday, September 15, 2008

Implementing a Heap in Java

A Heap is a data structure storing a collection of objects, allowing you to pick the object of the highest value and insert objects of arbitrary values efficiently.

Simply looking at the highest value takes O(1) time. Removing the highest value - thus making sure that the next highest value will be available for the next call - takes O(log n). Insertion takes O(log n).

Java has an inbuilt collection class - java.util.PriorityQueue that can be used as a Heap. The caveat is that it is the minimum value, not the maximum that can be retrieved efficiently.

So, there are three ways of dealing with the situation.

1. Implement Comparable interface on the object being stored.

For the object type (class) stored in the PriorityQueue, implement the Comparable interface. Comparable interface has one function - [int CompareTo(T t)] that should return -1 if this object is less than t, +1 if this object is greater than t, 0 if the two objects are equal. So what we could do is "flip" this definition and implement a compareTo function that returns -1 if this object is greater than t, +1 if this object is less than t, 0 if they are equal. Thus when the PriorityQueue insert function tries to compare the value of the object being inserted to the current head of the queue, the greater value will be brought to the top.

2. Pass an object derived from Comparator to the PriorityQueue constructor

It is possible when we construct the PriorityQueue to pass it an object derived from Comparator class. This should implement a [int compare T t1, T t2] similar to above.

3. Negate the value being compared inside the object

It is possible to implement the Comparable interface for the PriorityQueue according the the standard definition, but negate the value being stored in the object that is being compared to. For ex:, imagine there is a guest_count field in an Object WebPage, and we have a PriorityQueue of WebPage objects where we should be able to pop the highest visited (guest_count) Web Page quickly. We store the negation of the number of guests in guest_count, so that the usual comparison ends up pushing the highest visited page to the top of the PriorityQueue.

So which method should you use?

#1 is not recommended as it inherently alters the way any collection of the object is sorted. Imagine, somewhere else in the code, you have a vector of WebPage objects, and you call sort() on it. sort() will use the Comparable interface and sort the WebPage objects in descending order, probably not what you intended.

#3 is quite hacky - i consider it clever though, as it involves the minimum amount of work. But it is much more readable to keep the guest_count field positive vs negative.

#2 is my pick as the "reversed comparison" of the object is only used with respect to the PriorityQueue and is thus isolated from the workings of the object in different contexts.

At work I had to use a Heap data structure to iteratively apply an "ad selection" algorithm to the add with the most number of clicks. I ended up using #2 above.

No comments: