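How to implement a median heap
Like max-heaps and min-heaps, I want to implement a median heap that keeps track of the median of a set of integers. It should support three main operations: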
insert(int) // should take O(logN)
int median() // will be the topmost element of the heap. O(1)
int delmedian() // should take O(logN)
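I want to implement it with an array, where the children of array index k are stored at indices 2*k and 2*k + 1. For convenience, the array is filled starting from index 1. Here is what I have so far: the median heap keeps two integers that count how many of the numbers inserted so far are greater than the current median (gcm) and how many are less than it (lcm).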
if abs(gcm-lcm) >= 2 and gcm > lcm we need to swap a[1] with one of its children.
The child chosen should be greater than a[1]. If both are greater,
choose the smaller of the two.
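The other case is symmetric. What I can't come up with is an algorithm for sinking and swimming elements. I think it has to take into account how close a number is to the median, so maybe something like: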
private void swim(int k) {
while (k > 1 && absless(k, k/2)) {
exch(k, k/2);
k = k/2;
}
}
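However, I haven't worked out a complete solution yet.
6 Answers
Isn't a perfectly balanced binary search tree (BST) a median heap? Granted, even a red-black tree is not always perfectly balanced, but it may be close enough for your purposes. And its operations are guaranteed to run in O(log n) time!
AVL trees are more tightly balanced than red-black trees, so they come even closer to a true median heap.
Here is a median heap implemented in Java, developed with the help of comocomocomocomo's explanation above.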
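Java's standard `TreeMap` is documented as a red-black tree, but it does not expose subtree sizes, so it cannot return the k-th smallest key in O(log n). To use a BST as a median structure, each node has to be augmented with the size of its subtree. The sketch below swaps in a treap (a randomized balanced BST) for the AVL or red-black trees mentioned above, only because it is shorter; its O(log n) bound holds in expectation, not in the worst case. `OrderStatisticTreap`, `kth` and `delMedian` are hypothetical names, and returning the lower middle element for an even count is an assumption.

```java
import java.util.Random;

// Hypothetical sketch: a treap whose nodes store subtree sizes, so the
// k-th smallest key (and hence the median) is found in O(log n) expected time.
final class OrderStatisticTreap {
    private static final class Node {
        final int key;
        final int priority; // random priority; max-heap order on it keeps the tree balanced in expectation
        int size = 1;       // number of nodes in this subtree
        Node left, right;
        Node(int key, int priority) { this.key = key; this.priority = priority; }
    }

    private final Random rng = new Random();
    private Node root;

    private static int size(Node n) { return n == null ? 0 : n.size; }
    private static void update(Node n) { n.size = 1 + size(n.left) + size(n.right); }

    private static Node rotateRight(Node n) {
        Node l = n.left;
        n.left = l.right;
        l.right = n;
        update(n);
        update(l);
        return l;
    }

    private static Node rotateLeft(Node n) {
        Node r = n.right;
        n.right = r.left;
        r.left = n;
        update(n);
        update(r);
        return r;
    }

    public int size() { return size(root); }

    public void insert(int key) { root = insert(root, key); }

    private Node insert(Node n, int key) {
        if (n == null) return new Node(key, rng.nextInt());
        if (key < n.key) {
            n.left = insert(n.left, key);
            if (n.left.priority > n.priority) n = rotateRight(n);
        } else { // duplicates go to the right
            n.right = insert(n.right, key);
            if (n.right.priority > n.priority) n = rotateLeft(n);
        }
        update(n);
        return n;
    }

    // Removes one occurrence of key; does nothing if key is absent.
    private Node delete(Node n, int key) {
        if (n == null) return null;
        if (key < n.key) n.left = delete(n.left, key);
        else if (key > n.key) n.right = delete(n.right, key);
        else if (n.left == null) return n.right;
        else if (n.right == null) return n.left;
        else if (n.left.priority > n.right.priority) {
            n = rotateRight(n); // push the node to delete one level down
            n.right = delete(n.right, key);
        } else {
            n = rotateLeft(n);
            n.left = delete(n.left, key);
        }
        update(n);
        return n;
    }

    // k-th smallest key, 0-based, found by walking down with subtree sizes.
    public int kth(int k) {
        if (k < 0 || k >= size()) throw new IndexOutOfBoundsException("k = " + k);
        Node n = root;
        while (true) {
            int leftSize = size(n.left);
            if (k < leftSize) n = n.left;
            else if (k == leftSize) return n.key;
            else { k -= leftSize + 1; n = n.right; }
        }
    }

    // Lower median: the smaller middle element when the count is even.
    public int median() { return kth((size() - 1) / 2); }

    public int delMedian() {
        int m = median();
        root = delete(root, m);
        return m;
    }
}
```

Unlike a plain balanced BST, where the root is only near the median, the subtree sizes make the median exact: for example, after inserting 5, 1, 9, 3, 7 and 3, `median()` returns 3.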
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Scanner;
/**
*
* @author BatmanLost
*/
public class MedianHeap {
//stores the numbers less than or equal to the current median in a max-heap; the largest of them is at the root
private PriorityQueue<Integer> maxheap;
//stores the numbers greater than or equal to the current median in a min-heap; the smallest of them is at the root
private PriorityQueue<Integer> minheap;
//comparators for PriorityQueue
private static final maxHeapComparator myMaxHeapComparator = new maxHeapComparator();
private static final minHeapComparator myMinHeapComparator = new minHeapComparator();
/**
* Comparator for the minHeap, smallest number has the highest priority, natural ordering
*/
private static class minHeapComparator implements Comparator<Integer>{
@Override
public int compare(Integer i, Integer j) {
// Integer.compare compares values; i==j on two Integer objects would compare references
return Integer.compare(i, j);
}
}
/**
* Comparator for the maxHeap, largest number has the highest priority
*/
private static class maxHeapComparator implements Comparator<Integer>{
// opposite to minHeapComparator, invert the return values
@Override
public int compare(Integer i, Integer j) {
// arguments swapped to get descending order; avoids comparing Integer references with ==
return Integer.compare(j, i);
}
}
/**
* Constructor for a MedianHeap, to dynamically generate median.
*/
public MedianHeap(){
// initialize maxheap and minheap with appropriate comparators
maxheap = new PriorityQueue<Integer>(11,myMaxHeapComparator);
minheap = new PriorityQueue<Integer>(11,myMinHeapComparator);
}
/**
* Returns true if no numbers have been inserted yet, i.e. there is no median
* @return true if both heaps are empty
*/
private boolean isEmpty(){
return maxheap.size() == 0 && minheap.size() == 0 ;
}
/**
* Inserts into MedianHeap to update the median accordingly
* @param n
*/
public void insert(int n){
// initialize if empty
if(isEmpty()){ minheap.add(n);}
else{
//add to the appropriate heap
// if n is less than or equal to current median, add to maxheap
if(Double.compare(n, median()) <= 0){maxheap.add(n);}
// if n is greater than current median, add to min heap
else{minheap.add(n);}
}
// fix the chaos, if any imbalance occurs in the heap sizes
//i.e, absolute difference of sizes is greater than one.
fixChaos();
}
/**
* Re-balances the heap sizes
*/
private void fixChaos(){
//if sizes of heaps differ by 2, then it's a chaos, since median must be the middle element
if( Math.abs( maxheap.size() - minheap.size()) > 1){
//check which one is the culprit and take action by kicking out the root from culprit into victim
if(maxheap.size() > minheap.size()){
minheap.add(maxheap.poll());
}
else{ maxheap.add(minheap.poll());}
}
}
/**
* returns the median of the numbers encountered so far
* @return
*/
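public double median(){
// guard against peek() returning null on empty heaps
if(isEmpty()){ throw new IllegalStateException("no numbers inserted yet"); }
//if the total number of elements entered is even, the median is the average of the 2 middle elements,
//i.e., the average of the roots of the two heaps.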
if( maxheap.size() == minheap.size()) {
return ((double)maxheap.peek() + (double)minheap.peek())/2 ;
}
//else median is middle element, i.e, root of the heap with one element more
else if (maxheap.size() > minheap.size()){ return (double)maxheap.peek();}
else{ return (double)minheap.peek();}
}
/**
* String representation of the numbers and median
* @return
*/
public String toString(){
StringBuilder sb = new StringBuilder();
sb.append("\n Median for the numbers : " );
for(int i: maxheap){sb.append(" "+i); }
for(int i: minheap){sb.append(" "+i); }
sb.append(" is " + median()+"\n");
return sb.toString();
}
/**
* Adds all the array elements and returns the median.
* @param array
* @return
*/
public double addArray(int[] array){
for(int i=0; i<array.length ;i++){
insert(array[i]);
}
return median();
}
/**
* Just a test
* @param N
*/
public void test(int N){
int[] array = InputGenerator.randomArray(N);
System.out.println("Input array: \n"+Arrays.toString(array));
addArray(array);
System.out.println("Computed Median is :" + median());
Arrays.sort(array);
System.out.println("Sorted array: \n"+Arrays.toString(array));
if(N%2==0){ System.out.println("Calculated Median is :" + (array[N/2] + array[(N/2)-1])/2.0);}
else{System.out.println("Calculated Median is :" + array[N/2] +"\n");}
}
/**
* Another testing utility
*/
public void printInternal(){
System.out.println("Less than median, max heap:" + maxheap);
System.out.println("Greater than median, min heap:" + minheap);
}
//Inner class to generate input for basic testing
private static class InputGenerator {
public static int[] orderedArray(int N){
int[] array = new int[N];
for(int i=0; i<N; i++){
array[i] = i;
}
return array;
}
public static int[] randomArray(int N){
int[] array = new int[N];
for(int i=0; i<N; i++){
array[i] = (int)(Math.random()*N*N);
}
return array;
}
public static int readInt(String s){
System.out.println(s);
Scanner sc = new Scanner(System.in);
return sc.nextInt();
}
}
public static void main(String[] args){
System.out.println("You got to stop the program MANUALLY!!");
while(true){
MedianHeap testObj = new MedianHeap();
testObj.test(InputGenerator.readInt("Enter size of the array:"));
System.out.println(testObj);
}
}
}
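You need two heaps: a min-heap and a max-heap. Each holds roughly half of the data. Every element in the min-heap is greater than or equal to the median, and every element in the max-heap is less than or equal to the median.
When the min-heap holds one more element than the max-heap, the median is at the top of the min-heap. When the max-heap holds one more element than the min-heap, the median is at the top of the max-heap.
When both heaps hold the same number of elements, the total count is even. In that case you have to choose according to your definition of the median: a) the average of the two middle elements; b) the larger of the two middle elements; c) the smaller one; d) one of them at random...
On each insertion, compare the new element with the heap tops to decide which heap it goes into. If it is greater than the current median, put it into the min-heap; if it is smaller, put it into the max-heap. Then you may need to rebalance: if the two heap sizes differ by more than one, remove the top (minimum or maximum) element from the larger heap and add it to the other one.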
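The two-heap scheme described above can be sketched with `java.util.PriorityQueue`, using `Collections.reverseOrder()` to turn one of them into a max-heap. This is a minimal sketch, not the answerer's code: `TwoHeapMedian` is a hypothetical name, the new element is compared with the top of the max-heap (which keeps every max-heap element less than or equal to every min-heap element), and an even count uses convention a), the average of the two middle elements.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical sketch of the two-heap median structure.
final class TwoHeapMedian {
    // Max-heap holding the smaller half (every element <= median).
    private final PriorityQueue<Integer> low = new PriorityQueue<>(Collections.reverseOrder());
    // Min-heap holding the larger half (every element >= median).
    private final PriorityQueue<Integer> high = new PriorityQueue<>();

    public void insert(int x) { // O(log n)
        // Compare with the max-heap's top to pick a side.
        if (low.isEmpty() || x <= low.peek()) low.add(x);
        else high.add(x);
        rebalance();
    }

    // Keep the heap sizes equal or differing by exactly one.
    private void rebalance() {
        if (low.size() > high.size() + 1) high.add(low.poll());
        else if (high.size() > low.size() + 1) low.add(high.poll());
    }

    // O(1): the top of the larger heap, or the average of both tops (convention a).
    public double median() {
        if (low.isEmpty() && high.isEmpty()) throw new IllegalStateException("empty");
        if (low.size() > high.size()) return low.peek();
        if (high.size() > low.size()) return high.peek();
        return ((double) low.peek() + high.peek()) / 2.0;
    }
}
```

For example, inserting 5, 1, 9 and 3 in that order makes `median()` return 5.0, 3.0, 5.0 and finally 4.0. Switching to convention b) or c) only changes the last line of `median()`.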
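To build a median heap, first find the median with a linear-time selection algorithm. Once the median is known, each element can be added to the min-heap or the max-heap according to how it compares with that value. Because the median splits the input into two halves, no rebalancing is needed.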
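A sketch of that batch construction, under some stated assumptions: it uses randomized quickselect with a three-way partition, which is linear only in expectation (a worst-case linear bound would need the median-of-medians algorithm), and the lower median goes into the max-heap so the two sizes come out equal or off by one. In OpenJDK, `new PriorityQueue<>(collection)` builds the min-heap with a bottom-up heapify, but the max-heap side is filled with `addAll`, which is O(n log n) in this sketch. `MedianHeapBuilder` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical sketch: select the median first, then split the input into two heaps.
final class MedianHeapBuilder {
    final PriorityQueue<Integer> low = new PriorityQueue<>(Collections.reverseOrder()); // lower half
    final PriorityQueue<Integer> high;                                                  // upper half

    MedianHeapBuilder(int[] input) {
        int[] a = input.clone();
        int k = (a.length - 1) / 2; // sorted-order index of the lower median
        if (a.length > 0) select(a, k, new Random());
        // Now a[0..k] <= a[k] <= a[k+1..], so the split below needs no rebalancing.
        List<Integer> lowerHalf = new ArrayList<>();
        List<Integer> upperHalf = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            if (i <= k) lowerHalf.add(a[i]);
            else upperHalf.add(a[i]);
        }
        low.addAll(lowerHalf);                 // inserts one element at a time
        high = new PriorityQueue<>(upperHalf); // heapifies the whole collection
    }

    double median() {
        if (low.isEmpty()) throw new IllegalStateException("no elements");
        if (low.size() > high.size()) return low.peek();
        return ((double) low.peek() + high.peek()) / 2.0;
    }

    // Randomized quickselect: afterwards a[k] holds the k-th smallest value,
    // everything before k is <= a[k] and everything after k is >= a[k].
    private static void select(int[] a, int k, Random rng) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int pivot = a[lo + rng.nextInt(hi - lo + 1)];
            // Three-way partition of a[lo..hi] into < pivot | == pivot | > pivot,
            // so runs of equal values do not degrade to quadratic time.
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (a[i] < pivot) swap(a, lt++, i++);
                else if (a[i] > pivot) swap(a, i, gt--);
                else i++;
            }
            if (k < lt) hi = lt - 1;
            else if (k > gt) lo = gt + 1;
            else return; // a[k] equals the pivot and is in its final position
        }
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

For the input {7, 2, 9, 4, 1}, the max-heap ends up with {1, 2, 4} (top 4) and the min-heap with {7, 9} (top 7), so `median()` returns 4.0.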
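If you extract an element, you may need to compensate for the change in size by moving one element from one heap to the other. This guarantees that the two heaps always have the same size or differ by only one element.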
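The question asks for `int median()` and `int delmedian()`, so a single element has to be chosen when the count is even. A minimal sketch, assuming convention c) (the smaller middle element) and keeping the max-heap either equal in size to the min-heap or one element larger: the median is then always the top of the max-heap, and after extracting it, one element may have to move from the min-heap back to the max-heap to restore that invariant. `IntMedianHeap` and `delMedian` are hypothetical names.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical sketch of insert / median / delMedian on two heaps.
// Invariant: low.size() == high.size() or low.size() == high.size() + 1,
// so the (lower) median is always the top of the max-heap.
final class IntMedianHeap {
    private final PriorityQueue<Integer> low = new PriorityQueue<>(Collections.reverseOrder());
    private final PriorityQueue<Integer> high = new PriorityQueue<>();

    public void insert(int x) { // O(log n)
        if (low.isEmpty() || x <= low.peek()) low.add(x);
        else high.add(x);
        rebalance();
    }

    public int median() { // O(1)
        if (low.isEmpty()) throw new IllegalStateException("heap is empty");
        return low.peek();
    }

    public int delMedian() { // O(log n)
        if (low.isEmpty()) throw new IllegalStateException("heap is empty");
        int m = low.poll();
        rebalance(); // the min-heap may now be one element larger
        return m;
    }

    public int size() { return low.size() + high.size(); }

    private void rebalance() {
        if (low.size() > high.size() + 1) high.add(low.poll());
        else if (high.size() > low.size()) low.add(high.poll());
    }
}
```

For example, after inserting 5, 1, 9 and 3, `median()` returns 3; `delMedian()` removes it, and the next `median()` returns 5.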