Is it possible to implement a binary heap that is both a max and a min heap? - c++

I'm trying to implement a binary heap (priority queue) that has the capabilities of both a min heap and a max heap. It needs to have an insert(value), extractMin(), and an extractMax() method. The extract methods remove the value from the heap and return the value.
I was originally using two arrays, called minHeap and maxHeap, one storing the data in min-heap order and the other storing the same data in max-heap order. So when I call extractMin(), it removes and returns the value from minHeap, and then I have to remove that value from maxHeap as well (and vice versa if I called extractMax()) in order to keep the data set identical in both heaps. Because of the heap-order property, it's guaranteed that I'll find that value among the leaves of the other heap, but searching those leaves is still O(n): only about n/2 elements are leaves, and n/2 is still linear. The percolatingDown() and percolatingUp() methods that restore the heap after removing a value are only O(log n), so the search dominates and each extract method comes out to O(n). The problem is, I need the extract methods to be O(log n).
Is there a better way to go about this?
I also thought of this idea but wanted to know what you all think first.
I just finished coding a "median heap" by placing the smaller half of the data in a max heap and the larger half in a min heap; with that structure I can easily retrieve the median of a given set of values. I was thinking of using a similar structure here: place the smaller half of the data in the min heap and the larger half in the max heap, and use the mean (rather than the median) of all the values to decide whether insert(value) puts the value in the max or the min heap. I think this might work, since the extract methods would stay O(log n).

The simple way is to just use a binary search tree, as M. Shaw recommends.
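For instance, here is a minimal sketch using std::multiset (a balanced search tree in every mainstream implementation), where insert and both extracts are O(log n); it assumes the queue is non-empty when extracting:

#include <iterator>
#include <set>

class MinMaxQueue {
    std::multiset<int> s;
public:
    void insert(int v) { s.insert(v); }
    int extractMin() {                    // smallest element lives at begin()
        auto it = s.begin();
        int v = *it;
        s.erase(it);
        return v;
    }
    int extractMax() {                    // largest element is just before end()
        auto it = std::prev(s.end());
        int v = *it;
        s.erase(it);
        return v;
    }
};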
If you're required to build this on top of binary heaps, then in each heap, alongside each element, store the element's position in the other heap. Every time you move an element in one heap, you can go straight to its position in the other heap and update it. When you perform a delete-min or delete-max, no expensive linear scan in the other heap is required.
For example, if you store std::pairs with first as the element value and second as the position in the other heap, swapping two elements in the min-heap while updating their counterparts in the max-heap might look like this:
// Swap two slots in the min-heap, then repair the back-pointers stored
// alongside the two affected elements in the max-heap:
swap(minheap[i], minheap[j]);
maxheap[minheap[i].second].second = i;  // minheap[i].second is this element's slot in maxheap
maxheap[minheap[j].second].second = j;

You can create a hash table for the heap elements, shared by the two heaps and indexed by the value of the heap element. The value of each hashed bucket can be a struct holding the element's array index in minHeap and maxHeap respectively.
The benefit of this approach is that it is non-intrusive: the structure of the heap elements stays the same, and you don't have to build the heaps side by side. You can build one after the other with the usual heap-construction procedure.
E.g.,
#include <unordered_map>

struct tIndex
{
    // Array index of the element in the two heaps respectively
    size_t minIndex;
    size_t maxIndex;
};
std::unordered_map<int, tIndex> m;
Note that any change to a heap may change the underlying array indices of existing elements, so whenever you add or remove an element, or swap two elements, you need to update their array indices in the hash table accordingly.
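For example, a swap inside minHeap might keep the table in sync like this (swapInMinHeap is a hypothetical helper; tIndex and m are the declarations from above):

#include <utility>
#include <vector>

// Sketch: swap two slots of minHeap and keep the shared table in sync.
void swapInMinHeap(std::vector<int>& minHeap,
                   std::unordered_map<int, tIndex>& m,
                   size_t i, size_t j)
{
    std::swap(minHeap[i], minHeap[j]);
    m[minHeap[i]].minIndex = i;  // each value's record now names its new slot
    m[minHeap[j]].minIndex = j;  // a maxHeap swap would update maxIndex instead
}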

You're close. The trick is to use another level of indirection. Keep the keys in an array K[i] and store only indices i in the heaps. Also keep two reverse maps, one for the max heap and one for the min heap: a reverse map is an array of integers R such that R[i] is the location in the min (or max) heap of the index i for key K[i]. In other words, if M[j] is the min (or max) heap, then R[M[j]] = j.

Now whenever you do a sifting operation to move elements around in a heap, you must update the respective reverse map at the same time. In fact it works just like the relation above: at every step where you change a heap element, M[j] = z, also update the reverse map, R[z] = j. This increases the run time by only a small constant factor.

Now to delete K[i] from a heap, you can find it in constant time: it's at M[R[i]]. Sift it up to the root and remove it.
I know this works (finding a heap object to delete in constant time) because I've implemented it as part of a bigger algorithm. Check out https://github.com/gene-ressler/lulu/blob/master/ext/lulu/pq.c . The larger algorithm is for map marker merging: https://github.com/gene-ressler/lulu/wiki
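Here is a sketch of the bookkeeping, using the K/M/R names from above (illustrative, not the linked implementation):

#include <utility>
#include <vector>

// K[i] holds the keys; M is a heap of indices into K; R is the reverse map,
// maintained so that R[M[j]] == j at all times.
struct IndexedHeap {
    std::vector<int> M;  // heap of key indices
    std::vector<int> R;  // R[i] = slot of index i within M

    void place(int j, int i) {       // M[j] = i, keeping the invariant
        M[j] = i;
        R[i] = j;
    }
    void swapSlots(int a, int b) {   // used by every sift-up / sift-down step
        std::swap(M[a], M[b]);
        R[M[a]] = a;
        R[M[b]] = b;
    }
    int locate(int i) const {        // O(1): where does key K[i] live in M?
        return R[i];
    }
};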

http://www.geeksforgeeks.org/a-data-structure-question/
A min-max heap, I would say, is the answer (as pointed out by "user2357112") if the most frequent operations are findMin and findMax. A BST might be overkill if we don't really need a completely ordered data structure; the above is a partially ordered data structure. Refer to the link posted above.

Related

Best data structure for finding maximum in a 2d matrix with update queries

I have a 2d matrix of doubles. My task is to find the maximum element of the matrix at any point.
Queries will be of 2 types:
Update query: in this query, 2n - 1 elements will be updated, i.e. all elements of row i and column i. (By "update" I mean changing the element; the new value can be anything, an increment, a decrement, or an arbitrary change.)
Maximum query : Return maximum element in the 2d array.
I came up with a solution using binary heaps. My idea is to keep a max-heap of n^2 elements implemented as an array, and to maintain another array of size n^2 that keeps the indices of the heap elements: the (i,j)-th matrix element, which is just the (i*n + j)-th element of the flat array, stores the index of its position in the heap.
This way, the 2n-1 updates are handled in (2n-1)·log(n^2) = O(n log n) time, and a maximum query is answered in O(1) time.
I wasn't able to use the STL implementation because I have to keep track of the heap elements, i.e. upon an update query I need to know which heap elements must be re-sifted. std::priority_queue also doesn't support changing keys.
How do I improve the update query time? Is there some other data structure which can handle these operations faster?
I'd use an STL vector of indices i*n + j, kept sorted with your own compare function. Re-sorting after an update is O(n^2 log n^2). Querying the maximum is just reading the first element of the vector.
Edit
If you're interested only in the maximum value, you can cache its position (i, j). When the matrix is updated, it needs to be re-sorted only if this cached position changes.

What is the fastest data structure to search and update list of integer values?

I have to maintain a list of unordered integers, where the number of integers is unknown; it may grow or shrink over time, and I need to update it frequently. I have tried using vector, but it is really slow. An array appears to be faster, but since the length of the list is not fixed, resizing it takes a significant amount of time. Please suggest other options.
Use a hash table if the order of the values is unimportant; operations are O(1) expected time. std::unordered_set in the standard library is exactly this.
Failing that, a splay tree is extremely fast, especially if you want to keep the list ordered: amortized cost of O(log n) per operation, with a very low constant factor. C++'s std::map and std::set are comparable balanced search trees (typically red-black trees rather than splay trees).
Know thy data structures.
If you are interested in dynamic increments of array size, you can do this (a chunked array of fixed-size blocks):

#define CHUNKS 1024          /* how many blocks the index array can hold */
#define CHUNK_LEN 1024       /* elements per block (the "RequiredLength") */

int current = 0;
int **x = (int**)malloc(CHUNKS * sizeof(int*));
x[current] = (int*)malloc(CHUNK_LEN * sizeof(int));

Add elements to the array, and when x[current] is full, make room for CHUNK_LEN more elements by doing

x[++current] = (int*)malloc(CHUNK_LEN * sizeof(int));

You can repeat this up to CHUNKS times, which means CHUNKS * CHUNK_LEN elements can be accommodated, and it gives you the chance to grow the array only when you need to. You can always access the n-th element as x[n / CHUNK_LEN][n % CHUNK_LEN].
Considering your comments, it looks like std::set or std::unordered_set fits your needs better than std::vector.
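For example, a minimal std::unordered_set sketch, assuming plain membership-style updates:

#include <unordered_set>

int main() {
    std::unordered_set<int> values;      // unordered, grows and shrinks freely
    values.insert(42);                   // O(1) expected
    values.erase(42);                    // O(1) expected
    bool present = values.count(7) > 0;  // O(1) expected membership test
    return present ? 0 : 1;
}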
If sequential data structures fail to meet your requirements, you could look at trees (binary, AVL, m-way, red-black, etc.). I would suggest an AVL tree, since it yields a balanced (or nearly balanced) binary search tree, which would keep your operations O(log n). For more on AVL trees: http://en.wikipedia.org/wiki/AVL_tree
Well, a deque has no resize cost, but if it's unordered its search time is linear, and deleting or inserting in the middle is even worse than with a vector.
If you don't need to search by the value of the number, a hashmap or map may be your choice: no resize cost. Set the key of the map to the number's index and the value to the number's value; search and insert are then better than linear.
std::list is definitely created for such problems: adding and deleting elements in a list does not necessitate memory reallocations the way a vector does. However, due to the noncontiguous memory allocation of the list, searching elements may prove to be a painful experience, so it is only suitable if you do not search its entries frequently.

Data structure for O(log N) find and update, considering small L1 cache

I'm currently working on an embedded device project where I'm running into performance problems. Profiling has located an O(N) operation that I'd like to eliminate.
I basically have two arrays int A[N] and short B[N]. Entries in A are unique and ordered by external constraints. The most common operation is to check if a particular value a appears in A[]. Less frequently, but still common is a change to an element of A[]. The new value is unrelated to the previous value.
Since the most common operation is the find, that's where B[] comes in. It's a sorted array of indices in A[], such that A[B[i]] < A[B[j]] if and only if i<j. That means that I can find values in A using a binary search.
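Concretely, the find is roughly this (a sketch; the comparator-based lower_bound is just the obvious way to express it):

#include <algorithm>

// Binary-search B by the A-values it points at; O(log N) probes into A.
bool contains(const int* A, const short* B, int n, int a) {
    const short* it = std::lower_bound(B, B + n, a,
        [A](short idx, int value) { return A[idx] < value; });
    return it != B + n && A[*it] == a;
}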
Of course, when I update A[k], I have to find k in B and move it to a new position, to maintain the search order. Since I know the old and new values of A[k], that's just a memmove() of the subset of B[] between the old and new positions of k. This is the O(N) operation that I need to fix; since the old and new values of A[k] are essentially random, I'm moving on average about N/3 of the elements.
I looked into std::make_heap using [](int i, int j) { return A[i] < A[j]; } as the predicate. In that case I can easily make B[0] point to the smallest element of A, and updating B becomes a cheap O(log N) rebalancing operation. However, I generally don't need the smallest value of A; I need to find whether a given value is present. And that is now an O(N) search in B (half of my N elements are at heap depth log N, a quarter at (log N)-1, etc.), which is no improvement over a dumb O(N) search directly in A.
Considering that std::set has O(log N) insert and find, I'd say that it should be possible to get the same performance here for update and find. But how do I do that? Do I need another order for B? A different type?
B is currently a short [N] because A and B together are about the size of my CPU cache, and my main memory is a lot slower. Going from 6*N to 8*N bytes would not be nice, but still acceptable if my find and update go to O(log N) both.
If the only operations are (1) check if value 'a' belongs to A and (2) update values in A, why don't you use a hash table in place of the sorted array B? Especially if A does not grow or shrink in size and the values only change this would be a much better solution. A hash table does not require significantly more memory than an array. (Alternatively, B should be changed not to a heap but to a binary search tree, that could be self-balancing, e.g. a splay tree or a red-black tree. However, trees require extra memory because of the left- and right-pointers.)
A practical solution that grows memory use from 6N to 8N bytes is to aim for an exactly 50% filled hash table, i.e. use a hash table that is an array of 2N shorts. I would recommend implementing the cuckoo hashing mechanism (see http://en.wikipedia.org/wiki/Cuckoo_hashing). Read the article further and you'll find that you can get load factors above 50% (i.e. push memory consumption down from 8N towards, say, 7N) by using more hash functions: "Using just three hash functions increases the load to 91%."
From Wikipedia:
A study by Zukowski et al. has shown that cuckoo hashing is much faster than chained hashing for small, cache-resident hash tables on modern processors. Kenneth Ross has shown bucketized versions of cuckoo hashing (variants that use buckets that contain more than one key) to be faster than conventional methods also for large hash tables, when space utilization is high. The performance of the bucketized cuckoo hash table was investigated further by Askitis, with its performance compared against alternative hashing schemes.
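A bare-bones sketch of a two-table cuckoo scheme storing shorts that index into A[] (the hash functions, the EMPTY sentinel, and the missing rehash-on-cycle path are all simplifications, not production choices):

#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct CuckooIndex {
    static constexpr int16_t EMPTY = -1;
    std::vector<int16_t> t1, t2;   // two tables of N slots each: 50% load
    const int* A;                  // the key array being indexed

    CuckooIndex(const int* a, size_t n) : t1(n, EMPTY), t2(n, EMPTY), A(a) {}

    size_t h1(int key) const { return std::hash<int>{}(key) % t1.size(); }
    size_t h2(int key) const { return (std::hash<int>{}(key) * 0x9E3779B1u) % t2.size(); }

    // Position of 'key' in A, or -1 if absent: at most two probes.
    int find(int key) const {
        int16_t i = t1[h1(key)];
        if (i != EMPTY && A[i] == key) return i;
        int16_t j = t2[h2(key)];
        if (j != EMPTY && A[j] == key) return j;
        return -1;
    }

    // Insert index 'idx' by evicting occupants back and forth between tables.
    bool insert(int16_t idx) {
        int16_t cur = idx;
        for (size_t tries = 0; tries < t1.size(); ++tries) {
            std::swap(cur, t1[h1(A[cur])]);
            if (cur == EMPTY) return true;
            std::swap(cur, t2[h2(A[cur])]);
            if (cur == EMPTY) return true;
        }
        return false;   // likely a cycle: a real table would rehash and retry
    }
};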
std::set usually provides the O(log n) insert and delete by using a binary search tree. Unfortunately this uses 3*N space for most pointer-based implementations: assuming word-sized data, 1 word for the data and 2 for the pointers to the left and right children of each node.
If you have some constant N and can guarantee that ceil(log2(N)) is less than half the word size, you can use a fixed-length array of tree nodes with total size 2*N words: 1 word for the data and 1 for the indices of the two child nodes, stored in the upper and lower halves of the word. Whether this lets you use a self-balancing binary search tree of some manner depends on your N and word size. On a 16-bit system you only get N = 256, but on 32 bits it's 65k.
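For instance, a node packed that way might look like this (a sketch assuming 32-bit words and N <= 65536):

#include <cstdint>

// Each node fits in two 32-bit words, so the whole tree is a fixed
// 2*N-word array with no per-node pointer overhead.
struct PackedNode {
    int32_t  key;       // one word of data
    uint32_t children;  // left child index in the high 16 bits, right in the low 16
};

inline uint16_t leftChild(const PackedNode& n)  { return n.children >> 16; }
inline uint16_t rightChild(const PackedNode& n) { return n.children & 0xFFFF; }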
Since you have limited N, can't you use std::set<short, cmp, pool_allocator> B with Boost's pool_allocator?

Inserting and removing elements from an array while maintaining the array to be sorted

I'm wondering whether somebody can help me with this problem. I'm using C/C++ to program and I need to do the following:
I am given a sorted array P (biggest first) containing floats. It usually has a very big size, sometimes holding correlation values from 10-megapixel images. I need to iterate through the array until it is empty; within the loop there is additional processing taking place.
The gist of the problem is that at the start of the loop, I need to remove the elements with the maximum value from the array, check certain conditions and if they hold, then I need to reinsert the elements into the array but after decreasing their value. However, I want the array to be efficiently sorted after the reinsertion.
Can somebody point me towards a way of doing this? I have tried the naive approach of re-sorting every time I insert, but that seems really wasteful.
Change the data structure. Repeatedly accessing the largest element, and then quickly inserting new values, in such a way that you can still efficiently repeatedly access the largest element, is a job for a heap, which may be fairly easily created from your array in C++.
BTW, please don't talk about "C/C++". There is no such language. You're instead making vague implications about the style in which you're writing things, most of which will strike experienced programmers as bad.
I would look into std::priority_queue (http://www.cplusplus.com/reference/stl/priority_queue/), as it is designed to do just this.
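A minimal sketch of that approach (the condition and the halving are stand-ins for your actual processing):

#include <queue>

int main() {
    std::priority_queue<float> pq;        // max-heap by default
    pq.push(0.9f); pq.push(0.7f); pq.push(0.5f);
    while (!pq.empty()) {
        float top = pq.top();             // current maximum, O(1)
        pq.pop();                         // remove it, O(log n)
        bool conditionHolds = top > 0.6f; // stand-in for the real check
        if (conditionHolds)
            pq.push(top * 0.5f);          // reinsert the decreased value, O(log n)
    }
}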
You could use a binary search to determine where to insert the changed value after you removed it from the array. Note that inserting or removing at the front or somewhere in the middle is not very efficient either, as it requires moving all items with a higher index up or down, respectively.
ISTM that you should rather put your changed items into a new array and sort that once, after you finished iterating over the original array. If memory is a problem, and you really have to do things in place, change the values in place and only sort once.
I can't think of a better way to do this. Keeping the array sorted all the time seems rather inefficient.
Since the array is already sorted, you can use a binary search to find the location to insert the updated value. C++ provides std::lower_bound or std::upper_bound for this purpose, C provides bsearch. Just shift all the existing values up by one location in the array and store the new value at the newly cleared spot.
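For example, the reinsertion might look like this sketch (assuming P is sorted largest-first, per the question):

#include <algorithm>
#include <functional>
#include <vector>

// O(log n) to find the spot, but insert() still shifts the tail: O(n) overall.
void reinsert(std::vector<float>& P, float newVal) {
    // With greater<>, upper_bound finds the first element smaller than newVal.
    auto pos = std::upper_bound(P.begin(), P.end(), newVal, std::greater<float>());
    P.insert(pos, newVal);
}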
Here's some pseudocode that may work decently if you aren't decreasing the removed values by much:
For example, say you're processing the element with the maximum value in the array, and say the array is sorted in descending order (largest first).
Remove array[0].
Let newVal = array[0] - adjustment, where adjustment is the amount you're decreasing the value by.
Now loop through, adjusting only the values you need to:
Pseudocode:
i = 0;
// length = number of elements currently in the array
while (i + 1 < length && newVal < array[i + 1]) {
    array[i] = array[i + 1];   // shift the next-larger element left into the gap
    i++;
}
array[i] = newVal;             // the decreased value lands in its sorted slot
Again, if you're not decreasing the removed values by a large amount (relative to the values in the array), this could work fairly efficiently.
Of course, the generally better alternative is to use a more appropriate data structure, such as a heap.
Maybe using another temporary array could help: first sort the "changed" elements alone, then do a regular O(n) merge of the two sorted sub-arrays into the temp array, and copy everything back to the original array.

Fast Algorithm for finding largest values in 2d array

I have a 2D array (an image, actually) of size N x N. I need to find the indices of the M largest values in the array (M << N x N). Linearized indices or the 2D coords are both fine. The array must remain intact (since it's an image); I can make a scratch copy, but sorting the array will bugger up the indices.
I'm fine with doing a full pass over the array (ie. O(N^2) is fine). Anyone have a good algorithm for doing this as efficiently as possible?
Selection is sorting's austere sister (repeat this ten times in a row). Selection algorithms are less known than sort algorithms, but nonetheless useful.
You can't do better than O(N^2) (in N) here, since nothing indicates that you must not visit each element of the array.
A good approach is to keep a priority queue made of the M largest elements. This gives O(N x N x log M) overall.
You traverse the array, enqueuing pairs (elements, index) as you go. The queue keeps its elements sorted by first component.
Once the queue has M elements, instead of enqueuing unconditionally you now do the following (a sketch follows this list):
Query the min element of the queue
If the current element of the array is greater, insert it into the queue and discard the min element of the queue
Else do nothing.
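A sketch of that loop over a linearized N x N image (names like img are illustrative):

#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Keep the M largest (value, linear index) pairs in a min-heap, so top()
// is always the smallest of the M kept so far.
std::vector<std::pair<float, int>> topM(const float* img, int n2, int M) {
    std::priority_queue<std::pair<float, int>,
                        std::vector<std::pair<float, int>>,
                        std::greater<>> q;
    for (int i = 0; i < n2; ++i) {
        if ((int)q.size() < M) q.emplace(img[i], i);
        else if (img[i] > q.top().first) { q.pop(); q.emplace(img[i], i); }
    }
    std::vector<std::pair<float, int>> out;
    while (!q.empty()) { out.push_back(q.top()); q.pop(); }
    return out;   // ascending by value; reverse if largest-first is wanted
}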
If M is bigger, sorting the array is preferable.
NOTE: @Andy Finkenstadt makes a good point (in the comments to your question): you definitely should traverse your array in the direction of data locality, making sure that you read memory contiguously.
Also, this is trivially parallelizable; the only non-parallelizable part is merging the queues when joining the subprocesses.
You could copy the array into a single-dimensional array of tuples (value, original X, original Y) and build a basic heap out of it in O(n) time, provided you implement the heap as an array.
You can then retrieve the M largest tuples in O(M log n) time and read their original x and y from each tuple.
If you are going to make a copy of the input array in order to do a sort, that's way worse than just walking linearly through the whole thing to pick out numbers.
So the question is, how big is your M? If it is small, you can store results (i.e. structs with the 2D indexes and values) in a simple array or a vector. That'll minimize heap operations, but when you find a value larger than what's in your vector, you'll have to shift things around.
If you expect M to get really large, then you may need a better data structure like a binary tree (std::set) or a sorted std::deque. std::set reduces the number of times elements must be shifted in memory, while std::deque still does some shifting but significantly reduces the number of trips to the heap, which may give you better performance.
Your problem doesn't use the 2 dimensions in any interesting way; it is easier to consider the equivalent problem on a 1D array.
There are 2 main ways to solve this problem:
Maintain a set of the M largest elements and iterate through the array (using a heap lets you do this efficiently).
This is simple and is probably better in your case (M << N).
Use selection (the following algorithm is an adaptation of quicksort; a standard-library sketch follows this list):
Create an auxiliary array containing the indexes [1..N].
Choose an arbitrary index (and corresponding value), and partition the index array so that indexes corresponding to smaller elements go to the left and bigger elements go to the right.
Repeat the process, binary-search style, until you have narrowed down the M largest elements.
This is good for cases with large M. If you want to avoid the worst-case issues (the same ones quicksort has), then look at more advanced algorithms, like median-of-medians selection.
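Here is a sketch of the selection route using the standard library's std::nth_element, which partitions an index array without touching the image (average O(N^2) for an N x N image; img and the names are illustrative):

#include <algorithm>
#include <numeric>
#include <vector>

std::vector<int> largestM(const float* img, int n2, int M) {
    std::vector<int> idx(n2);
    std::iota(idx.begin(), idx.end(), 0);   // linearized indices 0..N*N-1
    // Partition so the M largest (by image value) come first.
    std::nth_element(idx.begin(), idx.begin() + M, idx.end(),
                     [img](int a, int b) { return img[a] > img[b]; });
    idx.resize(M);
    return idx;   // unsorted within the top M; sort if order matters
}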
How many times do you search for the largest value from the array?
If you only search 1 time, then just scan through it keeping the M largest ones.
If you do it many times, just insert the values into a sorted list (probably best implemented as a balanced tree).