What's the difference between max-heapify and min-heapify? - heap

I've also heard of two other terms: reheapification upward and reheapification downward. But what is the meaning of max-heapify and min-heapify? Are these two terms (max-heapify and min-heapify) somehow related to reheapification upward and reheapification downward? My understanding is that both reheapification upward and reheapification downward come under max-heapify as well as min-heapify. Please correct me if I'm wrong. Also, please tell me the time complexities of all four of these terms.

Reheapification upward is performed when adding an item to a binary heap. When you add an item, you do the following:
1. Add the item to the end of the array.
2. Move it up in the heap to its proper position. This step is reheapification upward. It's also referred to as "bubble up," "sift up," or "swim."
When you remove the root item from a heap, you do the following:
1. Replace the root item with the last item on the heap.
2. Move that item down the heap to its proper place. This is reheapification downward. It's also referred to as "bubble down," "sift down," or "sink."
Both operations apply equally to min-heaps and max-heaps.
In a min-heap, the function that implements downward reheapification is often called "min-heapify." In a max-heap implementation, that function is often called "max-heapify."
Reheapification upward and reheapification downward both have complexity O(log n).
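As a minimal sketch (assuming a 0-indexed, array-backed min-heap of ints; the names are illustrative), the pieces fit together like this. For a max-heap, flip the comparisons:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Reheapification upward ("swim"): move the item at index i toward the root
// until it is no smaller than its parent.
void siftUp(std::vector<int>& h, std::size_t i) {
    while (i > 0) {
        std::size_t parent = (i - 1) / 2;
        if (h[i] >= h[parent]) break;       // heap property holds, stop early
        std::swap(h[i], h[parent]);
        i = parent;
    }
}

// Reheapification downward ("sink"): this is what a min-heap typically calls
// min-heapify; a max-heap version ("max-heapify") just flips the comparisons.
void siftDown(std::vector<int>& h, std::size_t i) {
    const std::size_t n = h.size();
    while (true) {
        std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[l] < h[smallest]) smallest = l;
        if (r < n && h[r] < h[smallest]) smallest = r;
        if (smallest == i) break;           // both children are >= this node
        std::swap(h[i], h[smallest]);
        i = smallest;
    }
}

// Insertion: append, then reheapify upward from the new last slot.
void insert(std::vector<int>& h, int value) {
    h.push_back(value);
    siftUp(h, h.size() - 1);
}

// Removal of the minimum: move the last item to the root, then reheapify downward.
int removeMin(std::vector<int>& h) {
    int top = h.front();
    h.front() = h.back();
    h.pop_back();
    if (!h.empty()) siftDown(h, 0);
    return top;
}
```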

Related

Clear mark attribute of a node in extract-min operation of Fibonacci Heap

In the DECREASE-KEY operation of Fibonacci Heap, whenever a node is cut from its parent and added to the root list, its mark attribute is set to FALSE. However, in the EXTRACT-MIN operation, the children of the min-node are added to the root list but their mark attributes aren't set to FALSE. Why is there such inconsistency?
Moreover, in the linking operation where a node is made the child of another node, the mark attribute of the new child is set to FALSE. The EXTRACT-MIN operation performs this linking operation multiple times. But in the amortized analysis of EXTRACT-MIN operation described in the CLRS book, the authors claim that the number of marked nodes doesn't change in EXTRACT-MIN operation. They use m(H) to denote the number of marked nodes both before and after EXTRACT-MIN operation. I am quoting the exact line from the book:
The potential before extracting the minimum node is t(H)+2m(H), and
the potential afterward is at most (D(n)+1)+2m(H).
Here D(n) is the maximum degree of any node in an n-node Fibonacci Heap, t(H) is the number of trees in the Fibonacci Heap and m(H) is the number of marked nodes in the Fibonacci Heap.
Isn't this calculation wrong?
Let's take a step back - why do we need mark bits in a Fibonacci heap in the first place?
The idea behind a Fibonacci heap is to make the DECREASE-KEY operation as fast as possible. To that end, the basic strategy of DECREASE-KEY is to take the element whose key is being decreased and to cut it from its parent if the heap property is violated. Unfortunately, if we do that, then we lose the exponential connection between the order of a tree and the number of nodes in the tree. That's a problem because the COALESCE step of an EXTRACT-MIN operation links trees based on their orders and assumes that each tree's order says something about how many nodes it contains. With that connection broken, all the nice runtime bounds we want go away.
As a compromise, we introduce mark bits. If a node loses a child, it gets marked to indicate "something was lost here." Once a marked node loses a second child, it gets cut from its parent. Over time, if you do a huge number of cuts in a single tree, eventually the cuts propagate up to the root of the tree, decreasing the tree order. That makes the tree behave differently during a COALESCE step.
With that said, the important detail here is that mark bits are only relevant for non-root nodes. If a node is a root of a tree and it loses a child, this immediately changes its order in a way that COALESCE will notice. Therefore, you can essentially ignore the mark bit of any tree root, since it never comes up. It's probably a wise idea to clear a node's mark bit before moving it up to the root list just so you don't have to clear it later, but that's more of an implementation detail than anything else.
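For concreteness, here is a minimal sketch of how the mark bit drives cutting in DECREASE-KEY. This is not a complete Fibonacci heap: the node layout, the rootList container, and the omission of child-list and min-pointer bookkeeping are simplifications for illustration only.

```cpp
#include <vector>

struct FibNode {
    int key;
    bool marked = false;
    FibNode* parent = nullptr;
    int degree = 0;                    // number of children
};

std::vector<FibNode*> rootList;        // stand-in for the circular root list

// Cut a node from its parent and move it to the root list. The mark is
// cleared here: marks only matter for non-root nodes.
void cut(FibNode* x) {
    if (x->parent) x->parent->degree--;   // a real heap also unlinks x from the child list
    x->parent = nullptr;
    x->marked = false;
    rootList.push_back(x);
}

// Walk up from the parent of a cut node: an unmarked parent just gets marked
// ("it lost one child"); a marked parent has now lost a second child, so it
// is cut as well and the walk continues.
void cascadingCut(FibNode* y) {
    while (y && y->parent) {
        if (!y->marked) { y->marked = true; return; }
        FibNode* next = y->parent;
        cut(y);
        y = next;
    }
}

// DECREASE-KEY: if the heap property is violated, cut the node and cascade.
void decreaseKey(FibNode* x, int newKey) {
    x->key = newKey;
    FibNode* p = x->parent;
    if (p && x->key < p->key) {
        cut(x);
        cascadingCut(p);
    }
    // (updating the heap's min pointer is omitted in this sketch)
}
```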

Maintaining Heap Property

Brief Background: I am studying the steps for maintaining a heap property when insertion occurs. Here is an interesting problem:
Question: There are two general strategies that could be used to maintain the heap properties:
1. Make sure that the tree is complete and then fix the ordering, or
2. Make sure the ordering is correct first and then check for completeness.
Which is better (1 or 2)?
Reference: http://www.cs.sfu.ca/CourseCentral/225/johnwill/cmpt225_09heaps.pdf (Heap Insertion - slide 16) written by Dr. John Edgar.
It would be great if you could clarify why one of the methods above is better.
With a binary heap implemented as an array, there are in general two ways to implement insertion: top-down or bottom-up. Slide 17 of the linked PDF describes the bottom-up way of doing things. It adds the new item at the end of the heap (bottom-most, left-most position), and bubbles it up. That is an implementation of strategy 1 shown on Slide 16.
From a performance standpoint, this is the better method simply because, on average, it requires fewer iterations to fix the ordering. See Argument for O(1) average-case complexity of heap insertion for a detailed explanation of why that is.
The top-down approach, which corresponds to strategy 2 on Slide 16, requires that every insertion make O(log n) comparisons. This strategy starts at the root and sifts the item down through the heap. If the new item is smaller (in a min-heap) than the node it's being compared against, it replaces the item at that node, and the just-replaced item has to be pushed down. This continues until you reach the bottom of the heap. There is no "early out" possible because you have to end up putting a new item in the heap at the leaf level.
I never really thought of it as making sure of the ordering first, and then ensuring completeness, but that's essentially what the top-down method is doing.
The second strategy requires more iterations per insertion, and also does more work during each iteration to maintain the ordering.
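For comparison, here is an illustrative sketch (not taken from the slides) of the top-down strategy on a 0-indexed, array-backed min-heap. The path from the root to the new leaf slot is fixed by the new size, so the new item is compared from the top down along that path, with no possibility of stopping early:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

void insertTopDown(std::vector<int>& heap, int value) {
    const std::size_t target = heap.size();    // index the new leaf will occupy
    heap.push_back(value);                     // reserve the slot

    // Collect the root-to-target path (the ancestors of the new leaf).
    std::vector<std::size_t> path;
    for (std::size_t i = target; i > 0; i = (i - 1) / 2)
        path.push_back((i - 1) / 2);           // parent, grandparent, ..., root

    // Walk the path from the root downward, carrying the displaced item.
    int carried = value;
    for (auto it = path.rbegin(); it != path.rend(); ++it)
        if (carried < heap[*it])
            std::swap(carried, heap[*it]);     // new item takes this node's place
    heap[target] = carried;                    // whatever remains lands in the leaf
}
```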

Why is it so slow to add or remove elements in the middle of a vector?

According to Accelerated C++:
To use this strategy, we need a way to remove an element from a vector. The good news is that such a facility exists; the bad news is that removing elements from vectors is slow enough to argue against using this approach for large amounts of input data. If the data we process get really big, performance degrades to an astonishing extent.
For example, if all of our students were to fail, the execution time of the function that we are about to see would grow proportionally to the square of the number of students. That means that for a class of 100 students, the program would take 10,000 times as long to run as it would for one student. The problem is that our input records are stored in a vector, which is optimized for fast random access. One price of that optimization is that it can be expensive to insert or delete elements other than at the end of the vector.
The authors do not explain why the vector would be so slow for 10,000+ students, and why in general it is slow to add or remove elements to the middle of a vector. Could somebody on Stack Overflow come up with a beautiful answer for me?
Take a row of houses: if you build them in a straight line, then finding No. 32 is really easy: just walk along the road about 32 houses' worth, and you're there. But it's not nearly as easy to add house No. 31½ in the middle; that's a big construction project with a lot of disruption for every household after it. In the worst case, there is not enough space on the road for another house anyway, so you have to move all the houses to a different street before you even start.
Similarly, vectors store their data contiguously, i.e. in a continuous, sequential block in memory.
This is very good for quickly finding the nth element (as you simply have to trundle along n positions and dereference), but very bad for inserting into the middle as you have to move all the later elements along by one, one at a time.
Other containers are designed to be easy to insert elements, but the trade-off is that they are consequently not quite as easy to find things in. There is no container which is optimal for all operations.
When inserting elements into or removing elements from the middle of a std::vector<T>, all elements after the modification point need to be moved: when inserting they need to be moved toward the back, and when removing they need to be moved forward to close the gap. The background is that std::vector<T> is basically just a contiguous sequence of elements.
Although this operation isn't too bad for certain types, it can become comparatively slow. Note, however, that the container needs to be of some sensible size for the cost of moving to be significant: for small vectors, inserting into or removing from the middle is probably faster than using other data structures, e.g., lists. Eventually the cost of maintaining a more complex structure does pay off, however.
std::vector allocates its memory as one extent. If you need to insert an element in the middle of the extent, you have to shift all the elements of the vector after that point to the right to make a free slot where you will insert the new element. Moreover, if the extent is already full, the vector needs to allocate a new, larger extent and copy all elements from the original extent to the new one.
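A small, self-contained illustration of the difference (absolute timings are machine-dependent; the point is the shifting, not the exact numbers):

```cpp
#include <chrono>
#include <iostream>
#include <vector>

// Appending never shifts existing elements, while inserting in the middle
// shifts about half of them on every call, giving O(n^2) total element moves.
int main() {
    using clock = std::chrono::steady_clock;
    const int n = 100000;

    std::vector<int> back, middle;

    auto t0 = clock::now();
    for (int i = 0; i < n; ++i)
        back.push_back(i);                                     // amortized O(1)
    auto t1 = clock::now();
    for (int i = 0; i < n; ++i)
        middle.insert(middle.begin() + middle.size() / 2, i);  // shifts ~size/2 elements
    auto t2 = clock::now();

    auto ms = [](auto d) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };
    std::cout << "push_back at end:  " << ms(t1 - t0) << " ms\n"
              << "insert in middle:  " << ms(t2 - t1) << " ms\n";
}
```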

Algorithms on merging two heaps

As far as I know, there exists a binomial heap, or so-called mergeable heap, which is used to merge two heaps. My question is: instead of merging these heaps into one heap dynamically, if I copy the two heaps into one big array and then perform a heap-building procedure, would that be a good approach or not?
I ask because I don't know how to create one heap from two heaps using just heap operations. Please tell me if it is not a good way, or, if you can, give me a link to where a binomial heap with a merge operation is implemented.
If you think about it, creating one heap by throwing away all the info embedded in the ordering of the other heaps can't possibly be optimal. Worst case, you should add all the items in heap 2 to heap 1, and that will be just half the work of creating a brand new heap from scratch.
But in fact, you can do way better than that. Merging two well-formed heaps involves little more than finding the insertion point for one of the roots in the other heap's tree, and inserting it at that point. No further work is necessary, and you've done no more than ln N work! See here for the detailed algorithm.
It will solve the problem, and it will give you a correct heap - but it will not be efficient.
Creating a [binary] heap of n elements from scratch is O(n), while merging 2 existing binomial heaps is O(log n).
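For reference, the "copy into one array and rebuild" approach from the question might look like this sketch using std::make_heap. It produces a valid binary heap, but it costs O(n + m) even though the inputs were already heaps, whereas a proper binomial-heap merge is O(log(n + m)):

```cpp
#include <algorithm>
#include <vector>

std::vector<int> mergeByRebuild(const std::vector<int>& h1,
                                const std::vector<int>& h2) {
    std::vector<int> merged;
    merged.reserve(h1.size() + h2.size());
    merged.insert(merged.end(), h1.begin(), h1.end());
    merged.insert(merged.end(), h2.begin(), h2.end());
    std::make_heap(merged.begin(), merged.end());   // builds a max-heap in O(n + m)
    return merged;
}
```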
The process of merging 2 binomial heaps is pretty much similar to the merge operation in merge sort. If not knowing the merge-heap procedure is the problem, the following steps might help (a rough code sketch follows the steps).
Repeat steps 1 through 4 until one of the heaps is empty:
1. If the heads (which are binomial trees) of the 2 heaps are of the same degree, make the head with the greater key a child of the head with the smaller key. Consequently the degree of the head of the latter heap increases by 1; make the head of the former heap the next element after its current head and go to step 2. If they are of different degrees, go to step 4.
2. If the head and the next binomial tree in the latter heap from step 1 are of the same degree, go to step 3; otherwise go to step 1.
3. Combine the head and its next element in the heap, in the same manner as in step 1, assign the new combined binomial tree as the head, and go to step 2.
4. See which of the 2 heaps has the head with the lower degree. Make the head of that heap the head of the other heap and delete it from the heap where it was initially present.
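A rough, simplified sketch of those steps (identifiers are illustrative; the heap wrapper, min pointer, and memory management are omitted), assuming min-ordered trees and root lists kept sorted by increasing degree:

```cpp
#include <utility>

struct BinomialNode {
    int key;
    int degree = 0;
    BinomialNode* child = nullptr;    // leftmost child
    BinomialNode* sibling = nullptr;  // next tree in a root list, or next child
};

// Link two trees of equal degree: the root with the larger key becomes the
// leftmost child of the root with the smaller key.
BinomialNode* linkTrees(BinomialNode* a, BinomialNode* b) {
    if (b->key < a->key) std::swap(a, b);
    b->sibling = a->child;
    a->child = b;
    a->degree++;
    return a;
}

// Merge the two root lists by degree (like merge sort's merge step), then
// walk the merged list and link adjacent trees of equal degree.
BinomialNode* unionHeaps(BinomialNode* h1, BinomialNode* h2) {
    BinomialNode dummy{};
    BinomialNode* tail = &dummy;
    while (h1 && h2) {
        BinomialNode*& smaller = (h1->degree <= h2->degree) ? h1 : h2;
        tail->sibling = smaller;
        tail = smaller;
        smaller = smaller->sibling;
    }
    tail->sibling = h1 ? h1 : h2;

    BinomialNode* head = dummy.sibling;
    BinomialNode* prev = nullptr;
    BinomialNode* cur = head;
    while (cur && cur->sibling) {
        BinomialNode* next = cur->sibling;
        bool moveOn = cur->degree != next->degree ||
                      (next->sibling && next->sibling->degree == cur->degree);
        if (moveOn) {                       // degrees differ, or three equal in a row
            prev = cur;
            cur = next;
        } else {                            // combine cur and next into one tree
            cur->sibling = next->sibling;   // splice next out of the root list
            cur = linkTrees(cur, next);
            if (prev) prev->sibling = cur; else head = cur;
        }
    }
    return head;
}
```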
Brodal queues and Brodal-Okasaki queues (bootstrapped skew binomial heaps) give the best worst-case asymptotic bounds for mergeable heaps, supporting O(1) insert, merge, and findMin, and O(log n) deleteMin. Brodal queues are ephemeral, and support efficient delete and decreaseKey. Brodal-Okasaki queues are confluently persistent (in fact purely functional), but don't support delete or decreaseKey. Unfortunately, Brodal and Okasaki say both these implementations are inefficient in practice, and Brodal considers his queues too complicated to be practical in any case.
Fibonacci heaps give similar amortized (but not worst-case) bounds, and are likely more efficient and practical in an amortized context. Pairing heaps are another good option: according to Wikipedia, their exact bounds are unknown, but they perform very well in practice.

How to keep a large priority queue with the most relevant items?

In an optimization problem I keep in a queue a lot of candidate solutions which I examine according to their priority.
Each time I handle one candidate, it is removed from the queue, but it produces several new candidates, making the number of candidates grow exponentially. To handle this I assign a relevancy to each candidate; whenever a candidate is added to the queue and there is no more space available, I replace (if appropriate) the least relevant candidate currently in the queue with the new one.
In order to do this efficiently I keep a large (fixed size) array with the candidates and two linked indirect binary heaps: one handles the candidates in decreasing priority order, and the other in ascending relevancy.
This is efficient enough for my purposes, and the supplementary space needed is about 4 ints per candidate, which is also reasonable. However, it is complicated to code, and it doesn't seem optimal.
My question is if you know of a more adequate data structure or of a more natural way to perform this task without losing efficiency.
Here's an efficient solution that doesn't change the time or space complexity over a normal heap:
In a min-heap, every node is less than both its children. In a max-heap, every node is greater than its children. Let's alternate between a min and a max property on each level: every node on an odd row is less than its children and grandchildren, and the inverse holds for even rows. Finding the smallest node is then the same as usual, and finding the largest node requires looking at the children of the root and taking the larger. Bubbling nodes up (for insertion) becomes a bit trickier, but it's still the same O(log N) complexity.
Keeping track of capacity and popping the smallest (least relevant) node is the easy part.
EDIT: This appears to be a standard min-max heap! See here for a description. There's a C implementation: header, source and example.
"Optimal" is hard to judge (near impossible) without profiling.
Sometimes a 'dumb' algorithm can be the fastest because Intel CPUs are incredibly fast at dumb array scans over contiguous blocks of memory, especially if the loop and the data fit on-chip. By contrast, jumping around following pointers in a larger block of memory that doesn't fit on-chip can be tens or hundreds of times slower.
You may also have the issues when you try to parallelize your code if the 'clever' data structure introduces locking thus preventing multiple threads from progressing simultaneously.
I'd recommend profiling your current solution, the min-max approach, and a simple array scan (no linked lists = less memory) to see which performs best. Odd as it may seem, I have seen 'clever' algorithms with linked lists beaten by simple array scans in practice, often because the simpler approach uses less memory, has a tighter loop, and benefits more from CPU optimizations. You also potentially avoid memory allocations and garbage-collection issues with a fixed-size array holding the candidates.
One option you might want to consider, whatever the solution, is to prune less frequently and remove more elements each time. For example, removing 100 elements on each prune operation means you only need to prune one hundredth as often. That may allow a more asymmetric approach to adding and removing elements.
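A sketch of that batched-prune idea, assuming a hypothetical Candidate type with a relevancy field and a fixed capacity (names are illustrative, not from the question's code); std::nth_element keeps the most relevant candidates in one O(n) pass:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Candidate {
    double priority;    // order in which candidates are examined
    double relevancy;   // used to decide which candidates survive a prune
};

// Batched prune: let the pool overfill, then keep only the `capacity` most
// relevant candidates in one pass instead of evicting one at a time.
void pruneLeastRelevant(std::vector<Candidate>& pool, std::size_t capacity) {
    if (pool.size() <= capacity) return;
    std::nth_element(pool.begin(), pool.begin() + capacity, pool.end(),
                     [](const Candidate& a, const Candidate& b) {
                         return a.relevancy > b.relevancy;   // most relevant first
                     });
    pool.resize(capacity);   // drop the least relevant candidates in one batch
}
```

After a batch prune you do have to rebuild the priority heap over the survivors, but that is O(n) with std::make_heap and only happens once per prune, so it can still be much cheaper than evicting one element per insertion.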
But overall, just bear in mind that the computer-science approach to optimization isn't always the practical approach to the highest performance on today and tomorrow's hardware.
If you use skip lists instead of heaps you'll have O(1) time for dequeuing elements while still doing searches in O(log n).
On the other hand a skip list is harder to implement and uses more space than a binary heap.