Algorithms on merging two heaps - c++

As far as I know, there is the binomial heap, a so-called mergeable heap, which is designed for merging two heaps. My question is: instead of merging these heaps into one heap dynamically, if I copy both heaps into one big array and then run a heap-building procedure, would that be a good approach or not?
I ask because I don't know how to create one heap from two heaps using just the heap operations. Please tell me if it is not a good way, or, if you can, give me a link to where a binomial heap with a merge operation is implemented.

If you think about it, creating one heap by throwing away all the information embedded in the ordering of the other heaps can't possibly be optimal. At worst, you could add all the items from heap 2 to heap 1, and that would be only about half the work of creating a brand-new heap from scratch.
But in fact, you can do far better than that. Merging two well-formed heaps involves little more than finding the insertion point for one of the roots in the other heap's tree, and inserting it at that point. No further work is necessary, and you've done no more than O(log N) work! See here for the detailed algorithm.

It will solve the problem, and it will give you a correct heap - but it will not be efficient.
Creating a [binary] heap of n elements from scratch is O(n), while merging 2 existing binomial heaps is O(log n).

The process of merging 2 binomial heaps is very similar to the merge operation in merge sort. If not knowing the merge-heap procedure is the problem, the following steps might help (a code sketch follows the list).
Repeat steps 1 through 4 until one of the heaps is empty:
1. If the heads (which are binomial trees) of the 2 heaps are of the same degree, make the head with the greater key a child of the head with the smaller key. Consequently the degree of the head of the latter heap increases by 1; make the head of the former heap the next element after its current head and go to step 2. If they are of different degrees, go to step 4.
2. If the head and the next binomial tree in the latter heap from step 1 are of the same degree, go to step 3; otherwise go to step 1.
3. Combine the head and its next element in the heap, in the same manner as in step 1, assign the newly combined binomial tree as the head, and go to step 2.
4. See which of the 2 heaps has the head with the lower degree. Make that head the head of the other heap and delete it from the heap where it was initially present.
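For reference, here is a minimal C++ sketch of this union procedure in the style of the CLRS algorithm; the Node layout and the function names are illustrative and not taken from any particular library.

#include <utility>

// Illustrative binomial-tree node: each root carries its degree, a pointer
// to its leftmost child, and a pointer to the next tree in the root list.
struct Node {
    int key;
    int degree = 0;
    Node* child = nullptr;
    Node* sibling = nullptr;
};

// Link two trees of equal degree: the root with the larger key becomes a
// child of the root with the smaller key (min-heap order).
Node* linkTrees(Node* a, Node* b) {
    if (b->key < a->key) std::swap(a, b);
    b->sibling = a->child;
    a->child = b;
    ++a->degree;
    return a;
}

// Merge two root lists ordered by degree, exactly like the merge step of
// merge sort (step 4 above picks from whichever list has the lower degree).
Node* mergeRootLists(Node* a, Node* b) {
    Node dummy;
    Node* tail = &dummy;
    while (a && b) {
        Node*& lower = (a->degree <= b->degree) ? a : b;
        tail->sibling = lower;
        tail = lower;
        lower = lower->sibling;
    }
    tail->sibling = a ? a : b;
    return dummy.sibling;
}

// Union: merge the root lists, then link neighbouring trees of equal degree
// (steps 1-3 above) so that at most one tree of each degree remains.
Node* unionHeaps(Node* h1, Node* h2) {
    Node* head = mergeRootLists(h1, h2);
    if (!head) return nullptr;
    Node* prev = nullptr;
    Node* cur = head;
    Node* next = cur->sibling;
    while (next) {
        if (cur->degree != next->degree ||
            (next->sibling && next->sibling->degree == cur->degree)) {
            prev = cur;               // degrees differ (or three in a row): move on
            cur = next;
        } else if (cur->key <= next->key) {
            cur->sibling = next->sibling;
            linkTrees(cur, next);     // next becomes a child of cur
        } else {
            if (prev) prev->sibling = next; else head = next;
            linkTrees(next, cur);     // cur becomes a child of next
            cur = next;
        }
        next = cur->sibling;
    }
    return head;
}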

Brodal queues and Brodal-Okasaki queues (bootstrapped skew binomial heaps) give the best worst-case asymptotic bounds for mergeable heaps, supporting O(1) insert, merge, and findMin, and O(log n) deleteMin. Brodal queues are ephemeral, and support efficient delete and decreaseKey. Brodal-Okasaki queues are confluently persistent (in fact purely functional), but don't support delete or decreaseKey. Unfortunately, Brodal and Okasaki say both these implementations are inefficient in practice, and Brodal considers his queues too complicated to be practical in any case.
Fibonacci heaps give similar amortized (but not worst-case) bounds, and are likely more efficient and practical in an amortized context. Pairing heaps are another good option: according to Wikipedia, their exact bounds are unknown, but they perform very well in practice.

Related

Maintaining Heap Property

Brief Background: I am studying the steps for maintaining a heap property when insertion occurs. Here is an interesting problem:
Question: There are two general strategies that could be used to maintain the heap properties:
1. Make sure that the tree is complete and then fix the ordering, or
2. Make sure the ordering is correct first and then check for completeness.
Which is better (1 or 2)?
Reference: http://www.cs.sfu.ca/CourseCentral/225/johnwill/cmpt225_09heaps.pdf (Heap Insertion - slide 16) written by Dr. John Edgar.
It would be great if you could clarify why one of the methods above is better.
With a binary heap implemented as an array, there are in general two ways to implement insertion: top-down or bottom-up. Slide 17 of the linked PDF describes the bottom-up way of doing things. It adds the new item at the end of the heap (bottom-most, left-most position), and bubbles it up. That is an implementation of strategy 1 shown on Slide 16.
From a performance standpoint, this is the better method simply because, on average, it requires fewer iterations to fix the ordering. See Argument for O(1) average-case complexity of heap insertion for a detailed explanation of why that is.
The top-down approach, which corresponds to strategy 2 on Slide 16, requires that every insertion make O(log n) comparisons. This strategy starts at the root and sifts the item down through the heap. If the new item is smaller (in a min-heap) than the node it's being compared against, it replaces the item at that node, and the just-replaced item has to be pushed down. This continues until you reach the bottom of the heap. There is no "early out" possible because you have to end up putting a new item in the heap at the leaf level.
I never really thought of it as making sure of the ordering first, and then ensuring completeness, but that's essentially what the top-down method is doing.
The second strategy requires more iterations per insertion, and also does more work during each iteration to maintain the ordering.
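To make the bottom-up strategy concrete, here is a minimal C++ sketch of insertion into an array-based min-heap; the function name is mine, not from the slides.

#include <cstddef>
#include <utility>
#include <vector>

// Strategy 1: append at the next free leaf (keeping the tree complete),
// then bubble the new item up until the heap ordering is restored.
void heapInsert(std::vector<int>& heap, int value) {
    heap.push_back(value);
    std::size_t i = heap.size() - 1;
    while (i > 0) {
        std::size_t parent = (i - 1) / 2;
        if (heap[parent] <= heap[i]) break;  // ordering is fine: early out
        std::swap(heap[parent], heap[i]);    // bubble up one level
        i = parent;
    }
}

Note the early out: on average the new item rises only a constant number of levels, which is exactly the O(1) average-case argument linked above.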

Is this how I combine two min-heaps together?

I am currently creating a source code to combine two heaps that satisfy the min heap property with the shape invariant of a complete binary tree. However, I'm not sure if what I'm doing is the correct accepted method of merging two heaps satisfying the requirements I laid out.
Here is what I think:
Given two priority queues represented as min-heaps, I insert the nodes of the second tree one by one into the first tree and fix the heap property. I continue this until all of the nodes in the second tree are in the first tree.
From what I see, this feels like an n log n algorithm, since I have to go through all the elements in the second tree and every insert takes about log n time because the height of a complete binary tree is at most log n. But I think there is a faster way; however, I'm not sure what other method is possible.
I was thinking that I could just insert the entire tree in, but that breaks the shape invariant and the order invariant. Is my method the only way?
In fact, building a heap is possible in linear time, and the standard function std::make_heap guarantees linear time. The method is explained in the Wikipedia article about binary heaps.
This means that you can simply merge heaps by calling std::make_heap on a range containing the elements from both heaps. This is asymptotically optimal if the heaps are of similar size. There might be a way to exploit the preexisting structure to reduce the constant factor, but I find it unlikely.
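A rough sketch of that idea (the function name is mine, and std::greater is passed because the question is about min-heaps):

#include <algorithm>
#include <functional>
#include <vector>

// Merge two min-heaps by concatenating their arrays and rebuilding in O(n).
// Both inputs are assumed to already be valid min-heaps, although make_heap
// does not actually require that.
std::vector<int> mergeHeaps(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> merged;
    merged.reserve(a.size() + b.size());
    merged.insert(merged.end(), a.begin(), a.end());
    merged.insert(merged.end(), b.begin(), b.end());
    std::make_heap(merged.begin(), merged.end(), std::greater<int>());  // linear time
    return merged;
}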

How to improve Boost Fibonacci Heap performance

I am implementing the Fast Marching algorithm, which is some kind of continuous Dijkstra. As I read in many papers, the Fibonacci heap is the most adequate heap for this purpose.
However, when profiling my code with callgrind, I see that the following function is taking 58% of the execution time:
int popMinIdx () {
    const int idx = heap_.top()->getIndex();
    heap_.pop();
    return idx;
}
Concretely, the pop() is taking 57.67% of the whole execution time.
heap_ is defined as follows:
boost::heap::fibonacci_heap<const FMCell *, boost::heap::compare<compare_cells>> heap_;
Is it normal that it takes "that much" time or is there something I can do to improve performance?
Sorry if not enough information is given. I tried to be as brief as possible. I will add more info if needed.
Thank you!
The other answers aren't mentioning the big part: of course pop() takes the majority of your time: it's the only function that performs any real work!
As you may have read, the bounds on the operations of a Fibonacci Heap are amortized bounds. This means that if you perform enough operations in a good sequence, the bounds will average out to that. However, the actual costs are completely hidden.
Every time you insert an element, nothing happens. It is just thrown into the root list. Boom, O(1) time. Every time you merge two trees, its root is just linked into the root list. Boom, O(1) time. But hold on, your structure is not a valid Fibonacci Heap! That's where pop() (or extract-root) comes in: every time this operation is called, the entire Heap is restructured back into a correct shape. The Root is removed, its children are cut to the root list, and then we start merging trees in the root list so that no two trees with the same degree (number of children) exist in the root list.
So all of the work of Insert(e) and Merge(t) is actually delayed until Pop() is called, which then does all the work. What about the other operations?
Delete(e) is beautiful. We perform Decrease-Key(e, -inf) to make the element e become the root. And now we perform Pop()! Again, the work is done by Pop().
Decrease-Key(e, v) does its work by itself: it cuts e to the root list and starts a cutting cascade to put its children into the root list as well (which can cut their childlists too). So Decrease-Key puts a whole lot of elements into the root list. Can you guess which function has to fix that?
TL;DR: Pop() is the work horse of the Fibonacci Heap. All other operations are done efficiently because they create work for the Pop() operation. Pop() gathers the work and performs it in one go (which can take up to O(n)). This is actually really efficient because the "grouped" work can be done faster than each operation separately.
So yes, it is natural that Pop() takes up the majority of your time!
The Fibonacci Heap's pop() has an amortized runtime of O(log n) and a worst case of O(n). If your heap is large, it could easily be consuming a majority of the CPU time in your algorithm, especially since most of the other operations you're likely using have O(1) runtimes (insert, top, etc.).
One thing I'd recommend is to try callgrind with your preferred optimization level (such as -O3) together with debug info (-g), because templatized data structures/containers such as fibonacci_heap lean heavily on inlined functions. It could be that most of the CPU cycles you're measuring don't even exist in your optimized executable.
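For example, something along these lines (the file names are placeholders for your own project):

g++ -O3 -g fast_marching.cpp -o fast_marching
valgrind --tool=callgrind ./fast_marching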

Graph memory implementation

The two ways commonly used to represent a graph in memory are an adjacency list and an adjacency matrix. An adjacency list is implemented using an array of pointers to linked lists. Is there any reason that this is faster than using a vector of vectors? I feel like it should make searching and traversals faster because backtracking would be a lot simpler.
The vector of linked adjacencies is a favorite textbook meme with many variations in practice. Certainly you can use vectors of vectors. What are the differences?
One is that links (double ones anyway) allow edges to be easily added and deleted in constant time. This obviously is important only when the edge set shrinks as well as grows. With vectors for edges, any individual operation may require O(k) where k is the incident edge count.
NB: If the order of edges in adjacency lists is unimportant for your application, you can easily get O(1) deletions with vectors. Just copy the last element to the position of the one to be deleted, then delete the last! Alas, there are many cases (e.g. where you're worried about embedding in the plane) when order of adjacencies is important.
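A minimal sketch of that trick, assuming one vertex's adjacency is a std::vector<int> of neighbour ids (the names are illustrative):

#include <cstddef>
#include <vector>

// O(1) edge removal when adjacency order does not matter: overwrite the
// doomed entry with the last one, then drop the last.
void removeEdgeAt(std::vector<int>& adj, std::size_t pos) {
    adj[pos] = adj.back();  // copy the last neighbour into the hole
    adj.pop_back();         // shrink by one; order is not preserved
}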
Even if order must be maintained, you can arrange for copying costs to amortize to an average that is O(1) per operation over many operations. Still in some applications this is not good enough, and it requires "deleted" marks (a reserved vertex number suffices) with compaction performed only when the number of marked deletions is a fixed fraction of the vector. The code is tedious and checking for deleted nodes in all operations adds overhead.
Another difference is overhead space. Adjacency list nodes are quite small: Just a node number. Double links may require 4 times the space of the number itself (if the number is 32 bits and both pointers are 64). For a very large graph, a space overhead of 400% is not so good.
Finally, linked data structures that are frequently edited over a long period may easily lead to highly non-contiguous memory accesses. This decreases cache performance compared to linear access through vectors. So here the vector wins.
In most applications, the difference is not worth worrying about. Then again, huge graphs are the way of the modern world.
As others have said, it's a good idea to use a generalized List container for the adjacencies, one that may be quickly implemented either with linked nodes or vectors of nodes. E.g. in Java, you'd use List and implement/profile with both LinkedList and ArrayList to see which works best for your application. NB ArrayList compacts the array on every remove. There is no amortization as described above, although adds are amortized.
There are other variations: Suppose you have a very dense graph, where there's a frequent need to search all edges incident to a given node for one with a certain label. Then you want maps for the adjacencies, where the keys are edge labels. Of course the maps can be hashes or trees or skiplists or whatever you like.
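A hedged sketch of that layout, with made-up type and function names:

#include <string>
#include <unordered_map>
#include <vector>

// One hash map per vertex: edge label -> neighbour id, so finding the edge
// out of v with a given label is an O(1) average-case lookup.
using LabeledAdjacency = std::vector<std::unordered_map<std::string, int>>;

int neighbourOf(const LabeledAdjacency& g, int v, const std::string& label) {
    return g[v].at(label);  // throws std::out_of_range if v has no such edge
}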
The list goes on. How to implement for efficient vertex deletion? As you might expect, there are alternatives here, too, each with advantages and disadvantages.

How to keep a large priority queue with the most relevant items?

In an optimization problem I keep in a queue a lot of candidate solutions which I examine according to their priority.
Each time I handle one candidate, it is removed from the queue, but it produces several new candidates, making the number of candidates grow exponentially. To handle this I assign a relevancy to each candidate; whenever a candidate is added to the queue and there is no more space available, I replace (if appropriate) the least relevant candidate currently in the queue with the new one.
In order to do this efficiently I keep a large (fixed size) array with the candidates and two linked indirect binary heaps: one handles the candidates in decreasing priority order, and the other in ascending relevancy.
This is efficient enough for my purposes and the supplementary space needed is about 4 ints/candidate which is also reasonable. However it is complicated to code, and it doesn't seem optimal.
My question is if you know of a more adequate data structure or of a more natural way to perform this task without losing efficiency.
Here's an efficient solution that doesn't change the time or space complexity over a normal heap:
In a min-heap, every node is less than both its children. In a max-heap, every node is greater than its children. Let's alternate between a min and a max property on each level: every node on an odd level is less than its children and grandchildren, and the inverse holds on even levels. Then finding the smallest node is the same as usual, and finding the largest node requires looking at the children of the root and taking the larger one. Bubbling nodes up (for insertion) becomes a bit trickier, but it's still the same O(log N) complexity.
Keeping track of capacity and popping the smallest (least relevant) node is the easy part.
EDIT: This appears to be a standard min-max heap! See here for a description. There's a C implementation: header, source and example, along with an example graph (image from chonbuk.ac.kr).
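A small illustration of the read operations on an array-based min-max heap (only a sketch, not the linked C implementation):

#include <algorithm>
#include <vector>

// The root level is a min level, so the minimum sits at index 0; the maximum
// is one of the root's (at most two) children on the first max level.
int findMin(const std::vector<int>& h) { return h.at(0); }

int findMax(const std::vector<int>& h) {
    if (h.size() == 1) return h[0];
    if (h.size() == 2) return h[1];
    return std::max(h[1], h[2]);
}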
"Optimal" is hard to judge (near impossible) without profiling.
Sometimes a 'dumb' algorithm can be the fastest, because Intel CPUs are incredibly fast at dumb array scans over contiguous blocks of memory, especially if the loop and the data fit on-chip. By contrast, jumping around following pointers in a larger block of memory that doesn't fit on-chip can be tens or hundreds of times slower.
You may also have the issues when you try to parallelize your code if the 'clever' data structure introduces locking thus preventing multiple threads from progressing simultaneously.
I'd recommend profiling both your current, the min-max approach and a simple array scan (no linked lists = less memory) to see which performs best. Odd as it may seem, I have seen 'clever' algorithms with linked lists beaten by simple array scans in practice often because the simpler approach uses less memory, has a tighter loop and benefits more from CPU optimizations. You also potentially avoid memory allocations and garbage collection issues with a fixed size array holding the candidates.
One option you might want to consider, whatever the solution, is to prune less frequently and remove more elements each time. For example, removing 100 elements on each prune operation means you only need to prune a hundredth as often. That may allow a more asymmetric approach to adding and removing elements.
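A sketch of such batched pruning on a flat array, assuming a hypothetical Candidate record with a relevancy field:

#include <algorithm>
#include <cstddef>
#include <vector>

struct Candidate { double priority; double relevancy; };

// Keep only the 'keep' most relevant candidates once the pool overflows.
// std::nth_element runs in O(n) on average, so pruning in batches is cheap.
void prune(std::vector<Candidate>& pool, std::size_t keep) {
    if (pool.size() <= keep) return;
    std::nth_element(pool.begin(), pool.begin() + keep, pool.end(),
                     [](const Candidate& a, const Candidate& b) {
                         return a.relevancy > b.relevancy;  // most relevant first
                     });
    pool.resize(keep);  // drop everything less relevant than the keep-th element
}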
But overall, just bear in mind that the computer-science approach to optimization isn't always the practical approach to the highest performance on today and tomorrow's hardware.
If you use skip-lists instead of heaps you'll have O(1) time for dequeuing elements while still doing searches in O(log n).
On the other hand a skip list is harder to implement and uses more space than a binary heap.