I have a priority heap holding an event queue.
I need to dump this out for the user in order, but without rendering the heap unusable.
Obviously if I was willing to destroy it I could simply dequeue events until it was empty, and add them in order to my sorted list. But then of course the heap is gone. Further, Quicksort is so much faster than a heap sort that I don't have confidence that I can build this sorted list faster than I can make a copy of the heap and sort the copy.
One idea I had was to in fact destroy the heap by dequeueing all its items, but then... replacing the now-empty priority queue with the resulting sorted list, which should maintain the heap property (of cell i being a higher priority than cell i * 2+1 and i * 2+2). So I'm also wondering whether such a heap would perform better than a regular heap.
The easiest solution is just to copy the heap array, and sort the copy. But some sorts do a bad job when given sorted or somewhat-sorted data, and I'm wondering whether the Standard C++ library's sort (or C qsort()) could be trusted to handle sorting a heap as efficiently?
Conversely, while quicksort is far faster than heapsort in the general case, this isn't the general case. The array is already heapified, which is the first half of the work of heapsort. It's plausible that pulling the heap items out in order (the second half) could be faster than quicksort. So I'm wondering whether there's a research result showing that it is typically faster than quicksort to 1) copy the heapified array, 2) pull items out in order and place them at the end of the copy, and 3) reverse the copy. (Or pull the items out in order, place them in a second new array, and use that as the return value. I suspect the reversal stage may be better than increasing the number of cache lines needed.)
It looks to me like you're concerned about a performance problem that isn't really a problem. As I understand it, modern C++ implementations of sort use Introsort, which avoids the pathological worst-case times of a naïve Quicksort. And the difference between Quicksort and Heapsort, in the context of generating user output, is not large enough to be a concern.
Just copy the heap and sort it. Or sort the heap directly and output the result, provided of course that doing so doesn't break the heap.
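As a rough illustration of the "copy, then sort the copy" advice, here is a minimal sketch. It assumes the heap is stored in a std::vector<int> and was built with std::make_heap under the default comparison (a max-heap); the function names are made up for this example. The first variant uses std::sort_heap, which is only the second half of heapsort and so reuses the existing heap structure; the second ignores the structure and calls std::sort.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the heap's contents in ascending order without touching the heap.
// Assumes `heap` is a max-heap under the default comparison (std::make_heap default).
std::vector<int> sorted_snapshot(const std::vector<int>& heap) {
    assert(std::is_heap(heap.begin(), heap.end()));
    std::vector<int> copy(heap);               // the original heap stays usable
    std::sort_heap(copy.begin(), copy.end());  // only the "extract" half of heapsort
    return copy;
}

// Alternative: ignore the heap structure and let std::sort handle the copy.
std::vector<int> sorted_snapshot_via_sort(std::vector<int> copy) {
    std::sort(copy.begin(), copy.end());
    return copy;
}
```

Which of the two is faster for a given element type and size is something to measure rather than assume.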
You asked if a sorted heap performs better than a non-sorted heap. No. When adding an item, you still add it as the last node and sift it up. Half of the nodes in a heap are at the leaf level and assuming a uniform distribution of new items, then half of the items you add will end up at the leaf level, requiring no swaps. Worst case is if every item you add ends up being the smallest (in a min-heap), in which case every time you add an item it will take log(n) swaps to move it to the root. Now, if every item added is larger than any other item in the heap, then of course addition is O(1). But that's true regardless of whether the heap was initially created from a sorted array.
Deleting an item from the heap requires that you replace the root item with an item from the leaf level and then sift it down. In a sorted heap, the likelihood that the replacement item will end up back down at the leaf level is very high, which means that adjusting the heap will require the maximum log(n) swaps. A sorted heap almost guarantees that removal will require the maximum number of swaps. In this case, a sorted heap is potentially worse in terms of performance than a heap constructed from a randomly-arranged array.
But all that changes quickly as you begin adding items to and removing items from the heap. The heap becomes "not sorted" fairly quickly.
Over the life of the priority queue, it's highly unlikely that the initial order of items will make any noticeable difference in the performance of your binary heap.
With the usual heap implementation, just sort the heap in place. A sorted list satisfies the heap condition for a min-heap. Alternately if you sort the heap descending, you satisfy the heap condition for a max-heap. And you can always sort it one way and traverse another if that is what you need.
Note that the std::sort_heap documentation warns about breaking the heap condition. If you are changing the heap data in place, be careful to confirm that you haven't broken it.
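The claim that an ascending sort preserves the min-heap condition can be checked directly. The sketch below assumes a min-heap kept in a std::vector<int> and maintained with std::greater<int> as the comparator; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Sorts a min-heap's storage in place (ascending) and reports whether the
// result still satisfies the min-heap condition.
// std::greater<int> is the comparator that makes the standard heap
// algorithms treat the range as a min-heap.
bool sort_min_heap_in_place(std::vector<int>& heap) {
    assert(std::is_heap(heap.begin(), heap.end(), std::greater<int>()));
    std::sort(heap.begin(), heap.end());  // ascending order
    return std::is_heap(heap.begin(), heap.end(), std::greater<int>());
}
```

By contrast, calling std::sort_heap with that same comparator leaves the range in descending order, which is not a min-heap, so the range would need std::make_heap (or a reversal) before it is used as a queue again.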
I implemented an algorithm where I make use of a priority queue.
I was motivated by this question:
Transform a std::multimap into std::priority_queue
I am going to store up to 10 million elements with their specific priority value.
I then want to iterate until the queue is empty.
Every time an element is retrieved it is also deleted from the queue.
After this I recalculate the element's priority value, because it can change as a result of previous iterations.
If the value did increase, I insert the element into the queue again.
This happens more often dependent on the progress. (at the first 25% it does not happen, in the next 50% it does happen, in the last 25% it will happen multiple times).
After receiving the next element and not reinserting it, I am going to process it. For this I do not need the priority value of the element, only its technical ID.
This was the reason I intuitively had chosen a std::multimap to achieve this, using .begin() to get the first element, .insert() to insert it and .erase() to remove it.
Also, I did not intuitively choose std::priority_queue directly because of other questions to this topic answering that std::priority_queue most likely is used for only single values and not for mapped values.
After reading the link above I reimplemented it using a priority queue, analogous to the other question from the link.
The runtimes of the two versions are not that different (about an hour for 10 million elements).
Now I am wondering why std::priority_queue is faster at all.
I would actually expect the std::multimap to be faster because of the many reinsertions.
Maybe the problem is that there are too many reorganizations of the multimap?
To summarize: your runtime profile involves both removing and inserting elements from your abstract priority queue, with you trying to use both a std::priority_queue and a std::multimap as the actual implementation.
Both the insertion into a priority queue and into a multimap have roughly equivalent complexity: logarithmic.
However, there's a big difference between removing the next element from a multimap and removing it from a priority queue. With a priority queue, removal is logarithmic, but all of the work happens inside one contiguous block. The underlying container is a vector: the top element is swapped to the back, the replacement is sifted down, and then the last element is removed from the vector, which is going to be mostly a nothing-burger.
But with a multimap you're removing the element from one of the extreme ends of the multimap.
The typical underlying implementation of a multimap is a balanced red/black tree. Repeated element removals from one of the extreme ends of a multimap have a good chance of skewing the tree, requiring frequent rebalancing of the tree. This is going to be an expensive operation.
This is likely to be the reason why you're seeing a noticeable performance difference.
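For reference, the pop / recompute / reinsert loop under discussion might look roughly like this with std::priority_queue. This is only a sketch: the Entry layout, the smallest-first ordering (chosen to mirror multimap's begin()), and the `recompute` callback standing in for the asker's priority recalculation are all assumptions.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Entry = std::pair<double, int>;  // (priority, technical ID)
// Smallest priority first, mirroring multimap::begin().
using MinQueue = std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

// Drains the queue and returns the IDs in the order they were processed.
// `recompute` is a hypothetical stand-in for the priority recalculation.
std::vector<int> drain(MinQueue& queue, const std::function<double(int)>& recompute) {
    assert(recompute);  // an empty std::function would throw when called
    std::vector<int> processed;
    while (!queue.empty()) {
        Entry top = queue.top();
        queue.pop();  // pop_heap on the vector, then pop_back
        double fresh = recompute(top.second);
        if (fresh > top.first) {
            queue.push(Entry(fresh, top.second));  // priority increased: reinsert
        } else {
            processed.push_back(top.second);       // process by ID only
        }
    }
    return processed;
}
```

The loop only terminates if the recomputed priority eventually stops increasing for every element.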
I think the main difference comes from two facts:
A priority queue has a weaker constraint on the order of elements. It doesn't have to keep the whole range of keys/priorities sorted. A multimap has to provide that. A priority queue only has to guarantee that the first / top element is the largest.
So, while, the theoretical time complexities for the operations on both are the same O(log(size)), I would argue that erase from multimap, and rebalancing the RB-tree performs more operations, it simply has to move around more elements. (NOTE: RB-tree is not mandatory, but very often chosen as underlying container for multimap)
The underlying container of priority queue is contiguous in memory (it's a vector by default).
I suspect the rebalancing is also slower because an RB-tree relies on nodes (versus the contiguous memory of a vector), which makes it prone to cache misses. That said, one has to remember that heap operations do not walk the vector sequentially either; they hop through it from parent to child. I guess to be really sure one would have to profile it.
The above points are true for both insertions and erasures. I would say the difference is in the constant factors lost in the big-O notation. This is intuitive thinking.
The abstract, high level explanation for map being slower is that it does more. It keeps the entire structure sorted at all times. This feature comes at a cost. You are not paying that cost if you use a data structure that does not keep all elements sorted.
Algorithmic explanation:
To meet the complexity requirements, a map must be implemented as a node based structure, while priority queue can be implemented as a dynamic array. The implementation of std::map is a balanced (typically red-black) tree, while std::priority_queue is a heap with std::vector as the default underlying container.
Heap insertion is usually quite fast. The average complexity of insertion into a heap is O(1), compared to O(log n) for balanced tree (worst case is the same, though). Creating a priority queue of n elements has worst case complexity of O(n) while creating a balanced tree is O(n log n). See more in depth comparison: Heap vs Binary Search Tree (BST)
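To make the construction costs concrete, here is a small sketch. The function names are invented for this example, and the multimap simply maps each value to itself for illustration. The range constructor of std::priority_queue heapifies the whole container at once, while the tree has to be built one insertion at a time.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <vector>

// Builds a max-priority queue from a whole range at once. The range
// constructor heapifies the underlying vector in linear time.
std::priority_queue<int> build_heap(const std::vector<int>& values) {
    return std::priority_queue<int>(values.begin(), values.end());
}

// Builds the tree-based counterpart: n insertions of unsorted keys,
// O(n log n) overall, with one node allocation per element.
std::multimap<int, int> build_tree(const std::vector<int>& values) {
    std::multimap<int, int> tree;
    for (int v : values) tree.emplace(v, v);
    return tree;
}

// Both structures agree on the largest key.
bool same_maximum(const std::vector<int>& values) {
    assert(!values.empty());
    return build_heap(values).top() == build_tree(values).rbegin()->first;
}
```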
Additional implementation detail:
Arrays usually use the CPU cache much more efficiently than node-based structures such as trees or lists. This is because adjacent elements of an array are adjacent in memory (high memory locality) and therefore may fit within a single cache line. Nodes of a linked structure, however, exist in arbitrary locations in memory (low memory locality), and usually only one or very few are within a single cache line. Modern CPUs are very fast at calculations, but memory speed is a bottleneck. This is why array-based algorithms and data structures tend to be significantly faster than node-based ones.
While I agree with both @eerorika and @luk32, it is worth mentioning that in the real world, when using the default STL allocator, memory management cost easily outweighs a few data structure maintenance operations such as updating pointers to perform a tree rotation. Depending on the implementation, the memory allocation itself could involve tree maintenance operations and could potentially trigger a system call, where it would become even more costly.
In multi-map, there is memory allocation and deallocation associated with each insert() and erase() respectively which often contributes to slowness in a higher order of magnitude than the extra steps in the algorithm.
A priority queue, however, uses a vector by default, which only triggers a memory allocation once its capacity is exhausted (a much more expensive one, though, since it involves moving all stored objects to the new memory location). In your case pretty much all allocation happens in the first iteration for the priority queue, whereas the multimap keeps paying the memory management cost with each insert and erase.
The memory management downside of map could be mitigated by using a memory-pool based custom allocator. This also gives you a cache hit rate comparable to the priority queue. It might even outperform the priority queue when your object is expensive to move or copy.
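One way to try a pool-backed multimap without writing an allocator by hand is the C++17 std::pmr facilities. The sketch below is an assumption about how that could look, not a measured recommendation; the function name and the key pattern are invented, and it needs a standard library that ships <memory_resource>.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory_resource>

// Inserts `count` entries and erases from the front until at most three
// remain, with all tree nodes drawn from a pool resource instead of one
// global new/delete call per node. Requires C++17.
std::size_t churn_with_pool(int count) {
    assert(count >= 0);
    std::pmr::unsynchronized_pool_resource pool;  // single-threaded pool
    std::pmr::multimap<int, int> queue(&pool);    // nodes come from `pool`
    for (int i = 0; i < count; ++i) queue.emplace(i % 7, i);
    while (queue.size() > 3) {
        queue.erase(queue.begin());  // freed node goes back to the pool for reuse
    }
    return queue.size();
}
```

The pool is declared before the container so that it outlives it. Whether this closes the gap to std::priority_queue for a given workload has to be profiled.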
What is the use of data structure Binary Search Tree, if vector (in sorted order) can support insert,delete and search in log(n) time (using binary search)??
The basic advantage of a tree is that insert and delete in a vector are not O(log(n)) - they are O(n). (They take log(n) comparisons, but n moves.)
The advantage of a vector is that the constant factor can be hugely in their favour (because they tend to be much more cache friendly, and cache misses can cost you a factor of 100 in performance).
Sorted vectors win when:
- Mostly searching.
- Frequent updates but only a few elements in the container.
- Objects have efficient move semantics.
Trees win when:
- Lots of updates with many elements in the container.
- Object move is expensive.
... and don't forget hashed containers which are O(1) search, and unordered vectors+linear search (which are O(n) for everything, but if small enough are actually fastest).
There won't be much difference in performance between a sorted vector and a BST if there are only search operations after some initial insertions/deletions, as a binary search over a vector will cost you the same as searching for a key in a BST. In fact I would go for a sorted vector in this case, as it's more cache friendly.
However, if there are frequent insertions/deletions involved along with searching, then a sorted vector won't be good option as elements need to move back and forth after every insertion and deletion to keep vector sorted.
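A minimal sketch of the sorted-vector operations being compared (the function names are made up for this example). Finding the position takes O(log n) comparisons, but insert and erase then shift every later element, which is where the O(n) comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// O(log n) comparisons to find the slot, O(n) element moves to open it.
void sorted_insert(std::vector<int>& v, int value) {
    assert(std::is_sorted(v.begin(), v.end()));
    v.insert(std::lower_bound(v.begin(), v.end(), value), value);
}

// O(log n) lookup, the same order of comparisons as a balanced BST.
bool sorted_contains(const std::vector<int>& v, int value) {
    return std::binary_search(v.begin(), v.end(), value);
}

// Erasing shifts every later element down by one: O(n) moves.
bool sorted_erase(std::vector<int>& v, int value) {
    auto it = std::lower_bound(v.begin(), v.end(), value);
    if (it == v.end() || *it != value) return false;
    v.erase(it);
    return true;
}
```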
Theoretically it's impossible to do an insert or delete in a sorted vector in O(log(n)). But if you really want to know the advantage of a BST over a vector beyond searching, here's something I can think of:
BSTs and other tree structures make many small memory allocations, one per node, and each node is a small fixed-size memory chunk. A vector, on the other hand, uses one big contiguous memory block to hold all the items, and it can double (or even triple) its memory usage while resizing. So in a system with very limited memory, or one where fragmentation happens frequently, it's possible that a BST will successfully allocate enough memory chunks for all the nodes while a vector fails to allocate its block.
I'm trying to implement a priority queue as a sorted-array-backed minimum binary heap. I'm trying to get the update_key function to run in logarithmic time, but to do this I have to know the position of the item in the array. Is there any way to do this without the use of a map? If so, how? Thank you
If you really want to be able to change the key of an arbitrary element, a heap is not the best choice of data structure. What it gives you is the combination of:
1. compact representation (no pointers, just an array and an implicit indexing scheme)
2. logarithmic insertion, rebalancing
3. logarithmic removal of the smallest (largest) element
4. O(1) access to the value of the smallest (largest) element
A side benefit of 1. is that the lack of pointers means you do substantially fewer calls to malloc/free (new/delete).
A map (represented in the standard library as a balanced binary tree) gives you the middle two of these, adding in
logarithmic find() on any key.
So while you could attach another data structure to your heap, storing pointers in the heap and then making the comparison operator dereference through the pointer, you'd pretty soon find yourself with the complexity in time and space of just using a map in the first place.
Your find key function should operate in log(n) time. Your updating (changing the key) should be constant time. Your remove function should run in log(n) time. Your insert function should be log(n) time.
If these assumptions are true try this:
1) Find your item in your heap (IE: binary search, since it is a sorted array).
2) Update your key (you're just changing a value, constant time)
3) Remove the item from the heap log(n) to reheapify.
4) Insert your item into the heap log(n).
So, you'd have log(n) + 1 + log(n) + log(n) which reduces to log(n).
Note: this is amortized, because if you have to realloc your array, etc... that adds overhead. But you shouldn't do that very often anyway.
That's the tradeoff of the array-backed heap: you get excellent memory use (good locality and minimal overhead), but you lose track of the elements. To solve it, you have to add back some overhead.
One solution would be this. The heap contains objects of type C*. C is a class with an int member heap_index, which is the index of the object in the heap array. Whenever you move an element inside the heap array, you'll have to update its heap_index to set it to the new index.
Update_key (as well as removal of an arbitrary element) is then log(n) time because it takes constant time to find the element (via heap_index), and log(n) time to bubble it into the correct position.
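A compact sketch of that idea follows. Item plays the role of class C; the class name IndexedMinHeap, the int key, and the use of std::size_t rather than int for heap_index are choices made for this example. Every time an element is written into a slot, its heap_index is refreshed, so update_key can find it in constant time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Item {                // plays the role of class C in the answer
    int key;
    std::size_t heap_index;  // position of this item in the heap array
};

class IndexedMinHeap {
public:
    void push(Item* item) {
        item->heap_index = slots_.size();
        slots_.push_back(item);
        sift_up(item->heap_index);
    }

    Item* top() const {
        assert(!slots_.empty());
        return slots_.front();
    }

    // O(log n): O(1) to locate via heap_index, then a sift in whichever
    // direction the new key requires (the other sift is a no-op).
    void update_key(Item* item, int new_key) {
        assert(item->heap_index < slots_.size() && slots_[item->heap_index] == item);
        item->key = new_key;
        sift_up(item->heap_index);
        sift_down(item->heap_index);
    }

private:
    void place(std::size_t i, Item* item) {  // every move refreshes heap_index
        slots_[i] = item;
        item->heap_index = i;
    }

    void sift_up(std::size_t i) {
        Item* moving = slots_[i];
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (slots_[parent]->key <= moving->key) break;
            place(i, slots_[parent]);
            i = parent;
        }
        place(i, moving);
    }

    void sift_down(std::size_t i) {
        Item* moving = slots_[i];
        const std::size_t n = slots_.size();
        for (;;) {
            std::size_t child = 2 * i + 1;
            if (child >= n) break;
            if (child + 1 < n && slots_[child + 1]->key < slots_[child]->key) ++child;
            if (moving->key <= slots_[child]->key) break;
            place(i, slots_[child]);
            i = child;
        }
        place(i, moving);
    }

    std::vector<Item*> slots_;  // the heap stores pointers, not the objects
};
```

The heap does not own the items; the caller is assumed to keep them alive while they are queued.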
How large does a collection have to be for std::map to outpace a sorted std::vector?
I've got a system where I need several thousand associative containers, and std::map seems to carry a lot of overhead in terms of CPU cache. I've heard somewhere that for small collections std::vector can be faster -- but I'm wondering where that line is....
EDIT: I'm talking about 5 items or fewer at a time in a given structure. I'm concerned most with execution time, not storage space. I know that questions like this are inherently platform-specific, but I'm looking for a "rule of thumb" to use.
Billy3
It's not really a question of size, but of usage.
A sorted vector works well when the usage pattern is that you read the data, then you do lookups in the data.
A map works well when the usage pattern involves a more or less arbitrary mixture of modifying the data (adding or deleting items) and doing queries on the data.
The reason for this is fairly simple: a map has higher overhead on an individual lookup (thanks to using linked nodes instead of a monolithic block of storage). An insertion or deletion that maintains order, however, has a complexity of only O(lg N). An insertion or deletion that maintains order in a vector has a complexity of O(N) instead.
There are, of course, various hybrid structures that can be helpful to consider as well. For example, even when data is being updated dynamically, you often start with a big bunch of data, and make a relatively small number of changes at a time to it. In this case, you can load your data into memory into a sorted vector, and keep the (small number of) added objects in a separate vector. Since that second vector is normally quite small, you simply don't bother with sorting it. When/if it gets too big, you sort it and merge it with the main data set.
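The hybrid described above could be sketched as follows. HybridSet, the int element type, and the `limit` threshold are all hypothetical; the point is only the shape: a large sorted vector, a small unsorted side vector, and a sort-and-merge once the side vector grows past the threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

class HybridSet {
public:
    // `main_data` must already be sorted; `limit` is a tuning knob.
    HybridSet(std::vector<int> main_data, std::size_t limit)
        : main_(std::move(main_data)), limit_(limit) {
        assert(std::is_sorted(main_.begin(), main_.end()));
    }

    void insert(int value) {
        pending_.push_back(value);  // O(1), no ordering maintained
        if (pending_.size() > limit_) flush();
    }

    bool contains(int value) const {
        // Binary search in the big sorted part, linear scan of the small part.
        return std::binary_search(main_.begin(), main_.end(), value) ||
               std::find(pending_.begin(), pending_.end(), value) != pending_.end();
    }

    std::size_t pending_count() const { return pending_.size(); }

private:
    void flush() {  // sort the small vector and merge it into the main one
        std::sort(pending_.begin(), pending_.end());
        std::vector<int> merged;
        merged.reserve(main_.size() + pending_.size());
        std::merge(main_.begin(), main_.end(), pending_.begin(), pending_.end(),
                   std::back_inserter(merged));
        main_.swap(merged);
        pending_.clear();
    }

    std::vector<int> main_;
    std::vector<int> pending_;
    std::size_t limit_;
};
```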
Edit2: (in response to edit in question). If you're talking about 5 items or fewer, you're probably best off ignoring all of the above. Just leave the data unsorted, and do a linear search. For a collection this small, there's effectively almost no difference between a linear search and a binary search. For a linear search you expect to scan half the items on average, giving ~2.5 comparisons. For a binary search you're talking about log2 N, which (if my math is working this time of the morning) works out to ~2.3 -- too small a difference to care about or notice (in fact, a binary search has enough overhead that it could very easily end up slower).
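For a container of about five entries, the unsorted approach might look like this. SmallMap, find_linear and put are names invented for the sketch, and std::string keys with int values are just an assumed element type.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using SmallMap = std::vector<std::pair<std::string, int>>;

// Returns a pointer to the mapped value, or nullptr if the key is absent.
const int* find_linear(const SmallMap& map, const std::string& key) {
    for (const auto& entry : map) {
        if (entry.first == key) return &entry.second;
    }
    return nullptr;
}

// Overwrites an existing key or appends a new one; no ordering is maintained.
void put(SmallMap& map, const std::string& key, int value) {
    for (auto& entry : map) {
        if (entry.first == key) {
            entry.second = value;
            return;
        }
    }
    map.emplace_back(key, value);
    assert(find_linear(map, key) != nullptr);  // postcondition: key is now present
}
```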
If by "outspace" you mean consuming more space (aka memory), then it's very likely that a vector will always be more efficient (the underlying implementation is a contiguous memory array with no other data, whereas a map is a tree, so every element implies using more space). This however depends on how much extra space the vector reserves for future inserts.
When it is about time (and not space), a vector will also always be more effective for lookups (doing a dichotomic search). But it will be extremely bad for adding new elements (or removing them).
So: no simple answer! Look up the complexities, and think about the uses you are going to make of the container. http://www.cplusplus.com/reference/stl/
The main issue with std::map is an issue of cache, as you pointed.
The sorted vector is a well-known approach: Loki::AssocVector.
For very small datasets, the AssocVector should crush the map despite the copy involved during insertion, simply because of cache locality. The AssocVector will also outperform the map for read-only usage. Binary search is more efficient there (fewer pointers to follow).
For all other uses, you'll need to profile...
There is however a hybrid alternative that you might wish to consider: using the Allocator parameter of the map to restrict the memory area where the items are allocated, thus minimizing the locality of reference issue (the root of cache misses).
There is also a paradigm shift that you might consider: do you need sorted items, or fast look-up ?
In C++, the only STL-compliant containers for fast lookup have been implemented in terms of Sorted Associative Containers for years. However, the upcoming C++0x features the long-awaited unordered_map, which could outperform all the above solutions!
EDIT: Seeing as you're talking about 5 items or fewer:
Sorting involves swapping items. When inserting into std::map, that will only involve pointer swaps. Whether a vector or map will be faster depends on how fast it is to swap two elements.
I suggest you profile your application to figure it out.
If you want a simple and general rule, then you're out of luck - you'll need to consider at least the following factors:
Time
- How often do you insert new items compared to how often you look up?
- Can you batch inserts of new items?
- How expensive is sorting your vector? Vectors of elements that are expensive to swap become very expensive to sort; vectors of pointers take far less.
Memory
- How much overhead per allocation does the allocator you're using have? std::map will perform one allocation per item.
- How big are your key/value pairs?
- How big are your pointers? (32/64 bit)
- How fast does your implementation of std::vector grow? (Popular growth factors are 1.5 and 2)
Past a certain size of container and element, the overhead of allocation and tree pointers will become outweighed by the cost of the unused memory at the end of the vector - but by far the easiest way to find out if and when this occurs is by measuring.
It has to be in the millions of items. And even there ...
I am thinking here more of memory usage and memory accesses. Under hundreds of thousands of items, take whatever you want; there will be no noticeable difference. CPUs are really fast these days, and the bottleneck is memory latency.
But even with millions of items, if your map<> has been built by inserting elements in random order, then when you want to traverse your map (in sorted order) you'll end up jumping around randomly in memory, stalling the CPU while it waits for memory, resulting in poor performance.
On the other hand, if your millions of items are in a vector, traversing it is really fast, taking advantage of the CPU's memory access prediction.
As others have written, it depends on your usage.
Edit: I would more question the way to organize your thousands of associative containers than the containers themselves if they contain only 5 items.