fastest non-destructive method to get sorted version of a heap? - c++

I have a priority heap holding an event queue.
I need to dump this out for the user in order, but without rendering the heap unusable.
Obviously if I was willing to destroy it I could simply dequeue events until it was empty, and add them in order to my sorted list. But then of course the heap is gone. Further, Quicksort is so much faster than a heap sort that I don't have confidence that I can build this sorted list faster than I can make a copy of the heap and sort the copy.
One idea I had was to in fact destroy the heap by dequeueing all its items, but then... replacing the now-empty priority queue with the resulting sorted list, which should still satisfy the heap property (of cell i being a higher priority than cells i*2+1 and i*2+2). So I'm also wondering whether such a sorted heap would perform better than a regular heap.
The easiest solution is just to copy the heap array, and sort the copy. But some sorts do a bad job when given sorted or somewhat-sorted data, and I'm wondering whether the Standard C++ library's sort (or C qsort()) could be trusted to handle sorting a heap as efficiently?
Conversely, while quicksort is far faster than heapsort in the general case, this isn't the general case. The array is already heapified, which is the first half of the work of heapsort. It's plausible that pulling the heap items out in order (the second half) could be faster than quicksort. So I'm wondering if there's a research result that it is typically faster to 1) copy the heapified array, 2) pull items out in order and place them at the end of the copy, and 3) reverse the copy, than it is to quicksort. (Or: pull items out in order into a second new array, and use that as your return value. I suspect the reversal stage may be better than increasing the number of cache lines needed.)

It looks to me like you're concerned about a performance problem that isn't really a problem. As I understand it, modern C++ implementations of sort use Introsort, which avoids the pathological worst-case times of a naïve Quicksort. And the difference between Quicksort and Heapsort, in the context of generating user output, is not large enough to be a concern.
Just copy the heap and sort it. Or sort the heap directly and output the result, provided of course that doing so doesn't break the heap.
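For concreteness, here's a minimal sketch, assuming the events live in a std::priority_queue<int> (your element and comparator types will differ): passing the queue by value copies the underlying heap, so the original stays intact while the copy is drained.

#include <queue>
#include <vector>

// Sketch: copy the queue (pass by value), then drain the copy.
// The original is untouched; output comes out highest-priority first.
std::vector<int> sorted_snapshot(std::priority_queue<int> copy) {
    std::vector<int> out;
    out.reserve(copy.size());
    while (!copy.empty()) {
        out.push_back(copy.top()); // O(log n) sift-down per pop
        copy.pop();
    }
    return out; // O(n log n) total, same order as sorting the copy
}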
You asked if a sorted heap performs better than a non-sorted heap. No. When adding an item, you still add it as the last node and sift it up. Half of the nodes in a heap are at the leaf level, so, assuming a uniform distribution of new items, half of the items you add will end up at the leaf level, requiring no swaps. The worst case is when every item you add ends up being the smallest (in a min-heap), in which case every addition takes log(n) swaps to move it to the root. Now, if every item added is larger than any other item in the heap, then of course addition is O(1). But that's true regardless of whether the heap was initially created from a sorted array.
Deleting an item from the heap requires that you replace the root item with an item from the leaf level and then sift it down. In a sorted heap, the replacement item will almost certainly end up back down at the leaf level, so removal will require close to the maximum log(n) swaps nearly every time. In this respect, a sorted heap is potentially worse than a heap constructed from a randomly-arranged array.
But all that changes quickly as you begin adding items to and removing items from the heap. The heap becomes "not sorted" fairly quickly.
Over the life of the priority queue, it's highly unlikely that the initial order of items will make any noticeable difference in the performance of your binary heap.

With the usual heap implementation, just sort the heap in place. A sorted array satisfies the heap condition for a min-heap. Alternatively, if you sort the heap descending, you satisfy the heap condition for a max-heap. And you can always sort it one way and traverse it another if that is what you need.
Note that the std::sort_heap documentation warns about breaking the heap condition. Be careful that you know you haven't broken it if you are changing the heap data in place.
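A small sketch of both points using the standard heap algorithms on a plain vector (assuming a max-heap built with the default comparator):

#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

int main() {
    std::vector<int> v{5, 1, 9, 3, 7};
    std::make_heap(v.begin(), v.end());  // max-heap
    std::sort_heap(v.begin(), v.end());  // ascending order; max-heap is gone

    // ...but an ascending range does satisfy the min-heap condition:
    assert(std::is_heap(v.begin(), v.end(), std::greater<int>{}));
}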

Related

What advantages do vectors have over linked lists

I had the following exchange with my professor which wasn't very satisfying. I included my parts of the exchange which should be enough to get my point across.
"For vectors, does the C++ implementation traverse through each element of the old dynamically allocated array and free it?
(Edit: I mean, when resizing and adding elements, either by pushback or resize)
I am especially curious because the book tries to make the case that linked lists are troublesome because of having to traverse each time. It does not seem to me that vectors have a huge advantage in that regard.
The main benefit of vectors I can see is the convenience and fast accessing, but not much more. As in, every time you try to do something other than accessing, you will be traversing through everything to move and free memory. Is that correct?"
After his reply, I added.
"Professor xxxx,
I went out to test, and in fact, the addresses change if you either resize or push_back, so my assumption that the old addresses are freed is correct. I can only assume that program will have to go to each element to free it, and if that's correct, won't the insertion of new things be costly in terms of time, even more so than traversing linked lists?
"Can you kindly correct the following statement if it states any incorrect facts or assumptions. Using vectors in any other way than using arrays (for any other purpose than accessing already stored data) means that linked lists will almost always be faster, because unlike linked lists, in vectors not only would you traverse through the elements, you will traverse through them, free them, and then create a whole new array to accommodate new space. That is because the next address after the last element of the current vector could have a pointer variable pointing to it, and using that address would cause extremely strange behavior; I cannot imagine the poor soul's misery that tries to figure out what had gone wrong."
TL;DR:
The disadvantage of linked lists is traversal, but common vector uses (push_back, resize(), etc.) most often require traversing anyway, so how exactly are vectors faster?
There are several things that are faster than you expect:
When a vector reallocates, the original elements are destroyed, not freed, one by one. Their storage is then freed all at once. This is as opposed to a linked list, where each node is allocated and freed individually. But this is somewhat moot because:
A vector batches reallocations. std::vector is specified to have amortized constant insertion cost, which implies that it avoids reallocating on every push_back. Typical implementations multiply the vector's capacity by a fixed factor whenever it is exceeded, so one costly reallocation provides room for the next several push_backs, which then do not need to traverse the vector or allocate anything (see the sketch after these points).
A vector is extremely cache-friendly. This makes all sequential operations on a vector blazing fast, and it can counter-intuitively outperform a linked list in many cases, especially in long-running applications where memory might become fragmented.
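The batching in the second point is easy to observe; this sketch prints each reallocation as a vector grows:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    auto last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) { // a reallocation happened
            std::cout << "size " << v.size()
                      << " -> capacity " << v.capacity() << '\n';
            last = v.capacity();
        }
    }
    // Typical output shows geometric growth (x2 or x1.5 per step),
    // so reallocations become exponentially rarer as the vector grows.
}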
As a complement to the already given answer and comments...
The std::vector has its elements stored contiguously in memory, which is not the case for a linked list.
The direct consequence is that element access is trivial for a std::vector while it's not for a linked list.
For example, if I want to access the nth element of a linked list, I have to iterate through the list until reaching the desired element.
But on the other hand, the linked list will perform better if we want to insert a new element inside.
Indeed, for a linked list, we have to iterate until we reach the desired position, then we just have to change the connections between the previous and the next node so that the new element is inserted in-between.
For a std::vector, you have to relocate every element after the desired position (and reallocate if needed, i.e. if adding the new element exceeds the reserved capacity).
So the std::vector is better for element access but is less efficient when inserting an element in the middle (and likewise for removal).
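A short sketch contrasting the two trade-offs:

#include <iterator>
#include <list>
#include <vector>

int main() {
    std::vector<int> vec{1, 2, 4, 5};
    std::list<int>   lst{1, 2, 4, 5};

    // vector: trivial O(1) access, but insertion shifts later elements.
    int x = vec[2];                  // direct indexing
    vec.insert(vec.begin() + 2, 3);  // relocates 4 and 5 one slot right

    // list: O(n) access, but insertion just relinks two nodes.
    auto it = std::next(lst.begin(), 2); // walk node by node
    lst.insert(it, 3);                   // no elements move
    (void)x;
}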

c++ - Why is std::multimap slower than std::priority_queue?

I implemented an algorithm where I make use of a priority queue.
I was motivated by this question:
Transform a std::multimap into std::priority_queue
I am going to store up to 10 million elements with their specific priority value.
I then want to iterate until the queue is empty.
Every time an element is retrieved it is also deleted from the queue.
After this I recalculate the element's priority value, because it can change due to previous iterations.
If the value did increase, I insert the element again into the queue.
This happens more often depending on the progress (in the first 25% it does not happen, in the next 50% it does happen, and in the last 25% it will happen multiple times).
After receiving the next element and not reinserting it, I process it. For this I do not need the element's priority value, but its technical ID.
This was the reason I intuitively had chosen a std::multimap to achieve this, using .begin() to get the first element, .insert() to insert it and .erase() to remove it.
Also, I did not intuitively choose std::priority_queue directly, because other questions on this topic answered that std::priority_queue is most likely used only for single values and not for mapped values.
After reading the link above, I reimplemented it using a priority queue, analogous to the other question from the link.
My runtimes do not seem that different (about an hour for 10 million elements).
Now I am wondering why std::priority_queue is faster at all.
I actually would have expected std::multimap to be faster because of the many reinsertions.
Maybe the problem is that there are too many reorganizations of the multimap?
To summarize: your runtime profile involves both removing and inserting elements from your abstract priority queue, with you trying to use both a std::priority_queue and a std::multimap as the actual implementation.
Both the insertion into a priority queue and into a multimap have roughly equivalent complexity: logarithmic.
However, there's a big difference when removing the next element from a multimap versus a priority queue. With a priority queue, the sift-down after a pop takes O(log n) swaps, but the physical removal is cheap: the underlying container is a vector, and you're removing the last element from the vector, which is going to be mostly a nothing-burger.
But with a multimap you're removing the element from one of the extreme ends of the multimap.
The typical underlying implementation of a multimap is a balanced red/black tree. Repeated element removals from one of the extreme ends of a multimap have a good chance of skewing the tree, requiring frequent rebalancing work. This is going to be an expensive operation.
This is likely to be the reason why you're seeing a noticeable performance difference.
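A sketch of the two removal paths side by side (the pair of priority value and technical ID is a stand-in for the question's element type):

#include <map>
#include <queue>
#include <utility>
#include <vector>

using Entry = std::pair<double, int>; // (priority, technical ID) stand-in

int main() {
    std::priority_queue<Entry> pq; // heap stored in a contiguous vector
    std::multimap<double, int> mm; // node-based red/black tree

    pq.push({1.5, 42});
    mm.insert({1.5, 42});

    // priority_queue: top() is O(1); pop() sifts down in O(log n), and the
    // physical removal just shrinks the vector by one element.
    Entry e = pq.top();
    pq.pop();

    // multimap: erasing at an extreme end deallocates a node and may
    // rebalance, paying pointer-chasing and allocator costs every time.
    auto it = mm.begin(); // smallest key; use --mm.end() for the largest
    int id = it->second;
    mm.erase(it);

    (void)e;
    (void)id;
}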
I think the main difference comes from two facts:
A priority queue has a weaker constraint on the order of elements: it doesn't have to keep the whole range of keys/priorities sorted, which a multimap has to provide. A priority queue only has to guarantee that the first/top element is the largest.
So, while the theoretical time complexities for the operations on both are the same, O(log(size)), I would argue that erasing from a multimap and rebalancing the RB-tree perform more operations; they simply have to move around more elements. (Note: an RB-tree is not mandatory, but it is very often chosen as the underlying container for multimap.)
The underlying container of a priority queue is contiguous in memory (it's a vector by default).
I suspect the rebalancing is also slower, because an RB-tree relies on nodes (versus the contiguous memory of a vector), which makes it prone to cache misses, although one has to remember that operations on a heap are not sequential either; it hops through the vector. I guess to be really sure one would have to profile it.
The above points are true for both insertions and erasures. I would say the difference is in the constant factors lost in the big-O notation. This is intuitive thinking.
The abstract, high level explanation for map being slower is that it does more. It keeps the entire structure sorted at all times. This feature comes at a cost. You are not paying that cost if you use a data structure that does not keep all elements sorted.
Algorithmic explanation:
To meet the complexity requirements, a map must be implemented as a node based structure, while priority queue can be implemented as a dynamic array. The implementation of std::map is a balanced (typically red-black) tree, while std::priority_queue is a heap with std::vector as the default underlying container.
Heap insertion is usually quite fast. The average complexity of insertion into a heap is O(1), compared to O(log n) for a balanced tree (the worst case is the same, though). Creating a priority queue of n elements has worst-case complexity O(n), while creating a balanced tree is O(n log n). See a more in-depth comparison: Heap vs Binary Search Tree (BST)
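For example, the iterator-pair constructor of std::priority_queue heapifies all elements at once:

#include <queue>
#include <vector>

int main() {
    std::vector<int> data{3, 1, 4, 1, 5, 9, 2, 6}; // stand-in for n elements
    // Range construction heapifies in one pass: O(n), versus
    // O(n log n) to insert the same elements into a balanced tree.
    std::priority_queue<int> pq(data.begin(), data.end());
}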
An additional implementation detail:
Arrays usually use the CPU cache much more efficiently than node-based structures such as trees or lists. This is because adjacent elements of an array are adjacent in memory (high memory locality) and therefore may fit within a single cache line. Nodes of a linked structure, however, exist in arbitrary locations (low memory locality) in memory, and usually only one or very few are within a single cache line. Modern CPUs are very fast at calculations, but memory speed is a bottleneck. This is why array-based algorithms and data structures tend to be significantly faster than node-based ones.
While I agree with both #eerorika and #luk32, it is worth mentioning that in the real world, when using the default STL allocator, memory management cost easily outweighs a few data structure maintenance operations such as updating pointers to perform a tree rotation. Depending on the implementation, the memory allocation itself could involve tree maintenance operations and potentially trigger a system call, where it would become even more costly.
With a multimap, there is a memory allocation and deallocation associated with each insert() and erase() respectively, which often contributes to slowness of a higher order of magnitude than the extra steps in the algorithm.
priority_queue, however, by default uses a vector, which only triggers a memory allocation (a much more expensive one, though, which involves moving all stored objects to the new memory location) once the capacity is exhausted. In your case, pretty much all allocation happens in the first iteration for the priority_queue, whereas the multimap keeps paying memory management costs with each insert and erase.
The downside around memory management for map can be mitigated by using a memory-pool-based custom allocator. This also gives you a cache hit rate comparable to a priority queue. It might even outperform the priority queue when your objects are expensive to move or copy.
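One way to get such a pool with the standard library (C++17) is a polymorphic memory resource; a sketch:

#include <map>
#include <memory_resource>

int main() {
    // The pool recycles same-sized node allocations, so repeated
    // insert()/erase() cycles stop hitting the general-purpose allocator.
    std::pmr::unsynchronized_pool_resource pool;
    std::pmr::multimap<double, int> mm{&pool};

    for (int i = 0; i < 1000; ++i)
        mm.insert({i * 0.5, i});
    mm.erase(mm.begin(), mm.end()); // nodes return to the pool, not the OS
}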

Advantage of Binary Search Tree over vector in C++

What is the use of the data structure Binary Search Tree, if a vector (in sorted order) can support insert, delete and search in log(n) time (using binary search)?
The basic advantage of a tree is that insert and delete in a vector are not O(log(n)) - they are O(n). (They take log(n) comparisons, but n moves.)
The advantage of a vector is that the constant factor can be hugely in their favour (because they tend to be much more cache friendly, and cache misses can cost you a factor of 100 in performance).
Sorted vectors win when
Mostly searching.
Frequent updates but only a few elements in the container.
Objects have efficient move semantics
Trees win when
Lots of updates with many elements in the container.
Object move is expensive.
... and don't forget hashed containers which are O(1) search, and unordered vectors+linear search (which are O(n) for everything, but if small enough are actually fastest).
There won't be much difference in performance between a sorted vector and a BST if there are only search operations after some initial insertions/deletions, as a binary search over the vector will cost the same as searching for a key in the BST. In fact, I would go for the sorted vector in this case, as it's more cache-friendly.
However, if frequent insertions/deletions are involved along with searching, then a sorted vector won't be a good option, as elements need to be moved back and forth after every insertion and deletion to keep the vector sorted.
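That cost profile is visible in the usual sorted-vector insertion idiom; a sketch:

#include <algorithm>
#include <vector>

// Sketch: keep v sorted while inserting x.
void sorted_insert(std::vector<int>& v, int x) {
    auto pos = std::lower_bound(v.begin(), v.end(), x); // O(log n) compares
    v.insert(pos, x); // but O(n): everything after pos shifts one slot
}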
Theoretically it's impossible to do insert or delete in a sorted vector in O(log(n)). But if you really want an advantage of a BST over a vector beyond that, here's something I can think of:
A BST and other tree structures make many small memory allocations, one fixed-size chunk per node, while a vector uses one big contiguous memory block to hold all the items and may double (or even triple) its memory usage while resizing. So in a system with very limited memory, or in a system where fragmentation happens frequently, it's possible that the BST will successfully allocate enough memory chunks for all its nodes while the vector fails to allocate its block.

What container to choose for fast search/insert with huge amounts of data?

So it's a thought experiment. I want to have a huge collection of structures such as:
struct
{
    KeyType key;
    ValueType value;
};
And I need fast access by a key and fast insertion of new values.
I would not use std::map because it has too big a memory overhead for one structure, and for huge amounts of data it might be drastic. Right?
So next I would consider using sorted std::vector and binary_search. It's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value to the beginning of the sorted array, you'd have to move data right aaaaaAAAALOT!
What if I use deque? As far as I know it has O(1) push_back/push_front, but still O(n) insertion (as it would have to move data anyway, less data though).
The questions are:
1) Is O(n) of inserting data in a deque much faster in a real situation than O(n) in a vector?
2) What happens when you insert a value into a deque and the bucket it should go into is full?
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
Thanks!
I would not use std::map because it has too big a memory overhead for one structure, and for huge amounts of data it might be drastic. Right?
That depends on the size of your structs... the bigger they are the less the overheads are as a proportion of the overall memory use. For example, a std::map implementation might average say 20 bytes of housekeeping data per element (I just made that up - measure on your own system), so if your struct size is in the hundreds of bytes - who cares...? But, if the struct holds 2 ints, it's a big proportion....
So next I would consider using sorted std::vector and binary_search. It's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value to the beginning of the sorted array, you'd have to move data right aaaaaAAAALOT!
Totally unsuitable....
1) Is O(n) of inserting data in a deque much faster in a real situation than O(n) in a vector?
As deque is likely implemented as a vector of fixed-sized arrays, insertion implies a shuffling of all elements towards the nearest end of the container. The shuffling's probably a tiny bit less cache efficient, but if inserting nearer the front of the container it would likely still end up faster.
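A trivial sketch of such a middle insertion (which half gets shuffled is up to the implementation):

#include <deque>

int main() {
    std::deque<int> d{1, 2, 4, 5};
    // Middle insertion: a typical implementation shifts whichever half
    // is shorter, so on average about half as many moves as a vector.
    d.insert(d.begin() + 2, 3); // d is now 1 2 3 4 5
}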
2) What happens when you insert a value into a deque and the bucket it should go into is full?
As above, it'll need to shuffle, overflowing either:
the last element to become the first element of the next "bucket", moving all those elements along and overflowing into the next bucket, etc.
the first element to become the last element of the previous bucket, moving all those elements along and overflowing into the previous bucket, etc.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
unordered_map, which is implemented as a hash map. If you have small objects (e.g. less than 20 or 30 bytes) or a firm cap on the number of elements, you can normally easily outperform unordered_map with custom code, but it's rarely worth the effort unless table access dominates your application's performance, and that performance is critical.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
Consider using std::unordered_map, which is an implementation of a hash map. Insertion, lookup, and removal are all O(1) in the average case. This assumes that you will only ever look for an item based on its exact key; if your searches can have different constraints then you either need a different structure, or you need multiple maps to map the various keys you will search for to the corresponding object.
This requires that there is an available hash function for KeyType, either as part of the standard library or provided by you.
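A sketch with a hypothetical KeyType and a hand-rolled hash functor (the hash-combine recipe is just one common choice, not a library API):

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key/value types, for illustration only.
struct KeyType   { int id; std::string name; };
struct ValueType { double payload; };

struct KeyHash {
    std::size_t operator()(const KeyType& k) const {
        std::size_t h1 = std::hash<int>{}(k.id);
        std::size_t h2 = std::hash<std::string>{}(k.name);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2)); // hash combine
    }
};
struct KeyEqual {
    bool operator()(const KeyType& a, const KeyType& b) const {
        return a.id == b.id && a.name == b.name;
    }
};

int main() {
    std::unordered_map<KeyType, ValueType, KeyHash, KeyEqual> m;
    m[KeyType{1, "x"}] = ValueType{3.14};  // average O(1) insertion
    auto it = m.find(KeyType{1, "x"});     // average O(1) lookup by exact key
    (void)it;
}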
There's no container that provides the best of all worlds. As you say, you want the best lookup/insertion with the minimum amount of space needed for storing elements.
Below is a list of containers you could consider for your implementation:
VECTOR:
Strengths:
1) Space is allocated only for holding data.
2) Good for random access.
3) Container of choice if insertions/deletions are not in the middle of the container.
Weaknesses:
1) Poor performance if insertions/deletions are in the middle.
2) Reallocations happen if reserve is not used properly.
DEQUE:
Choose deque over vector in case insertions/deletions are at the beginning as well as the end of the container.
MAP:
Disadvantage over vector:
1) More space is allocated for holding pointers.
Advantages over vector:
1) Better insertions/deletions/lookups compared to vector.
If std::unordered_map is used, these dictionary operations become average-case O(1).
Firstly, in order to directly answer your questions:
1) Is O(n) of inserting data in a deque much faster in a real situation than O(n) in a vector?
The number of elements that have to be moved is (on average) only half compared to a vector. However, it can actually perform worse, as the data is stored in non-contiguous memory, so copying/moving the same number of elements is much less efficient (it cannot, e.g., be implemented in terms of a single memcpy operation).
2) What happens when you insert a value into a deque and the bucket it should go into is full?
At least for the GNU libstdc++ implementation, every bucket except the first and last one is always full. I believe that inserting in the middle means that all elements are moved/copied one slot towards the closer end (front or back) and the effect ripples through all buckets until the first or last one is reached.
In summary, the only scenario where std::deque is consistently better than vector is if you use it as (surprise) a queue (only inserting and removing elements at the front or back), and that's what the implementation is optimized for. It is not optimized for insertions in the middle.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
As already stated by others: A hash table like std::unordered_map is the data structure you are looking for.
From what I've heard, however, std::unordered_map is a slightly suboptimal implementation of it, as it uses buckets in order to resolve hash collisions, and those buckets are implemented as linked lists (here is a very interesting talk from Chandler Carruth on the general topic of the performance of different data structures). For random access on big data structures, cache locality should matter a lot less, so this is probably not such a big issue in your case.
Finally, I'd like to mention that if your value and key types are small PODs, then depending on how big your huge collection is (are we talking about a few million or rather billions of elements) and how often you actually have to insert/remove elements, there might still be cases where a simple std::vector outperforms any other STL container. So as always: if your thought experiment ever becomes reality, try it out and measure.

Which STL container should I use when doing a few inserts?

I don't know my exact numbers but I'll try my best. I have a 10000-element deque that's populated right at the start. Then I scan through each element, and let's say every 20 elements I'll need to insert a new element. The insert would happen at the current position, and maybe one element back.
I don't exactly need to remember the position, but I also don't exactly need random access either. I'd like fast inserts. Do deque and vector have a heavy price to pay on insert? Should I use list?
My other option is to have a 2nd deque and, as I go through each element, insert it into the other deque unless I need to do the insert I am talking about. This does need to be fast as it's a performance-intensive app. But I am using a lot of pointers (each element is a pointer), which is upsetting me, but there isn't a way around that, so should I assume the L1 cache will always miss?
I'd start with std::vector in this case, but use a second std::vector for your mass mutations, reserve() appropriately, then swap() the vectors.
Update
It would take this general form:
std::vector<t_object*> source; // << source already holds 10000 elements
std::vector<t_object*> tmp;

// Reserve to minimize reallocations and frees to one each, if possible.
// If you do not swap, or have to grow more, reserving can work against you.
tmp.reserve(aMeaningfulReserveValue); // e.g. source.size() + source.size()/20

std::size_t readPos = 0;
while (readPos < source.size()) {
    // "i scan through each element and lets every 20 elements"
    for (int i = 0; i < 20 && readPos < source.size(); ++i)
        tmp.push_back(source[readPos++]);
    // "every 20 elements i'll need to insert an new element"
    tmp.push_back(newElement);
}
// approximately 500 iterations later…
source.swap(tmp);
Borealid brought up a good point, which is: measure -- execution varies dramatically depending on your standard library implementation, data sizes, complexity to copy, and so on.
For raw pointers of a collection this size with my configuration, the vector mass mutation and push_back above was 7 times faster than std::list insertion. push_back was faster than vector's range insertion.
As Emile points out below, std::vector::swap() does not need to move or reallocate elements -- it can just swap out internals (provided the allocators are the same type).
First off, the answer to all performance questions is "benchmark it". Always. Now...
If you don't care about the memory overhead, and you don't need random access, but you do care about having constant-time insertions, list is probably right for you.
std::vector will have constant-time insertions at the end when it has sufficient capacity. When the capacity is exceeded, it needs a linear-time copy. deque is better because it links discrete allocations, avoiding a complete copy and letting you do constant-time insertions at the front as well. Random insertions (every 20 elements) will always be linear time.
As for cache locality, a vector is as good as you can get (contiguous memory), but you said you cared about insertions rather than lookups; in my experience, when that's the case you don't care about how hot the cache gets as you scan through to dump, so list's poor behavior doesn't much matter.
Lists are useful when either you frequently want to insert elements in the middle of the collection, or frequently remove them. Lists are, however, slow to read.
Vectors are very fast to read and very fast when you only want to add or remove elements at the end of the collection, but they are very slow when you insert elements in the middle. This is because it has to move all elements after the desired position by one place, to make room for the new element.
Deques are typically implemented as a sequence of fixed-size arrays, so they can be used much like vectors while also allowing fast insertion and removal at both ends.
If you don't need to insert elements in the middle of the collection (you don't care about the order), I suggest you use vector. If you can approximate the number of elements that will be introduced into the vector from the beginning, you should also use std::vector::reserve to allocate the necessary memory up front. The value you pass to reserve doesn't need to be exact, just approximate; if it's smaller than needed, the vector will resize automatically when necessary.
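For example:

#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(10000); // rough estimate; exactness is not required
    for (int i = 0; i < 10000; ++i)
        v.push_back(i); // no reallocation as long as capacity holds
}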
You can go two ways: a list is always an option for insertions at arbitrary positions; however, since every element is allocated separately, this has performance implications too. The other option, inserting in place into the deque, is not good either, because you will pay linear time for every insertion. Maybe your idea of inserting into a new deque is the best here: you pay twice as much memory, but on the other hand you always insert either at the end of the second deque or one element before that; this gives constant amortized time, and you still have good caching of the container.
The number of copies done for std::vector/deque ::insert etc. is proportional to the number of elements between the insert position and the end of the container (the number of elements that need to be shifted to make room). The worst case for a std::vector is O(N) - when you insert at the front of the container. If you're inserting M elements, the worst case is therefore O(M*N), which isn't great.
There could also be a reallocation involved if the containers capacity is exceeded. You could prevent reallocation by ensuring that sufficient space was ::reserve'd up front.
Your other suggestion - copying to a second std::vector/deque container - could be better in that it can always be organised to achieve O(N) complexity, but at the cost of temporarily storing two containers.
Using a std::list would allow you to achieve in-place O(1) inserts, but at the cost of additional memory overhead (storing the list pointers etc.) and reduced memory locality (list nodes are not allocated contiguously). You could improve the memory locality by using a pooled memory allocator (Boost pools, maybe?).
Overall you'd have to benchmark to really sort out which is "the fastest" approach.
Hope this helps.
If you need fast inserts in the middle but don't care about random access, vector and deque are definitely not for you: for those, every time you insert something, all elements between that one and the end have to be moved. Of the built-in containers, list is almost certainly your best bet. However, a better data structure for your scenario would probably be a VList, because it provides better cache locality; that's not provided by the C++ standard library, though. The Wikipedia page links to a C++ implementation, but from a quick view of the interface it doesn't seem to be completely STL-compatible; I don't know if this is an issue for you.
Of course, in the end the only way to be sure which is the optimal solution is to measure the performance.