std::list sort algorithm runtime - c++

I have a list of elements that is almost in the correct order, and each element is just off by a relatively small number of places compared to its correct position (e.g. no element that is supposed to be at the front of the list ends up at the back).
< TL;DR >
Practical origin: I have an incoming stream of UDP packets that contain signals, each marked with a timestamp. Evaluating the data has shown that the packets were not sent (or received) in the right order, so the timestamps are not monotonically increasing but jitter a bit. To export the data I need to sort it first.
< /TL;DR >
I want to use std::list::sort() to sort this list.
What sorting algorithm is used by std::list::sort(), and how is it affected by the fact that the list is almost sorted? I have a "feeling" that a divide-and-conquer based algorithm might profit from it.
Is there a more efficient algorithm for my quite specific problem?

If every element is within k positions of its proper place, then insertion sort will take less than kN comparisons and swaps/moves. It's also very easy to implement.
Compare this to the N*log(N) operations required by merge sort or quick sort to see if that will work better for you.
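As a sketch of how easy it is, here is a minimal insertion sort over a std::list (the helper is mine, not a standard function), using splice() to relink nodes rather than copying elements:
#include <iterator>
#include <list>

// Insertion sort on a std::list: roughly k*N comparisons when every
// element is at most k positions away from its sorted place.
template <typename T>
void insertion_sort(std::list<T>& lst) {
    for (auto it = lst.begin(); it != lst.end(); ) {
        auto next = std::next(it);           // remember the successor
        auto pos = it;
        // scan backwards for the first element not greater than *it
        while (pos != lst.begin() && *std::prev(pos) > *it)
            --pos;
        if (pos != it)
            lst.splice(pos, lst, it);        // relink the node before pos
        it = next;
    }
}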

It is not defined which algorithm std::list::sort() uses, although it should be around N log N on average, like quicksort.
If you are appending packets to the end of the "queue" as you consume them, so that the queue is always sorted, then you can expect any new packet's correct position to nearly always be near the "end".
Therefore, rather than sorting the entire queue, just insert the packet into the correct position. Start at the back of the queue and compare the new packet's timestamp with those already there; insert it after the first packet with a smaller timestamp (likely to always be the end), or at the front if there is no such packet, in the event things are greatly out of order.
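A minimal sketch of that back-to-front insertion, assuming a hypothetical Packet struct with a timestamp field (names are placeholders for whatever the stream actually carries):
#include <cstdint>
#include <iterator>
#include <list>

// Hypothetical packet type; only the timestamp matters for ordering.
struct Packet {
    std::uint64_t timestamp;
    // ... payload ...
};

// Insert a new packet into an already-sorted queue, scanning from the
// back, since the correct position is almost always near the end.
void insert_packet(std::list<Packet>& queue, const Packet& p) {
    auto it = queue.end();
    while (it != queue.begin() && std::prev(it)->timestamp > p.timestamp)
        --it;
    queue.insert(it, p);   // before 'it'; lands at the front if none is smaller
}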
Alternatively, if you want to add all packets first and then sort, Bubble Sort should be fairly efficient because the list should already be nearly sorted.

On mostly-sorted data, Insertion Sort and Bubble Sort are among the algorithms that perform best.
Note also that a list structure does not support indexed access, so algorithms that require indexed access will perform particularly poorly. That makes insertion sort an even better fit, since it needs only sequential access.

In the case of Visual Studio prior to 2015, a bottom-up merge sort using 26 internal lists is used, following the algorithm shown in this wiki article:
https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists
Visual Studio 2015 added support for allocators that cannot be default constructed. The 26 internal list initializers could have been expanded to 26 instances of initializers with user-specified allocators, specifically: _Myt _Binlist[_MAXBINS+1] = { _Myt(get_allocator()), ... , _Myt(get_allocator())};, but instead someone at Microsoft switched to a top-down merge sort based on iterators. This is slower, but has the advantage that it doesn't need special recovery if the compare throws an exception. The author of that change pointed out that if performance is a goal, it's faster to copy the list to an array or vector, sort the array or vector, then create a new list from it.
There's a prior thread about this.
`std::list<>::sort()` - why the sudden switch to top-down strategy?
In your case, something like an insertion sort on a doubly linked list should be faster: if an out-of-order node is found, remove it from the list, scan backwards or forwards to the proper spot, and insert the node back into the list. If you want to use std::list, you can use iterators, erase, and emplace to "move" nodes, but that involves freeing and reallocating a node for each move. It would be faster to implement this with your own doubly linked list, in which case you can just manipulate the links, avoiding the freeing and reallocation of memory that comes with std::list.

Related

Best data structure/container in C++ for insertion and deletion

I am looking for the best data structure in C++ in which insertion and deletion can take place very efficiently and fast.
Traversal should also be very easy for this data structure. Which one should I go with?
What about std::set in C++?
A linked list provides efficient insertion and deletion of arbitrary elements. Deletion here is deletion by iterator, not by value. Traversal is quite fast.
A deque provides efficient insertion and deletion only at the ends, but those are faster than for a linked list, and traversal is faster as well.
A set only makes sense if you want to find elements by their value, e.g. to remove them. Otherwise the overhead of checking for duplicates, as well as that of keeping things sorted, will be wasted.
It depends on what you want to put into this data structure. If the items are unordered, or you care about their insertion order, list<> could be used. If you want them in a sorted order, set<> or multiset<> (the latter allows multiple identical elements) could be an alternative.
list<> is typically a doubly linked list, so insertion and deletion can be done in constant time, provided you know the position. Traversal over all elements is also fast, but accessing a specific element (either by value or by position) can be slow.
set<> and its family are typically binary trees, so insertion, deletion and searching for elements are mostly in logarithmic time (when you know where to insert/delete, it's constant time). Traversal over all elements is also fast.
(Note: boost and C++11 both have data structures based on hash-tables, which could also be an option)
I would say a linked list, depending on whether your deletions are targeted and frequent. You can iterate over it easily.
It occurs to me that you need a tree.
I'm not sure about the exact structure (since you didn't provide detailed info), but if you can put your data into a binary tree, you can achieve decent speed at searching, deleting and inserting elements (O(log n) average and O(n) worst case).
Note that I'm talking about the data structure here, you can implement it in different ways.

Queue-like data structure with random access element removal

Is there a data structure like a queue which also supports removal of elements at arbitrary points? Enqueueing and dequeueing occur most frequently, but mid-queue element removal must be similar in speed terms, since there may be periods where that is the most common operation. Consistency of performance is more important than absolute speed. Time is more important than memory. Queue length is small, under 1,000 elements at absolute peak load. In case it's not obvious I'll state it explicitly: random insertion is not required.
Have tagged C++ since that is my implementation language, but I'm not using (and don't want to use) any STL or Boost. Pure C or C++ only (I will convert C solutions to a C++ class.)
Edit: I think what I want is a kind of dictionary that also has a queue interface (or a queue that also has a dictionary interface) so that I can do things like this:
Container.enqueue(myObjPtr1);
MyObj *myObjPtr2 = Container.dequeue();
Container.remove(myObjPtr3);
I think that a doubly linked list is exactly what you want (assuming you do not want a priority queue):
Easy and fast adding elements to both ends
Easy and fast removal of elements from anywhere
You can use the std::list container, but (in your case) it is difficult to remove an element
from the middle of the list if you only have a pointer (or reference) to the element (wrapped in the STL's list node) but
do not have an iterator. If using iterators (e.g. storing them) is not an option, then implementing a doubly linked list (even with an element counter) should be pretty easy. If you implement your own list, you can directly operate on pointers to elements (each of them contains pointers to both of its neighbours). If you do not want to use Boost or the STL this is probably the best option (and the simplest), and you have control of everything (you can even write your own block allocator for list elements to speed things up).
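A minimal sketch of such a hand-rolled list, using an intrusive node type and matching the enqueue/dequeue/remove interface above (all names are illustrative):
// Holding a pointer to a node is enough to unlink it in O(1); no
// iterator is needed.
struct Node {
    Node* prev = nullptr;
    Node* next = nullptr;
    // ... payload ...
};

struct Queue {
    Node* head = nullptr;
    Node* tail = nullptr;

    void enqueue(Node* n) {            // push at the tail
        n->prev = tail; n->next = nullptr;
        if (tail) tail->next = n; else head = n;
        tail = n;
    }
    Node* dequeue() {                  // pop from the head
        Node* n = head;
        if (n) remove(n);
        return n;
    }
    void remove(Node* n) {             // unlink from anywhere in O(1)
        if (n->prev) n->prev->next = n->next; else head = n->next;
        if (n->next) n->next->prev = n->prev; else tail = n->prev;
        n->prev = n->next = nullptr;
    }
};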
One option is to use an order statistic tree, an augmented tree structure that supports O(log n) random access to each element, along with O(log n) insertion and deletion at arbitrary points. Internally, the order statistic tree is implemented as a balanced binary search tree with extra information associated with it. As a result, lookups are slower than in a standard dynamic array, but the insertions are much faster.
Hope this helps!
You can use a combination of a linked list and a hash table. In Java it is called a LinkedHashSet.
The idea is simple: have a linked list of elements, and also maintain a hash map of (key, node) pairs, where node is a pointer to the relevant node in the linked list and key is the key representing this node.
Note that the basic implementation is a set, and some extra work will be needed to make this data structure allow dupes.
This data structure gives you both O(1) head/tail access and O(1) access to any element in the list [all on average, amortized].
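A rough C++ translation of the idea, assuming unique elements as in the basic set version (class and method names are made up):
#include <list>
#include <unordered_map>

// A std::list keeps queue order; an unordered_map gives O(1) average
// lookup of the list node for arbitrary removal.
template <typename T>
class HashedQueue {
    std::list<T> order_;
    std::unordered_map<T, typename std::list<T>::iterator> index_;
public:
    void enqueue(const T& v) {
        order_.push_back(v);
        index_[v] = std::prev(order_.end());
    }
    T dequeue() {                       // precondition: not empty
        T v = order_.front();
        index_.erase(v);
        order_.pop_front();
        return v;
    }
    void remove(const T& v) {           // O(1) average
        auto it = index_.find(v);
        if (it == index_.end()) return;
        order_.erase(it->second);       // erase via the stored iterator
        index_.erase(it);
    }
};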

Why is inserting multiple elements into a std::set simultaneously faster?

I'm reading through:
"The C++ Standard Library: A Tutorial and Reference by Nicolai M.
Josuttis"
and I'm in the section about Sets & Multisets. I came across a line regarding Inserting and Removing elements:
"Inserting and removing happens faster if, when working with multiple
elements, you use a single call for all elements rather than multiple
calls."
I'm far from a data structures master, but I'm aware that they're implemented with red-black trees. What I don't understand from that is how the STL implementers would write an algorithm to insert multiple elements at once in a faster manner.
Can anyone shed some light on why this quote is true for me?
My first thought was that it might rebalance the tree only after inserting/erasing the whole range. Since the whole operation is inlined in practice, that seemed more likely than savings from the number of function calls.
Checking the GCC headers on my local machine, this doesn't seem to be the case - and anyway, I don't know how the tradeoff between reduced rebalancing activity, and potentially increased search times for intermediate inserts to an unbalanced tree, would work out.
Maybe it's considered a QoI issue, but in any case, using the most expressive method is probably best, not just because it saves you writing a for loop and shows your intention most clearly, but because it leaves library writers latitude to optimise more aggressively in the future, without you needing to know and change your code.
There are two reasons:
1) Making a single call for multiple elements instead of N separate calls.
2) The insertion operation checks, for each element inserted, whether another element with the same value already exists in the container. This check can be optimized when inserting multiple elements together.
What you read as you quoted is wrong. Inserting into a std::set is O(log n), unless you use the insert() overload with the position iterator, in which case it is amortized O(1) when the hint is valid. And if you use the range overload with sorted elements, you get O(n) insertion for the whole range.
Memory management could be a good reason. In this case it could allocate the memory just once, whereas if all elements are inserted separately, each call tries to allocate memory separately. As far as I know, most set and map implementations try to keep the memory in the same page, or in pages near each other, to minimize page faults.
I'm not sure about this, but I think that if the number of elements inserted is smaller than the number of elements in the set, then it can be more efficient to sort the inserted range before performing the insertions. This way, all values can be inserted in a single pass over the tree, and duplicates in the inserted range can be easily eliminated (or inserted very fast in the case of a multiset).
Of course, this optimization is only possible if the input iterators allow sorting the input range (i.e. if they are random-access iterators).
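To illustrate, a small sketch of inserting a pre-sorted batch, either with the range overload or an equivalent hinted loop (values are arbitrary):
#include <algorithm>
#include <set>
#include <vector>

int main() {
    std::vector<int> batch = {5, 3, 9, 1, 7};
    std::sort(batch.begin(), batch.end());   // sort the batch first

    // Range overload: one call, and sorted input lets the tree insert
    // each element right after the previous one.
    std::set<int> s;
    s.insert(batch.begin(), batch.end());

    // Equivalent hinted loop: each new element is the maximum so far,
    // so inserting just before end() is amortized O(1) per element.
    std::set<int> t;
    for (int x : batch)
        t.insert(t.end(), x);
    return 0;
}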

sorting an STL List after all pushbacks or just use Multimap?

We were using a multimap<int,string> to store several hundred thousand items (>300K) when we realized we needed to add more data for analysis. So we created a class that held a few items and the necessary overridden operators for the STL, and used a multimap<OurStruct,string>. This worked fine and didn't take much longer than before (with some test data). We then realized a plain STL list would do just fine, as long as we sorted it after we finished adding all the items. To our surprise, we found that adding all items to the multimap still easily beats the total time to add all items to a list and then sort it.
This doesn't make sense to us EE types, since by our thinking every insert to the multimap would have to traverse the list and then tack the item onto the end, whereas with the list we would just add onto the end (via push_back), and then hopefully the sort wouldn't take as long.
One more factoid: we initially did the comparison test without sorting the list and were thrilled to see significant speed-ups using the list. Then we added the sort, and were a bit stunned...
Any of the CS gurus out there care to weigh in?
std::multimap uses a balanced tree [1], so it does not traverse the entire list when you insert an item. The number of items traversed for an insert is approximately the base 2 logarithm of the number of items in the collection.
Based on what you've said, your best bet would probably be to put your data in a vector, and then sort.
[1] Technically, the standard doesn't directly require a balanced tree, but it requires ability to traverse in sorted order, and logarithmic complexity for insertions and deletions in the worst case, and I'm not aware of many other data structures that can meet that requirement.
A balanced tree is why only a log2(N) traversal is required per insert.

Should I use an insertion sort or construct a heap to improve performance?

We have large (100,000+ elements) ordered vectors of structs (operator < overloaded to provide ordering):
std::vector<MyType> vectorMyTypes;
std::sort(vectorMyTypes.begin(), vectorMyTypes.end());
My problem is that we're seeing performance problems when adding new elements to these vectors while preserving sort order. At the moment we're doing something like:
for ( a very large set )
{
    vectorMyTypes.push_back(newType);
    std::sort(vectorMyTypes.begin(), vectorMyTypes.end());
    ...
    ValidateStuff(vectorMyTypes); // this method expects the vector to be ordered
}
This isn't exactly what our code looks like, since I know this example could be optimised in different ways; however, it gives you an idea of how performance could be a problem, because I'm sorting after every push_back.
I think I essentially have two options to improve performance:
Use a (hand crafted?) insertion sort instead of std::sort to improve the sort performance (insertion sorts on a partially sorted vector are blindingly quick)
Create a heap by using std::make_heap and std::push_heap to maintain the sort order
My questions are:
Should I implement an insertion sort? Is there something in Boost that could help me here?
Should I consider using a heap? How would I do this?
Edit:
Thanks for all your responses. I understand that the example I gave was far from optimal and it doesn't fully represent what I have in my code right now. It was simply there to illustrate the performance bottleneck I was experiencing - perhaps that's why this question isn't seeing many up-votes :)
Many thanks to you Steve, it's often the simplest answers that are the best, and perhaps it was my over analysis of the problem that blinded me to perhaps the most obvious solution. I do like the neat method you outlined to insert directly into a pre-ordered vector.
As I've commented, I'm constrained to using vectors right now, so std::set, std::map, etc aren't an option.
Ordered insertion doesn't need boost:
vectorMyTypes.insert(
    std::upper_bound(vectorMyTypes.begin(), vectorMyTypes.end(), newType),
    newType);
upper_bound provides a valid insertion point provided that the vector is sorted to start with, so as long as you only ever insert elements in their correct place, you're done. I originally said lower_bound, but if the vector contains multiple equal elements, then upper_bound selects the insertion point which requires less work.
This does have to copy O(n) elements, but you say insertion sort is "blindingly fast", and this is faster. If it's not fast enough, you have to find a way to add items in batches and validate at the end, or else give up on contiguous storage and switch to a container which maintains order, such as set or multiset.
A heap does not maintain order in the underlying container, but is good for a priority queue or similar, because it makes removal of the maximum element fast. You say you want to maintain the vector in order, but if you never actually iterate over the whole collection in order then you might not need it to be fully ordered, and that's when a heap is useful.
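For reference, a small sketch of the heap approach using the standard heap algorithms (values are arbitrary):
#include <algorithm>
#include <vector>

// A vector maintained as a max-heap: the storage is not sorted, but the
// maximum is always at the front, and push/pop are O(log n).
int main() {
    std::vector<int> v = {3, 1, 4, 1, 5};
    std::make_heap(v.begin(), v.end());

    v.push_back(9);                     // append the new element...
    std::push_heap(v.begin(), v.end()); // ...then restore the heap property

    std::pop_heap(v.begin(), v.end());  // move the maximum to the back
    int largest = v.back();             // read it...
    v.pop_back();                       // ...and remove it
    (void)largest;
    return 0;
}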
According to Item 23 of Meyers' Effective STL, you should use a sorted vector if your application uses its data structures in 3 phases. From the book, they are:
Setup. Create a new data structure by inserting lots of elements into it. During this phase, almost all operations are insertions and erasures. Lookups are rare or nonexistent.
Lookup. Consult the data structure to find specific pieces of information. During this phase, almost all operations are lookups. Insertions and erasures are rare or nonexistent. There are so many lookups that the performance of this phase makes the performance of the other phases incidental.
Reorganize. Modify the contents of the data structure, perhaps by erasing all the current data and inserting new data in its place. Behaviorally, this phase is equivalent to phase 1. Once this phase is completed, the application returns to phase 2.
If your use of your data structure resembles this, you should use a sorted vector and then use binary_search as mentioned. If not, a typical associative container should do it; that means a set, multiset, map or multimap, as those structures are ordered by default.
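A tiny sketch of the setup and lookup phases with a sorted vector (values are arbitrary):
#include <algorithm>
#include <vector>

// Phase 1 (setup): bulk-insert without keeping order, then sort once.
// Phase 2 (lookup): binary search on the sorted vector.
int main() {
    std::vector<int> data;
    for (int v : {42, 7, 19, 3, 25})
        data.push_back(v);
    std::sort(data.begin(), data.end());

    bool found = std::binary_search(data.begin(), data.end(), 19);
    (void)found;
    return 0;
}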
Why not just use a binary search to find where to insert the new element? Then you will insert exactly into the required position.
If you need to insert a lot of elements into a sorted sequence, use std::merge, potentially sorting the new elements first:
void add( std::vector<Foo> & oldFoos, const std::vector<Foo> & newFoos ) {
    std::vector<Foo> merged;
    // precondition: oldFoos _and newFoos_ are sorted
    merged.reserve( oldFoos.size() + newFoos.size() ); // only for std::vector
    std::merge( oldFoos.begin(), oldFoos.end(),
                newFoos.begin(), newFoos.end(),
                std::back_inserter( merged ) );
    // apply std::unique, if wanted, here
    merged.erase( std::unique( merged.begin(), merged.end() ), merged.end() );
    oldFoos.swap( merged ); // commit changes
}
Using a binary search to find the insertion location isn't going to speed up the algorithm much because it will still be O(N) to do the insertion (consider inserting at the beginning of a vector - you have to move every element down one to create the space).
A tree (or a heap) will be O(log(N)) to insert, which is much better performance.
See http://www.sgi.com/tech/stl/priority_queue.html
Note that a tree will still have worst case O(N) performance for insert unless it is balanced, e.g. an AVL tree.
Why not use boost::multi_index?
NOTE: boost::multi_index does not provide memory contiguity, a property of std::vector by which elements are stored adjacent to one another in a single block of memory.
There are a few things you need to do.
You may want to consider making use of reserve() to avoid excessive reallocation of the entire vector. If you have knowledge of the size it will grow to, you may gain some performance by calling reserve() yourself (rather than having the implementation do it automatically using the built-in heuristic).
Do a binary search to find the insertion location. Then resize and shift everything following the insertion point up by one to make room.
Consider: do you really want to use a vector? Perhaps a set or map are better.
The advantage of binary search over lower_bound is that if the insertion point is close to the end of the vector, you don't have to pay the Θ(n) complexity.
If you want to insert an element into the "right" position, why do you plan on using sort? Find the position using lower_bound and insert using, well, the `insert' method of the vector. That will still be O(N) to insert a new item.
A heap is not going to help you, because a heap is not sorted. It allows you to get at the smallest element quickly, and then quickly remove it and get the next smallest element. However, the data in a heap is not stored in sorted order, so if you have algorithms that must iterate over the data in order, it will not help.
I am afraid your description skipped too many details, but it seems like a list is just not the right structure for the task. std::deque is much better suited for insertion in the middle, and you might also consider std::set. I suggest you explain why you need to keep the data sorted, to get more helpful advice.
You might want to consider using a BTree or a Judy Trie.
You don't want to use contiguous memory for large collections, insertions should not take O(n) time;
You want to use at least binary insertion for single elements, multiple elements should be presorted so you can make the search boundaries smaller;
You do not want your data structure wasting memory, so nothing with left and right pointers for each data element.
As others have said, I'd probably have created a BTree out of a linked list instead of using a vector. Even if you got past the sorting issue, vectors have the problem of fully reallocating when they need to grow, assuming you don't know your maximum size beforehand.
If you are worried about a list allocating on different memory pages and causing cache-related performance issues, preallocate your nodes in an array (pool the objects) and insert these into the list.
You can add a value to your data type that denotes whether it was allocated off the heap or from a pool. This way, if you detect that your pool runs out of room, you can start allocating off the heap and throw an assert or something so you know to bump up the pool size (or make this a command-line option to set).
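A rough sketch of that pooling idea (all names are illustrative, and free-list recycling of pooled nodes is omitted for brevity):
#include <cstddef>
#include <vector>

struct PoolNode {
    PoolNode* next = nullptr;
    bool fromPool = false;   // flag: pool allocation vs heap allocation
    // ... payload ...
};

class NodePool {
    std::vector<PoolNode> pool_;   // one contiguous, cache-friendly block
    std::size_t used_ = 0;
public:
    explicit NodePool(std::size_t n) : pool_(n) {}
    PoolNode* allocate() {
        if (used_ < pool_.size()) {
            PoolNode* n = &pool_[used_++];
            n->fromPool = true;
            return n;
        }
        return new PoolNode{};       // pool exhausted: fall back to the heap
    }
    void release(PoolNode* n) {
        if (!n->fromPool) delete n;  // only heap nodes are freed
        // pooled nodes could go on a free list; omitted here
    }
};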
Hope this helps, as I see you already have lots of great answers.