Why is inserting multiple elements into a std::set simultaneously faster? - c++

I'm reading through:
"The C++ Standard Library: A Tutorial and Reference by Nicolai M.
Josuttis"
and I'm in the section about Sets & Multisets. I came across a line regarding Inserting and Removing elements:
"Inserting and removing happens faster if, when working with multiple
elements, you use a single call for all elements rather than multiple
calls."
I'm far from a data structures master, but I'm aware that they're implemented with red-black trees. What I don't understand from that is how would the STL implementers write an algorithm to insert multiple elements at once in a faster manner?
Can anyone shed some light for me on why this quote is true?

My first thought was that it might rebalance the tree only after inserting/erasing the whole range. Since the individual calls are typically inlined in practice, reduced rebalancing seemed a more likely source of the speedup than the mere number of function calls.
Checking the GCC headers on my local machine, this doesn't seem to be the case - and anyway, I don't know how the tradeoff between reduced rebalancing activity, and potentially increased search times for intermediate inserts to an unbalanced tree, would work out.
Maybe it's considered a QoI issue, but in any case, using the most expressive method is probably best, not just because it saves you writing a for loop and shows your intention most clearly, but because it leaves library writers latitude to optimise more aggressively in the future, without you needing to know and change your code.

There are two reasons:
1) Making a single call for multiple elements instead of N separate calls.
2) The insertion operation checks, for each element inserted, whether an element with the same value already exists in the container. This check can be optimized when inserting multiple elements together.
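As a minimal sketch of point 1 (the names and values here are just for illustration), compare per-element insertion with the single-call range overload:

#include <set>
#include <vector>

int main() {
    std::vector<int> values = {5, 1, 4, 1, 3};

    // Multiple calls: one insert() per element.
    std::set<int> a;
    for (int v : values)
        a.insert(v);

    // Single call: the range overload inserts the whole batch at once,
    // which leaves the implementation room to optimise across elements.
    std::set<int> b;
    b.insert(values.begin(), values.end());
}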

What you read, as you quoted it, is wrong. Inserting into a std::set is O(log n), unless you use the insert() overload that takes a position iterator, in which case it is amortized O(1) when the hint is valid. And if you use the range overload with already-sorted elements, then you get O(n) insertion for the whole range.

Memory management could be a good reason. In this case it could allocate the memory just once. If every element is inserted with a separate call, each call may try to allocate memory separately. As far as I know, most set and map implementations try to keep the memory in the same page, or in pages close together, to minimize page faults.

I'm not sure about this, but I think that if the number of elements being inserted is smaller than the number of elements already in the set, then it can be more efficient to sort the inserted range before performing the insertions. This way, all values can be inserted in a single pass over the tree, and duplicates in the inserted range can be eliminated easily (or inserted very quickly in the case of a multiset).
Of course, this optimization is only possible if the input iterators allow sorting the input range (i.e. if they are random-access iterators).

Related

iterate ordered versus unordered containers

I want to know which data-structures are more efficient for iterating through their elements between std::set, std::map and std::unordered_set, std::unordered_map.
I searched through SO and found this question. The answers either propose copying the elements into a std::vector or using Boost.Container, which IMHO doesn't answer my question.
My purpose is to keep a large number of unique elements in a container and, most of the time, to iterate through them. Insertions and removals are rarer. I want to avoid std::vector in combination with std::unique.
Let's consider set vs unordered_set.
The main difference here is the 'nature' of the iteration, that is the traversal of the set will give you the elements in order while traversing a range in an unordered set will give you a bunch of values in no particular order.
Suppose you want to traverse a range [it1, it2]. If we exclude the lookup time needed to find the elements it1 and it2, there can be no direct mapping from one case to the other, since the elements in between are not guaranteed to be the same even if you've used the same elements to construct the container.
There are cases, however, where something like this is meaningful, e.g. when you want to traverse a fixed number of elements (regardless of what they are) or when you need to traverse the whole container. In such cases you need to consider the implementation mechanics:
Sets are usually implemented as red-black trees (a form of binary search tree). Like all binary search trees, they allow efficient in-order traversal (left, root, right) of their elements. That is, to traverse you pay the cost of pointer chasing (just like traversing a list).
Unordered sets, on the other hand, are hash tables, and to my knowledge the STL implementations use hashing with chaining. That means (at a very high level) that the underlying structure is a (contiguous) buffer where each slot is the head of a chain (list) holding the elements. The way the elements are laid out across those chains (buckets) and across the buffer will affect the traversal time; however, you'll be chasing pointers once again, jumping through different lists this time. I don't think it will vary significantly from the tree case, but it certainly won't be any better.
In any case micro tuning and benchmarking will give you the answer for your particular application.
The difference does not lie in the ordering or the lack of one, but in the backing container. If it's contiguous memory, it should be fast to iterate over, due to the simple iterator implementation and cache friendliness.
Unordered containers are usually stored as a vector of vectors (or something similar), while ordered containers are implemented using trees, but it is left to the implementation after all. This would suggest that iterating over the unordered version should be faster. However, this really is implementation-defined, and I have seen implementations (which bent the rules a little, to be fair) with different behaviour.
Generally speaking, container performance is quite a complex topic and usually has to be tested in the actual application to get a reliable answer. There is plenty of implementation-defined stuff that might affect the performance. I'd go with hash_set if I had to go in blind. Copying into a vector might also turn out to be a good option.
EDIT: As @TonyD said in his comment, there is a rule that disallows invalidating iterators during addition of elements as long as max_load_factor() is not exceeded; this practically rules out backing containers that are contiguous in memory.
Thus, copying everything into a vector seems like an even more reasonable option. If you need to remove duplicates, a feasible option might be to use http://en.cppreference.com/w/cpp/algorithm/sort and then duplicates are easy to skip. I have heard that using a vector and sort to get a sorted array (or vector) is quite a common choice when you need a container that has to be sorted and is iterated over more often than it is modified.
Iterating from fastest to slowest should be: set > map > unordered_set > unordered_map.
set is a little lighter than map, and both are ordered following binary-tree rules, so they should be faster than the unordered_ containers.

Choosing List or Vector for a given scenario in C++

For my application I am using a std::vector. I am inserting into the vector at the end, but erasing from the vector at random positions, i.e. an element can be erased from the middle, the front, anywhere. These are the only two requirements: 1) insert at the end, 2) erase from anywhere.
So should I use std::list, since erasing from a vector shifts data? Or should I retain the vector in my code for some other reason?
Please comment: if vector is the better option, how would it be better than list here?
One key reason to use std::vector over std::list is cache locality. A list is terrible in this regard, because its elements can be (and usually are) fragmented in your memory. This will degrade performance significantly.
Some would recommend using std::vector almost always. In terms of performance, cache locality is often more important than the complexity of insertion or deletion.
Here's a video about Bjarne Stroustrup's opinion regarding the subject.
I would refer you to this cheat sheet, and the conclusion would be the list.
A list supports deletion at an arbitrary but known position in constant time.
Finding that position takes linear time, just like modifying a vector.
The only advantage of the list is if you repeatedly erase (or insert) at (approximately) the same position.
If you're erasing more or less at random, chances are that the better memory locality of the vector could win out in the end.
The only way to be sure is to measure and compare.
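To make the "known position" point concrete, here is a small sketch (contents are made up): erasing through a stored iterator relinks nodes in a list, but shifts the tail in a vector.

#include <iterator>
#include <list>
#include <vector>

int main() {
    std::list<int> lst = {1, 2, 3, 4, 5};
    auto lit = std::next(lst.begin(), 2);  // position found earlier
    lst.erase(lit);                        // constant time: just relinks nodes

    std::vector<int> vec = {1, 2, 3, 4, 5};
    auto vit = vec.begin() + 2;            // position found earlier
    vec.erase(vit);                        // linear time: shifts the tail left by one
}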
A list is most probably better in this case. The advantage of a list over a vector is that it supports deletion at an arbitrary position with constant complexity. A vector would only be the better choice if you require constant-time indexing of the container's elements. Still, you have to take into consideration how the element you would like to delete is passed to your deletion function. If you only pass an index, a vector will be able to find the element in constant time, while with a list you will have to iterate. In this case I would benchmark the two solutions, but I would still bet on the list performing better.
It depends on many factors and how are you using your data.
One factor: do you need an erase that maintains the order of the collection, or can you live with the order changing?
Another factor: what kind of data is in the collection? (numbers: ints/floats, pointers, or objects)
Not keeping order
You could keep using a vector if the data consists of basic numbers or pointers: to delete one element, copy the last element of the vector over the deleted position, then pop_back(). This way you avoid moving all the data.
If using objects, you could still use the same method if the object you need to copy is small.
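A minimal sketch of that "copy the last element over the erased slot and pop_back()" idiom (sometimes called swap-and-pop); the element type and index are just examples:

#include <cstddef>
#include <utility>
#include <vector>

// Erase the element at index i without preserving order: O(1) instead of O(n).
void unordered_erase(std::vector<int>& v, std::size_t i) {
    v[i] = std::move(v.back());  // overwrite the erased slot with the last element
    v.pop_back();                // drop the now-duplicated last element
}

int main() {
    std::vector<int> v = {10, 20, 30, 40};
    unordered_erase(v, 1);       // v is now {10, 40, 30}; order is not preserved
}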
Keeping order
Maybe a list would be your friend here; still, some tests would be advised. It depends on the size of the data, the size of the list, etc.

std::set<T>::insert, duplicate elements

What would be an efficient implementation for a std::set insert member function? Because the data structure sorts elements based on std::less (operator < needs to be defined for the element type), it is conceptually easy to detect a duplicate.
How does it actually work internally? Does it make use of the red-black tree data structure (an implementation detail mentioned in Josuttis' book)?
Implementations of the standard data structures may vary...
I have a problem where I am forced to have (generally speaking) sets of integers which should be unique. The length of the sets varies, so I need a dynamic data structure (based on my narrow knowledge, this narrows things down to list and set). The elements do not necessarily need to be sorted, but there may be no duplicates. Since the candidate sets always contain a lot of duplicates (the sets are small, up to 64 elements), will trying to insert duplicates into a std::set with the insert member function cause a lot of overhead compared to a std::list and another algorithm that doesn't resort to keeping the elements sorted?
Additional: the output set has a fixed size of 27 elements. Sorry, I forgot this... it holds for a special case of the problem. For other cases, the length is arbitrary (smaller than the input set).
If you're creating the entire set all at once, you could try using std::vector to hold the elements, std::sort to sort them, and std::unique to prune out the duplicates.
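A sketch of that vector/sort/unique approach, assuming the whole input is available up front:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v = {4, 7, 4, 1, 7, 7, 2};
    std::sort(v.begin(), v.end());                      // group duplicates together
    v.erase(std::unique(v.begin(), v.end()), v.end());  // drop adjacent duplicates
    // v is now {1, 2, 4, 7}: sorted and duplicate-free
}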
The complexity of std::set::insert is O(log n), or amortized O(1) if you use the "positional" insert and get the position correct (see e.g. http://cplusplus.com/reference/stl/set/insert/).
The underlying mechanism is implementation-dependent. It's often a red-black tree, but this is not mandated. You should look at the source code for your favourite implementation to find out what it's doing.
For small sets, it's possible that e.g. a simple linear search on a vector will be cheaper, due to spatial locality. But the insert itself will require all the following elements to be copied. The only way to know for sure is to profile each option.
When you only have 64 possible values known ahead of time, just take a bit field and flip on the bits for the elements actually seen. That works in n+O(1) steps, and you can't get less than that.
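A sketch of the bit-field idea, under the assumption that every value lies in the range 0..63:

#include <cstdint>
#include <vector>

int main() {
    std::vector<int> input = {3, 17, 3, 42, 17, 5};  // assumed to be in [0, 64)

    std::uint64_t seen = 0;
    for (int x : input)
        seen |= std::uint64_t(1) << x;               // flip on the bit for value x

    std::vector<int> unique;                         // recover the de-duplicated values
    for (int x = 0; x < 64; ++x)
        if (seen & (std::uint64_t(1) << x))
            unique.push_back(x);                     // comes out sorted as a bonus
}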
Inserting into a std::set of size m takes O(log(m)) time and comparisons, meaning that using an std::set for this purpose will cost O(n*log(n)) and I wouldn't be surprised if the constant were larger than for simply sorting the input (which requires additional space) and then discarding duplicates.
Doing the same thing with an std::list would take O(n^2) average time, because finding the insertion place in a list needs O(n).
Inserting one element at a time into a std::vector would also take O(n^2) average time – finding the insertion place is doable in O(log(m)), but elements need to be moved to make room. If the number of elements in the final result is much smaller than the input, that drops down to O(n*log(n)), with close to no space overhead.
If you have a C++11 compiler or use boost, you could also use a hash table. I'm not sure about the insertion characteristics, but if the number of elements in the result is small compared to the input size, you'd only need O(n) time – and unlike the bit field, you don't need to know the potential elements or the size of the result a priori (although knowing the size helps, since you can avoid rehashing).
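A sketch of the hash-table variant with C++11's std::unordered_set; reserve() is used to avoid rehashing, as suggested:

#include <unordered_set>
#include <vector>

int main() {
    std::vector<int> input = {4, 7, 4, 1, 7, 7, 2};

    std::unordered_set<int> unique;
    unique.reserve(input.size());  // an upper bound on the result size avoids rehashing
    for (int x : input)
        unique.insert(x);          // duplicates are silently rejected
    // 'unique' now holds 4, 7, 1, 2 in some unspecified order
}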

Should I randomly shuffle before inserting into STL set?

I need to insert 10-million strings into a C++ STL set. The strings are sorted. Will I have a pathological problem if I insert the strings in sorted order? Should I randomize first? Or will the G++ STL implementation automatically rebalance for me?
The set implementation typically uses a red-black tree, which will rebalance for you. However, insertion may be faster (or it may not) if you randomise the data before inserting - the only way to be sure is to do a test with your set implementation and specific data. Retrieval times will be the same, either way.
The implementation will re-balance automatically. Given that you know the input is sorted, however, you can give it a bit of assistance: You can supply a "hint" when you do an insertion, and in this case supplying the iterator to the previously inserted item will be exactly the right hint to supply for the next insertion. In this case, each insertion will have amortized constant complexity instead of the logarithmic complexity you'd otherwise expect.
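A minimal sketch of hinted insertion for input that arrives in ascending order; here end() (equivalently, the position just past the previously inserted element) is a correct hint, so each insertion is amortized constant time:

#include <set>
#include <string>
#include <vector>

int main() {
    // Assume the strings are already sorted in ascending order.
    std::vector<std::string> sorted_input = {"aardvark", "badger", "coyote"};

    std::set<std::string> s;
    for (const std::string& str : sorted_input)
        s.insert(s.end(), str);  // hinted insert: amortized constant time
                                 // because each new element belongs at the end
}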
The only question I have: do you really need a set ?
If the data is already sorted and you don't need to insert / delete elements after the creation, a deque would be better:
you'll have the same big-O complexity using a binary search for retrieval
you'll get less memory overhead... and better cache locality
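A sketch of that idea (the contents are illustrative): build the deque once from already-sorted data and use a binary search for lookups.

#include <algorithm>
#include <deque>
#include <string>

int main() {
    // Already sorted at creation time; no later inserts or deletes.
    std::deque<std::string> d = {"ant", "bee", "cat", "dog"};

    // O(log n) membership test on the sorted deque.
    bool found = std::binary_search(d.begin(), d.end(), std::string("cat"));
    (void)found;
}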
On binary_search: I suspect you need more than a ForwardIterator for a binary search, guess this site is off again :(
http://en.wikipedia.org/wiki/Standard_Template_Library
set: "Implemented using a self-balancing binary search tree."
g++'s libstdc++ uses red black trees for sets and maps.
http://en.wikipedia.org/wiki/Red-black_tree
This is a self balancing tree, and insertions are always O(log n). The C++ standard also requires that all implementations have this characteristic, so in practice, they are almost always red black trees, or something very similar.
So don't worry about the order you put the elements in.
A very cheap and simple solution is to insert from both ends of your collections of strings. That is to say, first add "A", then "ZZZZZ", then "AA", then "ZZZZY", etcetera until you meet in the middle. It doesn't require the hefty cost of shuffling, yet it is likely to sidestep pathological cases.
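A sketch of that two-ended insertion order, assuming the strings sit in a sorted vector:

#include <cstddef>
#include <set>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> sorted = {"a", "b", "c", "d", "e", "f"};

    std::set<std::string> s;
    std::size_t lo = 0, hi = sorted.size();
    while (lo < hi) {
        s.insert(sorted[lo++]);      // take one from the front...
        if (lo < hi)
            s.insert(sorted[--hi]);  // ...then one from the back
    }
}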
Maybe 'unordered_set' can be an alternative.

Should use an insertion sort or construct a heap to improve performance?

We have large (100,000+ elements) ordered vectors of structs (operator < overloaded to provide ordering):
std::vector<MyType> vectorMyTypes;
std::sort(vectorMyTypes.begin(), vectorMyTypes.end());
My problem is that we're seeing performance problems when adding new elements to these vectors while preserving sort order. At the moment we're doing something like:
for ( a very large set )
{
    vectorMyTypes.push_back(newType);
    std::sort(vectorMyTypes.begin(), vectorMyTypes.end());
    ...
    ValidateStuff(vectorMyTypes); // this method expects the vector to be ordered
}
This isn't exactly what our code looks like since I know this example could be optimised in different ways, however it gives you an idea of how performance could be a problem because I'm sorting after every push_back.
I think I essentially have two options to improve performance:
Use a (hand crafted?) insertion sort instead of std::sort to improve the sort performance (insertion sorts on a partially sorted vector are blindingly quick)
Create a heap by using std::make_heap and std::push_heap to maintain the sort order
My questions are:
Should I implement an insertion sort? Is there something in Boost that could help me here?
Should I consider using a heap? How would I do this?
Edit:
Thanks for all your responses. I understand that the example I gave was far from optimal and it doesn't fully represent what I have in my code right now. It was simply there to illustrate the performance bottleneck I was experiencing - perhaps that's why this question isn't seeing many up-votes :)
Many thanks to you Steve, it's often the simplest answers that are the best, and perhaps it was my over analysis of the problem that blinded me to perhaps the most obvious solution. I do like the neat method you outlined to insert directly into a pre-ordered vector.
As I've commented, I'm constrained to using vectors right now, so std::set, std::map, etc aren't an option.
Ordered insertion doesn't need boost:
vectorMyTypes.insert(
    std::upper_bound(vectorMyTypes.begin(), vectorMyTypes.end(), newType),
    newType);
upper_bound provides a valid insertion point provided that the vector is sorted to start with, so as long as you only ever insert elements in their correct place, you're done. I originally said lower_bound, but if the vector contains multiple equal elements, then upper_bound selects the insertion point which requires less work.
This does have to copy O(n) elements, but you say insertion sort is "blindingly fast", and this is faster. If it's not fast enough, you have to find a way to add items in batches and validate at the end, or else give up on contiguous storage and switch to a container which maintains order, such as set or multiset.
A heap does not maintain order in the underlying container, but is good for a priority queue or similar, because it makes removal of the maximum element fast. You say you want to maintain the vector in order, but if you never actually iterate over the whole collection in order then you might not need it to be fully ordered, and that's when a heap is useful.
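For completeness, a tiny sketch of the heap route, in case fast access to the largest element is really all that's needed (the values are arbitrary):

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v = {41, 7, 23, 99, 5};
    std::make_heap(v.begin(), v.end());  // arrange as a max-heap, O(n)

    v.push_back(64);                     // add a new element...
    std::push_heap(v.begin(), v.end());  // ...and restore the heap property, O(log n)

    std::pop_heap(v.begin(), v.end());   // move the maximum (99) to the back
    int largest = v.back();              // cheap access to / removal of the maximum
    v.pop_back();
    (void)largest;
}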
According to Item 23 of Meyers' Effective STL, you should use a sorted vector if your application uses its data structures in 3 phases. From the book, they are:
Setup. Create a new data structure by inserting lots of elements into it. During this phase, almost all operations are insertions and erasures. Lookups are rare or nonexistent.
Lookup. Consult the data structure to find specific pieces of information. During this phase, almost all operations are lookups. Insertions and erasures are rare or nonexistent. There are so many lookups that the performance of this phase makes the performance of the other phases incidental.
Reorganize. Modify the contents of the data structure, perhaps by erasing all the current data and inserting new data in its place. Behaviorally, this phase is equivalent to phase 1. Once this phase is completed, the application returns to phase 2.
If your use of the data structure resembles this, you should use a sorted vector and then use binary_search as mentioned. If not, a typical associative container should do it; that means a set, multiset, map or multimap, as those structures are ordered by default.
Why not just use a binary search to find where to insert the new element? Then you will insert exactly into the required position.
If you need to insert a lot of elements into a sorted sequence, use std::merge, potentially sorting the new elements first:
void add( std::vector<Foo> & oldFoos, const std::vector<Foo> & newFoos ) {
    std::vector<Foo> merged;
    // precondition: oldFoos _and newFoos_ are sorted
    merged.reserve( oldFoos.size() + newFoos.size() ); // only for std::vector
    std::merge( oldFoos.begin(), oldFoos.end(),
                newFoos.begin(), newFoos.end(),
                std::back_inserter( merged ) );
    // apply std::unique, if wanted, here
    merged.erase( std::unique( merged.begin(), merged.end() ), merged.end() );
    oldFoos.swap( merged ); // commit changes
}
Using a binary search to find the insertion location isn't going to speed up the algorithm much because it will still be O(N) to do the insertion (consider inserting at the beginning of a vector - you have to move every element down one to create the space).
A tree (e.g. a heap) will be O(log(N)) to insert, which is much better performance.
See http://www.sgi.com/tech/stl/priority_queue.html
Note that a tree will still have worst case O(N) performance for insert unless it is balanced, e.g. an AVL tree.
Why not to use boost::multi_index ?
NOTE: boost::multi_index does not provide memory contiguity, a property of std::vectors by which elements are stored adjacent to one another in a single block of memory.
There are a few things you need to do.
You may want to consider making use of reserve() to avoid excessive reallocation of the entire vector. If you have knowledge of the size it will grow to, you may gain some performance by doing the reserve() calls yourself (rather than having the implementation do them automatically using its built-in heuristic).
Do a binary search to find the insertion location. Then resize and shift everything following the insertion point up by one to make room.
Consider: do you really want to use a vector? Perhaps a set or map are better.
The advantage of binary search over lower_bound is that if the insertion point is close to the end of the vector you don't have to pay the theta(n) complexity.
If you want to insert an element into the "right" position, why do you plan on using sort? Find the position using lower_bound and insert, using, well, the `insert' method of the vector. That will still be O(N) to insert a new item.
A heap is not going to help you, because a heap is not sorted. It allows you to get at the smallest element quickly, and then quickly remove it and get the next smallest element. However, the data in a heap is not stored in sorted order, so if you have algorithms that must iterate over the data in order, it will not help.
I am afraid your description skips too much detail, but it seems like a list is just not the right container for the task. std::deque is much better suited for insertion in the middle, and you might also consider std::set. I suggest you explain why you need to keep the data sorted to get more helpful advice.
You might want to consider using a BTree or a Judy Trie.
You don't want to use contiguous memory for large collections, insertions should not take O(n) time;
You want to use at least binary insertion for single elements, multiple elements should be presorted so you can make the search boundaries smaller;
You do not want your data structure wasting memory, so nothing with left and right pointers for each data element.
As others have said, I'd probably have created a B-tree out of a linked list instead of using a vector. Even if you get past the sorting issue, vectors have the problem of fully reallocating when they need to grow, assuming you don't know your maximum size beforehand.
If you are worried about a list allocating on different memory pages and causing cache-related performance issues, preallocate your nodes in an array (pool the objects) and insert these into the list.
You can add a value to your data type that denotes whether it was allocated from the heap or from a pool. This way, if you detect that your pool runs out of room, you can start allocating from the heap and throw an assert or something at yourself so you know to bump up the pool size (or make this a command-line option to set).
Hope this helps, as I see you already have lots of great answers.