Recently, while working on a programming problem in C++, I came across something interesting. My algorithm used a really large set and called std::lower_bound on it a great number of times. However, after submitting my solution, and contrary to the math I did on paper to prove that my code was fast enough, it turned out to be far too slow. The code looked something like this:
#include <algorithm>  // std::lower_bound
#include <set>
using namespace std;

set<int> s;
int x;
// code code code
set<int>::iterator it = lower_bound(s.begin(), s.end(), x);  // generic algorithm
However, after getting a hint from a buddy to use set::lower_bound, the algorithm in question ran far faster than before and finally matched my math. This is the binary search after the change:
set<int>::iterator it = s.lower_bound(x);
My question is: what's the difference between these two? Why does one work much, much faster than the other? Isn't lower_bound supposed to be a binary search with O(log2(n)) complexity? In my code it ended up being way slower than that.
std::set is typically implemented as a self-balancing binary search tree (sometimes with some list-like threading tied into it). Knowing this structure, std::set::lower_bound can traverse the tree using its ordering invariants: each step simply means following a left or right child branch, so the search takes O(log n) steps in total.
std::lower_bound has to run something akin to a binary search over the data. But since std::set::iterator is bidirectional, not random access, moving between the probed elements is expensive: many single-step increments have to be performed between comparisons. The algorithm checks the element halfway between A and B, adjusts A or B, finds the element halfway between the new bounds, and repeats, and the iterator work done between comparisons adds up to linear time overall.
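To see where the time goes, here is a minimal sketch of how a generic lower_bound is typically written (modeled on the possible implementation shown on cppreference; sketch_lower_bound is a hypothetical name):

#include <iterator>

// A simplified generic lower_bound. std::distance and std::advance are O(1)
// for random-access iterators but take linear time on the bidirectional
// iterators of std::set, which is where the hidden cost comes from.
template <class ForwardIt, class T>
ForwardIt sketch_lower_bound(ForwardIt first, ForwardIt last, const T& value) {
    auto count = std::distance(first, last);  // already O(n) on a set
    while (count > 0) {
        ForwardIt it = first;
        auto step = count / 2;
        std::advance(it, step);               // O(step) increments on a set
        if (*it < value) {
            first = ++it;
            count -= step + 1;
        } else {
            count = step;
        }
    }
    return first;
}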
After reading the documentation of std::lower_bound:
On non-random-access iterators, the iterator advances produce themselves an additional linear complexity in N on average.
And std::set only provides non-random-access (bidirectional) iterators, so std::lower_bound does not do an O(log N) binary search when used on a std::set.
std::lower_bound is a generic binary search algorithm, meant to work with most STL containers. set::lower_bound is designed to work with std::set, so it takes advantage of the unique properties of std::set.
As std::set is often implemented as a red-black tree, one can imagine std::lower_bound iterating across all nodes, while set::lower_bound just traverses down the tree.
std::lower_bound always guarantees O(log n) comparisons, but it only guarantees O(log n) time if it is passed a RandomAccessIterator; a mere ForwardIterator or BidirectionalIterator does not provide constant-time std::advance.
The std::set::lower_bound implementation of the same algorithm is able to use internal details of the structure to avoid this problem.
Background
I'm building a performance-minded application, and I came across a place where I have to use std::set. And it works like a charm. But then I started reading the documentation (which you can find here), and the first thing I noticed was that
Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.
The logarithmic search, removal, and insertion make perfect sense to me, as they use some kind of tree structure (the documentation does not guarantee that it is a red-black tree). But the problem is, why should they?
I made an alternative to std::set of my own, which uses a std::vector to store all the entries. Then I performed some basic benchmarks, and here are the results:
Iterations: 100000
// Insertion
VectorSet : 211464us
std::set : 1272864us
// Find/ Lookup
VectorSet : 404264us
std::set : 551464us
// Removal
VectorSet : 254321964us
std::set : 834664us
// Traversal (iterating through all 100000 elements; 100000 iterations)
VectorSet : 2464us
std::set : 4374174264us
According to these results, my implementation (VectorSet) outperformed std::set in both insertion and lookup, and traversal was over 1,800,000 times faster. But std::set outperformed VectorSet in removal by a significant margin (which is understandable, as we are dealing with a vector).
I can justify why removal is slower in VectorSet but faster in std::set, and why std::set takes so long to iterate through the entries. Some things which affect std::set's performance would be (correct me if I'm wrong):
Cache misses.
Pointer dereferences.
Better locality.
And for the vector being slower in removal:
Finding the element.
Removal of the element.
Possible resize.
Question
From what I can see, using a std::vector to store entries rather than a tree structure performs better in 3 out of 4 cases. And even where std::set performed better, the margin is small compared to the traversal gap. And in my opinion, developers use the other operations (lookups, insertions, and iteration) more than removals. Even though these numbers are in the range of microseconds, the slightest improvement is better.
So my question is, why does std::set use a tree structure when they can use something like a vector to improve their efficiency?
Note: The container will be filled with an average of 1000 elements, will be iterated repeatedly throughout the application's lifetime, and will directly affect the application's runtime.
The standard set has some guarantees that you can't provide with your implementation:
inserting/erasing doesn't invalidate other iterators/references/pointers.
inserting/erasing elements has (at most) logarithmic complexity, as opposed to linear in your implementation.
If these don't matter to you, you're welcome to use a sorted vector and binary search. The standard provides std::sort, std::vector and std::binary_search, so you are good to go. The thing to notice is that each container has a specific use case and there is no "one size fits all" container.
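For illustration, a minimal sketch of such a sorted-vector "set" using exactly those standard pieces (VectorSetSketch is a hypothetical name):

#include <algorithm>
#include <vector>

// A sorted-vector "set": kept sorted on insert, searched with O(log n)
// binary search. Inserting is O(n) because the tail has to be shifted,
// and it invalidates iterators, which is the trade-off described above.
struct VectorSetSketch {
    std::vector<int> data;  // always kept sorted

    void insert(int value) {
        auto pos = std::lower_bound(data.begin(), data.end(), value);
        if (pos == data.end() || *pos != value)  // set semantics: no duplicates
            data.insert(pos, value);
    }

    bool contains(int value) const {
        return std::binary_search(data.begin(), data.end(), value);
    }
};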
The standard also provides unordered_set, which is a hash table. It is often criticized for being slow and causing cache misses. Well, if that degrades your performance in a way you have identified as a bottleneck, go ahead and use some other hash-set implementation from another library. If you believe that you can do better, go ahead. Many projects build their own containers that are more specialized to that project. They could be faster, use less memory, or give different guarantees about iterator invalidation and/or the complexity of operations. They all solve different problems.
Another point is that profiling and benchmarking are hard. Make sure you get them right. Performance comparison is usually done at scale (with a varying number of inputs); picking a single, relatively small size won't tell the whole story.
I'm doing a little research about searching and sorting algorithms in the Standard library. I couldn't find something about those questions. I hope someone can help me out. You can also send me links if you know some.
Does the searching behavior change if the data is not sorted compared to one which is sorted?
How can I know if it is better to use std::sort() on a vector instead of, say, copying the vector into an already-sorted set? That is just an example. I hoped to find some explanations on the web about which ways are best for searching and sorting, but I didn't.
How can I adapt the behavior of the searching and sorting algorithms to make it more efficient?
Does the searching behavior change if the data is not sorted compared to one which is sorted?
Depends. If you access your data in a vector/array by position, there's no performance improvement, and no need for sorting either.
Searching can be done linearly, by binary search, by key, or by hash function.
For small (I guess something below a few dozen items) and contiguous containers (e.g. a vector), linear search can be the fastest, just because of the cache-friendly memory layout.
Binary search has O(log N) complexity, which is likely the best you can get (I'm thinking in terms of information theory). It requires that you sort the container beforehand. It's useful for frequent searches in the same container.
A std::set (and its cousin std::map) internally uses a tree, which makes searching O(log N) too. It's useful if you search by key instead of by some criterion of your items. The drawback is that it's a bit slower to build (it always keeps itself sorted) than filling a vector and sorting it later.
A hash map or hash table uses a function to find the bucket where the item lies. The complexity is close to O(1), depending on the number of items and the hash function used (collisions are the issue).
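As a small sketch of three of these approaches with standard containers:

#include <algorithm>
#include <unordered_set>
#include <vector>

bool search_demo() {
    std::vector<int> v = {7, 1, 4, 9, 3};

    // Linear search: O(n), works on unsorted data.
    bool a = std::find(v.begin(), v.end(), 4) != v.end();

    // Binary search: O(log n), but requires sorting first.
    std::sort(v.begin(), v.end());
    bool b = std::binary_search(v.begin(), v.end(), 4);

    // Hash lookup: ~O(1) on average, at the cost of hashing and no ordering.
    std::unordered_set<int> h(v.begin(), v.end());
    bool c = h.count(4) != 0;

    return a && b && c;
}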
As you see, selecting a type of container depends on how you are going to handle your data. Choose the one that fits your requirements.
How can I know if it is better to use std::sort() on a vector instead of maybe to copy the vector to an already sorted set?
std::sort changes the container, so the result is, obviously, sorted. If you need the original, unordered container, then make a copy and sort the copy. Sorting the whole container once is better than "insert each item so the container is always sorted", especially with a vector (many element shifts and possible reallocations); a set/map filling process may not be quite that slow.
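A hedged sketch of the two filling strategies for a vector:

#include <algorithm>
#include <vector>

// Strategy 1: append everything, then sort once. O(n log n) total.
std::vector<int> sort_once(const std::vector<int>& input) {
    std::vector<int> v(input.begin(), input.end());
    std::sort(v.begin(), v.end());
    return v;
}

// Strategy 2: keep the vector sorted on every insert. O(n^2) total,
// because each insert shifts the tail of the vector.
std::vector<int> insert_sorted(const std::vector<int>& input) {
    std::vector<int> v;
    for (int x : input)
        v.insert(std::upper_bound(v.begin(), v.end(), x), x);
    return v;
}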
How can I adapt the behavior of the searching and sorting algorithms to make it more efficient?
It's not clear to me what you mean. But "the end justifies the means": again, choose the container that serves your data handling best.
Does the searching behavior change if the data is not sorted compared to one which is sorted?
No; it depends on the algorithm you choose. A general search, std::find, is O(n); a binary search, std::lower_bound, is O(log n), but it works only on sorted ranges.
How can I know if it is better to use std::sort() on a vector instead of maybe to copy the vector to an already sorted set? That is just an example. I hoped to find some explanations on the web which ways are the best for searching or sorting, but I didn't.
You can write a benchmark and measure. You can sort a std::vector (without duplicate elements) by copying it into a std::set, which maintains sorted order internally. But std::set is typically implemented as a red-black tree and in general suffers high memory fragmentation, in contrast to the contiguous std::vector, so it is easy to predict the result. Alexander Stepanov discusses (if I remember correctly) this particular example in his lectures available on YouTube.
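For concreteness, a minimal sketch of that set-based sorting (usually slower than a single std::sort, as argued above):

#include <set>
#include <vector>

// Sort via std::set: each insert is O(log n), but node allocations and
// pointer chasing usually make this slower than one contiguous std::sort.
std::vector<int> sort_via_set(const std::vector<int>& input) {
    std::set<int> s(input.begin(), input.end());  // also drops duplicates
    return std::vector<int>(s.begin(), s.end());
}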
I have a multiset, implemented as follows:
#include <bits/stdc++.h>
using namespace std;

multiset<int> M;

int numunder(int k){
    /* this function must return the number of elements smaller than or
       equal to k in M (taking multiplicity into account). */
}
At first I thought I could just return M.upper_bound(k)-M.begin()+1. Unfortunately, it seems we cannot subtract iterators like that. We ended up having to implement an AVLNodes structure ourselves. Is there a way to get this to work using the C++ standard library?
Sticking closely to your proposed M.upper_bound(k)-M.begin()+1 solution (which clearly does not compile, because the multiset iterator is a bidirectional iterator that does not implement operator-), you could use std::distance to get the distance between two multiset iterators and obtain a correct solution.
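A minimal sketch of that approach (note that the distance from begin() to upper_bound(k) already counts the elements <= k, so no +1 is needed):

#include <iterator>
#include <set>

std::multiset<int> M;

// Number of elements <= k, counting multiplicity. std::distance has to
// step through the bidirectional iterators one by one here.
int numunder(int k) {
    return (int)std::distance(M.begin(), M.upper_bound(k));
}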
Note that this solution has O(n) complexity: when the iterator is not a random access iterator, std::distance simply increments the iterator passed as the first parameter until it reaches the iterator passed as the second.
I also don't really think that this problem can be solved in better than O(n) complexity with std::multiset.
This can be solved using the policy-based data structures available in GCC. You can use the red-black tree with order statistics; here is a discussion.
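A sketch of that approach, assuming GCC (__gnu_pbds is a non-standard extension; the std::less_equal trick below is a common way to make the tree behave like a multiset, at the price that find()/erase(value) stop working):

#include <ext/pb_ds/assoc_container.hpp>  // GCC-only extension headers
#include <ext/pb_ds/tree_policy.hpp>
#include <functional>

using namespace __gnu_pbds;

// std::less_equal makes the tree accept duplicates (multiset behaviour).
typedef tree<int, null_type, std::less_equal<int>,
             rb_tree_tag, tree_order_statistics_node_update>
    ordered_multiset;

ordered_multiset M;

// order_of_key(x) returns the number of stored elements strictly smaller
// than x in O(log n), so order_of_key(k + 1) counts the elements <= k
// (fine for int keys as long as k < INT_MAX).
int numunder(int k) {
    return (int)M.order_of_key(k + 1);
}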
GCC implements multisets as red-black trees. In a binary tree there is no efficient way to get the "sorted index" of a node without storing extra information in the nodes, such as the number of children below them.
Also note that iterating from the iterators returned by find, upper_bound, etc. walks the tree, because the iterators are not random access. See https://en.cppreference.com/w/cpp/container/multiset
If you want to only use built-in data structures you could maintain a separate vector that you can perform binary search on. This is more organizational work but if you are only inserting or erasing then it is pretty simple. Anything more complicated probably warrants its own data structure.
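A sketch of that separate-sorted-vector bookkeeping (hypothetical helper names), which makes the count query O(log n) while insert/erase stay O(n):

#include <algorithm>
#include <vector>

std::vector<int> sorted_copy;  // kept sorted alongside the multiset

void insert_value(int x) {
    // O(n): shifts the tail, but keeps the vector sorted.
    sorted_copy.insert(
        std::upper_bound(sorted_copy.begin(), sorted_copy.end(), x), x);
}

void erase_one(int x) {
    // Erase a single occurrence, if present.
    auto it = std::lower_bound(sorted_copy.begin(), sorted_copy.end(), x);
    if (it != sorted_copy.end() && *it == x)
        sorted_copy.erase(it);
}

// Number of elements <= k via binary search: O(log n).
int numunder(int k) {
    return (int)(std::upper_bound(sorted_copy.begin(), sorted_copy.end(), k)
                 - sorted_copy.begin());
}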
When should I choose one over the other?
Are there any pointers that you would recommend for using the right STL containers?
hash_set is an extension that is not part of the C++ standard. Lookups should be O(1), rather than the O(log n) of set, so hash_set will be faster in most circumstances.
Another difference will be seen when you iterate through the containers. set will deliver the contents in sorted order, while hash_set will be essentially random (Thanks Lou Franco).
Edit: The C++11 update to the C++ standard introduced unordered_set, which should be preferred over hash_set. The performance will be similar and is guaranteed by the standard. The "unordered" in the name stresses that iterating over it will produce results in no particular order.
std::set is implemented as a binary search tree.
hash_set is implemented as a hash table.
The main issue here is that many people use std::set thinking it is a hash table with O(1) look-up, which it isn't and doesn't have; look-ups on it are really O(log(n)). Other than that, read about binary trees vs hash tables to get a better idea of the data structures.
Another thing to keep in mind is that with hash_set you have to provide the hash function, whereas a set only requires a comparison function (operator<), which is easier to define (and predefined for native types).
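For a custom type the difference looks roughly like this, using the standard unordered_set in place of the old hash_set (Point and the functor names are hypothetical):

#include <cstddef>
#include <set>
#include <unordered_set>

struct Point { int x, y; };

// std::set only needs a strict weak ordering.
struct PointLess {
    bool operator()(const Point& a, const Point& b) const {
        return a.x != b.x ? a.x < b.x : a.y < b.y;
    }
};

// std::unordered_set needs a hash function *and* an equality predicate.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};
struct PointEq {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

std::set<Point, PointLess> ordered;
std::unordered_set<Point, PointHash, PointEq> unordered;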
I don't think anyone has answered the other part of the question yet.
The reason to use hash_set or unordered_set is the usually-O(1) lookup time. I say usually because every so often, depending on the implementation, the contents may have to be rehashed into a larger array, or a bucket may end up containing thousands of entries.
The reason to use a set is if you often need the largest or smallest member of a set. A hash has no order so there is no quick way to find the smallest item. A tree has order, so largest or smallest is very quick. O(log n) for a simple tree, O(1) if it holds pointers to the ends.
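For instance, an ordered set exposes its extremes directly:

#include <set>

std::set<int> s = {5, 1, 9, 3};

// Iteration order is sorted, so the extremes sit at the two ends.
int smallest = *s.begin();   // 1
int largest  = *s.rbegin();  // 9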
A hash_set would be implemented by a hash table, which has mostly O(1) operations, whereas a set is implemented by a tree of some sort (AVL, red black, etc.) which have O(log n) operations, but are in sorted order.
Edit: I had written that trees are O(n). That's completely wrong.
I'm wondering if anyone can recommend a good C++ tree implementation, hopefully one that is STL-compatible if at all possible.
For the record, I've written tree algorithms many times before, and I know it can be fun, but I want to be pragmatic and lazy if at all possible. So an actual link to a working solution is the goal here.
Note: I'm looking for a generic tree, not a balanced tree or a map/set, the structure itself and the connectivity of the tree is important in this case, not only the data within.
So each branch needs to be able to hold an arbitrary amount of data, and each branch should be separately iterable.
I don't know about your requirements, but wouldn't you be better off with a graph (implemented, for example, in the Boost Graph Library) if you're interested mostly in the structure and not so much in tree-specific benefits like speed through balancing? You can "emulate" a tree through a graph, and maybe it'll be (conceptually) closer to what you're looking for.
Take a look at this.
The tree.hh library for C++ provides an STL-like container class for n-ary trees, templated over the data stored at the nodes. Various types of iterators are provided (post-order, pre-order, and others). Where possible the access methods are compatible with the STL or alternative algorithms are available.
HTH
I am going to suggest using std::map instead of a tree.
The complexity characteristics of a tree are:
Insert: O(log(n))
Removal: O(log(n))
Find: O(log(n))
These are the same characteristics the std::map guarantees.
Thus, as a result, most implementations of std::map use a red-black tree under the covers (though technically this is not required).
If you don't have (key, value) pairs, but simply keys, use std::set. That uses the same Red-Black tree as std::map.
OK folks, I found another tree library: stlplus.ntree. But I haven't tried it out yet.
Let's suppose the question is about balanced binary trees (in some form, mostly red-black trees), even if that is not quite what was asked.
Balanced binary trees, like vectors, allow you to maintain an ordering of elements without any need for a key (for example, by inserting elements anywhere, as in a vector), but:
with optimal O(log(n)) or better complexity for every single-element modification (add/remove at the beginning, at the end, and before or after any iterator);
with persistence of iterators through any modification, except the direct destruction of the element pointed to by the iterator.
Optionally, one may support access by index as in a vector (at the cost of one size_t per element), with O(log(n)) complexity; if used, the iterators become random access.
Optionally, the order can be enforced by some comparison function, but the persistence of iterators allows the use of a non-repeatable comparison scheme (e.g., arbitrary car-lane changes during a traffic jam).
In practice, a balanced binary tree can provide the interfaces of vector, list, doubly linked list, map, multimap, deque, queue, priority_queue..., while attaining the theoretically optimal O(log(n)) complexity for all single-element operations.
<sarcastic> this is probably why c++ stl does not propose it </sarcastic>
Individuals rarely implement a general balanced tree by themselves, due to the difficulty of getting the balancing right, especially during element extraction.
There is no widely available implementation of such a balanced binary tree because the state-of-the-art red-black tree implementation (at this time the best type of balanced tree, due to its fixed number of costly tree reorganizations during removal), slavishly copied by every implementer from the structure inventor's initial code, does not allow iterator persistence. That is probably the reason for the absence of a fully functional tree template.