map vs unordered_map for few elements - c++

I am trying to choose between map and unordered_map for the following use case:
The key of the map is a pointer.
The most common use case is that there will be a single element in the map.
In general, the max number of elements in the map is less than 10.
The map is accessed very often and speed is the most important factor. Changes to the map are infrequent.
While measuring the speed is obviously the correct approach here, this code will be used on several platforms so I'm trying to create a general rule of thumb for choosing between a map and unordered_map based on number of elements. I've seen some posts here that hint that std::map may be faster for a small number of elements, but no definition of "small" was given.
Is there a rule of thumb for when to choose between a map and unordered_map based on number of elements? Is another data structure (such as linear search through a vector) even better?

Under the premise that you always need to measure in order to figure out what's more appropriate in terms of performance, if all these things are true:
Changes to the map are not frequent;
The map contains a maximum of 10 elements;
Lookups will be frequent;
You care a lot about performance;
Then I would say you would be better off putting your elements in an std::vector and performing a plain iteration over all your elements to find the one you're looking for.
An std::vector will allocate its elements in a contiguous region of memory, so cache locality is likely to grant you a greater performance - the time required to fetch a cache line from main memory after a cache miss is at least one order of magnitude higher than the time required to access the CPU cache.
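A minimal sketch of that idea, assuming the mapped value is an int (the question does not say what the values are). The names Entry and find_value are made up for illustration:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical tiny "map" from pointer keys to int values, stored contiguously.
using Entry = std::pair<const void*, int>;

// Plain linear scan over the contiguous storage.
// Returns a pointer to the mapped value, or nullptr if the key is absent.
inline const int* find_value(const std::vector<Entry>& entries, const void* key) {
    auto it = std::find_if(entries.begin(), entries.end(),
                           [key](const Entry& e) { return e.first == key; });
    return it == entries.end() ? nullptr : &it->second;
}
```

With a single element, which is the common case in the question, the lookup is one pointer comparison on data that is very likely already in cache. Whether that beats a particular map implementation on a particular platform still has to be measured.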
Quite interestingly, it seems like Boost's flat_map is ideal for your use case (courtesy of Praetorian):
flat_map is similar to std::map but it's implemented like an ordered vector. (from the online documentation)
So if using Boost is an option for you, you may want to try this one.

I believe that for your case of 10 elements or fewer, and usually only one, a linear search of an unsorted vector will work best. However, depending on the hash algorithm used, the unordered_map may be faster instead.
It should be easy enough for you to benchmark.

Related

Memory efficient std::map alternative

I'm using a std::map to store about 20 million entries. If they were stored without any container overhead, it would take approximately 650MB of memory. However, since they are stored using std::map, it uses up about 15GB of memory (i.e. too much).
The reason I am using an std::map is because I need to find keys that are equal to/larger/smaller than x. This is why something like sparsehash wouldn't work (since, using that, I cannot find keys by comparison).
Is there an alternative to using std::map (or ordered maps in general) that would result in less memory usage?
EDIT: Writing performance is much more important than reading performance. It will probably only read ~10 entries, but I don't know which entries it will read.
One alternative would be to use flat_map from Boost.Containers: that supports the same interface as std::map, but is backed by a sorted contiguous array (think std::vector) instead of a tree. Or hand-roll your own solution based on the same idea.
Its performance characteristic is of course different, due to the different back-end. It's up to you to evaluate whether it's usable in your case.
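For the hand-rolled variant, a rough sketch might look like the following. This is not Boost's implementation; FlatMap, its member names, and the uint64/int key and value types are assumptions chosen for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal "flat map": key/value pairs kept sorted by key in one vector.
class FlatMap {
public:
    using Entry = std::pair<std::uint64_t, int>;

    // Insert or overwrite. O(log N) to locate the slot, O(N) to shift later elements.
    void insert(std::uint64_t key, int value) {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), key, key_less);
        if (it != entries_.end() && it->first == key) {
            it->second = value;
        } else {
            entries_.insert(it, Entry{key, value});
        }
    }

    // First entry whose key is >= x, or nullptr if every key is smaller. O(log N).
    const Entry* first_not_less(std::uint64_t x) const {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), x, key_less);
        return it == entries_.end() ? nullptr : &*it;
    }

    // Exact-match lookup: pointer to the value, or nullptr if the key is absent.
    const int* find(std::uint64_t key) const {
        const Entry* e = first_not_less(key);
        return (e != nullptr && e->first == key) ? &e->second : nullptr;
    }

    std::size_t size() const { return entries_.size(); }

private:
    static bool key_less(const Entry& e, std::uint64_t key) { return e.first < key; }

    std::vector<Entry> entries_;
};
```

The per-element memory cost is just the pair itself, with no tree pointers or per-node allocation overhead. The price is the O(N) shift on every insert into the middle, which matters when writes dominate.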
Are you writing on-the-fly or one time before the lookup is done? If the latter is the case, you shouldn't need a map, you could use std::vector and a one-time sort.
You could just insert everything unsorted into the vector, sort once after everything is there (O(N log N), the same as building a std::map, but with much better performance characteristics) and then look up in the sorted array (O(log N), as with std::map).
And especially if you know the number of elements before reading and could reserve the vector size upfront, that could work pretty well. Or at least if you know some "upper bound" to reserve perhaps slightly more than actually needed but avoid the reallocations.
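The write-then-sort-then-query flow could be sketched like this. Record, collect, finish_writing and the two query helpers are hypothetical names, and the int payload stands in for whatever the real entries carry:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record type; the real payload would be larger.
struct Record {
    std::uint64_t key;
    int payload;
};

// Write phase: reserve once, then append in arrival order (no sorting yet).
inline std::vector<Record> collect(const std::vector<std::uint64_t>& incoming_keys) {
    std::vector<Record> records;
    records.reserve(incoming_keys.size());  // avoids reallocations while appending
    int payload = 0;
    for (std::uint64_t key : incoming_keys) {
        records.push_back(Record{key, payload++});
    }
    return records;
}

// One-time O(N log N) sort after all writes are done.
inline void finish_writing(std::vector<Record>& records) {
    std::sort(records.begin(), records.end(),
              [](const Record& a, const Record& b) { return a.key < b.key; });
}

// Smallest record with key >= x, or nullptr. Requires a sorted vector. O(log N).
inline const Record* smallest_at_or_above(const std::vector<Record>& sorted, std::uint64_t x) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), x,
                               [](const Record& r, std::uint64_t k) { return r.key < k; });
    return it == sorted.end() ? nullptr : &*it;
}

// Largest record with key < x, or nullptr. Requires a sorted vector. O(log N).
inline const Record* largest_below(const std::vector<Record>& sorted, std::uint64_t x) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), x,
                               [](const Record& r, std::uint64_t k) { return r.key < k; });
    return it == sorted.begin() ? nullptr : &*(it - 1);
}
```

This covers the "equal to/larger/smaller than x" queries from the question, provided all writes really do finish before the first read.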
Given your requirements:
Insertion needs to be quick
There are many elements to read
Read-back can be slow
You only read back data once
I'd consider typedef std::pair<uint64, thirty_six_byte_struct> element; and populate a std::list<element>. That will be hard to beat in terms of performance.
For reading back, I'd simply traverse the linked list, checking at every point if you need one of those elements. That's a O(N) traversal but as you say, you'll only do that once.
Turns out the issue wasn't std::map.
I realized I was using 3 separate maps to represent various parts of the same data, and after slimming it down to 1, the difference in memory was entirely negligible.
Looking at the code a little more, I realized code I had written to free a really expensive struct (per element of the map) didn't actually work.
Fixing that part, it now uses <1GB of memory, as it should! :)
TL;DR: std::map's overhead is entirely negligible for this. The issue was my own.

Does a map get slower the longer it is

Will a map get slower the longer it is? I'm not talking about iterating through it, but rather operations like .find() .insert() and .at().
For instance if we have map<int, Object> mapA which contains 100'000'000 elements and map<int, Object> mapB which only contains 100 elements.
Will there be any difference performance wise executing mapA.find(x) and mapB.find(x)?
The complexity of lookup and insertion operations on std::map is logarithmic in the number of elements in the map. So it gets slower as the map gets larger, but only very slowly (slower than any polynomial in the element number). To implement a container with such properties, operations usually take a form of binary search.
To imagine how much slower it is, you essentially require one further operation every time you double the number of elements. So if you need k operations on a map with 4000 elements, you need k + 1 operations on a map with 8000 elements, k + 2 for 16000 elements, and so forth.
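To put numbers on that, here is a small sketch that counts how often n can be halved before reaching zero, which is floor(log2(n)) + 1 and matches the worst-case probe count of a textbook binary search. It is an idealized count, not a measurement of any real std::map, and halving_steps is a made-up helper name:

```cpp
#include <cstdint>

// Number of times n can be halved before reaching zero: floor(log2(n)) + 1 for n >= 1.
// Idealized step count only; real tree lookups differ by constant factors.
inline int halving_steps(std::uint64_t n) {
    int steps = 0;
    while (n > 0) {
        n /= 2;
        ++steps;
    }
    return steps;
}
```

For the sizes in the question, halving_steps(100) is 7 and halving_steps(100'000'000) is 27, so a million-fold increase in size costs roughly four times as many steps in this model. For 4000, 8000 and 16000 elements the counts are 12, 13 and 14, which is the k, k + 1, k + 2 pattern described above.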
By contrast, std::unordered_map does not offer you an ordering of the elements, and in return it gives you a complexity that's constant on average. This container is usually implemented as a hash table. "On average" means that looking up one particular element may take long, but the time it takes to look up many randomly chosen elements, divided by the number of looked-up elements, does not depend on the container size. The unordered map offers you fewer features, and as a result can potentially give you better performance.
However, be careful when choosing which map to use (provided ordering doesn't matter), since asymptotic cost doesn't tell you anything about real wall-clock cost. The cost of hashing involved in the unordered map operations may contribute a significant constant factor that only makes the unordered map faster than the ordered map at large sizes. Moreover, the lack of predictability of the unordered map (along with potential complexity attacks using chosen keys) may make the ordered map preferable in situations where you need control on the worst case rather than the average.
The C++ standard only requires that std::map has logarithmic lookup time; not that it is a logarithm of any particular base or with any particular constant overhead.
As such, asking "how many times slower would a 100 million map be than a 100 map" is nonsensical; it could well be that the overhead easily dominates both, so that the operations are about the same speed. It could even well be that for small sizes the time growth is exponential! By design, none of these things are deducible purely from the specification.
Further, you're asking about time, rather than operations. This depends heavily on access patterns. To use some diagrams from Paul Khuong's (amazing) blog on Binary searches, the runtimes for repeated searches (look at stl, the turquoise line) are almost perfectly logarithmic,
but once you start doing random access the performance becomes decidedly non-logarithmic due to memory access outside of level-1 cache:
Note that goog refers to Google's dense_hash_map, which is akin to unordered_map. In this case, even it does not escape performance degradation at larger sizes.
The latter graph is likely the more telling for most situations, and suggests that looking up a random index in a size-100 map will cost about 10x less than in a size-500'000 map. dense_hash_map will degrade worse than that, in that it will go from almost-free to certainly-not-free, albeit always remaining much faster than the STL's map.
In general, when asking these questions, an approach from theory can only give you very rough answers. A quick look at actual benchmarks and considerations of constant factors is likely to fine-tune these rough answers significantly.
Now, also remember that you're talking about map<int, Object>, which is very different from set<uint32_t>; if the Object is large this will emphasize the cost of cache misses and de-emphasize the cost of traversal.
A pedantic aside.
A quick note about hash maps: Their time complexity is often described as constant time, but this isn't strictly true. Most hash maps rather give you constant time with very high likelihood with regards to lookups, and amortized constant time with very high likelihood with regards to inserts.
The former means that for most hash tables there is an input that makes them perform less than optimal, and for user-input this could be dangerous. For this reason, Rust uses a cryptographic hash by default, Java's HashMap resolves collisions with a binary search and CPython randomizes hashes. Generally if you're exposing your hash table to untrusted input, you should make sure you're using some mitigation of this kind.
Some, like Cuckoo hashes, do better than probabilistic (on constrained data types, given a special kind of hash function) for the case where you're worried about attackers, and incremental resizing removes the amortized time cost (assuming cheap allocations), but neither are commonly used since these are rarely problems that need solving, and the solutions are not free.
That said, if you're struggling to think of why we'd go through the hassle of using unordered maps, look back up at the graphs. They're fast, and you should use them.

C++: container replacement for vector/deque for huge sizes

My application has containers with 100 million and more elements.
I'm on the hunt for a container which behaves - time-wise - better than std::deque (let alone std::vector) with respect to frequent insertions and deletions all over the container ... including near the middle. Access time to the n-th element does not need to be as fast as vector, but should definitely be better than full traversal like in std::list (which has a huge memory overhead per element anyway).
Elements should be treated ordered by index (like vector, deque, list), so std::set or std::unordered_set also do not work well.
Before I sit down and code such a container myself: has anyone seen such a beast already? I'm pretty sure the STL hasn't anything like this. Looking at Boost I did not find something I could use, but I may be wrong.
Any hints?
There's a whole STL replacement for big data, in case your app is centric to such data:
STXXL - http://stxxl.sourceforge.net/
edit: I was actually a bit fast to answer. 100 million is not really a large number. E.g., if each element is one byte, you could save it in a 96MiB array. So for STXXL to be of any use, the size of an element should be significantly bigger.
I think you can get the performance characteristics that you want with a skip list:
https://en.wikipedia.org/wiki/Skip_list#Indexable_skiplist
It's the "indexable" part that you're interested in, of course -- you don't actually want the items to be sorted. So some modification is needed that I leave as an exercise.
You might find that 100 million list nodes begins to strain a 32 bit address space, but probably not an issue in 64 bits.
1) If the data is highly sparse, i.e. has lots of zeroes or can be expressed as such, I would highly recommend a data structure that takes advantage of that:
sparselib++ for matrices
sparsehash for hash maps
2) Hash maps should do O(1) for all the operations you describe and the sparsehash implementation I mentioned earlier is particularly space-efficient; it also includes a sparsetable type which is a bit more low-level and can be used in place of an array.
3) If the strict ordering is not that important (it probably is, because you mentioned elements should be treated ordered by index), you can swap the elements you want to erase to the end of the vector and then resize to do removal in O(1). Insertion would just be push_back.
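The swap-to-the-end removal from point 3 can be sketched as follows. The function name unordered_erase is made up, and the caller is assumed to pass a valid index:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Erase the element at `index` in O(1) by overwriting it with the last element.
// The relative order of the remaining elements is NOT preserved.
// Precondition (assumed, not checked): index < v.size().
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t index) {
    if (index + 1 != v.size()) {
        v[index] = std::move(v.back());
    }
    v.pop_back();
}
```

Because the last element moves into the hole, any index the application holds for that element becomes stale, which is why this only fits when positions carry no meaning.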
Try a hash map. The STL has several, all with the unordered naming prefix, such as unordered_map. They have constant-time insertion and lookup given a good hashing algorithm. With your 'huge' data set the hash map would most likely cover your needs. Making a slight change to the application to cover the differences in the interfaces is trivial.

How large does a collection have to be for std::map<k,v> to outpace a sorted std::vector<std::pair<k,v> >?

How large does a collection have to be for std::map<k,v> to outpace a sorted std::vector<std::pair<k,v> >?
I've got a system where I need several thousand associative containers, and std::map seems to carry a lot of overhead in terms of CPU cache. I've heard somewhere that for small collections std::vector can be faster -- but I'm wondering where that line is....
EDIT: I'm talking about 5 items or fewer at a time in a given structure. I'm concerned most with execution time, not storage space. I know that questions like this are inherently platform-specific, but I'm looking for a "rule of thumb" to use.
Billy3
It's not really a question of size, but of usage.
A sorted vector works well when the usage pattern is that you read the data, then you do lookups in the data.
A map works well when the usage pattern involves a more or less arbitrary mixture of modifying the data (adding or deleting items) and doing queries on the data.
The reason for this is fairly simple: a map has higher overhead on an individual lookup (thanks to using linked nodes instead of a monolithic block of storage). An insertion or deletion that maintains order, however, has a complexity of only O(lg N). An insertion or deletion that maintains order in a vector has a complexity of O(N) instead.
There are, of course, various hybrid structures that can be helpful to consider as well. For example, even when data is being updated dynamically, you often start with a big bunch of data, and make a relatively small number of changes at a time to it. In this case, you can load your data into memory into a sorted vector, and keep the (small number of) added objects in a separate vector. Since that second vector is normally quite small, you simply don't bother with sorting it. When/if it gets too big, you sort it and merge it with the main data set.
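One possible shape for such a hybrid, sketched for plain ints. HybridSet, pending_limit and the default limit of 64 are assumptions for illustration, and duplicates are simply kept:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical hybrid container: a large sorted vector plus a small unsorted
// "pending" vector that absorbs new insertions until it grows past a limit.
class HybridSet {
public:
    explicit HybridSet(std::vector<int> initial, std::size_t pending_limit = 64)
        : main_(std::move(initial)), pending_limit_(pending_limit) {
        std::sort(main_.begin(), main_.end());
    }

    // Cheap append; merges into the main data only when pending gets too big.
    void insert(int value) {
        pending_.push_back(value);
        if (pending_.size() > pending_limit_) {
            merge_pending();
        }
    }

    // Binary search over the main data, linear scan over the small pending part.
    bool contains(int value) const {
        return std::binary_search(main_.begin(), main_.end(), value) ||
               std::find(pending_.begin(), pending_.end(), value) != pending_.end();
    }

    std::size_t pending_size() const { return pending_.size(); }

private:
    // Sort the pending items, append them, and merge the two sorted runs in place.
    void merge_pending() {
        std::sort(pending_.begin(), pending_.end());
        const auto middle = static_cast<std::ptrdiff_t>(main_.size());
        main_.insert(main_.end(), pending_.begin(), pending_.end());
        std::inplace_merge(main_.begin(), main_.begin() + middle, main_.end());
        pending_.clear();
    }

    std::vector<int> main_;
    std::vector<int> pending_;
    std::size_t pending_limit_;
};
```

The limit is a tuning knob: a larger value means fewer merges but a longer linear scan on every lookup, so the right setting depends on the insert-to-lookup ratio.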
Edit2: (in response to edit in question). If you're talking about 5 items or fewer, you're probably best off ignoring all of the above. Just leave the data unsorted, and do a linear search. For a collection this small, there's effectively almost no difference between a linear search and a binary search. For a linear search you expect to scan half the items on average, giving ~2.5 comparisons. For a binary search you're talking about log2 N, which (if my math is working this time of the morning) works out to ~2.3 -- too small a difference to care about or notice (in fact, a binary search has enough overhead that it could very easily end up slower).
If by "outspace" you mean consuming more space (aka memory), then it's very likely that vector will always be more efficient (the underlying implementation is a contiguous memory array with no other data, whereas map is a tree, so every element uses extra space). This however depends on how much extra space the vector reserves for future inserts.
When it is about time (and not space), vector will also always be more effective (doing a dichotomic search). But it will be extremely bad for adding new elements (or removing them).
So: no simple answer! Look up the complexities and think about how you are going to use the container. http://www.cplusplus.com/reference/stl/
The main issue with std::map is an issue of cache, as you pointed.
The sorted vector is a well-known approach: Loki::AssocVector.
For very small datasets, the AssocVector should crush the map despite the copy involved during insertion simply because of cache locality. The AssocVector will also outperform the map for read-only usage. Binary search is more efficient there (fewer pointers to follow).
For all other uses, you'll need to profile...
There is however a hybrid alternative that you might wish to consider: using the Allocator parameter of the map to restrict the memory area where the items are allocated, thus minimizing the locality-of-reference issue (the root of cache misses).
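One way to try the allocator idea today is the polymorphic allocator machinery from C++17, which postdates this answer, so treat the following as an assumption about how one might do it rather than what the author had in mind. The function name lookup_in_arena_map and the 1024-byte buffer are arbitrary:

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <memory_resource>

// Builds a small std::pmr::map whose nodes are carved out of one stack buffer,
// then looks up `key`. Returns -1 if the key is absent.
// Requires C++17 (<memory_resource>).
inline int lookup_in_arena_map(int key) {
    std::array<std::byte, 1024> buffer;  // backing storage for the map nodes
    std::pmr::monotonic_buffer_resource arena(buffer.data(), buffer.size());

    // The map must be destroyed before the arena, hence the declaration order.
    std::pmr::map<int, int> squares(&arena);
    for (int i = 1; i <= 5; ++i) {
        squares.emplace(i, i * i);  // node memory comes from `arena` while it lasts
    }

    auto it = squares.find(key);
    return it == squares.end() ? -1 : it->second;
}
```

The nodes are still linked by pointers, but they now sit next to each other in one small region instead of wherever the general-purpose allocator happened to place them. If the buffer runs out, monotonic_buffer_resource falls back to its upstream resource, so the locality benefit quietly disappears for the overflow.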
There is also a paradigm shift that you might consider: do you need sorted items, or fast lookup?
In C++, the only STL-compliant containers for fast-lookup have been implemented in terms of Sorted Associative Containers for years. However the up-coming C++0x features the long awaited unordered_map which could out perform all the above solutions!
EDIT: Seeing as you're talking about 5 items or fewer:
Sorting involves swapping items. When inserting into std::map, that will only involve pointer swaps. Whether a vector or map will be faster depends on how fast it is to swap two elements.
I suggest you profile your application to figure it out.
If you want a simple and general rule, then you're out of luck - you'll need to consider at least the following factors:
Time
How often do you insert new items compared to how often you lookup?
Can you batch inserts of new items?
How expensive is sorting your vector? Vectors of elements that are expensive to swap become very expensive to sort - vectors of pointers take far less.
Memory
How much overhead per allocation does the allocator you're using have? std::map will perform one allocation per item.
How big are your key/value pairs?
How big are your pointers? (32/64 bit)
How fast does your implementation of std::vector grow? (Popular growth factors are 1.5 and 2)
Past a certain size of container and element, the overhead of allocation and tree pointers will become outweighed by the cost of the unused memory at the end of the vector - but by far the easiest way to find out if and when this occurs is by measuring.
It has to be in the millions of items. And even there ...
I am thinking here more of memory usage and memory accesses. Under hundreds of thousands, take whatever you want, there will be no noticeable difference. CPUs are really fast these days, and the bottleneck is memory latency.
But even with millions of items, if your map<> has been built by inserting elements in random order, then when you want to traverse your map (in sorted order) you'll end up jumping around randomly in memory, stalling the CPU while it waits for memory, resulting in poor performance.
On the other side, if your millions of items are in a vector, traversing it is really fast, taking advantage of the CPU's memory prefetching.
As others have written, it depends on your usage.
Edit: I would more question the way to organize your thousands of associative containers than the containers themselves if they contain only 5 items.

c++ container for checking whether ordered data is in a collection

I have data that is a set of ordered ints
[0] = 12345
[1] = 12346
[2] = 12454
etc.
I need to check whether a value is in the collection in C++, what container will have the lowest complexity upon retrieval? In this case, the data does not grow after initialization. In C# I would use a dictionary, in C++, I could either use a hash_map or set. If the data were unordered, I would use boost's unordered collections. However, do I have better options since the data is ordered? Thanks
EDIT: The size of the collection is a couple of hundred items
Just to detail a bit over what has already been said.
Sorted Containers
The immutability is extremely important here: std::map and std::set are usually implemented in terms of binary trees (red-black trees for my few versions of the STL) because of the requirements on insertion, retrieval and deletion operation (and notably because of the invalidation of iterators requirements).
However, because of immutability, as you suspected there are other candidates, not the least of them being array-like containers. They have here a few advantages:
minimal overhead (in term of memory)
contiguity of memory, and thus cache locality
Several "Random Access Containers" are available here:
Boost.Array
std::vector
std::deque
So the only thing you actually need to do can be broken down into 2 steps:
push all your values in the container of your choice, then (after all have been inserted) use std::sort on it.
search for the value using std::binary_search, which has O(log(n)) complexity
Because of cache locality, the search will in fact be faster even though the asymptotic behavior is similar.
If you don't want to reinvent the wheel, you can also check Alexandrescu's AssocVector. Alexandrescu basically ported the std::set and std::map interfaces over a std::vector:
because it's faster for small datasets
because it can be faster for frozen datasets
Unsorted Containers
Actually, if you really don't care about order and your collection is kind of big, then an unordered_set will be faster, especially because integers are so trivial to hash: size_t hash_method(int i) { return i; }.
This could work very well... unless you're faced with a collection that somehow causes a lot of collisions, because then unsorted containers will search over the "collisions" list of a given hash in linear time.
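A short sketch of the hashed variant using std::unordered_set, with a helper for inspecting how long the collision chain for a given value is. The helper names make_lookup, contains and chain_length are invented for this example:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Build the hash set once from the fixed data.
inline std::unordered_set<int> make_lookup(const std::vector<int>& values) {
    return std::unordered_set<int>(values.begin(), values.end());
}

// Average O(1) membership test; degrades toward a linear walk of one bucket
// when many stored values collide.
inline bool contains(const std::unordered_set<int>& lookup, int value) {
    return lookup.find(value) != lookup.end();
}

// Number of elements in the bucket `value` hashes to.
// Assumes the set is non-empty, so that bucket_count() is greater than zero.
inline std::size_t chain_length(const std::unordered_set<int>& lookup, int value) {
    return lookup.bucket_size(lookup.bucket(value));
}
```

Running chain_length over a real dataset is a quick way to check whether the collision scenario described above actually occurs for your keys.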
Conclusion
Just try the sorted std::vector approach and the boost::unordered_set approach with a "real" dataset (and all optimizations on) and pick whichever gives you the best result.
Unfortunately we can't really help more there, because it heavily depends on the size of the dataset and the distribution of its elements.
If the data is in an ordered random-access container (e.g. std::vector, std::deque, or a plain array), then std::binary_search will find whether a value exists in logarithmic time. If you need to find where it is, use std::lower_bound (also logarithmic).
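Both calls in a small sketch, assuming the data lives in a sorted std::vector<int>. The wrapper names is_member and index_of are made up:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// True if `value` occurs in the sorted vector. O(log n).
inline bool is_member(const std::vector<int>& sorted, int value) {
    return std::binary_search(sorted.begin(), sorted.end(), value);
}

// Index of `value` in the sorted vector, or sorted.size() if it is absent. O(log n).
inline std::size_t index_of(const std::vector<int>& sorted, int value) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it == sorted.end() || *it != value) {
        return sorted.size();
    }
    return static_cast<std::size_t>(it - sorted.begin());
}
```

Note that std::lower_bound returns the first position where the value could be inserted, so the equality check afterwards is what turns it into an exact-match lookup.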
Use a sorted std::vector, and use a std::binary_search to search it.
Your other options would be a hash_map (not in the C++ standard yet but there are other options, e.g. SGI's hash_map and boost::unordered_map), or an std::map.
If you're never adding to your collection, a sorted vector with binary_search will most likely have better performance than a map.
I'd suggest using a std::vector<int> to store them and a std::binary_search or std::lower_bound to retrieve them.
Both std::unordered_set and std::set add significant memory overhead - and even though the unordered_set provides O(1) lookup, the O(logn) binary search will probably outperform it given that the data is stored contiguously (no pointer following, less chance of a page fault etc.) and you don't need to calculate a hash function.
If you already have an ordered array or std::vector<int> or similar container of the data, you can just use std::binary_search to probe each value. No setup time, but each probe will take O(log n) time, where n is the number of ordered ints you've got.
Alternately, you can use some sort of hash, such as boost::unordered_set<int>. This will require some time to set up, and probably more space, but each probe will take O(1) time on the average. (For small n, this O(1) could be more than the previous O(log n). Of course, for small n, the time is negligible anyway.)
There is no point in looking at anything like std::set or std::map, since those offer no advantage over binary search, given that the list of numbers to match will not change after being initialized.
So, the questions are the approximate value of n, and how many times you intend to probe the table. If you aren't going to check many values to see if they're in the ints provided, then setup time is very important, and std::binary_search on the sorted container is the way to go. If you're going to check a lot of values, it may be worth setting up a hash table. If n is large, the hash table will be faster for probing than binary search, and if there's a lot of probes this is the main cost.
So, if the number of ints to compare is reasonably small, or the number of probe values is small, go with the binary search. If the number of ints is large, and the number of probes is large, use the hash table.