Memory Allocation in std::map - c++

I am doing a report on the various C++ dictionary implementations (map, dictionary, vectors etc).
The results for insertions using a std::map illustrate that the performance is O(log n). There are also consistent spikes in the performance. I am not 100% sure what's causing this; I think they are caused by memory allocation, but I have been unsuccessful in finding any literature or documentation to prove this.
Can anyone clear this matter up or point me in the right direction?
Cheers.

You are right: it is O(log n) complexity. But this is due to the sorted nature of map (normally binary tree based).
Also see http://www.sgi.com/tech/stl/UniqueSortedAssociativeContainer.html; there is a note on insert. Its worst case is O(log n), and it is amortized O(1) if you can hint where to do the insert.
Maps are normally based on binary trees and need to be balanced to keep good performance. The load spikes you are observing probably correspond to this balancing process.
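To make the hinted insert concrete, here is a minimal sketch. It assumes the keys arrive already sorted, so end() is always the right hint; the function name build_from_sorted and the stored values are made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Builds a map from keys that are already in strictly increasing order.
// Passing end() as the hint says "this key belongs at the back", which is
// the case the hinted overload can handle in amortized constant time.
std::map<int, int> build_from_sorted(const std::vector<int>& sorted_keys) {
    std::map<int, int> m;
    for (int key : sorted_keys) {
        // Precondition of this sketch: each key is larger than the last one.
        assert(m.empty() || m.rbegin()->first < key);
        m.emplace_hint(m.end(), key, key * 10);  // value is arbitrary here
    }
    return m;
}
```

If the hint is wrong the insert still works; it just falls back to the usual O(log n) search.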

The empirical approach isn't strictly necessary when it comes to STL. There's no point in experimenting when the standard clearly dictates the minimal complexity of operations such as std::map insertion.
I urge you to read the standard so you're aware of the minimal complexity guarantees before continuing with experiments. Of course, there might be bugs in whatever STL implementation you happen to be testing; but the popular STLs are pretty well-debugged creatures and very widely used, so I'd doubt it.

If I remember correctly, std::map is a balanced red-black tree. Some of the spikes could be caused when the std::map determines that the underlying tree needs balancing. Also, when a new node is allocated, the OS could contribute to some spikes during the allocation portion.
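One way to check the allocation theory is to count how often the map asks its allocator for memory. The sketch below is only an illustration: CountingAllocator, g_allocations and allocations_for_inserts are hypothetical names, and the counter is a plain global that is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical global counter, for illustration only (not thread-safe).
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to std::allocator and counts the calls.
template <typename T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using CountedMap = std::map<int, int, std::less<int>,
                            CountingAllocator<std::pair<const int, int>>>;

// Inserts n distinct keys and returns how many allocator calls that took.
std::size_t allocations_for_inserts(int n) {
    assert(n >= 0);
    CountedMap m;
    const std::size_t before = g_allocations;
    for (int i = 0; i < n; ++i) m.emplace(i, i);
    return g_allocations - before;
}
```

On the common standard library implementations this should report roughly one allocation per inserted node, which would support the idea that each insert can hit the system allocator.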

Related

Deciding when to use a hash table

I was solving a competitive programming problem with the following requirements:
I had to maintain a list of unique 2D points (x,y); the number of unique points would be less than 500.
My idea was to store them in a hash table (a C++ unordered_set, to be specific), and each time a node turned up I would look it up in the table and insert it if it was not already there.
I also know for a fact that I wouldn't be doing more than 500 lookups.
I saw some solutions that simply search through an (unsorted) array and check whether the node is already there before inserting.
My question is: is there any reasonable way to guess when I should use a hash table over a manual search over keys without having to benchmark them?
I am guessing you are familiar with basic algorithmics and time complexity and with the C++ standard containers, and know that, with luck, hash table access is O(1).
If the hash table code (or some balanced tree code, e.g. using std::map - assuming there is an easy order on keys) is more readable, I would prefer it for that readability reason alone.
Otherwise, you might make some guess taking into account the approximate timing for various operations on a PC. BTW, the entire http://norvig.com/21-days.html page is worth reading.
Basically, memory accesses are much slower than everything else in the CPU. The CPU cache is extremely important. A typical memory access with a cache miss that requires fetching data from DRAM modules is several hundred times slower than an elementary arithmetic operation or machine instruction (e.g. adding two integers in registers).
In practice, it does not matter that much as long as your data is tiny (e.g. less than a thousand elements), since in that case it is likely to sit in L2 cache.
Searching linearly in an array is really fast (since it is very cache friendly), up to several thousand (small) elements.
IIRC, Herb Sutter mentions in some video that even inserting an element inside a vector is practically (but unintuitively) faster, taking into account the time needed to move slices, than inserting it into some balanced tree (or perhaps some other container, e.g. a hash table), up to a container size of several thousand small elements. This is on a typical tablet, desktop or server microprocessor with a multi-megabyte cache. YMMV.
If you really care that much, you cannot avoid benchmarking.
Notice that 500 pairs of integers is probably fitting into the L1 cache!
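To compare the two approaches from the question side by side, here is a rough sketch. Point, PointHash, PointSet and count_unique are illustrative names; the hash combiner is one common choice and not something the standard library provides for std::pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;

// Linear scan over contiguous storage; returns true if p was new.
bool insert_unique(std::vector<Point>& points, const Point& p) {
    if (std::find(points.begin(), points.end(), p) != points.end()) return false;
    points.push_back(p);
    return true;
}

// Hypothetical hash for Point; std::pair has no std::hash specialization.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h1 = std::hash<int>()(p.first);
        std::size_t h2 = std::hash<int>()(p.second);
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};

using PointSet = std::unordered_set<Point, PointHash>;

// Hash-based version; returns true if p was new.
bool insert_unique(PointSet& points, const Point& p) {
    return points.insert(p).second;
}

// Runs both strategies over the same input and returns the distinct count.
std::size_t count_unique(const std::vector<Point>& input) {
    std::vector<Point> seen_vec;
    PointSet seen_set;
    for (const Point& p : input) {
        bool new_in_vec = insert_unique(seen_vec, p);
        bool new_in_set = insert_unique(seen_set, p);
        assert(new_in_vec == new_in_set);  // both strategies must agree
    }
    return seen_vec.size();
}
```

With at most 500 points, the vector version does at most a few hundred comparisons per insert over memory that fits in cache, so either version is likely fine; which one is faster still needs a benchmark.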
My rule of thumb is to assume the processor can deal with 10^9 operations per second.
In your case there are only 500 entries, so an algorithm up to O(N^2) should be safe. By using a contiguous data structure like vector you can benefit from fast cache hits. Also, a hash function can sometimes be costly in terms of the constant factor. However, if you have a data size of 10^6, the safe complexity might be only O(N) in total. In this case you might need to consider an O(1) hash map for a single lookup.
You can use Big O complexity to roughly estimate the performance. For a hash table, searching for an element is between O(1) and O(n) in the worst case. That means that in the best case your access time is independent of the number of elements in your map, but in the worst case it is linearly dependent on the size of your hash table.
A balanced binary search tree has a guaranteed search complexity of O(log n). That means that searching for an element always depends on the number of elements, but in the worst case it is faster than a hash table.
You can look up some Big O Complexities at this handy website here: http://bigocheatsheet.com/

What is the time complexity of CUDA's 'thrust::min_element' function?

Thrust library's documentation doesn't provide the time complexities for the functions. I need to know the time complexity of this particular function. How can I find it out?
The min-element algorithm just finds the minimum value in an unsorted range. If there is any way to do this in less than linear O(n) time-complexity, then my name is Mickey Mouse. And any implementation that would do worse than linear would have to be extremely badly written.
When it comes to the time complexities of algorithms in CUDA Thrust, well, they are mainly a CUDA-based parallelized implementation of the STL algorithms. So, you can generally just refer to the STL documentation.
The fact that the algorithms are parallelized does not change the time complexity. At least, it generally cannot make the time complexity any better. Running things in parallel simply divides the overall execution time by the number of parallel executions. In other words, it only affects the "constant factor" which is omitted from the "Big-O" analysis. You get a certain speed-up factor, but the complexity remains the same. There are usually difficulties and overhead associated with parallelizing, and therefore the speedup is rarely "ideal". It is only very rarely that the complexity is reduced, and only for some carefully crafted, fancy dynamic programming algorithms, not the kind of thing you'll find in CUDA Thrust. So, for Thrust, it's safe to assume all complexities are the same as those for the corresponding or closest-matching STL algorithm.
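As a sanity check of the linear bound, the following sketch counts the comparisons made by the STL's std::min_element on the host. It says nothing about how Thrust schedules the work on the GPU; it only assumes, as the answer does, that the total amount of work is comparable. The helper name min_index is made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the smallest element and reports, through
// `comparisons`, how many comparisons the scan performed.
std::size_t min_index(const std::vector<int>& values, std::size_t& comparisons) {
    assert(!values.empty());
    comparisons = 0;
    auto it = std::min_element(values.begin(), values.end(),
        [&comparisons](int a, int b) { ++comparisons; return a < b; });
    return static_cast<std::size_t>(it - values.begin());
}
```

For a range of n elements the standard specifies exactly n - 1 comparisons for std::min_element, which is the O(n) behaviour described above.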

Map vs Unordered_map-- Multithreading

I have the following requirements:
I need a data structure with key, value pairs (keys are integers, if that helps).
I need the below operations:-
Iteration(most used)
Insertion (2nd most used)
Searching by key and deletion(least)
I plan to use multiple locks over the structure for concurrent access.
What is the ideal data structure to use?
Map or an unordered map?
I think unordered_map makes sense, because I can insert in O(1) and delete in O(1). But I am not sure about the iteration. How bad is the performance compared to map?
Also, I plan to use multiple locks on blocks instead of on the whole structure. Is there any good implementation example of this?
Thanks
The speed of iterator incrementing is O(1) for both containers, although you might get somewhat better cache locality from std::unordered_map.
Apart from the slower O(log N) find/insert/erase functionality of std::map, one other difference is that std::map provides bidirectional iterators, whereas the faster (amortized O(1) element access) std::unordered_map only provides forward iterators.
The excellent book C++ Concurrency in Action: Practical Multithreading by Anthony Williams provides a code example of a multithreaded unordered_map with a lock per entry. This book is highly recommended if you are doing serious multithreaded coding.
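To give a feel for what "multiple locks instead of one" can look like, here is a small sketch with one mutex per bucket. It is not the code from the book; StripedMap and its methods are hypothetical names, the bucket count is fixed at construction, and iteration over the whole map is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical fixed-size hash map with one mutex per bucket, so threads
// touching different buckets do not block each other.
class StripedMap {
public:
    explicit StripedMap(std::size_t bucket_count) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    // Inserts or overwrites; only the target bucket is locked.
    void put(int key, int value) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> lock(b.mutex);
        for (auto& kv : b.items) {
            if (kv.first == key) { kv.second = value; return; }
        }
        b.items.emplace_back(key, value);
    }

    // Copies the value into `out` and returns true if the key exists.
    bool get(int key, int& out) const {
        const Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> lock(b.mutex);
        for (const auto& kv : b.items) {
            if (kv.first == key) { out = kv.second; return true; }
        }
        return false;
    }

    // Removes the key; returns true if it was present.
    bool erase(int key) {
        Bucket& b = bucket_for(key);
        std::lock_guard<std::mutex> lock(b.mutex);
        for (auto it = b.items.begin(); it != b.items.end(); ++it) {
            if (it->first == key) { b.items.erase(it); return true; }
        }
        return false;
    }

private:
    struct Bucket {
        mutable std::mutex mutex;
        std::list<std::pair<int, int>> items;
    };

    Bucket& bucket_for(int key) {
        return buckets_[std::hash<int>()(key) % buckets_.size()];
    }
    const Bucket& bucket_for(int key) const {
        return buckets_[std::hash<int>()(key) % buckets_.size()];
    }

    std::vector<Bucket> buckets_;
};
```

Since iteration is the most used operation in the question, note that a consistent walk over all buckets would need to take every bucket lock (in a fixed order), which gives up most of the benefit while it runs.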
Iteration is not a problem in an unordered_map. It is a little less efficient than a vector, but not largely so.
As always, you will need to benchmark for YOUR use-cases, and compare with other container types if it's a critical part of your code.
Not sure what you mean by "multiple locks on blocks instead of the whole structure" - any container updates will need to be locked for the whole container...
Why not simply use an existing concurrent_unordered_map, which you can find in both TBB and ConcRT?
Have you thought about trying a std::deque? The reasoning is as follows:
Iteration is fast - data should be more or less packed close together (unlike lists)
Insertion (at either end) should be quick - data in a deque is never resized
Searching and deletion are slow (but an uncommon use case).
If the last two cases are common, a std::list may be used. Also consider testing a std::vector, since it is more cache efficient.
Iteration in an unordered_map may be slow due to iterating over a large number of unused elements in the hash table. Insertions will be quick until collision levels become intolerable, at which point the whole data structure will need to be laid out again.
Maps have relatively fast iteration, except that data elements may be far apart. Insertion can be slow due to the re-balancing of the red-black tree that this requires.
The main use case for unordered_maps is fast lookup, O(1). A normal map has fastish lookup, O(log n), but much better iteration performance.
If you have hard real-time requirements, I would recommend map over unordered_map. std::map has guaranteed performance 100% of the time, but the std::unordered_map may do a rehash and completely ruin real-time performance in some critical corner case. In general, I prefer red-black trees (std::map) over hashtables (std::unordered_map) if I need absolute guarantees on worst-case performance.

Fast container for consistent performance

I am looking for a container to support frequent adds/removals. I have no idea how large the container may grow, but I don't want to get stalls due to huge reallocations. I need a good balance between performance and consistent behavior.
Initially, I considered std::tr1::unordered_map but since I don't know the upper bound of the data set, collisions could really slow down the unordered_map's performance. It's not a matter of a good hashing function because no matter how good it is, if the occupancy of the map is more than half the bucket count, collisions will likely be a problem.
Now I'm considering std::map because it doesn't suffer from the issue of collisions but it only has log(n) performance.
Is there a way to intelligently handle collisions when you don't know the target size of an unordered_map? Any other ideas for handling this situation, which I imagine is not uncommon?
Thanks
This is a run-time container, right?
Are you adding at the end (as in push_back) or in the front or the middle?
Are you removing at random locations, or what?
How are you referencing information in it?
Randomly, or from the front or back, or what?
If you need random access, something based on array or hash is preferred.
If reallocation is a big problem, you want something more like a tree or list.
Even so, if you are constantly new-ing (and delete-ing) the objects that you're putting in the container, that alone is likely to consume a large fraction of time,
in which case you might find it makes sense to save used objects in junk lists, so you can recycle them.
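A "junk list" of this kind could look roughly like the sketch below. RecyclingPool is a made-up name, the pool is not thread-safe, and recycled objects keep whatever state they had when they were released.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical recycling pool: released objects are parked on a free list
// and handed out again instead of being deleted and re-allocated.
template <typename T>
class RecyclingPool {
public:
    // Returns a recycled object if one is available, otherwise a new one.
    std::unique_ptr<T> acquire() {
        if (free_list_.empty()) return std::unique_ptr<T>(new T());
        std::unique_ptr<T> obj = std::move(free_list_.back());
        free_list_.pop_back();
        return obj;
    }

    // Parks an object for later reuse; its old state is kept as-is.
    void release(std::unique_ptr<T> obj) {
        assert(obj != nullptr);
        free_list_.push_back(std::move(obj));
    }

    // Number of objects currently waiting to be reused.
    std::size_t idle_count() const { return free_list_.size(); }

private:
    std::vector<std::unique_ptr<T>> free_list_;
};
```

Whether this actually helps depends on how expensive new and delete are for the objects in question, which is again something to measure.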
My suggestion is, rather than agonize over the choice of container, just pick one, write the program, and then tune it, as in this example.
No matter what you choose, you will probably want to change it, maybe more than once.
What I found in that example was that any pre-existing container class is justifying its existence by ease of programming, not by fastest-possible speed.
I know it's counter-intuitive, but
unless some other activity in your program ends up being the dominant cost, and you can't shrink it, your final burst of speed will require hand-coding the data structure.
What kind of access do you need? Sequential, random access, lookup by key? Plus, you can rehash an unordered map manually (with the rehash method) and set its maximum load factor. In any case, the hash table will rebuild itself when chains get too long (i.e., when the load factor is exceeded). Additionally, the slow-down point of a hash table is when it is ~80% full, not 50%.
You should really have read the documentation, for example here.
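The load factor and rehash controls mentioned above can be sketched as follows. This uses the C++11 std::unordered_map rather than the std::tr1 version from the question, and make_presized is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Builds a table sized up front for `expected` elements, so that inserting
// that many elements should not trigger a rehash along the way.
std::unordered_map<int, int> make_presized(std::size_t expected, float max_load) {
    assert(max_load > 0.0f);
    std::unordered_map<int, int> table;
    table.max_load_factor(max_load);  // allowed average elements per bucket
    table.reserve(expected);          // enough buckets for `expected` elements
    return table;
}
```

If the final size really is unknown, this only moves the rehash stalls to larger sizes; it does not remove them. A guess at the upper bound is still needed to avoid them entirely.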

std::map vs. self-written std::vector based dictionary

I'm building a content storage system for my game engine and I'm looking at possible alternatives for storing the data. Since this is a game, it's obvious that performance is important. Especially considering various entities in the engine will be requesting resources from the data structures of the content manager upon their creation. I'd like to be able to search resources by a name instead of an index number, so a dictionary of some sort would be appropriate.
What are the pros and cons to using an std::map and to creating my own dictionary class based on std::vector? Are there any speed differences (if so, where will performance take a hit? I.e. appending vs. accessing) and is there any point in taking the time to writing my own class?
For some background on what needs to happen:
Writing to the data structures occurs only at one time, when the engine loads. So no writing actually occurs during gameplay. When the engine exits, these data structures are to be cleaned up. Reading from them can occur at any time, whenever an entity is created or a map is swapped. There can be as little as one entity being created at a time, or as many as 20, each needing a variable number of resources. Resource size can also vary depending on the size of the file being read in at the start of the engine, images being the smallest and music being the largest depending on the format (.ogg or .midi).
Map: std::map has guaranteed logarithmic lookup complexity. It's usually implemented by experts and will be of high quality (e.g. exception safety). You can use custom allocators for custom memory requirements.
Your solution: It'll be written by you. A vector is for contiguous storage with random access by position, so how will you implement lookup by value? Can you do it with guaranteed logarithmic complexity or better? Do you have specific memory requirements? Are you sure you can implement the lookup algorithm correctly and efficiently?
3rd option: If your key type is string (or something that's expensive to compare), do also consider std::unordered_map, which has constant-time lookup by value in typical situations (but not quite guaranteed).
If you want the speed guarantee of std::map as well as the low memory usage of std::vector you could put your data in a std::vector, std::sort it and then use std::lower_bound to find the elements.
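A sketch of that sorted-vector approach, assuming resources are looked up by name and identified by an int id. Entry, by_name, finalize and find_resource are illustrative names, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource table entry: (name, resource id).
using Entry = std::pair<std::string, int>;

// Orders entries by name only.
bool by_name(const Entry& a, const Entry& b) { return a.first < b.first; }

// Sort once after loading; lookups afterwards are O(log n).
void finalize(std::vector<Entry>& table) {
    std::sort(table.begin(), table.end(), by_name);
}

// Returns a pointer to the id, or nullptr if the name is absent.
// Precondition: the table was sorted with finalize().
const int* find_resource(const std::vector<Entry>& table, const std::string& name) {
    // Debug-only check; it is O(n), so it disappears with NDEBUG.
    assert(std::is_sorted(table.begin(), table.end(), by_name));
    auto it = std::lower_bound(table.begin(), table.end(), name,
        [](const Entry& e, const std::string& key) { return e.first < key; });
    if (it == table.end() || it->first != name) return nullptr;
    return &it->second;
}
```

This fits the load-once, read-many pattern described in the question: all writes happen before finalize(), and nothing is inserted afterwards.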
std::map is written with performance in mind anyway. Whilst it does have some overhead, as its authors have attempted to generalize to all circumstances, it will probably end up more efficient than your own implementation. It typically uses a red-black binary tree, giving all of its operations O(log n) efficiency (aside from copying and iterating, for obvious reasons).
How often will you be reading/writing to the map, and how long will each element be in it? Also, you have to consider how often will you need to resize etc. Each of these questions is crucial to choosing the correct data structure for your implementation.
Overall, one of the std functions will probably be what you want, unless you need functionality which is not in a single one of them, or if you have an idea which could improve on their time complexities.
EDIT: Based on your update, I would agree with Kerrek SB that if you're using C++0x, then std::unordered_map would be a good data structure to use in this case. However, bear in mind that your performance can degrade to linear time complexity if you have colliding hashes (this cannot happen with std::map), as it will store the two pairs in the same bucket. Whilst this is rare, the probability of it obviously increases with the number of elements. So if you're writing a huge game, it's possible that std::unordered_map could become less optimal than std::map. Just a consideration. :)