Fastest C++ map? - c++

Correct me if I'm wrong, but std::map is an ordered map, so each time I insert a value the map has to place it in sorted order internally, which takes some time.
My application gets information regarding some items on a constant interval.
This app keeps a map which is defined like this:
::std::map<DWORD, myItem*>
At first all items are considered "new" to the app. An "Item" object is being allocated and added to this map, associating its id and a pointer to it.
When it's not a "new" item (just an update of this object) my app should find the object in the map, using the given id, and update it.
Most of the time I get updates.
My question is:
Is there any faster map implementation or should I keep using this one?
Would I be better off using unordered_map?

Would I be better off using unordered_map?
Possibly.
std::map provides consistent O(log n) performance because it has to be implemented as a balanced tree. std::unordered_map is implemented as a hash table, which can give you O(1) performance (with a good hash function and an even distribution of keys across buckets), but it can degrade to O(n) (everything lands in one bucket and the lookup devolves to a list walk). One would normally expect something in between these extremes.
So you can have reasonable performance (O(log n)) all the time, or you need to ensure everything lines up to get good performance with a hash.
As with any such question: you need to measure before committing to one approach. Unless your datasets are large you might find there is no significant difference.
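A minimal sketch of such a measurement, assuming C++11 and 32-bit unsigned keys standing in for DWORD. The helper names (count_hits, make_filled, time_lookups_ms) are illustrative and not from the question, and the sequential keys are an assumption about the data:

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Counts how many of the given keys are present in the container.
// Works for std::map and std::unordered_map alike.
template <class Map>
std::size_t count_hits(const Map& m, const std::vector<std::uint32_t>& keys) {
    std::size_t hits = 0;
    for (std::uint32_t k : keys) {
        if (m.find(k) != m.end()) ++hits;
    }
    return hits;
}

// Fills a container with keys 0..n-1 (sequential ids are an assumption).
template <class Map>
Map make_filled(std::uint32_t n) {
    Map m;
    for (std::uint32_t i = 0; i < n; ++i) m[i] = static_cast<int>(i);
    return m;
}

// Returns the wall-clock milliseconds spent looking up every key once;
// the hit count is written to hits_out so the work cannot be optimized away.
template <class Map>
double time_lookups_ms(const Map& m, const std::vector<std::uint32_t>& keys,
                       std::size_t& hits_out) {
    auto start = std::chrono::steady_clock::now();
    hits_out = count_hits(m, keys);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Run it with your real key distribution and your real ratio of updates to inserts; timings from synthetic keys may not carry over.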

Important warning: Unless you have measured (and your question suggests that you haven't) that map performance substantially influences your application performance (large percentage of time is spent on searching and updating the map) don't bother with making it faster.
Stick to std::map (or std::unordered_map or any available hash_map implementation).
Speeding up your application by 1% probably will not be worth the effort.
Make it bug free instead.
Echoing Richard's answer: measure performance with different map implementation using your real classes and real data.
Some additional notes:
Understand the difference between expected cost (usually lower for hash maps), worst-case cost (O(log n) for a balanced binary tree, but much higher for a hash map if an insert triggers reallocation of the hash array) and amortized cost (total cost divided by the number of operations or elements; it depends on things like the ratio of new to existing elements). You need to find out which is the most constraining in your case. For example, the reallocation in a hash map can be too expensive if you need to adhere to a very low latency limit.
Find out where real bottleneck is. It might be that cost of searching in map is insignificant compared to e.g. IO cost.
Try more specialized map implementation. For example a lot can be gained if you know something more about map's key. Authors of generic map implementations do not have such knowledge.
In your example (32-bit unsigned integer keys which cluster strongly, e.g. are assigned sequentially) you can use a radix-based approach. A very simple example (treat it as an illustration, not a ready-to-use recipe):
Item *sentinel[65536]; // sentinel page, initialized to NULLs.
Item **pages[65536]; // list of pages (each points to 65536 Item* slots),
// initialized so every element points to sentinel
Then search is as simple as:
Item *value = pages[index >> 16][index & 0xFFFF];
When you need to set new value:
if (pages[index >> 16] == sentinel) {
pages[index >> 16] = allocate_new_null_filled_page();
}
pages[index >> 16][index & 0xFFFF] = value;
Tweak your map implementation.
E.g. every hash_map likes to know approximate number of elements in advance. It helps avoid unnecessary reallocation of hash table and (possibly) rehashing of all keys.
With my specialized example above you certainly would try different page sizes, or three level version.
Common optimization is providing specialized memory allocator to avoid multiple allocations of small objects.
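The radix illustration above can be fleshed out into something compilable. This is only a sketch under stated assumptions: C++11, a placeholder Item type, and a hypothetical class name PagedTable. It uses a null page pointer in place of the shared sentinel page, which trades one extra branch on lookup for simpler ownership:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct Item { int payload; };  // placeholder for the application's item type

// Two-level table: the high 16 bits of the id select a page,
// the low 16 bits select a slot inside that page.
class PagedTable {
public:
    PagedTable() : pages_(kPageCount) {}

    // Returns the stored pointer, or nullptr if the id was never set.
    Item* get(std::uint32_t id) const {
        const Page* page = pages_[id >> 16].get();
        return page ? (*page)[id & 0xFFFF] : nullptr;
    }

    // Stores a pointer, allocating the 64K-entry page on first use.
    void set(std::uint32_t id, Item* value) {
        std::unique_ptr<Page>& page = pages_[id >> 16];
        if (!page) {
            page.reset(new Page());  // value-initialized: every slot is nullptr
        }
        (*page)[id & 0xFFFF] = value;
    }

private:
    static const std::size_t kPageCount = 65536;
    using Page = std::array<Item*, 65536>;
    std::vector<std::unique_ptr<Page>> pages_;
};
```

Each touched page costs 64K pointers, so this only pays off when ids really do cluster; widely scattered ids would allocate a page per item.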

Whenever you insert or delete an item, the memory allocation/deallocation costs a lot. Instead you can use an allocator like this one: https://github.com/moya-lang/Allocator which, according to its author, makes std::map twice as fast; I found it to be even faster, especially for other STL containers.

Related

Difference between multimap and unordered_multimap in c++? [duplicate]

I have a simple requirement, i need a map of type . however i need fastest theoretically possible retrieval time.
I used both map and the newly proposed unordered_map from TR1.
I found that, at least while parsing a file and creating the map by inserting one element at a time,
map took only 2 minutes while unordered_map took 5 minutes.
As it is going to be part of code executed on a Hadoop cluster and will contain ~100 million entries, I need the smallest possible retrieval time.
Also another helpful information:
currently the data (keys) which is being inserted is range of integers from 1,2,... to ~10 million.
I can also require the user to specify the max value and to use the order above; will that significantly affect my implementation? (I heard map is based on red-black trees and that inserting in increasing order leads to better (or worse?) performance.)
here is the code
map<int,int> Label; // this is being changed to unordered_map
fstream LabelFile("Labels.txt");
// Creating the map from the Label.txt
if (LabelFile.is_open())
{
    while (! LabelFile.eof() )
    {
        getline (LabelFile,inputLine);
        try
        {
            curnode=inputLine.substr(0,inputLine.find_first_of("\t"));
            nodelabel=inputLine.substr(inputLine.find_first_of("\t")+1,inputLine.size()-1);
            Label[atoi(curnode.c_str())]=atoi(nodelabel.c_str());
        }
        catch(char* strerr)
        {
            failed=true;
            break;
        }
    }
    LabelFile.close();
}
Tentative solution: after reviewing the comments and answers, I believe a dynamic C++ array would be the best option, since the implementation will use dense keys. Thanks.
Insertion for unordered_map should be O(1) and retrieval should be roughly O(1) (it's essentially a hash table).
So either your timings are way off, or there is something wrong with your implementation or usage of unordered_map.
You need to provide some more information, possibly including how you are using the container.
As per section 6.3 of n1836, the complexities for insertion/retrieval are given:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
One issue you should consider is that your implementation may need to continually be rehashing the structure, as you say you have 100mil+ items. In that case when instantiating the container, if you have a rough idea about how many "unique" elements will be inserted into the container, you can pass that in as a parameter to the constructor and the container will be instantiated accordingly with a bucket-table of appropriate size.
The extra time loading the unordered_map is due to dynamic array resizing. The resizing schedule is to double the number of cells each time the table exceeds its load factor. So from an empty table, expect O(lg n) copies of the entire data table. You can eliminate these extra copies by sizing the hash table upfront. Specifically
Label.rehash(expected_number_of_entries / Label.max_load_factor());
Dividing by the max_load_factor accounts for the empty cells that are necessary for the hash table to operate. (rehash takes a bucket count; the C++11 reserve member takes the element count and applies this division itself.)
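As a small sketch of the presizing idea, assuming C++11's std::unordered_map, where reserve(n) takes the expected element count directly. The function name fill_without_rehash is made up for illustration; it checks that the bucket array does not grow while n keys are inserted:

```cpp
#include <cstddef>
#include <unordered_map>

// Inserts keys 1..n into a presized table and reports whether the
// bucket array kept its size, i.e. no rehash happened along the way.
bool fill_without_rehash(std::size_t n) {
    std::unordered_map<int, int> labels;
    labels.reserve(n);  // C++11: sizes the bucket array for n elements
    const std::size_t buckets_before = labels.bucket_count();
    for (std::size_t i = 1; i <= n; ++i) {
        labels[static_cast<int>(i)] = 0;
    }
    return labels.bucket_count() == buckets_before;
}
```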
unordered_map (at least in most implementations) gives fast retrieval, but relatively poor insertion speed compared to map. A tree is generally at its best when the data is randomly ordered, and at its worst when the data is ordered (you constantly insert at one end of the tree, increasing the frequency of re-balancing).
Given that it's ~10 million total entries, you could just allocate a large enough array, and get really fast lookups -- assuming enough physical memory that it didn't cause thrashing, but that's not a huge amount of memory by modern standards.
Edit: yes, a vector is basically a dynamic array.
Edit2: The code you've added has some problems. Your while (! LabelFile.eof() ) is broken. You normally want to do something like while (LabelFile >> inputdata) instead. You're also reading the data somewhat inefficiently -- what you're apparently expecting is two numbers separated by a tab. That being the case, I'd write the loop something like:
while (LabelFile >> node >> label)
    Label[node] = label;
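Combining the dense-array suggestion with the simplified read loop could look roughly like this. It is a sketch, not the asker's code: load_labels, max_key and the -1 "missing" marker are assumptions introduced here, and reading from a std::istream lets the same function serve a file or a test string:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads "node<TAB>label" pairs and stores each label at index `node`.
// Keys outside [0, max_key] are skipped; unseen keys keep `missing`.
std::vector<int> load_labels(std::istream& in, std::size_t max_key, int missing = -1) {
    std::vector<int> labels(max_key + 1, missing);
    long long node = 0;
    int label = 0;
    while (in >> node >> label) {  // stops cleanly at end of input or bad data
        if (node >= 0 && static_cast<unsigned long long>(node) <= max_key) {
            labels[static_cast<std::size_t>(node)] = label;
        }
    }
    return labels;
}
```

A lookup is then just labels[node], with no hashing and no tree walk, at the cost of one int per possible key.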

Unordered map vs vector

I'm building a little 2D game engine. Now I need to store the prototypes of the game objects (all kinds of information). The container will hold, I guess, a few thousand elements at most, all with unique keys, and no elements will be deleted or added after the first load. The key is a string.
Various threads will run, and I need to send each of them a key (or index) which it uses to access other information (like a texture for the render process or a sound for the mixer process) available only to that thread.
Normally I use vectors because they are much faster at accessing a known element. But I see that unordered_map also usually has constant access time if I use the ::at element access. It would make the code much cleaner and easier to maintain, because I would deal with much more understandable human-made strings.
So the question is: is the difference in speed between an access to vector[n] and an unorderedmap.at("string") negligible compared to its benefits?
From what I understand, being able to access various maps in different parts of the program, with different threads running, using just a "name" is a big deal for me, and the speed difference isn't that great. But I'm too inexperienced to be sure of this. Although I found information about it, I can't really tell whether I'm right or wrong.
Thank you for your time.
As an alternative, you could consider using a sorted vector, because the vector itself will not be modified. You can easily write an implementation yourself with STL lower_bound etc., or use an implementation from a library (boost::flat_map).
There is a blog post from Scott Meyers about container performance in this case. He did some benchmarks, and the conclusion is that an unordered_map is probably a very good choice, with a high chance that it will be the fastest option. If you have a restricted set of keys, you can also compute a perfect hash function, e.g. with gperf.
However, for this kind of problem the first rule is to measure yourself.
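The sorted-vector alternative can be sketched in a few lines with std::sort and std::lower_bound. The Entry alias and the finalize / find_value names are illustrative assumptions, with an int standing in for whatever the prototype record really is:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;  // key -> prototype index (illustrative)

// Sorts the entries by key once, after loading has finished.
void finalize(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
}

// Binary search; returns a pointer to the value, or nullptr if the key is absent.
const int* find_value(const std::vector<Entry>& entries, const std::string& key) {
    auto it = std::lower_bound(
        entries.begin(), entries.end(), key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    if (it != entries.end() && it->first == key) return &it->second;
    return nullptr;
}
```

Because nothing is inserted after the first load, the sort happens once and every later lookup is an O(log n) search over contiguous memory.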
My problem was to find a record in a container by a given std::string key. Only keys that exist are looked up (not finding them was not an option), and the elements of this container are generated only at the beginning of the program and never touched thereafter.
I had huge fears that unordered_map was not fast enough. So I tested it, and I want to share the results, hoping I haven't gotten everything wrong.
I just hope this can help others like me, and that I get some feedback, because in the end I'm a beginner.
So, given a record struct filled randomly like this:
struct The_Mess
{
    std::string A_string;
    long double A_ldouble;
    char C[10];
    int* intPointer;
    std::vector<unsigned int> A_vector;
    std::string Another_String;
};
I made an unordered map, given that A_string contains the key of the record:
std::unordered_map<std::string, The_Mess> The_UnOrdMap;
and a vector I sort by the A_string value (which contains the key):
std::vector<The_Mess> The_Vector;
and also a sorted index vector, used as a third way of access:
std::vector<std::string> index;
The key is a random string of 0-20 characters in length (I wanted the worst possible scenario) containing upper- and lower-case letters, digits or spaces.
So, in short, our contenders are:
Unordered map. I measure the time the program takes to execute:
record = The_UnOrdMap.at( key ); // record is just a The_Mess struct.
Sorted vector. Measured statements:
low = std::lower_bound (The_Vector.begin(), The_Vector.end(), key, compare);
record = *low;
Sorted index vector. Measured statements:
low2 = std::lower_bound( index.begin(), index.end(), key);
indice = low2 - index.begin();
record = The_Vector[indice];
The time is in nanoseconds and is the arithmetic mean of 200 iterations. I have a vector containing all the keys that I shuffle at every iteration, and at every iteration I cycle through it and look up each key in the three ways.
So these are my results:
I think the initial spikes are a fault of my testing logic (the table I iterate over contains only the keys generated so far, so it only has 1-n elements): 200 iterations of a 1-key search the first time, 200 iterations of a 2-key search the second time, etc.
Anyway, it seems that in the end the best option is the unordered map, considering that it is a lot less code, is easier to implement, and will make the whole program much easier to read and probably to maintain/modify.
You have to think about caching as well. In case of std::vector you'll have very good cache performance when accessing the elements - when accessing one element in RAM, CPU will cache nearby memory values and this will include nearby portions of your std::vector.
When you use std::map (or std::unordered_map) this is no longer true. Maps are usually implemented as self balancing binary-search trees, and in this case values can be scattered around the RAM. This imposes great hit on cache performance, especially as maps get bigger and bigger as CPU just cannot cache the memory that you're about to access.
You'll have to run some tests and measure performance, but cache misses can greatly hurt the performance of your program.
You are most likely to get the same performance (the difference will not be measurable).
Contrary to what some people seem to believe, unordered_map is not a binary tree. The underlying data structure is a vector. As a result, cache locality does not matter here - it is the same as for vector. Granted, you are going to suffer if you have collisions due to a bad hash function. But if your key is a simple integer, this is not going to happen. As a result, access to an element in the hash map will be exactly the same as access to an element in the vector, plus the time spent computing the hash value for the integer, which is really not measurable.

Why is std::tr1::unordered_map slower than a homegrown hash map?

I wrote a basic program that takes strings and counts the incidences of unique ones by inserting them into a string->integer hash map.
I use std::tr1::unordered_map for the storage, templated for a custom hash function and a custom equality function. The key type is actually char* rather than the too-slow std::string.
I then changed the same code to use a very, very simple hash table (really an array of {key, value} structures indexed by hash) with a power-of-two size and linear probing for collisions. The program got 33% faster.
Given that when I was using tr1::unordered_map I presized the hash table so it never had to grow, and that I was using exactly the same hash and comparison routines, what is tr1::unordered_map doing that slows it down by 50% as compared to the most basic hash map imaginable?
Code for the hash map type I'm talking about as "simple" here:
typedef struct dataitem {
    char* item;
    size_t count;
} dataitem_t;
dataitem_t hashtable[HASHTABLE_SIZE] = {{NULL,0}}; // Start off with empty table
void insert(char* item) {
    size_t hash = generate_hash(item);
    size_t firsthash = hash;
    while (true) {
        hash &= HASHTABLE_SIZE_MASK; // Bitmasking effect is hash %= HASHTABLE_SIZE
        if (hashtable[hash].item == NULL) { // Free bucket
            hashtable[hash].item = item;
            hashtable[hash].count = 1;
            break;
        }
        if (strcmp(hashtable[hash].item, item) == 0) { // Not hash collision; same item
            hashtable[hash].count += 1;
            break;
        }
        hash++; // Hash collision. Move to next bucket (linear probing)
        if (hash == firsthash) {
            // Table is full. This does not happen because the presizing is correct.
            exit(1);
        }
    }
}
I wish to extend @AProgrammer's answer.
Your hash map is simple because it is custom-tailored to your need. On the other hand, std::tr1::unordered_map has to fulfill a number of different tasks and do well in all cases. This requires a middle-of-the-road approach, so it will never be excellent in any particular area.
Hash containers are very special in that there are many ways to implement them. You chose open addressing, while the standard forces a bucket approach on the implementors. Both have different trade-offs, and this is one reason why the standard, this time, actually enforced a particular implementation: so that performance does not change dramatically when switching from one library to another. Simply specifying Big-O complexity / amortized complexity would not have been enough here.
You say that you told the unordered_map the final number of elements, but did you change the load factor? Chaining is notoriously "bad" (because of the lack of memory locality) in case of collisions, and using a smaller load factor would favor spreading out your elements.
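Lowering the load factor before presizing might look like the following. This is a sketch assuming C++11's std::unordered_map rather than the TR1 one, with std::string keys for brevity; make_sparse_table and the 0.5 value are illustrative choices, not a recommendation from the answer:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Builds a word-count table that aims for at most one element per two buckets.
std::unordered_map<std::string, std::size_t> make_sparse_table(std::size_t expected) {
    std::unordered_map<std::string, std::size_t> counts;
    counts.max_load_factor(0.5f);  // set first, so reserve() sizes the buckets for it
    counts.reserve(expected);
    return counts;
}
```

A lower load factor shortens the chains at the cost of a larger bucket array, so it is worth timing a few values rather than assuming smaller is better.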
Finally, to point out one difference: what happens when you resize your hash map? By using chaining, the unordered_map does not move the elements in memory:
references to them are still valid (even though the iterators may be invalidated)
in case of big or complex objects, there is no invocation of copy constructors
This is in contrast with your simple implementation, which would incur O(N) copies (unless you use linear rehashing to spread out the work, but this is definitely not simple).
It seems, therefore, that the choice for unordered_map was to smooth the spikes, at the cost of a slower average insert.
There is something you can do though: provide a custom allocator. By writing a specific allocator for your use case, you can allocate all its memory in one go (since you know how many objects will be inserted, and can have the allocator report how much memory a node takes), then allocate the nodes in a stack-like fashion (simple pointer increase). It should improve the performance (somewhat).
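A rough sketch of such a stack-like allocator, assuming C++11 and std::unordered_map (the TR1 container predates the minimal allocator interface used here). Arena, ArenaAllocator and CountMap are hypothetical names; the arena never frees individual blocks and must outlive the map, and it assumes no type needs stricter alignment than operator new provides:

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>
#include <vector>

// Fixed-size pool handed out by bumping an offset. Nothing is freed
// individually; all memory is released when the Arena is destroyed.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), used_(0) {}

    void* allocate(std::size_t bytes, std::size_t align) {
        std::size_t start = (used_ + align - 1) / align * align;  // round up
        if (start + bytes > buffer_.size()) throw std::bad_alloc();
        used_ = start + bytes;
        return buffer_.data() + start;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};

// Minimal C++11 allocator that forwards to an Arena.
template <class T>
struct ArenaAllocator {
    using value_type = T;

    explicit ArenaAllocator(Arena* a) noexcept : arena(a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // reclaimed with the arena

    Arena* arena;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

using CountMap = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                    ArenaAllocator<std::pair<const int, int>>>;
```

Construct the allocator from a pointer to an Arena, pass it to the CountMap constructor, and call reserve() before inserting, so that the bucket array is allocated once instead of leaving abandoned copies in the arena after each rehash.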
Your "homegrown hash map" is not a hash map at all, it's an intrusive hash set.
And that's the reason it's faster. Simple as that.
Well, actually intrusive hash set isn't exact either, but it's the closest match.
In general, comparing the speed of components not built to the same spec isn't fair.
Without knowing exactly what you have measured (which mix of operations, at which load factor, with which mix of present/absent data), it is difficult to explain where the difference comes from.
The TR1 implementation in g++ resolves collisions by chaining. This implies dynamic allocation. But it also gives better performance at high load levels.
Your "homegrown" hash map is faster [1] than std::tr1::unordered_map because, as you yourself said, your homegrown hash map is "simple" and it doesn't handle checking if the hash table is full, and possibly many other things that you're not checking before operating on it. That may be the reason why your hash map is faster than std::tr1::unordered_map.
Also, the performance of std::tr1::unordered_map is defined by the implementation, so different implementations will perform differently speed-wise. You can look at its implementation and compare it with yours; that is the first thing you can do, and I believe it will also answer your question to some extent.
[1] I just assumed your claim to be correct, and based the above on it.

Need to store string as id for objects in some fast data structure

I'm implementing a session store for a web server. Keys are strings and stored objects are pointers. I tried using map but need something faster. I will look up an object 5-20 times as frequently as I insert one.
I tried using a hash map but failed. I felt like I got more constraints than free time.
I'm coding c/c++ under Linux.
I don't want to commit to boost, since my web server is going to outlive boost. :)
This is a highly relevant question since the hardware (ssd disk) is
changing rapidly. What was the right solution will not be in 2 years.
I was going to suggest a map, but I see you have already ruled this out.
I tried using map but need something faster.
These are the std::map performance bounds courtesy of the Wikipedia page:
Searching for an element takes O(log n) time
Inserting a new element takes O(log n) time
Incrementing/decrementing an iterator takes O(log n) time
Iterating through every element of a map takes O(n) time
Removing a single map element takes O(log n) time
Copying an entire map takes O(n log n) time.
How have you measured and determined that a map is not optimised sufficiently for you? It's quite possible that any bottlenecks you are seeing are in other parts of the code, and a map is perfectly adequate.
The above bounds seem like they would fit within all but the most stringent scalability requirements.
The type of data structure that will be used will be determined by the data you want to access. Some questions you should ask:
How many items will be in the session store? 50? 100000? 10000000000?
How large is each item in the store (byte size)?
What kind of string input is used for the key? ASCII-7? UTF-8? UCS2?
...
Hash tables generally perform very well for look ups. You can optimize them heavily for speed by writing them yourself (and yes, you can resize the table). Suggestions to improve performance with hash tables:
Choose a good hash function! this will have preferably even distribution among your hash table and will not be time intensive to compute (this will depend on the format of the key input).
If you are using buckets (chaining), make sure the chain length does not exceed 6. If chains grow longer than 6 entries, your hash function probably isn't distributing evenly enough. A chain length of < 3 is preferable.
Watch out for how you allocate your objects. If at all possible, try to allocate them near each other in memory to take advantage of locality of reference. If you need to, write your own sub-allocator/heap manager. Also keep to aligned boundaries for better access speeds (aligned is processor/bus dependent so you'll have to determine if you want to target a particular processor type).
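To make the hash-function suggestion concrete, here is a sketch that plugs 64-bit FNV-1a into std::unordered_map. FNV-1a is my choice for illustration, not something the answer prescribes, and Session, Fnv1aHash and SessionStore are placeholder names. Whether it beats the library's default string hash on your keys is something to measure:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// 64-bit FNV-1a: xor each byte into the state, then multiply by the FNV prime.
struct Fnv1aHash {
    std::size_t operator()(const std::string& s) const noexcept {
        std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ULL;  // FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

struct Session { int user_id; };  // placeholder for the stored object

// String id -> session pointer, hashed with the functor above.
using SessionStore = std::unordered_map<std::string, Session*, Fnv1aHash>;
```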
BTrees are also very good and in general perform well. (Someone can insert info about btrees here).
I'd recommend looking at the data you are storing and making sure that it is as small as possible. Use shorts, unsigned chars and bit fields as necessary. There are other ways to squeeze out improved performance as well, such as allocating your string data at the end of your struct at the same time that you allocate the struct, i.e.
struct foo {
    int a;
    char my_string[0]; // zero-length array (a compiler extension): allocate an instance of foo to be
                       // sizeof(int) + sizeof(your string data) etc
};
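One way to get the same single-allocation layout without relying on the zero-length array extension is to place the string bytes right after the struct. This is a sketch under assumptions: Foo, make_foo and free_foo are illustrative names, and the pointer arithmetic past the object is a common low-level idiom rather than something the standard blesses in every detail:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

struct Foo {
    int a;
    // The string bytes live directly after the struct in the same allocation.
    char* text() { return reinterpret_cast<char*>(this + 1); }
};

// Allocates a Foo plus a copy of `s` (terminator included) in one malloc call.
Foo* make_foo(int a, const char* s) {
    const std::size_t len = std::strlen(s) + 1;
    void* mem = std::malloc(sizeof(Foo) + len);
    if (!mem) throw std::bad_alloc();
    Foo* f = new (mem) Foo();  // start the object's lifetime in the raw block
    f->a = a;
    std::memcpy(f->text(), s, len);
    return f;
}

// Releases the struct and its string together; Foo is trivially destructible.
void free_foo(Foo* f) { std::free(f); }
```

The benefit is locality: the key string sits in the same cache lines as the struct instead of behind a second pointer.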
You may also find that implementing your own string compare routine can actually boost performance dramatically, however this will depend upon your input data.
It is possible to make your own. But you shouldn't have any problems with boost or std::tr1::unordered_map.
A ternary trie may be faster than a hash map for a smaller number of elements.