Inserting into unordered_map takes too much time - C++

I wrote a program that reads numbers out of a file (about 500,000 of them) and inserts them into a data structure. The numbers are distinct.
I'm inserting the numbers into an unordered_map together with an empty struct (using std::make_pair(myNumber, emptyStruct)).
After inserting all the numbers, I use the map to search only a couple of hundred times. I never delete the data structure until the program finishes running.
After profiling, I've noticed that the insert operation takes about 50% of the running time. (There is also other code that runs as many times as the insertion, but it doesn't take nearly as long.)
I thought maybe the resizing was taking time, so I called reserve with 500,000, but the results are still the same.
As far as I know, this data structure should offer O(1) insert and search (and the trade-off is large memory use), so I don't see why insertion takes so much time. How can I improve my results?

Unordered maps are implemented with a hash table, which has amortized constant insertion time. Reserving space in the map helps, but not by much. There is not much better you can do in terms of insertions.
This means you might be able to shave some time, but only marginally. For instance, inserting into a vector is slightly faster, but it is also amortized constant time, so you would shave a few seconds on insertion at the cost of the search.
This is where a database helps. Say you have the data in an sqlite database instead. You create the database, create a table with the search value as its primary key and the data value as its other attribute, and insert the values once. From then on, the program simply queries the database and reads only the minimum necessary. In this case, the sqlite database takes the role of the unordered map you are using.
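A minimal sketch of that approach using the sqlite3 C API (the database file name, table name, and sample value are illustrative; the setup and insert steps would normally run only once, not on every query run):

#include <sqlite3.h>
#include <cstdio>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("numbers.db", &db) != SQLITE_OK) return 1;

    // One-time setup: the primary key gives an index for fast lookups.
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS numbers (value INTEGER PRIMARY KEY);",
        nullptr, nullptr, nullptr);
    sqlite3_exec(db, "INSERT OR IGNORE INTO numbers VALUES (12345);",
        nullptr, nullptr, nullptr);

    // A later run only pays for the lookups it actually performs.
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "SELECT 1 FROM numbers WHERE value = ?;",
                       -1, &stmt, nullptr);
    sqlite3_bind_int(stmt, 1, 12345);
    bool found = (sqlite3_step(stmt) == SQLITE_ROW);
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    std::printf("found: %d\n", (int)found);
}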

Since you are specifically not using a value, and merely testing for existence, go with std::unordered_set. It does exactly what you wanted when you paired a dummy value with every key in the map.
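A minimal sketch of that load-then-probe pattern (the input file name is illustrative):

#include <fstream>
#include <unordered_set>

int main() {
    std::unordered_set<long> numbers;
    numbers.reserve(500000);       // size the table up front to avoid rehashing

    std::ifstream in("numbers.txt");
    long n;
    while (in >> n)
        numbers.insert(n);

    // Later: a couple of hundred membership tests.
    bool present = numbers.count(12345) != 0;
    (void)present;
}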
First, I want to reiterate what everyone has said: inserting 500,000 items just to use them a few hundred times is going to take up a sizable chunk of your time, and you can't really avoid that, unless you can turn it around: build a set of the things you are searching for, then search it 500,000 times.
All that said, I was able to get some improvement on inserting 500,000 items in a test app by taking into account the nature of hash tables:
Reviewing http://en.cppreference.com/w/cpp/container/unordered_map, I found these:
[Insert] Complexity: Average case: O(1), worst case O(size())
By default, unordered_map containers have a max_load_factor of 1.0.
When you reserve space for 500000 items, you get 500000 buckets. If you put 500000 pieces of data in 500000 buckets, you are going to get a lot of collisions. I reserved extra space, and it was faster.
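A sketch of that adjustment (the 2x factor is just a knob to tune; lowering max_load_factor pulls the same lever):

#include <unordered_map>

struct Payload {};   // stand-in for the struct stored with each number

int main() {
    std::unordered_map<long, Payload> m;

    // Reserving room for more elements than you will insert keeps the load
    // factor low, so fewer keys land in the same bucket.
    m.reserve(2 * 500000);

    // Equivalent alternative: keep reserve(500000) but cap the load factor.
    // m.max_load_factor(0.5f);
}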
If you really need speed and are willing to accept some errors (false positives), look into Bloom filters.

Related

Difference between multimap and unordered_multimap in C++? [duplicate]

Unordered map vs vector

I'm building a little 2D game engine. Now I need to store the prototypes of the game objects (all types of information). The container will have at most, I guess, a few thousand elements, all with unique keys, and no elements will be deleted or added after the first load. The key value is a string.
Various threads will run, and I need to send every one a key (or index) and with that access other information (like a texture for the render process or a sound for the mixer process) available only to those threads.
Normally I use vectors because they are way faster at accessing a known element. But I see that an unordered map also usually has constant-time access if I use ::at. It would make the code much cleaner and easier to maintain, because I would deal with much more understandable man-made strings.
So the question is: is the difference in speed between an access to vector[n] and an access to unordered_map.at("string") negligible compared to its benefits?
From what I understand, being able to access various maps in different parts of the program, with different threads running, using just a "name" is a big deal for me, and the speed difference isn't that great. But I'm too inexperienced to be sure of this; although I found information about it, I can't really tell whether I'm right or wrong.
Thank you for your time.
As an alternative, you could consider using an ordered vector, because the vector itself will not be modified after the load. You can easily write the lookup yourself with std::lower_bound etc., or use an implementation from a library (boost::flat_map).
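A minimal sketch of the sorted-vector idea, with a placeholder Proto type standing in for your prototype data:

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

struct Proto { /* textures, sounds, ... */ };

std::vector<std::pair<std::string, Proto>> table;

// Call once after the initial load; lookups are valid from then on.
void finalize() {
    std::sort(table.begin(), table.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
}

const Proto* find_proto(const std::string& key) {
    auto it = std::lower_bound(
        table.begin(), table.end(), key,
        [](const auto& e, const std::string& k) { return e.first < k; });
    return (it != table.end() && it->first == key) ? &it->second : nullptr;
}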
There is a blog post from Scott Meyers about container performance in this case. He did some benchmarks, and the conclusion was that an unordered_map is probably a very good choice, with high chances that it will be the fastest option. If you have a restricted set of keys, you can also compute a minimal perfect hash function, e.g. with gperf.
However, for this kind of problem the first rule is to measure yourself.
My problem was to find a record in a container by a given std::string key, considering that the keys always exist (not finding one was not an option) and that the elements of the container are generated only at the beginning of the program and never touched thereafter.
I had huge fears that the unordered map would not be fast enough. So I tested it, and I want to share the results, hoping I haven't gotten everything wrong.
I just hope this can help others like me, and to get some feedback, because in the end I'm a beginner.
So, given a record struct filled randomly like this:
struct The_Mess
{
    std::string A_string;
    long double A_ldouble;
    char C[10];
    int* intPointer;
    std::vector<unsigned int> A_vector;
    std::string Another_String;
};
I made an unordered_map, given that A_string contains the key of the record:
std::unordered_map<std::string, The_Mess> The_UnOrdMap;
and a vector I sort by the A_string value (which contains the key):
std::vector<The_Mess> The_Vector;
along with a sorted index vector, used for access as a third way:
std::vector<std::string> index;
The key is a random string 0-20 characters in length (I wanted the worst possible scenario) containing capital and lowercase letters, numbers, and spaces.
So, in short, our contenders are:
Unordered map. I measure the time the program takes to execute:
record = The_UnOrdMap.at( key );  // record is just a The_Mess struct
Sorted vector. Measured statements:
low = std::lower_bound (The_Vector.begin(), The_Vector.end(), key, compare);
record = *low;
Sorted index vector. Measured statements:
low2 = std::lower_bound( index.begin(), index.end(), key);
indice = low2 - index.begin();
record = The_Vector[indice];
The time is in nanoseconds and is an arithmetic average over 200 iterations. I have a vector containing all the keys, which I shuffle at every iteration; at every iteration I cycle through it and look up each key in the three ways.
So these are my results:
I think the initial spikes are a fault of my testing logic (the table I iterate over contains only the keys generated so far, so it has only 1-n elements): 200 iterations searching 1 key the first time, 200 iterations searching 2 keys the second time, etc.
Anyway, it seems that in the end the best option is the unordered map, considering that it is a lot less code, easier to implement, and will make the whole program way easier to read and probably to maintain/modify.
You have to think about caching as well. With std::vector you'll get very good cache performance when accessing elements: when one element is fetched from RAM, the CPU caches nearby memory, which includes nearby portions of your std::vector.
With std::map this is no longer true. std::map is usually implemented as a self-balancing binary search tree whose nodes can be scattered around RAM, which imposes a great hit on cache performance as the map grows, since the CPU cannot prefetch the memory you're about to access. (std::unordered_map is a hash table rather than a tree, but its elements are also individually allocated nodes, so it shares part of the scattered-memory problem.)
You'll have to run some tests and measure performance, but cache misses can greatly hurt the performance of your program.
You are most likely to get the same performance (the difference will not be measurable).
Contrary to what some people seem to believe, unordered_map is not a binary tree. Its backbone is a contiguous bucket array, with the elements reachable from it, so cache locality is much better than a tree's, though not quite a plain vector's, since the elements themselves are node-allocated: a lookup typically touches the bucket array plus one or two nodes. Granted, you are going to suffer if you have collisions due to a bad hash function, but if your key is a simple integer, this is not going to happen. As a result, access to an element in the hash map will be almost the same as access to an element in a vector, plus the time spent hashing an integer, which is really not measurable.

Difference in performance between map and unordered_map in C++

I have a simple requirement: I need a map of type map<int,int>. However, I need the theoretically fastest possible retrieval time.
I used both map and the newly proposed unordered_map from TR1.
I found that, at least while parsing a file and creating the map by inserting one element at a time,
map took only 2 minutes while unordered_map took 5 minutes.
As it is going to be part of code executed on a Hadoop cluster and will contain ~100 million entries, I need the smallest possible retrieval time.
Another helpful piece of information:
currently the keys being inserted are the range of integers from 1, 2, ... to ~10 million.
I can also require the user to specify the max value and to insert in order as above; will that significantly affect my implementation? (I heard map is based on red-black trees, and inserting in increasing order leads to better performance (or worse?).)
Here is the code:
map<int,int> Label // this is being changed to unordered_map
fstream LabelFile("Labels.txt");
// Creating the map from the Label.txt
if (LabelFile.is_open())
{
while (! LabelFile.eof() )
{
getline (LabelFile,inputLine);
try
{
curnode=inputLine.substr(0,inputLine.find_first_of("\t"));
nodelabel=inputLine.substr(inputLine.find_first_of("\t")+1,inputLine.size()-1);
Label[atoi(curnode.c_str())]=atoi(nodelabel.c_str());
}
catch(char* strerr)
{
failed=true;
break;
}
}
LabelFile.close();
}
Tentative solution: after reviewing the comments and answers, I believe a dynamic C++ array would be the best option, since the implementation will use dense keys. Thanks.
Insertion for unordered_map should be O(1) and retrieval roughly O(1); it's essentially a hash table.
Your timings, as a result, are way off, or there is something wrong with your implementation or usage of unordered_map.
You need to provide some more information, and possibly show how you are using the container.
As per section 6.3 of n1836, the complexities for insertion/retrieval are given:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
One issue you should consider is that your implementation may need to rehash the structure continually, since you say you have 100M+ items. In that case, when instantiating the container, if you have a rough idea how many "unique" elements will be inserted, you can pass that in as a parameter to the constructor and the container will be instantiated with a bucket table of appropriate size.
The extra time loading the unordered_map is due to dynamic array resizing. The resizing schedule doubles the number of cells each time the table exceeds its load factor, so from an empty table, expect O(lg n) rehashes of the entire data table. You can eliminate these extra copies by sizing the hash table up front. Specifically:
Label.rehash(expected_number_of_entries / Label.max_load_factor());
Dividing by the max_load_factor accounts for the empty cells that are necessary for the hash table to operate. (Note that reserve(expected_number_of_entries) performs this division for you.)
unordered_map (at least in most implementations) gives fast retrieval, but relatively poor insertion speed compared to map. A tree is generally at its best when the data is randomly ordered, and at its worst when the data is ordered (you constantly insert at one end of the tree, increasing the frequency of re-balancing).
Given that it's ~10 million total entries, you could just allocate a large enough array, and get really fast lookups -- assuming enough physical memory that it didn't cause thrashing, but that's not a huge amount of memory by modern standards.
Edit: yes, a vector is basically a dynamic array.
Edit 2: The code you've added has some problems. Your while (! LabelFile.eof() ) loop is broken; you normally want something like while (LabelFile >> inputdata) instead. You're also reading the data somewhat inefficiently: what you're apparently expecting is two numbers separated by a tab. That being the case, I'd write the loop something like:
while (LabelFile >> node >> label)
    Label[node] = label;
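Putting the sizing advice and the corrected loop together, a minimal sketch (the file name and the ~100 million entry count are taken from the question):

#include <fstream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> Label;
    Label.reserve(100000000);   // reserve() divides by max_load_factor itself

    std::ifstream LabelFile("Labels.txt");
    int node, label;
    // operator>> skips the tab and newline whitespace and fails cleanly at
    // end of file, which is what the eof() loop was trying to achieve.
    while (LabelFile >> node >> label)
        Label[node] = label;
}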

Need to store string as id for objects in some fast data structure

I'm implementing a session store for a web server. Keys are strings and the stored objects are pointers. I tried using map, but I need something faster: I will look up an object 5-20 times as frequently as I insert one.
I tried using a hash map but failed; I felt like I had more constraints than free time.
I'm coding C/C++ under Linux.
I don't want to commit to boost, since my web server is going to outlive boost. :)
This is a highly relevant question, since the hardware (SSDs) is changing rapidly; what is the right solution now may not be in 2 years.
I was going to suggest a map, but I see you have already ruled this out.
I tried using map but need something faster.
These are the std::map performance bounds courtesy of the Wikipedia page:
Searching for an element takes O(log n) time
Inserting a new element takes O(log n) time
Incrementing/decrementing an iterator takes O(log n) time
Iterating through every element of a map takes O(n) time
Removing a single map element takes O(log n) time
Copying an entire map takes O(n log n) time.
How have you measured and determined that a map is not sufficiently optimized for you? It's quite possible that any bottlenecks you are seeing are in other parts of the code, and the map is perfectly adequate.
The above bounds seem like they would fit within all but the most stringent scalability requirements.
The type of data structure that will be used will be determined by the data you want to access. Some questions you should ask:
How many items will be in the session store? 50? 100000? 10000000000?
How large is each item in the store (byte size)?
What kind of string input is used for the key? ASCII-7? UTF-8? UCS2?
...
Hash tables generally perform very well for lookups. You can optimize them heavily for speed by writing them yourself (and yes, you can resize the table). Suggestions to improve performance with hash tables:
Choose a good hash function! It should preferably distribute evenly over your hash table and not be time-intensive to compute (this will depend on the format of the key input); see the sketch after this list.
If you are using buckets (chaining), make sure they do not exceed a length of 6. If they do, your hash function probably isn't distributing evenly enough. A bucket length of < 3 is preferable.
Watch out for how you allocate your objects. If at all possible, try to allocate them near each other in memory to take advantage of locality of reference. If you need to, write your own sub-allocator/heap manager. Also keep to aligned boundaries for better access speeds (alignment is processor/bus dependent, so you'll have to determine whether you want to target a particular processor type).
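As one concrete example of a simple, evenly distributing string hash, FNV-1a is a common choice (one reasonable option among many; measure it against your real keys):

#include <cstdint>
#include <string>

// 32-bit FNV-1a: cheap per byte and distributes short ASCII keys well.
uint32_t fnv1a(const std::string& key) {
    uint32_t h = 2166136261u;        // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;              // FNV prime
    }
    return h;
}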
BTrees are also very good and in general perform well. (Someone can insert info about btrees here).
I'd recommend looking at the data you are storing and making sure it is as small as possible: use short, unsigned char, and bit fields as necessary. There are other ways to squeeze out improved performance as well, such as allocating your string data at the end of the struct at the same time that you allocate the struct, i.e.:
struct foo {
    int a;
    char my_string[1];  // the "struct hack": allocate the struct and string in
                        // one block, e.g. malloc(offsetof(foo, my_string) + len + 1),
                        // so the string sits right after the other fields
};                      // (zero-length arrays are a compiler extension, not standard C++)
You may also find that implementing your own string compare routine can actually boost performance dramatically, however this will depend upon your input data.
It is possible to make your own, but you shouldn't have any problems with boost or std::tr1::unordered_map.
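A minimal sketch of the unordered_map-based store (Session here is a stand-in for whatever your pointers reference):

#include <string>
#include <unordered_map>   // std::tr1::unordered_map on pre-C++11 toolchains

struct Session;   // opaque; the store only holds pointers

std::unordered_map<std::string, Session*> store;

void put(const std::string& id, Session* s) { store[id] = s; }

Session* get(const std::string& id) {
    // find() is a single average-O(1) hash lookup, which suits the
    // read-heavy 5-20x lookup-to-insert ratio described above.
    auto it = store.find(id);
    return it == store.end() ? nullptr : it->second;
}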
A ternary trie may be faster than a hash map for a smaller number of elements.

Fast container for setting bits in a sparse domain, and iterating (C++)?

I need a fast container with only two operations: inserting keys from a very sparse domain (all 32-bit integers, with approx. 100 set at a given time), and iterating over the inserted keys. It should deal with a lot of insertions that hit the same entries (say 500k insertions, but only 100 distinct values).
Currently I'm using a std::set (only insert and the iteration interface), which is decent but still not fast enough. std::unordered_set was twice as slow, and the same goes for the Google hash maps. What data structure is optimized for this case?
Depending on the distribution of the input, you might be able to get some improvement without changing the structure.
If you tend to get a lot of runs of a single value, then you can probably speed up insertions by keeping a record of the last value you inserted and not bothering to insert if it matches. It costs an extra comparison per input, but saves a lookup for each element in a run beyond the first, so it could improve things no matter what data structure you're using, depending on the frequency of repeats and the relative cost of comparison versus insertion. A sketch follows.
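A sketch of that check in front of the existing std::set (the separate flag covers the case where zero is a legitimate key):

#include <cstdint>
#include <set>

std::set<uint32_t> keys;
uint32_t last_value = 0;
bool have_last = false;

void insert(uint32_t k) {
    // A run of repeated values costs one comparison each instead of a
    // full tree lookup.
    if (have_last && k == last_value) return;
    keys.insert(k);
    last_value = k;
    have_last = true;
}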
If you don't get runs, but you tend to find that values aren't evenly distributed, then a splay tree makes accessing the most commonly-used elements cheaper. It works by creating a deliberately-unbalanced tree with the frequent elements near the top, like a Huffman code.
I'm not sure I understand "a lot of insertions which hit the same entries". Do you mean that there are only 100 values which are ever members, but 500k mostly-duplicate operations which insert one of those 100 values?
If so, then I'd guess the fastest approach would be to generate a collision-free hash over those 100 values and maintain an array (or vector) of flags (int or bit, according to what works out fastest on your architecture).
I leave generating the hash as an exercise for the reader, since it's something that I'm aware exists as a technique, but I've never looked into it myself. The point is to get a fast hash over as small a range as possible, such that for each n, m in your 100 values, hash(n) != hash(m).
So insertion looks like array[hash(value)] = 1, deletion looks like array[hash(value)] = 0 (although you don't need that), and to enumerate you run over the array; for each set value at index n, inverse_hash(n) is in your collection. For a small range you can easily maintain a lookup table to perform the inverse hash, or instead of scanning the whole array looking for set flags, you can run over the 100 potentially-in values, checking each in turn.
Sorry if I've misunderstood the situation and this is useless to you. To be honest, it's not much faster than a regular hash table, since realistically for 100 values you can easily size the table such that there will be few or no collisions, without using so much memory as to blow your caches.
For an in-use set expected to be this small, a non-bucketed hash table might be OK. If you can live with an occasional expansion operation, grow it in powers of 2 when it gets more than 70% full. Cuckoo hashing has been discussed on Stack Overflow before and might also be a good approach for a set this small. If you really need to optimize for speed, you can implement the hashing function and lookup in assembler; on linear data structures this is very simple, so the coding and maintenance effort shouldn't be unduly hard.
You might want to consider implementing a HashTree using a base 10 hash function at each level instead of a binary hash function. You could either make it non-bucketed, in which case your performance would be deterministic (log10) or adjust your bucket size based on your expected distribution so that you only have a couple of keys/bucket.
A randomized data structure might be perfect for your job. Take a look at the skip list, though I don't know of any decent C++ implementation of it. I intended to submit one to Boost but never got around to it.
Maybe a set with a B-tree (instead of a binary tree) as the internal data structure. I found this article on CodeProject which implements this.
Note that while inserting into a hash table is fast, iterating over it isn't particularly fast, since you need to iterate over the entire array.
Which operation is slow for you? Do you do more insertions or more iteration?
How much memory do you have? A bit per 32-bit value takes 2^32 / 8 bytes, which comes to 512 MB, not much for a high-end server. That would make your insertions O(1). But it could make the iteration slow, although skipping all the words that are zero would optimize away most of it. If your 100 numbers are in a relatively small range, you can optimize even further by keeping the minimum and maximum around.
I know this is just brute force, but sometimes brute force is good enough.
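A brute-force sketch along those lines (__builtin_ctzll is a GCC/Clang intrinsic; MSVC has _BitScanForward64):

#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per 32-bit key: 2^32 bits = 512 MB, allocated once up front.
std::vector<uint64_t> bits((1ull << 32) / 64);

void insert(uint32_t k) { bits[k >> 6] |= 1ull << (k & 63); }

template <class Visit>
void for_each_set(Visit visit) {
    // Skipping all-zero 64-bit words makes iteration cheap when only
    // ~100 bits are set.
    for (std::size_t w = 0; w < bits.size(); ++w) {
        for (uint64_t word = bits[w]; word != 0; word &= word - 1) {
            unsigned bit = __builtin_ctzll(word);   // index of lowest set bit
            visit(static_cast<uint32_t>(w * 64 + bit));
        }
    }
}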
Since no one has explicitly mentioned it, have you thought about memory locality? A really great data structure with an insertion algorithm that causes a page fault will do you no good. In fact, a data structure with an insert that merely causes a cache miss would likely be really bad for performance.
Have you made sure that a naive unordered set of elements packed in a fixed array, with a simple swap-to-front when an insert collides, is too slow? It's a simple experiment that might show you have memory locality issues rather than algorithmic issues; a sketch follows.
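A sketch of that experiment (the capacity of 128 assumes the stated ~100 distinct values):

#include <algorithm>
#include <cstddef>
#include <cstdint>

// Flat array plus linear scan: everything stays within a few cache lines.
uint32_t vals[128];
std::size_t count = 0;

void insert(uint32_t k) {
    uint32_t* end = vals + count;
    uint32_t* p = std::find(vals, end, k);
    if (p == end)
        vals[count++] = k;      // not present: append
    else
        std::swap(vals[0], *p); // present: swap to front so repeats hit fast
}

// Iteration is just: for (std::size_t i = 0; i < count; ++i) visit(vals[i]);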