Most efficient container for uint16_t to uint16_t map - c++

I am developing program for very processing power limited machine where I want to map uint16_t keys to uint16_t values.
I am currently using std::map use non-safe reading:
std::map<uint16_t, uint16_t> m;
//fill m only once
while(true){
auto x = m[y];
}
The performance is still non-acceptable for the requirements. Is there a better solution in term of execution speed?
Edit:
Some information:
Total number of items is less than 500
Insertion is done only once
Querying values is done more than 250 times / sec
Keys and values are unique
Ram is very limited, overall ram is 512KB, free ram for this part of code is less than 50KB
100 MHz single core processor

Without some more context about your map,
if you intend to use a lot of keys, a big array like suggested before would be easy enough to handle since there wouldn't be collisions, but if you're not going to use all the memory, it might be wasteful.
if you intend to use a fair amount of data, but not enough that there'd be too many hash collisions, std::unordered_map has amortized O(1) lookups, and if you don't care about the order they're stored in, that could be a good guess.
if you're using not a lot of data and require it to be flexible, std::vector is a good choice
Seeing as all we know is that it's a uin16_t to uint16_t map, there's no one best answer.

Total number of items is less than 500
Insertion is done only once
Keep a fixed-size (500) array of key-value pairs plus a "valid items count", if the actual size is known only at runtime; populate it with the data and sort it; then you can simply do a binary search.
Given that the elements are few, depending on the specific processor it may even be more convenient to just do a linear search (perhaps keeping the most used elements first if you happen to have clues about the most frequently looked-up values).

Usually a binary tree (std::map which you currently use) provides adequate performance. Your CPU budget must be really small.
A binary search approach probably won't be much faster. A hashtable (std::unordered_map) might be a solution.
Just make sure to size it properly so as to avoid rehashing and stay just below 1.0 load factor.
std::unordered_map<uint16_t, uint16_t> m(503); // 503 is a prime
m.emplace(123, 234);
m.emplace(2345, 345);
std::cout << m.find(123)->second; // don't use operator[] to avoid creating entries
std::cout << m.find(2345)->second;
To check the quality of the hashing function, iterate over all the buckets by calling bucket_size() and add up anything above 1 (i.e. a collision).

Related

Difference between multimap and unordered_multimap in c++? [duplicate]

I have a simple requirement, i need a map of type . however i need fastest theoretically possible retrieval time.
i used both map and the new proposed unordered_map from tr1
i found that at least while parsing a file and creating the map, by inserting an element at at time.
map took only 2 minutes while unordered_map took 5 mins.
As i it is going to be part of a code to be executed on Hadoop cluster and will contain ~100 million entries, i need smallest possible retrieval time.
Also another helpful information:
currently the data (keys) which is being inserted is range of integers from 1,2,... to ~10 million.
I can also impose user to specify max value and to use order as above, will that significantly effect my implementation? (i heard map is based on rb trees and inserting in increasing order leads to better performance (or worst?) )
here is the code
map<int,int> Label // this is being changed to unordered_map
fstream LabelFile("Labels.txt");
// Creating the map from the Label.txt
if (LabelFile.is_open())
{
while (! LabelFile.eof() )
{
getline (LabelFile,inputLine);
try
{
curnode=inputLine.substr(0,inputLine.find_first_of("\t"));
nodelabel=inputLine.substr(inputLine.find_first_of("\t")+1,inputLine.size()-1);
Label[atoi(curnode.c_str())]=atoi(nodelabel.c_str());
}
catch(char* strerr)
{
failed=true;
break;
}
}
LabelFile.close();
}
Tentative Solution: After review of comments and answers, i believe a Dynamic C++ array would be the best option, since the implementation will use dense keys. Thanks
Insertion for unordered_map should be O(1) and retrieval should be roughly O(1), (its essentially a hash-table).
Your timings as a result are way OFF, or there is something WRONG with your implementation or usage of unordered_map.
You need to provide some more information, and possibly how you are using the container.
As per section 6.3 of n1836 the complexities for insertion/retreival are given:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
One issue you should consider is that your implementation may need to continually be rehashing the structure, as you say you have 100mil+ items. In that case when instantiating the container, if you have a rough idea about how many "unique" elements will be inserted into the container, you can pass that in as a parameter to the constructor and the container will be instantiated accordingly with a bucket-table of appropriate size.
The extra time loading the unordered_map is due to dynamic array resizing. The resizing schedule is to double the number of cells each when the table exceeds it's load factor. So from an empty table, expect O(lg n) copies of the entire data table. You can eliminate these extra copies by sizing the hash table upfront. Specifically
Label.reserve(expected_number_of_entries / Label.max_load_factor());
Dividing by the max_load_factor is to account for the empty cells that are necessary for the hash table to operate.
unordered_map (at least in most implementations) gives fast retrieval, but relatively poor insertion speed compared to map. A tree is generally at its best when the data is randomly ordered, and at its worst when the data is ordered (you constantly insert at one end of the tree, increasing the frequency of re-balancing).
Given that it's ~10 million total entries, you could just allocate a large enough array, and get really fast lookups -- assuming enough physical memory that it didn't cause thrashing, but that's not a huge amount of memory by modern standards.
Edit: yes, a vector is basically a dynamic array.
Edit2: The code you've added some some problems. Your while (! LabelFile.eof() ) is broken. You normally want to do something like while (LabelFile >> inputdata) instead. You're also reading the data somewhat inefficiently -- what you apparently expecting is two numbers separated by a tab. That being the case, I'd write the loop something like:
while (LabelFile >> node >> label)
Label[node] = label;

Unordered map vs vector

I'm building a little 2d game engine. Now I need to store the prototypes of the game objects (all type of informations). A container that will have at most I guess few thousand elements all with unique key and no elements will be deleted or added after a first load. The key value is a string.
Various threads will run, and I need to send to everyone a key(or index) and with that access other information(like a texture for the render process or sound for the mixer process) available only to those threads.
Normally I use vectors because they are way faster to accessing a known element. But I see that unordered map also usually have a constant speed if I use the ::at element access. It would make the code much cleaner and also easier to maintain because I will deal with much more understandable man made strings.
So the question is, the difference in speed between a access to a vector[n] compared to a unorderedmap.at("string") is negligible compared to his benefits?
From what I understand accessing various maps in different part of the program, with different threads running just with a "name" for me is a big deal and the speed difference isn't that great. But I'm too inexperienced to be sure of this. Although I found informations about it seem I can't really understand if I'm right or wrong.
Thank you for your time.
As an alternative, you could consider using an ordered vector because the vector itself will not be modified. You can easily write an implementation yourself with STL lower_bound etc, or use an implementation from libraries ( boost::flat_map).
There is a blog post from Scott Meyers about container performance in this case. He did some benchmarks and the conclusion would be that an unordered_mapis probably a very good choice with high chances that it will be the fastest option. If you have a restricted set of keys, you can also compute a minimal optimal hash function, e.g. with gperf
However, for these kind of problems the first rule is to measure yourself.
My problem was to find a record on a container by a given std::string type as Key access. Considering Keys that only EXISTS(not finding them was not a option) and the elements of this container are generated only at the beginning of the program and never touched thereafter.
I had huge fears unordered map was not fast enough. So I tested it, and I want to share the results hoping I've not mistaken everything.
I just hope that can help others like me and to get some feedback because in the end I'm beginner.
So, given a struct of record filled randomly like this:
struct The_Mess
{
std::string A_string;
long double A_ldouble;
char C[10];
int* intPointer;
std::vector<unsigned int> A_vector;
std::string Another_String;
}
I made a undordered map, give that A_string contain the key of the record:
std::unordered_map<std::string, The_Mess> The_UnOrdMap;
and a vector I sort by the A_string value(which contain the key):
std::vector<The_Mess> The_Vector;
with also a index vector sorted, and used to access as 3thrd way:
std::vector<std::string> index;
The key will be a random string of 0-20 characters in lenght(I wanted the worst possible scenario) containing letter both capital and normal and numbers or spaces.
So, in short our contendents are:
Unordered map I measure the time the program get to execute:
record = The_UnOrdMap.at( key ); record is just a The_Mess struct.
Sorted Vector measured statements:
low = std::lower_bound (The_Vector.begin(), The_Vector.end(), key, compare);
record = *low;
Sorted Index vector:
low2 = std::lower_bound( index.begin(), index.end(), key);
indice = low2 - index.begin();
record = The_Vector[indice];
The time is in nanoseconds and is a arithmetic average of 200 iterations. I have a vector that I shuffle at every iteration containing all the keys, and at every iteration I cycle through it and look for the key I have here in the three ways.
So this are my results:
I think the initials spikes are a fault of my testing logic(the table I iterate contains only the keys generated so far, so it only has 1-n elements). So 200 iterations of 1 key search for the first time. 200 iterations of 2 keys search the second time etc...
Anyway, it seem that in the end the best option is the unordered map, considering that is a lot less code, it's easier to implement and will make the whole program way easier to read and probably maintain/modify.
You have to think about caching as well. In case of std::vector you'll have very good cache performance when accessing the elements - when accessing one element in RAM, CPU will cache nearby memory values and this will include nearby portions of your std::vector.
When you use std::map (or std::unordered_map) this is no longer true. Maps are usually implemented as self balancing binary-search trees, and in this case values can be scattered around the RAM. This imposes great hit on cache performance, especially as maps get bigger and bigger as CPU just cannot cache the memory that you're about to access.
You'll have to run some tests and measure performance, but cache misses can greatly hurt the performance of your program.
You are most likely to get the same performance (the difference will not be measurable).
Contrary to what some people seem to believe, unordered_map is not a binary tree. The underlying data structure is a vector. As a result, cache locality does not matter here - it is the same as for vector. Granted, you are going to suffer if you have collissions due to your hashing function being bad. But if your key is a simple integer, this is not going to happen. As a result, access to to element in hash map will be exactly the same as access to the element in the vector with time spent on getting hash value for integer, which is really non-measurable.

How to map 13 bit value to 4 bit code?

I have a std::map for some packet processing program.
I didn't noticed before profiling but unfortunately this map lookup alone consume about 10% CPU time (called too many time).
Usually there only exist at most 10 keys in the input data. So I'm trying to implement a kind of key cache in front of the map.
Key value is 13 bit integer. I know there are only 8192 possible keys and array of 8192 items can give constant time lookup but I feel already ashamed and don't want use such a naive approach :(
Now, I'm just guessing some method of hashing that yield 4 bit code value for 13 bit integer very fast.
Any cool idea?
Thanks in advance.
UPDATE
Beside my shame, I don't have total control over source code and it's almost prohibited to make new array for this purpose.
Project manager said (who ran the profiler) linked list show small performance gain and recommended using std::list instead of std::map.
UPDATE
Value of keys are random (no relationship) and doesn't have good distribution.
Sample:
1) 0x100, 0x101, 0x10, 0x0, 0xffe
2) 0x400, 0x401, 0x402, 0x403, 0x404, 0x405, 0xff
Assuming your hash table either contains some basic type -- it's almost no memory at all. Even on 64-bit systems it's only 64kb of memory. There is no shame in using a lookup table like that, it has some of the best performance you can get.
You may want to go with middle solution and open addressing technique: one array of size 256. Index to an array is some simple hash function like XOR of two bytes. Element of the array is struct {key, value}. Collisions are handled by storing collided element at the next available index. If you need to delete element from array, and if deletion is rare then just recreate array (create a temporary list from remaining elements, and then create array from this list).
If you pick your hash function smartly there would not be any collisions almost ever. For instance, from your two examples one such hash would be to XOR low nibble of high byte with high nibble of low byte (and do what you like with remaining 13-th bit).
Unless you're writing for some sort of embedded system where 8K is really significant, just use the array and move on. If you really insist on doing something else, you might consider a perfect hash generator (e.g., gperf).
If there are really only going to be something like 10 active entries in your table, you might seriously consider using an unsorted vector to hold this mapping. Something like this:
typedef int key_type;
typedef int value_type;
std::vector<std::pair<key_type, value_type> > mapping;
inline void put(key_type key, value_type value) {
for (size_t i=0; i<mapping.size(); ++i) {
if (mapping[i].first==key) {
mapping[i].second=value;
return;
}
}
mapping.push_back(std::make_pair(key, value));
}
inline value_type get(key_type key) {
for (size_t i=0; i<mapping.size(); ++i) {
if (mapping[i].first==key) {
return mapping[i].second;
}
}
// do something reasonable if not found?
return value_type();
}
Now, the asymptotic speed of these algorithms (each O(n)) is much worse than you'd have with either a red-black tree (like std::map at O(log n)) or hash table (O(1)). But you're not talking about dealing with a large number of objects, so asymptotic estimates don't really buy you much.
Additionally, std::vector buys you both low overhead and locality of reference, which neither std::map nor std::list can offer. So it's more likely that a small std::vector will stay entirely within the L1 cache. As it's almost certainly the memory bottleneck that's causing your performance issues, using a std::vector with even my poor choice of algorithm will likely be faster than either a tree or linked list. Of course, only a few solid profiles will tell you for sure.
There are certainly algorithms that might be better choices: a sorted vector could potentially give even better performance; a well tuned small hash table might work as well. I suspect that you'll run into Amdahl's law pretty quickly trying to improve on a simple unsorted vector, however. Pretty soon you might find yourself running into function call overhead, or some other such concern, as a large contributor to your profile.
I agree with GWW, you don't use so much memory in the end...
But if you want, you could use an array of 11 or 13 linkedlists, and hash the keys with the % function. If the key number is less than the array size, complexity tents still to be O(1).
When you always just have about ten keys, use a list (or array). Do some benchmarking to find out whether or not using a sorted list (or array) and binary search will improve performance.
You might first want to see if there are any unnecessary calls to the key lookup. You only want to do this once per packet ideally -- each time you call a function there is going to be some overhead, so getting rid of extra calls is good.
Map is generally pretty fast, but if there is any exploitable pattern in the way keys are mapped to items you could use that and potentially do better. Could you provide a bit more information about the keys and the associated 4-bit values? E.g. are they sequential, is there some sort of pattern?
Finally, as others have mentioned, a lookup table is very fast, 8192 values * 4 bits is only 4kb, a tiny amount of memory indeed.
I would use a lookup table. It's tiny unless you are using a micrcontroller or something.
Otherwise I would do this -
Generate a table of say 30 elements.
For each lookup calculate a hash value of (key % 30) and compare it with the stored key in that location in the table. If the key is there then you found your value. if the slot is empty, then add it. If the key is wrong then skip to the next free cell and repeat.
With 30 cells and 10 keys collisions should be rare but if you get one it's fast to skip to the next cell, and normal lookups are simply a modulus and a compare operation so fairly fast

Difference in performance between map and unordered_map in c++

I have a simple requirement, i need a map of type . however i need fastest theoretically possible retrieval time.
i used both map and the new proposed unordered_map from tr1
i found that at least while parsing a file and creating the map, by inserting an element at at time.
map took only 2 minutes while unordered_map took 5 mins.
As i it is going to be part of a code to be executed on Hadoop cluster and will contain ~100 million entries, i need smallest possible retrieval time.
Also another helpful information:
currently the data (keys) which is being inserted is range of integers from 1,2,... to ~10 million.
I can also impose user to specify max value and to use order as above, will that significantly effect my implementation? (i heard map is based on rb trees and inserting in increasing order leads to better performance (or worst?) )
here is the code
map<int,int> Label // this is being changed to unordered_map
fstream LabelFile("Labels.txt");
// Creating the map from the Label.txt
if (LabelFile.is_open())
{
while (! LabelFile.eof() )
{
getline (LabelFile,inputLine);
try
{
curnode=inputLine.substr(0,inputLine.find_first_of("\t"));
nodelabel=inputLine.substr(inputLine.find_first_of("\t")+1,inputLine.size()-1);
Label[atoi(curnode.c_str())]=atoi(nodelabel.c_str());
}
catch(char* strerr)
{
failed=true;
break;
}
}
LabelFile.close();
}
Tentative Solution: After review of comments and answers, i believe a Dynamic C++ array would be the best option, since the implementation will use dense keys. Thanks
Insertion for unordered_map should be O(1) and retrieval should be roughly O(1), (its essentially a hash-table).
Your timings as a result are way OFF, or there is something WRONG with your implementation or usage of unordered_map.
You need to provide some more information, and possibly how you are using the container.
As per section 6.3 of n1836 the complexities for insertion/retreival are given:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf
One issue you should consider is that your implementation may need to continually be rehashing the structure, as you say you have 100mil+ items. In that case when instantiating the container, if you have a rough idea about how many "unique" elements will be inserted into the container, you can pass that in as a parameter to the constructor and the container will be instantiated accordingly with a bucket-table of appropriate size.
The extra time loading the unordered_map is due to dynamic array resizing. The resizing schedule is to double the number of cells each when the table exceeds it's load factor. So from an empty table, expect O(lg n) copies of the entire data table. You can eliminate these extra copies by sizing the hash table upfront. Specifically
Label.reserve(expected_number_of_entries / Label.max_load_factor());
Dividing by the max_load_factor is to account for the empty cells that are necessary for the hash table to operate.
unordered_map (at least in most implementations) gives fast retrieval, but relatively poor insertion speed compared to map. A tree is generally at its best when the data is randomly ordered, and at its worst when the data is ordered (you constantly insert at one end of the tree, increasing the frequency of re-balancing).
Given that it's ~10 million total entries, you could just allocate a large enough array, and get really fast lookups -- assuming enough physical memory that it didn't cause thrashing, but that's not a huge amount of memory by modern standards.
Edit: yes, a vector is basically a dynamic array.
Edit2: The code you've added some some problems. Your while (! LabelFile.eof() ) is broken. You normally want to do something like while (LabelFile >> inputdata) instead. You're also reading the data somewhat inefficiently -- what you apparently expecting is two numbers separated by a tab. That being the case, I'd write the loop something like:
while (LabelFile >> node >> label)
Label[node] = label;

Fast container for setting bits in a sparse domain, and iterating (C++)?

I need a fast container with only two operations. Inserting keys on from a very sparse domain (all 32bit integers, and approx. 100 are set at a given time), and iterating over the inserted keys. It should deal with a lot of insertions which hit the same entries (like, 500k, but only 100 different ones).
Currently, I'm using a std::set (only insert and the iterating interface), which is decent, but still not fast enough. std::unordered_set was twice as slow, same for the Google Hash Maps. I wonder what data structure is optimized for this case?
Depending on the distribution of the input, you might be able to get some improvement without changing the structure.
If you tend to get a lot of runs of a single value, then you can probably speed up insertions by keeping a record of the last value you inserted, and don't bother doing the insertion if it matches. It costs an extra comparison per input, but saves a lookup for each element in a run beyond the first. So it could improve things no matter what data structure you're using, depending on the frequency of repeats and the relative cost of comparison vs insertion.
If you don't get runs, but you tend to find that values aren't evenly distributed, then a splay tree makes accessing the most commonly-used elements cheaper. It works by creating a deliberately-unbalanced tree with the frequent elements near the top, like a Huffman code.
I'm not sure I understand "a lot of insertions which hit the same entries". Do you mean that there are only 100 values which are ever members, but 500k mostly-duplicate operations which insert one of those 100 values?
If so, then I'd guess that the fastest container would be to generate a collision-free hash over those 100 values, then maintain an array (or vector) of flags (int or bit, according to what works out fastest on your architecture).
I leave generating the hash as an exercise for the reader, since it's something that I'm aware exists as a technique, but I've never looked into it myself. The point is to get a fast hash over as small a range as possible, such that for each n, m in your 100 values, hash(n) != hash(m).
So insertion looks like array[hash(value)] = 1;, deletion looks like array[hash(value)] = 0; (although you don't need that), and to enumerate you run over the array, and for each set value at index n, inverse_hash(n) is in your collection. For a small range you can easily maintain a lookup table to perform the inverse hash, or instead of scanning the whole array looking for set flags, you can run over the 100 potentially-in values checking each in turn.
Sorry if I've misunderstood the situation and this is useless to you. And to be honest, it's not very much faster than a regular hashtable, since realistically for 100 values you can easily size the table such that there will be few or no collisions, without using so much memory as to blow your caches.
For an in-use set expected to be this small, a non-bucketed hash table might be OK. If you can live with an occasional expansion operation, grow it in powers of 2 if it gets more than 70% full. Cuckoo hashing has been discussed on Stackoverflow before and might also be a good approach for a set this small. If you really need to optimise for speed, you can implement the hashing function and lookup in assembler - on linear data structures this will be very simple so the coding and maintenance effort for an assembler implementation shouldn't be unduly hard to maintain.
You might want to consider implementing a HashTree using a base 10 hash function at each level instead of a binary hash function. You could either make it non-bucketed, in which case your performance would be deterministic (log10) or adjust your bucket size based on your expected distribution so that you only have a couple of keys/bucket.
A randomized data structure might be perfect for your job. Take a look at the skip list – though I don't know any decend C++ implementation of it. I intended to submit one to Boost but never got around to do it.
Maybe a set with a b-tree (instead of binary tree) as internal data structure. I found this article on codeproject which implements this.
Note that while inserting into a hash table is fast, iterating over it isn't particularly fast, since you need to iterate over the entire array.
Which operation is slow for you? Do you do more insertions or more iteration?
How much memory do you have? 32-bits take "only" 4GB/8 bytes, which comes to 512MB, not much for a high-end server. That would make your insertions O(1). But that could make the iteration slow. Although skipping all words with only zeroes would optimize away most iterations. If your 100 numbers are in a relatively small range, you can optimize even further by keeping the minimum and maximum around.
I know this is just brute force, but sometimes brute force is good enough.
Since no one has explicitly mentioned it, have you thought about memory locality? A really great data structure with an algorithm for insertion that causes a page fault will do you no good. In fact a data structure with an insert that merely causes a cache miss would likely be really bad for perf.
Have you made sure a naive unordered set of elements packed in a fixed array with a simple swap to front when an insert collisides is too slow? Its a simple experiment that might show you have memory locality issues rather than algorithmic issues.