How does C++ STL unordered_map resolve collisions?
Looking at the documentation, it says "Unique keys:
No two elements in the container can have equivalent keys."
That should mean that the container is indeed resolving collisions. However, that page does not tell me how it does it. I know some ways to resolve collisions, like using linked lists and/or probing. What I want to know is how the C++ STL unordered_map resolves them.

The standard defines a little more about this than most people seem to realize.
Specifically, the standard requires (§23.2.5/9):
The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket.
The interface includes a bucket_count that runs in constant time (Table 103). It also includes a bucket_size that has to run in time linear in the size of the bucket.
That's basically describing an implementation that uses collision chaining. When you do use collision chaining, meeting all the requirements is somewhere between easy and trivial. bucket_count() is the number of elements in your array. bucket_size() is the number of elements in the collision chain. Getting them in constant and linear time respectively is simple and straightforward.
By contrast, if you use something like linear probing or double hashing, those requirements become all but impossible to meet. Specifically, all the items that hashed to a specific value need to land in the same bucket, and you need to be able to count those buckets in constant time.
But, if you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though--it's linear on the number of items that hashed to the same or a colliding value.
With enough extra work and a fair amount of stretching the meaning of some of the requirements almost to the breaking point, it might be barely possible to create a hash table using something other than collision chaining and still at least sort of meet the requirements, but I'm not really certain it's possible, and it would certainly involve quite a lot of extra work.
Summary: all practical implementations of std::unordered_set (or unordered_map) undoubtedly use collision chaining. While it might be (just barely) possible to meet the requirements using linear probing or double hashing, such an implementation seems to lose a great deal and gain nearly nothing in return.
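The bucket interface described above can be observed directly. The following is a small sketch (ConstantHash and collidingKeysShareBucket are illustrative names, not library types) that forces every key to the same hash code. Per the "same hash code, same bucket" rule, all keys must land in one bucket, bucket_size() must count all of them, and lookups still work because key equality, not the hash, distinguishes the elements within the bucket.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Deliberately terrible hasher, for illustration only: every key hashes to 42,
// so the standard's "same hash code, same bucket" rule forces all keys together.
struct ConstantHash {
    std::size_t operator()(const std::string&) const { return 42; }
};

// Builds a map of three colliding keys and reports what the bucket interface sees.
// Returns true if all keys share one bucket, bucket_size() counts all of them,
// and lookups still find each key (equality, not the hash, tells them apart).
bool collidingKeysShareBucket() {
    std::unordered_map<std::string, int, ConstantHash> m;
    m["apple"] = 1;
    m["banana"] = 2;
    m["cherry"] = 3;
    const std::size_t b = m.bucket("apple");
    return m.bucket("banana") == b && m.bucket("cherry") == b &&
           m.bucket_size(b) == 3 && m.at("banana") == 2;
}
```

With a constant hash every operation degrades to a linear walk of that one bucket, which is exactly the chain the requirements describe.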

I found this answer while looking for how to detect when my types are colliding, so I will post this in case that is the intent of the question:
I believe there's some misconception about "Unique keys No two elements in the container can have equivalent keys."
Look at the code below:
std::unordered_map<int, char> hashmap;
hashmap[5] = 'a';
hashmap[5] = 'b'; //replace 'a' with 'b', there is no collision being handled.
I think Jerry's answer is referring to the internal system the container uses to reduce keys to appropriate array indices.
If you want collisions to be handled for your types (with buckets), you need std::unordered_multimap and will have to iterate over the range of entries stored under a key.
Hopefully this code can be read without the context I generated it with.
It basically checks whether any element stored under the hash is the element I'm looking for.
//sp is std::shared_ptr
//memo is std::unordered_multimap< int, sp<AStarNode> >
//there are probably multiple issues with this code in terms of good design (like using int keys rather than unsigned)
bool AStar_Incremental::hasNodeBeenVisited(sp<AStarNode> node)
{
    using UMIter = std::unordered_multimap<int, sp<AStarNode> >::iterator;
    bool bAlreadyVisited = false;
    //get all values for key in O(1*)
    int hash = WorldGrid::hashGrid(node->location);
    std::pair<UMIter, UMIter> start_end = memo.equal_range(hash); //range of entries whose key equals hash
    UMIter start = start_end.first;
    UMIter end = start_end.second;
    //hopefully this is implemented to be O(m) where m is the number of entries with this key
    for (UMIter bucketIter = start; bucketIter != end; ++bucketIter)
    {
        sp<AStarNode> previousNode = bucketIter->second;
        sf::Vector2i& previousVisit = previousNode->location;
        if (previousVisit == node->location)
        {
            bAlreadyVisited = true;
            break; //found it; no need to check the rest of the range
        }
    }
    return bAlreadyVisited;
}
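Because the code above depends on project types, here is a self-contained sketch of the same pattern. Cell stands in for sf::Vector2i and coarseKey stands in for WorldGrid::hashGrid; both are hypothetical. The key deliberately maps several cells to the same int, and the full comparison inside the equal_range loop is what resolves those "collisions".

```cpp
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the grid location (the original uses sf::Vector2i).
using Cell = std::pair<int, int>;

// Hypothetical coarse key standing in for WorldGrid::hashGrid:
// several distinct cells deliberately share one key.
int coarseKey(const Cell& c) { return c.first / 4; }

// Records a cell under its coarse key.
void markVisited(std::unordered_multimap<int, Cell>& memo, const Cell& c) {
    memo.emplace(coarseKey(c), c);
}

// Walks every entry stored under the cell's key and compares the full cell,
// so two different cells that share a key are never confused.
bool hasBeenVisited(const std::unordered_multimap<int, Cell>& memo, const Cell& c) {
    auto range = memo.equal_range(coarseKey(c));
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second == c) return true;
    }
    return false;
}
```

For example, after marking (1, 2) and (3, 5), the cell (2, 2) shares their key but is correctly reported as not visited.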


Unordered map of unordered set in C++ 11

I wanted to implement something, that maps an unordered set of integers to an integer value. Some kind of C++ equivalent of Python dict, which has sets as keys and ints as values.
So far I used std::map<std::set<int>, int> set_lookup; but from what I understood this is unnecessarily slow as it uses trees. I don't care about the ordering, only speed is important.
From what I understand, the desired structure is std::unordered_map<std::unordered_set<int>, int, hash> set_lookup; which needs a hash function to work.
Is this the right approach? And what would a minimal running example look like? I couldn't find what the hash part should look like.
It isn't clear whether you ask about the syntax for defining a hash function, or about how to define a mathematically good hash for a set of ints.
Anyway - in case it is the former, here is how you should technically define a hash function for your case:
#include <unordered_map>
#include <unordered_set>

namespace std
{
template <>
struct hash<std::unordered_set<int>>
{
    std::size_t operator()(const std::unordered_set<int>& k) const
    {
        // ...
        // Here you should create and return a meaningful hash value:
        return 5;
    }
};
}

int main()
{
    std::unordered_map<std::unordered_set<int>, int> m;
}
Having written that, I join the other comments about whether it is a good direction to solve your problem.
You haven't described your problem, so I cannot answer that.
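The `return 5;` placeholder above is a valid but useless hash, since every key collides. One possible meaningful hash is sketched below; SetHash and SetLookup are illustrative names, and the hasher is passed as the map's third template argument, as the question suggests. The key constraint is that two equal unordered_sets may iterate their elements in different orders, so the per-element hashes are combined with an order-independent operation (a wrapping sum) rather than an order-dependent one such as hash_combine. The multiply-and-shift mixing step is an arbitrary choice to spread small integers across the bits, not a tuned hash.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// Order-independent hash for std::unordered_set<int> (illustrative sketch).
struct SetHash {
    std::size_t operator()(const std::unordered_set<int>& s) const {
        std::size_t h = s.size();
        for (int x : s) {
            std::size_t e = std::hash<int>{}(x);
            // Mix each element's hash so small ints do not stay in the low bits.
            e ^= e >> 16;
            e *= 0x45d9f3bU;
            e ^= e >> 16;
            h += e; // unsigned addition wraps and is commutative, so order does not matter
        }
        return h;
    }
};

// std::unordered_set<int> already provides operator==, so only the hash is needed.
using SetLookup = std::unordered_map<std::unordered_set<int>, int, SetHash>;
```

Usage is then `SetLookup m; m[someSet] = 7;`, and a set with the same elements inserted in a different order finds the same entry.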
I understood [std::map<std::set<int>, int> set_lookup;] is unnecessarily slow as it uses trees.
Is [std::unordered_map<std::unordered_set<int>, int, hash>] the right approach?
It depends. If your keys are created then not changed, and you want to be able to do a lot of lookups very fast, then a hash-table based approach would indeed be good, but you'll need two things for that:
to be able to hash keys
to be able to compare keys
To hash keys, deciding on a good hash function is a bit of an art form. A rarely bad (but sometimes slower than necessary) approach is to use boost hash_combine, which is short enough that you can copy it into your code. If your integer values are already quite random across most of their bits, though, simply XORing them together would produce a great hash. If you're not sure, use hash_combine or a better hash (e.g. MURMUR32). The time taken to hash will depend on the time to traverse the key, and traversing an unordered_set typically involves a linked-list traversal (which typically jumps around in memory and is CPU-cache unfriendly). The best way to store the values for fast traversal is in contiguous memory, i.e. a std::vector<>, or a std::array<> if the size is known at compile time.
The other thing you need to do is compare keys for equality: that also works fastest when elements in the key are contiguous in memory, and consistently ordered. Again, a sorted std::vector<> or std::array<> would be best.
That said, if the sets for your keys are large, and you can compromise on a statistical guarantee of key equality, you could use e.g. a 256-bit hash and code as if hash collisions always correspond to key equality. That's often not an acceptable risk, but if your hash is not collision-prone and you have e.g. a 256-bit hash, a CPU could run flat out for millennia hashing distinct keys and still be unlikely to produce the same hash even once, so it's an approach I've seen even financial firms use in their core in-house database products, as it can save so much time.
If you're tempted by that compromise, you'd want std::unordered_map<HashValue256, std::pair<int, std::vector<int>>>. To find the int associated with a set of integers, you'd hash them first, then do a lookup. It's easy to write a hash function that produces the same output for a set or sorted vector<> or array<>, as you can present the elements to something like hash_combine in the same sorted order during traversal (i.e. just size_t seed = 0; for (auto& element : any_sorted_container) hash_combine(seed, element);). Storing the vector<int> means you can traverse the unordered_map later if you want to find all the key "sets". If you don't need to do that (e.g. you're only ever looking up the ints by keys known to the code at the time, and you're comfortable with the statistical improbability of a good hash colliding), you don't even need to store the keys/vectors: std::unordered_map<HashValue256, int>.
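The sorted-container approach can be sketched as follows. This version keeps the full key (no 256-bit shortcut), so equality is exact. hash_combine here is a local copy of the formula older Boost versions used, reproduced to avoid a Boost dependency; SortedVectorHash, makeKey, and SortedLookup are illustrative names. makeKey sorts and removes duplicates so that any collection of ints is reduced to one canonical key, which is what makes the order-dependent hash_combine safe to use.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Local copy of the classic boost::hash_combine formula.
template <typename T>
void hash_combine(std::size_t& seed, const T& v) {
    seed ^= std::hash<T>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes a vector that is already in canonical (sorted, unique) form.
struct SortedVectorHash {
    std::size_t operator()(const std::vector<int>& v) const {
        std::size_t seed = 0;
        for (int element : v) hash_combine(seed, element);
        return seed;
    }
};

// Normalizes any collection of ints into the canonical key form.
std::vector<int> makeKey(std::vector<int> elements) {
    std::sort(elements.begin(), elements.end());
    // Set semantics: drop duplicates after sorting.
    elements.erase(std::unique(elements.begin(), elements.end()), elements.end());
    return elements;
}

// Contiguous keys: fast to hash and fast to compare with vector's operator==.
using SortedLookup = std::unordered_map<std::vector<int>, int, SortedVectorHash>;
```

Callers must always go through makeKey (or otherwise keep keys sorted); passing an unsorted vector directly would hash and compare differently.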

Array with generic index type

Is there any data structure in C++11/STL/Boost which represents an array with a generic index type or do I have to implement such a type on my own?
I.e. I would like to do something like this:
std::set<std::string> to_lookup, to_lookup2;
int i = 10, j = 13;
// initialization of to_lookup
// count is of the container type/data structure I am looking for
count[to_lookup] = i;
count[to_lookup2] = j;
I know the std::map and std::unordered_map containers from the STL but those do not match my requirements. It is critical for me that insert and look up can be done in O(1).
It is almost impossible to get significantly faster if you're already using std::unordered_map, simply because there's always some overhead that depends on the number of elements (so you can't get a perfect O(1) unless you're able to use all possible keys directly as indexes into an array).
However, if you still think that a std::unordered_map is too slow simply due to the sheer amount of entries, try adding another layer reducing the number of elements in a map.
In your example, using std::string as keys(?), you could just use the very first character (untested but should work):
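std::vector<std::unordered_map<std::string, myWhateverType> > container(256);
// To access an element, this just adds one more layer
// (cast to unsigned char so that bytes >= 128 don't become negative indices):
container[static_cast<unsigned char>(key[0])][key] = value;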
Iterating over all elements becomes a bit more complicated, though. However, this essentially reduces the number of elements in each std::unordered_map to 1/256 (depending on the actual distribution of key values, of course; if all keys start with the same prefix, such as "key", then you won't gain anything other than a small overhead).
Will it improve performance? This really depends on the number of entries and your keys.
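A slightly more complete version of the two-layer idea, including lookup and the more complicated iteration, might look like the sketch below. FirstCharSharded is an illustrative name, not a standard or Boost type, and whether it is actually faster than a single std::unordered_map has to be measured for your keys.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <unordered_map>

// 256 inner hash maps selected by the key's first byte (illustrative sketch).
template <typename V>
class FirstCharSharded {
    std::array<std::unordered_map<std::string, V>, 256> shards_;

    static std::size_t shardOf(const std::string& key) {
        // Cast through unsigned char so bytes >= 128 never give a negative index;
        // an empty key goes to shard 0.
        return key.empty() ? 0 : static_cast<unsigned char>(key[0]);
    }

public:
    V& operator[](const std::string& key) { return shards_[shardOf(key)][key]; }

    // Returns a pointer to the value, or nullptr if the key is absent.
    const V* find(const std::string& key) const {
        const auto& shard = shards_[shardOf(key)];
        auto it = shard.find(key);
        return it == shard.end() ? nullptr : &it->second;
    }

    // Iteration needs two loops: over the shards, then within each shard.
    template <typename F>
    void forEach(F f) const {
        for (const auto& shard : shards_)
            for (const auto& kv : shard) f(kv.first, kv.second);
    }
};
```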

64bit array operation by C/C++

I have an efficiency-critical application where I need an array-type data structure A. Its keys are 0, 1, 2, ..., and its values are distinct uint64_t values. I need two constant-time operations:
1. Given i, return A[i];
2. Given val, return i such that A[i] == val
I prefer not to use a hash table, because I tried GLib's GHashTable and it took around 20 minutes to load 60 million values into the hash table (if I remove the insertion statement, it takes only around 6 seconds). That time is not acceptable for my application. Or can somebody recommend other hash table libraries? I tried uthash.c, and it crashed immediately.
I also tried SDArray, but it seems not the right one.
Does anybody know any data structure that would fulfill my requirements? Or any efficient hash table implementations? I prefer using C/C++.
In general, you need two hash tables for this task. As you know, hash tables give you a key look-up in expected constant time. Searching for a value requires iterating through the whole data structure, since information about the values isn't encoded in the hash look-up table.
Use two hash tables: One for key-value and one (reversed) for value-key look-up. In your particular case, the forward search can be done using a vector as long as your keys are "sequential". But this doesn't change the requirement for a data structure enabling fast reverse look-up.
Regarding the hash table implementation: in C++11, you have the new standard container std::unordered_map available.
An implementation might look like this (of course this is tweakable, like introducing const-correctness, calling by reference etc.):
// K = key type, T = value type
std::unordered_map<K,T> kvMap; // hash table for forward search
std::unordered_map<T,K> vkMap; // hash table for backward search
void insert(std::pair<K,T> item) {
    kvMap.insert(item); // keep both maps in sync
    vkMap.insert(std::make_pair(item.second, item.first));
}
// expected O(1)
T valueForKey(K key) {
    return kvMap[key];
}
// expected O(1)
K keyForValue(T value) {
    return vkMap[value];
}
A clean C++11 implementation should "wrap" around the key-value hash map, so you have the "standard" interface in your wrapper class. Always keep the reverse map in sync with your forward map.
Regarding the creation performance: most implementations provide a way to tell the data structure how many elements are going to be inserted, called "reserve". For hash tables, this is a huge performance benefit, as dynamically resizing the data structure (which happens every now and then during insertions) completely restructures the whole hash table: the bucket count changes, so every element has to be redistributed to its new bucket.
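Since the keys here are exactly 0, 1, 2, ..., one way to combine both suggestions is to keep the forward direction in a plain vector and use a single reserved std::unordered_map only for the reverse lookup. The sketch below assumes values are appended in index order; IndexedValues is an illustrative name, and no claim is made about how its load time compares with GHashTable for 60 million values.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Forward lookup A[i] via a vector, reverse lookup val -> i via a hash table.
class IndexedValues {
    std::vector<std::uint64_t> values_;                    // A[i]
    std::unordered_map<std::uint64_t, std::size_t> index_; // val -> i
public:
    explicit IndexedValues(std::size_t expected) {
        values_.reserve(expected);
        index_.reserve(expected); // allocate buckets once, avoiding rehashes while loading
    }
    // Appends val as A[size()]; returns false (and stores nothing) if val already exists.
    bool push_back(std::uint64_t val) {
        if (!index_.emplace(val, values_.size()).second) return false;
        values_.push_back(val);
        return true;
    }
    // Operation 1: given i, return A[i] (throws std::out_of_range if i is invalid).
    std::uint64_t at(std::size_t i) const { return values_.at(i); }
    // Operation 2: given val, set i such that A[i] == val; returns false if absent.
    bool indexOf(std::uint64_t val, std::size_t& i) const {
        auto it = index_.find(val);
        if (it == index_.end()) return false;
        i = it->second;
        return true;
    }
};
```

Keeping both directions behind one insertion function is what keeps the reverse map in sync with the forward one.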
I would go for two vectors (assuming that your values are really distinct and small enough to be used as indices), as this is O(1) in access, whereas map is O(log n) in access:
vector<uint64_t> values;
vector<size_t> keys;
values.resize(maxSize); // allocate up front so no reallocation occurs while reading data (reserve() alone would leave the vector empty, so values[i] would be undefined)
keys.resize(maxValue + 1); // maxValue (hypothetical name): the largest value to be stored, since values are used as indices
Then, when reading in data:
values[keyRead] = valueRead;
keys[valueRead] = keyRead;
Lookups then work the same way:
data = values[currentKey];
key = keys[currentData];

C++ some questions on boost::unordered_map & boost::hash

I've only recently started delving into Boost and its containers, and I read a few articles on the web and on Stack Overflow saying that boost::unordered_map is the fastest-performing container for big collections.
So, I have this class State, which must be unique in the container (no duplicates) and there will be millions if not billions of states in the container.
Therefore I have been trying to optimize it for small size and as few computations as possible. I was using a boost::ptr_vector before, but as I read on stackoverflow a vector is only good as long as there are not that many objects in it.
In my case, the State describes sensorimotor information from a robot, so there can be an enormous number of states, and therefore fast lookup is of topmost priority.
Following the boost documentation for unordered_map I realize that there are two things I could do to speed things up: use a hash_function, and use an equality operator to compare States based on their hash_function.
So, I implemented a private hash() function which takes in State information and using boost::hash_combine, creates an std::size_t hash value.
The operator== basically compares the states' hash values.
Is std::size_t enough to cover billions of possible hash_function combinations? In order to avoid duplicate states I intend to use their hash_values.
When creating a state_map, should I use as key the State* or the hash value?
i.e: boost::unordered_map<State*,std::size_t> state_map;
boost::unordered_map<std::size_t,State*> state_map;
Are the lookup times with a boost::unordered_map::iterator = state_map.find() faster than going through a boost::ptr_vector and comparing each iterator's key value?
Finally, any tips or tricks on how to optimize such an unordered map for speed and fast lookups would be greatly appreciated.
EDIT: I have seen quite a few answers, one being not to use boost but C++0X, another not to use an unordered_set, but to be honest, I still want to see how boost::unordered_set is used with a hash function.
I have followed Boost's documentation and implemented it, but I still cannot figure out how to use Boost's hash function with the unordered set.
This is a bit muddled.
What you say are not "things that you can do to speed things up"; rather, they are mandatory requirements of your type to be eligible as the element type of an unordered map, and also for an unordered set (which you might rather want).
You need to provide an equality operator that compares objects, not hash values. The whole point of the equality is to distinguish elements with the same hash.
size_t is an unsigned integral type, 32 bits on x86 and 64 bits on x64. Since you want "billions of elements", which means many gigabytes of data, I assume you have a solid x64 machine anyway.
What's crucial is that your hash function is good, i.e. has few collisions.
You want a set, not a map. Put the objects directly in the set: std::unordered_set<State>. Use a map if you are mapping to something, i.e. states to something else. Oh, use C++0x, not boost, if you can.
Using hash_combine is good.
Baby example:
#include <cstddef>
#include <unordered_set>

struct State {
    inline bool operator==(const State &) const;
    /* Stuff */
};

namespace std {
template <> struct hash<State> {
    inline std::size_t operator()(const State & s) const {
        /* your hash algorithm here */
    }
};
}

std::size_t Foo(const State & s) { /* some code */ }

int main() {
    std::unordered_set<State> states; // no extra data needed
    // another hash function: pass the function pointer type and the function itself
    std::unordered_set<State, std::size_t (*)(const State &)> states2(10, Foo);
}
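To make the baby example concrete, here is a filled-in sketch with hypothetical sensor fields (gyro, thrust, accel; substitute your real State members). operator== compares the actual data, which is what lets the set tell apart distinct states that happen to share a hash, and the hash uses the classic boost::hash_combine formula inlined so no Boost header is needed. Specializing std::hash for your own type, as done here, is permitted.

```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <unordered_set>

// Hypothetical sensor state; the field names are placeholders.
struct RobotState {
    int gyro;
    int thrust;
    int accel;
    // Compare the data itself, never just the hash values.
    bool operator==(const RobotState& o) const {
        return gyro == o.gyro && thrust == o.thrust && accel == o.accel;
    }
};

namespace std {
template <> struct hash<RobotState> {
    std::size_t operator()(const RobotState& s) const {
        std::size_t seed = 0;
        // Classic boost::hash_combine formula, applied to each field in turn.
        for (int v : {s.gyro, s.thrust, s.accel})
            seed ^= std::hash<int>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};
}

// insert() returns {iterator, bool}; the bool is false when an equal state already exists.
bool addIfNew(std::unordered_set<RobotState>& states, const RobotState& s) {
    return states.insert(s).second;
}
```

This gives duplicate detection on insertion without storing hash values yourself: the container hashes internally and falls back to operator== within a bucket.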
An unordered_map is a hashtable. You don't store the hash; it is done internally as the storage and lookup method.
Given your requirements, an unordered_set might be more appropriate, since your object is the only item to store.
You are a little confused though -- the equality operator and hash function are not truly performance items, but required for nontrivial objects for the container to work correctly. A good hash function will distribute your nodes evenly across the buckets, and the equality operator will be used to remove any ambiguity about matches based on the hash function.
std::size_t is fine for the hash function. Remember that no hash is perfect; there will be collisions, and these collision items are stored in a linked list at that bucket position.
Thus, .find() will be O(1) in the optimal case and very close to O(1) in the average case (and O(N) in the worst case, but a decent hash function will avoid that.)
You don't mention your platform or architecture; at billions of entries you still might have to worry about out-of-memory situations depending on those and the size of your State object.
Forget about hashing; there is nothing (at least in your question) that suggests you have a meaningful key.
Let's take a step back and rephrase your actual performance goals:
you want to quickly validate that no duplicates ever exist for any of your State objects
Comment if I need to add others.
From the aforementioned goal, and from your comment, I would suggest you actually use an ordered set (std::set) rather than an unordered_map. Yes, the ordered search uses binary search, O(log(n)), while the unordered lookup is O(1).
However, the difference is that with this approach you need the ordered set ONLY to check that an equal state doesn't already exist when you are about to create a new one, that is, at State creation time.
In all the other lookups, you don't actually need to look into the ordered set, because you already have the key, a State*, and the key can access the value through the magic dereference operator: *key.
So with this approach, you only use the ordered set as an index to verify States at creation time. In all other cases, you access your State through the dereference operator of your pointer-value key.
If all the above wasn't enough to convince you, here is the final nail in the coffin of the idea of using a hash to quickly determine equality: a hash function has a small probability of collision, but as the number of states grows, that probability approaches certainty. So depending on your fault tolerance, you are going to deal with state collisions (and from your question and the number of States you expect to deal with, it seems you will deal with a lot of them).
For this to work, you obviously need the compare predicate to test all the internal properties of your state (gyroscope, thrusters, accelerometers, proton rays, etc.).
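A minimal sketch of the creation-time index follows. It is a variant of the answer's idea: instead of a set of State* it stores the states themselves in a std::set and hands out pointers, which stay valid because std::set never moves its nodes. SensorState, StateLess, and createState are hypothetical names, and the comparator uses std::tie so that every field takes part in the ordering.

```cpp
#include <set>
#include <tuple>

// Hypothetical state with a few of the properties the answer mentions.
struct SensorState {
    int gyroscope;
    int thrusters;
    int accelerometer;
};

// Strict weak ordering over all fields: two states are equivalent only if every field matches.
struct StateLess {
    bool operator()(const SensorState& a, const SensorState& b) const {
        return std::tie(a.gyroscope, a.thrusters, a.accelerometer) <
               std::tie(b.gyroscope, b.thrusters, b.accelerometer);
    }
};

// Creation-time check, O(log n): returns the existing state if an equivalent one
// is already indexed, otherwise stores the new one. Later accesses just dereference.
const SensorState* createState(std::set<SensorState, StateLess>& states, const SensorState& s) {
    return &*states.insert(s).first;
}
```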

Why is std::tr1::unordered_map slower than a homegrown hash map?

I wrote a basic program that takes strings and counts the incidences of unique ones by inserting them into a string->integer hash map.
I use std::tr1::unordered_map for the storage, templated for a custom hash function and a custom equality function. The key type is actually char* rather than the too-slow std::string.
I then changed the same code to use a very, very simple hash table (really an array of {key, value} structures indexed by hash) with a power-of-two size and linear probing for collisions. The program got 33% faster.
Given that when I was using tr1::unordered_map I presized the hash table so it never had to grow, and that I was using exactly the same hash and comparison routines, what is tr1::unordered_map doing that slows it down by 50% as compared to the most basic hash map imaginable?
Code for the hash map type I'm talking about as "simple" here:
#include <stddef.h>
#include <string.h>

typedef struct dataitem {
    char* item;
    size_t count;
} dataitem_t;

dataitem_t hashtable[HASHTABLE_SIZE] = {{NULL,0}}; // Start off with empty table

void insert(char* item) {
    size_t hash = generate_hash(item) & HASHTABLE_SIZE_MASK; // Bitmasking effect is hash %= HASHTABLE_SIZE
    size_t firsthash = hash;
    while (true) {
        if (hashtable[hash].item == NULL) { // Free bucket
            hashtable[hash].item = item;
            hashtable[hash].count = 1;
            return;
        }
        if (strcmp(hashtable[hash].item, item) == 0) { // Not hash collision; same item
            hashtable[hash].count += 1;
            return;
        }
        hash = (hash + 1) & HASHTABLE_SIZE_MASK; // Hash collision. Move to next bucket (linear probing), wrapping around
        if (hash == firsthash) {
            // Table is full. This does not happen because the presizing is correct.
            return;
        }
    }
}
I wish to extend @AProgrammer's answer.
Your hash map is simple because it is custom-tailored to your needs. On the other hand, std::tr1::unordered_map has to fulfill a number of different tasks, and do well in all cases. This requires an approach with good average performance in all cases, so it'll never be excellent in any particular area.
Hash containers are very special in that there are many ways to implement them. You chose open addressing, while the standard forces a bucket approach on implementors. Both have different trade-offs, and this is one reason why the standard, this time, actually enforced a particular implementation: so that performance does not change dramatically when switching from one library to another. Simply specifying Big-O complexity / amortized complexity would not have been enough here.
You say that you told the unordered_map the final number of elements, but did you change the load factor? Chaining is notoriously "bad" in case of collisions (because of the lack of memory locality), and using a smaller load factor would favor spreading out your elements.
Finally, to point out one difference: what happens when you resize your hash map? By using chaining, the unordered_map does not move the elements in memory:
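Lowering the load factor and presizing can be combined as in the sketch below (makeCounter is an illustrative name). The order matters: max_load_factor() is set first, because reserve(n) allocates enough buckets for n elements at the current maximum load factor. Note that the standard treats the max_load_factor argument as a hint, so the actual value may differ.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Presizes a string -> count map for `expected` elements at load factor `maxLoad`.
std::unordered_map<std::string, std::size_t> makeCounter(std::size_t expected, float maxLoad) {
    std::unordered_map<std::string, std::size_t> counts;
    counts.max_load_factor(maxLoad); // set first: reserve() uses the current maximum
    counts.reserve(expected);        // buckets for `expected` elements, so no rehash while filling
    return counts;
}
```

With a maximum load factor of 0.5, 1000 expected elements get at least 2000 buckets, which shortens chains at the cost of a larger bucket array.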
references to them are still valid (even though the iterators may be invalidated)
in case of big or complex objects, there is no invocation of copy constructors
This is in contrast with your simple implementation, which would incur O(N) copies (unless you use linear rehashing to spread out the work, but this is definitely not simple).
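The pointer-stability point can be checked with a short sketch (referenceSurvivesRehash is an illustrative name). The standard guarantees that rehashing invalidates iterators but not pointers or references to elements, because the nodes are relinked rather than moved.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Returns true if a pointer to an element survives several rehashes.
bool referenceSurvivesRehash() {
    std::unordered_map<int, std::string> m;
    m[0] = "zero";
    const std::string* p = &m[0];
    const std::size_t before = m.bucket_count();
    for (int i = 1; i < 10000; ++i) m[i] = "x"; // growth forces rehashes
    // The bucket array was replaced, but the element itself never moved.
    return m.bucket_count() != before && p == &m.at(0) && *p == "zero";
}
```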
It seems, therefore, that the choice for unordered_map was to smooth the spikes, at the cost of a slower average insert.
There is something you can do, though: provide a custom allocator. Write a specific allocator for your use case that allocates all its memory in one go (since you know how many objects will be inserted, and can have the allocator report how much memory a node takes), then hands out nodes in a stack-like fashion (a simple pointer increment). It should improve the performance (somewhat).
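A rough sketch of that allocator idea follows, written against std::unordered_map (C++11) rather than std::tr1. Arena, ArenaAllocator, ArenaMap, and makeArenaMap are hypothetical names. The arena grabs one block up front and bumps a pointer for each request; deallocate is a no-op, so memory from discarded bucket arrays is only reclaimed when the arena itself is destroyed, and the arena must outlive the map.

```cpp
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>
#include <vector>

// Bump-pointer arena: one block allocated up front, handed out front to back.
class Arena {
    std::vector<unsigned char> buf_;
    std::size_t used_ = 0;
public:
    explicit Arena(std::size_t bytes) : buf_(bytes) {}
    void* allocate(std::size_t bytes, std::size_t align) {
        // Round the offset up to the requested alignment (the buffer itself comes
        // from operator new, so it is aligned for fundamental types).
        std::size_t start = (used_ + align - 1) / align * align;
        if (start + bytes > buf_.size()) throw std::bad_alloc();
        used_ = start + bytes;
        return buf_.data() + start;
    }
    std::size_t used() const { return used_; }
};

// Minimal C++11 allocator forwarding to an Arena.
template <typename T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;
    explicit ArenaAllocator(Arena* a) noexcept : arena(a) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {} // reclaimed when the Arena dies
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return a.arena == b.arena; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) { return !(a == b); }

using ArenaMap = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                    ArenaAllocator<std::pair<const int, int>>>;

// Creates a map with `expected` initial buckets whose nodes and bucket arrays come from `arena`.
ArenaMap makeArenaMap(Arena& arena, std::size_t expected) {
    ArenaMap m(expected, std::hash<int>{}, std::equal_to<int>{},
               ArenaAllocator<std::pair<const int, int>>(&arena));
    return m;
}
```

How much this helps depends on how expensive the default allocator is on your platform, so it is worth measuring before and after.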
Your "homegrown hash map" is not a hash map at all, it's an intrusive hash set.
And that's the reason it's faster. Simple as that.
Well, actually intrusive hash set isn't exact either, but it's the closest match.
In general, comparing the speed of components not built to the same spec isn't fair.
Without knowing exactly what you have measured (which mix of operations, at which load factor, with which mix of present/absent data), it is difficult to explain where the difference comes from.
g++'s TR1 implementation resolves collisions by chaining. This implies dynamic allocation, but it also gives better performance at high load factors.
Your "homegrown" hash map is faster1 than std::tr1::unordered_map because, as you yourself said, your homegrown hash map is "simple" and it doesn't handle checking if the hash table is full. And possibly many things that you're not checking before operating on it. That may be the reason why your hash map is faster than std::tr1::unordered_map.
Also, the performance of std::tr1::unordered_map is defined by the implementation, so different implementation would perform differently speed-wise. You can see its implementation and compare it with yours, as that is the first thing you can do, and I believe, that will also answer your question to some extent.
1. I just assumed your claim to be correct, and based on it I said the above thing.