How to retrieve collisions of unordered map? - c++

I have two elements (6 and 747) that share their key ("eggs"). I want to find all the elements that share a key (let's say "eggs", but I would in real life do that for every key). How to do that?
There must be a way to get a container or something back from the data structure . . .

You're still mistaking key's value with key's hash. But to answer question as asked: you can use unordered_map's bucket() member function with bucket iterators:
std::unordered_map<int,int,dumbest_hash> m;
m[0] = 42;
m[1] = 43;
size_t bucket = m.bucket(1);
for(auto it = m.begin(bucket), e = m.end(bucket); it != e; ++it) {
cout << "bucket " << bucket << ": " << it->first << " -> " << it->second << '\n';
}
demo
In simple and mostly correct terms, unordered containers imitate their ordered counterparts in terms of interface. That means that if a map will not allow you to have duplicate keys, then neither will unordered_map.
unordered do employ hashing function to speed up the lookup, but if two keys have the same hash, they will not necessarily have the same value. To keep the behaviour similar to the ordered containers, unordered_set and unordered_map will only consider elements equal when they're actually equal (using operator== or provided comparator), not when their hashed values collide.
To put things in perspective, let's assume that "eggs" and "chicken" have the same hash value and that there's no equality checking. Then the following code would be "correct":
unordered_map<string, int> m;
m["eggs"] = 42;
m.insert(make_pair("chicken", 0)); // not inserted, key already exists
assert(m["chicken"] == 42);
But if you want allow duplicate keys in the same map, simply use unordered_multimap.

Unordered map does not have elements that share a key.
Unordered multi map does.
Use umm.equal_range(key) to get a pair of iterators describing the elements in the map that match a given key.
However, note that "collision" when talking about hashed containers usually refers to elements with the same hashed key, not the same key.
Also, consider using a unordered_map<key, std::vector<value>> instead of a multimap.

Related

Implementation of hash table using unordered_map in c++ and handling collisions

How to implement a hash map using unordered map in c++.
If the keys in unordered map corresponds to the index generated by the hashing function,
what when there are multiple values have same keys( collision). Then how do we access these values using the same keys.
eg
unordered_map <int,string> hashTable;
hashTable.insert(pair<int,string>(3,"ab"))
hashTable.insert(pair<int,string>(3,"ba"))
Now how do access "ba"?
As #amchacon pointed out, an std::unordered_map is already a hash table.
There is a difference between a key and hash(key). In an unordered_map, keys must be distinct, while hash of keys may collide. Take a closer look at the template parameters of std::unordered_map. There is a Hash and there is a KeyEqual.
If you indeed want to have multiple records with the same key, use std::unordered_multimap instead.
hashTable[3] is "ba" actually.
you can search it like.
auto it = hashTable.find(3);
if(it!=hashTable.end()){
std::cout << "item found" << it->second << "\n";
}else{
std::cout << "item does not exists in table.";
}

What can bring a std::map to not find one of its keys?

I have a std::map associating const char* keys with int values:
std::map<const char*, int> myMap;
I initialize it with three keys, then check if it can find it:
myMap["zero"] = 0;
myMap["first"] = 1;
myMap["second"] = 2;
if (myMap.at("zero") != 0)
{
std::cerr << "We have a problem here..." << std::endl;
}
And nothing is printed. From here, everything looks ok.
But later in my code, without any alteration of this map, I try to find again a key:
int value = myMap.at("zero");
But the at function throws an std::out_of_range exception, which means it cannot find the element. myMap.find("zero") thinks the same, because it returns an iterator on the end of the map.
But the creepiest part is that the key is really in the map, if just before the call to the at function, I print the content of the map like this:
for (auto it = myMap.begin(); it != myMap.end(); it++)
{
std::cout << (*it).first << std::endl;
}
The output is as expected:
zero
first
second
How is it even possible? I don't use any beta-test library or anything supposed to be unstable.
You have a map of pointers to characters, not strings. The map lookup is based on the pointer value (address) and not the value of what's pointed at. In the first case, where "zero" is found in the map, you compiler has performed some string merging and is using one array of characters for both identical strings. This is not required by the language but is a common optimization. In the second case, when the string is not found, this merging has not been done (possibly your code here is in a different source module), so the address being used in the map is different from what was inserted and is then not found.
To fix this either store std::string objects in the map, or specify a comparison in your map declaration to order based on the strings and not the addresses.
key to map is char * . So map comparison function will try to compare raw pointer values and not the c style char string equivalence check. So declare the map having std::string as the key.
if you do not want to deal with the std::string and still want the same functionality with improved time complexity, sophisticated data structure is trie. Look at some implementations like Judy Array.

Inserting elements at desired positions in a STL map

map <int, string> rollCallRegister;
map <int, string> :: iterator rollCallRegisterIter;
map <int, string> :: iterator temporaryRollCallRegisterIter;
rollCallRegisterIter = rollCallRegister.begin ();
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (55, "swati"));
rollCallRegisterIter++;
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (44, "shweta"));
rollCallRegisterIter++;
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (33, "sindhu"));
// Displaying contents of this map.
cout << "\n\nrollCallRegister contains:\n";
for (rollCallRegisterIter = rollCallRegister.begin(); rollCallRegisterIter != rollCallRegister.end(); ++rollCallRegisterIter)
{
cout << (*rollCallRegisterIter).first << " => " << (*rollCallRegisterIter).second << endl;
}
Output:
rollCallRegister contains:
33 => sindhu
44 => shweta
55 => swati
I have incremented the iterator. Why is it still getting sorted? And if the position is supposed to be changed by the map on its own, then what's the purpose of providing an iterator?
Because std::map is a sorted associative container.
In a map, the key value is generally used to uniquely identify the element, while the mapped value is some sort of value associated to this key.
According to here position parameter is
the position of the first element to be compared for the insertion
operation. Notice that this does not force the new element to be in
that position within the map container (elements in a set always
follow a specific ordering), but this is actually an indication of a
possible insertion position in the container that, if set to the
element that precedes the actual location where the element is
inserted, makes for a very efficient insertion operation. iterator is
a member type, defined as a bidirectional iterator type.
So the purpose of this parameter is mainly slightly increasing the insertion speed by narrowing the range of elements.
You can use std::vector<std::pair<int,std::string>> if the order of insertion is important.
The interface is indeed slightly confusing, because it looks very much like std::vector<int>::insert (for example) and yet does not produce the same effect...
For associative containers, such as set, map and the new unordered_set and co, you completely relinquish the control over the order of the elements (as seen by iterating over the container). In exchange for this loss of control, you gain efficient look-up.
It would not make sense to suddenly give you control over the insertion, as it would let you break invariants of the container, and you would lose the efficient look-up that is the reason to use such containers in the first place.
And thus insert(It position, value_type&& value) does not insert at said position...
However this gives us some room for optimization: when inserting an element in an associative container, a look-up need to be performed to locate where to insert this element. By letting you specify a hint, you are given an opportunity to help the container speed up the process.
This can be illustrated for a simple example: suppose that you receive elements already sorted by way of some interface, it would be wasteful not to use this information!
template <typename Key, typename Value, typename InputStream>
void insert(std::map<Key, Value>& m, InputStream& s) {
typename std::map<Key, Value>::iterator it = m.begin();
for (; s; ++s) {
it = m.insert(it, *s).first;
}
}
Some of the items might not be well sorted, but it does not matter, if two consecutive items are in the right order, then we will gain, otherwise... we'll just perform as usual.
The map is always sorted, but you give a "hint" as to where the element may go as an optimisation.
The insertion is O(log N) but if you are able to successfully tell the container where it goes, it is constant time.
Thus if you are creating a large container of already-sorted values, then each value will get inserted at the end, although the tree will need rebalancing quite a few times.
As sad_man says, it's associative. If you set a value with an existing key, then you overwrite the previous value.
Now the iterators are necessary because you don't know what the keys are, usually.

How is the C++ multimap container implemented?

For example a C++ vector is implemented using a dynamic array where each element uses consecutive memory spaces.
I know that a C++ multimap is a one to many relationship but what is the internal structure?
The C++ standard does not define how the standard containers should be implemented, it only gives certain constraints like the one you say for vectors.
multimaps have certain runtime complexity (O(lg n) for the interesting operations) and other guarantees, and can be implemented as red-black trees. This is how they are implemented in the GNU standard C++ library.
Very often, a red-black tree. See e.g. STL's Red-Black Trees from Dr. Dobb's.
Addition to the "preferred" answer, because SO won't let me comment:
Given a key with values B, C, D, the behavior of iterators is a lot easier to implement if each element has it's own node. Find() is defined to return the first result in the series, and subsequent iteration takes you across the remaining elements. The de facto difference between a map and a multimap is that multimap is sorted using < over the entire value_type, where the map use < over only the key_type
Correction: the C++11 standard is explicit that new (key, mapping) pairs are inserted at the end of any existing values having the same key. This raises a question I hadn't considered: can a multimap contain two nodes in which both the key and the mapped target are the same. The standard doesn't seem to take a clear position on this, but it's noteworthy that no comparison operator is required on the mapped type. If you write a test program, you will find that a multimap can map X to 1, 2, 1. That is: "1" can appear multiple times as a target and the two instances will not be merged. For some algorithms that's a deficiency.
This article from Dr. Dobbs talks about the underlying rb-tree implementation that is commonly used. The main point to note is that the re-balance operation actually doesn't care about the keys at all, which is why you can build an rb-tree that admits duplicated keys.
The multimap just like it's simpler version i.e the std::map, is mostly built using red black trees. C++ standard itself does not specify the implementation. But in most of the cases ( I personally checked SGI STL) red black trees are used. Red Black trees are height balanced trees and hence fetch / read operation on them is always guaranteed to be O(log(n)) time. But if you are wondering on how values of the key are stored. each key->pair is saved as a separate node in the red black tree ( Even though the same key might appear multiple times just like in the case of key 'b' below). Key is used to lookup/ search the rb-tree. Once the key is found, it's value stored in the node is returned.
std::multimap<char,int> mmp;
mmp.insert(std::pair<char,int>('a',10));
mmp.insert(std::pair<char,int>('b',20));
mmp.insert(std::pair<char,int>('b',10));
mmp.insert(std::pair<char,int>('b',15));
mmp.insert(std::pair<char,int>('b',20));
mmp.insert(std::pair<char,int>('c',25));
mmp.insert(std::pair<char,int>('a',15));
mmp.insert(std::pair<char,int>('a',7));
for (std::multimap<char,int>::iterator it=mmp.begin(); it!=mmp.end(); ++it){
std::cout << (*it).first << " => " << (*it).second << " . Address of (*it).second = " << &((*it).second) << '\n';
}
Output :
a => 10 . Address of (*it).second = 0x96cca24
a => 15 . Address of (*it).second = 0x96ccae4
a => 7 . Address of (*it).second = 0x96ccb04
b => 20 . Address of (*it).second = 0x96cca44
b => 10 . Address of (*it).second = 0x96cca64
b => 15 . Address of (*it).second = 0x96cca84
b => 20 . Address of (*it).second = 0x96ccaa4
c => 25 . Address of (*it).second = 0x96ccac4
Initially I thought the values of a single key like 'b' might be stored in a std::vector .
template <class K, class V>
struct Node {
K key;
std::vector<V> values;
struct Node* left;
struct Node* right;
}
But later I realized that would violate the guaranteed fetch time of O(log(n)). Moreover, printing out the addresses of the values confirms that values with a common key are not contiguous.
They keys are inserted using operator<, so values with the same keys are stored in the order in which they are inserted.
So if we insert first
(key = 'b', value = 20)
and then
(key = 'b', value = 10)
The insertion is done using operator< , since the second 'b' is NOT lesser than the first inserted 'b', it is inserted in the 'right branch of a binary tree'.
The compiler I have used is gcc-5.1 ( C++14).

What's the difference between set<pair> and map in C++?

There are two ways in which I can easily make a key,value attribution in C++ STL: maps and sets of pairs. For instance, I might have
map<key_class,value_class>
or
set<pair<key_class,value_class> >
In terms of algorithm complexity and coding style, what are the differences between these usages?
They are semantically different. Consider:
#include <set>
#include <map>
#include <utility>
#include <iostream>
using namespace std;
int main() {
pair<int, int> p1(1, 1);
pair<int, int> p2(1, 2);
set< pair<int, int> > s;
s.insert(p1);
s.insert(p2);
map<int, int> m;
m.insert(p1);
m.insert(p2);
cout << "Set size = " << s.size() << endl;
cout << "Map size = " << m.size() << endl;
}
http://ideone.com/cZ8Vjr
Output:
Set size = 2
Map size = 1
Set elements cannot be modified while they are in the set. set's iterator and const_iterator are equivalent. Therefore, with set<pair<key_class,value_class> >, you cannot modify the value_class in-place. You must remove the old value from the set and add the new value. However, if value_class is a pointer, this doesn't prevent you from modifying the object it points to.
With map<key_class,value_class>, you can modify the value_class in-place, assuming you have a non-const reference to the map.
map<key_class,value_class> will sort on key_class and allow no duplicates of key_class.
set<pair<key_class,value_class> > will sort on key_class and then value_class if the key_class instances are equal, and will allow multiple values for key_class
The basic difference is that for the set the key is the pair, whereas for the map the key is key_class - this makes looking things up by key_class, which is what you want to do with maps, difficult for the set.
Both are typically implemented with the same data structure (normally a red-black balanced binary tree), so the complexity for the two should be the same.
std::map acts as an associative data structure. In other words, it allows you to query and modify values using its associated key.
A std::set<pair<K,V> > can be made to work like that, but you have to write extra code for the query using a key and more code to modify the value (i.e. remove the old pair and insert another with the same key and a different value). You also have to make sure there are no more than two values with the same key (you guessed it, more code).
In other words, you can try to shoe-horn a std::set to work like a std::map, but there is no reason to.
To understand algorithmic complexity, you first need to understand the implementation.
std::map is implemented using RB-tree where as hash_map are implemented using arrays of linked list. std::map provides O(log(n)) for insert/delete/search operation, hash_map is O(1) is best case and o(n) in worst case depending upon hash collisions.
Visualising that semantic difference Philipp mentioned after stepping through the code, note how map key is a const int and how p2 was not inserted into m: