C++: cannot find element in unordered_set with the same hash

I have a custom hash function for an unordered_set of vector<int>:
struct VectorHash {
int operator()(const vector<int> &V) const {
int hsh=V[0] + V[1];
return hash<int>()(hsh);
}};
For these two vectors the sum, and therefore the hash, is the same (the sum is 3):
vector<int> v1{2,1};
vector<int> v2{1,2};
But when I insert the first vector v1 into the unordered_set and then check whether the set contains v2, which has the same hash, I get false:
std::unordered_set<std::vector<int>, VectorHash> mySet;
mySet.insert(v1);
if(mySet.find(v2) == mySet.end())
cout << "didn't find" << endl;
Output: "didn't find"
I assume that if two elements in unordered_set have the same hash then if I have v1 in my unordered_set, find method should return true, when I try to find v2. But it is not the case.
Could anyone explain what is wrong with my reasoning?

A hash isn't everything; what you're seeing here is a collision.
Both std::vector<int> objects have the same hash value, but after the hash is calculated, std::unordered_set still compares the candidate elements for equality using operator==. That comparison fails in this case, so the element is not found.
Collisions are normal in hash tables; there is not much you can do here short of providing a custom equality predicate.

I assume that if two elements in unordered_set have the same hash then if I have v1 in my unordered_set, find method should return true, when I try to find v2.
That assumption is incorrect: the same hash doesn't mean the objects are equal.
unordered_set uses the equality predicate to determine key equality (by default std::equal_to).
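A minimal sketch of that distinction, reusing the question's VectorHash. The contains helper is a hypothetical name added for illustration, and the assert encodes the assumption that every key has at least two elements.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Same idea as the question's hash: only the sum of the first two elements matters.
struct VectorHash {
    std::size_t operator()(const std::vector<int>& v) const {
        assert(v.size() >= 2);  // assumption: keys always have at least two elements
        return std::hash<int>()(v[0] + v[1]);
    }
};

// Hypothetical helper: true if `key` compares equal to some element of `s`.
// The lookup hashes `key` first, then applies std::equal_to to the candidates.
bool contains(const std::unordered_set<std::vector<int>, VectorHash>& s,
              const std::vector<int>& key) {
    return s.find(key) != s.end();
}
```

With v1{2,1} inserted, contains(s, v1) is true while contains(s, v2) for v2{1,2} is false: both vectors hash to the same value, but the default predicate compares them element by element.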

If you want to key on the computed value itself rather than compare the vectors, you could use an (unordered_)map<int, vector<int>> and use the same computation as VectorHash to generate the int key:
unordered_map<int, vector<int>> map;
int key = V[0] + V[1];
map[key] = V;

You need to provide a comparator to the unordered_set as well if you want the two elements to match. You can do something along these lines:
struct VectorComparator {
bool operator()(const std::vector<int> & obj1, const std::vector<int> & obj2) const
{
if ((obj1[0] + obj1[1]) == (obj2[0] + obj2[1]))
return true;
return false;
}
};
and create your unordered_set like this
std::unordered_set<std::vector<int>, VectorHash, VectorComparator> mySet;
Then you should get the result you are expecting
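Put together as a sketch, with the hash and the equality predicate agreeing on what "same key" means (equal keys must always produce equal hashes). The names SumHash, SumEqual, SumSet and insertIfNew are illustrative, not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hash on the sum of the first two elements.
struct SumHash {
    std::size_t operator()(const std::vector<int>& v) const {
        assert(v.size() >= 2);  // assumption: keys always have at least two elements
        return std::hash<int>()(v[0] + v[1]);
    }
};

// Equality on the same quantity, so equal keys always hash equally.
struct SumEqual {
    bool operator()(const std::vector<int>& a, const std::vector<int>& b) const {
        assert(a.size() >= 2 && b.size() >= 2);
        return a[0] + a[1] == b[0] + b[1];
    }
};

typedef std::unordered_set<std::vector<int>, SumHash, SumEqual> SumSet;

// Hypothetical helper: inserts `v` and reports whether it was actually added.
bool insertIfNew(SumSet& s, const std::vector<int>& v) {
    return s.insert(v).second;
}
```

Note the consequence: with this predicate {2,1} and {1,2} are the same key, so after inserting the first, find succeeds for the second, and inserting the second is rejected as a duplicate.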

Related

How to remove duplicates from a vector of pair<int, Object>

This is what I am trying right now. I made a comparison function:
bool compare(const std::pair<int, Object>& left, const std::pair<int, Object>& right)
{
return (left.second.name == right.second.name) && (left.second.time == right.second.time) &&
(left.second.value == right.second.value);
}
After I add an element I call std::unique to filter duplicates:
data.push_back(std::make_pair(index, obj));
data.erase(std::unique(data.begin(), data.end(), compare), data.end());
But it seems that this doesn't work. And I don't know what the problem is.
From my understanding std::unique should use the compare predicate.
How should I update my code to make this work ?
I am using C++03.
edit:
I have tried sorting it too, but it still doesn't work.
bool compare2(const std::pair<int, Object>& left, const std::pair<int, Object>& right)
{
return (left.second.time< right.second.time);
}
std::sort(simulatedLatchData.begin(), simulatedLatchData.end(), compare2);
std::unique requires the range passed to it to have all the duplicate elements next to one another in order to work.
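You can use std::sort on the range before you call std::unique to achieve that, as sorting groups duplicates together, provided the sort comparator orders on the same fields that the equality predicate compares. Sorting on time alone is not enough: two identical objects can still be separated by a different object with the same time.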
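A C++03-compatible sketch of sort followed by unique. The Object struct is a hypothetical stand-in (the field names come from the question's compare function, the field types are assumed), and lessByObject, sameObject and removeDuplicates are illustrative names.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's Object type; field types are assumed.
struct Object {
    std::string name;
    int time;
    int value;
};

typedef std::pair<int, Object> Entry;

// Strict weak ordering over exactly the fields the equality predicate looks at.
bool lessByObject(const Entry& l, const Entry& r) {
    if (l.second.name != r.second.name) return l.second.name < r.second.name;
    if (l.second.time != r.second.time) return l.second.time < r.second.time;
    return l.second.value < r.second.value;
}

// Equality predicate for std::unique (same fields as above, index ignored).
bool sameObject(const Entry& l, const Entry& r) {
    return l.second.name == r.second.name &&
           l.second.time == r.second.time &&
           l.second.value == r.second.value;
}

// Sort so that duplicates become adjacent, then erase the repeats.
void removeDuplicates(std::vector<Entry>& data) {
    std::sort(data.begin(), data.end(), lessByObject);
    data.erase(std::unique(data.begin(), data.end(), sameObject), data.end());
}
```

Which index survives for a group of duplicates is unspecified here because std::sort is not stable; std::stable_sort would keep the earliest inserted one.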
Sorting and filtering is nice, but since you never want any duplicate, why not use std::set?
And while we're at it, these pairs look suspiciously like key-values, so how about std::map?
If you want to keep only unique objects, then use an appropriate container type, such as a std::set (or std::map). For example
bool operator<(object const&, object const&);
std::set<object> data;
object obj = new_object(/*...*/);
data.insert(obj); // will only insert if unique

C++ unordered_map where key is also unordered_map

I am trying to use an unordered_map with another unordered_map as a key (custom hash function). I've also added a custom equal function, even though it's probably not needed.
The code does not do what I expect, but I can't make heads or tails of what's going on. For some reason, the equal function is not called when doing find(), which is what I'd expect.
unsigned long hashing_func(const unordered_map<char,int>& m) {
string str;
for (auto& e : m)
str += e.first;
return hash<string>()(str);
}
bool equal_func(const unordered_map<char,int>& m1, const unordered_map<char,int>& m2) {
return m1 == m2;
}
int main() {
unordered_map<
unordered_map<char,int>,
string,
function<unsigned long(const unordered_map<char,int>&)>,
function<bool(const unordered_map<char,int>&, const unordered_map<char,int>&)>
> mapResults(10, hashing_func, equal_func);
unordered_map<char,int> t1 = getMap(str1);
unordered_map<char,int> t2 = getMap(str2);
cout<<(t1 == t2)<<endl; // returns TRUE
mapResults[t1] = "asd";
cout<<(mapResults.find(t2) != mapResults.end()); // returns FALSE
return 0;
}
First of all, the equality operator is certainly required, so you should keep it.
Let's look at your unordered map's hash function:
string str;
for (auto& e : m)
str += e.first;
return hash<string>()(str);
Since it's an unordered map, by definition, the iterator can iterate over the unordered map's keys in any order. However, since the hash function must produce the same hash value for the same key, this hash function will obviously fail in that regard.
Additionally, I would expect the hash function to include the values of the unordered map key, in addition to the keys themselves. I suppose that you might want it this way, so that two unordered maps are considered the same key as long as their keys are the same, ignoring their values. It's not clear from the question what your expectation is, but you may want to think it over.
Comparing two std::unordered_map objects using == checks whether the maps contain the same key/value pairs. It does nothing to tell whether they contain them in the same order (it's an unordered map, after all). However, your hashing_func depends on the order of items in the map: hash<string>()("ab") is in general different from hash<string>()("ba").
A good place to start is with what hashing_func returns for each map, or more easily what the string construction in hashing_func generates.
A more obviously correct hash function for such a type could be:
unsigned long hashing_func(const unordered_map<char,int>& m) {
unsigned long res = 0;
for (auto& e : m)
res ^= hash<char>()(e.first) ^ hash<int>()(e.second);
return res;
}
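An order-independent hash can also be sketched by mixing each entry on its own and adding the results. The names CharCount, CharCountHash and countChars are hypothetical (countChars stands in for the question's undefined getMap), and the multiplier 31 is an arbitrary choice. Mixing key and value before combining avoids a weakness of plain XOR on implementations where std::hash of an integer is the integer itself (common, but not guaranteed): an entry such as 'a' with value 97 would XOR to zero.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

typedef std::unordered_map<char, int> CharCount;

struct CharCountHash {
    std::size_t operator()(const CharCount& m) const {
        std::size_t res = 0;
        for (const auto& e : m) {
            // Mix key and value into a single per-entry hash.
            std::size_t h = std::hash<char>()(e.first) * 31u + std::hash<int>()(e.second);
            // Addition is commutative, so the iteration order does not matter.
            res += h;
        }
        return res;
    }
};

// Hypothetical stand-in for getMap: counts how often each character occurs.
CharCount countChars(const std::string& s) {
    CharCount m;
    for (char c : s) ++m[c];
    return m;
}
```

An std::unordered_map<CharCount, std::string, CharCountHash> can then rely on the default std::equal_to, which uses the maps' own operator==.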

How can I sort a std::map first by value, then by key?

I need to sort a std::map by value, then by key. The map contains data like the following:
1 realistically
8 really
4 reason
3 reasonable
1 reasonably
1 reassemble
1 reassembled
2 recognize
92 record
48 records
7 recs
I need to get the values in order, but the kicker is that the keys need to be in alphabetical order after the values are in order. How can I do this?
std::map will sort its elements by keys. It doesn't care about the values when sorting.
You can use std::vector<std::pair<K,V>> then sort it using std::sort followed by std::stable_sort:
std::vector<std::pair<K,V>> items;
//fill items
//sort by key using std::sort
std::sort(items.begin(), items.end(), key_comparer);
//sort by value using std::stable_sort
std::stable_sort(items.begin(), items.end(), value_comparer);
The first sort can use std::sort since it is O(n log n); the second must be std::stable_sort, which is O(n (log n)^2) in the worst case.
Note that while std::sort is chosen for performance reasons, std::stable_sort is needed for correct ordering: the final sort is on the value, and it must preserve the by-key order among items with equal values.
As @gsf noted in the comments, you could use only std::sort if you choose a comparer that compares the values first and, if they are equal, compares the keys:
auto cmp = [](std::pair<K,V> const & a, std::pair<K,V> const & b)
{
return a.second != b.second? a.second < b.second : a.first < b.first;
};
std::sort(items.begin(), items.end(), cmp);
That should be efficient.
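The single-comparer variant on concrete types, as a sketch assuming the map is a std::map<std::string, int> as in the question's data. Item, valueThenKey and sortedByValueThenKey are illustrative names.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> Item;  // (word, count)

// Orders by count first; ties are broken by the word.
bool valueThenKey(const Item& a, const Item& b) {
    if (a.second != b.second) return a.second < b.second;
    return a.first < b.first;
}

// Copies the map into a vector and sorts it by value, then by key.
std::vector<Item> sortedByValueThenKey(const std::map<std::string, int>& m) {
    std::vector<Item> items(m.begin(), m.end());
    std::sort(items.begin(), items.end(), valueThenKey);
    return items;
}
```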
But wait, there is a better approach: store std::pair<V,K> instead of std::pair<K,V> and then you don't need any comparer at all — the standard comparer for std::pair would be enough, as it compares first (which is V) first then second which is K:
std::vector<std::pair<V,K>> items;
//...
std::sort(items.begin(), items.end());
That should work great.
You can use std::set instead of std::map.
You can store both key and value in std::pair and the type of container will look like this:
std::set< std::pair<int, std::string> > items;
std::set will sort its elements by the first member of the pair (the value from the original std::map) and then by the second (the original key).
As explained in Nawaz's answer, you cannot sort your map by itself as you need it, because std::map sorts its elements based on the keys only. So, you need a different container, but if you have to stick to your map, then you can still copy its content (temporarily) into another data structure.
I think, the best solution is to use a std::set storing flipped key-value pairs as presented in ks1322's answer.
The std::set is sorted by default and the order of the pairs is exactly as you need it:
3) If lhs.first<rhs.first, returns true. Otherwise, if rhs.first<lhs.first, returns false. Otherwise, if lhs.second<rhs.second, returns true. Otherwise, returns false.
This way you don't need an additional sorting step and the resulting code is quite short:
std::map<std::string, int> m; // Your original map.
m["realistically"] = 1;
m["really"] = 8;
m["reason"] = 4;
m["reasonable"] = 3;
m["reasonably"] = 1;
m["reassemble"] = 1;
m["reassembled"] = 1;
m["recognize"] = 2;
m["record"] = 92;
m["records"] = 48;
m["recs"] = 7;
std::set<std::pair<int, std::string>> s; // The new (temporary) container.
for (auto const &kv : m)
s.emplace(kv.second, kv.first); // Flip the pairs.
for (auto const &vk : s)
std::cout << std::setw(3) << vk.first << std::setw(15) << vk.second << std::endl;
Output:
1 realistically
1 reasonably
1 reassemble
1 reassembled
2 recognize
3 reasonable
4 reason
7 recs
8 really
48 records
92 record
Note: Since C++17 you can use range-based for loops together with structured bindings for iterating over a map.
As a result, the code for copying your map becomes even shorter and more readable:
for (auto const &[k, v] : m)
s.emplace(v, k); // Flip the pairs.
std::map already sorts its elements by key, using a predicate you define or std::less if you don't provide one. std::set also stores its items in the order given by its comparator. However, neither set nor map allows duplicate keys. I would suggest defining a std::map<int, std::set<string>> if you want to accomplish this using your data structure alone. You should also realize that std::less for string sorts lexicographically by character value, not alphabetically (in ASCII, for example, all uppercase letters sort before lowercase ones).
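A sketch of that grouping, assuming the original container is a std::map<std::string, int>. Groups and groupByCount are illustrative names.

```cpp
#include <map>
#include <set>
#include <string>

typedef std::map<int, std::set<std::string> > Groups;

// Groups the words by their count; both levels are kept sorted by the containers.
Groups groupByCount(const std::map<std::string, int>& counts) {
    Groups groups;
    for (std::map<std::string, int>::const_iterator it = counts.begin();
         it != counts.end(); ++it) {
        groups[it->second].insert(it->first);
    }
    return groups;
}
```

Iterating the outer map visits the counts in ascending order, and each inner set yields its words in std::less<std::string> order.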
EDIT: The other two answers make a good point. I'm assuming that you want to order them into some other structure, or in order to print them out.
"Best" can mean a number of different things. Do you mean "easiest," "fastest," "most efficient," "least code," "most readable?"
The most obvious approach is to loop through twice. On the first pass, order the values:
if(current_value > examined_value)
{
current_value = examined_value
(and then swap them, however you like)
}
Then on the second pass, alphabetize the words, but only if their values match.
if(current_value == examined_value)
{
(alphabetize the two)
}
Strictly speaking, this is a "bubble sort" which is slow because every time you make a swap, you have to start over. One "pass" is finished when you get through the whole list without making any swaps.
There are other sorting algorithms, but the principle would be the same: order by value, then alphabetize.

Equal keys function in boost::unordered_multimap: Is the query key guaranteed to be the first arg?

I have a boost::unordered_multimap< std::vector<int>, float>. The keys that I use to query the multimap may contain additional 0 ints that I'd like to ignore (but the keys inserted in the map never contain 0).
Example:
int main() {
typedef std::vector<int> Vec;
typedef boost::unordered_multimap<Vec, float, MyHash, MyEqualKeys> Map;
Map map;
Vec vec1;
vec1.push_back(2);
vec1.push_back(6);
map.insert(Map::value_type(vec1, 4.3));
map.insert(Map::value_type(vec1, 6.8));
Vec queryVec;
queryVec.push_back(2);
queryVec.push_back(0); // additional 0, to be ignored
queryVec.push_back(6);
for (std::pair<Map::iterator, Map::iterator> iter = map.equal_range(queryVec);
iter.first != iter.second; ++iter.first) {
std::cout << iter.first->second << std::endl; // 4.3 and 6.8
}
}
I wrote a hash function MyHash that ignores the 0 in the key to be hashed.
My question is: When I write MyEqualKeys, is it guaranteed that a query key (which in my case may have additional 0) is always the first argument?
So, when I write this functor:
struct MyEqualKeys {
bool operator()(Vec const& x, Vec const& y) const {...}
};
Can only the x argument contain the additional 0?
I'd like to know because above I slightly simplified, and in reality I may have to check for more than just a 0 and it may be slightly costly to have to check the y argument too (for millions of queries).
No, there is no such guarantee or requirement (on any of the unordered associative containers). I'd suggest adding a 'sanitised' flag to your key objects.
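One way to sidestep the argument-order question is to sanitise the query once before the lookup, so that the hash and the equality predicate only ever see clean keys. This is a variation on the idea above, not the 'sanitised' flag itself. The sketch uses std::unordered_multimap instead of boost::unordered_multimap to stay self-contained, and VecHash, stripZeros and lookup are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

typedef std::vector<int> Vec;

// Plain hash over all elements; the multiplier 31 is an arbitrary choice.
struct VecHash {
    std::size_t operator()(const Vec& v) const {
        std::size_t h = 0;
        for (std::size_t i = 0; i < v.size(); ++i)
            h = h * 31u + std::hash<int>()(v[i]);
        return h;
    }
};

typedef std::unordered_multimap<Vec, float, VecHash> Map;

// Returns a copy of `v` with every 0 removed (erase-remove idiom).
Vec stripZeros(Vec v) {
    v.erase(std::remove(v.begin(), v.end(), 0), v.end());
    return v;
}

// Sanitises the query once, then performs an ordinary equal_range lookup.
std::vector<float> lookup(const Map& table, const Vec& query) {
    const Vec key = stripZeros(query);
    std::vector<float> out;
    std::pair<Map::const_iterator, Map::const_iterator> range = table.equal_range(key);
    for (; range.first != range.second; ++range.first)
        out.push_back(range.first->second);
    return out;
}
```

The cost of the extra checks is then paid once per query rather than once per comparison, and the default std::equal_to suffices.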

predicate for a map from string to int

I have this small program that reads a line of input & prints the words in it, with their respective number of occurrences. I want to sort the elements in the map that stores these values according to their occurrences. I mean, the words that only appear once will be ordered to be at the beginning, then the words that appeared twice, & so on. I know that the predicate should return a bool value, but I don't know what the parameters should be. Should it be two iterators to the map? If someone could explain this, it would be greatly appreciated. Thank you in advance.
#include<iostream>
#include<map>
#include<string>
using std::cout;
using std::cin;
using std::endl;
using std::string;
using std::map;
int main()
{
string s;
map<string,int> counters; //store each word & an associated counter
//read the input, keeping track of each word & how often we see it
while(cin>>s)
{
++counters[s];
}
//write the words & associated counts
for(map<string,int>::const_iterator iter = counters.begin();iter != counters.end();iter++)
{
cout<<iter->first<<"\t"<<iter->second<<endl;
}
return 0;
}
std::map is always sorted according to its key. You cannot sort the elements by their value.
You need to copy the contents to another data structure (for example std::vector<std::pair<string, int> >) which can be sorted.
Here is a predicate that can be used to sort such a vector. Note that sorting algorithms in C++ standard library need a "less than" predicate which basically says "is a smaller than b".
bool cmp(std::pair<string, int> const &a, std::pair<string, int> const &b) {
return a.second < b.second;
}
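A sketch of the copy-and-sort step using that predicate. sortedByCount is an illustrative name; std::stable_sort is used here (an assumption about what is wanted) so that words with the same count stay in the alphabetical order they had in the map.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// "Less than" predicate on the count only.
bool cmp(const std::pair<std::string, int>& a, const std::pair<std::string, int>& b) {
    return a.second < b.second;
}

// Copies the map into a vector and orders it by count.
std::vector<std::pair<std::string, int> >
sortedByCount(const std::map<std::string, int>& counters) {
    // The map iterates in key order, so the copy starts out alphabetical.
    std::vector<std::pair<std::string, int> > v(counters.begin(), counters.end());
    // A stable sort keeps that alphabetical order among equal counts.
    std::stable_sort(v.begin(), v.end(), cmp);
    return v;
}
```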
You can't re-sort a map; its order is predefined (by default, from std::less on the key type). The easiest solution for your problem would be to create a std::multimap<int, string> and insert your values there, then just loop over the multimap, which will be ordered on the key type (int, the number of occurrences). That gives you the order you want without having to define a predicate.
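The multimap approach as a short sketch, assuming the counters map from the question. byOccurrences is an illustrative name.

```cpp
#include <map>
#include <string>
#include <utility>

// Flips each (word, count) entry into a multimap keyed on the count.
std::multimap<int, std::string>
byOccurrences(const std::map<std::string, int>& counters) {
    std::multimap<int, std::string> out;
    for (std::map<std::string, int>::const_iterator it = counters.begin();
         it != counters.end(); ++it) {
        out.insert(std::make_pair(it->second, it->first));
    }
    return out;
}
```

Looping over the result visits the words from least to most frequent; a multimap is needed rather than a map because several words can share the same count.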
You are not going to be able to do this with one pass with an std::map. It can only be sorted on one thing at a time, and you cannot change the key in-place. What I would recommend is to use the code you have now to maintain the counters map, then use std::max_element with a comparison function that compares the second field of each std::pair<string, int> in the map.
A map has its keys sorted, not its values. That's what makes the map efficient. You cannot sort it by occurrences without using another data structure (maybe a reversed index!)
As stated, it simply won't work -- a map always remains sorted by its key value, which would be the strings.
As others have noted, you can copy the data to some other structure, and sort by the value. Another possibility would be to use a Boost bimap instead. I've posted a demo of the basic idea previously.
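You probably want to transform map<string,int> to vector<pair<string, int> > then sort the vector on the int member. (The map's own value_type, pair<const string, int>, cannot be used for the vector because its elements are not assignable, which std::sort requires.)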
You could do
struct PairLessSecond
{
template< typename P >
bool operator()( const P& pairLeft, const P& pairRight ) const
{
return pairLeft.second < pairRight.second;
}
};
You can probably also construct all this somehow using a lambda with a bind.
Now
std::vector< std::pair<std::string,int> > byCount( counters.begin(), counters.end() );
std::sort( byCount.begin(), byCount.end(), PairLessSecond() );