How to detect duplicates in a vector of unordered_map?

Given a vector of unordered_map<u_int,int>,
I would like to check if the vector contains any duplicated values. Two unordered_maps are considered duplicated if all of their keys and their corresponding values are equal.
I know the comparison operator exists for unordered_maps, but I would like to avoid the pairwise comparison of each element with each other. One classical solution is to insert the values of the vector into a set, then to compare the number of elements in the set and the vector.
However, the problem here is that the object to be inserted into the set must have the comparison operators overloaded. In case of the unordered_set, the hash function to be used must be overloaded for the complex object. In order to overload, I need to derive a class from the std::unordered_map. Then I need to overload either the comparison operator or the hash function. Another solution that I could think of is to concatenate all of the key value pairs into a string, then sort the string by the keys and detect the duplicates on those strings. I wonder what would be the best solution for this problem.
Example data:
using namespace std;
typedef unordered_map<u_int,int> int_map;
int_map a = { {1,1}, {2,4}, {3,5} };
int_map b = { {1,1}, {2,-1}, {4,-2} };
int_map c = { {1,1}, {3,5} };
vector<unordered_map<u_int,int>> my_vec;
my_vec.push_back(a);
my_vec.push_back(b);
my_vec.push_back(c);
The contents of my_vec are:
{ { 1 => 1, 2 => 4, 3 => 5 },
{ 1 => 1, 2 => -1, 4 => -2 },
{ 1 => 1, 3 => 5 } }
Please feel free to ask/comment/edit if the question is not clear enough.
Any help would be appreciated. Thank you in advance!

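You can do something similar to the following:
typedef unordered_map<u_int,int> int_map;
struct my_map_hash
{
std::size_t operator()(const int_map& m) const
{
// combine the per-entry hashes with a commutative operation (+),
// because the iteration order of an unordered_map is unspecified
std::size_t h = 0;
for (const auto& kv : m)
h += std::hash<u_int>()(kv.first) ^ (std::hash<int>()(kv.second) << 1);
return h;
}
};
// equality falls back to operator== of unordered_map, so hash collisions are handled correctly
std::unordered_set<int_map, my_map_hash> map_list;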

If you can get a good hash function for std::unordered_map then you should do it like this probably:
bool has_distinct_values(const std::vector<std::unordered_map<u_int, int>>& v)
{
std::unordered_map<std::size_t, std::list<std::size_t>> hash_to_indexes_map;
for(std::size_t i = 0; i < v.size(); ++i)
{
auto empl_result = hash_to_indexes_map.emplace(custom_hash(v[i]), std::list<std::size_t>{i});
if (!empl_result.second)
{
for (auto index : empl_result.first->second)
{
if (v[index] == v[i]) return false;
}
empl_result.first->second.push_back(i);
}
}
return true;
}
The algorithm is straightforward: map each hash to a list of vector indexes, and do a pairwise map comparison only when the hashes are equal.
This way you avoid copying the maps and get O(N) expected time in the number of maps (each map still has to be hashed once, and the bound depends mostly on the quality of the hash function you provide).
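The approach above leaves custom_hash undefined. A minimal sketch of one possibility follows, assuming u_int is unsigned int; the mixing constant and the name has_duplicates are illustrative choices, not part of the original answer. Per-entry hashes are added together so that the unspecified iteration order of an unordered_map does not change the result.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

using int_map = std::unordered_map<unsigned int, int>;

// Hypothetical order-independent hash: each key/value pair is hashed on its
// own, and the per-entry hashes are combined with commutative addition.
std::size_t custom_hash(const int_map& m)
{
    std::size_t total = 0;
    for (const auto& entry : m) {
        std::size_t h = std::hash<unsigned int>{}(entry.first);
        // Mix in the value hash (boost::hash_combine-style constant).
        h ^= std::hash<int>{}(entry.second) + 0x9e3779b9u + (h << 6) + (h >> 2);
        total += h;
    }
    return total;
}

// Returns true if at least two maps in v compare equal.
bool has_duplicates(const std::vector<int_map>& v)
{
    // hash -> indexes of the maps already seen with that hash
    std::unordered_map<std::size_t, std::vector<std::size_t>> buckets;
    for (std::size_t i = 0; i < v.size(); ++i) {
        auto& same_hash = buckets[custom_hash(v[i])];
        for (std::size_t j : same_hash) {
            if (v[j] == v[i]) return true;  // full comparison only on a hash match
        }
        same_hash.push_back(i);
    }
    return false;
}
```

With the example data from the question, has_duplicates returns false; pushing a second copy of a into the vector makes it return true.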

Related

Spaceship operator on arrays

The following code is intended to implement comparison on an object that contains an array. Two objects should compare as <,==,> if all array elements compare like that. The following does not compile for a variety of reasons:
#include <compare>
class witharray {
private:
array<int,4> the_array;
public:
witharray( array<int,4> v )
: the_array(v) {};
int size() { return the_array.size(); };
auto operator<=>( const witharray& other ) const {
array< std::strong_ordering,4 > cmps;
for (int id=0; id<4; id++)
cmps[id] = the_array[id]<=>other.the_array[id];
return accumulate
(cmps.begin(),cmps.end(),
std::equal,
[] (auto x,auto y) -> std::strong_ordering { return x and y; }
);
};
};
First of all, the array of comparisons:
call to implicitly-deleted default constructor of 'array<std::strong_ordering, 4>
Then the attempt to accumulate the comparisons:
no matching function for call to 'accumulate'
Compiler explorer: https://godbolt.org/z/E3ovh5qGa
Or am I completely on the wrong track?

Two objects should compare as <,==,> if all array elements compare like that.
This is a fairly interesting order. One thing to note here is that it's a partial order. That is, given {1, 2} vs {2, 1}, those elements aren't all < or == or >. So you're left with unordered.
C++20's comparisons do have a way to represent that: you have to return a std::partial_ordering.
The way that we can achieve this ordering is that we first compare the first elements, and then we ensure that all the other elements compare the same. If any pair of elements doesn't compare the same, then we know we're unordered:
auto operator<=>( const witharray& other ) const
-> std::partial_ordering
{
std::strong_ordering c = the_array[0] <=> other.the_array[0];
for (int i = 1; i < 4; ++i) {
if ((the_array[i] <=> other.the_array[i]) != c) {
return std::partial_ordering::unordered;
}
}
return c;
}
This has the benefit of not having to compare every pair of elements, since we might already know the answer by the time we get to the 2nd element (e.g. {1, 2, x, x} vs {1, 3, x, x} is already unordered, doesn't matter what the other elements are).
This seems like what you were trying to accomplish with your accumulate, except accumulate is the wrong algorithm here since we want to stop early. You'd want all_of in this case:
auto comparisons = views::iota(0, 4)
| views::transform([&](int i){
return the_array[i] <=> other.the_array[i];
});
bool all_match = ranges::all_of(comparisons | views::drop(1), [&](std::strong_ordering c){
return c == comparisons[0];
});
return all_match ? comparisons[0] : std::partial_ordering::unordered;
Which is admittedly awkward. In C++23, we can do the comparisons part more directly:
auto comparisons = views::zip_transform(
compare_three_way{}, the_array, other.the_array);
And then it would read better if you had a predicate like:
bool all_match = ranges::all_of(comparisons | views::drop(1), equals(comparisons[0]));
or wrote your own algorithm for this specific use-case (which is a pretty easy algorithm to write):
return all_same_value(comparisons)
? comparisons[0]
: std::partial_ordering::unordered;

Note that std::array already has a spaceship operator. It compares lexicographically, so if that ordering is what you need:
class witharray {
private:
array<int, 4> the_array;
public:
witharray(array<int, 4> v)
: the_array(v) {};
int size() { return the_array.size(); };
auto operator<=>(const witharray& other) const
{
return the_array <=> other.the_array;
};
};
https://godbolt.org/z/4drddWa8G
Now to cover problems with your code:
array< std::strong_ordering, 4 > cmps; cannot be default-initialized, since std::strong_ordering has no default constructor.
The use of std::accumulate here is strange; there is a better algorithm for this: std::lexicographical_compare_three_way, which was added to work with the spaceship operator.
You passed std::equal to std::accumulate, but std::equal is an algorithm that compares ranges (it accepts iterators). Most probably your plan here was to use std::equal_to.

Proper way to use large amount of known constant variables

The program receives a vector that represents a character.
It then compares the received vector with all the known vectors that represents characters.
I'm not sure how I should use the known vectors.
A few options I thought of:
1) Using global variables:
vector<int> charA{1,2,3,4,5};
vector<int> charB{5,3,7,1};
...
vector<int> charZ{3,2,5,6,8,9,0};
char getLetter(const vector<int> &input){
if(compareVec(input,charA)) return 'A';
if(compareVec(input,charB)) return 'B';
....
if(compareVec(input,charZ)) return 'Z';
}
2) Declaring all variables in function:
char getLetter(const vector<int> &input){
vector<int> charA{1,2,3,4,5};
vector<int> charB{5,3,7,1};
...
vector<int> charZ{3,2,5,6,8,9,0};
if(compareVec(input,charA)) return 'A';
if(compareVec(input,charB)) return 'B';
....
if(compareVec(input,charZ)) return 'Z';
}
3) Passing the variables
char getLetter(const vector<int> &input, vector<int> charA,
vector<int> charB... , vector<int> charZ){
if(compareVec(input,charA)) return 'A';
if(compareVec(input,charB)) return 'B';
....
if(compareVec(input,charZ)) return 'Z';
}

This sounds like an application for a perfect hash generator (link to GNU gperf).
To quote the documentation
gperf is a perfect hash function generator written in C++. It
transforms an n element user-specified keyword set W into a perfect
hash function F. F uniquely maps keywords in W onto the range 0..k,
where k >= n-1. If k = n-1 then F is a minimal perfect hash function.
gperf generates a 0..k element static lookup table and a pair of C
functions. These functions determine whether a given character string
s occurs in W, using at most one probe into the lookup table.
If this is not a suitable solution then I'd recommend using function statics. You want to avoid non-static function locals, because the vectors would be constructed again on every call, which will badly affect performance, and globals will pollute your namespace.
So something like
char getLetter(const vector<int> &input){
static vector<int> charA{1,2,3,4,5};
static vector<int> charB{5,3,7,1};

Given your snippet, I'd go for:
char getLetter(const vector<int> &input)
{
struct
{
char result;
std::vector<int> data;
} const data[]=
{
{ 'A', {1,2,3,4,5}, },
{ 'B', {5,3,7,1}, },
...
};
for(auto const & probe : data)
{
if (compareVec(input, probe.data))
return probe.result;
}
// input does not match any of the given values
throw "That's not the input I'm looking for!";
}
For 40 such pairs, if this is not called in a tight inner loop, the linear search is good enough.
Alternatives:
use a std::map<std::vector<int>, char> to map valid values to results, and turn compareVec into a functor suitable as key comparison for the map, and initialize it the same way.
as above, but use a std::unordered_map.
use gperf, as suggested by @PaulFloyd above
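The std::map alternative could look like the sketch below. It assumes compareVec means exact element-wise equality, in which case the built-in operator< of std::vector already serves as the key comparison. The three entries are the sample values from the question, and returning '\0' for "no match" is a choice made only for this example.

```cpp
#include <map>
#include <vector>

// The lookup table is a function-local static, so it is built once on first use.
char getLetter(const std::vector<int>& input)
{
    static const std::map<std::vector<int>, char> known = {
        { {1, 2, 3, 4, 5}, 'A' },
        { {5, 3, 7, 1}, 'B' },
        { {3, 2, 5, 6, 8, 9, 0}, 'Z' },
    };
    const auto it = known.find(input);
    // '\0' marks "no match" in this sketch; throwing would work as well.
    return it != known.end() ? it->second : '\0';
}
```

A lookup then costs O(log n) vector comparisons instead of a linear scan over all known vectors.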

I would start by suggesting that you hash or represent the numbers in their binary collection so that you are not comparing vectors each time as that would prove very costly. That said, your question is about how to make a dictionary, so whether you improve your keys as I suggested or not, I'd prefer the use of a map:
map<vector<int>, char, function<bool(const vector<int>&, const vector<int>&)>> dictionary([](const auto& lhs, const auto& rhs){
const auto its = mismatch(cbegin(lhs), cend(lhs), cbegin(rhs), cend(rhs));
return its.second != cend(rhs) && (its.first == cend(lhs) || *its.first < *its.second);
});
If possible dictionary should be constructed constant with an initializer_list containing all mappings and the comparator. If mappings must be looked up before you are guaranteed to have finished all letters then you obviously can't construct constant. Either way this map should be a private member of the class responsible for translating strings. Adding and mapping should be public functions of the class.

map comparator for pair of objects in c++

I want to use a map to count pairs of objects based on member input vectors. If there is a better data structure for this purpose, please tell me.
My program returns a list of int vectors. Each int vector is the output of a comparison between two int vectors ( a pair of int vectors). It is, however, possible, that the output of the comparison differs, though the two int vectors are the same (maybe in different order). I want to store how many different outputs (int vectors) each pair of int vectors has produced.
Assuming that I can access the int vector of my object with .inp()
Two pairs (a1,b1) and (a2,b2) should be considered equal, when (a1.inp() == a2.inp() && b2.inp() == b1.inp()) or (a1.inp() == b2.inp() and b1.inp() == a2.inp()).
This answer says:
The keys in a map a and b are equivalent by definition when neither a
< b nor b < a is true.
class SomeClass
{
vector <int> m_inputs;
public:
//constructor, setter...
vector<int> inp() const { return m_inputs; }
};
typedef pair < SomeClass, SomeClass > InputsPair;
typedef map < InputsPair, size_t, MyPairComparator > InputsPairCounter;
So the question is, how can I define equivalency of two pairs with a map comparator. I tried to concatenate the two vectors of a pair, but that leads to (010,1) == (01,01), which is not what I want.
struct MyPairComparator
{
bool operator() (const InputsPair & pair1, const InputsPair & pair2) const
{
vector<int> itrc1 = pair1.first.inp();
vector<int> itrc2 = pair1.second.inp();
vector<int> itrc3 = pair2.first.inp();
vector<int> itrc4 = pair2.second.inp();
// ?
return itrc1 < itrc3;
}
};

I want to use a map to count pairs of input vectors. If there is a better data structure for this purpose, please tell me.
Using std::unordered_map can be considered instead, for two reasons:
if the hash is implemented properly, it could be faster than std::map
you only need to implement a hash and operator== instead of operator<, and operator== is trivial in this case
Details on how to implement a hash for std::vector can be found here. In your case, a possible solution could be to join both vectors into one, sort it, and then use that method to calculate the hash. This is a straightforward solution, but it can produce too many hash collisions and lead to worse performance. Suggesting a better alternative would require knowledge of the data used.
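One way to avoid joining and sorting is to hash each vector on its own and combine the two results with a commutative operation. The sketch below is only an illustration under simplifying assumptions: the key is a pair of plain int vectors rather than SomeClass objects, and all type names and the mixing constant are made up for the example.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Inputs = std::vector<int>;
using InputsPair = std::pair<Inputs, Inputs>;

// Order-sensitive hash of one vector (boost::hash_combine-style mixing).
inline std::size_t hash_inputs(const Inputs& v)
{
    std::size_t h = v.size();
    for (int x : v) {
        h ^= std::hash<int>{}(x) + 0x9e3779b9u + (h << 6) + (h >> 2);
    }
    return h;
}

struct UnorderedPairHash
{
    std::size_t operator()(const InputsPair& p) const
    {
        // '+' is commutative, so (a, b) and (b, a) hash identically.
        return hash_inputs(p.first) + hash_inputs(p.second);
    }
};

struct UnorderedPairEqual
{
    bool operator()(const InputsPair& l, const InputsPair& r) const
    {
        return (l.first == r.first && l.second == r.second)
            || (l.first == r.second && l.second == r.first);
    }
};

using InputsPairHashCounter =
    std::unordered_map<InputsPair, std::size_t, UnorderedPairHash, UnorderedPairEqual>;
```

Because the vectors are hashed separately, the pairs (010, 1) and (01, 01) from the question are different keys.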

As I understand, you want:
struct MyPairComparator
{
bool operator() (const InputsPair& lhs, const InputsPair& rhs) const
{
return std::minmax(std::get<0>(lhs), std::get<1>(lhs))
< std::minmax(std::get<0>(rhs), std::get<1>(rhs));
}
};
We order the pair {a, b} so that a < b, then we use regular comparison. This requires operator< to be defined for SomeClass.
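As a compilable illustration of this idea, the sketch below keys the map on pairs of plain int vectors, which already have operator<, instead of SomeClass. That substitution and the name UnorderedPairLess are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Inputs = std::vector<int>;
using InputsPair = std::pair<Inputs, Inputs>;

struct UnorderedPairLess
{
    bool operator()(const InputsPair& lhs, const InputsPair& rhs) const
    {
        // std::minmax returns a pair of references (smaller, larger), so
        // (a, b) and (b, a) are normalized to the same ordered pair.
        return std::minmax(lhs.first, lhs.second)
             < std::minmax(rhs.first, rhs.second);
    }
};

using InputsPairCounter = std::map<InputsPair, std::size_t, UnorderedPairLess>;
```

Incrementing counter[{a, b}] and counter[{b, a}] then updates the same entry, while (010, 1) and (01, 01) stay distinct.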

C++ using an unordered key combination for a map lookup

I want to create an unordered_map, where the key is a combination of two integers. As the key values order shall be ignored when comparing, I thought of using an unordered_set as key like this:
#include <unordered_set>
#include <unordered_map>
using namespace std;
int main ()
{
unordered_set<int> key_set1 = {21, 42};
unordered_map<unordered_set<int>, char> map;
map[key_set1] = 'a';
...
unordered_set<int> key_set2 = {42, 21};
if(map[key_set1] == map[key_set2])
success();
}
On compile time it looks like some problem with the hash function:
error: no match for call to ‘(const std::hash<std::unordered_set<int> >) (const std::unordered_set<int>&)’
noexcept(declval<const _Hash&>()(declval<const _Key&>()))>
How can I solve this? Or is there a better way/data structure?

There is no predefined hash function for an unordered_set so you have to implement your own; there's documentation for that here http://en.cppreference.com/w/cpp/utility/hash.
Basically you'd need:
// custom specialization of std::hash can be injected in namespace std
namespace std
{
template<> struct hash<unordered_set<int>>
{
std::size_t operator()(unordered_set<int> const& s) const
{
std::size_t hash = 0;
for (auto && i : s) hash ^= std::hash<int>()(i);
return hash;
}
};
}
Now xor isn't the recommended way to combine hash functions, but it should work in this case specifically because it's both unordered and a set. Because it's unordered you need a function that's commutative. The recommended hash combiners don't have this property, as you usually want "abc" to hash differently than "bca". Secondly, the fact that it's a set ensures that you won't have any duplicate elements. This saves your hash function from failing because x ^ x == 0.
I should also mention that you want to define this in the cpp file so you don't expose this specific hash implementation on std types to everyone.
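An alternative that does not touch namespace std is to pass the hasher as the third template argument of the map. The sketch below uses the same xor combination; IntSetHash and SetKeyedMap are names invented for the example.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// Hypothetical hasher supplied as a template argument instead of
// specializing std::hash for a standard library type.
struct IntSetHash
{
    std::size_t operator()(const std::unordered_set<int>& s) const
    {
        std::size_t h = 0;
        // XOR is commutative, so the iteration order does not affect the result.
        for (int value : s) {
            h ^= std::hash<int>{}(value);
        }
        return h;
    }
};

// Key equality uses the operator== that unordered_set already provides.
using SetKeyedMap = std::unordered_map<std::unordered_set<int>, char, IntSetHash>;
```

A SetKeyedMap filled with the key {21, 42} can then be queried with {42, 21} and finds the same entry.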

The problem is that unordered_set is not built for being used as a key in an unordered container.
If you always use exactly two ints, it would be more economical for you to use a pair of ints as a key, and add a function that makes a properly ordered pair from two integers:
pair<int,int> unordered_key(int a, int b) {
return a<b?make_pair(a, b):make_pair(b, a);
}

As pointed out earlier to use std::pair directly as a key you would need to explicitly define a hash function for it. If you want to avoid that, you can just do a bit-wise combination of 2 unsigned integers into 1:
uint64_t makeKey(uint32_t a, uint32_t b)
{
return a < b ? (static_cast<uint64_t>(a) << 32) + b : (static_cast<uint64_t>(b) << 32) + a;
}
int main ()
{
auto key_set1 = makeKey(21, 42);
unordered_map<uint64_t, char> map;
map[key_set1] = 'a';
//...
auto key_set2 = makeKey(42, 21);
if(map[key_set1] == map[key_set2])
std::cout << "success" << std::endl;
}

Since the order is not important here, you can use std::pair with a customized factory to force the order of the two integers:
std::pair<int, int> make_my_pair(int x, int y) {
return std::make_pair(std::min(x, y), std::max(x, y));
}
Of course this is only going to work if you use make_my_pair consistently.
Alternatively you can define your own key class that has a similar property.
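A sketch of such a key class is shown below. The constructor normalizes the order, so the class cannot be used inconsistently. UnorderedIntPair, its hasher and the mixing constant are illustrative names and choices, not an established API.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical key type: stores the two ints in sorted order, so
// UnorderedIntPair(21, 42) and UnorderedIntPair(42, 21) are the same key.
struct UnorderedIntPair
{
    int lo;
    int hi;
    UnorderedIntPair(int a, int b) : lo(std::min(a, b)), hi(std::max(a, b)) {}
    bool operator==(const UnorderedIntPair& other) const
    {
        return lo == other.lo && hi == other.hi;
    }
};

struct UnorderedIntPairHash
{
    std::size_t operator()(const UnorderedIntPair& k) const
    {
        std::size_t h = std::hash<int>{}(k.lo);
        // boost::hash_combine-style mixing of the second member.
        h ^= std::hash<int>{}(k.hi) + 0x9e3779b9u + (h << 6) + (h >> 2);
        return h;
    }
};

using PairKeyedMap = std::unordered_map<UnorderedIntPair, char, UnorderedIntPairHash>;
```

Storing a value under UnorderedIntPair(21, 42) and reading it back with UnorderedIntPair(42, 21) refers to the same entry.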

Best way to pass tuples to a function and have it return a list of number?

For example, let's say I want to pass the values (1,2),(2,3),(3,4), etc. into a function and have it return a list of numbers, whatever they may be, i.e. 1, 3, 5, 3, 6 after some operations. What is the best way to achieve this result in C++? After moving from python it seems a lot more difficult to do it here, any help?

In general, you would use the std::vector container and its method push_back. You can then return the vector (return it by value, don't bother allocating it dynamically since your compiler probably supports move-semantics).
std::vector<int> func(
const std::tuple<int, int>& a, const std::tuple <int, int>& b)
{
std::vector<int> ret;
ret.push_back(...);
ret.push_back(...);
return ret;
}
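To make the skeleton above concrete, here is a small sketch that takes a whole list of pairs and returns one number per pair. The operation (summing the two members) and the name pair_sums are placeholders, since the question does not say which operations are applied.

```cpp
#include <utility>
#include <vector>

// Hypothetical operation: emit the sum of each pair. The result is returned
// by value; the compiler can elide or move it rather than copy it.
std::vector<int> pair_sums(const std::vector<std::pair<int, int>>& pairs)
{
    std::vector<int> result;
    result.reserve(pairs.size());  // one output element per input pair
    for (const auto& p : pairs) {
        result.push_back(p.first + p.second);
    }
    return result;
}
```

Called with the pairs (1,2), (2,3), (3,4), this returns a vector holding 3, 5 and 7.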

I'm not saying this is the best way, but I think it is pretty good, also from the memory-copying perspective. Note that I avoid returning a vector (which can be expensive when the compiler cannot elide or move the return value, since the vector is then copied):
#include <vector>
using namespace std;
/**
* Meaningful example: takes a vector of tuples (pairs) values_in and returns in
* values_out the second elements of the tuple whose first element is less than 5
*/
void less_than_5(const vector<pair<int, int> >& values_in, vector<int>& values_out) {
// clean up the values_out
values_out.clear();
// do something with values_in
for (vector<pair<int, int> >::const_iterator iter = values_in.begin(); iter != values_in.end(); ++iter) {
if (iter->first < 5) {
values_out.push_back(iter->second);
}
}
// clean up the values_out (again just to be consistent :))
values_out.clear();
// do something with values_in (equivalent loop)
for (size_t i = 0; i < values_in.size(); ++i) {
if (values_in[i].first < 5) {
values_out.push_back(values_in[i].second);
}
}
// at this point values_out contains all second elements from values_in tuples whose
// first is less than 5
}

void function(const std::vector<std::pair<int,int>> &pairs,
std::vector<int> &output) {
/* ... */
}
