Can I use reduce instead of a for loop? - c++

I have a hashSet with keys 0 to X and large integer values:
0:1000000001
1:1000000002
...
and a vector of positions (22,14,29,59,10).
I need to generate all combinations of this vector.
For this I use the library from https://github.com/mraggi/discreture to do the generation. It gives me all combinations for a (vector size, needed combination size) pair such as (10, 3):
0,1,2 0,1,3 ...
Now I link the generated combination with my position vector and hashSet like
hashSet[vect[comb[0]]+...
Can I do this in one line with reduce?
The goal is to generate a large integer (a hash) and use this hash as the key, with my positions from the combination, e.g. (12,59,11), as the value.
3423422821 : vector ((12,59,11) <-- positions, (3,19,299,490) <-- dimensions). If a comb has the same signature in any dimension, this dimension will be added.
void combination(int size, vector<unsigned short> chunk, vector<unsigned long long> hashSet,
                 unordered_map<unsigned long long, pair<vector<unsigned short>, vector<unsigned short>>> collision_map,
                 unsigned long long low, unsigned long long high, int dimension) {
    for (auto&& comb : discreture::combinations_stack(size, KCOMB))
    {
        unsigned long long signature = 0;
        vector<unsigned short> newChunk;
        //signature = reduce(std::execution::par, comb.begin(), comb.end())
        for (auto v : comb) {
            signature += hashSet.at(chunk.at(v));
            newChunk.push_back(chunk.at(v));
        }
        checkSignature(low, high, signature, collision_map, newChunk, dimension);
    }
}

Note that std::reduce has the caveat
The behavior is non-deterministic if binary_op is not associative or not commutative.
If you care about the order of elements in newChunk, then you can't use reduce in the inner loop.
If you care about the order of calls to checkSignature, then you can't use reduce in the outer loop, and otherwise you still have to synthesise a value to throw away at the end, as you can't pass a void as the accumulator.
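If all you want from the inner loop is the signature sum, that part can be expressed with std::transform_reduce. Below is a minimal sketch, assuming the same hashSet, chunk, and comb as in the question; newChunk still needs its own loop (or std::transform), and the helper name make_signature is mine, not from the question.

#include <functional>   // std::plus
#include <numeric>      // std::transform_reduce (C++17)
#include <vector>

// Sum hashSet.at(chunk.at(v)) over one combination. Comb is whatever range
// type discreture yields for a single combination.
template <class Comb>
unsigned long long make_signature(const Comb& comb,
                                  const std::vector<unsigned short>& chunk,
                                  const std::vector<unsigned long long>& hashSet)
{
    return std::transform_reduce(
        comb.begin(), comb.end(),
        0ULL,
        std::plus<>{},
        [&](auto v) { return hashSet.at(chunk.at(v)); });
}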


Create custom Hash Function

I tried to implement an unordered map for a class called Pair that stores an integer and a bitset. Then I found out that there isn't a hash function for this class.
Now I want to create my own hash function. But instead of using XOR or comparable functions, I want a hash function like the following approach:
The bitsets in my class obviously have a fixed size, so I want to do the following:
Example: for an instance of Pair with the bitset<6> = 101101 and the integer 6:
create a string = "1011016"
and now use the default hash function on this string.
Because the bitsets have a fixed size, each key would be unique.
How could I implement this approach?
Thank you in advance.
To expand on a comment, as requested:
Converting to a string and then hashing that string would be somewhat slow; at least slower than it needs to be. A faster approach would be to combine the bit patterns, e.g. like this:
#include <bitset>
#include <cstddef>
#include <functional>

struct Pair
{
    std::bitset<6> bits;
    int intval;
};

template<>
struct std::hash<Pair>
{
    std::size_t operator()(const Pair& pair) const noexcept
    {
        std::size_t rtrn = static_cast<std::size_t>(pair.intval);
        rtrn = (rtrn << pair.bits.size()) | pair.bits.to_ulong();
        return rtrn;
    }
};
This works on two assumptions:
The upper bits of the integer are generally not interesting
The size of the bitset is always small compared to size_t
I think it is a suitable hash function for use in unordered_map. One may argue that it has poor mixing and a very good hash should change many bits if only a few bits in its input change. But that is not required here. unordered_map is generally designed to work with cheap hash functions. For example GCC's hash for builtin types and pointers is just a static- or reinterpret-cast.
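For completeness, a minimal usage sketch; the operator== is my assumption (the question doesn't show one), but unordered_map needs it alongside the hash:

#include <bitset>
#include <string>
#include <unordered_map>

// Assumed member-wise equality for Pair.
bool operator==(const Pair& a, const Pair& b) noexcept
{
    return a.intval == b.intval && a.bits == b.bits;
}

int main()
{
    std::unordered_map<Pair, std::string> m;            // picks up std::hash<Pair> automatically
    m[Pair{std::bitset<6>{0b101101}, 6}] = "example";   // the pair from the question
}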
Possible improvements
We can preserve the upper bits by rotating instead of shifting.
template<>
struct std::hash<Pair>
{
    std::size_t operator()(const Pair& pair) const noexcept
    {
        std::size_t rtrn = static_cast<std::size_t>(pair.intval);
        std::size_t intdigits = std::numeric_limits<decltype(pair.intval)>::digits; // needs <limits>
        std::size_t bitdigits = pair.bits.size();
        // can be simplified to std::rotl(rtrn, bitdigits) in C++20
        rtrn = (rtrn << bitdigits) | (rtrn >> (intdigits - bitdigits));
        rtrn ^= pair.bits.to_ulong();
        return rtrn;
    }
};
Nothing will change for small integers (except some bitflips for small negative ints). But for large integers we still use the whole range of inputs, which might be of interest for pathological cases such as integer series 2^30, 2^30 + 2^29, 2^30 + 2^28, ...
If the size of the bitset may increase, stop doing fancy stuff and just combine the hashes. I wouldn't just xor them to avoid hash collisions on small integers.
template<>
struct std::hash<Pair>
{
    std::size_t operator()(const Pair& pair) const noexcept
    {
        std::hash<decltype(pair.intval)> ihash;
        std::hash<decltype(pair.bits)> bhash;
        return ihash(pair.intval) * 31 + bhash(pair.bits);
    }
};
I picked the simple polynomial hash approach common in Java. I believe GCC uses the same one internally for string hashing. Someone else may expand on the topic or suggest a better one. 31 is commonly chosen because it is a prime number one off a power of two, so multiplication by it can be computed quickly as (x << 5) - x.

Code stores a vector as a vector<vector>, why no error message?

I'm working on a piece of C++ code left by a predecessor, and it apparently stores a vector<long int> as a vector<vector<long int>>. The code compiles and runs, but I don't understand why. Here's the function that does the storing.
void setPotentialParameters(const int& seed, const int& nMax, const double& lambdaStd, const int fieldNum, potentialPars& par)
{
    gsl_rng* r = gsl_rng_alloc(gsl_rng_taus);
    gsl_rng_set(r, seed);
    par.nMaximum = nMax;
    par.fNum = fieldNum;
    for (int i = 0; i < 100; i++) gsl_ran_gaussian(r, lambdaStd);
    int counter = 0;
    vector<long int> tempNs(fieldNum); // Defines tempNs as a vector<long int>
    for (long int i = 0; i < (-0.2 + pow(2*nMax + 1, fieldNum)); i++) {
        findPartition(i, fieldNum, 2*nMax + 1, tempNs);
        for (int i = 0; i < tempNs.size(); i++) {
            tempNs[i] -= nMax;
        }
        if (goodPartition(tempNs, nMax)) {
            counter++;
            par.lambdas.push_back(abs(gsl_ran_gaussian(r, lambdaStd)));
            par.nVals.push_back(tempNs); // Stores tempNs in nVals
            par.alphas.push_back(2*M_PI * gsl_rng_uniform(r));
        }
    }
}
And this is the struct that tempNs is stored in.
struct potentialPars {
    int nMaximum;
    int fNum;
    vector<double> lambdas;
    vector<vector<long int> > nVals; // Defines nVals as a vector<vector<long int>>
    vector<double> alphas;
};
I marked the three most relevant lines with comments. tempNs only has one element (as seen from the tempNs[i] -= nMax line), consistent with its definition as a vector<long int>, but when nVals is used elsewhere in the program it has two elements, also consistent with its definition as a vector<vector<long int>>. It doesn't seem possible. Even though tempNs is modified by the findPartition function, it should still remain a vector of long integers. What am I missing?
A vector<vector<long> > has elements of type vector<long>.
A vector's push_back() method copies an element into the vector.
In the code you've shown, par.nVals is of type vector<vector<long> >, so pushing tempNs - which is of type vector<long> - is completely appropriate.
There is no problem in using a vector of vectors of longs.
It's almost the same as using a two-dimensional array, except that you don't need to know the size at compile time or manage memory allocation yourself.
There is no problem compiling that code. Vector elements can be primitives (int, float, double), pointers, or other objects (like vector or your user-defined classes).
The constraints on the (first) type parameter of std::vector are fairly lax. Almost any non-reference type can be stored in a std::vector, including std::vectors of something else. This is exactly what this code is doing.
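A minimal sketch of the same mechanism (the names here are illustrative, not taken from the question): each push_back copies the current contents of the inner vector into its own element of the outer one.

#include <iostream>
#include <vector>

int main()
{
    std::vector<std::vector<long>> nested;   // like par.nVals
    std::vector<long> tmp{1, 2, 3};          // like tempNs

    nested.push_back(tmp);   // copies the current contents of tmp
    tmp[0] = 42;             // later changes to tmp do not affect the stored copy
    nested.push_back(tmp);

    std::cout << nested[0][0] << ' ' << nested[1][0] << '\n';   // prints "1 42"
}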
You could wrap std::vector<long int> in a
struct partition {
    std::vector<long int> indexes;
    double lambda;
    double alpha;
};
and change potentialPars to
struct potentialPartitions {
    int nMaximum;
    int fNum;
    std::vector<partition> partitions;
};
which would add clarity, but would change how the consumer of potentialPartitions accesses those values.
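For instance (a hypothetical consumer loop; process() is a placeholder, not from the question):

for (const partition& p : par.partitions)
{
    // previously: par.nVals[i], par.lambdas[i], par.alphas[i]
    process(p.indexes, p.lambda, p.alpha);
}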

Algorithm for hash/crc of unordered multiset

Let's say I would like to create an unordered set of unordered multisets of unsigned int. For this, I need to create a hash function to calculate a hash of the unordered multiset. In fact, it has to be good for CRC as well.
One obvious solution is to put the items in a vector, sort them, and return a hash of the result. This seems to work, but it is expensive.
Another approach is to xor the values, but obviously if an item appears twice or not at all, the result will be the same - which is not good.
Any ideas how I can implement this more cheaply? I have an application that will be doing this for thousands of sets, and relatively big ones.
Since it is a multiset, you would like the hash value to be the same for identical multisets, whose representations might have the same elements presented, added, or deleted in a different order. You would then like the hash value to be commutative, easy to update, and to change for each change in elements. You would also like two changes to not readily cancel their effect on the hash.
One operation that meets all but the last criterion is addition. Just sum the elements. To keep the sum bounded, do the sum modulo the size of your hash value (e.g. modulo 2^64 for a 64-bit hash). To make sure that inserting or deleting zero values changes the hash, add one to each value first.
A drawback of the sum is that two changes can readily cancel, e.g. replacing 1 3 with 2 2. To address that, you can use the same approach and sum a polynomial of the entries, still retaining commutativity. E.g. instead of summing x + 1, you can sum x^2 + x + 1. Now it is more difficult to contrive sets of changes with the same sum.
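A minimal sketch of that summed-polynomial idea; the exact polynomial and the use of unsigned 64-bit wraparound as the modulus are illustrative choices of mine:

#include <cstdint>
#include <unordered_set>

// Commutative hash: iteration order of the multiset does not matter,
// and the constant term makes inserted/removed zeros change the result.
std::uint64_t multiset_hash(const std::unordered_multiset<unsigned int>& s)
{
    std::uint64_t h = 0;
    for (std::uint64_t x : s)
        h += x * x + x + 1;     // sum of x^2 + x + 1, wrapping mod 2^64
    return h;
}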
Here's a reasonable hash function for std::unordered_multiset<int>. It would be better if the computations were taken mod a large prime, but the idea stands.
#include <iostream>
#include <unordered_set>

namespace std {
template<>
struct hash<unordered_multiset<int>> {
    typedef unordered_multiset<int> argument_type;
    typedef std::size_t result_type;

    const result_type BASE = static_cast<result_type>(0xA67);

    result_type log_pow(result_type ex) const {
        result_type res = 1;
        result_type base = BASE;
        while (ex > 0) {
            if (ex % 2) {
                res = res * base;
            }
            base *= base;
            ex /= 2;
        }
        return res;
    }

    result_type operator()(argument_type const& val) const {
        result_type h = 0;
        for (const int& el : val) {
            h += log_pow(el);
        }
        return h;
    }
};
}

int main() {
    std::unordered_set<std::unordered_multiset<int>> mySet;
    std::unordered_multiset<int> set1{1,2,3,4};
    std::unordered_multiset<int> set2{1,1,2,2,3,3,4,4};
    std::cout << "Hash 1: " << std::hash<std::unordered_multiset<int>>()(set1)
              << std::endl;
    std::cout << "Hash 2: " << std::hash<std::unordered_multiset<int>>()(set2)
              << std::endl;
    return 0;
}
Output:
Hash 1: 2290886192
Hash 2: 286805088
When the modulus is a prime p, the number of collisions is proportional to 1/p. I'm not sure what the analysis is for powers of two. You can make updates to the hash efficient by adding/subtracting BASE^x when you insert/remove the integer x.
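For illustration, a hedged sketch of that incremental-update idea; the wrapper type and names are my assumptions, and pow_base mirrors log_pow above (arithmetic wraps mod 2^64 on a 64-bit size_t):

#include <cstddef>
#include <unordered_set>

// Keeps a running hash next to the multiset and updates it on every change.
struct hashed_multiset
{
    std::unordered_multiset<int> data;
    std::size_t hash = 0;

    static std::size_t pow_base(std::size_t ex)
    {
        std::size_t res = 1, base = 0xA67;
        while (ex > 0) {
            if (ex % 2) res *= base;
            base *= base;
            ex /= 2;
        }
        return res;
    }

    void insert(int x) { data.insert(x); hash += pow_base(x); }

    void erase(int x)  // removes a single occurrence, matching multiset semantics
    {
        auto it = data.find(x);
        if (it != data.end()) { data.erase(it); hash -= pow_base(x); }
    }
};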
Implement the inner multiset as a value->count hash map.
This will allow you to avoid the problem that an even number of elements cancels out via xor in the following way: Instead of xor-ing each element, you construct a new number from the count and the value (e.g. multiplying them), and then you can build the full hash using xor.
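A hedged sketch of that value->count idea; the particular mixing of value and count shown here is just one illustrative choice:

#include <cstddef>
#include <unordered_map>

// The inner multiset is represented as value -> count. Duplicates no longer
// cancel under xor because they change the count instead of repeating a term.
std::size_t hash_counts(const std::unordered_map<unsigned int, std::size_t>& counts)
{
    std::size_t h = 0;
    for (const auto& kv : counts)
        h ^= (static_cast<std::size_t>(kv.first) + 1) * kv.second;
    return h;
}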

Manipulating one of the values of a vector of pairs in C++

If I have a vector of doubles PMF, I can divide all elements of the vector by a double count using the transform command as follows:
transform(PMF.begin(),PMF.end(),PMF.begin(),bind2nd(divides<double>(),count));
Now however, I have a vector of unsigned char/double pairs:
vector<pair<unsigned char, double>> PMF
I wish to replace the double values by their values divided by count. I haven't been able to find a way to do this using the transform command or any other C++11 functionality. Does anyone have an idea as to how to do this?
You can use a lambda function, like this:
transform(PMF.begin(),
          PMF.end(),
          PMF.begin(),
          [count](const pair<unsigned char, double>& x)
          {
              return make_pair(x.first, x.second / count);
          });
or
for_each(PMF.begin(),
         PMF.end(),
         [count](pair<unsigned char, double>& x)
         {
             x.second /= count;
         });
or
for (auto& x : PMF)
    x.second /= count;

creating a vector of pointers that point to more vectors

I am trying to create a vector that contains pointers, each of which points to another vector of a type Cell that I have made using a struct.
The for loop below allows me to let the user define how many elements there are in the vector of pointers. Here's my code:
vector< vector<Cell>* > vEstore(selection);
for (int t = 0; t < selection; t++)
{
    vEstore[t] = new vector<Cell>;
    vEstore[t]->reserve(1000);
}
This, I think, gives me a vector of pointers to destination vectors of the type Cell.
This compiles, but I'm now trying to push_back onto the destination vectors and can't see how to do it.
The destination vector holds elements of the type Cell, which is defined as follows:
struct Cell
{
    unsigned long long lr1;
    unsigned int cw2;
};
I can't work out how to push_back onto this destination vector with two values.
I was thinking ...
binpocket[1]->lr1.push_back(10);
binpocket[1]->cw2.push_back(12);
I thought this would dereference the pointer at binpocket[1], revealing the destination vector, and then address each member in turn.
But it doesn't compile.
Can anyone help?
push_back takes a single value of the element type (Cell), so you can't push the two members separately. Build a Cell and push the whole thing:
Cell cell = { 10, 12 };
binpocket[1]->push_back(cell);
Alternatively, you can give your struct a constructor.
struct Cell
{
    Cell() {}
    Cell(unsigned long long lr1, unsigned int cw2)
        : lr1(lr1), cw2(cw2)
    {
    }

    unsigned long long lr1;
    unsigned int cw2;
};
Then you could do
binpocket[1]->push_back(Cell(10, 12));
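A small aside (assuming a C++11 compiler): with that constructor you can also construct the element in place:

binpocket[1]->emplace_back(10, 12);   // forwards the arguments to Cell's constructor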
Note that long long was non-standard before C++11 (it is standard now), but it has long been a generally accepted extension.
Give your cell a constructor:
struct Cell
{
    unsigned long long lr1;
    unsigned int cw2;
    Cell(unsigned long long lv, unsigned int iv) : lr1(lv), cw2(iv) {}
};
You can now say things like:
binpocket[1]->push_back( Cell( 10, 12 ) );
BTW, note that long long was not standard C++ until C++11.