How do I reuse my hasher in another hasher? - c++

I have been given a task to write a C++ program that stores an unordered_set of street objects. Each street object in turn contains some general info and an unordered_set of house objects. I have written a hasher struct for house:
struct house_hasher {
    std::hash<std::string> number_hash;
    std::hash<size_t> storeys_hash;
    std::hash<size_t> aparts_hash;
    std::hash<size_t> residents_hash;
    std::hash<std::string> street_name_hash;
    std::hash<double> pfsm_hash;
    std::hash<double*> pea_hash;
    std::hash<double*> sea_hash;
    std::hash<bool*> ps_hash;
    size_t operator()(const house& h) const {
        const size_t coef = 2'946'901;
        size_t hash_value = 0;
        hash_value = (pow(coef, 8) * number_hash(h.getter_number()) +
                      pow(coef, 7) * storeys_hash(h.getter_storeys_n()) +
                      pow(coef, 6) * aparts_hash(h.getter_aparts_n()) +
                      pow(coef, 5) * residents_hash(h.getter_residents_n()) +
                      pow(coef, 4) * street_name_hash(h.getter_street_name()) +
                      pow(coef, 3) * pfsm_hash(h.getter_price_for_square_meter()) +
                      pow(coef, 2) * pea_hash(h.getter_payments_each_apartments()) +
                      coef * sea_hash(h.getter_square_each_apartments()) +
                      ps_hash(h.getter_payments_statuses()));
        return hash_value;
    }
};
so that house can be added to the container, and it works correctly. The thing is that I now have to write a hasher struct for street as well, since streets must also be insertable. I could simply add another struct and copy-paste the code from the first one, so that the house-hashing part appears twice in the project, but that seems like a poor solution.
What I want instead is to create one more hasher struct for streets that uses a house_hasher object inside it to hash the houses, like this:
struct street_hasher {
    std::hash<std::string> name_hash;
    std::hash<size_t> number_hash;
    std::hash<size_t> houses_hash;
    std::hash<std::unordered_set<house, house_hasher>> uset_houses_hash;
    size_t operator()(const street& s) const {
        const size_t coef = 2'946'901;
        size_t hash_value = 0;
        size_t add_hash = 0;
        house_hasher hasher_for_house;
        std::unordered_set<house, house_hasher>::iterator uset_it = s.getter_street_uset_houses().begin();
        for (uset_it; uset_it != s.getter_street_uset_houses().end(); ++uset_it) {
            add_hash += hasher_for_house(*uset_it);
        }
        hash_value = (pow(coef, 3) * name_hash(s.getter_street_name()) +
                      pow(coef, 2) * number_hash(s.getter_street_number()) +
                      coef * houses_hash(s.getter_street_houses_n()) +
                      add_hash);
        return hash_value;
    }
};
However, this does not seem to be correct. VS throws error C2280 "std::_Uhash_compare<_Kty,_Hasher,_Keyeq>::_Uhash_compare(const std::_Uhash_compare<_Kty,_Hasher,_Keyeq> &)": attempting to reference a deleted function (the message on my machine is localized in Russian, but that is what its last few words say).
I have tried many ideas of my own and ones found on the net, but with no result. Could anybody please tell me how to solve this tricky problem?
Thanks in advance!

The solution is to arrange the street_hasher struct this way (it works, though I am sure there is a more elegant and concise way to do it):
struct street_hasher {
    std::hash<std::string> name_hash;
    std::hash<size_t> number_hash;
    std::hash<size_t> houses_hash;
    size_t operator()(const street& s) const {
        const size_t coef = 2'946'901;
        size_t hash_value = 0;
        size_t add_hash = 0;
        house_hasher hasher_for_house; // need this to hash houses
        std::unordered_set<house, house_hasher> uset = s.getter_street_uset_houses();
        std::unordered_set<house, house_hasher>::iterator uset_it = uset.begin();
        std::unordered_set<house, house_hasher>::iterator end_it = uset.end();
        size_t houses_it = s.getter_street_houses_n() + 2;
        // there is 2 as we have y = 3 more attributes to hash in street
        // but we start with the power of n - 1, where n = houses_n + y =>
        // => n = s.getter_street_houses_n() + 3
        // => n - 1 = s.getter_street_houses_n() + 2
        hash_value = (pow(coef, houses_it) * name_hash(s.getter_street_name()) +
                      pow(coef, houses_it - 1) * number_hash(s.getter_street_number()) +
                      pow(coef, houses_it - 2) * houses_hash(s.getter_street_houses_n()));
        houses_it -= 3;
        for (; uset_it != end_it; ++uset_it) {
            add_hash += pow(coef, houses_it) * hasher_for_house(*uset_it);
            --houses_it;
        }
        // I guess it is much safer to treat each house as an independent attribute
        hash_value += add_hash;
        return hash_value;
    }
};
I deleted the std::hash<std::unordered_set<house, house_hasher>> uset_houses_hash; member and the C2280 error went away: there is no std::hash specialization for std::unordered_set, so that member could never be constructed. After one problem disappeared, another appeared: the debug runtime complained about how I had set up the loop, reporting "list iterators incompatible". Most likely getter_street_uset_houses() returns the set by value, so begin() and end() were taken from two different temporary copies; storing the set in a local variable and taking both iterators from it fixed that. In addition, I decided to treat each house in the unordered_set as a separate attribute of street, weighting each one with its own power of coef, which should reduce collisions. Anyway, it works!
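For comparison, here is a more compact variant along the same lines. It is only a sketch: it assumes the same getters as above and C++11 range-for, keeps a house_hasher as a member instead of a local, and combines the per-house hashes with a plain sum (an unordered_set has no stable iteration order, so the combination has to be order-independent). It also stays in size_t arithmetic instead of going through pow and double:
struct street_hasher {
    std::hash<std::string> name_hash;
    std::hash<size_t> number_hash;
    std::hash<size_t> houses_hash;
    house_hasher house_hash; // reuse the existing hasher as a member

    size_t operator()(const street& s) const {
        const size_t coef = 2'946'901;
        // order-independent sum over the houses; range-for keeps the
        // returned set alive for the whole loop even if the getter returns a copy
        size_t houses_combined = 0;
        for (const house& h : s.getter_street_uset_houses())
            houses_combined += house_hash(h);
        return coef * coef * coef * name_hash(s.getter_street_name()) +
               coef * coef * number_hash(s.getter_street_number()) +
               coef * houses_hash(s.getter_street_houses_n()) +
               houses_combined;
    }
};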
Cheers!

Related

use of 'n' before deduction of 'auto' C++

I'm trying to have my function return 3 values (n, down and across). I've read online how 'auto' can be used for this, but I must be doing something wrong.
The function takes in a 2D vector of integers (as well as other variables) and counts how many numbers connected to board[0][0] have the same value.
I've tried putting auto in front of the call inside the function, tried leaving it blank, and tried just having chain = chainNodes(...), but I always seem to get an error. Here's the code:
tuple<int, int, int> chainNodes(vector<vector<int>> board, int originalNum,
                                unsigned int across, unsigned int down, int ijSum,
                                int n)
{
    struct chain {
        int n, down, across;
    };
    if(down + across > ijSum) {
        ijSum = down + across;
    } else if((down + across == ijSum) &&
              ((down - across) * (down - across) < (ijSum) * (ijSum))) {
        ijSum = down + across;
    }
    board[down][across] = 0;
    n += 1;
    // Check below
    if((down != (board.size() - 1)) && (board[down + 1][across]) == originalNum) {
        down += 1;
        auto [n, iPoint, jPoint] = chainNodes(board, originalNum, across, down, ijSum, n);
        down -= 1;
    }
    // Check right, up and left (I've removed so its not too messy here)
    return chain{n, down, across};
}
Sorry, I forgot to include the error message.
error: use of 'n' before deduction of 'auto'
It occurs on the line that uses auto.
Issue with
auto [n, iPoint, jPoint] = chainNodes(board, originalNum, across, down, ijSum, n);
is similar to
auto n = foo(n); // `foo(n)` uses `n` from `auto n`,
// not the one from outer scope as function parameter
The construct int a = a + 1; is legal syntactically, but leads to UB because it reads an uninitialized variable.
The same rule is what makes a construct like void* p = &p; legal and well-defined.
Your code has other errors, and the expected behavior of the function is not clear to me.
So not sure if following is the correct fix, but you might want:
n = std::get<0>(chainNodes(board, originalNum, across, down, ijSum, n));
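If you actually need all three returned values, bind them to fresh names so that the recursive call still sees the outer n. A sketch, assuming the function really does return the declared std::tuple<int, int, int>:
auto [newN, iPoint, jPoint] = chainNodes(board, originalNum, across, down, ijSum, n);
n = newN; // update the outer n after the call, instead of shadowing it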

Looking for C++ immutable hashset/hashmap

I work on GPL'ed C++ code with heavy data processing. One particular pattern we often have is to collect some amount (thousands to millions) of keys or key/value pairs (usually int32..int128), insert them into hashset/hashmap and then use it without further modifications.
I named it immutable hashtable, although single-assignment hashtable may be even a better name since we don't use it prior to full construction.
Today we are using STL unordered_map/set, but we are looking for a better (especially faster) library. Can you recommend anything suitable for the situation, with GPL-compatible license?
I think that the most efficient approach would be to radix-sort all keys by the bucket num and provide bucket->range mapping, so we can use the following code to search for a key:
bool contains(set, key) {
    h = hash(key);
    b = h % BUCKETS;
    for (i : range(set.bucket[b], set.bucket[b+1]-1))
        if (set.keys[i] == key) return true;
    return false;
}
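To make the layout concrete, here is a rough sketch of how I would build it (illustrative only: uint64_t keys, the key itself used as the hash, and a counting sort by bucket at construction time):
#include <cstdint>
#include <vector>

// Illustrative sketch of the bucket -> range layout described above.
struct ImmutableSet {
    static const std::size_t BUCKETS = 1u << 20;
    std::vector<uint64_t> keys;    // keys grouped by bucket
    std::vector<uint32_t> bucket;  // bucket[b]..bucket[b+1] is the range for bucket b

    explicit ImmutableSet(const std::vector<uint64_t>& input) : bucket(BUCKETS + 1, 0) {
        for (uint64_t k : input) ++bucket[k % BUCKETS + 1];              // count per bucket
        for (std::size_t b = 1; b <= BUCKETS; ++b) bucket[b] += bucket[b - 1]; // prefix sums
        keys.resize(input.size());
        std::vector<uint32_t> pos(bucket.begin(), bucket.end() - 1);     // running write positions
        for (uint64_t k : input) keys[pos[k % BUCKETS]++] = k;           // scatter into place
    }

    bool contains(uint64_t key) const {
        std::size_t b = key % BUCKETS;
        for (uint32_t i = bucket[b]; i < bucket[b + 1]; ++i)
            if (keys[i] == key) return true;
        return false;
    }
};
Each lookup then touches a single contiguous range of keys per bucket, which should be cache-friendly.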
Your comments on this approach? Can you propose a faster way to implement immutable map/set?
I think Double Hashing or Robin Hood Hashing are better suited to your case. Among the many possible algorithms, I prefer Double Hashing with a 2^n-sized table and an odd step: it is very efficient and easy to code. The following is just an example of such a container for uint32_t keys:
#include <cstdint>
#include <strings.h> // bzero (POSIX); std::memset from <cstring> is the portable alternative

class uint32_DH {
    static const int _TABSZ = 1 << 20; // 1M cells, 2^N size
public:
    uint32_DH() { bzero(_data, sizeof(_data)); }
    bool search(uint32_t key) { return *lookup(key) == key; }
    void insert(uint32_t key) { *lookup(key) = key; }
private:
    uint32_t* lookup(uint32_t key) {
        // note: (key >> 32) only makes sense for key types wider than 32 bits
        uint32_t pos = key + (key >> 32) * 7919;
        uint32_t step = (key * 7717 ^ (pos >> 16)) | 1;
        uint32_t *rc;
        do {
            rc = _data + ((pos += step) & (_TABSZ - 1));
        } while(*rc != 0 && *rc != key);
        return rc;
    }
    uint32_t _data[_TABSZ];
};
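A quick usage sketch (my assumptions: keys are non-zero, since 0 marks an empty cell, and the table never fills up completely; the 4 MB in-class array also means the container is better kept off the stack):
#include <memory>

int main() {
    // allocate on the heap; the table itself is 4 MB
    auto set = std::make_unique<uint32_DH>();
    set->insert(12345);
    set->insert(67890);
    bool found   = set->search(12345); // true
    bool missing = set->search(11111); // false: probing stops at an empty cell
    return (found && !missing) ? 0 : 1;
}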

Number of buckets of std::unordered_map grows unexpectedly

I'd like to use std::unordered_map as a software cache with a limited capacity. Namely, I set the number of buckets in the constructor (it doesn't matter that it might actually end up larger) and insert new data (if not already there) in the following way:
If the bucket where the data belong is not empty, I replace its node with the inserted data (by C++17 extraction-insertion pattern).
Otherwise, I simply insert data.
The minimal example that simulates this approach is as follows:
#include <iostream>
#include <unordered_map>

std::unordered_map<int, int> m(2);

void insert(int a) {
    auto idx = m.bucket(a);
    if (m.bucket_size(idx) > 0) {
        const auto& key = m.begin(idx)->first;
        auto nh = m.extract(key);
        nh.key() = a;
        nh.mapped() = a;
        m.insert(std::move(nh));
    }
    else
        m.insert({a, a});
}

int main() {
    for (int i = 0; i < 1000; i++) {
        auto bc1 = m.bucket_count();
        insert(i);
        auto bc2 = m.bucket_count();
        if (bc1 != bc2) std::cerr << bc2 << std::endl;
    }
}
The problem is that with GCC 8.1 (which is what is available in my production environment), the bucket count is not fixed and grows instead; the output reads:
7
17
37
79
167
337
709
1493
Live demo: https://wandbox.org/permlink/c8nnEU52NsWarmuD
Updated info: the bucket count is always increased in the else branch: https://wandbox.org/permlink/p2JaHNP5008LGIpL.
However, when I use GCC 9.1 or Clang 8.0, the bucket count remains fixed (no output is printed in the error stream).
My question is whether this is a bug in the older version of libstdc++, or my approach isn't correct and I cannot use std::unordered_map this way.
Moreover, I found out that the problem disappears when I set the max_load_factor to some very high number, such as
m.max_load_factor(1e20f);
But I don't want to rely on such a "fragile" solution in the production code.
Unfortunately the problem you're having appears to be a bug in older implementations of std::unordered_map. This problem disappears in g++-9, but if you're limited to g++-8, I recommend rolling your own hash-cache.
Rolling our own hash-cache
Thankfully, the type of cache you want to write is actually simpler than writing a full hash-table, mainly because it's fine if values occasionally get dropped from the table. To see how difficult it'd be, I wrote my own version.
So what's it look like?
Let's say you have an expensive function you want to cache. The Fibonacci function, when written with the naive recursive implementation, is notorious for requiring time exponential in its input because it calls itself twice at every step.
// Uncached version
long long fib(int n) {
    if(n <= 1)
        return n;
    else
        return fib(n - 1) + fib(n - 2);
}
Let's transform it to the cached version, using the Cache class which I'll show you in a moment. We actually only need to add one line of code to the function:
// Cached version; much faster
long long fib(int n) {
    static auto fib = Cache(::fib, 1024); // fib now refers to the cache, instead of the enclosing function
    if(n <= 1)
        return n;
    else
        return fib(n - 1) + fib(n - 2); // Invokes cache
}
The first argument is the function you want to cache (in this case, fib itself), and the second argument is the capacity. For n == 40, the uncached version takes 487,000 microseconds to run. And the cached version? Just 16 microseconds to initialize the cache, fill it, and return the value! You can see it run here. After that initial access, retrieving a stored value from the cache takes around 6 nanoseconds.
(If Compiler Explorer shows the assembly instead of the output, click on the tab next to it.)
How would we write this Cache class?
Here's a compact implementation of it. The Cache class stores the following
An array of bools, which keeps track of which buckets have values
An array of keys
An array of values
A bitmask & hash function
A function to calculate values that aren't in the table
In order to calculate a value, we:
Check if the key is stored in the table
If the key is not in the table, calculate and store the value
Return the stored value
Here's the code:
#include <algorithm>   // std::copy_n, std::fill_n
#include <cstddef>     // size_t
#include <functional>  // std::hash
#include <memory>      // std::unique_ptr

template<class Key, class Value, class Func>
class Cache {
    static size_t calc_mask(size_t min_cap) {
        size_t actual_cap = 1;
        while(actual_cap <= min_cap) {
            actual_cap *= 2;
        }
        return actual_cap - 1;
    }
    size_t mask = 0;
    std::unique_ptr<bool[]> isEmpty;
    std::unique_ptr<Key[]> keys;
    std::unique_ptr<Value[]> values;
    std::hash<Key> hash;
    Func func;
public:
    Cache(Cache const& c)
        : mask(c.mask)
        , isEmpty(new bool[mask + 1])
        , keys(new Key[mask + 1])
        , values(new Value[mask + 1])
        , hash(c.hash)
        , func(c.func)
    {
        std::copy_n(c.isEmpty.get(), capacity(), isEmpty.get());
        std::copy_n(c.keys.get(), capacity(), keys.get());
        std::copy_n(c.values.get(), capacity(), values.get());
    }
    Cache(Cache&&) = default;
    Cache(Func func, size_t cap)
        : mask(calc_mask(cap))
        , isEmpty(new bool[mask + 1])
        , keys(new Key[mask + 1])
        , values(new Value[mask + 1])
        , hash()
        , func(func) {
        std::fill_n(isEmpty.get(), capacity(), true);
    }
    Cache(Func func, size_t cap, std::hash<Key> const& hash)
        : mask(calc_mask(cap))
        , isEmpty(new bool[mask + 1])
        , keys(new Key[mask + 1])
        , values(new Value[mask + 1])
        , hash(hash)
        , func(func) {
        std::fill_n(isEmpty.get(), capacity(), true);
    }
    Value operator()(Key const& key) const {
        size_t index = hash(key) & mask;
        auto& value = values[index];
        auto& old_key = keys[index];
        if(isEmpty[index] || old_key != key) {
            old_key = key;
            value = func(key);
            isEmpty[index] = false;
        }
        return value;
    }
    size_t capacity() const {
        return mask + 1;
    }
};
template<class Key, class Value>
Cache(Value(*)(Key), size_t) -> Cache<Key, Value, Value(*)(Key)>;
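One caveat worth noting: the deduction guide above only covers plain function pointers, so for any other callable (a lambda, a functor) you have to spell the template arguments out yourself. A small illustration:
auto square = [](int x) { return static_cast<long long>(x) * x; };
Cache<int, long long, decltype(square)> squares(square, 256);
long long v = squares(12); // computed once, then served from the table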

Fixed size container where elements are sorted and can provide a raw pointer to the data in C++

Is there an STL container whose size can be limited, where inserting elements keeps it sorted, and which can provide a raw pointer to the data in C++? Or can it be built by assembling some stuff from the STL and C++?
In fact, I'm receiving real-time data (epoch + data), and I noticed that it isn't "always" sent in increasing order of the epoch.
I only keep 1024 data points to plot them with a plotting API, so I need two raw double pointers to the data (x => epoch, y => data).
I wrote a class that fills two 1024-element double arrays of times and values. After receiving the 1023th data point, the buffer is shifted to receive the next data points.
Adding sorting to the code below might overcomplicate it, so is there a better way to code it?
struct TemporalData
{
    TemporalData(const unsigned capacity) :
        m_timestamps(new double[capacity]),
        m_bsl(new double[capacity]),
        m_capacity(capacity),
        m_size(0),
        m_lastPos(capacity - 1)
    {
    }
    TemporalData(TemporalData&& moved) :
        m_capacity(moved.m_capacity),
        m_lastPos(moved.m_lastPos)
    {
        m_size = moved.m_size;
        m_timestamps = moved.m_timestamps;
        moved.m_timestamps = nullptr;
        m_bsl = moved.m_bsl;
        moved.m_bsl = nullptr;
    }
    TemporalData(const TemporalData& copied) :
        m_capacity(copied.m_capacity),
        m_lastPos(copied.m_lastPos)
    {
        m_size = copied.m_size;
        m_timestamps = new double[m_capacity];
        m_bsl = new double[m_capacity];
        std::copy(copied.m_timestamps, copied.m_timestamps + m_size, m_timestamps);
        std::copy(copied.m_bsl, copied.m_bsl + m_size, m_bsl);
    }
    TemporalData& operator=(const TemporalData& copied) = delete;
    TemporalData& operator=(TemporalData&& moved) = delete;
    inline void add(const double timestamp, const double bsl)
    {
        if (m_size >= m_capacity)
        {
            std::move(m_timestamps + 1, m_timestamps + 1 + m_lastPos, m_timestamps);
            std::move(m_bsl + 1, m_bsl + 1 + m_lastPos, m_bsl);
            m_timestamps[m_lastPos] = timestamp;
            m_bsl[m_lastPos] = bsl;
        }
        else
        {
            m_timestamps[m_size] = timestamp;
            m_bsl[m_size] = bsl;
            ++m_size;
        }
    }
    inline void removeDataBefore(const double ptTime)
    {
        auto itRangeToEraseEnd = std::lower_bound(m_timestamps,
                                                  m_timestamps + m_size,
                                                  ptTime);
        auto timesToEraseCount = itRangeToEraseEnd - m_timestamps;
        if (timesToEraseCount > 0)
        {
            // shift
            std::move(m_timestamps + timesToEraseCount, m_timestamps + m_size, m_timestamps);
            std::move(m_bsl + timesToEraseCount, m_bsl + m_size, m_bsl);
            m_size -= timesToEraseCount;
        }
    }
    inline void clear() { m_size = 0; }
    inline double* x() const { return m_timestamps; }
    inline double* y() const { return m_bsl; }
    inline unsigned size() const { return m_size; }
    inline unsigned capacity() const { return m_capacity; }
    ~TemporalData()
    {
        delete [] m_timestamps;
        delete [] m_bsl;
    }
private:
    double* m_timestamps; // x axis
    double* m_bsl; // y axis
    const unsigned m_capacity;
    unsigned m_size;
    const unsigned m_lastPos;
};
Is there an STL container whose size can be limited, where inserting elements keeps it sorted, and which can provide a raw pointer to the data in C++? Or can it be built by assembling some stuff from the STL and C++?
No, but you can keep a container sorted yourself, e.g. via std::lower_bound. If the container supports random access, finding the insertion point is O(log N); the insertion itself still has to shift the elements that come after it.
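For a single std::vector, a sorted insert with a fixed capacity might look like the sketch below (Sample and insertSorted are illustrative names; note this keeps x and y interleaved, so it does not by itself satisfy your two-raw-pointer requirement):
#include <algorithm>
#include <cstddef>
#include <vector>

struct Sample { double time; double value; };

void insertSorted(std::vector<Sample>& buf, Sample s, std::size_t capacity)
{
    auto pos = std::lower_bound(buf.begin(), buf.end(), s,
        [](const Sample& a, const Sample& b) { return a.time < b.time; });
    buf.insert(pos, s);             // keeps the buffer sorted by time
    if (buf.size() > capacity)
        buf.erase(buf.begin());     // drop the oldest sample
}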
After receiving the 1023th data point, the buffer is shifted to receive the next data points.
That sounds like a circular buffer. However, if you want to keep the elements sorted, it won't be a circular buffer anymore; unless you are talking about a sorted view on top of a circular buffer.
Is there an STL container whose size can be limited, where inserting elements keeps it sorted, and which can provide a raw pointer to the data in C++?
No. There is no such standard container.
Or can it be built by assembling some stuff from the STL and C++?
Sure.
Size limitation can be implemented using an if statement. Arrays can be iterated using a pointer, and there is a standard algorithm for sorting.
What I want is to insert the element at the right place in the fixed-size buffer (like a priority queue), starting from its end; I thought that would be faster than pushing the element back and then sorting the container.
It depends. If you insert multiple elements at a time, then sorting has better worst case asymptotic complexity.
But if you insert one at a time, and especially if the elements are inserted in "mostly sorted" order, then it may be better for average case complexity to simply search for the correct position, and insert.
The searching can be done linearly (std::find), which may be most efficient depending on how well the input is ordered, or using binary search (std::lower_bound family of functions), which has better worst case complexity. Yet another option is exponential search, but there is no standard implementation of that.
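For completeness, an exponential (galloping) search can be built on top of std::lower_bound in a few lines; a sketch over a raw double array (to match your buffers) might look like this:
#include <algorithm>
#include <cstddef>

// Returns an iterator to the first element >= value, like std::lower_bound,
// but probes positions 1, 2, 4, 8, ... first; fast when the answer is near the front.
const double* exponential_lower_bound(const double* first, const double* last, double value)
{
    std::size_t n = static_cast<std::size_t>(last - first);
    std::size_t bound = 1;
    while (bound < n && first[bound] < value)
        bound *= 2;
    const double* lo = first + bound / 2;                   // last probe known to be < value
    const double* hi = first + (bound < n ? bound + 1 : n); // first probe known to be >= value (or end)
    return std::lower_bound(lo, hi, value);
}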
Moreover, as I have paired data, but in two different buffers, I can't use std::sort!
It's unclear why the former would imply the latter.
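If you do want to sort both buffers together, one option is to sort an array of indices by timestamp and then apply that permutation to both arrays. A sketch (sortByTime is an illustrative name; the parameters mirror your m_timestamps, m_bsl and m_size members):
#include <algorithm>
#include <numeric>
#include <vector>

// Sort the timestamp and value buffers together, by timestamp.
void sortByTime(double* times, double* values, unsigned size)
{
    std::vector<unsigned> idx(size);
    std::iota(idx.begin(), idx.end(), 0u);
    std::sort(idx.begin(), idx.end(),
              [times](unsigned a, unsigned b) { return times[a] < times[b]; });

    std::vector<double> t(size), v(size);
    for (unsigned i = 0; i < size; ++i) { t[i] = times[idx[i]]; v[i] = values[idx[i]]; }
    std::copy(t.begin(), t.end(), times);
    std::copy(v.begin(), v.end(), values);
}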
Following the advice of Acorn, I wrote this (I know it's ugly but it does what I want)
inline void add(const double timestamp, const double bsl)
{
    if (m_size >= m_capacity)
    {
        const auto insertPositionIterator = std::lower_bound(m_timestamps,
                                                             m_timestamps + m_size,
                                                             timestamp);
        if (insertPositionIterator == m_timestamps)
        {
            if (*insertPositionIterator == timestamp)
            {
                m_timestamps[0] = timestamp;
                m_bsl[0] = bsl;
            }
            // then return...
        }
        else
        {
            const auto shiftIndex = insertPositionIterator - m_timestamps; // for data
            std::move(m_timestamps + 1, insertPositionIterator, m_timestamps);
            std::move(m_bsl + 1, m_bsl + shiftIndex, m_bsl);
            *(insertPositionIterator - 1) = timestamp;
            m_bsl[shiftIndex - 1] = bsl;
        }
    }
    else
    {
        auto insertPositionIterator = std::lower_bound(m_timestamps,
                                                       m_timestamps + m_size,
                                                       timestamp);
        if (insertPositionIterator == m_timestamps + m_size)
        {
            // the new inserted element is strictly greater than the already
            // existing element or the buffer is empty, let's push it at the back
            m_timestamps[m_size] = timestamp;
            m_bsl[m_size] = bsl;
        }
        else
        {
            // the new inserted element is equal or lesser than an already
            // existing element, let's insert it at its right place
            // to keep the time buffer sorted in ascending order
            const auto shiftIndex = insertPositionIterator - m_timestamps; // for data
            // shift
            assert(insertPositionIterator == m_timestamps + shiftIndex);
            std::move_backward(insertPositionIterator, m_timestamps + m_size, m_timestamps + m_size + 1);
            std::move_backward(m_bsl + shiftIndex, m_bsl + m_size, m_bsl + m_size + 1);
            *insertPositionIterator = timestamp; // or m_timestamps[shiftIndex] = timestamp;
            m_bsl[shiftIndex] = bsl;
        }
        ++m_size;
    }
}

C++ initialize variable based on condition [closed]

I am currently trying to figure out how to initialize variables based on conditions. So this is the current code that I want to modify:
int dimsOut[4];
dimsOut[0] = data->nDataVar();
dimsOut[1] = dims[0];
dimsOut[2] = dims[1];
dimsOut[3] = dims[2];
const size_t dataSize = data->primType().getTypeSize() * dimsOut[0] * dimsOut[1] * dimsOut[2] * dimsOut[3];
Since this is part of a giant project (mostly C++98 with some parts of C++03), I want to modify as little as possible to avoid any problems in the rest of the code.
So what I want is simple: if data->nDataVar() returns 1, the code above should execute, and if it returns something else, it should
basically do this:
int dimsOut[3];
dimsOut[0] = data->nDataVar();
dimsOut[1] = dims[0];
dimsOut[2] = dims[1];
const size_t dataSize = data->primType().getTypeSize() * dimsOut[0] * dimsOut[1] * dimsOut[2];
I am aware that it is not possible to use if-statements since the variables would go out of scope.
Edit: I solved my problem now. It is not beautiful, but it does what it is supposed to do.
Edit2: small change
int decide_dimension = data->nDataVar();
std::vector<int> dimsOut;
dimsOut.resize(3);
dimsOut[0] = dims[0];
dimsOut[1] = dims[1];
dimsOut[2] = dims[2];
if (decide_dimension != 1)
{
    dimsOut.push_back(data->nDataVar());
}
const size_t dataSize = data->primType().getTypeSize() * dimsOut[0] * dimsOut[1] * dimsOut[2] * ((decide_dimension == 1) ? 1 : dimsOut[3]);
You can use the ternary or conditional operator. The basic form is:
condition ? valueIfTrue : valueIfFalse
Example:
const char* x = (SomeFunction() == 0) ? "is null" : "is not null";
When SomeFunction() returns 0, x is initialised with "is null", otherwise
with "is not null".
What you want to achieve is not possible. The only things you can do are to initialize the values with a ternary as suggested, or to move the initialization into an if block while keeping the declaration outside it.
You say you are modifying an existing old project. In that case it makes sense to keep changes minimal.
However, you can't define the size of a static array at run time. If you want, you can keep the array as it currently is and make sure you don't use the 4th element when data->nDataVar() != 1.
Then:
const size_t dataSize = data->primType().getTypeSize() * dimsOut[0] * dimsOut[1] * dimsOut[2] * (data->nDataVar() != 1 ? 1 : dimsOut[3]);
It's worth mentioning the dimsOut array seems completely unnecessary to calculate the value of dataSize, but who knows what else your code is doing with it. If it is only used inside a single function/method then you could easily replace it with something else, such as std::vector.
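For instance, a sketch of the std::vector variant (kept C++98-friendly, since the project mostly predates C++11; requires <vector>):
std::vector<int> dimsOut;
dimsOut.push_back(data->nDataVar());
dimsOut.push_back(dims[0]);
dimsOut.push_back(dims[1]);
if (data->nDataVar() == 1)
    dimsOut.push_back(dims[2]);

size_t dataSize = data->primType().getTypeSize();
for (size_t i = 0; i < dimsOut.size(); ++i)
    dataSize *= dimsOut[i];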
Your question is a bit confusing, or rather not enough information was provided. It is not clear whether the dimsOut[] variable is needed elsewhere. In case it is used ONLY for the computation of dataSize, you can use the dims[] array directly and do:
int typeSize = data->primType().getTypeSize() * data->nDataVar();
int dimension = dims[0] * dims[1] * ((condition) ? dims[2] : 1);
const size_t dataSize = typeSize * dimension;
In case that dimsOut is used elsewhere, then you can use the first block modifying the dimsOut[3] assignment with the ternary operator:
int dimsOut[4];
dimsOut[0] = data->nDataVar();
dimsOut[1] = dims[0];
dimsOut[2] = dims[1];
dimsOut[3] = (condition) ? dims[2] : 1;
const size_t dataSize = data->primType().getTypeSize() * dimsOut[0] * dimsOut[1] * dimsOut[2] * dimsOut[3];