I want to use a cache, implemented by Boost's unordered_map, from a dynamic_bitset to a dynamic_bitset. The problem, of course, is that there is no default hash function for the bitset. It doesn't seem to be a conceptual problem, but I don't know how to work out the technicalities. How should I do that?
I found an unexpected solution. It turns out boost has an option to #define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS. When this is defined, private members including m_bits become public (I think it's there to deal with old compilers or something).
So now I can use @KennyTM's answer, changed a bit:
namespace boost {
template <typename B, typename A>
std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
return boost::hash_value(bs.m_bits);
}
}
There's a to_block_range function that copies out the words that the bitset consists of into some buffer. To avoid actual copying, you could define your own "output iterator" that just processes individual words and computes a hash from them. Re. how to compute the hash: see e.g. the FNV hash function.
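A minimal sketch of that idea, assuming a 64-bit FNV-1a hash. The names fnv1a_sink and hash_blocks are made up for illustration and are not part of Boost; a plain std::vector of blocks stands in for the bitset so the example only needs the standard library.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

namespace fnv_sketch {

// Hypothetical helper (not part of Boost): an output iterator that folds
// every block assigned to it into a 64-bit FNV-1a state instead of storing it.
template <typename Block>
class fnv1a_sink {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit fnv1a_sink(std::uint64_t& state) : state_(&state) {}

    // "Writing" a block hashes its bytes, least significant byte first.
    fnv1a_sink& operator=(Block block) {
        for (std::size_t i = 0; i < sizeof(Block); ++i) {
            *state_ ^= static_cast<std::uint64_t>((block >> (8 * i)) & 0xffu);
            *state_ *= 1099511628211ULL;  // FNV-1a 64-bit prime
        }
        return *this;
    }
    // Dereference and increment are no-ops, as usual for sink iterators.
    fnv1a_sink& operator*() { return *this; }
    fnv1a_sink& operator++() { return *this; }
    fnv1a_sink operator++(int) { return *this; }

private:
    std::uint64_t* state_;
};

// Hash a sequence of blocks without copying them into a temporary buffer.
template <typename Block>
std::uint64_t hash_blocks(const std::vector<Block>& blocks) {
    std::uint64_t state = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    std::copy(blocks.begin(), blocks.end(), fnv1a_sink<Block>(state));
    return state;
}

}  // namespace fnv_sketch
```

With Boost available, the same sink should be usable as boost::to_block_range(bs, fnv_sketch::fnv1a_sink<B>(state)), though I have not tested that here. Note that this hashes the blocks only, not the number of bits.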
Unfortunately, the design of dynamic_bitset is, IMHO, braindead, because it does not give you direct access to the underlying buffer (not even as const).
It is a feature request.
One could implement a not-so-efficient unique hash by converting the bitset to a vector temporary:
namespace boost {
template <typename B, typename A>
std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
std::vector<B, A> v;
boost::to_block_range(bs, std::back_inserter(v));
return boost::hash_value(v);
}
}
We can't directly calculate the hash because the underlying data in dynamic_bitset is private (m_bits).
But we can easily finesse past (subvert!) the C++ access specification system without either
hacking at the code or
pretending your compiler is non-conforming (BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS).
The key is the template function to_block_range, which is a friend of dynamic_bitset. Specialisations of this function, therefore, also have access to its private data (i.e. m_bits).
The resulting code couldn't be simpler:
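namespace boost {
    // Specialise to_block_range for a size_t& "iterator" so that it returns
    // the hash of the underlying data instead of copying blocks.
    template <>
    inline void
    to_block_range(const dynamic_bitset<>& b, std::size_t& hash_result)
    {
        hash_result = boost::hash_value(b.m_bits);
    }

    inline std::size_t hash_value(const boost::dynamic_bitset<>& bs)
    {
        std::size_t hash_result;
        // Name the template arguments explicitly: plain deduction would pick
        // size_t (by value) and miss the size_t& specialisation above.
        to_block_range<dynamic_bitset<>::block_type,
                       dynamic_bitset<>::allocator_type,
                       std::size_t&>(bs, hash_result);
        return hash_result;
    }
}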
The proposed solution generates the same hash in the following situation.
#define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS
namespace boost {
template <typename B, typename A>
std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
return boost::hash_value(bs.m_bits);
}
}
boost::dynamic_bitset<> test(1,false);
auto hash1 = boost::hash_value(test);
test.push_back(false);
auto hash2 = boost::hash_value(test);
// keep going...
test.push_back(false);
auto hash31 = boost::hash_value(test);
// magically all hash1 to hash31 are the same!
So the proposed solution is sometimes unsuitable for a hash map.
I read the source code of dynamic_bitset to find out why this happens, and realized that dynamic_bitset stores one bit per value, the same as vector<bool>. For example, when you call dynamic_bitset<> test(1, false), dynamic_bitset initially allocates 4 bytes (one block, assuming a 32-bit block type), all zero, and records the size in bits (in this case, 1). Note that if the size in bits becomes greater than 32, it allocates another 4 bytes and pushes them onto dynamic_bitset<>::m_bits (so m_bits is a vector of 4-byte blocks).
If I call test.push_back(x), it sets the second bit to x and increases the size in bits to 2. If x is false, then m_bits[0] does not change at all! To compute the hash correctly, we need to include m_num_bits in the hash computation.
Then the question is: how?
1: Use boost::hash_combine
This approach is simple and straightforward. I have not checked whether it compiles.
namespace boost {
template <typename B, typename A>
std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
size_t tmp = 0;
boost::hash_combine(tmp,bs.m_num_bits);
boost::hash_combine(tmp,bs.m_bits);
return tmp;
}
}
2: Flip the (m_num_bits % bits_per_block)-th bit.
Flip a bit of the hash based on the size in bits. I believe this approach is faster than 1.
namespace boost {
template <typename B, typename A>
std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
// Use size_t for the mask: shifting 1u by 32 or more bits is undefined.
// This assumes a block is no wider than size_t; you may need a more
// sophisticated bit-shift approach.
std::size_t bit = std::size_t(1) << (bs.m_num_bits % bs.bits_per_block);
std::size_t return_val = boost::hash_value(bs.m_bits);
// Flip the selected bit of the hash.
return return_val ^ bit;
}
}
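To make the collision concrete without touching Boost's private members, here is a small sketch with a toy stand-in for the bitset. Everything in it (toy_bitset, combine, both hash functions) is hypothetical and only mirrors the "blocks plus bit count" layout described above; the combiner follows the boost::hash_combine formula but takes the raw value directly.

```cpp
#include <cstddef>
#include <vector>

namespace length_sketch {

// Toy stand-in for a dynamic bitset: a block vector plus an explicit bit count.
struct toy_bitset {
    std::vector<unsigned long> blocks;
    std::size_t num_bits = 0;

    void push_back(bool bit) {
        const std::size_t bits_per_block = sizeof(unsigned long) * 8;
        if (num_bits % bits_per_block == 0) blocks.push_back(0ul);  // start a new block
        if (bit) blocks.back() |= 1ul << (num_bits % bits_per_block);
        ++num_bits;
    }
};

// Same mixing formula as boost::hash_combine, applied to a raw value.
inline void combine(std::size_t& seed, std::size_t value) {
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes only the blocks: blind to trailing zero bits inside a block.
inline std::size_t hash_blocks_only(const toy_bitset& bs) {
    std::size_t seed = 0;
    for (unsigned long block : bs.blocks) combine(seed, block);
    return seed;
}

// Hashes the bit count first, then the blocks.
inline std::size_t hash_with_length(const toy_bitset& bs) {
    std::size_t seed = 0;
    combine(seed, bs.num_bits);
    for (unsigned long block : bs.blocks) combine(seed, block);
    return seed;
}

}  // namespace length_sketch
```

A bitset holding one false bit and one holding two false bits collide under hash_blocks_only but get different values from hash_with_length.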
Discussion:
Let's say I have a struct/class with an arbitrary number of attributes that I want to use as key to a std::unordered_map e.g.,:
struct Foo {
int i;
double d;
char c;
bool b;
};
I know that I have to define a hasher-functor for it e.g.,:
struct FooHasher {
std::size_t operator()(Foo const &foo) const;
};
And then define my std::unordered_map as:
std::unordered_map<Foo, MyValueType, FooHasher> myMap;
What bothers me though, is how to define the call operator for FooHasher. One way to do it, that I also tend to prefer, is with std::hash. However, there are numerous variations e.g.,:
std::size_t operator()(Foo const &foo) const {
return std::hash<int>()(foo.i) ^
std::hash<double>()(foo.d) ^
std::hash<char>()(foo.c) ^
std::hash<bool>()(foo.b);
}
I've also seen the following scheme:
std::size_t operator()(Foo const &foo) const {
return std::hash<int>()(foo.i) ^
(std::hash<double>()(foo.d) << 1) ^
(std::hash<char>()(foo.c) >> 1) ^
(std::hash<bool>()(foo.b) << 1);
}
I've seen also some people adding the golden ratio:
std::size_t operator()(Foo const &foo) const {
return (std::hash<int>()(foo.i) + 0x9e3779b9) ^
(std::hash<double>()(foo.d) + 0x9e3779b9) ^
(std::hash<char>()(foo.c) + 0x9e3779b9) ^
(std::hash<bool>()(foo.b) + 0x9e3779b9);
}
Questions:
What are they trying to achieve by adding the golden ratio or shifting bits in the result of std::hash?
Is there an "official scheme" to std::hash an object with an arbitrary number of attributes of fundamental type?
A simple xor is symmetric and behaves badly when fed the "same" value multiple times (hash(a) ^ hash(a) is zero). See here for more details.
This is the question of combining hashes. boost has a hash_combine that is pretty decent. Write a hash combiner, and use it.
There is no "official scheme" to solve this problem.
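As a rough sketch of "write a hash combiner, and use it": the hash_combine below follows the boost::hash_combine formula but goes through std::hash, and the operator== on Foo is my addition (std::unordered_map needs one for the key type anyway).

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>

namespace combine_sketch {

struct Foo {
    int i;
    double d;
    char c;
    bool b;
    // Equality is required for unordered_map keys (added for this sketch).
    bool operator==(Foo const& o) const {
        return i == o.i && d == o.d && c == o.c && b == o.b;
    }
};

// Modeled on boost::hash_combine: the result depends on the order of the
// calls, unlike a plain xor of the member hashes.
template <class T>
void hash_combine(std::size_t& seed, T const& v) {
    seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct FooHasher {
    std::size_t operator()(Foo const& foo) const {
        std::size_t seed = 0;
        hash_combine(seed, foo.i);
        hash_combine(seed, foo.d);
        hash_combine(seed, foo.c);
        hash_combine(seed, foo.b);
        return seed;
    }
};

}  // namespace combine_sketch
```

It plugs in exactly like the FooHasher in the question: std::unordered_map<Foo, MyValueType, FooHasher>.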
Myself, I typically write a super-hasher that can take anything and hash it. It hash combines tuples and pairs and collections automatically, where it first hashes the count of elements in the collection, then the elements.
It finds hash(t) via ADL first, and if that fails checks if it has a manually written hash in a helper namespace (used for std containers and types), and if that fails does a std::hash<T>{}(t).
Then my hash for Foo support looks like:
struct Foo {
int i;
double d;
char c;
bool b;
friend auto mytie(Foo const& f) {
return std::tie(f.i, f.d, f.c, f.b);
}
friend std::size_t hash(Foo const& f) {
return hasher::hash(mytie(f));
}
};
where I use mytie to move Foo into a tuple, then use the std::tuple overload of hasher::hash to get the result.
I like the idea of hashes of structurally similar types having the same hash. This lets me act as if my hash is transparent in some cases.
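The hasher::hash tuple overload is not shown above, so here is one way it might look. This is a guess at the shape, not the actual implementation: the namespace, combine and hash_one are hypothetical names, and it falls back straight to std::hash for each element.

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <utility>

namespace tuple_hash_sketch {

inline void combine(std::size_t& seed, std::size_t h) {
    seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Deduces the plain element type, so references inside a tie()'d tuple
// hash the same way as values.
template <class T>
std::size_t hash_one(T const& v) {
    return std::hash<T>()(v);
}

template <class Tuple, std::size_t... Is>
std::size_t hash_tuple_impl(Tuple const& t, std::index_sequence<Is...>) {
    std::size_t seed = sizeof...(Is);  // start from the element count
    // Braced initializers are evaluated left to right, so the elements
    // are combined in order.
    int const expand[] = {0, (combine(seed, hash_one(std::get<Is>(t))), 0)...};
    (void)expand;
    return seed;
}

template <class... Ts>
std::size_t hash(std::tuple<Ts...> const& t) {
    return hash_tuple_impl(t, std::index_sequence_for<Ts...>());
}

struct Foo {
    int i;
    double d;
    char c;
    bool b;
};

// Foo hashes exactly like the tuple of its members.
inline std::size_t hash(Foo const& f) {
    return hash(std::tie(f.i, f.d, f.c, f.b));
}

}  // namespace tuple_hash_sketch
```

With this, a Foo and a std::tuple<int, double, char, bool> holding the same values produce the same hash, which is the "structurally similar types" property mentioned above.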
Note that hashing unordered meows in this manner is a bad idea, as an asymmetric hash of an unordered meow may generate spurious misses.
(Meow is the generic name for map and set. Do not ask me why: Ask the STL.)
The standard hash framework is lacking in respect of combining hashes. Combining hashes using xor is sub-optimal.
A better solution is proposed in N3980 "Types Don't Know #".
The main idea is using the same hash function and its state to hash more than one value/element/member.
With that framework your hash function would look:
template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, Foo const& x) noexcept
{
using std::hash_append;
hash_append(h, x.i);
hash_append(h, x.d);
hash_append(h, x.c);
hash_append(h, x.b);
}
And the container:
std::unordered_map<Foo, MyValueType, std::uhash<>> myMap;
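Note that hash_append and uhash come from the proposal; they are not in the standard library, so you have to supply them yourself. A stripped-down sketch of the moving parts, with FNV-1a as an arbitrary choice of algorithm and everything placed in a made-up namespace:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <unordered_map>

namespace n3980_sketch {

// A hash algorithm with state: FNV-1a over raw bytes (my choice for the sketch).
class fnv1a {
    std::uint64_t state_ = 14695981039346656037ULL;
public:
    void operator()(void const* key, std::size_t len) noexcept {
        auto p = static_cast<unsigned char const*>(key);
        for (std::size_t i = 0; i < len; ++i) {
            state_ ^= p[i];
            state_ *= 1099511628211ULL;
        }
    }
    explicit operator std::size_t() const noexcept {
        return static_cast<std::size_t>(state_);
    }
};

// Arithmetic values are fed to the algorithm as bytes. (Simplification:
// 0.0 and -0.0 compare equal but have different bytes.)
template <class HashAlgorithm, class T>
typename std::enable_if<std::is_arithmetic<T>::value>::type
hash_append(HashAlgorithm& h, T const& x) noexcept {
    h(&x, sizeof(x));
}

struct Foo {
    int i;
    double d;
    char c;
    bool b;
    bool operator==(Foo const& o) const {
        return i == o.i && d == o.d && c == o.c && b == o.b;
    }
};

// The type only says what to hash, not how.
template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, Foo const& x) noexcept {
    hash_append(h, x.i);
    hash_append(h, x.d);
    hash_append(h, x.c);
    hash_append(h, x.b);
}

// Adapter that turns a hash algorithm into a functor usable by containers.
template <class HashAlgorithm = fnv1a>
struct uhash {
    template <class T>
    std::size_t operator()(T const& x) const noexcept {
        HashAlgorithm h;
        hash_append(h, x);
        return static_cast<std::size_t>(h);
    }
};

}  // namespace n3980_sketch
```

All four members go through one algorithm state, so there is no separate "combine" step at all.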
I implemented this solution for getting an hash value from vector<T>:
// Modeled on boost::hash_combine; declared first so the specialization below can find it
template <class T>
inline void hash_combine(std::size_t& seed, T const& v)
{
    seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
namespace std
{
    template<typename T>
    struct hash<vector<T>>
    {
        typedef vector<T> argument_type;
        typedef std::size_t result_type;
        result_type operator()(argument_type const& in) const
        {
            size_t size = in.size();
            size_t seed = 0;
            for (size_t i = 0; i < size; i++)
                // Combine the hash of the current element with the hashes of the previous ones
                hash_combine(seed, in[i]);
            return seed;
        }
    };
}
But this solution doesn't scale at all: with a vector<double> of 10 million elements it takes more than 2.5 s (according to VS).
Does a fast hash function exist for this scenario?
Note that creating a hash value from the vector's address is not a feasible solution, since the related unordered_map will be used across different runs, and in addition two vector<double>s with the same content but different addresses would be mapped differently (undesired behavior for this application).
NOTE: As per the comments, you get a 25-50x speed-up by compiling with optimizations. Do that, first. Then, if it's still too slow, see below.
I don't think there's much you can do. You have to touch all the elements, and that combination function is about as fast as it gets.
One option may be to parallelize the hash function. If you have 8 cores, you can run 8 threads to each hash 1/8th of the vector, then combine the 8 resulting values at the end. The synchronization overhead may be worth it for very large vectors.
The approach that MSVC's old hashmap used was to sample less often.
This means that isolated changes won't show up in your hash, but the thing you are trying to avoid is reading and processing the entire 80 MB of data in order to hash your vector. Not reading some elements is pretty much unavoidable.
The second thing you should do is not specialize std::hash for all vectors: this may make your program ill-formed (as suggested by a defect resolution whose status I do not recall), and at the least it is a bad plan (as the standard is sure to permit itself to add hash combining and hashing of vectors).
When I write a custom hash, I usually use ADL (Koenig Lookup) to make it easy to extend.
namespace my_utils {
namespace hash_impl {
namespace details {
namespace adl {
template<class T>
std::size_t hash(T const& t) {
return std::hash<T>{}(t);
}
}
template<class T>
std::size_t hasher(T const& t) {
using adl::hash;
return hash(t);
}
}
struct hash_tag {};
template<class T>
std::size_t hash(hash_tag, T const& t) {
return details::hasher(t);
}
template<class T>
std::size_t hash_combine(hash_tag, std::size_t seed, T const& t) {
// pass the tag so the overloads in this namespace are found, and return the new seed
seed ^= hash(hash_tag{}, t) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
return seed;
}
template<class Container>
std::size_t fash_hash_random_container(hash_tag, Container const& c ) {
std::size_t size = c.size();
std::size_t stride = 1 + size/10;
std::size_t r = hash(hash_tag{}, size);
for(std::size_t i = 0; i < size; i += stride) {
r = hash_combine(hash_tag{}, r, c.data()[i]);
}
return r;
}
// std specializations go here:
template<class T, class A>
std::size_t hash(hash_tag, std::vector<T,A> const& v) {
return fash_hash_random_container(hash_tag{}, v);
}
template<class T, std::size_t N>
std::size_t hash(hash_tag, std::array<T,N> const& a) {
return fash_hash_random_container(hash_tag{}, a);
}
// etc
}
struct my_hasher {
template<class T>
std::size_t operator()(T const& t)const {
return hash_impl::hash(hash_impl::hash_tag{}, t);
}
};
}
now my_hasher is a universal hasher. It uses either hashes declared in my_utils::hash_impl (for std types), or free functions called hash that will hash a given type, to hash things. Failing that, it tries to use std::hash<T>. If that fails, you get a compile-time error.
Writing a free hash function in the namespace of the type you want to hash tends to be less annoying than having to go off and open std and specialize std::hash in my experience.
It understands vectors and arrays, recursively. Doing tuples and pairs requires a bit more work.
It samples said vectors and arrays at about 10 points.
(Note: hash_tag is both a bit of a joke, and a way to force ADL and prevent having to forward-declare the hash specializations in the hash_impl namespace, because that requirement sucks.)
The price of sampling is that you could get more collisions.
Another approach if you have a huge amount of data is to hash them once, and keep track of when they are modified. To do this approach, use a copy-on-write monad interface for your type that keeps track of if the hash is up to date. Now a vector gets hashed once; if you modify it, the hash is discarded.
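A simplified stand-in for that idea (not real copy-on-write, and not thread-safe): a wrapper that only allows modification through one function, so it knows when the cached hash has gone stale. The class name and its interface are invented for this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

namespace cached_sketch {

// Hypothetical wrapper: every mutation goes through modify(), which
// discards the cached hash; hash() recomputes it lazily.
class hashed_vector {
public:
    explicit hashed_vector(std::vector<double> data) : data_(std::move(data)) {}

    std::vector<double> const& get() const { return data_; }

    template <class F>
    void modify(F&& f) {
        f(data_);
        valid_ = false;  // the old hash no longer describes the data
    }

    std::size_t hash() const {
        if (!valid_) {
            std::size_t seed = data_.size();
            for (double x : data_)
                seed ^= std::hash<double>()(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
            cache_ = seed;
            valid_ = true;
            ++recomputations_;
        }
        return cache_;
    }

    // Exposed only so the sketch can show how often the full pass runs.
    std::size_t recomputations() const { return recomputations_; }

private:
    std::vector<double> data_;
    mutable std::size_t cache_ = 0;
    mutable bool valid_ = false;
    mutable std::size_t recomputations_ = 0;
};

}  // namespace cached_sketch
```

Repeated lookups with the same key then cost one pass over the data in total rather than one pass per lookup.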
One can go further and have a random-access hash (where it is easy to predict what happens when you edit a given value hash-wise), and mediate all access to the vector. That is tricky.
You could also multi-thread the hashing, but I would guess that your code is probably memory-bandwidth bound, and multi-threading won't help much there. Worth trying.
You could use a fancier structure than a flat vector (something tree like), where changes to the values bubble-up in a hash-like way to a root hash value. This would add a lg(n) overhead to all element access. Again, you'd have to wrap the raw data up in controls that keep the hashing up to date (or, keep track of what ranges are dirty and needs to be updated).
Finally, because you are working with 10 million elements at a time, consider moving over to a strong large-scale storage solution, like databases or what have you. Using 80 megabyte keys in a map seems strange to me.
Right now I have this code:
uint64_t buffer = 0;
const uint8_t * data = reinterpret_cast<uint8_t*>(&buffer);
And this works, but it seems risky due to the dangling pointer (and looks ugly too). I don't want naked pointers sitting around. I want to do something like this:
uint64_t buffer = 0;
const std::array<uint8_t, 8> data = partition_me_softly(buffer);
Is there any C++11-style construct that lets me get this into a safe container, preferably a std::array, out of an unsigned int like this without incurring overhead?
If not, what would be the ideal way to make this code safer?
So I modified dauphic's answer to be a little more generic:
template <typename T, typename U>
std::array<T, sizeof(U) / sizeof(T)> ScalarToByteArray(const U v)
{
static_assert(std::is_integral<T>::value && std::is_integral<U>::value,
"Template parameters must be integral types");
std::array<T, sizeof(U) / sizeof(T)> ret;
// copy sizeof(U) / sizeof(T) elements of type T, not sizeof(U) of them
std::copy((T*)&v, ((T*)&v) + sizeof(U) / sizeof(T), ret.begin());
return ret;
}
This way you can use it with more types like so:
uint64_t buffer = 0;
ScalarToByteArray<uint8_t>(buffer);
If you want to store an integer in a byte array, the best approach is probably to just cast the integer to a uint8_t* and copy it into an std::array. You're going to have to use raw pointers at some point, so your best option is to encapsulate the operation into a function.
template<typename T>
std::array<uint8_t, sizeof(T)> ScalarToByteArray(const T value)
{
static_assert(std::is_integral<T>::value,
"Template parameter must be an integral type");
std::array<uint8_t, sizeof(T)> result;
std::copy((uint8_t*)&value, ((uint8_t*)&value) + sizeof(T), result.begin());
return result;
}
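If you would rather not cast pointers at all, std::memcpy does the same job. A sketch under that assumption (to_bytes and from_bytes are made-up names); the byte order in the array is whatever the machine uses.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <type_traits>

namespace bytes_sketch {

// Copy the object representation of a value into a byte array.
template <typename T>
std::array<std::uint8_t, sizeof(T)> to_bytes(T const& value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    std::array<std::uint8_t, sizeof(T)> result;
    std::memcpy(result.data(), &value, sizeof(T));
    return result;
}

// Inverse operation: rebuild the value from its bytes.
template <typename T>
T from_bytes(std::array<std::uint8_t, sizeof(T)> const& bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    T value;
    std::memcpy(&value, bytes.data(), sizeof(T));
    return value;
}

}  // namespace bytes_sketch
```

Compilers typically turn a fixed-size memcpy like this into a plain load/store, so it should not cost more than the cast version.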
Not entirely a question, just something I have been pondering: how to write such code more elegantly while making full use of the new C++ standard. Here is the example:
Returning the Fibonacci sequence to a container, up to N values (for those not mathematically inclined, each value is the sum of the previous two, with the first two values equal to 1, i.e. 1, 1, 2, 3, 5, 8, 13, ...).
example run from main:
std::vector<double> vec;
running_fibonacci_seq(vec,30000000);
1)
template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T& coll, const INT_TYPE& N)
{
coll.resize(N);
coll[0] = 1;
if (N>1) {
coll[1] = 1;
for (auto pos = coll.begin()+2;
pos != coll.end();
++pos)
{
*pos = *(pos-1) + *(pos-2);
}
}
}
2) the same, but using an rvalue reference (&&) instead of &, i.e.
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
EDIT: as noticed by the users who commented below, the rvalue and lvalue play no role in timing - the speeds were actually the same for reasons discussed in the comments
results for N = 30,000,000
Time taken for &:919.053ms
Time taken for &&: 800.046ms
Firstly, I know this really isn't a question as such, but which of these is the better modern C++ code? With the rvalue reference (&&) it appears that move semantics are in place and no unnecessary copies are being made, which gives a small improvement in time (important for me due to future real-time application development). Some specific "questions" are:
a) Passing a container (a vector in my example) to a function as a parameter is NOT an elegant illustration of how rvalue references should really be used. Is this true? If so, how would an rvalue reference really shine in the above example?
b) The coll.resize(N); call and the N=1 case: is there a way to avoid these so the user is given a simple interface and only has to call the function, without sizing the vector dynamically? Can template metaprogramming be of use here, so that the vector is allocated with a particular size at compile time (i.e. running_fibonacci_seq<30000000>)? Since the numbers can be large, is there any need to use template metaprogramming, and if so, can we use this [1] as well?
c) Is there an even more elegant method? I have a feeling the std::transform function could be used with lambdas, e.g.
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
{
coll.resize(N);
coll[0] = 1;
coll[1] = 1;
std::transform (coll.begin()+2,
coll.end(), // source
coll.begin(), // destination
[????](????) { // lambda as function object
return ????????;
});
}
[1] http://cpptruths.blogspot.co.uk/2011/07/want-speed-use-constexpr-meta.html
Due to "reference collapsing" this code does NOT use an rvalue reference, or move anything:
template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T&& coll, const INT_TYPE& N);
running_fibonacci_seq(vec,30000000);
All of your questions (and the existing comments) become quite meaningless when you recognize this.
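You can check the deduction yourself with a small probe. The function name below is made up; it just reports what T was deduced as for a T&& parameter.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

namespace collapse_sketch {

// Reports whether T was deduced as an lvalue reference for a `T&&` parameter.
// Passing an lvalue deduces T = X&, and X& && collapses to X&.
template <typename T>
bool deduced_lvalue_ref(T&&) {
    return std::is_lvalue_reference<T>::value;
}

}  // namespace collapse_sketch
```

Calling it with a named vector (as in running_fibonacci_seq(vec, 30000000)) reports an lvalue reference; only a temporary or std::move(vec) makes the parameter a real rvalue reference.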
Obvious answer:
std::vector<double> running_fibonacci_seq(uint32_t N);
Why ?
Because of const-ness:
std::vector<double> const result = running_fibonacci_seq(....);
Because of easier invariants:
void running_fibonacci_seq(std::vector<double>& t, uint32_t N) {
// Oh, forgot to clear "t"!
t.push_back(1);
...
}
But what of speed ?
There is an optimization called Return Value Optimization that allows the compiler to omit the copy (and build the result directly in the caller's variable) in a number of cases. It is specifically allowed by the C++ Standard even when the copy/move constructors have side effects.
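A small sketch to see this for yourself: a wrapper type that counts copy constructions (tracked, copy_count and make_sequence are names invented for the example). Whether the compiler elides the return or falls back to the move constructor, the copy constructor is not involved.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

namespace rvo_sketch {

// Global copy counter, wrapped in a function to keep the sketch header-friendly.
inline int& copy_count() {
    static int n = 0;
    return n;
}

struct tracked {
    std::vector<double> data;
    tracked() = default;
    tracked(tracked const& o) : data(o.data) { ++copy_count(); }
    tracked(tracked&& o) noexcept : data(std::move(o.data)) {}
};

inline tracked make_sequence(std::size_t n) {
    tracked result;
    result.data.assign(n, 1.0);  // first two values are 1
    for (std::size_t i = 2; i < n; ++i)
        result.data[i] = result.data[i - 1] + result.data[i - 2];
    return result;  // elided or moved, never copied
}

}  // namespace rvo_sketch
```

The caller can bind the result to a const variable, which is the const-ness advantage mentioned above.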
So why pass "out" parameters?
you can only have one return value (sigh)
you may wish to reuse the allocated resources (here the memory buffer of t)
Profile this:
#include <vector>
#include <cstddef>
#include <type_traits>
template <typename Container>
Container generate_fibbonacci_sequence(std::size_t N)
{
Container coll;
coll.resize(N);
coll[0] = 1;
if (N>1) {
coll[1] = 1;
for (auto pos = coll.begin()+2;
pos != coll.end();
++pos)
{
*pos = *(pos-1) + *(pos-2);
}
}
return coll;
}
struct fibbo_maker {
std::size_t N;
fibbo_maker(std::size_t n):N(n) {}
template<typename Container>
operator Container() const {
typedef typename std::remove_reference<Container>::type NRContainer;
typedef typename std::decay<NRContainer>::type VContainer;
return generate_fibbonacci_sequence<VContainer>(N);
}
};
fibbo_maker make_fibbonacci_sequence( std::size_t N ) {
return fibbo_maker(N);
}
int main() {
std::vector<double> tmp = make_fibbonacci_sequence(30000000);
}
The fibbo_maker stuff is just me being clever, but it lets me deduce the type of Fibonacci sequence you want without you having to repeat it.
A demo problem: Given two std::bitset<N>s, a and b check if any bit is set in both a and b.
There are two rather obvious solutions to this problem. The first is bad because it creates a new temporary bitset and copies values to all sorts of places just to throw them away.
template <size_t N>
bool any_both_new_temp(const std::bitset<N>& a, const std::bitset<N>& b)
{
return (a & b).any();
}
This solution is bad because it goes one bit at a time, which is less than ideal:
template <size_t N>
bool any_both_bit_by_bit(const std::bitset<N>& a, const std::bitset<N>& b)
{
for (size_t i = 0; i < N; ++i)
if (a[i] && b[i])
return true;
return false;
}
Ideally, I would be able to do something like this, where block_type is uint32_t or whatever type the bitset is storing:
template <size_t N>
bool any_both_by_block(const std::bitset<N>& a, const std::bitset<N>& b)
{
typedef std::bitset<N>::block_type block_type;
for (size_t i = 0; i < a.block_count(); ++i)
if (a.get_block(i) & b.get_block(i))
return true;
return false;
}
Is there an easy way to go about doing this?
I compiled your first example with optimization in g++ and it produced code identical to your third solution. In fact, with a smallish bitset (320 bits) it fully unrolled it. Without calling a function to ensure that the contents of a and b were unknown in main it actually optimized the entire thing away (knowing both were all 0).
Lesson: Write the obvious, readable code and let the compiler deal with it.
You say that your first approach "copies values all sorts of places just to throw them away." But there's really only one extra value-copy (when the result of operator& is returned to any_both_new_temp), and it can be eliminated by using a reference instead of a value:
template <size_t N>
bool any_both_new_temp(const std::bitset<N>& a, const std::bitset<N>& b)
{
std::bitset<N> tmp = a;
tmp &= b;
return tmp.any();
}
(But obviously it will still create a temporary bitset and copy a into it.)