C++ Fast Bitset Short-Circuit Bitwise Operations

A demo problem: given two std::bitset<N>s, a and b, check whether any bit is set in both a and b.
There are two rather obvious solutions to this problem. The first is bad because it creates a new temporary bitset and copies values all over the place just to throw them away:
template <size_t N>
bool any_both_new_temp(const std::bitset<N>& a, const std::bitset<N>& b)
{
    return (a & b).any();
}
This solution is bad because it goes one bit at a time, which is less than ideal:
template <size_t N>
bool any_both_bit_by_bit(const std::bitset<N>& a, const std::bitset<N>& b)
{
    for (size_t i = 0; i < N; ++i)
        if (a[i] && b[i])
            return true;
    return false;
}
Ideally, I would be able to do something like this, where block_type is uint32_t or whatever type the bitset is storing:
template <size_t N>
bool any_both_by_block(const std::bitset<N>& a, const std::bitset<N>& b)
{
    typedef typename std::bitset<N>::block_type block_type; // hypothetical API
    for (size_t i = 0; i < a.block_count(); ++i)
        if (a.get_block(i) & b.get_block(i))
            return true;
    return false;
}
Is there an easy way to go about doing this?

I compiled your first example with optimization in g++, and it produced code identical to your third solution. In fact, with a smallish bitset (320 bits) it fully unrolled the loop. Without calling a function to ensure that the contents of a and b were unknown in main, it optimized the entire thing away (it could see both were all zero).
Lesson: Write the obvious, readable code and let the compiler deal with it.
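If you want to repeat that experiment, one way to keep the optimizer from seeing the bit patterns is to read them at run time. A minimal harness sketch (the 320-bit width simply mirrors the size mentioned above; enter two strings of '0'/'1' characters):
#include <bitset>
#include <cstddef>
#include <iostream>
#include <string>

template <std::size_t N>
bool any_both_new_temp(const std::bitset<N>& a, const std::bitset<N>& b)
{
    return (a & b).any();
}

int main()
{
    // Read the bit patterns at run time so the compiler cannot fold them away.
    std::string sa, sb;
    std::cin >> sa >> sb;
    std::bitset<320> a(sa), b(sb); // throws if a character is not '0' or '1'
    std::cout << any_both_new_temp(a, b) << '\n';
}
Inspecting the generated code for the call then shows the block-wise comparison the question asked for.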

You say that your first approach "copies values all sorts of places just to throw them away." But there's really only one extra value-copy (when the result of operator& is returned to any_both_new_temp), and it can be eliminated by doing the AND in place on a named local copy:
template <size_t N>
bool any_both_new_temp(const std::bitset<N>& a, const std::bitset<N>& b)
{
    std::bitset<N> tmp = a;
    tmp &= b;
    return tmp.any();
}
(But obviously it will still create a temporary bitset and copy a into it.)

Related

A compilation issue regarding constexpr expressions in Visual Studio 2017

I'm working on a rendering engine using Vulkan and Visual Studio 2017, and I bumped into the following type of problem recently.
I have a template struct template<uint32_t id> struct A;. This struct is defined (in separate header files) for id=0, ..., N-1. All of the definitions have a static constexpr std::array<B, M(id)> member for some struct B and a number M depending on id. I have a constexpr function (and a helper function) which, for a given value b of type B, counts how many elements of all of these arrays are equal to b. It looks something like this:
Helper function:
template<size_t Size>
constexpr void count_in_array(B b, const std::array<B, Size>& a, uint32_t& count)
{
    for(auto& e : a)
    {
        if(e==b)
            ++count;
    }
}
Main function:
template<uint32_t... ids>
constexpr uint32_t count_in_arrays(B b, std::index_sequence<ids...>)
{
    uint32_t count=0;
    auto l ={ (count_in_array(b, A<ids>::member, count), 0)... };
    return count;
}
When I compile, I get a C1001 internal compiler error. The strange thing is that my functions work, because if I use them to define a constexpr variable
constexpr uint32_t var = count_in_arrays(b, std::make_index_sequence<N>());
(for a constexpr B b)
and hover the mouse over that variable, I see the computed (and correct) number in the tooltip that appears.
I am not familiar with compiler switches; I only tried to use #pragma optimize("", on/off) around the above functions, but that didn't help. Does somebody have an idea how to make Visual Studio compile my code?
Remark: I am pretty sure that the struct B is not important here; in my case it is a simple data struct containing some variables of built-in type.
First, an internal compiler error is always a compiler bug. Please report this to MSVC.
Second, this implementation is a bit odd. When you write constexpr functions you want to think in a more functionally-oriented way - input-only arguments, output-only results. count_in_array should surely just return a number:
template <size_t Size>
constexpr uint32_t count_in_array(B b, const std::array<B, Size>& a)
{
    uint32_t count = 0;
    for(auto& e : a)
    {
        if(e==b)
            ++count;
    }
    return count;
}
This is a more reasonable implementation - count returns a count. Not only that, but it composes really nicely. How do you get all the counts? You sum them:
template <size_t... Ids>
constexpr uint32_t count_in_arrays(B b, std::index_sequence<Ids...>)
{
    return (count_in_array(b, A<Ids>::member) + ...);
}
Much clearer.
Note that, while I think fold-expressions don't quite work in MSVC yet (though they might soon?), that in and of itself is not a reason to implement this differently. It just means that you need to sum manually - not that count_in_array() shouldn't return a count.
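Until then, here is a self-contained sketch of that manual-sum fallback (C++17; A and B here are small stand-ins invented for the example, since the real ones live in the question's headers, and the pack is assumed non-empty):
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

struct B { int v; };
constexpr bool operator==(B lhs, B rhs) { return lhs.v == rhs.v; }

// stand-in for the question's A<id>: each specialization owns a small constexpr array
template <std::uint32_t Id>
struct A {
    static constexpr std::array<B, 2> member{{ B{0}, B{int(Id)} }};
};

template <std::size_t Size>
constexpr std::uint32_t count_in_array(B b, const std::array<B, Size>& a)
{
    std::uint32_t count = 0;
    for (auto& e : a)
        if (e == b)
            ++count;
    return count;
}

// no fold expression: expand the pack into a local array and sum it in a loop
// (requires a non-empty pack)
template <std::size_t... Ids>
constexpr std::uint32_t count_in_arrays(B b, std::index_sequence<Ids...>)
{
    const std::uint32_t counts[] = { count_in_array(b, A<Ids>::member)... };
    std::uint32_t total = 0;
    for (std::uint32_t c : counts)
        total += c;
    return total;
}

static_assert(count_in_arrays(B{2}, std::make_index_sequence<3>()) == 1,
              "B{2} appears exactly once, in A<2>::member");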

Building data structures at compile time with template-metaprogramming, constexpr or macros

I want to optimize a little program/library I'm writing, and for two weeks I've been somewhat stuck and am now wondering whether what I had in mind is even possible like that.
(Please be gentle, I don't have very much experience in meta-programming.)
My goal is of course to have certain computations be done by the compiler, so that the programmer - hopefully - only has to edit code at one point in the program and have the compiler "create" all the boilerplate. I do have a reasonably good idea how to do what I want with macros, but it is desired that I do it with templates if possible.
My goal is:
Let's say I have a class that a using programmer can derive from. There he can have multiple incoming and outgoing datatypes that I want to register somehow, so that the base class can do its operations on them.
class my_own_multiply : function_base {
    in<int> a;
    in<float> b;
    out<double> c;
    // ["..."] // other content of the class that actually does something but is irrelevant
    register_ins<a, b> ins_of_function; // example meta-function calls
    register_outs<c> outs_of_function;
};
The meta-code I have up till now is this (but it's not yet working/complete):
template <typename... Ts>
struct register_ins {
    const std::array<std::unique_ptr<in_type_erasured>, sizeof...(Ts)> ins;

    constexpr std::array<std::unique_ptr<in_type_erasured>, sizeof...(Ts)>
    build_ins_array() {
        std::array<std::unique_ptr<in_type_erasured>, sizeof...(Ts)> ins_build;
        for (unsigned int i = 0; i < sizeof...(Ts); ++i) {
            ins_build[i] = std::make_unique<in_type_erasured>();
        }
        return ins_build;
    }

    constexpr register_ins() : ins(build_ins_array()) {
    }

    template <typename T>
    T getValueOf(unsigned int in_nr) {
        return ins[in_nr]->getValue();
    }
};
As you may see, I want to call my meta-template code with a variable number of ins. (Variable in the sense that the programmer can put however many he likes in there, but they won't change at runtime, so they can be "baked in" at compile time.)
The meta-code is supposed to create an array whose length is the number of ins and which is initialized so that every field points to the original in in the my_own_multiply class. Basically giving him an indexable data structure that will always have the correct size. And one that I could access from the function_base class to use all ins in certain functions, which are also iterable, making things convenient for me.
Now I have looked into how one might do that, but I am getting the feeling that I might not really be allowed to "create" this array at compile time in a fashion that allows me to still have the ins a and b be non-static and non-const so that I can mutate them. From my side they wouldn't have to be const anyway, but my compiler doesn't seem to like them being left non-const. The only thing I need const is the array with the pointers. But using constexpr possibly "makes" me make them const?
Okay, I will clarify what I don't get:
When I'm trying to create an "instance" of my meta-stuff structure, it fails because it expects all kinds of const, constexpr and so on. But I don't want them, since I need to be able to mutate most of those variables. I only need this meta-stuff to create an array of the correct size already at compile time. But I don't want to have to make everything static and const in order to achieve this. So is this even possible under these kinds of terms?
I do not get all the things you have in mind (also regarding that std::unique_ptr in your example), but maybe this helps:
Starting from C++14 (or C++11, but that is strictly limited) you may write constexpr functions which can be evaluated at compile-time. As a precondition (in simple words), all arguments "passed by the caller" must be constexpr. If you want to enforce that the compiler replaces that "call" by the result of a compile-time computation, you must assign the result to a constexpr.
Writing usual functions (just with constexpr added) allows you to write code which is simple to read. Moreover, you can use the same code for both compile-time and run-time computations.
C++17 example (similar things are possible in C++14, although some stuff from std is just missing the constexpr qualifier):
http://coliru.stacked-crooked.com/a/154e2dfcc41fb6c7
#include <cassert>
#include <array>

template<class T, std::size_t N>
constexpr std::array<T, N> multiply(
    const std::array<T, N>& a,
    const std::array<T, N>& b
) {
    // may be evaluated in `constexpr` or in non-`constexpr` context
    // ... in simple man's words this means:
    // inside this function, `a` and `b` are not `constexpr`
    // but the return can be used as `constexpr` if all arguments are `constexpr` for the "caller"
    std::array<T, N> ret{};
    for(std::size_t n=0; n<N; ++n) ret[n] = a[n] * b[n];
    return ret;
}

int main() {
    { // compile-time evaluation is possible if the input data is `constexpr`
        constexpr auto a = std::array{2, 4, 6};
        constexpr auto b = std::array{1, 2, 3};
        constexpr auto c = multiply(a, b); // assigning to a `constexpr` guarantees compile-time evaluation
        static_assert(c[0] == 2);
        static_assert(c[1] == 8);
        static_assert(c[2] == 18);
    }
    { // for run-time data, the same function can be used
        auto a = std::array{2, 4, 6};
        auto b = std::array{1, 2, 3};
        auto c = multiply(a, b);
        assert(c[0] == 2);
        assert(c[1] == 8);
        assert(c[2] == 18);
    }
    return 0;
}

How to reduce boilerplate for iterators?

Mainly as an exercise I am implementing a conversion from base B to base 10:
unsigned fromBaseB(std::vector<unsigned> x, unsigned b){
    unsigned out = 0;
    unsigned pow = 1;
    for (size_t i=0; i<x.size(); i++){
        out += pow * x[i];
        pow *= b;
    }
    return out;
}

int main() {
    auto z = std::vector<unsigned>(9,0);
    z[3] = 1;
    std::cout << fromBaseB(z,3) << std::endl;
}
Now I would like to write this using algorithms. E.g. using accumulate I could write
unsigned fromBaseB2(std::vector<unsigned> x, unsigned b){
    unsigned pow = 1;
    return std::accumulate(x.begin(),
                           x.end(), 0u,
                           [pow,b](unsigned sum, unsigned v) mutable {
                               unsigned out = pow*v;
                               pow *= b;
                               return out+sum;
                           });
}
However, IMHO that's not nicer code at all. Actually it would be more natural to write it as an inner product, because that's just what we have to calculate to make the basis transformation. But to use inner_product I need an iterator:
template <typename T> struct pow_iterator{
    typedef T value_type;
    pow_iterator(T base) : base(base), value(1) {}
    T base, value;
    pow_iterator& operator++(){ value *= base; return *this; }
    T operator*() { return value; }
    bool operator==(const pow_iterator& other) const { return value == other.value; }
};

unsigned fromBaseB3(std::vector<unsigned> x, unsigned b){
    return std::inner_product(x.begin(), x.end(), pow_iterator<unsigned>(b), 0u);
}
Using that iterator, calling the algorithm is now nice and clean, but I had to write a lot of boilerplate code for the iterator. Maybe it is just my misunderstanding of how algorithms and iterators are supposed to be used... Actually this is just an example of a general problem I am facing sometimes: I have a sequence of numbers that is calculated based on a simple pattern, and I would like to have an iterator that, when dereferenced, returns the corresponding number from that sequence. When the sequence is stored in a container I simply use the iterators provided by the container, but I would like to do the same also when there is no container where the values are stored. I could of course try to write my own generic iterator that does the job, but isn't there something existing in the standard library that can help here?
To me it feels a bit strange that I can use a lambda to cheat accumulate into calculating an inner product, but to use inner_product directly I have to do something extra (either precalculate the powers and store them in a container, or write an iterator, i.e. a separate class).
tl;dr: Is there an easy way to reduce the boilerplate for the pow_iterator above?
The more general (but maybe too broad) question: Is it "ok" to use an iterator for a sequence of values that is not stored in a container, but that is calculated only when the iterator is dereferenced? Is there a "C++ way" of implementing it?
As Richard Hodges wrote in the comments, you can look at boost::iterator. Alternatively, there is range-v3. If you go with boost, there are a few possible ways to go. The following shows how to do so with boost::iterator::counting_iterator and boost::iterator::transform_iterator (C++ 11):
#include <iostream>
#include <cmath>
#include <boost/iterator/counting_iterator.hpp>
#include <boost/iterator/transform_iterator.hpp>

int main() {
    const std::size_t base = 2;
    auto make_it = [](std::size_t i) {
        return boost::make_transform_iterator(
            boost::make_counting_iterator(i),
            [](std::size_t j){ return std::pow(base, j); });
    };
    for(auto b = make_it(0); b != make_it(10); ++b)
        std::cout << *b << std::endl;
}
Here's the output:
$ ./a.out
1
2
4
8
16
32
64
128
256
512
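Applied back to the base-B conversion, the same counting_iterator/transform_iterator combination can stand in for pow_iterator when calling std::inner_product. A sketch (fromBaseB4 is just a name made up here; note that it recomputes b^i from scratch at every position, so it is less efficient than the incremental pow_iterator):
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>
#include <boost/iterator/counting_iterator.hpp>
#include <boost/iterator/transform_iterator.hpp>

unsigned fromBaseB4(const std::vector<unsigned>& x, unsigned b) {
    // wrap a counting iterator (0, 1, 2, ...) with a transform that yields b^i
    auto pow_it = boost::make_transform_iterator(
        boost::make_counting_iterator(std::size_t{0}),
        [b](std::size_t i) {
            unsigned p = 1;
            for (std::size_t k = 0; k < i; ++k) p *= b; // integer b^i
            return p;
        });
    return std::inner_product(x.begin(), x.end(), pow_it, 0u);
}

int main() {
    std::vector<unsigned> z(9, 0);
    z[3] = 1;
    std::cout << fromBaseB4(z, 3) << std::endl; // prints 27
}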

Returning container from function: optimizing speed and modern style

Not entirely a question, just something I have been pondering: how to write such code more elegantly, style-wise, while fully making use of the new C++ standard, etc. Here is the example.
Returning a Fibonacci sequence in a container up to N values (for those not mathematically inclined, this is just adding the previous two values, with the first two values equal to 1, i.e. 1, 1, 2, 3, 5, 8, 13, ...).
Example run from main:
std::vector<double> vec;
running_fibonacci_seq(vec,30000000);
1)
template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T& coll, const INT_TYPE& N)
{
    coll.resize(N);
    coll[0] = 1;
    if (N>1) {
        coll[1] = 1;
        for (auto pos = coll.begin()+2;
             pos != coll.end();
             ++pos)
        {
            *pos = *(pos-1) + *(pos-2);
        }
    }
}
2) the same, but using an rvalue reference && instead of &, i.e.
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
EDIT: as noticed by the users who commented below, the rvalue and lvalue play no role in timing - the speeds were actually the same for reasons discussed in the comments
results for N = 30,000,000
Time taken for &:919.053ms
Time taken for &&: 800.046ms
Firstly, I know this really isn't a question as such, but which of these is the best modern C++ code? With the rvalue reference (&&) it appears that move semantics are in place and no unnecessary copies are being made, which gives a small improvement in time (important for me due to future real-time application development). Some specific "questions" are:
a) passing a container (which was a vector in my example) to a function as a parameter is NOT an elegant demonstration of how an rvalue reference should really be used. Is this true? If so, how would an rvalue reference really show its light in the above example?
b) the coll.resize(N); call and the N=1 case: is there a way to avoid these so the user is given a simple interface and can just use the function without the vector being sized dynamically? Can template metaprogramming be of use here so the vector is allocated with a particular size at compile time (i.e. running_fibonacci_seq<30000000>)? Since the numbers can be large, is there any need to use template metaprogramming, and if so, can we use this (link) also?
c) Is there an even more elegant method? I have a feeling the std::transform function could be used with a lambda, e.g.
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
{
    coll.resize(N);
    coll[0] = 1;
    coll[1] = 1;
    std::transform (coll.begin()+2,
                    coll.end(),      // source
                    coll.begin(),    // destination
                    [????](????) {   // lambda as function object
                        return ????????;
                    });
}
[1] http://cpptruths.blogspot.co.uk/2011/07/want-speed-use-constexpr-meta.html
Due to "reference collapsing" this code does NOT use an rvalue reference, or move anything:
template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T&& coll, const INT_TYPE& N);
running_fibonacci_seq(vec,30000000);
All of your questions (and the existing comments) become quite meaningless when you recognize this.
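To see the collapsing in action, here is a minimal, self-contained sketch (the static_assert merely documents what T&& becomes for this particular call; the small N is arbitrary):
#include <type_traits>
#include <vector>

template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T&& coll, const INT_TYPE& N)
{
    // With the lvalue argument `vec`, T deduces to std::vector<double>&,
    // so T&& collapses to std::vector<double>& -- a plain lvalue reference.
    // Nothing is moved; coll aliases the caller's vector.
    static_assert(std::is_same<decltype(coll), std::vector<double>&>::value,
                  "coll is an ordinary lvalue reference in this call");
    coll.resize(N); // mutates the caller's vec, exactly as the T& version would
}

int main()
{
    std::vector<double> vec;
    running_fibonacci_seq(vec, 10); // binds as an lvalue reference
}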
Obvious answer:
std::vector<double> running_fibonacci_seq(uint32_t N);
Why?
Because of const-ness:
std::vector<double> const result = running_fibonacci_seq(....);
Because of easier invariants:
void running_fibonacci_seq(std::vector<double>& t, uint32_t N) {
    // Oh, forgot to clear "t"!
    t.push_back(1);
    ...
}
But what of speed?
There is an optimization called Return Value Optimization that allows the compiler to omit the copy (and build the result directly in the caller's variable) in a number of cases. It is specifically allowed by the C++ Standard even when the copy/move constructors have side effects.
So, why pass "out" parameters at all?
you can only have one return value (sigh)
you may wish to reuse the allocated resources (here, the memory buffer of t) - see the sketch below
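For the reuse case, here is a minimal sketch of the pattern, using the question's running_fibonacci_seq; the decreasing sizes are arbitrary and chosen so that the buffer allocated by the first call can serve all later ones:
#include <cstdint>
#include <vector>

template <typename T, typename INT_TYPE>
void running_fibonacci_seq(T& coll, const INT_TYPE& N)
{
    coll.resize(N);
    coll[0] = 1;
    if (N > 1) {
        coll[1] = 1;
        for (auto pos = coll.begin() + 2; pos != coll.end(); ++pos)
            *pos = *(pos - 1) + *(pos - 2);
    }
}

int main()
{
    std::vector<double> buffer;
    for (std::uint32_t n : {30u, 20u, 10u}) {
        running_fibonacci_seq(buffer, n); // the capacity allocated by the first
                                          // call is reused by the later, smaller ones
        // ... consume buffer ...
    }
}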
Profile this:
#include <vector>
#include <cstddef>
#include <type_traits>

template <typename Container>
Container generate_fibbonacci_sequence(std::size_t N)
{
    Container coll;
    coll.resize(N);
    coll[0] = 1;
    if (N>1) {
        coll[1] = 1;
        for (auto pos = coll.begin()+2;
             pos != coll.end();
             ++pos)
        {
            *pos = *(pos-1) + *(pos-2);
        }
    }
    return coll;
}

struct fibbo_maker {
    std::size_t N;
    fibbo_maker(std::size_t n): N(n) {}

    template<typename Container>
    operator Container() const {
        typedef typename std::remove_reference<Container>::type NRContainer;
        typedef typename std::decay<NRContainer>::type VContainer;
        return generate_fibbonacci_sequence<VContainer>(N);
    }
};

fibbo_maker make_fibbonacci_sequence( std::size_t N ) {
    return fibbo_maker(N);
}

int main() {
    std::vector<double> tmp = make_fibbonacci_sequence(30000000);
}
The fibbo_maker stuff is just me being clever. But it lets me deduce the type of fibbo sequence you want without you having to repeat it.
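A quick usage note, assuming the definitions above are in scope: because fibbo_maker's conversion operator deduces the container type from the variable being initialized, the sequence type is written exactly once, at the declaration site.
#include <deque>
#include <vector>
// ... generate_fibbonacci_sequence, fibbo_maker and make_fibbonacci_sequence as above ...

int main() {
    std::vector<double> v = make_fibbonacci_sequence(10); // deduces std::vector<double>
    std::deque<float>   d = make_fibbonacci_sequence(10); // deduces std::deque<float>
}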

Unordered (hash) map from bitset to bitset on boost

I want to use a cache, implemented by boost's unordered_map, from a dynamic_bitset to a dynamic_bitset. The problem, of course, is that there is no default hash function for the bitset. It doesn't seem like a conceptual problem, but I don't know how to work out the technicalities. How should I do that?
I found an unexpected solution. It turns out boost has an option to #define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS. When this is defined, private members including m_bits become public (I think it's there to deal with old compilers or something).
So now I can use @KennyTM's answer, changed a bit:
namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        return boost::hash_value(bs.m_bits);
    }
}
There's a to_block_range function that copies out the words that the bitset consists of into some buffer. To avoid actual copying, you could define your own "output iterator" that just processes individual words and computes a hash from them. Re: how to compute the hash, see e.g. the FNV hash function.
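To make that concrete, here is a minimal sketch of such a hashing "output iterator" (block_hasher and hash_bits are names made up here; the seed lives in the caller because to_block_range takes the iterator by value, and the bit count is mixed in to distinguish bitsets that differ only in trailing zero bits):
#include <cstddef>
#include <iterator>
#include <boost/dynamic_bitset.hpp>
#include <boost/functional/hash.hpp>

// Each block written through the iterator is folded into a running hash,
// so no temporary buffer of blocks is ever built.
struct block_hasher {
    using iterator_category = std::output_iterator_tag;
    using value_type        = void;
    using difference_type   = std::ptrdiff_t;
    using pointer           = void;
    using reference         = void;

    std::size_t* seed; // points at the caller's accumulator

    explicit block_hasher(std::size_t& s) : seed(&s) {}

    block_hasher& operator*()     { return *this; }
    block_hasher& operator++()    { return *this; }
    block_hasher  operator++(int) { return *this; }

    template <typename Block>
    block_hasher& operator=(Block b) {
        boost::hash_combine(*seed, b); // fold this block into the hash
        return *this;
    }
};

inline std::size_t hash_bits(const boost::dynamic_bitset<>& bs) {
    std::size_t seed = 0;
    boost::hash_combine(seed, bs.size()); // distinguish sets differing only in trailing zero bits
    boost::to_block_range(bs, block_hasher(seed));
    return seed;
}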
Unfortunately, the design of dynamic_bitset is IMHO, braindead because it does not give you direct access to the underlying buffer (not even as const).
It is a feature request.
One could implement a not-so-efficient unique hash by converting the bitset to a vector temporary:
namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        std::vector<B, A> v;
        boost::to_block_range(bs, std::back_inserter(v));
        return boost::hash_value(v);
    }
}
We can't directly calculate the hash because the underlying data in dynamic_bitset is private (m_bits)
But we can easily finesse past (subvert!) the c++ access specification system without either
hacking at the code or
pretending your compiler is non-conforming (BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS)
The key is the template function to_block_range which is a friend to dynamic_bitset. Specialisations of this function, therefore, also have access to its private data (i.e. m_bits).
The resulting code couldn't be simpler
namespace boost {
    // specialise to_block_range for a size_t& "iterator" that receives the hash of the underlying data
    template <>
    inline void
    to_block_range(const dynamic_bitset<>& b, size_t& hash_result)
    {
        hash_result = boost::hash_value(b.m_bits);
    }

    inline std::size_t hash_value(const boost::dynamic_bitset<>& bs)
    {
        size_t hash_result;
        to_block_range(bs, hash_result);
        return hash_result;
    }
}
The proposed solution generates the same hash in the following situation:
#define BOOST_DYNAMIC_BITSET_DONT_USE_FRIENDS

namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        return boost::hash_value(bs.m_bits);
    }
}

boost::dynamic_bitset<> test(1, false);
auto hash1 = boost::hash_value(test);
test.push_back(false);
auto hash2 = boost::hash_value(test);
// ... and so on ...
test.push_back(false);
auto hash31 = boost::hash_value(test);
// magically, hash1 through hash31 are all the same!
So the proposed solution is sometimes improper for a hash map.
I read the source code of dynamic_bitset to see why this happens and realized that dynamic_bitset packs one bit per value, the same as vector<bool>. For example, if you call dynamic_bitset<> test(1, false), then dynamic_bitset initially allocates 4 bytes, all zero, and it separately holds the size in bits (in this case, the size is 1). Note that if the size in bits becomes greater than 32, it allocates another 4 bytes and pushes them back into dynamic_bitset<>::m_bits (so m_bits is a vector of 4-byte blocks).
If I call test.push_back(x), it sets the second bit to x and increases the size in bits to 2. If x is false, then m_bits[0] does not change at all! In order to compute the hash correctly, we need to take m_num_bits into the hash computation.
Then, the question is how?
1: Use boost::hash_combine
This approach is simple and straightforward. I have not checked whether this compiles.
namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        size_t tmp = 0;
        boost::hash_combine(tmp, bs.m_num_bits);
        boost::hash_combine(tmp, bs.m_bits);
        return tmp;
    }
}
2: Flip the (m_num_bits % bits_per_block)-th bit.
Flip a bit of the hash based on the bit size. I believe this approach is faster than 1.
namespace boost {
    template <typename B, typename A>
    std::size_t hash_value(const boost::dynamic_bitset<B, A>& bs) {
        // you may need a more sophisticated bit-shift approach.
        auto bit = 1u << (bs.m_num_bits % bs.bits_per_block);
        auto return_val = boost::hash_value(bs.m_bits);
        // sorry, this was wrong:
        // return (return_val & bit) ? return_val | bit : return_val & (~bit);
        return (return_val & bit) ? return_val & (~bit) : return_val | bit;
    }
}