How to query if any bit in a range is set in a C++ std::bitset? - c++

I am looking for a C++ bitset implementation that can answer whether any bit in a range is set. std::bitset, std::vector<bool>, and boost::dynamic_bitset all give access to individual bits that I can loop over, but that isn't the most efficient way to ask whether any bit in a range is set; I don't even need to know which one.
bitset b;
if(b.any(33, 199)) // desired (hypothetical) API: is any bit in [33, 199] set?
{
// ...
}
Is there a library that provides this? I would like to run some benchmarks against other implementations (including one I may have to write), but I can't find any that appear to implement this functionality.

Unfortunately, C++11's std::bitset has no member function that sets a range of bits to a given value just from the boundaries of the range; without one, you have to iterate over individual bits or build a mask yourself with shifts. There is also no direct way to check whether all bits inside a range have the same value (all 1s or all 0s).
There is an open-source project in Git that provides an alternative bitset implementation (RangedBitset) supporting these operations. Internally it uses an array of uint64_t words of any size, but it can handle ranges specified with single-bit precision. With it you can do things like:
a.set(4, 8, true); // set the half-open range [4, 8) to true
bool is_all_range = a.check(2, 6, true); // check whether the whole range is set to 1
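For comparison, the same two operations can be emulated on a plain std::bitset by building a mask with two shifts, which works a word at a time rather than bit by bit. This is a minimal sketch: the helper names range_mask, set_range and all_in_range are hypothetical, ranges are half-open [first, last), and first <= last <= N is assumed rather than checked.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: mask with ones in the half-open range [first, last).
// Assumes first <= last <= N.
template <std::size_t N>
std::bitset<N> range_mask(std::size_t first, std::size_t last) {
    std::bitset<N> m;
    m.set();                    // start with all ones
    m <<= N - (last - first);   // keep (last - first) ones at the top
    m >>= N - last;             // slide them down so they start at `first`
    return m;
}

// Set every bit in [first, last) to `value`.
template <std::size_t N>
void set_range(std::bitset<N>& b, std::size_t first, std::size_t last, bool value) {
    const std::bitset<N> m = range_mask<N>(first, last);
    if (value) b |= m;
    else       b &= ~m;
}

// True if every bit in [first, last) equals `value`.
template <std::size_t N>
bool all_in_range(const std::bitset<N>& b, std::size_t first, std::size_t last, bool value) {
    const std::bitset<N> m = range_mask<N>(first, last);
    return value ? (b & m) == m : (b & m).none();
}
```

Each call builds temporary bitsets, so the cost is O(N / bits-per-word) rather than O(range length); a dedicated class like RangedBitset can avoid touching words outside the range.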

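If you are using GCC's libstdc++, you can check whether any bit in the range [x, y] is set with the non-standard extension bs._Find_next(x-1), which returns the position of the next set bit after x-1 (or the bitset's size if there is none).
Then check whether the returned value is <= y. Passing x-1 relies on unsigned wraparound when x is 0 and mixes signed and unsigned types, so the version below tests bit x directly and searches after x instead:
#include <bitset>
#include <cstddef>
// Non-standard: _Find_next is a GCC/libstdc++ extension.
template <std::size_t M>
bool has_any_set_bit_in_range(const std::bitset<M>& bs, std::size_t x, std::size_t y) {
    // _Find_next(x) returns the first set bit after x, or M if there is none.
    return bs.test(x) || bs._Find_next(x) <= y;
}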

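std::bitset does provide an any() method, but it checks the whole bitset rather than a range, so you first have to shift or mask away the bits outside the range. If the bitset fits in an unsigned long, you can instead call b.to_ulong() on the shifted or masked value and check for non-zero (to_ulong() throws std::overflow_error if the value doesn't fit).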
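The any() idea can be combined with shifting to answer the original range question without looping over individual bits. A minimal sketch, assuming a fixed-size std::bitset and an inclusive range like the b.any(33, 199) call in the question; any_in_range is a hypothetical name, not part of the standard library:

```cpp
#include <bitset>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: is any bit in the inclusive range [first, last] set?
template <std::size_t N>
bool any_in_range(const std::bitset<N>& b, std::size_t first, std::size_t last) {
    if (first > last || last >= N) throw std::out_of_range("any_in_range");
    const std::size_t up = N - 1 - last;
    std::bitset<N> t = b << up;  // bit `last` moves to N-1; bits above `last` fall off
    t >>= up + first;            // bit `first` moves to 0; bits below `first` fall off
    return t.any();
}
```

The shifts and any() operate on whole words, so this is O(N / bits-per-word), at the price of one temporary copy of the bitset.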

Related

SSE optimisation for a loop that finds zeros in an array and toggles a flag + updates another array

A piece of C++ code finds the occurrences of zero in a one-dimensional array and keeps a binary flag variable: each time a zero is encountered, the current flag value is stored and the flag toggles between 0 and 1.
I am attempting to use SSE to speed it up, but I am unsure how to go about it. I've read that evaluating the individual fields of an __m128i is inefficient.
The code in C++ is:
int flag = 0;
int var_num2[1000];
for(int i = 0; i<1000; i++)
{
if (var[i] == 0)
{
var_num2[i] = flag;
flag = !flag; //toggle value upon encountering a 0
}
}
How should I go about this using SSE intrinsics?
It takes some recognizing, but this is a variation of a well-known problem (prefix sum). I'll first give a theoretical description:
Introduce a temporary array not_var[] that contains 1 where var[] contains 0, and 0 otherwise.
Introduce a temporary array not_var_sum[] that holds the exclusive partial sum of not_var[], i.e. not_var_sum[i] counts the zeros in var[0..i-1].
Wherever var[i] == 0, var_num2[i] is the LSB of not_var_sum[i].
The first and third operations are trivially parallelizable. Parallelizing a partial sum is only a bit harder.
In a practical implementation, you wouldn't construct not_var[], and you'd write the LSB directly to var_num2 in all iterations of step 2. This is valid because you can discard the higher bits: keeping just the LSB is equivalent to taking the result modulo 2, and (a+b)%2 == ((a%2) + (b%2))%2.
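Before vectorizing, the reformulation can be checked with plain scalar code. This sketch (the function names are hypothetical, and var[] is assumed to hold int) implements the original loop and the three-step version side by side; not_var_sum here is the exclusive prefix sum, i.e. the number of zeros strictly before i, which is the flag value the original loop stores.

```cpp
#include <cstddef>
#include <vector>

// The loop from the question, for reference.
void toggle_loop(const int* var, int* var_num2, std::size_t n) {
    int flag = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (var[i] == 0) { var_num2[i] = flag; flag = !flag; }
}

// Same stores via the three steps: not_var, exclusive prefix sum, LSB.
void toggle_prefix_sum(const int* var, int* var_num2, std::size_t n) {
    std::vector<int> not_var(n), not_var_sum(n);
    for (std::size_t i = 0; i < n; ++i)
        not_var[i] = (var[i] == 0);                  // step 1
    int running = 0;
    for (std::size_t i = 0; i < n; ++i) {            // step 2: exclusive prefix sum
        not_var_sum[i] = running;
        running += not_var[i];
    }
    for (std::size_t i = 0; i < n; ++i)              // step 3: LSB, only where var[i] == 0
        if (var[i] == 0) var_num2[i] = not_var_sum[i] & 1;
}
```

Replacing running += not_var[i] with running ^= not_var[i] keeps only the LSB, which is the modulo-2 observation above.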
What type are the elements of var[]? int? Or char? Are zeroes frequent?
A SIMD prefix sum (aka partial sum) is possible (with log2(vector_width) work per element, e.g. 2 shuffles and 2 adds for a vector of 4 floats), but the conditional store based on the result is the other major problem. (Your array of 1000 elements is probably too small for multi-threading to be profitable.)
An integer prefix sum is easier to do efficiently, and the lower latency of integer ops helps. Addition mod 2 is just adding without carry, i.e. XOR, so use _mm_xor_si128 instead of _mm_add_ps. You'd be using this on the integer all-zero/all-one compare result vector from _mm_cmpeq_epi32 (or epi8 or whatever, depending on the element size of var[]; you didn't specify, but different strategies are probably optimal for different sizes).
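A sketch of the integer prefix-XOR idea with SSE2, assuming 32-bit int elements in var[] and n a multiple of 4 (both are assumptions; the question doesn't say). It computes, for every element, the flag value the original loop would hold just before reaching it; the conditional store is still left to separate code. The function names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2
#include <cstddef>

// Inclusive prefix XOR across the four 32-bit lanes of x.
static inline __m128i prefix_xor_epi32(__m128i x) {
    x = _mm_xor_si128(x, _mm_slli_si128(x, 4));  // lane i ^= lane i-1
    x = _mm_xor_si128(x, _mm_slli_si128(x, 8));  // lane i ^= lanes i-2, i-3
    return x;
}

// flag_before[i] = parity (0/1) of the number of zeros in var[0..i-1].
void zero_parity_sse2(const int* var, int* flag_before, std::size_t n) {
    const __m128i zero = _mm_setzero_si128();
    __m128i carry = _mm_setzero_si128();  // parity of all earlier vectors, broadcast (0 or -1)
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(var + i));
        __m128i is_zero = _mm_cmpeq_epi32(v, zero);                    // -1 where var == 0
        __m128i incl = _mm_xor_si128(prefix_xor_epi32(is_zero), carry); // parity up to and including i
        __m128i excl = _mm_xor_si128(incl, is_zero);                   // parity before i
        _mm_storeu_si128(reinterpret_cast<__m128i*>(flag_before + i),
                         _mm_and_si128(excl, _mm_set1_epi32(1)));      // -1 -> 1
        carry = _mm_shuffle_epi32(incl, _MM_SHUFFLE(3, 3, 3, 3));      // broadcast lane 3
    }
}
```

Two shift/XOR steps cover four lanes (log2 of the vector width), and broadcasting lane 3 carries the running parity into the next vector; that carry is the serial dependency between iterations.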
But, just having a SIMD prefix sum actually barely helps: you'd still have to loop through and figure out where to store and where to leave unmodified.
I think your best bet is to generate a list of indices where you need to store, and then
// assumes scatter_count is even (handle a final odd index separately) and the flag starts at 0
for (size_t j = 0 ; j < scatter_count ; j+=2) {
var_num2[ scatter_element[j+0] ] = 0;
var_num2[ scatter_element[j+1] ] = 1;
}
You could generate the whole list of indices up front, or you could work in small batches to overlap the search work with the store work.
The prefix-sum part of the problem is handled by alternately storing 0 and 1 in an unrolled loop. The real trick is avoiding branch mispredicts, and generating the indices efficiently.
To generate scatter_element[], you've transformed the problem into left-packing (filtering) an (implicit) array of indices based on the corresponding _mm_cmpeq_epi32( var[i..i+3], _mm_setzero_si128() ). To generate the indices you're filtering, start with a vector of [0,1,2,3] and add [4,4,4,4] to it (_mm_add_epi32) on each iteration. I'm assuming the element size of var[] is 32 bits. If you have smaller elements, this requires unpacking.
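The index-generation and store steps can be sketched as follows. This is a simplification rather than the lookup-table left-pack the answer has in mind: it extracts a 4-bit mask with _mm_movemask_ps and walks its set bits with the GCC/Clang builtin __builtin_ctz. It assumes 32-bit int elements, n a multiple of 4, and a flag that starts at 0; zero_indices and scatter_flags are hypothetical names.

```cpp
#include <emmintrin.h>  // SSE2 (also brings in SSE's _mm_movemask_ps)
#include <cstddef>
#include <vector>

// Collect indices i where var[i] == 0.
std::vector<int> zero_indices(const int* var, std::size_t n) {
    std::vector<int> idx;
    const __m128i zero = _mm_setzero_si128();
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(var + i));
        __m128i eq = _mm_cmpeq_epi32(v, zero);
        unsigned mask = static_cast<unsigned>(_mm_movemask_ps(_mm_castsi128_ps(eq)));
        while (mask) {
            unsigned lane = static_cast<unsigned>(__builtin_ctz(mask));  // GCC/Clang builtin
            idx.push_back(static_cast<int>(i + lane));
            mask &= mask - 1;  // clear the lowest set bit
        }
    }
    return idx;
}

// Store alternately 0 and 1 at the collected indices.
void scatter_flags(const std::vector<int>& idx, int* var_num2) {
    for (std::size_t j = 0; j < idx.size(); ++j)
        var_num2[idx[j]] = static_cast<int>(j & 1);
}
```

The while (mask) loop branches once per zero, which is the unpredictable branching the answer warns about when zeros are frequent; a shuffle-based left-pack avoids that.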
BTW, AVX512 has scatter instructions that you could use here; otherwise, doing the store part with scalar code is your best bet. (But beware of the effect described in "Unexpectedly poor and weirdly bimodal performance for store loop on Intel Skylake" when just storing without loading.)
To overlap the left-packing with the storing, I think you want to left-pack until you have maybe 64 indices in a buffer. Then leave that loop and run another loop that left-packs indices and consumes indices, only stopping if your circular buffer is full (then just store) or empty (then just left-pack). This lets you overlap the vector compare / lookup-table work with the scatter-store work, but without too much unpredictable branching.
If zeros are very frequent, var_num2[] elements are 32 or 64 bits, and you have AVX or AVX2 available, you could consider doing a standard prefix sum and using AVX masked stores, e.g. vpmaskmovd. Don't use SSE maskmovdqu, though: it has an NT (non-temporal) hint, so it bypasses and evicts data from cache, and it is quite slow.
Also, because your prefix sum is mod 2, i.e. boolean, you could use a lookup table based on the packed-compare result mask. Instead of horizontal ops with shuffles, use the 4-bit movmskps result of a compare + a 5th bit for the initial state as an index to a lookup table of 32 vectors (assuming 32-bit element size for var[]).
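A sketch of that lookup-table idea, assuming 32-bit int elements and n a multiple of 4. The table layout (row index = state << 4 | movmskps mask, each row holding the flag value before each lane) and all names are hypothetical choices for illustration:

```cpp
#include <emmintrin.h>
#include <cstddef>
#include <cstdint>

// Row (state << 4) | mask holds, for each of the 4 lanes, the flag value
// just before that lane (0 or 1).
struct ParityLut {
    alignas(16) std::int32_t row[32][4];
    ParityLut() {
        for (int idx = 0; idx < 32; ++idx) {
            int flag = idx >> 4;                    // bit 4: incoming state
            for (int lane = 0; lane < 4; ++lane) {
                row[idx][lane] = flag;
                if (idx & (1 << lane)) flag ^= 1;   // a zero in this lane toggles the flag
            }
        }
    }
};

// flag_before[i] = flag value before element i.
void zero_parity_lut(const int* var, int* flag_before, std::size_t n) {
    static const ParityLut lut;
    const __m128i zero = _mm_setzero_si128();
    int state = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(var + i));
        int mask = _mm_movemask_ps(_mm_castsi128_ps(_mm_cmpeq_epi32(v, zero)));
        int idx = (state << 4) | mask;
        __m128i flags = _mm_load_si128(reinterpret_cast<const __m128i*>(lut.row[idx]));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(flag_before + i), flags);
        state = lut.row[idx][3] ^ ((mask >> 3) & 1);  // flag after lane 3
    }
}
```

The 32-row table is 512 bytes, so it stays in L1 cache, and the only loop-carried dependency is the single state bit.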

C++: I need some guidance in how to create dynamic sized bitmaps

I'm trying to create a simple DBMS, and although I've read a lot about it and have already designed the system, I have some issues with the implementation.
I need to know what's the best method in C++ to use a series of bits whose length will be dynamic. This series of bits will be saved in order to figure out which pages in the files are free and not free. For a single file the number of pages used will be fixed, so I can probably use a bitset for that. However the number of records per page AND file will not be fixed. So I don't think bitset would be the best way to do this.
I thought I might just use a sequence of characters: since each character is 1 byte = 8 bits, an array of them could hold the bitmap that I want.
I never had to manipulate bits at such a low level, so I don't really know if there is some other better method to do this, or even if this method would work at all.
thanks in advance
If you are just wanting the basics on the bit twiddling, the following is one way of doing it using an array of characters.
Assume you have an array for the bits (its length needs to be (totalitems + 7) / 8 bytes, rounding up so that a final partial byte is included):
unsigned char *bits; // this of course needs to be allocated somewhere
You can compute the index into the array and the specific bit within that position as follows:
// compute array position
int pos = item / 8; // 8 bits per byte
// compute the bit within the byte. Could use "item & 7" for the same
// result, however modern compilers will typically already make
// that optimization.
int bit = item % 8;
And then you can check if a bit is set with the following (assumes zero-based indexing):
if ( bits[pos] & ( 1 << bit ))
return 1; // it is set
else
return 0; // it is not set
The following will set a specific bit:
bits[pos] |= ( 1 << bit );
And the following can be used to clear a specific bit:
bits[pos] &= ~( 1 << bit );
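Wrapping those three operations in a small class with std::vector<unsigned char> storage gives the runtime-sized bitmap the question asks for, since the vector can grow. A minimal sketch; the class and member names are hypothetical, there is no bounds checking, and std::vector<bool> or boost::dynamic_bitset are ready-made alternatives:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical growable bitmap built on the byte-array technique above.
class Bitmap {
public:
    explicit Bitmap(std::size_t nbits = 0) : nbits_(nbits), bytes_((nbits + 7) / 8, 0) {}

    std::size_t size() const { return nbits_; }

    // Grow or shrink; newly exposed bits start cleared.
    void resize(std::size_t nbits) {
        nbits_ = nbits;
        bytes_.resize((nbits + 7) / 8, 0);
        if (nbits % 8 != 0)  // clear stale bits past the end in the last byte
            bytes_.back() &= static_cast<unsigned char>((1u << (nbits % 8)) - 1);
    }

    bool test(std::size_t i) const { return (bytes_[i / 8] >> (i % 8)) & 1u; }
    void set(std::size_t i)   { bytes_[i / 8] |= static_cast<unsigned char>(1u << (i % 8)); }
    void clear(std::size_t i) { bytes_[i / 8] &= static_cast<unsigned char>(~(1u << (i % 8))); }

private:
    std::size_t nbits_;
    std::vector<unsigned char> bytes_;
};
```

Clearing the bits past the end on shrink keeps a later grow from resurrecting old values.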
I would implement a wrapper class and simply store your bitmap in a linked list of chunks, where each chunk holds a fixed-size array (I would use a <cstdint> type like uint32_t to guarantee a given number of bits). Then you simply add links to your list to expand it. I'll leave contracting as an exercise for the reader.

Keeping track of boolean data

I need to keep track of n samples. The information I am keeping track of is boolean, i.e. something is true or false. As soon as I am on sample n+1, I basically want to ignore the oldest sample and record information about the newest one.
So say I keep track of samples, I may have something like
OLDEST 0 0 1 1 0 NEWEST
If the next sample is 1, this will become
OLDEST 0 1 1 0 1 NEWEST
if the next one is 0, this will become...
OLDEST 1 1 0 1 0 NEWEST
So what is the best way to implement this in terms of simplicity and memory?
Some ideas I had:
Vector of bool (this would require shifting elements so seems expensive)
Storing it as bits and using bit shifting (cheap memory-wise? but is there a limit on the number of samples?)
Linked lists? (might be an overkill for the task)
Thanks for the ideas and suggestions :)
You want a set of bits. Maybe you can look into a std::bitset
http://www.sgi.com/tech/stl/bitset.html
Very straightforward to use, with optimal memory consumption and probably the best performance.
The only limitation is that you need to know the value of n at compile time. If you want to set it at runtime, have a look at boost http://www.boost.org/doc/libs/1_36_0/libs/dynamic_bitset/dynamic_bitset.html
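With std::bitset the sliding window from the question is a shift followed by writing the newest bit, as long as n is a compile-time constant. A minimal sketch (the SampleWindow name and its members are hypothetical), keeping the newest sample in bit 0 and the oldest in bit N-1:

```cpp
#include <bitset>
#include <cstddef>

template <std::size_t N>
class SampleWindow {
public:
    // Drop the oldest sample (bit N-1) and record the newest in bit 0.
    void push(bool sample) {
        bits_ <<= 1;
        bits_[0] = sample;
    }
    // age 0 is the newest sample, age N-1 the oldest.
    bool at_age(std::size_t age) const { return bits_[age]; }
    std::size_t count_true() const { return bits_.count(); }

private:
    std::bitset<N> bits_;
};
```

Feeding it 0 0 1 1 0 and then 1 gives OLDEST 0 1 1 0 1 NEWEST, as in the question; the shift costs O(N / bits-per-word) per sample.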
Sounds like a perfect use of a ring buffer. Unfortunately there isn't one in the standard library, but you could use boost.
Alternately roll your own using a fixed-length std::list and splice the head node to the tail when you need to overwrite an old element.
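The std::list variant can be sketched like this: splice(end(), list, begin()) relinks the oldest node to the back without allocating, and then it is overwritten with the new sample. A minimal sketch assuming the list was created with n elements; push_sample is a hypothetical name.

```cpp
#include <list>

// Overwrite the oldest sample (front) with the newest one and move it to the back.
// Assumes `window` was created with a fixed, non-zero number of elements.
void push_sample(std::list<bool>& window, bool sample) {
    window.splice(window.end(), window, window.begin());  // relink front node to the back
    window.back() = sample;
}
```

Each node carries two pointers plus allocation overhead, so for bool samples this uses far more memory than a bitset; its advantage is only that nothing is shifted or reallocated.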
It really depends on how many samples you want to keep.
vector<bool> could be a valid option; I would expect an erase() on the first element to be reasonably efficient. Otherwise, there's deque<bool>. If you know how many elements you want to keep at compile time, bitset<N> is probably better than either.
In any case, you'll have to wrap the standard container in some additional logic; none have the actual logic you need (that of a ring buffer).
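A sketch of that extra ring-buffer logic around std::vector<bool>, for when n is only known at runtime. Nothing is shifted: a write index wraps around, and once the buffer is full the oldest sample sits at that index. The BoolRing name and its members are hypothetical.

```cpp
#include <cstddef>
#include <vector>

class BoolRing {
public:
    explicit BoolRing(std::size_t n) : buf_(n), next_(0), size_(0) {}  // assumes n > 0

    void push(bool sample) {
        buf_[next_] = sample;  // overwrites the oldest sample once full
        next_ = (next_ + 1) % buf_.size();
        if (size_ < buf_.size()) ++size_;
    }

    std::size_t size() const { return size_; }

    // i = 0 is the oldest stored sample, size() - 1 the newest.
    bool operator[](std::size_t i) const {
        std::size_t oldest = (next_ + buf_.size() - size_) % buf_.size();
        return buf_[(oldest + i) % buf_.size()];
    }

private:
    std::vector<bool> buf_;
    std::size_t next_;
    std::size_t size_;
};
```

Tracking size_ separately also covers reading the window before n samples have been stored.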
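If you only need 8 bits, use an unsigned char, do logical shifts (<<, >>), and apply a mask to look at the bit you need. For more samples, use a fixed-width unsigned type (the widths of short, int and long vary by platform):
8 bits - uint8_t (unsigned char)
16 bits - uint16_t
32 bits - uint32_t
64 bits - uint64_t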
Example:
Oldest 00110010 Newest -> Oldest 01100101 Newest
Done by:
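#include <cstdio>
unsigned char c = 0x32; // 50 decimal, or 00110010 in binary
c <<= 1;                // logical shift left once; the oldest bit falls off the top
c |= 1;                 // record the new sample (1) in the LSB, which holds the newest
// Now look at the 3rd newest bit (bit 2)
printf("The 3rd newest bit is: %d\n", (c >> 2) & 1);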
Simple and EXTREMELY cheap on resources. Will be VERY VERY high performance.
From your question, it's not clear what you intend to do with the samples. If all you care about is storing the N most recent samples, you could try the following. I'll do it for "chars" and let you figure out how to optimize for "bool" should you need that.
const int N = 5; // number of samples to keep (example value)
char buffer[N];
int samples = 0;
void record_sample( char value )
{
buffer[samples%N] = value;
samples = samples + 1;
}
Once you've stored N samples (once you've called record_sample N times) you can read the oldest and newest samples like so:
char oldest_sample()
{
return buffer[samples%N];
}
char newest_sample()
{
return buffer[(samples+N-1)%N];
}
Things get a little trickier if you intend to read the oldest sample before you've stored N samples, but not that much trickier. For that, you want a "ring buffer", which you can find in boost and on Wikipedia.

Iterating through a boost::dynamic_bitset

I have a boost dynamic_bitset that I am trying to extract the set bits from:
boost::dynamic_bitset<unsigned long> myBitset(1000);
My first thought was to do a simple 'dumb' loop through each index and ask if it was set:
for(size_t index = 0 ; index < 1000 ; ++index)
{
if(myBitset.test(index))
{
/* do something */
}
}
But then I saw two interesting methods, find_first() and find_next() that I thought for sure were meant for this purpose:
boost::dynamic_bitset<unsigned long>::size_type index = myBitset.find_first();
while(index != boost::dynamic_bitset<unsigned long>::npos)
{
/* do something */
index = myBitset.find_next(index);
}
I ran some tests, and the second method seems more efficient, but I'm concerned that there might be another, 'more correct' way to perform this iteration. I wasn't able to find any examples or notes in the documentation indicating the correct way to iterate over the set bits.
So, is using find_first() and find_next() the best way to iterate over a dynamic_bitset, or is there another way?
find_first and find_next are the fastest way. The reason is that these can skip over an entire block (of dynamic_bitset::bits_per_block bits, probably 32 or 64) if none of them are set.
Note that dynamic_bitset does not have iterators, so it will behave a bit un-C++'ish no matter what.
Depends on your definition of more correct. A correct method probably must yield correct results on all valid inputs and be fast enough.
find_first and find_next are there so that they can be optimized to scan entire blocks of bits in one comparison. If a block is, say, an unsigned long of 64 bits, one block comparison analyses 64 bits at once, where a straightforward loop like you posted would do 64 iterations for that.
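To illustrate why block-wise search wins, here is a minimal sketch of the idea over a plain array of 64-bit blocks. It is not Boost's implementation: find_set_from and kNotFound are hypothetical names, it finds the first set bit at or after pos (Boost's find_next looks strictly after pos), and it uses the GCC/Clang builtin __builtin_ctzll (C++20's std::countr_zero would also work).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const std::size_t kNotFound = static_cast<std::size_t>(-1);

// First set bit at index >= pos, or kNotFound.
std::size_t find_set_from(const std::vector<std::uint64_t>& blocks, std::size_t pos) {
    std::size_t b = pos / 64;
    if (b >= blocks.size()) return kNotFound;
    std::uint64_t w = blocks[b] & (~std::uint64_t{0} << (pos % 64));  // ignore bits below pos
    while (w == 0) {
        if (++b == blocks.size()) return kNotFound;
        w = blocks[b];  // an all-zero block is skipped with one comparison
    }
    return b * 64 + static_cast<std::size_t>(__builtin_ctzll(w));
}
```

Iterating is then for (i = find_set_from(v, 0); i != kNotFound; i = find_set_from(v, i + 1)), which mirrors the find_first()/find_next() loop from the question.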

How do you use bitwise flags in C++?

As per this website, I wish to represent a Maze with a 2 dimensional array of 16 bit integers.
Each 16 bit integer needs to hold the following information:
Here's one way to do it (this is by no means the only way): a 12x16 maze grid can be represented as an array m[16][12] of 16-bit integers. Each array element would contain all the information for a single corresponding cell in the grid, with the integer bits mapped like this:
(source: mazeworks.com)
To knock down a wall, set a border, or create a particular path, all we need to do is flip bits in one or two array elements.
How do I use bitwise flags on 16-bit integers so I can set each of those bits and check whether they are set?
I'd like to do it in an easily readable way (i.e., Border.W, Border.E, Walls.N, etc.).
How is this generally done in C++? Do I use hexadecimal to represent each one (e.g., Walls.N = 0x02, Walls.E = 0x04, etc.)? Should I use an enum?
See also How do you set, clear, and toggle a single bit?.
If you want to use bitfields then this is an easy way:
struct MAZENODE
{
bool backtrack_north:1;
bool backtrack_south:1;
bool backtrack_east:1;
bool backtrack_west:1;
bool solution_north:1;
bool solution_south:1;
bool solution_east:1;
bool solution_west:1;
bool maze_north:1;
bool maze_south:1;
bool maze_east:1;
bool maze_west:1;
bool walls_north:1;
bool walls_south:1;
bool walls_east:1;
bool walls_west:1;
};
Then your code can just test each one for true or false.
Use std::bitset
Use hex constants/enums and bitwise operations if you care about which particular bits mean what.
Otherwise, use C++ bitfields (but be aware that the ordering of bits in the integer will be compiler-dependent).
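A sketch of the hex-constants approach with names close to the Walls.N / Border.W style the question asks for. The bit assignments below are hypothetical (the mazeworks layout isn't reproduced here), and the helper names are made up for illustration; unnamed enums with a fixed std::uint16_t underlying type sit inside namespaces so that Walls::N and Border::N don't collide.

```cpp
#include <cstdint>

// Hypothetical layout: low nibble = walls, next nibble = borders.
namespace Walls  { enum : std::uint16_t { N = 0x0001, S = 0x0002, E = 0x0004, W = 0x0008 }; }
namespace Border { enum : std::uint16_t { N = 0x0010, S = 0x0020, E = 0x0040, W = 0x0080 }; }

inline bool has_flag(std::uint16_t cell, std::uint16_t flag) { return (cell & flag) != 0; }
inline void set_flag(std::uint16_t& cell, std::uint16_t flag) { cell |= flag; }
inline void clear_flag(std::uint16_t& cell, std::uint16_t flag) {
    cell &= static_cast<std::uint16_t>(~flag);  // AND with the complement clears the bit
}
inline void toggle_flag(std::uint16_t& cell, std::uint16_t flag) { cell ^= flag; }
```

Usage on the question's grid looks like std::uint16_t m[16][12] = {}; set_flag(m[0][0], Walls::N); and the bit positions stay explicit, unlike bitfields, whose ordering is compiler-dependent.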
Learn your bitwise operators: &, |, ^, and ~.
At the top of a lot of C/C++ files I have seen flags defined in hex to mask each bit.
#define ONE 0x0001
To see if a bit is turned on, AND the value with the mask (which has a 1 in that bit's position). To turn it on, OR it with the mask. To toggle it like a switch, XOR it with the mask.
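To manipulate sets of bits, you can also use std::bitset<N>:
#include <bitset>
#include <cassert>
int main() {
    std::bitset<4*4> bits;  // 16 bits, all initially 0
    bits[ 10 ] = false;
    bits.set(10);           // bit 10 is now 1
    bits.flip();            // invert every bit, so bit 10 is 0 again
    assert( !bits.test(10) );
}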
You can do it with hexadecimal flags or enums as you suggested, but the most readable/self-documenting is probably to use what are called "bitfields" (for details, Google for C++ bitfields).
Yes, a good way is to use hexadecimal to represent the bit patterns. Then you use the bitwise operators to manipulate your 16-bit ints.
For example:
if(x & 0x01){} // tests if bit 0 is set using bitwise AND
x ^= 0x02; // toggles bit 1 (0 based) using bitwise XOR
x |= 0x10; // sets bit 4 (0 based) using bitwise OR
I'm not a huge fan of bitset. In my opinion it's just more typing, and it doesn't hide what you are doing anyway: you still have to & and | bits, unless you are picking on just one bit. That may work for small groups of flags. Not that we need to hide what we are doing either, but the intention of a class is usually to make something easier for its users, and I don't think this class accomplishes that.
Say, for instance, you have a flag system with 64 flags. If you want to test, I don't know, 39 of them in one if statement to see if they are all on, using bitfields is a huge pain: you have to type them all out. Of course, I'm assuming you use only bitfield functionality and don't mix and match methods. Same thing with bitset. Unless I am missing something in the class (quite possible, since I rarely use it), I don't see a way to test all 39 flags unless you type out the whole thing or resort to "standard methods" (using enum flag lists or some defined value for the 39 bits and the bitset's & operator). This can start to get messy depending on your approach. And I know 64 flags sounds like a lot. And well, it is, depending on what you are doing. Personally speaking, most of the projects I'm involved with depend on flag systems, so 64 is actually not that unheard of, though 16 to 32 is far more common in my experience. I'm actually helping out on a project right now where one flag system has 640 bits. It's basically a privilege system, so it makes some sense to arrange them all together. Admittedly, I would like to break that up a bit, but I'm helping, not creating.
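For the 39-flags case, one way to avoid typing every flag into the if statement is to build the mask once and compare against it. A minimal sketch; make_mask and all_set are hypothetical helpers, and the same code works unchanged for a std::bitset<640>:

```cpp
#include <bitset>
#include <cstddef>
#include <initializer_list>

using Flags = std::bitset<64>;

// Build a mask from a list of flag positions once, then reuse it.
Flags make_mask(std::initializer_list<std::size_t> positions) {
    Flags m;
    for (std::size_t p : positions) m.set(p);
    return m;
}

// True if every flag in `required` is also set in `flags`.
bool all_set(const Flags& flags, const Flags& required) {
    return (flags & required) == required;
}
```

The mask can be a named constant (for example one per privilege level), so the if statement becomes a single all_set call no matter how many flags it covers.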