I am trying to write a function that counts some bit flags while avoiding the use of branching or conditionals:
uint8_t count_descriptors(uint8_t n)
{
return
((n & 2) && !(n & 1)) +
((n & 4) && !(n & 1)) +
((n & 8) && !(n & 1)) +
((n & 16) && !(n & 1)) +
((n & 32) && 1 ) +
((n & 64) || (n & 128)) ;
}
Bit 0 is not counted directly, but bits 1-4 are only counted if bit 0 is not set, bit 5 is counted unconditionally, and bits 6-7 can only be counted once between them.
However, I understand that the boolean && and || use short-circuit evaluation. This means that their use creates a conditional branch, as you would see in such examples: if( ptr != nullptr && ptr->predicate()) that guarantees code in the second sub-expression is not executed if the result is short-circuit evaluated from the first sub-expression.
The first part of the question: do I need to do anything? Since these are purely arithmetic operations with no side-effects, will the compiler create conditional branches?
Second part: I understand that bitwise boolean operators do not short-circuit evaluate; the only problem is that the bits do not line up. The result of masking the nth bit is either 2^n or zero.
What is the best way to make an expression such as (n & 16) evaluate to 1 or 0?
I assume with "bit 6-7 can only counted once" you mean only one of them is being counted
In this case something like this should work
uint8_t count_descriptors(uint8_t n)
{
uint8_t retVar;
retVar = !(n&1)*((n&2) >> 1) +
!(n&1)*((n&4) >> 2) +
!(n&1)*((n&8) >> 3) +
!(n&1)*((n&16) >> 4) +
((n&32) >> 5) +
(int)((((n&64) >> 6) + ((n&128) >> 7)) >= 1);
return retVar;
}
What is the best way to make an expression such as (n & 16) evaluate
to 1 or 0?
By shifting it right the required number of bits: either (n>>4)&1 or (n&16)>>4.
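As a minimal sketch of how the shift-and-mask idea could be applied to the whole function: the names count_descriptors_ref, count_descriptors_shift and check_count_descriptors are made up for this illustration, and the reference version is copied from the question so the two can be compared over every 8-bit input. Whether a given compiler emits branches for either version is not something this code can show; only the generated assembly can.

```cpp
#include <cassert>
#include <cstdint>

// Reference version from the question, using && and ||.
inline std::uint8_t count_descriptors_ref(std::uint8_t n)
{
    return ((n & 2) && !(n & 1)) +
           ((n & 4) && !(n & 1)) +
           ((n & 8) && !(n & 1)) +
           ((n & 16) && !(n & 1)) +
           ((n & 32) && 1) +
           ((n & 64) || (n & 128));
}

// Shift-and-mask version (hypothetical name): every term is already 0 or 1,
// so only bitwise operators and arithmetic are needed.
inline std::uint8_t count_descriptors_shift(std::uint8_t n)
{
    const unsigned bit0_clear = (n & 1u) ^ 1u;  // 1 when bit 0 is 0
    const unsigned low = ((n >> 1) & 1u) + ((n >> 2) & 1u) +
                         ((n >> 3) & 1u) + ((n >> 4) & 1u);
    const unsigned bit5 = (n >> 5) & 1u;
    const unsigned bit6_or_7 = ((n >> 6) | (n >> 7)) & 1u;
    return static_cast<std::uint8_t>(bit0_clear * low + bit5 + bit6_or_7);
}

// Exhaustive comparison over all 256 inputs.
inline void check_count_descriptors()
{
    for (unsigned v = 0; v < 256; ++v) {
        assert(count_descriptors_shift(static_cast<std::uint8_t>(v)) ==
               count_descriptors_ref(static_cast<std::uint8_t>(v)));
    }
}
```

Calling check_count_descriptors() once, for example from a unit test, confirms the two versions agree for every possible byte.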
I'd probably use a lookup table, either for all 256 values, or at least for the group of 4.
static const uint8_t nbits[16]={0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};
//count bits 1..4 iff bit 0 is 0, bit 5 always, and bit 6 or 7
return (!(n&1) * nbits[(n>>1)&0xF]) + ((n>>5)&1) + (((n>>6)|(n>>7))&1);
I think the cleanest way to convert (n & 16) into 0 or 1 is to just use int(bool(n & 16)). The cast to int can be dropped if you are using them in an arithmetic expression (like bool(n & 2) + bool(n & 4)).
For your function of counting bits set I would recommend using the popcount intrinsic function, available as __builtin_popcount on gcc and __popcnt on MSVC. Below is my understanding of the function you described, changed to use popcount.
if (n & 1)
{
//clear out first 4 bits and anything > 255
n &= 0xF0;
}
else
{
//clear out anything > 255 and bottom bit is already clear
n &= 0xFF;
}
return __builtin_popcount(n); //intrinsic function to count the set bits in a number
This doesn't quite match the function you wrote, but hopefully from here you get the idea.
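For what it's worth, here is one way the popcount idea might be adjusted so that it does match the rules in the question. This is only a sketch: the function names are invented for the example, and std::bitset<8>::count() is swapped in for the compiler intrinsic so that the code stays portable.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical popcount-based variant of the question's function.
inline std::uint8_t count_descriptors_popcount(std::uint8_t n)
{
    // (n & 1) - 1 is all ones when bit 0 is clear and zero when it is set,
    // so bits 1..4 (0x1E) survive only if bit 0 is clear; bit 5 (0x20) always survives.
    const unsigned keep = 0x20u | (0x1Eu & ((n & 1u) - 1u));
    const unsigned counted =
        static_cast<unsigned>(std::bitset<8>(n & keep).count());
    // Bits 6 and 7 contribute at most one between them.
    const unsigned bit6_or_7 = ((n >> 6) | (n >> 7)) & 1u;
    return static_cast<std::uint8_t>(counted + bit6_or_7);
}

// Spot checks against values worked out by hand from the question's rules.
inline void check_count_descriptors_popcount()
{
    assert(count_descriptors_popcount(0x02) == 1);  // bit 1, bit 0 clear
    assert(count_descriptors_popcount(0x03) == 0);  // bit 1 ignored: bit 0 set
    assert(count_descriptors_popcount(0xC0) == 1);  // bits 6 and 7 count once
    assert(count_descriptors_popcount(0xFE) == 6);  // bits 1..5 plus one for 6/7
}
```

The mask is built with unsigned wraparound instead of if/else, which keeps the function free of explicit conditionals.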
I have an 8-digit BCD number and need to check whether it is a valid BCD number. How can I do this programmatically (C/C++)?
Ex: 0x12345678 is valid, but 0x00f00abc isn't.
Thanks in advance!
You need to check each 4-bit quantity to make sure it's less than 10. For efficiency you want to work on as many bits as you can at a single time.
Here I break the digits apart to leave a zero between each one, then add 6 to each and check for overflow.
uint32_t highs = (value & 0xf0f0f0f0) >> 4;
uint32_t lows = value & 0x0f0f0f0f;
bool invalid = (((highs + 0x06060606) | (lows + 0x06060606)) & 0xf0f0f0f0) != 0;
Edit: actually we can do slightly better. It doesn't take 4 bits to detect overflow, only 1. If we divide all the digits by 2, it frees a bit and we can check all the digits at once.
uint32_t halfdigits = (value >> 1) & 0x77777777;
bool invalid = ((halfdigits + 0x33333333) & 0x88888888) != 0;
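A small sketch that wraps the halving trick in a function and compares it with a plain digit-by-digit loop. The function names are hypothetical, and the exhaustive loop only covers every 4-digit pattern in the low and high halves of the word, not all 2^32 values.

```cpp
#include <cassert>
#include <cstdint>

// Digit-by-digit reference: valid when every nibble is 0..9.
inline bool is_valid_bcd_loop(std::uint32_t value)
{
    for (int i = 0; i < 8; ++i) {
        if (((value >> (4 * i)) & 0xFu) > 9u) return false;
    }
    return true;
}

// Parallel version: halve each digit (giving 0..7), add 3, and look for
// a carry into bit 3 of any nibble. A digit d overflows iff d / 2 >= 5,
// that is, iff d >= 10.
inline bool is_valid_bcd_parallel(std::uint32_t value)
{
    const std::uint32_t halfdigits = (value >> 1) & 0x77777777u;
    return ((halfdigits + 0x33333333u) & 0x88888888u) == 0;
}

inline void check_bcd_parallel()
{
    assert(is_valid_bcd_parallel(0x12345678u));   // example from the question
    assert(!is_valid_bcd_parallel(0x00f00abcu));  // example from the question
    for (std::uint32_t v = 0; v <= 0xFFFFu; ++v) {
        assert(is_valid_bcd_parallel(v) == is_valid_bcd_loop(v));
        assert(is_valid_bcd_parallel(v << 16) == is_valid_bcd_loop(v << 16));
    }
}
```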
The obvious way to do this is:
/* returns 1 if x is valid BCD */
int
isvalidbcd (uint32_t x)
{
for (; x; x = x>>4)
{
if ((x & 0xf) >= 0xa)
return 0;
}
return 1;
}
This link tells you all about BCD, and recommends something like this as a more optimised solution (reworking to check all the digits, and hence using a 64 bit data type, and untested):
/* returns 1 if x is valid BCD */
int
isvalidbcd (uint32_t x)
{
return ((((uint64_t)x + 0x66666666ULL) ^ (uint64_t)x) & 0x111111110ULL) == 0;
}
For a digit to be invalid, it needs to be 10-15. That in turn means 8 + 4 or 8+2 - the low bit doesn't matter at all.
So:
long mask8 = value & 0x88888888;
long mask4 = value & 0x44444444;
long mask2 = value & 0x22222222;
return ((mask8 >> 2) & ((mask4 >> 1) | mask2)) == 0;
Slightly less obvious:
long mask8 = (value>>2);
long mask42 = (value | (value>>1));
return (mask8 & mask42 & 0x22222222) == 0;
By shifting before masking, we don't need 3 different masks.
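To make the reasoning concrete, here is a sketch of the shift-before-mask variant as a function, checked for each of the 16 possible digit values in each of the 8 digit positions. The function names and the use of uint32_t instead of long are choices made for this example.

```cpp
#include <cassert>
#include <cstdint>

// A digit is invalid when bit 3 is set together with bit 2 or bit 1.
inline bool is_valid_bcd_masks(std::uint32_t value)
{
    const std::uint32_t has8 = value >> 2;                 // bit 3 of each digit lands on bit 1
    const std::uint32_t has4_or_2 = value | (value >> 1);  // bit 1 now holds (bit 1 | bit 2)
    return (has8 & has4_or_2 & 0x22222222u) == 0;
}

inline void check_bcd_masks()
{
    for (unsigned pos = 0; pos < 8; ++pos) {
        for (std::uint32_t digit = 0; digit < 16; ++digit) {
            // The digit alone, then the same digit surrounded by 9s, to show
            // that neighbouring digits do not leak into the masked bits.
            const std::uint32_t alone = digit << (4 * pos);
            const std::uint32_t in_nines =
                (0x99999999u & ~(0xFu << (4 * pos))) | alone;
            assert(is_valid_bcd_masks(alone) == (digit <= 9));
            assert(is_valid_bcd_masks(in_nines) == (digit <= 9));
        }
    }
}
```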
Inspired by @Mark Ransom
bool invalid = (0x88888888 & (((value & 0xEEEEEEEE) >> 1) + (0x66666666 >> 1))) != 0;
// or
bool valid = !((((value & 0xEEEEEEEEu) >> 1) + 0x33333333) & 0x88888888);
Mask off each BCD digit's 1's place, shift right, then add 6 and check for BCD digit overflow.
How this works:
By adding 6 to each digit, we look for an overflow (marked *) out of the 4-bit sum.
 abcd
+ 110
-----
*efgd
But the bit value of d does not contribute to the sum, so first mask off that bit and shift right. Now the overflow bit is in the 8's place. This all is done in parallel and we mask these carry bits with 0x88888888 and test if any are set.
 0abc
+  11
-----
 *efg
Using bitwise operators, how can I test whether the n least significant bits of an integer are either all set or all clear?
For example, if n = 3, I only care about the 3 least significant bits; the test should return true for 0 and 7 and false for all other values between 0 and 7.
Of course I could do if x = 0 or x = 7, but I would prefer something using bitwise operators.
Bonus points if the technique can be adapted to take into account all the bits defined by a mask.
Clarification :
If I wanted to test if bit one or two is set I could do if (((x & 1) != 0) || ((x & 2) != 0)). But I could do the "more efficient" if ((x & 3) != 0).
I'm trying to find a "hack" like this to answer the question "Are all bits of x that match this mask all set or all unset?"
The easy way is if ((x & mask) == 0 || (x & mask) == mask). I'd like to find a way to do this in a single test without the || operator.
Using bitwise operators, how can I test whether the n least significant bits of an integer are either all set or all clear?
To get a mask for the n least significant bits, that's
(1ULL << n) - 1
So the simple test is:
bool test_all_or_none(uint64_t val, uint64_t n)
{
uint64_t mask = (1ULL << n) - 1;
val &= mask;
return val == mask || val == 0;
}
If you want to avoid the ||, we'll have to take advantage of unsigned integer wraparound. For the cases we want, after the &, val is either 0 or (let's say n == 8) 0xff. So val - 1 is either 0xffffffffffffffff or 0xfe. The failure cases are 1 through 0xfe, which become 0 through 0xfd. Thus the success cases are all at least 0xfe, which is mask - 1:
bool test_all_or_none(uint64_t val, uint64_t n)
{
uint64_t mask = (1ULL << n) - 1;
val &= mask;
return (val - 1) >= (mask - 1);
}
We can also test by adding 1 instead of subtracting 1, which is probably the best solution (here once we add one to val, val & mask should become either 0 or 1 for our success cases):
bool test_all_or_none(uint64_t val, uint64_t n)
{
uint64_t mask = (1ULL << n) - 1;
return ((val + 1) & mask) <= 1;
}
For an arbitrary mask, the subtraction method works for the same reason that it worked for the specific mask case: the 0 flips to be the largest possible value:
bool test_all_or_none(uint64_t val, uint64_t mask)
{
return ((val & mask) - 1) >= (mask - 1);
}
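A quick way to gain confidence in the single-comparison form is to compare it against the two-comparison form for every small mask and value. This is only a sketch with invented function names; it relies on unsigned arithmetic wrapping around, which is well defined in C and C++.

```cpp
#include <cassert>
#include <cstdint>

// Straightforward version with two comparisons.
inline bool all_or_none_simple(std::uint64_t val, std::uint64_t mask)
{
    val &= mask;
    return val == 0 || val == mask;
}

// Single-comparison version: 0 wraps to the largest value, mask becomes
// mask - 1, and every other masked value ends up below mask - 1.
inline bool all_or_none_single_compare(std::uint64_t val, std::uint64_t mask)
{
    return ((val & mask) - 1) >= (mask - 1);
}

// Exhaustive comparison for all 8-bit masks and values.
inline void check_all_or_none()
{
    for (std::uint64_t mask = 0; mask < 256; ++mask) {
        for (std::uint64_t val = 0; val < 256; ++val) {
            assert(all_or_none_single_compare(val, mask) ==
                   all_or_none_simple(val, mask));
        }
    }
}
```

Note that a mask of 0 passes trivially in both versions, since there are no bits to disagree.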
How about?
int mask = (1<<n)-1;
if ((x&mask)==mask || (x&mask)==0) { /*do whatever*/ }
The only really tricky part is the calculation of the mask. It basically just shifts a 1 over to get 0b0...0100...0 and then subtracts one to make it 0b0...0011...1.
Maybe you can clarify what you wanted for the test?
Here's what you wanted to do, in one function (untested, but you should get the idea). Returns 0 if the n last bits are not set, 1 if they are all set, -1 otherwise.
int lastBitsSet(int num, int n){
int mask = (1 << n) - 1; //n 1-s
if (!(num & mask)) //we got all 0-s
return 0;
if (!(~num & mask)) //we got all 1-s
return 1;
else
return -1;
}
To test if all aren't set, you just need to mask-in only the bits you want, then you just need to compare to zero.
The fun starts when you define the opposite function by just inverting the input :)
//Test if the n least significant bits aren't set:
char n_least_arent_set(unsigned int n, unsigned int value){
unsigned int mask = (1u << n) - 1; // e. g. 2^3 - 1 = b111
int masked_value = value & mask;
return masked_value == 0; // if all are zero, the mask operation returns a full-zero.
}
//test if the n least significant bits are set:
char n_least_are_set(unsigned int n, unsigned int value){
unsigned int rev_value = ~value;
return n_least_arent_set(n, rev_value);
}
I am doing bitwise & between two bit arrays saving the result in old_array and I want to get rid of the if/else statement. I should probably make use of the BIT_STATE macro, but how?
#define BYTE_POS(pos) (pos / CHAR_BIT)
#define BIT_POS(pos) (1 << (CHAR_BIT - 1 - (pos % CHAR_BIT)))
#define BIT_STATE(pos, state) (state << (CHAR_BIT - 1 - (pos % CHAR_BIT)))
if (((old_array[BYTE_POS(old_pos)] & BIT_POS(old_pos)) != 0) &&
((new_array[BYTE_POS(new_pos)] & BIT_POS(new_pos)) != 0))
{
old_array[BYTE_POS(old_pos)] |= BIT_POS(old_pos);
}
else
{
old_array[BYTE_POS(old_pos)] &= ~(BIT_POS(old_pos));
}
You can always calculate both results and then combine them. The biggest problem is to compute a fitting bitmask.
E.g.
const uint32_t a = 41,
b = 8;
const uint32_t mask[2] = { 0, 0xffffffff };
const uint32_t result = (a&mask[condition])
| (b&mask[!condition]);
or to avoid the unary not
const uint32_t mask_a[2] = { 0, 0xffffffff },
mask_b[2] = { mask_a[1], mask_a[0] };
const uint32_t result = (a&mask_a[condition])
| (b&mask_b[condition]);
However: When doing bitwise manipulations, always be careful with the number of bits involved. One way to be careful is to use fixed size types like uint32_t, which may or may not be defined on your platform (but if not, the good thing is you get a compile error), or to use templates carefully. Other types, including char, int and even bool can have any size beyond some defined minimum.
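The lookup arrays can also be replaced by computing the mask directly from the condition. The following is a sketch of that variation, not part of the answer above; select_u32 is a name invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Returns a when condition is true and b otherwise, without if/else or ?:.
inline std::uint32_t select_u32(bool condition, std::uint32_t a, std::uint32_t b)
{
    // true converts to 1 and false to 0; subtracting from zero wraps around,
    // giving 0xFFFFFFFF for true and 0 for false.
    const std::uint32_t mask =
        static_cast<std::uint32_t>(0) - static_cast<std::uint32_t>(condition);
    return (a & mask) | (b & ~mask);
}

inline void check_select_u32()
{
    assert(select_u32(true, 41, 8) == 41);
    assert(select_u32(false, 41, 8) == 8);
}
```

Both inputs are always evaluated, so this only makes sense when computing a and b is cheap and free of side effects.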
Yes, such code looks somewhat ugly.
I don't think BIT_STATE is useful here. (State MUST BE 0 or 1 to work as expected)
I see the following approaches to get rid of the if/else:
a) Use C++ bitfields
For example
http://en.wikipedia.org/wiki/Bit_field
b)
"Hide" that code in a class/method/function
c)
I think this is equivalent to your code
if ((new_array[BYTE_POS(new_pos)] & BIT_POS(new_pos)) == 0)
{
old_array[BYTE_POS(old_pos)] &= ~(BIT_POS(old_pos));
}
or as a one-liner
old_array[BYTE_POS(old_pos)] &=
~((new_array[BYTE_POS(new_pos)] & BIT_POS(new_pos)) ? 0 : BIT_POS(old_pos));
Take the expression
(new_array[BYTE_POS(new_pos)] & BIT_POS(new_pos))
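which is either 0 or has a 1 in bit BIT_POS(new_pos), and shift it until the bit, if set, is in BIT_POS(old_pos). Because BIT_POS numbers bits from the most significant end, the left-shift amount is (new_pos % CHAR_BIT) - (old_pos % CHAR_BIT):
(new_array[BYTE_POS(new_pos)] & BIT_POS(new_pos)) << ((new_pos % CHAR_BIT) - (old_pos % CHAR_BIT))
Now AND the result with old_array[BYTE_POS(old_pos)], ORing in ~BIT_POS(old_pos) first so that the other bits of that byte are left alone (shifted stands for the expression above):
old_array[BYTE_POS(old_pos)] &= shifted | ~BIT_POS(old_pos);
The only trick is that shifting by a negative amount is undefined behavior in C and C++. So if you already know whether old_pos % CHAR_BIT is greater or less than new_pos % CHAR_BIT, you can substitute >> with the operands of the subtraction swapped when appropriate.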
I've not tried this out. I may have << and >> swapped.
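One way to sidestep the shift-direction problem is to reduce the source bit to 0 or 1 first and then shift it up to the target position, which is what the BIT_STATE macro in the question does. The sketch below uses small inline functions in place of the macros; their names, and the assumption in the checks that CHAR_BIT is 8, are specific to this example.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Helpers mirroring the question's macros (bits numbered from the MSB).
inline std::size_t byte_pos(std::size_t pos) { return pos / CHAR_BIT; }
inline unsigned bit_shift(std::size_t pos)
{
    return static_cast<unsigned>(CHAR_BIT - 1 - (pos % CHAR_BIT));
}

// old bit = old bit AND new bit, without if/else.
inline void and_bit(unsigned char* old_array, std::size_t old_pos,
                    const unsigned char* new_array, std::size_t new_pos)
{
    // State of the source bit as 0 or 1.
    const unsigned new_bit =
        (new_array[byte_pos(new_pos)] >> bit_shift(new_pos)) & 1u;
    // All ones except at the target bit, which carries new_bit.
    const unsigned keep =
        ~(1u << bit_shift(old_pos)) | (new_bit << bit_shift(old_pos));
    old_array[byte_pos(old_pos)] =
        static_cast<unsigned char>(old_array[byte_pos(old_pos)] & keep);
}

// Checks below assume CHAR_BIT == 8.
inline void check_and_bit()
{
    unsigned char old_bits[2] = {0xFF, 0xFF};
    const unsigned char new_bits[2] = {0x80, 0x01};  // bits 0 and 15 set
    and_bit(old_bits, 3, new_bits, 0);   // source bit set: target stays 1
    assert(old_bits[0] == 0xFF);
    and_bit(old_bits, 3, new_bits, 1);   // source bit clear: target bit 3 cleared
    assert(old_bits[0] == 0xEF);
    and_bit(old_bits, 8, new_bits, 15);  // source bit set: second byte unchanged
    assert(old_bits[1] == 0xFF);
}
```

ANDing with keep leaves a clear target bit clear and clears a set target bit only when the source bit is 0, which matches the if/else in the question.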
I was reading one question on the blog and the solution of the question was to check whether 1 to n bits in 'k' are set or not.
For ex.
k = 3 and n = 2; then "True" since 1st and 2nd bit are set in k
k = 3 and n = 3; then "False" since 3rd bit in k is not set
The solution as provided by the author is:
if (((1 << (n-1)) ^ (k & ((1 << n)-1))) == ((1 << (n-1))-1))
std::cout<<"true"<<std::endl;
else
std::cout<<"false"<<std::endl;
I am not sure what's going on here.
Could someone please help me understand this?
If you draw out the binary representations on pen and paper, you'll see that (1 << (n-1)) always sets a single bit to 1 (the n-th bit), whereas (1 << n) - 1 sets the first n bits.
These are bitmasks; they're being used to manipulate certain sections of the input (k) via bitwise operations (&, | and ^).
Note
I think the example is needlessly complicated. This should be sufficient:
if ((k & ((1 << n) - 1)) == ((1 << n) - 1))
...
Or to make it even cleaner:
unsigned int mask = (1 << n) - 1;
if ((k & mask) == mask)
...
(assuming that k is of type unsigned int).
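To see why the two forms agree: XORing with 1 << (n-1) flips the top bit of the n-bit field, and the result equals n-1 ones exactly when the field was all ones to begin with. A small sketch that compares both forms for a range of inputs follows; the function names are invented, and n is assumed to be at least 1 and smaller than the width of unsigned.

```cpp
#include <cassert>
#include <limits>

// The blog's expression.
inline bool low_bits_set_xor(unsigned k, unsigned n)
{
    assert(n >= 1 && n < static_cast<unsigned>(std::numeric_limits<unsigned>::digits));
    return ((1u << (n - 1)) ^ (k & ((1u << n) - 1))) == ((1u << (n - 1)) - 1);
}

// The simplified mask comparison.
inline bool low_bits_set_mask(unsigned k, unsigned n)
{
    assert(n >= 1 && n < static_cast<unsigned>(std::numeric_limits<unsigned>::digits));
    const unsigned mask = (1u << n) - 1;
    return (k & mask) == mask;
}

inline void check_low_bits_set()
{
    for (unsigned n = 1; n <= 10; ++n) {
        for (unsigned k = 0; k < 2048; ++k) {
            assert(low_bits_set_xor(k, n) == low_bits_set_mask(k, n));
        }
    }
}
```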
I am using an unsigned char to store 8 flags. Each flag represents the corner of a cube. So 00000001 will be corner 1, 01000100 will be corners 3 and 7, etc. My current solution is to & the result with 1,2,4,8,16,32,64 and 128, check whether the result is not zero and store the corner. That is, if (result & 1) corners.push_back(1);. Any chance I can get rid of that 'if' statement? I was hoping I could get rid of it with bitwise operators but I could not think of any.
A little background on why I want to get rid of the if statement. This cube is actually a Voxel which is part of a grid that is at least 512x512x512 in size. That is more than 134 million Voxels. I am performing calculations on each one of the Voxels (well, not exactly, but I won't go into too much detail as it is irrelevant here) and that is a lot of calculations. And I need to perform these calculations per frame. Any speed boost, even one that is minuscule per function call, will help with this many calculations. To give you an idea, my algorithm (at some point) needed to determine whether a float was negative, positive or zero (within some error). I had if statements in there and greater/smaller than checks. I replaced that with a fast float to int function and shaved off a quarter of a second. Currently, each frame in a 128x128x128 grid takes a little more than 4 seconds.
I would consider a different approach to it entirely: there are only 256 possibilities for different combinations of flags. Precalculate 256 vectors and index into them as needed.
std::vector<std::vector<int> > corners(256);
for (int i = 0; i < 256; ++i) {
std::vector<int>& v = corners[i];
if (i & 1) v.push_back(1);
if (i & 2) v.push_back(2);
if (i & 4) v.push_back(4);
if (i & 8) v.push_back(8);
if (i & 16) v.push_back(16);
if (i & 32) v.push_back(32);
if (i & 64) v.push_back(64);
if (i & 128) v.push_back(128);
}
for (int i = 0; i < NumVoxels(); ++i) {
unsigned char flags = GetFlags(i);
const std::vector<int>& v = corners[flags];
... // do whatever with v
}
This would avoid all the conditionals and having push_back call new which I suspect would be more expensive anyway.
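A variation on the same idea, sketched with fixed-size entries so that the table needs no heap allocation at all. The CornerList type and the function names are assumptions made for this example, and the entries store corner numbers 1..8 as described in the question rather than the bit values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical table entry: the first `count` elements of `corner` are valid.
struct CornerList {
    std::uint8_t count = 0;
    std::array<std::uint8_t, 8> corner{};
};

// Builds all 256 entries once; the conditionals run only here.
inline std::array<CornerList, 256> build_corner_table()
{
    std::array<CornerList, 256> table{};
    for (unsigned flags = 0; flags < 256; ++flags) {
        CornerList& entry = table[flags];
        for (unsigned bit = 0; bit < 8; ++bit) {
            if (flags & (1u << bit)) {
                entry.corner[entry.count++] = static_cast<std::uint8_t>(bit + 1);
            }
        }
    }
    return table;
}

// Per-voxel lookup: a single array index.
inline const CornerList& corners_for(std::uint8_t flags)
{
    static const std::array<CornerList, 256> table = build_corner_table();
    return table[flags];
}

inline void check_corner_table()
{
    const CornerList& c = corners_for(0x44);  // 01000100: corners 3 and 7
    assert(c.count == 2);
    assert(c.corner[0] == 3 && c.corner[1] == 7);
}
```

At 9 bytes per entry the whole table is a little over 2 KB, so it should sit comfortably in cache.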
If there's some operation that needs to be done if the bit is set and not if it's not, it seems you'll have to have a conditional of some kind somewhere. If it could be expressed as a calculation somehow, you could get around it like this, for example:
numCorners = ((result >> 0) & 1) + ((result >> 1) & 1) + ((result >> 2) & 1) + ...
Hacker's Delight, first page:
x & (-x) // isolates the lowest set bit
x & (x - 1) // clears the lowest set bit
Inlining your push_back method would also help (better create a function that receives all the flags together).
Usually if you need performance, you should design the whole system with that in mind. Maybe if you post more code it will be easier to help.
EDIT: here is a nice idea:
unsigned char LOG2_LUT[256] = {...};
int t;
switch (count_set_bits(flags)){
case 8: t = flags;
flags &= (flags - 1); // clearing a bit that was set
t ^= flags; // getting the changed bit
corners.push_back(LOG2_LUT[t]);
case 7: t = flags;
flags &= (flags - 1);
t ^= flags;
corners.push_back(LOG2_LUT[t]);
case 6: t = flags;
flags &= (flags - 1);
t ^= flags;
corners.push_back(LOG2_LUT[t]);
// etc...
};
count_set_bits() is a well-known function: http://www-graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
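The two Hacker's Delight idioms can also be combined in a plain loop that runs once per set bit rather than once per bit position. This is a sketch only: the function names are invented, and a small shift loop stands in for the LOG2_LUT table, whose contents are not given above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Index (0-based) of a value that has exactly one bit set.
inline unsigned single_bit_index(unsigned bit)
{
    assert(bit != 0 && (bit & (bit - 1)) == 0);
    unsigned index = 0;
    while ((bit >>= 1) != 0) ++index;
    return index;
}

// Collects corner numbers 1..8 for the set bits of flags, lowest first.
inline std::vector<int> corners_from_flags(std::uint8_t flags)
{
    std::vector<int> corners;
    unsigned remaining = flags;
    while (remaining != 0) {
        const unsigned lowest = remaining & (0u - remaining);  // isolate lowest set bit
        corners.push_back(static_cast<int>(single_bit_index(lowest)) + 1);
        remaining &= remaining - 1;                            // clear lowest set bit
    }
    return corners;
}
```

For the question's example, corners_from_flags(0x44) yields the corners 3 and 7. The loop condition is still a branch, so this reduces the number of tests rather than eliminating them.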
There is a way, it's not "pretty", but it works.
(result & 1) && (corners.push_back(1), true);
(result & 2) && (corners.push_back(2), true);
(result & 4) && (corners.push_back(3), true);
(result & 8) && (corners.push_back(4), true);
(result & 16) && (corners.push_back(5), true);
(result & 32) && (corners.push_back(6), true);
(result & 64) && (corners.push_back(7), true);
(result & 128) && (corners.push_back(8), true);
It uses a little-known feature of the C++ language: short-circuit evaluation of &&. The comma expression is needed because push_back returns void, which cannot be an operand of &&.
I've noted a similar algorithm in the OpenTTD code. It turned out to be utterly useless: you're better off not breaking down numbers like that. Instead, replace the iteration over the vector<> you have now by an iteration over the bits of the byte. This is far more cache-friendly.
I.e.
unsigned char flags = Foo(); // the value you didn't put in a vector<>
for (unsigned char c = (UCHAR_MAX >> 1) + 1; c !=0 ; c >>= 1)
{
if (flags & c)
Bar(flags&c);
}