Find "edges" in 32 bits word bitpattern - c++

I'm trying to find the most efficient algorithm to count "edges" in a bit pattern, an edge being a change from 0 to 1 or from 1 to 0. I am sampling each bit every 250 µs and shifting it into a 32-bit unsigned variable.
This is my algorithm so far:
void CountEdges(void)
{
    uint_least32_t feedback_samples_copy = feedback_samples;
    signal_edges = 0;
    while (feedback_samples_copy > 0)
    {
        uint_least8_t flank_information = (feedback_samples_copy & 0x03);
        if (flank_information == 0x01 || flank_information == 0x02)
        {
            signal_edges++;
        }
        feedback_samples_copy >>= 1;
    }
}
It needs to be at least 2 or 3 times as fast.

You should be able to bitwise XOR the value with a copy of itself shifted by one bit to get a bit pattern representing the flipped bits. Then use one of the bit counting tricks on this page: http://graphics.stanford.edu/~seander/bithacks.html to count how many 1s there are in the result.
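A minimal sketch of the XOR-and-count idea, with illustrative function names that are not from the original post. Bit k of x ^ (x >> 1) is set when samples k and k+1 differ, so counting the set bits counts the edges. The 0x7FFFFFFF mask is an assumption that only the 31 transitions inside the word should count; without it, a transition between bit 31 and an implied 0 above it is counted as well, which matches what the loop in the question does.

```c
#include <stdint.h>

/* Portable population count: one loop iteration per set bit. */
static unsigned popcount32(uint32_t v)
{
    unsigned c = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        c++;
    }
    return c;
}

/* Count transitions between adjacent bits inside a 32-bit word. */
unsigned count_edges(uint32_t x)
{
    uint32_t flips = (x ^ (x >> 1)) & 0x7FFFFFFFu;  /* 31 adjacent pairs */
    return popcount32(flips);
}
```

On compilers that offer a hardware popcount intrinsic, popcount32 could be replaced by it.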

One thing that may help is to precompute the edge count for all possible 8-bit values (a 512-entry lookup table, since you have to include the bit that precedes each value) and then sum up the count 1 byte at a time.
// prevBit is the last bit of the previous 32-bit word
// edgeLut is a 512 entry precomputed edge count table
// Some of the shifts and & are extraneous, but there for clarity
edgeCount =
edgeLut[(prevBit << 8) | (feedback_samples >> 24) & 0xFF] +
edgeLut[(feedback_samples >> 16) & 0x1FF] +
edgeLut[(feedback_samples >> 8) & 0x1FF] +
edgeLut[(feedback_samples >> 0) & 0x1FF];
prevBit = feedback_samples & 0x1;
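The table itself is not shown above. One way it might be filled is sketched here, assuming each 9-bit index holds the preceding bit in bit 8 and the byte in bits 0 to 7. The names edgeLut and prevBit come from the snippet above; init_edge_lut and count_edges_lut are hypothetical helpers.

```c
#include <stdint.h>

static uint8_t edgeLut[512];

/* For every 9-bit pattern, store the number of transitions between
   adjacent bits (8 pairs: bit 8/7 down to bit 1/0). */
void init_edge_lut(void)
{
    for (unsigned i = 0; i < 512; i++) {
        unsigned diff = (i ^ (i >> 1)) & 0xFFu;  /* bit k set if bits k, k+1 differ */
        uint8_t n = 0;
        while (diff) {
            diff &= diff - 1;
            n++;
        }
        edgeLut[i] = n;
    }
}

/* Sum the four table lookups; prevBit is the last bit of the previous word. */
unsigned count_edges_lut(uint32_t samples, unsigned prevBit)
{
    return edgeLut[((prevBit & 1u) << 8) | (samples >> 24)]
         + edgeLut[(samples >> 16) & 0x1FFu]
         + edgeLut[(samples >> 8) & 0x1FFu]
         + edgeLut[samples & 0x1FFu];
}
```

init_edge_lut has to run once before the first lookup. With prevBit included, 32 adjacent pairs are examined per word.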

My suggestion:
1) Copy your input value to a temp variable, left-shifted by one.
2) Copy the LSB of your input to your temp variable.
3) XOR the two values. Every bit set in the result represents one edge.
4) Use a bit-counting algorithm to count the number of bits set.
This might be the code for the first 3 steps:
uint32_t input; // some value
uint32_t temp = (input << 1) | (input & 0x00000001);
uint32_t result = input ^ temp;
//continue to count the bits set in result
//...

Create a look-up table so you can get the transitions within a byte or 16-bit value in one shot - then all you need to do is look at the differences in the 'edge' bits between bytes (or 16-bit values).

You are looking at only 2 bits during every iteration.
The fastest algorithm would probably be to build a hash table for all possible values. Since there are 2^32 values, that is not the best idea.
But why don't you look at 3, 4, 5 ... bits in one step? You can, for instance, precalculate the edge count for all 4-bit combinations. Just take care of possible edges between the pieces.

You could always use a lookup table for, say, 8 bits at a time.
This way you get a speed improvement of around 8 times.
Don't forget to check the transitions between those 8-bit groups, though. These have to be checked 'manually'.
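A sketch of the byte-wise variant described in the last few answers, assuming a 256-entry table that only knows about transitions inside a byte. The three transitions across byte boundaries (bits 7/8, 15/16 and 23/24) are then checked separately. All names are illustrative.

```c
#include <stdint.h>

static uint8_t edges_in_byte[256];

/* Each entry counts the 7 adjacent-bit transitions inside one byte. */
void init_edges_in_byte(void)
{
    for (unsigned b = 0; b < 256; b++) {
        unsigned diff = (b ^ (b >> 1)) & 0x7Fu;
        uint8_t n = 0;
        while (diff) {
            diff &= diff - 1;
            n++;
        }
        edges_in_byte[b] = n;
    }
}

unsigned count_edges_bytes(uint32_t x)
{
    unsigned total = 0;

    /* Transitions inside each of the four bytes. */
    for (int shift = 0; shift < 32; shift += 8)
        total += edges_in_byte[(x >> shift) & 0xFFu];

    /* The "manual" part: transitions across the three byte boundaries. */
    uint32_t boundary = x ^ (x >> 1);
    total += (boundary >> 7) & 1u;
    total += (boundary >> 15) & 1u;
    total += (boundary >> 23) & 1u;

    return total;
}
```

Whether this beats the plain XOR-and-popcount approach depends on the target and its cache behaviour, so it is worth measuring.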

Related

Does a 64-bit packed structure contain a field set to a specified value

I have an odd structure with 5 fields of bit length 12 and 4 boolean flags stored in the high bits. This all fits nicely into a 64 bit long, and as such they are stored as a 64 bit word array. What I want to do is search the array and find if any of the 12 bit fields are set to a given value.
I have tried the obvious solution of using bit shifts and masks; however, this is a very hot function and needs to be optimized for speed. This led me to this page, which contains a way to check for a byte in a word in very few operations. This makes me think it is possible to do something similar with the 12-bit fields, but I am struggling to find what constants I would replace the ones given on that page with.
I'm not very versed in low level languages, but I'm in the mood to fiddle with some bits so I thought I'd give it a try.
POC: JS can't do 64bit longs, but we can check if we can adapt the algorithm to deal with 2x12bit fields + 8boolean flags (noise) in an 32bit (u)int.
The noise is there because the original algorithm dealt with exactly 4 bytes and no further bits, but neither 32 nor 64 is divisible by 12, so we need to ensure that these additional bits don't interfere or, worse, get matched.
function hasValue(x, n) { return hasZero(x ^ (0x001001 * n)); }
function hasZero(v) { return ((v - 0x001001) & ~(v) & 0x800800); }
function hex(v) { return "0x" + v.toString(16) }
// create a random value, 2x12bit fields plus 8 random flags.
var v = Math.floor(Math.random() * 0x100000000);
console.log("value", hex(v));
// get the two fields
var a = v & 0xFFF;
console.log("check", hex(a), !!hasValue(v, a));
var b = (v >> 12) & 0xFFF;
console.log("check", hex(b), !!hasValue(v, b));
// brute force.
// check if any other value is matched.
// these should only return the 2 values from above.
for (var i = 0; i < 0x1000; ++i) {
if (hasValue(v, i)) {
console.log("matched", hex(i));
}
}
Extrapolating from this, your solution should be
#define hasValue(x,n) hasZero((x) ^ (0x001001001001001ULL * (n)))
#define hasZero(v) (((v) - 0x001001001001001ULL) & ~(v) & 0x800800800800800ULL)
where all values are unsigned 64-bit integers. The ULL suffix makes the constants unsigned long long, so the arithmetic is done in unsigned 64-bit.
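The same idea written as C functions, as a sketch. It assumes the five 12-bit fields occupy bits 0 to 59 and the four flags sit in bits 60 to 63; the function and macro names are illustrative.

```c
#include <stdint.h>

#define FIELD_LOW  0x001001001001001ULL  /* lowest bit of each 12-bit field  */
#define FIELD_HIGH 0x800800800800800ULL  /* highest bit of each 12-bit field */

/* Non-zero result when at least one of the five 12-bit fields of v is 0.
   The flag bits 60..63 are masked out by FIELD_HIGH. */
static int has_zero_field(uint64_t v)
{
    return ((v - FIELD_LOW) & ~v & FIELD_HIGH) != 0;
}

/* True when any 12-bit field of word equals n (only the low 12 bits of n are used). */
int has_field_value(uint64_t word, unsigned n)
{
    uint64_t pattern = FIELD_LOW * (uint64_t)(n & 0xFFFu);  /* n copied into every field */
    return has_zero_field(word ^ pattern);
}
```

XORing with the replicated value turns a matching field into zero, which the zero test then detects.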

Implement bit vector using bitwise logical operations

This question is asked in Programming Pearls, Question 2, and I am having trouble understanding its solution.
Here is the solution written in the book.
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];
void set(int i) { a[i>>SHIFT] |= (1<<(i & MASK)); }
void clr(int i) { a[i>>SHIFT]&=~(1<<(i & MASK)); }
int test(int i) { return a[i>>SHIFT]&(1<<(i & MASK)); }
I have run this in my compiler and I have looked at another question that talks about this problem, but I still don't understand how this solution works.
Why does it do a[i>>SHIFT]? Why can't it just be a[i]=1? Why does i need to be shifted right 5 times?
32 is 2^5, so a right-shift of 5 bits is equivalent to dividing by 32. So by doing a[i>>5], you are dividing i by 32 to figure out which element of the array contains bit i -- there are 32 bits per element.
Meanwhile & MASK is equivalent to mod 32, so 1<<(i & MASK) builds a 1-bit mask for the particular bit within the word.
Divide the 32 bits of int i (starting from bit 0 to bit 31) into two parts.
First part is the most significant bits 31 to 5. Use this part to find the index in the array of ints (called a[] here) that you are using to implement the bit array. Initially, the entire array of ints is zeroed out.
Since every int in a[] is 32 bits, it can keep track of 32 ints with those 32 bits. We divide every input i with 32 to find the int in a[] that is supposed to keep track of this i.
Every time a number is divided by 2, it is effectively right shifted once. To divide a number by 32, you simply right shift it 5 times. And that is exactly what we get by filtering out the first part.
Second part is the least significant bits 0 to 4. After a number has been binned into the correct index, use this part to set the specific bit of the zero stored in a[] at this index. Obviously, if some bit of the zero at this index has already been set, the value at that index will not be zero anymore.
How to get the first part? Right shifting i by 5 (i.e. i >> SHIFT).
How to get the second part? Do bitwise AND of i by 11111. (11111)2 = 0x1F, defined as MASK. So, i & MASK will give the integer value represented by the last 5 bits of i.
The last 5 bits tell you how many bits to go inside the number in a[]. For example, if i is 5, you want to set the bit in the index 0 of a[] and you specifically want to set the 5th bit of the int value a[0].
Index to set = 5 / 32 = (0101 >> 5) = 0000 = 0.
Bit to set = 5th bit inside a[0]
= a[0] & (1 << 5)
= a[0] & (1 << (00101 & 11111)).
Setting the bit for given i
Get the int to set by a[i >> 5]
Get the bit to set by pushing a 1 a total of i % 32 times to the left i.e. 1 << (i & 0x1F)
Simply set the bit as a[i >> 5] = a[i >> 5] | (1 << (i & 0x1F));
That can be shortened to a[i >> 5] |= (1 << (i & 0x1F));
Getting/Testing the bit for given i
Get the int where the desired bit lies by a[i >> 5]
Generate a number where all bits except for the i & 0x1F bit are 0. You can do that with 1 << (i & 0x1F); no negation is needed here.
AND the number generated above with the value stored at this index in a[]. If the value is 0, this particular bit was 0. If the value is non-zero, this bit was 1.
In code you would simply return (a[i >> 5] & (1 << (i & 0x1F))) != 0; (the extra parentheses matter because != binds more tightly than &).
Clearing the bit for given i: It means setting the bit for that i to 0.
Get the int where the bit lies by a[i >> 5]
Get the bit by 1 << (i & 0x1F)
Invert all the bits of 1 << (i & 0x1F) so that the i's bit is 0.
AND the number at this index and the number generated in step 3. That will clear i's bit, leaving all other bits intact.
In code, this would be: a[i >> 5] &= ~(1 << (i & 0x1F));
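The three operations can be put together as a sketch. It uses uint32_t words instead of int, an assumption made here so that shifting a 1 into bit 31 stays well defined; set_bit, clr_bit and test_bit are illustrative names, while the constants are those from the book's solution.

```c
#include <stdint.h>

#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000

static uint32_t a[1 + N / BITSPERWORD];

/* i >> SHIFT picks the word (i / 32), i & MASK picks the bit inside it (i % 32). */
void set_bit(int i)  { a[i >> SHIFT] |=  (UINT32_C(1) << (i & MASK)); }
void clr_bit(int i)  { a[i >> SHIFT] &= ~(UINT32_C(1) << (i & MASK)); }
int  test_bit(int i) { return (int)((a[i >> SHIFT] >> (i & MASK)) & 1u); }
```

For example, i = 37 lands in a[1] (37 / 32 = 1) at bit 5 (37 % 32 = 5).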

Fastest Way to XOR all bits from value based on bitmask?

I've got an interesting problem that has me looking for a more efficient way of doing things.
Let's say we have a value (in binary)
(VALUE) 10110001
(MASK) 00110010
----------------
(AND) 00110000
Now, I need to be able to XOR any bits from the (AND) value that are set in the (MASK) value (always lowest to highest bit):
(RESULT) AND1(0) xor AND4(1) xor AND5(1) = 0
Now, on paper, this is certainly quick since I can see which bits are set in the mask. It seems to me that programmatically I would need to keep right shifting the MASK until I found a set bit, XOR it with a separate value, and loop until the entire byte is complete.
Can anyone think of a faster way? I'm looking for the way to do this with the least number of operations and stored values.
If I understood this question correctly, what you want is to get every bit from VALUE that is set in the MASK, and compute the XOR of those bits.
First of all, note that XOR'ing a value with 0 will not change the result. So, to ignore some bits, we can treat them as zeros.
So, XORing the bits set in VALUE that are in MASK is equivalent to XORing the bits in VALUE&MASK.
Now note that the result is 0 if the number of set bits is even, 1 if it is odd.
That means we want to count the number of set bits. Some architectures/compilers have ways to quickly compute this value. For instance, on GCC this can be obtained with __builtin_popcount.
So on GCC, this can be computed with:
int set_bits = __builtin_popcount(value & mask);
return set_bits % 2;
If you want the code to be portable, then this won't do. However, a comment in this answer suggests that some compilers can inline std::bitset::count to efficiently obtain the same result.
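GCC and Clang also provide __builtin_parity, which returns the parity of the set bits directly, so the % 2 step is not needed. A minimal sketch, assuming one of those compilers; masked_parity is an illustrative name.

```c
/* XOR of all bits of value that are selected by mask.
   Relies on the GCC/Clang builtin __builtin_parity (not portable C). */
unsigned masked_parity(unsigned value, unsigned mask)
{
    return (unsigned)__builtin_parity(value & mask);
}
```

With the numbers from the question, value 10110001 and mask 00110010 leave two set bits after the AND, so the result is 0.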
If I'm understanding you right, you have
result = value & mask
and you want to XOR the 1 bits of mask & result together. The XOR of a series of bits is the same as counting the number of bits and checking if that count is even or odd. If it's odd, the XOR would be 1; if even, XOR would give 0.
count_bits(mask & result) % 2 != 0
mask & result can be simplified to simply result. You don't need to AND it with mask again. The % 2 != 0 can be alternately written as & 1.
count_bits(result) & 1
As far as how to count bits, the Bit Twiddling Hacks web page gives a number of bit counting algorithms.
Counting bits set, Brian Kernighan's way
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}
Brian Kernighan's method goes through as many iterations as there are
set bits. So if we have a 32-bit word with only the high bit set, then
it will only go once through the loop.
If you were to use that implementation, you could optimize it a bit further. If you think about it, you don't need the full count of bits. You only need to track their parity. Instead of counting bits you could just flip c each iteration.
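unsigned bit_parity(unsigned v) {
    unsigned c;
    for (c = 0; v; c ^= 1) {
        v &= v - 1; // clear the least significant bit set
    }
    return c; // 1 if an odd number of bits were set, 0 otherwise
}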
(Thanks to Slava for the suggestion.)
Using that the XOR with 0 doesn't change anything, it's OK to apply the mask and then unconditionally XOR all bits together, which can be done in a parallel-prefix way. So something like this (not tested):
x = m & v;
x ^= x >> 16;
x ^= x >> 8;
x ^= x >> 4;
x ^= x >> 2;
x ^= x >> 1;
result = x & 1;
You can use more (or fewer) steps as needed, this is for 32 bits.
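Since the question works on a single byte, a shortened version needs only three folding steps. This is a sketch with an illustrative function name, assuming 8-bit inputs.

```c
#include <stdint.h>

/* XOR together all bits of v selected by m, for 8-bit inputs. */
unsigned masked_xor8(uint8_t v, uint8_t m)
{
    unsigned x = (unsigned)(v & m);
    x ^= x >> 4;   /* fold the high nibble onto the low nibble */
    x ^= x >> 2;   /* fold the remaining 4 bits down to 2      */
    x ^= x >> 1;   /* fold the remaining 2 bits down to 1      */
    return x & 1u;
}
```

Each step halves the number of bits that still carry information, so a 64-bit input would need one more step (a shift by 32) in front of the 32-bit sequence.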
One significant issue to be aware of if using v &= v - 1 in the main body of your code is it will change the value of v to 0 in conducting the count. With other methods, the value is changed to the number of 1's. While count logic is generally wrapped as a function, where that is no longer a concern, if you are required to present your counting logic in the main body of your code, you must preserve a copy of v if that value is needed again.
In addition to the other two methods presented, the following is another favorite from bit-twiddling hacks that generally has a bit better performance than the loop method for larger numbers:
/* get the population 1's in the binary representation of a number */
unsigned getn1s (unsigned int v)
{
v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = (v + (v >> 4)) & 0x0F0F0F0F;
v = v + (v << 8);
v = v + (v << 16);
return v >> 24;
}

Merge bits, then determine how many 0s the result has

I am trying to write a function which takes in three bit vectors representing the digits in use in the row, col and block of a Sudoku puzzle from positions 1-9. A cell can only use digits that are unused, and the function is supposed to return whether the digits in all the vectors force one possibility or whether there is more than one possibility. I took this to mean that I would have to merge all three vectors, and then determine where there were "unset" bits in the resulting pattern.
However, my function does not seem in gdb to be returning the correct mask even though it was inspired by this derivation: http://graphics.stanford.edu/~seander/bithacks.html#MaskedMerge
I am trying to merge one set among two, then the third set into the previous merge, derive the number of 1s in the final merge, and subtract it to derive how many 0s there are.
Then, I wrote the following function:
bool possibilities(unsigned short row, unsigned short col, unsigned short bloc)
{
unsigned int mask = (1 << col) - 1;
unsigned int inbw = (row & ~mask) | (col & mask);
unsigned int mask2 = (1 << bloc) - 1;
unsigned int final = (inbw & ~mask2) | (bloc & mask2);
int num_1s;
while (result != 0) {
result &= (result - 1);
num_1s++;
}
int zeroes = 32 - num_1s; // 32 being the presumed number of bits in a short.
if (zeroes == 1) return true;
return false;
}
According to this document:
http://www.cplusplus.com/doc/tutorial/variables/
A short is not smaller than char. At least 16 bits.
So you could be wrong calculating the zeroes as 32 - num_1s.
Instead, you can take an unsigned short and fill it with 1s, leaving 0s only in the first 9 bits:
var = 0xFE00
That way the solution does not depend strongly on the size of the variable you use.
A solution to the problem could look like this (assuming row, col and bloc as above, and num_0s starting at 0):
possibilities = row | col | bloc | var;
while (possibilities != 0) {
num_0s += ((~possibilities)&1);
possibilities = (possibilities >> 1);
}
If I understood correctly that each of row, col, and bloc (sic) are bit masks with individual bits (presumably bits 0-8) representing the presence of digits 1-9, your masks are wrong (and indeed quite pointless). For example, if col has bit 8 set, then mask = (1 << col) - 1 shifts 1 to the left by 256 – since it is extremely unlikely that unsigned short be over 256 bits wide, this results in 0 after the shift and then in a mask with all bits set after you subtract the 1. After this (row & ~mask) | (col & mask) will be only col since ~mask is 0.
A couple of simple options come to mind:
1) Don't merge at all, simply do the popcount on each of the three variables individually. Some modern processors have an instruction for popcount, so if you manage to use that, e.g., through a compiler's built-in function (e.g., __builtin_popcount), it will be even faster.
2) Mask the bits on each variable individually and shift them to position, e.g.:
const unsigned int mask = 0x1FF;
unsigned int final = (col & mask) | ((row & mask) << 9) | ((bloc & mask) << 18);
Also, don't subtract the number of 1's from 32 but from 27 (= 3×9) - that's the maximum number of 1 bits if each of the three variables can have at most 9 bits set.
Edit: Could be that I've misunderstood what you are trying to do by merging. If you mean a simple union of all 1 bits in the three variables, then it would be just unsigned int final = col | row | bloc with no need to mask. Then you would subtract the popcount (number of 1 bits) from 9.
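A sketch of the simple-union reading, assuming bits 0 to 8 of each argument mark digits 1 to 9 as used. The name one_possibility is hypothetical, and the 0x1FF mask is added only as a guard against stray high bits.

```c
#include <stdbool.h>

/* True when exactly one digit is still unused across row, col and bloc. */
bool one_possibility(unsigned short row, unsigned short col, unsigned short bloc)
{
    unsigned used = (unsigned)(row | col | bloc) & 0x1FFu;  /* union of used digits */
    int num_1s = 0;

    while (used != 0) {
        used &= used - 1;   /* clear the lowest set bit */
        num_1s++;
    }
    return (9 - num_1s) == 1;  /* unused digits = 9 - used digits */
}
```

Note that num_1s is initialised to 0 here; the function in the question leaves it uninitialised and loops on a variable named result that is never declared.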

Swapping pair of bits in a Byte

I have an arbitrary 8-bit binary number e.g., 11101101
I have to swap all the pair of bits like:
Before swapping: 11-10-11-01
After swapping: 11-01-11-10
I was asked this in an interview!
In pseudo-code:
x = ((x & 0b10101010) >> 1) | ((x & 0b01010101) << 1)
It works by handling the low bits and high bits of each bit-pair separately and then combining the result:
The expression x & 0b10101010 extracts the high bit from each pair, and then >> 1 shifts it to the low bit position.
Similarly the expression (x & 0b01010101) << 1 extracts the low bit from each pair and shifts it to the high bit position.
The two parts are then combined using bitwise-OR.
Since not all languages allow you to write binary literals directly, you could write them in for example hexadecimal:
Binary        Hexadecimal   Decimal
0b10101010    0xaa          170
0b01010101    0x55          85
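In C, the pseudo-code could be written as follows. This is a sketch for a single byte, with an illustrative function name.

```c
#include <stdint.h>

/* Swap the two bits of every bit pair in a byte. */
uint8_t swap_bit_pairs(uint8_t x)
{
    unsigned high = (x & 0xAAu) >> 1;  /* high bit of each pair, moved down */
    unsigned low  = (x & 0x55u) << 1;  /* low bit of each pair, moved up    */
    return (uint8_t)(high | low);
}
```

For the example from the question, 11101101 (0xED) becomes 11011110 (0xDE).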
Make two bit masks, one containing all the even bits and one containing the uneven bits (10101010 and 01010101).
Use bitwise-and to filter the input into two numbers, one having all the even bits zeroed, the other having all the uneven bits zeroed.
Shift the number that contains only even bits one bit to the left, and the other one one bit to the right
Use bitwise-or to combine them back together.
Example for 16 bits:
unsigned short swap_bit_pair(unsigned short i) {
    return ((i & 0xAAAA) >> 1) | ((i & 0x5555) << 1);
}
b = ((a & 170) >> 1) | ((a & 85) << 1)
The most elegant and flexible solution is, as others have said, to apply a 'comb' mask to the even and odd bits separately and then, having shifted them left and right respectively by one place, to combine them using bitwise OR.
One other solution you may want to think about takes advantage of the relatively small size of your datatype. You can create a look up table of 256 values which is statically initialised to the values you want as output to your input:
const unsigned char lookup[] = { 0x00, 0x02, 0x01, 0x03, 0x08, 0x0A, 0x09, 0x0B ...
Each value is placed in the array to represent the transformation of the index. So if you then do this:
unsigned char out = lookup[ 0xAA ];
out will contain 0x55
This is more cumbersome and less flexible than the first approach (what if you want to move from 8 bits to 16?) but does have the advantage that it may be measurably faster if you perform a large number of these operations.
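Rather than typing 256 constants by hand, the table could be generated once at start-up. This is a sketch with a hypothetical init function; unlike the const array above, the table here is filled at run time.

```c
#include <stdint.h>

static uint8_t lookup[256];

/* Fill lookup[i] with i after swapping the bits of every bit pair. */
void init_swap_lookup(void)
{
    for (unsigned i = 0; i < 256; i++)
        lookup[i] = (uint8_t)(((i & 0xAAu) >> 1) | ((i & 0x55u) << 1));
}
```

After init_swap_lookup() has run, lookup[0xAA] holds 0x55, as described above.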
Suppose your number is num.
First, find the even-position bits:
num & 0xAAAAAAAA
Second, find the odd-position bits:
num & 0x55555555
Third, move the even-position bits to the odd positions and the odd-position bits to the even positions:
Even = (num & 0xAAAAAAAA)>>1
Odd = (num & 0x55555555)<<1
Last step ... result = Even | Odd
Print result
I would first code it 'longhand', that is to say in several obvious, explicit stages, and use that to validate that the unit tests I had in place were functioning correctly. I would only then move to more esoteric bit manipulation solutions if I had a need for performance (and that extra performance was delivered by said improvements).
Code for people first, computers second.