How to de-interleave bits (UnMortonizing?) - bit-manipulation

What is the most efficient way to de-interleave bits from a 32 bit int? For this particular case, I'm only concerned about the odd bits, although I'm sure it's simple to generalize any solution to both sets.
For example, I want to convert 0b01000101 into 0b1011. What's the quickest way?
EDIT:
In this application, I can guarantee that the even bits are all zeros. Can I take advantage of that fact to improve speed or reduce space?

Given that you know that every other bit is 0 in your application, you can do it like this:
x = (x | (x >> 1)) & 0x33333333;
x = (x | (x >> 2)) & 0x0f0f0f0f;
x = (x | (x >> 4)) & 0x00ff00ff;
x = (x | (x >> 8)) & 0x0000ffff;
The first step looks like this:
0a0b0c0d0e0f0g0h0i0j0k0l0m0n0o0p x
| 00a0b0c0d0e0f0g0h0i0j0k0l0m0n0o0 x >> 1
--------------------------------
= 0aabbccddeeffgghhiijjkkllmmnnoop x | (x >> 1)
& 00110011001100110011001100110011 0x33333333
--------------------------------
= 00ab00cd00ef00gh00ij00kl00mn00op (x | (x >> 1)) & 0x33333333
Then the second step works with two bits at a time, and so on.

In terms of speed, a lookup table 16 bits wide with 2^32 entries will be hard to beat!
But if you don't have that much memory to spare, four lookups in a 256-entry table,
plus a few shifts and ANDs to stitch them together, might be a better choice. Or perhaps the sweet spot is somewhere in between...it depends on the resources you have available, and
how the cost of initializing the lookup table will be amortized over the number of lookups you need to perform.

I'm not sure how quick it would be, but you could do something like
int a = 0b01000101;
int b = 0;
int i = 0;
while (a > 0) {
b |= (a & 1) << i;
a >>= 2;
}
Which would pull all the odd bits off of a and put them on b.

Related

The fastest way to swap the two lowest bits in an unsigned int in C++

Assume that I have:
unsigned int x = 883621;
which in binary is :
00000000000011010111101110100101
I need the fastest way to swap the two lowest bits:
00000000000011010111101110100110
Note: To clarify: If x is 7 (0b111), the output should be still 7.
If you have few bytes of memory to spare, I would start with a lookup table:
constexpr unsigned int table[]={0b00,0b10,0b01,0b11};
unsigned int func(unsigned int x){
auto y = (x & (~0b11)) |( table[x&0b11]);
return y;
}
Quickbench -O3 of all the answers so far.
Quickbench -Ofast of all the answers so far.
(Plus my ifelse naive idea.)
[Feel free to add yourself and edit my answer].
Please do correct me if you believe the benchmark is incorrect, I am not an expert in reading assembly. So hopefully volatile x prevented caching the result between loops.
I'll ignore the top bits for a second - there's a trick using multiplication. Multiplication is really a convolution operation, and you can use that to shuffle bits.
In particular, assume the two lower bits are AB. Multiply that by 0b0101, and you get ABAB. You'll see that the swapped bits BA are the middle bits.
Hence,
x = (x & ~3U) | ((((x&3)*5)>>1)&3)
[edit] The &3 is needed to strip the top A bit, but with std::uint_32_t you can use overflow to lose that bit for free - multiplication then gets you the result BAB0'0000'0000'0000'0000'0000'0000'0000'0000' :
x = (x & ~3U) | ((((x&3)*0xA0000000)>>30));
I would use
x = (x & ~0b11) | ((x & 0b10) >> 1) | ((x & 0b01) << 1);
Inspired by the table idea, but with the table as a simple constant instead of an array. We just need mask(00)==00, mask(01)==11, mask(10)=11, masK(11)==11.
constexpr unsigned int table = 0b00111100;
unsigned int func(unsigned int x) {
auto xormask = (table >> ((x&3) * 2)) &3;
x ^= xormask;
return x;
}
This also uses the xor-trick from dyungwang to avoid isolating the top bits.
Another idea, to avoid stripping the top bits. Assume x has the bits XXXXAB, then we want to x-or it with 0000(A^B)(A^B). Thus
auto t = x^(x>>1); // Last bit is now A^B
t &=1; // take just that bit
t *= 3; // Put in the last two positions
x ^= t; // Change A to B and B to A.
Just looking from a mathematical point of view, I would start with a rotate_left() function, which rotates a list of bits one place to the left (011 becomes 110, then 101, and then back 011), and use this as follows:
int func(int input){
return rotate_left(rotate_left((input / 4))) + rotate_left(input % 4);
}
Using this on the author's example 11010111101110100101:
input = 11010111101110100101;
input / 4 = 110101111011101001;
rotate_left(input / 4) = 1101011110111010010;
rotate_left(rotate_left(input / 4) = 11010111101110100100;
input % 4 = 01;
rotate_left(input % 4) = 10;
return 11010111101110100110;
There is also a shift() function, which can be used (twice!) for replacing the integer division.

Fastest Way to XOR all bits from value based on bitmask?

I've got an interesting problem that has me looking for a more efficient way of doing things.
Let's say we have a value (in binary)
(VALUE) 10110001
(MASK) 00110010
----------------
(AND) 00110000
Now, I need to be able to XOR any bits from the (AND) value that are set in the (MASK) value (always lowest to highest bit):
(RESULT) AND1(0) xor AND4(1) xor AND5(1) = 0
Now, on paper, this is certainly quick since I can see which bits are set in the mask. It seems to me that programmatically I would need to keep right shifting the MASK until I found a set bit, XOR it with a separate value, and loop until the entire byte is complete.
Can anyone think of a faster way? I'm looking for the way to do this with the least number of operations and stored values.
If I understood this question correctly, what you want is to get every bit from VALUE that is set in the MASK, and compute the XOR of those bits.
First of all, note that XOR'ing a value with 0 will not change the result. So, to ignore some bits, we can treat them as zeros.
So, XORing the bits set in VALUE that are in MASK is equivalent to XORing the bits in VALUE&MASK.
Now note that the result is 0 if the number of set bits is even, 1 if it is odd.
That means we want to count the number of set bits. Some architectures/compilers have ways to quickly compute this value. For instance, on GCC this can be obtained with __builtin_popcount.
So on GCC, this can be computed with:
int set_bits = __builtin_popcount(value & mask);
return set_bits % 2;
If you want the code to be portable, then this won't do. However, a comment in this answer suggests that some compilers can inline std::bitset::count to efficiently obtain the same result.
If I'm understanding you right, you have
result = value & mask
and you want to XOR the 1 bits of mask & result together. The XOR of a series of bits is the same as counting the number of bits and checking if that count is even or odd. If it's odd, the XOR would be 1; if even, XOR would give 0.
count_bits(mask & result) % 2 != 0
mask & result can be simplified to simply result. You don't need to AND it with mask again. The % 2 != 0 can be alternately written as & 1.
count_bits(result) & 1
As far as how to count bits, the Bit Twiddling Hacks web page gives a number of bit counting algorithms.
Counting bits set, Brian Kernighan's way
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}
Brian Kernighan's method goes through as many iterations as there are
set bits. So if we have a 32-bit word with only the high bit set, then
it will only go once through the loop.
If you were to use that implementation, you could optimize it a bit further. If you think about it, you don't need the full count of bits. You only need to track their parity. Instead of counting bits you could just flip c each iteration.
unsigned bit_parity(unsigned v) {
unsigned c;
for (c = 0; v; c ^= 1) {
v &= v - 1;
}
}
(Thanks to Slava for the suggestion.)
Using that the XOR with 0 doesn't change anything, it's OK to apply the mask and then unconditionally XOR all bits together, which can be done in a parallel-prefix way. So something like this (not tested):
x = m & v;
x ^= x >> 16;
x ^= x >> 8;
x ^= x >> 4;
x ^= x >> 2;
x ^= x >> 1;
result = x & 1
You can use more (or fewer) steps as needed, this is for 32 bits.
One significant issue to be aware of if using v &= v - 1 in the main body of your code is it will change the value of v to 0 in conducting the count. With other methods, the value is changed to the number of 1's. While count logic is generally wrapped as a function, where that is no longer a concern, if you are required to present your counting logic in the main body of your code, you must preserve a copy of v if that value is needed again.
In addition to the other two methods presented, the following is another favorite from bit-twiddling hacks that generally has a bit better performance than the loop method for larger numbers:
/* get the population 1's in the binary representation of a number */
unsigned getn1s (unsigned int v)
{
v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
v = (v + (v >> 4)) & 0x0F0F0F0F;
v = v + (v << 8);
v = v + (v << 16);
return v >> 24;
}

8-digit BCD check

I've a 8-digit BCD number and need to check it out to see if it is a valid BCD number. How can I programmatically (C/C++) make this?
Ex: 0x12345678 is valid, but 0x00f00abc isn't.
Thanks in advance!
You need to check each 4-bit quantity to make sure it's less than 10. For efficiency you want to work on as many bits as you can at a single time.
Here I break the digits apart to leave a zero between each one, then add 6 to each and check for overflow.
uint32_t highs = (value & 0xf0f0f0f0) >> 4;
uint32_t lows = value & 0x0f0f0f0f;
bool invalid = (((highs + 0x06060606) | (lows + 0x06060606)) & 0xf0f0f0f0) != 0;
Edit: actually we can do slightly better. It doesn't take 4 bits to detect overflow, only 1. If we divide all the digits by 2, it frees a bit and we can check all the digits at once.
uint32_t halfdigits = (value >> 1) & 0x77777777;
bool invalid = ((halfdigits + 0x33333333) & 0x88888888) != 0;
The obvious way to do this is:
/* returns 1 if x is valid BCD */
int
isvalidbcd (uint32_t x)
{
for (; x; x = x>>4)
{
if ((x & 0xf) >= 0xa)
return 0;
}
return 1;
}
This link tells you all about BCD, and recommends something like this asa more optimised solution (reworking to check all the digits, and hence using a 64 bit data type, and untested):
/* returns 1 if x is valid BCD */
int
isvalidbcd (uint32_t x)
{
return !!(((uint64_t)x + 0x66666666ULL) ^ (uint64_t)x) & 0x111111110ULL;
}
For a digit to be invalid, it needs to be 10-15. That in turn means 8 + 4 or 8+2 - the low bit doesn't matter at all.
So:
long mask8 = value & 0x88888888;
long mask4 = value & 0x44444444;
long mask2 = value & 0x22222222;
return ((mask8 >> 2) & ((mask4 >>1) | mask2) == 0;
Slightly less obvious:
long mask8 = (value>>2);
long mask42 = (value | (value>>1);
return (mask8 & mask42 & 0x22222222) == 0;
By shifting before masking, we don't need 3 different masks.
Inspired by #Mark Ransom
bool invalid = (0x88888888 & (((value & 0xEEEEEEEE) >> 1) + (0x66666666 >> 1))) != 0;
// or
bool valid = !((((value & 0xEEEEEEEEu) >> 1) + 0x33333333) & 0x88888888);
Mask off each BCD digit's 1's place, shift right, then add 6 and check for BCD digit overflow.
How this works:
By adding +6 to each digit, we look for an overflow * of the 4-digit sum.
abcd
+ 110
-----
*efgd
But the bit value of d does not contribute to the sum, so first mask off that bit and shift right. Now the overflow bit is in the 8's place. This all is done in parallel and we mask these carry bits with 0x88888888 and test if any are set.
0abc
+ 11
-----
*efg

Calculate how many ones in bits and bits inverting [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How many 1s in an n-bit integer?
Hello
How to calculate how many ones in bits?
1100110 -> 4
101 -> 2
And second question:
How to invert bits?
1100110 -> 0011001
101 -> 010
Thanks
If you can get your bits into a std::bitset, you can use the flip method to invert, and the count method to count the bits.
The book Hacker's Delight by Henry S Warren Jr. contains lots of useful little gems on computing this sort of thing - and lots else besides. Everyone who does low level bit twiddling should have a copy :)
The counting-1s section is 8 pages long!
One of them is:
int pop(unsigned x)
{
x = x - ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
x = (x + (x >> 4)) & 0x0F0F0F0F;
x = x + (x >> 8);
x = x + (x >> 16);
return x & 0x0000003F;
}
A potentially critical advantage compared to the looping options already presented is that the runtime is not variable. If it's inside a hard-real-time interrupt service routine this is much more important than "fastest-average-computation" time.
There's also a long thread on bit counting here:
How to count the number of set bits in a 32-bit integer?
You can loop while the number is non-zero, and increment a counter when the last bit is set. Or if you are working on Intel architecture, you can use the popcnt instruction in inline assembly.
int count_bit_set(unsigned int x) {
int count = 0;
while (x != 0) {
count += (x & 1);
x = x >> 1;
}
return count;
}
You use the ~ operator.
Counting bits: http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
Inverting bits: x = ~x;
For the first question, Fast Bit Counting has a few ways of doing it, the simplest being:
int bitcount (unsigned int n) {
int count = 0;
while (n) {
count += n & 0x1u;
n >>= 1;
}
return count;
}
For the second question, use the ´~´ (bitwise negation) operator.
To count the number of set bits in a number you can use the hakmem parallel counting which is the fastest approach not using predefined tables for parallel counting:
http://tekpool.wordpress.com/2006/09/25/bit-count-parallel-counting-mit-hakmem/
while inverting bits is really easy:
i = ~i;
A somewhat trikcy (but faster) solution would be:
int setbitcount( unsigned int x )
{
int result;
for( result=0; x; x&=x-1, ++result )
;
return result;
}
Compared to sylvain's soultion, this function iterates in the loop only the number of set bits. That is: for the number 1100110, it will do only 4 iteration (compared to 32 in Sylvain's algorithm).
The key is the expression x&=x-1, which will clear the least significant set bit. i.e.:
1) 1100110 & 1100101 = 1100100
2) 1100100 & 1100011 = 1100000
3) 1100000 & 1011111 = 1000000
4) 1000000 & 0111111 = 0
You can also inverse bits by XOR'ing them with some number. For example - inversing byte:
INVERTED_BYTE = BYTE ^ 0xFF
How to calculate how many ones in bits?
Hamming weight.
How to invert bits?
i = ~i;

Fastest way to convert unsigned char 8 bits to actual numbers

I am using an unsigned char to store 8 flags. Each flag represents the corner of a cube. So 00000001 will be corner 1 01000100 will be corners 3 and 7 etc. My current solution is to & the result with 1,2,4,8,16,32,64 and 128, check whether the result is not zero and store the corner. That is, if (result & 1) corners.push_back(1);. Any chance I can get rid of that 'if' statement? I was hoping I could get rid of it with bitwise operators but I could not think of any.
A little background on why I want to get rid of the if statement. This cube is actually a Voxel which is part of a grid that is at least 512x512x512 in size. That is more than 134 million Voxels. I am performing calculations on each one of the Voxels (well, not exactly, but I won't go into too much detail as it is irrelevant here) and that is a lot of calculations. And I need to perform these calculations per frame. Any speed boost that is minuscule per function call will help with these amount of calculations. To give you an idea, my algorithm (at some point) needed to determine whether a float was negative, positive or zero (within some error). I had if statements in there and greater/smaller than checks. I replaced that with a fast float to int function and shaved of a quarter of a second. Currently, each frame in a 128x128x128 grid takes a little more than 4 seconds.
I would consider a different approach to it entirely: there are only 256 possibilities for different combinations of flags. Precalculate 256 vectors and index into them as needed.
std::vector<std::vector<int> > corners(256);
for (int i = 0; i < 256; ++i) {
std::vector<int>& v = corners[i];
if (i & 1) v.push_back(1);
if (i & 2) v.push_back(2);
if (i & 4) v.push_back(4);
if (i & 8) v.push_back(8);
if (i & 16) v.push_back(16);
if (i & 32) v.push_back(32);
if (i & 64) v.push_back(64);
if (i & 128) v.push_back(128);
}
for (int i = 0; i < NumVoxels(); ++i) {
unsigned char flags = GetFlags(i);
const std::vector& v = corners[flags];
... // do whatever with v
}
This would avoid all the conditionals and having push_back call new which I suspect would be more expensive anyway.
If there's some operation that needs to be done if the bit is set and not if it's not, it seems you'll have to have a conditional of some kind somewhere. If it could be expressed as a calculation somehow, you could get around it like this, for example:
numCorners = ((result >> 0) & 1) + ((result >> 1) & 1) + ((result >> 2) & 1) + ...
Hackers's Delight, first page:
x & (-x) // isolates the lowest set bit
x & (x - 1) // clears the lowest set bit
Inlining your push_back method would also help (better create a function that receives all the flags together).
Usually if you need performance, you should design the whole system with that in mind. Maybe if you post more code it will be easier to help.
EDIT: here is a nice idea:
unsigned char LOG2_LUT[256] = {...};
int t;
switch (count_set_bits(flags)){
case 8: t = flags;
flags &= (flags - 1); // clearing a bit that was set
t ^= flags; // getting the changed bit
corners.push_back(LOG2_LUT[t]);
case 7: t = flags;
flags &= (flags - 1);
t ^= flags;
corners.push_back(LOG2_LUT[t]);
case 6: t = flags;
flags &= (flags - 1);
t ^= flags;
corners.push_back(LOG2_LUT[t]);
// etc...
};
count_set_bits() is a very known function: http://www-graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
There is a way, it's not "pretty", but it works.
(result & 1) && corners.push_back(1);
(result & 2) && corners.push_back(2);
(result & 4) && corners.push_back(3);
(result & 8) && corners.push_back(4);
(result & 16) && corners.push_back(5);
(result & 32) && corners.push_back(6);
(result & 64) && corners.push_back(7);
(result & 128) && corners.push_back(8);
it uses a seldom known feature of the C++ language: the boolean shortcut.
I've noted a similar algorithm in the OpenTTD code. It turned out to be utterly useless: you're faster off by not breaking down numbers like that. Instead, replace the iteration over the vector<> you have now by an iteration over the bits of the byte. This is far more cache-friendly.
I.e.
unsigned char flags = Foo(); // the value you didn't put in a vector<>
for (unsigned char c = (UCHAR_MAX >> 1) + 1; c !=0 ; c >>= 1)
{
if (flags & c)
Bar(flags&c);
}