How to replace manual bit manipulation with bit_permute_step

How can I replace the following manual logic with bit_permute_step (http://programming.sirrida.de/perm_fn.html#bit_permute_step)?
unsigned int ConvertRGBAToBGRA(unsigned int v) {
    unsigned char r = v & 0xFF;
    unsigned char g = (v >> 8) & 0xFF;
    unsigned char b = (v >> 16) & 0xFF;
    unsigned char a = (v >> 24) & 0xFF;
    return (a << 24) | (r << 16) | (g << 8) | b;
}
Is there a better way to do the above using http://programming.sirrida.de/perm_fn.html#bit_permute_step ?

Yes, namely:
return bit_permute_step(v, 0x000000ff, 16);
The bits of v selected by the mask 0x000000ff are the r component; bit_permute_step exchanges them with the corresponding bits 16 places to the left (the distance parameter), which is the b component of v. So bit_permute_step(v, 0x000000ff, 16) swaps r with b and hence turns RGBA into BGRA (and also BGRA into RGBA, because a swap is its own inverse).
This could also be found via the permutation calculator: http://programming.sirrida.de/calcperm.php
Use the indices 16 17 18 19 20 21 22 23 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 24 25 26 27 28 29 30 31 (source indices) and disable bit-group moving.
A C++ implementation (also usable as C code) of bit_permute_step for 32-bit integers could be:
uint32_t bit_permute_step(uint32_t x, uint32_t m, uint32_t shift) {
    uint32_t t;
    t = ((x >> shift) ^ x) & m;    // bits (within the mask) that differ between the two fields
    x = (x ^ t) ^ (t << shift);    // flip those bits in both fields, i.e. swap them
    return x;
}
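For illustration, here is a minimal self-contained sketch (my wrapper, not part of the original question) of the conversion rewritten in terms of bit_permute_step:

#include <cstdint>
#include <cstdio>

static uint32_t bit_permute_step(uint32_t x, uint32_t m, uint32_t shift) {
    uint32_t t = ((x >> shift) ^ x) & m;
    return (x ^ t) ^ (t << shift);
}

// Swap the byte holding r (bits 0..7) with the byte holding b (bits 16..23).
static uint32_t ConvertRGBAToBGRA(uint32_t v) {
    return bit_permute_step(v, 0x000000ff, 16);
}

int main() {
    uint32_t v = 0xAABBCCDDu;   // a=0xAA, b=0xBB, g=0xCC, r=0xDD as packed by the original code
    std::printf("%08X\n", (unsigned)ConvertRGBAToBGRA(v));   // prints AADDCCBB
}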

Related

C++: Bitshift 4 int8_t into a normal integer (32 bit)

I had already asked a question about how to get 4 int8_t values into a 32-bit int; I was told that I have to cast each int8_t to uint8_t first in order to pack them into a 32-bit integer with bit shifts.
int8_t offsetX = -10;
int8_t offsetY = 120;
int8_t offsetZ = -60;
using U = std::uint8_t;
int toShader = (U(offsetX) << 24) | (U(offsetY) << 16) | (U(offsetZ) << 8) | (0 << 0);
std::cout << (int)(toShader >> 24) << " " << (int)(toShader >> 16) << " " << (int)(toShader >> 8) << std::endl;
My Output is
-10 -2440 -624444
It's not what I expected, of course. Does anyone have a solution?
In the shader I want to unpack the int16 later, and that is only possible with a 32-bit integer because GLSL does not have any other data types.
int offsetX = data[gl_InstanceID * 3 + 2] >> 24;
int offsetY = data[gl_InstanceID * 3 + 2] >> 16 ;
int offsetZ = data[gl_InstanceID * 3 + 2] >> 8 ;
What is written inside the square brackets does not matter; this is about shifting the bits correctly, or casting, after the bracket.
If any of the offsets is negative, then the shift results in undefined behaviour.
Solution: Convert the offsets to an unsigned type first.
However, this brings another potential problem: if you convert to unsigned, negative numbers become very large values with set bits in the most significant bytes, and ORing those bits in will clobber the bytes holding offsetX and offsetY. One solution is to convert to a small unsigned type (std::uint8_t); another is to mask off the unused bytes. The former is probably simpler:
using U = std::uint8_t;
int third = U(offsetX) << 24u
          | U(offsetY) << 16u
          | U(offsetZ) << 8u
          | 0u << 0u;
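For completeness, a minimal sketch of the masking alternative mentioned above (the function name pack and the use of unsigned here are my own, not from the original answer):

#include <cstdint>

int pack(std::int8_t offsetX, std::int8_t offsetY, std::int8_t offsetZ) {
    // Convert to a wide unsigned type, then mask off everything except the
    // low byte so the sign-extended high bits cannot leak into other bytes.
    unsigned ux = static_cast<unsigned>(offsetX) & 0xFFu;
    unsigned uy = static_cast<unsigned>(offsetY) & 0xFFu;
    unsigned uz = static_cast<unsigned>(offsetZ) & 0xFFu;
    // Converting back to int relies on the usual two's-complement wrap-around
    // when the top bit is set (implementation-defined before C++20).
    return static_cast<int>(ux << 24 | uy << 16 | uz << 8);
}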
I think you're forgetting to mask the bits that you care about before shifting them.
Perhaps this is what you're looking for:
int32 offsetX = (data[gl_InstanceID * 3 + 2] & 0xFF000000) >> 24;
int32 offsetY = (data[gl_InstanceID * 3 + 2] & 0x00FF0000) >> 16 ;
int32 offsetZ = (data[gl_InstanceID * 3 + 2] & 0x0000FF00) >> 8 ;
if (offsetX & 0x80) offsetX |= 0xFFFFFF00;
if (offsetY & 0x80) offsetY |= 0xFFFFFF00;
if (offsetZ & 0x80) offsetZ |= 0xFFFFFF00;
Without the bit mask, the X part will end up in offsetY, and the X and Y part in offsetZ.
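A quick round-trip check of that unpacking logic on the CPU side (a sketch of mine: int32 is spelled std::int32_t, and data[...] is replaced by a single packed value):

#include <cassert>
#include <cstdint>

int main() {
    std::int8_t offsetX = -10, offsetY = 120, offsetZ = -60;

    // Pack via uint8_t, as suggested above.
    std::uint32_t packed = std::uint32_t(std::uint8_t(offsetX)) << 24
                         | std::uint32_t(std::uint8_t(offsetY)) << 16
                         | std::uint32_t(std::uint8_t(offsetZ)) << 8;

    // Unpack: mask the byte, shift it down, then sign-extend manually
    // (two's-complement representation assumed).
    std::int32_t x = (packed & 0xFF000000u) >> 24;
    std::int32_t y = (packed & 0x00FF0000u) >> 16;
    std::int32_t z = (packed & 0x0000FF00u) >> 8;
    if (x & 0x80) x |= 0xFFFFFF00;
    if (y & 0x80) y |= 0xFFFFFF00;
    if (z & 0x80) z |= 0xFFFFFF00;

    assert(x == -10 && y == 120 && z == -60);
}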
On the CPU side you can use a union to avoid bit shifts, bit masking and branches ...
int8_t x, y, z, w; // your 8-bit ints
int32_t i;         // your 32-bit int
union my_union     // just a helper union for the casting
{
    int8_t i8[4];
    int32_t i32;
} a;
// 4 x 8 bit -> 32 bit
a.i8[0] = x;
a.i8[1] = y;
a.i8[2] = z;
a.i8[3] = w;
i = a.i32;
// 32 bit -> 4 x 8 bit
a.i32 = i;
x = a.i8[0];
y = a.i8[1];
z = a.i8[2];
w = a.i8[3];
If you do not like unions the same can be done with pointers...
Beware: on the GLSL side this is not possible (there are neither unions nor pointers), and you have to use bit shifts and masks as in the other answers ...
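If you are worried about the type-punning rules in C++ (the union trick is valid in C, but formally undefined behaviour in C++), a memcpy-based sketch of mine does the same thing, and compilers typically optimize it down to the same code:

#include <cstdint>
#include <cstring>

std::int32_t pack4(std::int8_t x, std::int8_t y, std::int8_t z, std::int8_t w) {
    std::int8_t bytes[4] = { x, y, z, w };
    std::int32_t out;
    std::memcpy(&out, bytes, sizeof out);   // byte order follows the platform's endianness
    return out;
}

void unpack4(std::int32_t in, std::int8_t& x, std::int8_t& y, std::int8_t& z, std::int8_t& w) {
    std::int8_t bytes[4];
    std::memcpy(bytes, &in, sizeof bytes);
    x = bytes[0]; y = bytes[1]; z = bytes[2]; w = bytes[3];
}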

Generalizing binary left shift for octal representation without conversion

Currently I have a few lines of code for working with binary strings in their decimal (integer) representation; namely, I have functions to rotate the binary string to the left, flip a specific bit, flip all bits, and reverse the order of the binary string, all working on the decimal representation. They are defined as follows:
inline u64 rotate_left(u64 n, u64 maxPower) {
    return (n >= maxPower) ? (((int64_t)n - (int64_t)maxPower) * 2 + 1) : n * 2;
}
inline bool checkBit(u64 n, int k) {
    return n & (1ULL << k);
}
inline u64 flip(u64 n, u64 maxBinaryNum) {
    return maxBinaryNum - n - 1;
}
inline u64 flip(u64 n, u64 kthPower, int k) {
    return checkBit(n, k) ? (int64_t(n) - (int64_t)kthPower) : (n + kthPower);
}
inline u64 reverseBits(u64 n, int L) {
    u64 rev = (lookup[n & 0xffULL] << 56) |         // consider the first 8 bits
              (lookup[(n >> 8) & 0xffULL] << 48) |  // consider the next 8 bits
              (lookup[(n >> 16) & 0xffULL] << 40) | // consider the next 8 bits
              (lookup[(n >> 24) & 0xffULL] << 32) | // consider the next 8 bits
              (lookup[(n >> 32) & 0xffULL] << 24) | // consider the next 8 bits
              (lookup[(n >> 40) & 0xffULL] << 16) | // consider the next 8 bits
              (lookup[(n >> 48) & 0xffULL] << 8) |  // consider the next 8 bits
              (lookup[(n >> 56) & 0xffULL]);        // consider the last 8 bits
    return (rev >> (64 - L)); // get back to the original maximal number
}
With the lookup[] table defined as:
#define R2(n) n, n + 2*64, n + 1*64, n + 3*64
#define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
#define R6(n) R4(n), R4(n + 2*4 ), R4(n + 1*4 ), R4(n + 3*4 )
#define REVERSE_BITS R6(0), R6(2), R6(1), R6(3)
const u64 lookup[256] = { REVERSE_BITS };
All but the last one are easy to implement.
My question is whether you know of any generalization of the above functions for the octal string of a number, while still working only on the decimal representation as above, and obviously without doing a conversion and storing the octal string itself (mainly for the performance boost).
With flip() in octal, one would need to return the number with 8-x at the specified place in the string (for instance: flip(2576, 2nd power, 2nd position) = 2376, i.e. 3 = 8-5).
I do understand that in octal representation similar formulas to those for rotate_left or flip may not be possible (maybe?), which is why I am looking for an alternative implementation.
A possibility would be to represent each digit of the octal string by its binary string, in other words to write: 29 --octal-> 35 --bin-> (011)(101), thus working on groups of binary digits. Would that be a good idea?
If you have any suggestions for the code above for binary representation, I welcome any piece of advice.
Thanks in advance and sorry for the long post!
Here is my understanding of rotate_left; I am not sure whether my understanding of the question is correct, but I hope this will help you.
// maxPower: 8
// n < maxPower:
// 0001 -> 0010
//
// n >= maxPower
// n: 1011
// n - maxPower: 0011
// (n - maxPower) * 2: 0110
// (n - maxPower) * 2 + 1: 0111
inline u64 rotate_left(u64 n, u64 maxPower) {
    return (n >= maxPower) ? (((int64_t)n - (int64_t)maxPower) * 2 + 1) : n * 2;
}
// so rotate_left for octal, example: 3-digit octal rotate left.
// 0 1 1 -> 1 1 0
// 000 001 001 -> 001 001 000
// 4 4 0 -> 4 0 4
// 100 100 000 -> 100 000 100
// so, keep:
// the first (most significant) digit of the octal number is:
//     first_digit = n & (7 << ((digit-1) * 3))
// the other digits of the octal number are:
//     other_digit = n - first_digit
// example for 100 100 000:
//     first_digit is 100 000 000
//     other_digit is 000 100 000
// so the rotate-left result is:
//     (other_digit << 3) | (first_digit >> ((digit-1) * 3))
//
inline u64 rotate_left_oct(u64 n, u64 digit) {
    u64 rotate = 3 * (digit - 1);
    u64 first_digit = n & (7ULL << rotate);
    u64 other_digit = n - first_digit;
    return (other_digit << 3) | (first_digit >> rotate);
}
flip: for base 8, flip should be 7-x instead of 8-x:
// octal flip is the same as binary flip:
// (111)8 -> (001 001 001)2
// flip:
// (666)8 -> (110 110 110)2
// so this should be 7 - 1, not 8 - 1, indeed.
//
inline u64 flip_oct(u64 n, u64 digit) {
    u64 maxNumber = (1ULL << (3 * digit)) - 1;
    assert(n <= maxNumber);
    return maxNumber - n;
}
// octal flip of one digit:
// (111)8 -> (001 001 001)2
// flip the 2nd digit of it:
// (161)8 -> (001 110 001)2
// just XOR the nth digit of the octal number with 7.
//
inline u64 flip_oct(u64 n, u64 nth, u64 digit) {
    return (7ULL << (3 * (nth - 1))) ^ n;
}
Simple reverse:
inline u64 reverse_oct(u64 n, u64 digit) {
    u64 m = 0;
    while (digit > 0) {
        m = (m << 3) | (n & 7);
        n = n >> 3;
        --digit;
    }
    return m;
}
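A quick self-contained sketch of mine exercising the same ideas on small octal literals (u64 is assumed to be std::uint64_t, and the two flip_oct overloads are renamed flip_all_oct / flip_one_oct to keep the example compact):

#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

inline u64 rotate_left_oct(u64 n, u64 digit) {
    u64 rotate = 3 * (digit - 1);
    u64 first_digit = n & (7ULL << rotate);
    return ((n - first_digit) << 3) | (first_digit >> rotate);
}

inline u64 flip_all_oct(u64 n, u64 digit) {            // every digit x becomes 7 - x
    return ((1ULL << (3 * digit)) - 1) - n;
}

inline u64 flip_one_oct(u64 n, u64 nth) {              // only the nth digit becomes 7 - x
    return n ^ (7ULL << (3 * (nth - 1)));
}

inline u64 reverse_oct(u64 n, u64 digit) {
    u64 m = 0;
    for (; digit > 0; --digit) { m = (m << 3) | (n & 7); n >>= 3; }
    return m;
}

int main() {
    assert(rotate_left_oct(00440, 4) == 04400);  // octal 0440 -> 4400
    assert(flip_all_oct(0111, 3) == 0666);       // each digit flipped to 7 - x
    assert(flip_one_oct(0111, 2) == 0161);       // only the 2nd digit flipped
    assert(reverse_oct(0123, 3) == 0321);        // digit order reversed
}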

Bit Reversal - not clear what the output is

I am reading the following example :
Var1(REG1, 0U, 16U);
Var2(REG2, 0U, 8U);
UINT32 FirstReg = Getaddress1(Var1); //the dimension is 16 bit
FirstReg = ((FirstReg >> 1) & 0x5555) | ((FirstReg << 1) & 0xaaaa);
FirstReg = ((FirstReg >> 2) & 0x3333) | ((FirstReg << 2) & 0xcccc);
FirstReg = ((FirstReg >> 4) & 0x0f0f) | ((FirstReg << 4) & 0xf0f0);
FirstReg = ((FirstReg >> 8) & 0x00ff) | ((FirstReg << 8) & 0xff00);
FirstReg = (FirstReg << 8);
UINT32 SecondReg = Getaddress2(Var2);//the dimension is 8 bit
SecondReg = ((SecondReg >> 1) & 0x5555) | ((SecondReg << 1) & 0xaaaa);
SecondReg = ((SecondReg >> 2) & 0x3333) | ((SecondReg << 2) & 0xcccc);
SecondReg = ((SecondReg >> 4) & 0x0f0f) | ((SecondReg << 4) & 0xf0f0);
SecondReg = ((SecondReg >> 8) & 0x00ff) | ((SecondReg << 8) & 0xff00);
SecondReg = (SecondReg >> 8);
return (FirstReg | SecondReg);
Basically, as far as I understand, the intention is to reverse the bits read into the two UINT32 register variables and to collect them into a single variable of UINT32 type.
I don't get whether the first bit (for example) of SecondReg will become the 17th bit of the returned variable or the first one.
First, even though the algorithm works with 32-bit integers, only the 16 least significant bits are used, because they are ANDed with 16-bit masks.
So after the first part (before the last shift), FirstReg and SecondReg contain the 16 least significant bits of the original values, reversed.
Then FirstReg is shifted left by 8 bits, SecondReg is shifted right by 8 bits, and the two are ORed together. The result is a 32-bit value composed of (most significant byte to least): 0, the high-order byte of the reversed FirstReg, the low-order byte of the reversed FirstReg, the high-order byte of the reversed SecondReg.
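To make the layout concrete, here is a small self-contained sketch of mine that reproduces only the shifting logic shown above, with Getaddress1/Getaddress2 replaced by constants that have just one bit set:

#include <cstdint>
#include <cstdio>

static std::uint32_t reverse16(std::uint32_t r) {
    r = ((r >> 1) & 0x5555) | ((r << 1) & 0xaaaa);
    r = ((r >> 2) & 0x3333) | ((r << 2) & 0xcccc);
    r = ((r >> 4) & 0x0f0f) | ((r << 4) & 0xf0f0);
    r = ((r >> 8) & 0x00ff) | ((r << 8) & 0xff00);
    return r;
}

int main() {
    std::uint32_t FirstReg  = 0x0001;   // only bit 0 set (16-bit register value)
    std::uint32_t SecondReg = 0x0001;   // only bit 0 set (8/16-bit register value)

    std::uint32_t result = (reverse16(FirstReg) << 8) | (reverse16(SecondReg) >> 8);

    // Bit 0 of FirstReg becomes bit 15 after reversal, then bit 23 after << 8.
    // Bit 0 of SecondReg becomes bit 15 after reversal, then bit 7 after >> 8.
    std::printf("%08X\n", result);      // prints 00800080
}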

Extract n most significant non-zero bits from int in C++ without loops

I want to extract the n most significant bits from an integer in C++ and convert those n bits to an integer.
For example
int a=1200;
// its binary representation within 32 bit word-size is
// 00000000000000000000010010110000
Now I want to extract the 4 most significant digits of that representation, starting at the most significant non-zero bit, i.e. 1001:
00000000000000000000010010110000
                     ^^^^
and convert them again to an integer (binary 1001 = 9 in decimal).
How is this possible with a simple C++ function, without loops?
Some processors have an instruction to count the leading binary zeros of an integer, and some compilers have intrinsics to allow you to use that instruction. For example, using GCC:
uint32_t significant_bits(uint32_t value, unsigned bits) {
    unsigned leading_zeros = __builtin_clz(value);
    unsigned highest_bit = 32 - leading_zeros;
    unsigned lowest_bit = highest_bit - bits;
    return value >> lowest_bit;
}
For simplicity, I left out the check that the requested number of bits is actually available. For Microsoft's compiler, the intrinsic is called __lzcnt.
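A quick check of the function above on the question's example (a sketch assuming GCC or Clang for __builtin_clz):

#include <cstdint>
#include <cstdio>

// As above: shift the value right so that only its `bits` most significant set bits remain.
static std::uint32_t significant_bits(std::uint32_t value, unsigned bits) {
    unsigned leading_zeros = __builtin_clz(value);   // undefined for value == 0
    unsigned highest_bit = 32 - leading_zeros;
    unsigned lowest_bit = highest_bit - bits;
    return value >> lowest_bit;
}

int main() {
    std::printf("%u\n", significant_bits(1200, 4));  // prints 9 (binary 1001)
}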
If your compiler doesn't provide that intrinsic, and your processor doesn't have a suitable instruction, then one way to count the zeros quickly is with a binary search:
unsigned leading_zeros(uint32_t value) {   // assumes value != 0
    unsigned count = 0;
    if ((value & 0xffff0000u) == 0) {
        count += 16;
        value <<= 16;
    }
    if ((value & 0xff000000u) == 0) {
        count += 8;
        value <<= 8;
    }
    if ((value & 0xf0000000u) == 0) {
        count += 4;
        value <<= 4;
    }
    if ((value & 0xc0000000u) == 0) {
        count += 2;
        value <<= 2;
    }
    if ((value & 0x80000000u) == 0) {
        count += 1;
    }
    return count;
}
It's not fast, but (int)(log(x)/log(2) + .5) + 1 will tell you the position of the most significant non-zero bit. Finishing the algorithm from there is fairly straightforward.
This seems to work (done in C# with UInt32 then ported so apologies to Bjarne):
unsigned int input = 1200;
unsigned int most_significant_bits_to_get = 4;
// shift + or the msb over all the lower bits
unsigned int m1 = input | input >> 8 | input >> 16 | input >> 24;
unsigned int m2 = m1 | m1 >> 2 | m1 >> 4 | m1 >> 6;
unsigned int m3 = m2 | m2 >> 1;
unsigned int nbitsmask = m3 ^ m3 >> most_significant_bits_to_get;
unsigned int v = nbitsmask;
unsigned int c = 32; // c will be the number of zero bits on the right
v &= -((int)v);
if (v>0) c--;
if ((v & 0x0000FFFF) >0) c -= 16;
if ((v & 0x00FF00FF) >0) c -= 8;
if ((v & 0x0F0F0F0F) >0 ) c -= 4;
if ((v & 0x33333333) >0) c -= 2;
if ((v & 0x55555555) >0) c -= 1;
unsigned int result = (input & nbitsmask) >> c;
I assumed you meant using only integer math.
I used some code from the link in @OliCharlesworth's comment; you could remove the conditionals too by using the LUT-based count-trailing-zeroes code there.
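Wrapped into a function and checked against the question's example (a sketch of mine; the variable names follow the code above):

#include <cstdio>

// Top-n-bits extraction without intrinsics, as in the answer above.
unsigned int top_bits(unsigned int input, unsigned int n) {
    unsigned int m1 = input | input >> 8 | input >> 16 | input >> 24;
    unsigned int m2 = m1 | m1 >> 2 | m1 >> 4 | m1 >> 6;
    unsigned int m3 = m2 | m2 >> 1;                 // smear the MSB downwards
    unsigned int nbitsmask = m3 ^ m3 >> n;          // mask of the n highest set bits

    unsigned int v = nbitsmask;
    unsigned int c = 32;                            // trailing zero count of nbitsmask
    v &= 0u - v;                                    // isolate the lowest set bit
    if (v) c--;
    if (v & 0x0000FFFF) c -= 16;
    if (v & 0x00FF00FF) c -= 8;
    if (v & 0x0F0F0F0F) c -= 4;
    if (v & 0x33333333) c -= 2;
    if (v & 0x55555555) c -= 1;

    return (input & nbitsmask) >> c;
}

int main() {
    std::printf("%u\n", top_bits(1200, 4));         // prints 9
}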

Given 2 16-bit ints, can I interleave those bits to form a single 32 bit int?

What's the proper way to go about this? Let's say I have ABCD and abcd; the output bits should be something like AaBbCcDd.
unsigned int JoinBits(unsigned short a, unsigned short b) { }
#include <stdint.h>
uint32_t JoinBits(uint16_t a, uint16_t b) {
    uint32_t result = 0;
    for (int8_t ii = 15; ii >= 0; ii--) {
        result |= (a >> ii) & 1;
        result <<= 1;
        result |= (b >> ii) & 1;
        if (ii != 0) {
            result <<= 1;
        }
    }
    return result;
}
also tested on ideone here: http://ideone.com/lXTqB.
First, spread your bits:
unsigned int Spread(unsigned short x)
{
    unsigned int result = 0;
    for (unsigned int i = 0; i < 16; ++i)   // all 16 bits of x
        result |= ((x >> i) & 1) << (i * 2);
    return result;
}
Then merge the two with an offset in your function like this:
Spread(a) | (Spread(b)<<1);
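Put together, a minimal sketch of mine (here a lands in the even bit positions and b in the odd ones; swap the roles if you want the opposite ordering):

#include <cstdint>

static unsigned int Spread(unsigned short x) {
    unsigned int result = 0;
    for (unsigned int i = 0; i < 16; ++i)
        result |= ((x >> i) & 1u) << (i * 2);   // bit i of x goes to bit 2*i
    return result;
}

unsigned int JoinBits(unsigned short a, unsigned short b) {
    return Spread(a) | (Spread(b) << 1);        // a in even bits, b in odd bits
}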
If you want true bitwise interleaving, the simplest and most elegant way might be this:
unsigned int JoinBits(unsigned short a, unsigned short b)
{
    unsigned int r = 0;
    for (int i = 0; i < 16; i++)
        r |= ((a & (1u << i)) << i) | ((b & (1u << i)) << (i + 1));
    return r;
}
Without any math trick to exploit, my first naive solution would be to use a BitSet-like data structure to compute the output number bit by bit. This would take looping over lg(a) + lg(b) bits, which gives you the complexity.
Quite possible with some bit manipulation, but the exact code depends on the byte order of the platform. Assuming little-endian (which is the most common), you could do:
unsigned int JoinBits(unsigned short x, unsigned short y) {
    // x := AB-CD
    // y := ab-cd
    unsigned char bytes[4];
    /* Dd */ bytes[0] = ((x & 0x000F) << 4) | (y & 0x000F);
    /* Cc */ bytes[1] = (x & 0x00F0) | ((y & 0x00F0) >> 4);
    /* Bb */ bytes[2] = ((x & 0x0F00) >> 4) | ((y & 0x0F00) >> 8);
    /* Aa */ bytes[3] = ((x & 0xF000) >> 8) | ((y & 0xF000) >> 12);
    return *reinterpret_cast<unsigned int *>(bytes);
}
From Sean Anderson's website:
static const unsigned short MortonTable256[256] =
{
0x0000, 0x0001, 0x0004, 0x0005, 0x0010, 0x0011, 0x0014, 0x0015,
0x0040, 0x0041, 0x0044, 0x0045, 0x0050, 0x0051, 0x0054, 0x0055,
0x0100, 0x0101, 0x0104, 0x0105, 0x0110, 0x0111, 0x0114, 0x0115,
0x0140, 0x0141, 0x0144, 0x0145, 0x0150, 0x0151, 0x0154, 0x0155,
0x0400, 0x0401, 0x0404, 0x0405, 0x0410, 0x0411, 0x0414, 0x0415,
0x0440, 0x0441, 0x0444, 0x0445, 0x0450, 0x0451, 0x0454, 0x0455,
0x0500, 0x0501, 0x0504, 0x0505, 0x0510, 0x0511, 0x0514, 0x0515,
0x0540, 0x0541, 0x0544, 0x0545, 0x0550, 0x0551, 0x0554, 0x0555,
0x1000, 0x1001, 0x1004, 0x1005, 0x1010, 0x1011, 0x1014, 0x1015,
0x1040, 0x1041, 0x1044, 0x1045, 0x1050, 0x1051, 0x1054, 0x1055,
0x1100, 0x1101, 0x1104, 0x1105, 0x1110, 0x1111, 0x1114, 0x1115,
0x1140, 0x1141, 0x1144, 0x1145, 0x1150, 0x1151, 0x1154, 0x1155,
0x1400, 0x1401, 0x1404, 0x1405, 0x1410, 0x1411, 0x1414, 0x1415,
0x1440, 0x1441, 0x1444, 0x1445, 0x1450, 0x1451, 0x1454, 0x1455,
0x1500, 0x1501, 0x1504, 0x1505, 0x1510, 0x1511, 0x1514, 0x1515,
0x1540, 0x1541, 0x1544, 0x1545, 0x1550, 0x1551, 0x1554, 0x1555,
0x4000, 0x4001, 0x4004, 0x4005, 0x4010, 0x4011, 0x4014, 0x4015,
0x4040, 0x4041, 0x4044, 0x4045, 0x4050, 0x4051, 0x4054, 0x4055,
0x4100, 0x4101, 0x4104, 0x4105, 0x4110, 0x4111, 0x4114, 0x4115,
0x4140, 0x4141, 0x4144, 0x4145, 0x4150, 0x4151, 0x4154, 0x4155,
0x4400, 0x4401, 0x4404, 0x4405, 0x4410, 0x4411, 0x4414, 0x4415,
0x4440, 0x4441, 0x4444, 0x4445, 0x4450, 0x4451, 0x4454, 0x4455,
0x4500, 0x4501, 0x4504, 0x4505, 0x4510, 0x4511, 0x4514, 0x4515,
0x4540, 0x4541, 0x4544, 0x4545, 0x4550, 0x4551, 0x4554, 0x4555,
0x5000, 0x5001, 0x5004, 0x5005, 0x5010, 0x5011, 0x5014, 0x5015,
0x5040, 0x5041, 0x5044, 0x5045, 0x5050, 0x5051, 0x5054, 0x5055,
0x5100, 0x5101, 0x5104, 0x5105, 0x5110, 0x5111, 0x5114, 0x5115,
0x5140, 0x5141, 0x5144, 0x5145, 0x5150, 0x5151, 0x5154, 0x5155,
0x5400, 0x5401, 0x5404, 0x5405, 0x5410, 0x5411, 0x5414, 0x5415,
0x5440, 0x5441, 0x5444, 0x5445, 0x5450, 0x5451, 0x5454, 0x5455,
0x5500, 0x5501, 0x5504, 0x5505, 0x5510, 0x5511, 0x5514, 0x5515,
0x5540, 0x5541, 0x5544, 0x5545, 0x5550, 0x5551, 0x5554, 0x5555
};
unsigned short x; // Interleave bits of x and y, so that all of the
unsigned short y; // bits of x are in the even positions and y in the odd;
unsigned int z; // z gets the resulting 32-bit Morton Number.
z = MortonTable256[y >> 8] << 17 |
MortonTable256[x >> 8] << 16 |
MortonTable256[y & 0xFF] << 1 |
MortonTable256[x & 0xFF];
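Wrapped into the signature from the question (a sketch of mine; it assumes the MortonTable256 array above is in scope, and the casts just keep the shifts in unsigned arithmetic). With this table-based form, the bits of x end up in the even positions and the bits of y in the odd ones:

// Assumes the MortonTable256[256] array defined above is visible here.
unsigned int JoinBits(unsigned short x, unsigned short y) {
    return (unsigned int)MortonTable256[y >> 8] << 17
         | (unsigned int)MortonTable256[x >> 8] << 16
         | (unsigned int)MortonTable256[y & 0xFF] << 1
         | (unsigned int)MortonTable256[x & 0xFF];
}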