How to split an unsigned long int (32 bit) into 8 nibbles? - c++

I am sorry if my question is confusing, but here is an example of what I want to do.
Let's say I have an unsigned long int = 1265985549.
In binary I can write this as 01001011011101010110100000001101.
Now I want to split this 32-bit binary number into groups of 4 bits, like this, and work on each group separately:
0100 1011 0111 0101 0110 1000 0000 1101
Any help would be appreciated.

You can get a 4-bit nibble at position k using bit operations, like this:
uint32_t nibble(uint32_t val, int k) {
return (val >> (4*k)) & 0x0F;
}
Now you can get the individual nibbles in a loop, like this:
uint32_t val = 1265985549;
for (int k = 0; k != 8 ; k++) {
uint32_t n = nibble(val, k);
cout << n << endl;
}

short nibble0 = (i >> 0) & 15;
short nibble1 = (i >> 4) & 15;
short nibble2 = (i >> 8) & 15;
short nibble3 = (i >> 12) & 15;
etc
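The shift-and-mask idea from the answers above can be wrapped up so that all eight nibbles come out at once. This is only a minimal sketch: the name split_nibbles and the std::array return type are illustrative choices, not something from the answers, and index 0 is taken to be the most significant nibble so the order matches the binary string in the question.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: returns the 8 nibbles of a 32-bit value,
// index 0 = most significant nibble.
std::array<std::uint8_t, 8> split_nibbles(std::uint32_t val) {
    std::array<std::uint8_t, 8> out{};
    for (int k = 0; k < 8; ++k) {
        // Nibble k counted from the top starts 4 * (7 - k) bits above bit 0.
        out[k] = static_cast<std::uint8_t>((val >> (4 * (7 - k))) & 0x0F);
    }
    return out;
}
```

For 1265985549 (0x4B75680D) this yields 4, 11, 7, 5, 6, 8, 0, 13, i.e. 0100 1011 0111 0101 0110 1000 0000 1101.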

Based on the comment explaining the actual use for this, here's another way to count how many nibbles have odd parity (not tested):
// compute parities of nibbles
x ^= x >> 2;
x ^= x >> 1;
x &= 0x11111111;
// add the parities
x = (x + (x >> 4)) & 0x0F0F0F0F;
int count = x * 0x01010101 >> 24;
The first part is just a regular "xor all the bits" type of parity calculation (where "all bits" refers to all the bits in a nibble, not in the entire integer), the second part is based on this bitcount algorithm, skipping some steps that are unnecessary because certain bits are always zero and so don't have to be added.
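Since the snippet above is marked as untested, here is a sketch that wraps it in a function and pairs it with a plain loop for cross-checking. It assumes x is a 32-bit unsigned value; both function names are made up for this example.

```cpp
#include <cstdint>

// Returns how many of the 8 nibbles in x contain an odd number of set bits.
unsigned count_odd_parity_nibbles(std::uint32_t x) {
    // Fold each nibble onto its lowest bit: bit 0 becomes b0^b1^b2^b3.
    x ^= x >> 2;
    x ^= x >> 1;
    x &= 0x11111111u;                  // keep one parity bit per nibble
    // Add neighbouring parities: each byte now holds 0, 1 or 2.
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    // The multiplication sums the four bytes into the top byte.
    return (x * 0x01010101u) >> 24;
}

// Straightforward reference, used only for cross-checking.
unsigned count_odd_parity_nibbles_slow(std::uint32_t x) {
    unsigned count = 0;
    for (int k = 0; k < 8; ++k) {
        unsigned nib = (x >> (4 * k)) & 0xFu;
        count += (nib ^ (nib >> 1) ^ (nib >> 2) ^ (nib >> 3)) & 1u;
    }
    return count;
}
```

For the value from the question, 0x4B75680D, the nibbles 4, B, 7, 8 and D have odd parity, so both functions return 5.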

Related

Difference between bitshifting mask vs unsigned int

For a project, I had to extract the individual bytes of an unsigned int. I first tried bit-shifting the mask to find them, but that didn't work, so I tried bit-shifting the value instead, and that worked.
What's the difference between these two? Why didn't the first one work?
ExampleFunk(unsigned int value){
for (int i = 0; i < 4; i++) {
ExampleSubFunk(value & (0x00FF << (i * 8)));
}
}
ExampleFunk(unsigned int value){
for (int i = 0; i < 4; i++) {
ExampleSubFunk((value >> (i * 8)) & 0x00FF);
}
}
Take the value 0xAABBCCDD as an example.
The expression value & (0xFF << (i * 8)) assumes the values:
0xAABBCCDD & 0x000000FF = 0x000000DD
0xAABBCCDD & 0x0000FF00 = 0x0000CC00
0xAABBCCDD & 0x00FF0000 = 0x00BB0000
0xAABBCCDD & 0xFF000000 = 0xAA000000
While the expression (value >> (i * 8)) & 0xFF assumes the values:
0xAABBCCDD & 0x000000FF = 0x000000DD
0x00AABBCC & 0x000000FF = 0x000000CC
0x0000AABB & 0x000000FF = 0x000000BB
0x000000AA & 0x000000FF = 0x000000AA
As you can see, the results are quite different after i = 0, because the first expression is only "selecting" 8 bits from value, while the second expression is shifting them down to the least significant byte first.
Note that in the first case, the expression (0xFF << (i * 8)) is shifting an int literal (0xFF) left. You should cast the literal to unsigned int to avoid signed integer overflow, which is undefined behavior:
value & ((unsigned int)0xFF << (i * 8))
In this code:
ExampleFunk(unsigned int value){
for (int i = 0; i < 4; i++) {
ExampleSubFunk(value & (0x00FF << (i * 8)));
}
}
You are shifting the bits of 0x00FF itself, producing new masks of 0x00FF, 0xFF00, 0xFF0000, and 0xFF000000, and then you are masking value with each of those masks. The result contains only the 8 bits of value that you are interested in, but those 8 bits are not moving position at all.
In this code:
ExampleFunk(unsigned int value){
for (int i = 0; i < 4; i++) {
ExampleSubFunk((value >> (i * 8)) & 0x00FF);
}
}
You are shifting the bits of value, thus moving those 8 bits that you want, and then you are masking the result with 0x00FF to extract those 8 bits.
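To make the difference concrete, here is a small sketch of both variants as standalone functions. The function names are invented for this example, and the mask is built from an unsigned 32-bit value so that shifting it by 24 is well defined.

```cpp
#include <cstdint>

// Mask is shifted: the selected byte stays where it was in `value`.
std::uint32_t byte_in_place(std::uint32_t value, int i) {
    return value & (std::uint32_t{0xFF} << (i * 8));
}

// Value is shifted: the selected byte ends up in bits 0..7.
std::uint32_t byte_shifted_down(std::uint32_t value, int i) {
    return (value >> (i * 8)) & 0xFFu;
}
```

With value = 0xAABBCCDD and i = 2, the first returns 0x00BB0000 and the second returns 0xBB. Shifting the first result right by i * 8 afterwards makes the two agree.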

stretch mask - bit manipulation

I want to stretch a mask in which every bit represents 4 bits of stretched mask.
I am looking for an elegant bit manipulation to stretch using c++ and systemC
for example:
input:
mask (32 bits) = 0x0000CF00
output:
stretched mask (128 bits) = 0x00000000 00000000 FF00FFFF 00000000
and just to clarify the example, let's look at the nibble C:
0xC = 1100 after stretching: 1111111100000000 = 0xFF00
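Doing this elegantly is not easy.
The simplest way may be a loop that shifts the result and tests each input bit, starting from the most significant one:
sc_biguint<128> result = 0;
for(int i = 31; i >= 0; i--){
result <<= 4; // make room for the next nibble
if(bit_test(var, i)){
result |= 0x0F; // input bit i becomes four set bits
}
}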
Here's a way of stretching a 16-bit mask into 64 bits where every bit represents 4 bits of stretched mask:
uint64_t x = 0x000000000000CF00LL;
x = (x | (x << 24)) & 0x000000ff000000ffLL;
x = (x | (x << 12)) & 0x000f000f000f000fLL;
x = (x | (x << 6)) & 0x0303030303030303LL;
x = (x | (x << 3)) & 0x1111111111111111LL;
x |= x << 1;
x |= x << 2;
It starts off with the mask in the bottom 16 bits. Then it moves the top 8 bits of the mask into the top 32 bits, like this:
0000000000000000 0000000000000000 0000000000000000 ABCDEFGHIJKLMNOP
becomes
0000000000000000 00000000ABCDEFGH 0000000000000000 00000000IJKLMNOP
Then it solves the similar problem of stretching a mask from the bottom 8 bits of a 32 bit word, to the top and bottom 32-bits simultaneously:
000000000000ABCD 000000000000EFGH 000000000000IJKL 000000000000MNOP
Then it does it for 4 bits inside 16 and so on until the bits are spread out:
000A000B000C000D 000E000F000G000H 000I000J000K000L 000M000N000O000P
Then it "smears" them across 4 bits by ORing the result with itself twice:
AAAABBBBCCCCDDDD EEEEFFFFGGGGHHHH IIIIJJJJKKKKLLLL MMMMNNNNOOOOPPPP
You could extend this to 128 bits by adding an extra first step where you shift by 48 bits and mask with a 128-bit constant:
x = (x | (x << 48)) & 0x000000000000ffff000000000000ffffLLL;
You'd also have to stretch the other constants out to 128 bits just by repeating the bit patterns. However (as far as I know) there is no way to declare a 128-bit constant in C++, but perhaps you could do it with macros or something (see this question). You could also make a 128-bit version just by using the 64-bit version on the top and bottom 16 bits separately.
If loading the masking constants turns out to be a difficulty or bottleneck you can generate each one from the previous one using shifting and masking:
uint64_t m = 0x000000ff000000ffLL;
m &= m >> 4; m |= m << 16; // gives 0x000f000f000f000fLL
m &= m >> 2; m |= m << 8; // gives 0x0303030303030303LL
m &= m >> 1; m |= m << 4; // gives 0x1111111111111111LL
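The suggestion of running the 64-bit version on the top and bottom 16 bits separately could look roughly like the sketch below. The Stretched128 struct and both function names are assumptions made for this example; a real SystemC design would presumably assemble the two halves into an sc_biguint<128> instead.

```cpp
#include <cstdint>

// 16-bit mask -> 64 bits, each input bit widened to a full nibble.
std::uint64_t stretch16(std::uint16_t mask) {
    std::uint64_t x = mask;
    x = (x | (x << 24)) & 0x000000ff000000ffULL;
    x = (x | (x << 12)) & 0x000f000f000f000fULL;
    x = (x | (x << 6))  & 0x0303030303030303ULL;
    x = (x | (x << 3))  & 0x1111111111111111ULL;
    x |= x << 1;   // smear each isolated bit across its nibble
    x |= x << 2;
    return x;
}

// Hypothetical 128-bit result type: two 64-bit halves.
struct Stretched128 {
    std::uint64_t hi;  // stretched bits 16..31 of the input
    std::uint64_t lo;  // stretched bits 0..15 of the input
};

Stretched128 stretch32(std::uint32_t mask) {
    return Stretched128{
        stretch16(static_cast<std::uint16_t>(mask >> 16)),
        stretch16(static_cast<std::uint16_t>(mask & 0xFFFFu))
    };
}
```

For the mask 0x0000CF00 from the question, hi is 0 and lo is 0xFF00FFFF00000000, which matches the expected 0x00000000 00000000 FF00FFFF 00000000.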
Does this work for you?
#include <stdio.h>
long long Stretch4x(int input)
{
long long output = 0;
while (input & -input)
{
int b = (input & -input);
long long s = 0;
input &= ~b;
s = b*15;
while(b>>=1)
{
s <<= 3;
}
output |= s;
}
return output;
}
int main(void) {
int input = 0xCF00;
printf("0x%0x ==> 0x%0llx\n", input, Stretch4x(input));
return 0;
}
Output:
0xcf00 ==> 0xff00ffff00000000
The other solutions are good. However, most of them are more C than C++. This solution is pretty straightforward: it uses std::bitset and sets four bits for each input bit.
#include <bitset>
#include <iostream>
std::bitset<128>
starch_32 (const std::bitset<32> &input)
{
std::bitset<128> output;
for (size_t i = 0; i < input.size(); ++i) {
// If `input[N]` is `true`, set `output[N*4, N*4+4]` to true.
if (input.test (i)) {
const size_t output_index = i * 4;
output.set (output_index);
output.set (output_index + 1);
output.set (output_index + 2);
output.set (output_index + 3);
}
}
return output;
}
// Example with 0xC.
int main() {
std::bitset<32> input{0b1100};
auto result = starch_32 (input);
std::cout << "0x" << std::hex << result.to_ullong() << "\n";
}
On x86 you could use the PDEP intrinsic to move the 16 mask bits into the correct nibble (into the low bit of each nibble, for example) of a 64-bit word, and then use a couple of shift + or to smear them into the rest of the word:
uint64_t x = _pdep_u64(m, 0x1111111111111111);
x |= x << 1;
x |= x << 2;
You could also replace those two OR and two shift by a single multiplication by 0xF which accomplishes the same smearing.
Finally, you could consider a SIMD approach: solutions such as samgak's above should map naturally to SIMD.
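The multiplication variant of the smearing step can be tried without BMI2 hardware. The sketch below assumes its input already has the mask bits deposited into the low bit of each nibble (which is what the PDEP step would produce); the function name is made up for this example.

```cpp
#include <cstdint>

// Precondition (assumed): only bit 0 of each nibble may be set in `low_bits`,
// e.g. the output of a PDEP with the mask 0x1111111111111111.
std::uint64_t smear_nibbles(std::uint64_t low_bits) {
    // Each nibble holds 0 or 1, so multiplying by 15 gives 0x0 or 0xF
    // per nibble without any carry into the neighbouring nibble.
    return low_bits * 0xFu;
}
```

For example, 0x1100111100000000 becomes 0xFF00FFFF00000000, the same result the two shift-and-OR steps give.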

Calculating morton code

I am trying to interleave (to calculate a Morton code) two signed long numbers, say x and y (32 bits), with these values:
case 1 :
x = 10; //1010
y = 10; //1010
result will be :
11001100
case 2:
x = -10;
y = 10;
Binary representation are,
x = 1111111111111111111111111111111111111111111111111111111111110110
y = 1010
For interleaving, I am considering only the 32-bit representation, where I interleave the 31st bit of x with the 31st bit of y,
using the following code:
signed long long x_y;
for (int i = 31; i >= 0; i--)
{
unsigned long long xbit = ((unsigned long) x)& (1 << i);
x_y|= (xbit << i);
unsigned long long ybit = ((unsigned long) y)& (1 << i);
if (i != 0)
{
x_y|= (x_y<< (i - 1));
}
else
{
(x_y= x_y<< 1) |= ybit;
}
}
The above code works fine if x is positive and y is negative, but case 2 fails. Please help me: what is going wrong?
Negative numbers use 64 bits, whereas positive numbers use 32 bits. Correct me if I am wrong.
I think the code below does what you need.
The Morton code is 64 bits; we build it from two 32-bit numbers by interleaving.
Since the numbers are signed, negative values have to be handled as follows:
if (x < 0) //stored in two's complement, hence all 64 bits are used
{
value = x; //value is 32 bits wide, so only the lower 32 bits are kept
cout << value;
value &= ~(1u << 31); //clear the sign bit, as it does not contribute to the magnitude
}
Do the same for y.
The following code does the interleaving:
unsigned long long x_y_copy = 0; //accumulates the Morton code
//loop over each bit of the two 32-bit numbers, starting from the MSB
for (int i = 31; i >= 0; i--)
{
//take bit i of x and move it to position 2*i + 1
unsigned long long xbit = ((unsigned long)x) & (1UL << i);
x_y_copy |= xbit << (i + 1);
//take bit i of y and move it to position 2*i
unsigned long long ybit = ((unsigned long)y) & (1UL << i);
x_y_copy |= ybit << i;
}
//This is important when y is negative: the first snippet cleared bit 31 of y, so a 0 was copied into bit 62 of the Morton code (the 63rd bit counting from 1), and it has to be set back to 1.
if (mapu_y < 0)
{
x_y_copy |= 4611686018427387904ULL; //4611686018427387904 = pow(2,62)
}
I hope this helps.:)
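For comparison, here is a minimal sketch that interleaves the raw 32-bit two's-complement patterns directly, without touching the sign bits. It is a different, simpler approach than the answer above, not a cleaned-up copy of it. It assumes the same layout as that answer's shifts (bit i of x goes to position 2*i + 1, bit i of y to position 2*i); the function name is invented for this example.

```cpp
#include <cstdint>

// Interleaves the bit patterns of x and y into a 64-bit Morton code.
// x occupies the odd bit positions, y the even ones (assumed layout).
std::uint64_t morton_interleave(std::int32_t x, std::int32_t y) {
    // Reinterpret as unsigned so that shifts never touch a sign bit.
    const std::uint32_t ux = static_cast<std::uint32_t>(x);
    const std::uint32_t uy = static_cast<std::uint32_t>(y);
    std::uint64_t code = 0;
    for (int i = 0; i < 32; ++i) {
        const std::uint64_t xbit = (ux >> i) & 1u;  // 0 or 1, already 64 bits wide
        const std::uint64_t ybit = (uy >> i) & 1u;
        code |= xbit << (2 * i + 1);
        code |= ybit << (2 * i);
    }
    return code;
}
```

Case 1 (x = 10, y = 10) gives 0xCC, i.e. 11001100. Case 2 (x = -10, y = 10) gives 0xAAAAAAAAAAAAAA6C, because every bit of the 32-bit pattern 0xFFFFFFF6 is carried into the code.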

High Order Bits - Take them and make a uint64_t into a uint8_t

Let's say you have a uint64_t and care only about the high order bit for each byte in your uint64_t. Like so:
uint32_t:
0000 ... 1000 0000 1000 0000 1000 0000 1000 0000 ---> 0000 1111
Is there a faster way than:
return
(
((x >> 56) & 128)+
((x >> 49) & 64)+
((x >> 42) & 32)+
((x >> 35) & 16)+
((x >> 28) & 8)+
((x >> 21) & 4)+
((x >> 14) & 2)+
((x >> 7) & 1)
)
Aka shifting x, masking, and adding the correct bit for each byte? This will compile to a lot of assembly and I'm looking for a quicker way... The machine I'm using only has up to SSE2 instructions and I failed to find helpful SIMD ops.
Thanks for the help.
As I mentioned in a comment, pmovmskb does what you want. Here's how you could use it:
MMX + SSE1:
movq mm0, input ; input can be r/m
pmovmskb output, mm0 ; output must be r
SSE2:
movq xmm0, input
pmovmskb output, xmm0
And I looked up the new way
BMI2:
mov rax, 0x8080808080808080
pext output, input, rax ; input must be r
return ((x & 0x8080808080808080) * 0x2040810204081) >> 56;
works. The & selects the bits you want to keep. The multiplication moves all of those bits into the most significant byte, and the shift moves them down to the least significant byte. Since multiplication is fast on most modern CPUs, this shouldn't be much slower than using assembly.
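The multiplication works because the constant has bits set at positions 0, 7, 14, ..., 49, so the high bit of byte j (bit 8j + 7) gets a copy at bit 56 + j, and no two copies ever land on the same position. A sketch with a bit-by-bit reference for checking follows; both function names are made up for this example.

```cpp
#include <cstdint>

// Multiply-based gather of the top bit of each byte.
std::uint8_t high_bits_mul(std::uint64_t x) {
    return static_cast<std::uint8_t>(
        ((x & 0x8080808080808080ULL) * 0x0002040810204081ULL) >> 56);
}

// Bit-by-bit reference, used only for cross-checking.
std::uint8_t high_bits_ref(std::uint64_t x) {
    std::uint8_t r = 0;
    for (int j = 0; j < 8; ++j) {
        // The top bit of byte j becomes bit j of the result.
        r |= static_cast<std::uint8_t>(((x >> (8 * j + 7)) & 1u) << j);
    }
    return r;
}
```

For 0x0080008000800080 both return 0x55, and for 0x8080808080808080 both return 0xFF.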
And here's how to do it using SSE intrinsics:
#include <xmmintrin.h>
#include <inttypes.h>
#include <stdio.h>
int main (void)
{
uint64_t x
= 0b0000000010000000000000001000000000000000100000000000000010000000;
printf ("%x\n", _mm_movemask_pi8 ((__m64) x));
return 0;
}
Works fine with:
gcc -msse
You don't need all the separate logical ANDs, you can simplify it to:
x &= 0x8080808080808080;
return (x >> 7) | (x >> 14) | (x >> 21) | (x >> 28) |
(x >> 35) | (x >> 42) | (x >> 49) | (x >> 56);
(assuming that the function return type is uint8_t).
You can also convert that to an unrolled loop:
uint8_t r = 0;
x &= 0x8080808080808080;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
x >>= 7; r |= x;
return r;
I'm not sure which will perform better in practice, though I'd tend to bet on the first - the second might produce shorter code but with a long dependency chain.
First you don't really need so many operations. You can act on more than one bit at a time:
x = (x >> 7) & 0x0101010101010101; // 0x0101010101010101
x |= x >> 28; // 0x????????11111111
x |= x >> 14; // 0x????????????5555
x |= x >> 7; // 0x??????????????FF
return x & 0xFF;
An alternative is to use modulo to do sideways additions. The first thing to note is that x % n is the sum of the digits of x in base n+1 (reduced modulo n), so if n+1 is 2^k, you are adding groups of k bits. If you start with
t = (x >> 7) & 0x0101010101010101 as above, you want to sum groups of 7 bits, so t % 127 would be the solution. But t % 127 works only for results up to 126; 0x8080808080808080 and anything above will give an incorrect result. I've tried some corrections, but none were easy.
Using modulo to get to the point where only the last step of the previous algorithm remains is possible, though. What we want is to keep the two least significant bits and then sum the remaining ones in groups of 14. So
ull t = (x & 0x8080808080808080) >> 7;
ull u = (t & 3) | (((t>>2) % 0x3FFF) << 2);
return (u | (u>>7)) & 0xFF;
But t>>2 is t/4 and << 2 is multiplying by 4, and we have (a % b)*c == (a*c) % (b*c), thus (((t>>2) % 0x3FFF) << 2) is (t & ~3) % 0xFFFC. We also have a + b%c == (a+b)%c whenever the sum is less than c. So we have simply u = t % 0xFFFC, giving:
ull t = ((x & 0x8080808080808080) >> 7) % 0xFFFC;
return (t | (t>>7)) & 0xFF;
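As a sanity check on the derivation, here is the final two-line version wrapped in a function. The function name is made up for this example, and ull is taken to mean a 64-bit unsigned integer.

```cpp
#include <cstdint>

// Modulo-based gather of the top bit of each byte.
std::uint8_t high_bits_mod(std::uint64_t x) {
    // After the shift, byte j contributes bit 8j. Reducing modulo 0xFFFC
    // leaves those bits at positions 0, 8, 2, 10, 4, 12, 6, 14 (for j = 0..7).
    std::uint64_t t = ((x & 0x8080808080808080ULL) >> 7) % 0xFFFCu;
    // Folding by 7 brings bits 8, 10, 12, 14 down to 1, 3, 5, 7.
    return static_cast<std::uint8_t>((t | (t >> 7)) & 0xFFu);
}
```

For 0x8000000000000000 this returns 0x80. By contrast, 0x8000000000000000 % 127 is 1, because 2^63 leaves the same remainder as 2^7 modulo 127; that is the kind of collision that makes a plain % 127 unreliable.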
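This seems to work, but only when the result is at most 126, that is, when the high bit of the most significant byte is clear and the other seven are not all set (2^63 and 2^7 both leave remainder 1 modulo 127):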
return (x & 0x8080808080808080) % 127;

Extract n most significant non-zero bits from int in C++ without loops

I want to extract the n most significant bits from an integer in C++ and convert those n bits to an integer.
For example
int a=1200;
// its binary representation within 32 bit word-size is
// 00000000000000000000010010110000
Now I want to extract the 4 most significant digits from that representation, i.e. 1001
00000000000000000000010010110000
                     ^^^^
and convert them back to an integer (1001 in binary = 9 in decimal).
How is this possible with a simple C++ function without loops?
Some processors have an instruction to count the leading binary zeros of an integer, and some compilers have intrinsics to allow you to use that instruction. For example, using GCC:
uint32_t significant_bits(uint32_t value, unsigned bits) {
unsigned leading_zeros = __builtin_clz(value);
unsigned highest_bit = 32 - leading_zeros;
unsigned lowest_bit = highest_bit - bits;
return value >> lowest_bit;
}
For simplicity, I left out checks that the requested number of bits are available. For Microsoft's compiler, the intrinsic is called __lzcnt.
If your compiler doesn't provide that intrinsic, and your processor doesn't have a suitable instruction, then one way to count the zeros quickly is with a binary search:
unsigned leading_zeros(uint32_t value) {
unsigned count = 0;
if ((value & 0xffff0000u) == 0) {
count += 16;
value <<= 16;
}
if ((value & 0xff000000u) == 0) {
count += 8;
value <<= 8;
}
if ((value & 0xf0000000u) == 0) {
count += 4;
value <<= 4;
}
if ((value & 0xc0000000u) == 0) {
count += 2;
value <<= 2;
}
if ((value & 0x80000000u) == 0) {
count += 1;
}
return count;
}
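Combining the two pieces of this answer, a portable sketch might look like the following. It adds the checks the answer left out, under assumptions of my own: a zero input returns 0, and asking for more bits than the value has returns the value unchanged. The names count_leading_zeros and top_bits are made up for this example.

```cpp
#include <cstdint>

// Portable leading-zero count (binary search); returns 32 for value == 0.
unsigned count_leading_zeros(std::uint32_t value) {
    if (value == 0) return 32;
    unsigned count = 0;
    if ((value & 0xffff0000u) == 0) { count += 16; value <<= 16; }
    if ((value & 0xff000000u) == 0) { count += 8;  value <<= 8;  }
    if ((value & 0xf0000000u) == 0) { count += 4;  value <<= 4;  }
    if ((value & 0xc0000000u) == 0) { count += 2;  value <<= 2;  }
    if ((value & 0x80000000u) == 0) { count += 1; }
    return count;
}

// Returns the `bits` most significant non-zero-led bits of `value`.
std::uint32_t top_bits(std::uint32_t value, unsigned bits) {
    if (bits == 0) return 0;
    const unsigned width = 32 - count_leading_zeros(value);  // 0 when value == 0
    if (bits >= width) return value;   // fewer than `bits` significant bits
    return value >> (width - bits);
}
```

With value = 1200 and bits = 4, the width is 11, so the result is 1200 >> 7 = 9.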
It's not fast, but (int)(log(x)/log(2)) + 1 will tell you the position (counting from 1) of the most significant non-zero bit, subject to floating-point rounding near exact powers of two. Finishing the algorithm from there is fairly straightforward.
This seems to work (done in C# with UInt32 then ported so apologies to Bjarne):
unsigned int input = 1200;
unsigned int most_significant_bits_to_get = 4;
// shift + or the msb over all the lower bits
unsigned int m1 = input | input >> 8 | input >> 16 | input >> 24;
unsigned int m2 = m1 | m1 >> 2 | m1 >> 4 | m1 >> 6;
unsigned int m3 = m2 | m2 >> 1;
unsigned int nbitsmask = m3 ^ m3 >> most_significant_bits_to_get;
unsigned int v = nbitsmask;
unsigned int c = 32; // c will be the number of zero bits on the right
v &= -((int)v);
if (v>0) c--;
if ((v & 0x0000FFFF) >0) c -= 16;
if ((v & 0x00FF00FF) >0) c -= 8;
if ((v & 0x0F0F0F0F) >0 ) c -= 4;
if ((v & 0x33333333) >0) c -= 2;
if ((v & 0x55555555) >0) c -= 1;
unsigned int result = (input & nbitsmask) >> c;
I assumed you meant using only integer math.
I used some code from @OliCharlesworth's link; you could remove the conditionals too by using the LUT for trailing zeroes code there.