stretch mask - bit manipulation - c++

I want to stretch a mask so that every bit becomes 4 bits of the stretched mask.
I am looking for an elegant bit manipulation to do the stretching in C++ and SystemC.
for example:
input:
mask (32 bits) = 0x0000CF00
output:
stretched mask (128 bits) = 0x00000000 00000000 FF00FFFF 00000000
To clarify the example, let's look at the hex digit C:
0xC = 1100 after stretching: 1111111100000000 = 0xFF00

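Doing this in an elegant form is not easy.
The simple way is probably a loop that shifts in one nibble per input bit (var is the 32-bit input mask):
sc_biguint<128> result = 0;
for (int i = 31; i >= 0; i--) {   // walk from the most significant bit down
    result <<= 4;                 // make room for the next nibble (a bare "result << 4" has no effect)
    if ((var >> i) & 1) {         // test bit i of the input mask
        result |= 0xF;            // fill that nibble
    }
}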

Here's a way of stretching a 16-bit mask into 64 bits where every bit represents 4 bits of stretched mask:
uint64_t x = 0x000000000000CF00LL;
x = (x | (x << 24)) & 0x000000ff000000ffLL;
x = (x | (x << 12)) & 0x000f000f000f000fLL;
x = (x | (x << 6)) & 0x0303030303030303LL;
x = (x | (x << 3)) & 0x1111111111111111LL;
x |= x << 1;
x |= x << 2;
It starts off with the mask in the bottom 16 bits. Then it moves the top 8 bits of the mask into the top 32 bits, like this:
0000000000000000 0000000000000000 0000000000000000 ABCDEFGHIJKLMNOP
becomes
0000000000000000 00000000ABCDEFGH 0000000000000000 00000000IJKLMNOP
Then it solves the same problem one level down, spreading 8 bits inside each 32-bit half, in the top and bottom halves simultaneously:
000000000000ABCD 000000000000EFGH 000000000000IJKL 000000000000MNOP
Then it does the same for 4 bits inside 16 bits, and so on, until the bits are spread out:
000A000B000C000D 000E000F000G000H 000I000J000K000L 000M000N000O000P
Then it "smears" each bit across its nibble by ORing the value with itself shifted left by 1, and then with itself shifted left by 2:
AAAABBBBCCCCDDDD EEEEFFFFGGGGHHHH IIIIJJJJKKKKLLLL MMMMNNNNOOOOPPPP
You could extend this to 128 bits by adding an extra first step where you shift by 48 bits and mask with a 128-bit constant (pseudocode: there is no LLL suffix in C++):
x = (x | (x << 48)) & 0x000000000000ffff000000000000ffff;   // 128-bit constant, not valid C++ as written
You'd also have to widen the other constants to 128 bits by repeating their bit patterns. However, as far as I know standard C++ has no 128-bit integer literal, but perhaps you could build such constants with macros or something similar (see this question). You could also make a 128-bit version just by using the 64-bit version on the top and bottom 16 bits of the mask separately.
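The last suggestion, applying the 64-bit version to each 16-bit half, can be sketched as follows. The Stretched128 struct and the function names are hypothetical; the 128-bit result is simply held as two 64-bit words instead of one 128-bit integer.

```cpp
#include <cstdint>

// Hypothetical result type: the 128-bit stretched mask as two 64-bit halves.
struct Stretched128 {
    std::uint64_t hi;  // nibbles for input bits 16..31
    std::uint64_t lo;  // nibbles for input bits 0..15
};

// The 64-bit spread from above: each of the 16 low input bits becomes a nibble.
inline std::uint64_t stretch16(std::uint64_t x)
{
    x &= 0xFFFF;                                    // only 16 bits are stretched
    x = (x | (x << 24)) & 0x000000FF000000FFULL;
    x = (x | (x << 12)) & 0x000F000F000F000FULL;
    x = (x | (x << 6))  & 0x0303030303030303ULL;
    x = (x | (x << 3))  & 0x1111111111111111ULL;
    x |= x << 1;                                    // smear each bit...
    x |= x << 2;                                    // ...across its nibble
    return x;
}

// Stretch a 32-bit mask by applying the 64-bit version to each 16-bit half.
inline Stretched128 stretch32(std::uint32_t mask)
{
    return Stretched128{stretch16(mask >> 16), stretch16(mask & 0xFFFFu)};
}
```

For the question's input 0x0000CF00, hi is 0 and lo is 0xFF00FFFF00000000, which matches the expected 128-bit result.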
If loading the masking constants turns out to be a difficulty or a bottleneck, you can generate each one from the previous one using shifting and masking:
uint64_t m = 0x000000ff000000ffLL;
m &= m >> 4; m |= m << 16; // gives 0x000f000f000f000fLL
m &= m >> 2; m |= m << 8; // gives 0x0303030303030303LL
m &= m >> 1; m |= m << 4; // gives 0x1111111111111111LL
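To check the derivation, here is a small sketch (the struct and function names are hypothetical) that starts from 0x000000ff000000ff and produces the other three constants with exactly the shift-and-mask steps above, so each one can be compared against the literal used in the 64-bit code.

```cpp
#include <cstdint>

// Hypothetical holder for the four masks used by the 64-bit spread.
struct SpreadMasks {
    std::uint64_t m24, m12, m6, m3;  // masks applied after the <<24, <<12, <<6, <<3 steps
};

inline SpreadMasks derive_masks()
{
    std::uint64_t m = 0x000000FF000000FFULL;
    SpreadMasks r{m, 0, 0, 0};
    m &= m >> 4; m |= m << 16; r.m12 = m;  // 0x000f000f000f000f
    m &= m >> 2; m |= m << 8;  r.m6 = m;   // 0x0303030303030303
    m &= m >> 1; m |= m << 4;  r.m3 = m;   // 0x1111111111111111
    return r;
}
```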

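Does this work for you? It stretches a 16-bit mask, since 16 nibbles already fill 64 bits:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
/* Each set bit k of the 16-bit input becomes 0xF << (4*k) in the output. */
uint64_t Stretch4x(uint16_t input)
{
    uint64_t output = 0;
    uint32_t in = input;
    while (in)                          /* one iteration per set bit, lowest first */
    {
        uint32_t b = in & (0u - in);    /* isolate the lowest set bit: b = 2^k */
        uint64_t s = (uint64_t)b * 15;  /* 0xF << k, computed in 64 bits */
        in &= ~b;                       /* clear that bit */
        while (b >>= 1)                 /* k more shifts by 3: total shift is 4*k */
        {
            s <<= 3;
        }
        output |= s;
    }
    return output;
}
int main(void) {
    uint16_t input = 0xCF00;
    printf("0x%x ==> 0x%" PRIx64 "\n", (unsigned)input, Stretch4x(input));
    return 0;
}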
Output:
0xcf00 ==> 0xff00ffff00000000

The other solutions are good. However, most of them are more C than C++. This solution is pretty straightforward: it uses std::bitset and sets four bits for each input bit.
#include <bitset>
#include <cstddef>
#include <iostream>
// Each input bit N becomes output bits 4N .. 4N+3.
std::bitset<128>
stretch_32 (const std::bitset<32> &input)
{
    std::bitset<128> output;
    for (std::size_t i = 0; i < input.size(); ++i) {
        // If `input[i]` is set, set `output[i*4]` through `output[i*4 + 3]`.
        if (input.test (i)) {
            const std::size_t output_index = i * 4;
            output.set (output_index);
            output.set (output_index + 1);
            output.set (output_index + 2);
            output.set (output_index + 3);
        }
    }
    return output;
}
// Example with 0xC.
int main() {
    std::bitset<32> input{0b1100};
    auto result = stretch_32 (input);
    // to_ullong() throws std::overflow_error if any bit above bit 63 is set;
    // that cannot happen for this small example, which prints 0xff00.
    std::cout << "0x" << std::hex << result.to_ullong() << "\n";
}
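Since to_ullong() can only return 64 bits, a 128-bit result with any of its upper bits set has to be read out in pieces. One possible sketch (split128 is a hypothetical helper, not part of std::bitset) shifts and masks the bitset into two 64-bit words:

```cpp
#include <bitset>
#include <cstdint>

// Split a std::bitset<128> into two 64-bit words so it can be printed or
// compared without to_ullong() throwing std::overflow_error.
inline void split128(const std::bitset<128>& b, std::uint64_t& hi, std::uint64_t& lo)
{
    const std::bitset<128> low_mask(~0ULL);        // lowest 64 bits set
    lo = (b & low_mask).to_ullong();
    hi = ((b >> 64) & low_mask).to_ullong();
}
```

For example, a stretched mask with output bits 124 to 127 set gives hi == 0xF000000000000000 and lo == 0.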

On x86 processors with BMI2 you could use the PDEP intrinsic to move the 16 mask bits into the correct nibbles (into the low bit of each nibble, for example) of a 64-bit word, and then use a couple of shift + OR steps to smear them into the rest of each nibble:
uint64_t x = _pdep_u64(m, 0x1111111111111111);   // <immintrin.h>; requires BMI2 (e.g. -mbmi2)
x |= x << 1;
x |= x << 2;
You could also replace those two ORs and two shifts with a single multiplication by 0xF, which accomplishes the same smearing: the deposited bits are 4 apart, so the partial products never overlap or carry.
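If BMI2 is not available, the deposit step can be emulated with a plain loop. The sketch below (function names are hypothetical; the loop is a stand-in for _pdep_u64, not a match for its speed) also shows that the shift+OR smear and the multiplication by 0xF give the same result.

```cpp
#include <cstdint>

// Hypothetical portable stand-in for _pdep_u64(m, 0x1111111111111111):
// deposits bit i of the 16-bit mask into bit 4*i (the low bit of nibble i).
inline std::uint64_t deposit_nibble_lows(std::uint16_t m)
{
    std::uint64_t x = 0;
    for (int i = 0; i < 16; ++i)
        x |= static_cast<std::uint64_t>((m >> i) & 1u) << (4 * i);
    return x;
}

// Two ways to smear each deposited bit across its nibble.
inline std::uint64_t smear_or(std::uint64_t x)  { x |= x << 1; x |= x << 2; return x; }
inline std::uint64_t smear_mul(std::uint64_t x) { return x * 0xF; }
```

On hardware with BMI2, deposit_nibble_lows(m) can be replaced by _pdep_u64(m, 0x1111111111111111) from <immintrin.h>.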
Finally, you could consider a SIMD approach: solutions such as samgak's above should map naturally to SIMD.

Related

Bit Manipulation in Integer Array

#define SET_BIT(byte, bit) (byte |= (1 << bit))
#define CLEAR_BIT(byte,bit) (byte &= ~(1 << bit))
uint8_t data [5];
for (int i = 0; i < 5; ++i)
{
for(int j = 7; j >= 0; --j)
{
if (some condition)
--> CLEAR_BIT(data[i],j);
else if (some condition)
--> SET_BIT(data[i],j);
}
}
I want to understand how the bit manipulation in the arrowed lines works.
When I declare uint8_t data [5];, does it mean an array named data in which I can store 5 uint8_t values (basically chars), with 8 bits at the location of each array index?
Because SET_BIT is a #define, the line SET_BIT(data[i],j) is replaced during preprocessing with data[i] |= (1 << j) (consider using an inline function instead to get the benefit of type safety).
The best way to understand it is to work through a simple case (without the nested loops).
(Let's assume your data array is initialized to 0, which is not currently the case.)
For instance, i = 2, j = 3:
1 << 3 = 00001000 (the value 1 shifted three positions to the left), data[2] = 00000000
|= is the bitwise-OR assignment operator, so a bitwise OR between each pair of corresponding bits takes place and data[2] becomes 00001000.
Now, for i = 2, j = 6:
1 << 6 = 01000000, data[2] = 00001000
The bitwise OR yields 1 for bits 3 and 6.
data[2] will be equal to 01001000.
With this example you should be able to follow the more complex nested-loop version.
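Following the advice to prefer inline functions over the macros, here is a minimal sketch (the names set_bit and clear_bit are hypothetical) that replays the walkthrough: setting bit 3 and then bit 6 of data[2] gives 01001000 (0x48).

```cpp
#include <cstdint>

// Type-safe alternatives to the SET_BIT / CLEAR_BIT macros.
inline void set_bit(std::uint8_t& byte, unsigned bit)
{
    byte = static_cast<std::uint8_t>(byte | (1u << bit));   // OR in a single 1 bit
}

inline void clear_bit(std::uint8_t& byte, unsigned bit)
{
    byte = static_cast<std::uint8_t>(byte & ~(1u << bit));  // AND with all ones except that bit
}
```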

c++: how to put relevant bits from uint32 into uint8?

I have a uint32 that I've flagged some bits on:
uint32 i = 0;
i |= (1 << 0);
i |= (1 << 5);
i |= (1 << 13);
i |= (1 << 19);
...
I want to convert it to a uint8 (by getting the state of its lowest 8 bits and disregarding the rest). Obviously I could do this:
uint8 j = 0;
for (int q = 0; q < 8; q++)
{
if (i & (1 << q))
{
j |= (1 << q);
}
}
But is there a fancy bitwise operation I can use to transfer the bits over in one fell swoop, without a loop?
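You can achieve the same result by simply assigning the uint32 value to a uint8: converting to a smaller unsigned type keeps only the low 8 bits.
#include <iostream>
using namespace std;
int main()
{
    unsigned int i = 0x00000888;
    unsigned char j = i;          // keeps the low 8 bits: 0x88
    cout<<hex<<i<<endl;
    cout<<hex<<+j<<endl;          // unary + prints j as a number, not as a character
    return 0;
}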
output:
888
88
Why not just mask those last 8 bits instead of running a loop over to see if individual bits are set?
const unsigned char bitMask = 0xFF;
j = (i & bitMask);
Note that C++14 also allows you to write binary literals (and digit separators) directly:
const unsigned char bitMask = 0b1111'1111;
The above is all you need. If you need one of the other bytes, shift the value right first so that byte ends up at the bottom, then apply the same 0xFF mask, e.g. (i >> 8) & 0xFF for the second byte.
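As a small sketch of shifting first and then masking (the function name byte_at is hypothetical, and k is assumed to be 0 to 3):

```cpp
#include <cstdint>

// Extract byte k (0 = least significant) of a 32-bit value:
// shift the wanted byte down to the bottom, then keep only 8 bits.
inline std::uint8_t byte_at(std::uint32_t value, unsigned k)
{
    return static_cast<std::uint8_t>((value >> (8 * k)) & 0xFFu);
}
```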

Calculating morton code

I am trying to interleave (to calculate a Morton code) two signed long numbers, say x and y (32 bits), with the following values:
case 1 :
x = 10; //1010
y = 10; //1010
result will be :
11001100
case 2:
x = -10;
y = 10;
The binary representations are:
x = 1111111111111111111111111111111111111111111111111111111111110110
y = 1010
For interleaving, I am considering only the 32-bit representation, so that I can interleave the 31st bit of x with the 31st bit of y, using the following code:
signed long long x_y;
for (int i = 31; i >= 0; i--)
{
unsigned long long xbit = ((unsigned long) x)& (1 << i);
x_y|= (xbit << i);
unsigned long long ybit = ((unsigned long) y)& (1 << i);
if (i != 0)
{
x_y|= (x_y<< (i - 1));
}
else
{
(x_y= x_y<< 1) |= ybit;
}
}
The above code works fine if x is positive and y is negative, but case 2 fails. Please help me: what is going wrong?
Negative numbers use 64 bits, whereas positive numbers use 32 bits. Correct me if I am wrong.
I think the code below works according to your requirement.
The Morton code is 64 bits: we build a 64-bit number from two 32-bit numbers by interleaving their bits.
Since the numbers are signed, first take the 32-bit two's-complement pattern of each one. For a negative value this is simply its lower 32 bits, sign bit included, and the sign bit is then interleaved like any other bit, so there is no need to clear it and restore it afterwards:
// needs <cstdint>
uint32_t ux = static_cast<uint32_t>(x);   // lower 32 bits of x (two's complement if x < 0)
uint32_t uy = static_cast<uint32_t>(y);   // same for y
The following code then does the interleaving. Bit i of x goes to position 2i+1 and bit i of y to position 2i, so case 1 gives 11001100:
uint64_t x_y = 0;   // the Morton code
// loop over each bit of the two 32-bit numbers, starting from the MSB
for (int i = 31; i >= 0; i--)
{
    uint64_t xbit = (ux >> i) & 1u;   // bit i of x
    uint64_t ybit = (uy >> i) & 1u;   // bit i of y
    x_y |= xbit << (2 * i + 1);       // x bits go to the odd positions
    x_y |= ybit << (2 * i);           // y bits go to the even positions
}
This also sidesteps a problem in the question's code: 1 << 31 overflows a signed int, and the resulting negative value can be sign-extended into the upper 32 bits when it is combined with a 64-bit value.
I hope this helps.:)
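For comparison, the interleaving loop can be replaced by the branch-free shift-and-mask spreading used in the mask-stretching answer at the top: spread each 32-bit pattern so that bit i lands on bit 2i, then shift x's spread by one and OR. This is a different technique from the loop in this answer, shown as a sketch; spread_bits and morton are hypothetical names, and x is placed on the odd bits to match case 1 (10 and 10 give 11001100, i.e. 0xCC).

```cpp
#include <cstdint>

// Spread the 32 bits of v so that bit i moves to bit 2*i (zeros in between),
// halving the block size at each step: 16, 8, 4, 2, 1 bits.
inline std::uint64_t spread_bits(std::uint32_t v)
{
    std::uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

// Interleave x (odd positions) and y (even positions).
// Negative inputs are taken as their 32-bit two's-complement patterns.
inline std::uint64_t morton(std::int32_t x, std::int32_t y)
{
    return (spread_bits(static_cast<std::uint32_t>(x)) << 1)
         | spread_bits(static_cast<std::uint32_t>(y));
}
```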

Compact a hex number

Is there a clever (i.e. branchless) way to "compact" a hex number, basically moving all the 0 digits to one side?
eg:
0x10302040 -> 0x13240000
or
0x10302040 -> 0x00001324
I looked on Bit Twiddling Hacks but didn't see anything.
It's for an SSE numerical pivoting algorithm. I need to remove any pivots that become 0. I can use _mm_cmpgt_ps to find good pivots, _mm_movemask_ps to convert that into a mask, and then bit hacks to get something like the above. The hex value gets munged into a mask for a _mm_shuffle_ps instruction to perform a permutation on the 128-bit SSE register.
To compute the mask for _pext:
mask = arg;
mask |= ((mask << 1) & 0xAAAAAAAA) | ((mask >> 1) & 0x55555555);
mask |= ((mask << 2) & 0xCCCCCCCC) | ((mask >> 2) & 0x33333333);
First OR the bits within each pair, then within each quad (nibble). The masks prevent shifted values from spilling over into other digits.
After computing the mask this way or harold's way (which is probably faster), you don't need the full power of _pext, so if the target hardware doesn't support it you can replace it with this:
for (int i = 0; i < 7; i++) {
    uint32_t stay_mask = mask & (~mask - 1);              // low run of non-zero digits, already in place
    arg = (arg & stay_mask) | ((arg >> 4) & ~stay_mask);  // shift everything else down one digit
    mask = stay_mask | (mask >> 4);
}
Each iteration moves every nibble that is not yet in place one digit to the right, into the gap just above the digits that are already placed. stay_mask marks the bits that are in their final positions. This uses somewhat fewer operations than the Hacker's Delight solution, but branching code might still be faster.
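Putting the pair/quad mask and the shifting loop together gives a complete fallback for compacting toward the low end (0x10302040 -> 0x00001324) on hardware without _pext. This is a sketch assuming unsigned 32-bit values; the function names are hypothetical.

```cpp
#include <cstdint>

// 0xF in every hex digit of arg that is non-zero.
inline std::uint32_t nonzero_nibble_mask(std::uint32_t arg)
{
    std::uint32_t mask = arg;
    mask |= ((mask << 1) & 0xAAAAAAAAu) | ((mask >> 1) & 0x55555555u); // OR within bit pairs
    mask |= ((mask << 2) & 0xCCCCCCCCu) | ((mask >> 2) & 0x33333333u); // OR within nibbles
    return mask;
}

// Move all non-zero hex digits to the low end, keeping their order.
inline std::uint32_t compact_right(std::uint32_t arg)
{
    std::uint32_t mask = nonzero_nibble_mask(arg);
    for (int i = 0; i < 7; i++) {                      // at most 7 gaps to close
        std::uint32_t stay_mask = mask & (~mask - 1);  // trailing run of ones in mask
        arg = (arg & stay_mask) | ((arg >> 4) & ~stay_mask);
        mask = stay_mask | (mask >> 4);
    }
    return arg;
}
```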
Supposing we can use _pext_u32, the issue then is computing a mask that has an F for every nibble that isn't zero. I'm not sure what the best approach is, but you can compute the OR of the 4 bits of the nibble and then "spread" it back out to F's like this:
// calculate horizontal OR of every nibble
x |= x >> 1;
x |= x >> 2;
// clean up junk
x &= 0x11111111;
// spread
x *= 0xF;
Then use that as the mask of _pext_u32.
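The same mask as a standalone function (the name nonzero_nibbles is hypothetical). The multiplication by 0xF cannot carry between nibbles, because after the cleanup each nibble holds at most 0x1.

```cpp
#include <cstdint>

// 0xF in every non-zero hex digit of x, 0x0 elsewhere.
inline std::uint32_t nonzero_nibbles(std::uint32_t x)
{
    x |= x >> 1;        // bit 0 of each nibble now ORs bits 0..1 of it...
    x |= x >> 2;        // ...and then bits 0..3 of it
    x &= 0x11111111u;   // keep only that bit per nibble (drop cross-nibble junk)
    return x * 0xFu;    // spread each 1 back out to 0xF
}
```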
_pext_u32 can be emulated by this (taken from Hacker's Delight, figure 7.6)
unsigned compress(unsigned x, unsigned m) {
unsigned mk, mp, mv, t;
int i;
x = x & m; // Clear irrelevant bits.
mk = ~m << 1; // We will count 0's to right.
for (i = 0; i < 5; i++) {
mp = mk ^ (mk << 1); // Parallel prefix.
mp = mp ^ (mp << 2);
mp = mp ^ (mp << 4);
mp = mp ^ (mp << 8);
mp = mp ^ (mp << 16);
mv = mp & m; // Bits to move.
m = m ^ mv | (mv >> (1 << i)); // Compress m.
t = x & mv;
x = x ^ t | (t >> (1 << i)); // Compress x.
mk = mk & ~mp;
}
return x;
}
But that's a bit of a disaster. It's probably better to just resort to branching code then.
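uint32_t fun(uint32_t val) {
    uint32_t retVal(0x00);
    uint32_t sa(28);                           // next free digit position, filled from the top
    for (int sb(28); sb >= 0; sb -= 4) {       // scan digits from the most significant
        uint32_t digit = val & (0x0Fu << sb);  // unsigned, so 0x0F << 28 cannot overflow
        if (digit) {
            retVal |= digit << (sa - sb);      // move the digit itself (not 0xF) up
            sa -= 4;
        }
    }
    return retVal;
}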
I think this (or something similar) is what you're looking for: it eliminates the 0 nibbles within a number. I've not debugged it, and at the moment it only compacts toward one side (the most significant end).
If your processor supports conditional instruction execution, you may get a benefit from this algorithm:
uint32_t compact(uint32_t orig_value)
{
uint32_t mask = 0xF0000000u; // Mask for isolating a hex digit.
uint32_t new_value = 0u;
for (unsigned int i = 0; i < 8; ++i) // 8 hex digits
{
if ((orig_value & mask) == 0u)   // parentheses needed: == binds tighter than &
{
orig_value = orig_value << 4; // Shift the original value by 1 digit
}
new_value |= orig_value & mask;
mask = mask >> 4; // next digit
}
return new_value;
}
This looks like a good candidate for loop unrolling.
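The algorithm assumes that when the original value is shifted left, zeros are shifted in, filling the "empty" bits (which holds for unsigned types). Note that it shifts at most once per digit, so a run of consecutive zero digits, as in 0x10020000, is not fully compacted.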
Edit 1:
On a processor that supports conditional execution of instructions, the shift of the original value would be executed conditionally, depending on the result of ANDing the original value with the mask. Thus there is no branching, only skipped instructions.
I came up with the following solution. Please take a look, maybe it will help you.
#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
class IsZero
{
public:
    bool operator ()(char c) const
    {
        return '0' == c;
    }
};
int main()
{
    int a = 0x01020334; // INPUT
    ostringstream my_sstream;
    my_sstream << hex << a;
    string str = my_sstream.str();
    size_t base_str_length = str.size();
    cout << "Input hex: " << str << endl;
    // Count the zeros first: function arguments are evaluated in an unspecified
    // order, and remove_if leaves the tail of the string in an unspecified state.
    const auto zeros = count_if(begin(str), end(str), IsZero());
    str.insert(remove_if(begin(str), end(str), IsZero()), zeros, '0');
    str.resize(base_str_length);   // drop the leftover tail
    cout << "Processed hex: " << str << endl;
    return 0;
}
Output:
Input hex: 1020334
Processed hex: 1233400

How to split an unsigned long int (32 bit) into 8 nibbles?

I am sorry if my question is confusing but here is the example of what I want to do,
Let's say I have an unsigned long int = 1265985549.
In binary I can write this as 01001011011101010110100000001101.
Now I want to split this 32-bit binary number into groups of 4 bits, like this, and work on each group separately:
0100 1011 0111 0101 0110 1000 0000 1101
any help would be appreciated.
You can get a 4-bit nibble at position k using bit operations, like this:
uint32_t nibble(uint32_t val, int k) {
return (val >> (4*k)) & 0x0F;
}
Now you can get the individual nibbles in a loop, like this:
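// needs <cstdint> and <iostream>
uint32_t val = 1265985549;
for (int k = 0; k != 8 ; k++) {   // k = 0 is the least significant nibble
    uint32_t n = nibble(val, k);
    cout << n << endl;            // prints 13, 0, 8, 6, 5, 7, 11, 4
}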
short nibble0 = (i >> 0) & 15;
short nibble1 = (i >> 4) & 15;
short nibble2 = (i >> 8) & 15;
short nibble3 = (i >> 12) & 15;
etc
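The loop above produces the nibbles from least to most significant. If you want them in the order they are written in the question (left to right), a small sketch that fills an array from the most significant nibble down could look like this (nibbles_msb_first is a hypothetical name):

```cpp
#include <array>
#include <cstdint>

// Split a 32-bit value into its 8 nibbles, most significant first, so the
// array matches the left-to-right binary grouping in the question.
inline std::array<std::uint32_t, 8> nibbles_msb_first(std::uint32_t val)
{
    std::array<std::uint32_t, 8> out{};
    for (int k = 0; k < 8; ++k)
        out[7 - k] = (val >> (4 * k)) & 0x0Fu;   // nibble k, counted from the right
    return out;
}
```

For 1265985549 this gives 4, 11, 7, 5, 6, 8, 0, 13, i.e. 0100 1011 0111 0101 0110 1000 0000 1101.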
Based on the comment explaining the actual use for this, here's another way to count how many nibbles have odd parity, with x a 32-bit unsigned value (not tested):
// compute parities of nibbles
x ^= x >> 2;
x ^= x >> 1;
x &= 0x11111111;
// add the parities
x = (x + (x >> 4)) & 0x0F0F0F0F;
int count = x * 0x01010101 >> 24;
The first part is just a regular "xor all the bits" type of parity calculation (where "all bits" refers to all the bits in a nibble, not in the entire integer), the second part is based on this bitcount algorithm, skipping some steps that are unnecessary because certain bits are always zero and so don't have to be added.