C++ bitwise masking and shift

I have some legacy C++ bit-manipulation code like this:
constexpr static uint64_t val2bin(double val) {
    uint64_t bin = (val > 0.) ? uint64_t(val/0.01)+1 : 0;
    return (bin > 510) ? 511 : bin;
}
The result is then used as part of an expression:
static uint64_t const MASK = 0x1FF;
static uint64_t const VAL_OFFSET = 10;
((val2bin(val) & MASK) << VAL_OFFSET)
Doesn't it put an upper limit on val we can encode in this way? I am not able to comprehend what we are trying to achieve functionally by this?

Doesn't it put an upper limit on val we can encode in this way?
It will put an upper limit on the output you can have.
The operation performed here:
static uint64_t const MASK = 0x1FF;
static uint64_t const VAL_OFFSET = 10;
((val2bin(val) & MASK) << VAL_OFFSET)
yields val2bin(val) << VAL_OFFSET. Because val2bin already clamps its result to 511 (0x1FF), the & MASK never changes the value; it only guarantees that the field fits in 9 bits. The largest possible output is therefore MASK << VAL_OFFSET = 511 << 10 = 523264, which is nowhere near the limit of uint64_t. This is the aforementioned "upper limit" on the output: every val of roughly 5.10 or more ends up in the saturated bin 511, and smaller positive values are quantized in steps of 0.01.
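To make the limit concrete, here is a minimal sketch of the encoding and its inverse. val2bin, MASK and VAL_OFFSET come from the question; pack_val and unpack_bin are hypothetical helper names introduced only for illustration. The sample values mentioned afterwards sit in the middle of a bin to stay clear of floating-point edge cases.

```cpp
#include <cstdint>

constexpr std::uint64_t MASK = 0x1FF;     // 9 bits, so bins 0..511
constexpr std::uint64_t VAL_OFFSET = 10;  // the field starts at bit 10

// Same quantization as in the question: 0.01-wide bins, saturating at 511.
constexpr std::uint64_t val2bin(double val) {
    const std::uint64_t bin = (val > 0.) ? std::uint64_t(val / 0.01) + 1 : 0;
    return (bin > 510) ? 511 : bin;
}

// Hypothetical helper: place the 9-bit bin at bits 10..18 of a packed word.
constexpr std::uint64_t pack_val(double val) {
    return (val2bin(val) & MASK) << VAL_OFFSET;
}

// Hypothetical helper: recover the bin number from a packed word.
constexpr std::uint64_t unpack_bin(std::uint64_t word) {
    return (word >> VAL_OFFSET) & MASK;
}
```

With these definitions, pack_val(10.0) gives 523264 (bin 511 shifted left by 10), and unpack_bin(pack_val(1.005)) gives back bin 101. The original val itself cannot be recovered beyond the 0.01 resolution.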
I am not able to comprehend what we are trying to achieve functionally by this?
I cannot tell exactly what functionality is intended either. The snippet looks like it quantizes val into a 9-bit field that occupies bits 10 to 18 of a packed word, but some extra code around this snippet would help confirm that.
I know this answer states the obvious but that's all I could infer as of now.

Related

How to safely extract a signed field from a uint32_t into a signed number (int or uint32_t)

I have a project in which I am getting a vector of 32-bit ARM instructions, and a part of the instructions (offset values) needs to be read as signed (two's complement) numbers instead of unsigned numbers.
I used a uint32_t vector because all the opcodes and registers are read as unsigned and the whole instruction was 32-bits.
For example:
I have this 32-bit ARM instruction encoding:
uint32_t addr = 0b00110001010111111111111111110110;
The last 19 bits are the offset of the branch that I need to read as signed integer branch displacement.
This part: 1111111111111110110
I have this function in which the parameter is the whole 32-bit instruction:
I am shifting left 13 places and then right 13 places again to keep only the offset value and remove the other part of the instruction.
I have tried casting the result to different signed types, using different ways of casting and other C++ functions, but it prints the number as if it were unsigned.
int getCat1BrOff(uint32_t inst)
{
    uint32_t temp = inst << 13;
    uint32_t brOff = temp >> 13;
    return (int)brOff;
}
I get decimal number 524278 instead of -10.
The last option, which I don't think is the best one but which may work, is to put all the binary digits in a string, invert the bits, add 1, and then convert the new binary number back into decimal, as I would do it on paper. But it is not a good solution.
It boils down to doing a sign extension where the sign bit is the 19th one (bit 18).
There are two ways:
1. Use arithmetic shifts.
2. Detect the sign bit and OR ones into the high bits.
Before C++20 there is no portable way to do 1. in C++, but whether it works can be checked at compile time. Please correct me if the code below is UB, but I believe it is only implementation-defined, which is what the compile-time check covers. (Since C++20, right-shifting a negative value is defined as an arithmetic shift and unsigned-to-signed conversion wraps modulo 2^N, so approach 1 is well defined there.)
The only questionable things are the conversion of an out-of-range unsigned value to signed and the right shift of a negative value, but both should be implementation-defined.
int getCat1BrOff(uint32_t inst)
{
    if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
    {
        return int32_t(inst << uint32_t{13}) >> int32_t{13};
    }
    else
    {
        int32_t offset = inst & 0x0007FFFF;
        if (offset & 0x00040000)
        {
            offset |= 0xFFF80000;
        }
        return offset;
    }
}
or a more generic solution
template <uint32_t N>
int32_t signExtend(uint32_t value)
{
    static_assert(N > 0 && N <= 32);
    constexpr uint32_t unusedBits = (uint32_t(32) - N);
    if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
    {
        return int32_t(value << unusedBits) >> int32_t(unusedBits);
    }
    else
    {
        constexpr uint32_t mask = uint32_t(0xFFFFFFFFu) >> unusedBits;
        value &= mask;
        if (value & (uint32_t(1) << (N-1)))
        {
            value |= ~mask;
        }
        return int32_t(value);
    }
}
https://godbolt.org/z/rb-rRB
In practice, you just need to declare temp as signed:
int getCat1BrOff(uint32_t inst)
{
    int32_t temp = inst << 13;
    return temp >> 13;
}
Unfortunately this is not portable:
For negative a, the value of a >> b is implementation-defined (in most
implementations, this performs arithmetic right shift, so that the
result remains negative).
But I have yet to meet a compiler that doesn't do the obvious thing here.
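For completeness, there is a third technique that is not used in the answers above and that avoids both the signed right shift and any out-of-range conversion: mask the field, XOR it with the sign bit, then subtract the sign bit. The sketch below assumes the field is narrower than 32 bits; sign_extend is a hypothetical name chosen for illustration.

```cpp
#include <cstdint>

// Sign-extend the low `Bits` bits of `value` (XOR-and-subtract technique).
// Every intermediate value fits in int32_t, so nothing here depends on
// implementation-defined behaviour.
template <unsigned Bits>
constexpr std::int32_t sign_extend(std::uint32_t value) {
    static_assert(Bits > 0 && Bits < 32, "field must be narrower than 32 bits");
    constexpr std::uint32_t mask = (std::uint32_t{1} << Bits) - 1;  // low Bits bits
    constexpr std::uint32_t sign = std::uint32_t{1} << (Bits - 1);  // the field's sign bit
    // Flipping the sign bit and then subtracting its weight maps
    // [0, 2^Bits) onto [-2^(Bits-1), 2^(Bits-1)).
    return static_cast<std::int32_t>((value & mask) ^ sign)
         - static_cast<std::int32_t>(sign);
}
```

For the instruction in the question, sign_extend<19>(0b00110001010111111111111111110110u) evaluates to -10.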

Can not flip sign

I found a weird bug that happens when I try to flip the sign of the number -9223372036854775808: it simply does nothing.
I get the same number back or at least that's what the debugger shows me.
Is there a way to solve this without branching?
#include <cstdio>
#define I64_MAX 9223372036854775807LL
#define I64_MIN (-I64_MAX-1)
// -9223372036854775808 (can not be a constant in code as it will turn to ull)
using i64 = long long int;
int main()
{
    i64 i = I64_MIN;
    i = -i;
    printf("%lld",i);
    return 0;
}
The same thing happens with i32, i16 and i8.
EDIT:
Current Fix:
// use template??
c8* szi32(i32 num,c8* in)
{
    u32 number = S(u32,num);
    if(num < 0)
    {
        in[0] = '-';
        return SerializeU32(number,&in[1]);
    }
    else
    {
        return SerializeU32(number,in);
    }
}
You can't do it in a completely portable way. Rather than dealing with int64_t, let us consider int8_t. The principle is almost exactly the same, but the numbers are much easier to deal with. I8_MAX will be 127, and I8_MIN will be -128. Negating I8_MIN will give 128, and there is no way to store that in int8_t.
Unless you have strong evidence that this is a bottleneck, then the right answer is:
constexpr int8_t negate(int8_t i) {
    return (i==I8_MIN) ? I8_MAX : -i;
}
If you do have such evidence, then you will need to investigate some platform dependent code - perhaps a compiler intrinsic of some sort, perhaps some clever bit-twiddling which avoids a conditional jump.
Edit: Possible branchless bit-twiddling
constexpr int8_t negate(int8_t i) {
    const auto ui = static_cast<uint8_t>(i);
    // This will calculate the two's complement negative of ui.
    const uint8_t minus_ui = ~ui+1;
    // This will have the top bit set if, and only if, i was I8_MIN
    const uint8_t top_bit = ui & minus_ui;
    // Need to get top_bit into the 1 bit. Either use a compiler intrinsic rotate
    // (rotate_left stands for whatever your compiler provides):
    //     const int8_t bottom_bit = static_cast<int8_t>(rotate_left(top_bit)) & 1;
    // -or- hope that your implementation does something sensible when you
    // shift a negative number (most do).
    const int8_t arithmetic_shifted = static_cast<int8_t>(top_bit) >> 7;
    const int8_t bottom_bit = arithmetic_shifted & 1;
    // Either way, at this point, bottom_bit is 1 if and only if i was
    // I8_MIN, otherwise it is zero.
    return -(i+bottom_bit);
}
You would need to profile to determine whether that is actually faster. Another option would be to shift top_bit into the carry bit, and use add-with-carry (adding a constant zero), or write it in assembler, and use an appropriate conditionally executed instruction.
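As a rough sketch of the same idea at the 64-bit width from the question, the two functions below do all the bit manipulation on unsigned values, where wraparound is well defined. saturating_negate and unsigned_magnitude are hypothetical names; the second is an assumption about what the serialization code in the question ultimately needs (the magnitude as an unsigned number) rather than something stated there.

```cpp
#include <cstdint>

// Negate without branching; INT64_MIN saturates to INT64_MAX instead of overflowing.
constexpr std::int64_t saturating_negate(std::int64_t i) {
    const auto ui = static_cast<std::uint64_t>(i);
    const std::uint64_t minus_ui = ~ui + 1;                // two's complement negation
    const std::uint64_t is_min = (ui & minus_ui) >> 63;    // 1 only when i == INT64_MIN
    return -(i + static_cast<std::int64_t>(is_min));
}

// |i| as an unsigned value, so that even INT64_MIN has a representable result.
constexpr std::uint64_t unsigned_magnitude(std::int64_t i) {
    const auto ui = static_cast<std::uint64_t>(i);
    const std::uint64_t sign = ui >> 63;                   // 1 for negative inputs
    // For negative inputs this computes ~ui + 1; otherwise it returns ui unchanged.
    return (ui ^ (std::uint64_t{0} - sign)) + sign;
}
```

Whether either version beats the simple conditional still has to be measured; compilers often turn the conditional into a branch-free select on their own.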

How to grab specific bits from a 256 bit message?

I'm using winsock to receive udp messages 256 bits long. I use 8 32-bit integers to hold the data.
int32_t dataReceived[8];
recvfrom(client, (char *)&dataReceived, 8 * sizeof(int), 0, &fromAddr, &fromLen);
I need to grab specific bits like, bit #100, #225, #55, etc. So some bits will be in dataReceived[3], some in dataReceived[4], etc.
I was thinking I need to bitshift each array, but things got complicated. Am I approaching this all wrong?
Why are you using int32_t type for buffer elements and not uint32_t?
I usually use something like this:
int bit_needed = 100;
uint32_t the_bit = dataReceived[bit_needed>>5] & (1U << (bit_needed & 0x1F));
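Or you can use this one, which yields 0 or 1 (with signed int32_t elements it relies on how the implementation right-shifts negative values, which is another reason to prefer uint32_t):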
int bit_needed = 100;
uint32_t the_bit = (dataReceived[bit_needed>>5] >> (bit_needed & 0x1F)) & 1U;
Be careful with approaches that index dataReceived by byte number: applied directly to the int32_t array, they reach only the lowest 8 bits of each element.
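When you count bits and bytes from 0:
int bit_needed = 100;
So:
// View the buffer as 32 raw bytes; indexing the int32_t array by byte number would be wrong.
const unsigned char *bytes = reinterpret_cast<const unsigned char *>(dataReceived);
int byte = bit_needed / 8;
int bit = bit_needed % 8;
int the_bit = bytes[byte] & (1 << bit);
If the required bit is 0, then the_bit will be zero. If it is 1, then the_bit will hold 2 to the power of that bit's ordinal place within the byte.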
You can make a small function to do the job.
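uint8_t checkbit(const uint32_t *dataReceived, int bitToCheck)
{
    int word = bitToCheck / 32;        // which 32-bit element holds the bit
    int bit = bitToCheck - word * 32;  // position of the bit inside that element
    if (dataReceived[word] & (1U << bit))
        return 1;
    else
        return 0;
}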
Note that you should use uint32_t rather than int32_t, if you are using bit shifting. Signed integer bit shifts lead to unwanted results, especially if the MSbit is 1.
You can use a macro in C or C++ to check for specific bit:
#define bit_is_set(var,bit) ((var) & (1U << (bit)))
and then a simple if:
if(bit_is_set(message,29)){
    //bit is set
}
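Putting the word-index and bit-index arithmetic from the answers above together, a minimal sketch for a 256-bit message could look like this. It assumes bit n lives in element n / 32 at position n % 32 counted from the least significant bit; whether that matches the sender's numbering and byte order is something only the protocol can tell you. get_bit, kWords and kExample are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWords = 8;  // 8 x 32 bits = 256 bits

// Returns true if bit number `bit` (0..255) is set.
// The caller must keep `bit` below 256; there is no range check here.
constexpr bool get_bit(const std::uint32_t (&words)[kWords], unsigned bit) {
    return ((words[bit >> 5] >> (bit & 31u)) & 1u) != 0;
}

// Example message with exactly bits 55, 100 and 225 set.
constexpr std::uint32_t kExample[kWords] = {
    0, 0x00800000u, 0, 0x00000010u, 0, 0, 0, 0x00000002u
};
```

Here get_bit(kExample, 100) is true because bit 100 is bit 4 of element 3, while get_bit(kExample, 101) is false.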

Appropriate hashing function to hash random binary strings

I have two arrays, e.g. char data1[length], where length is a multiple of 8, i.e. length can be 8, 16, 24, ... The arrays contain binary data read from a file that is opened in binary mode. I will keep reading from the file, and every time I read I will store the value in a hash table. The binary data is randomly distributed. I would like to hash each array and store it in a hash table so that I can later look up the array holding specific data. What would be a good hashing function to achieve this? Thanks
Please note that I am writing this in C++ and C, so a solution in either language would be great.
If the data that you read is 8 bytes long and really distributed randomly, and your hashcode needs to be 32 bits, what about this:
// Defined first so that hashcode() can call it.
uint32_t get_uint32_le(const unsigned char *data) {
    uint32_t value = 0;
    // Cast before shifting: data[3] << 24 on a promoted int could overflow.
    value |= uint32_t(data[0]) << 0;
    value |= uint32_t(data[1]) << 8;
    value |= uint32_t(data[2]) << 16;
    value |= uint32_t(data[3]) << 24;
    return value;
}
uint32_t hashcode(const unsigned char *data) {
    uint32_t hash = 0;
    hash ^= get_uint32_le(data + 0);
    hash ^= get_uint32_le(data + 4);
    return hash;
}
If you need more speed, this code can probably be made a lot faster if you can guarantee that data is always properly aligned to be interpreted as a const uint32_t *.
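Since the question mentions lengths of 8, 16, 24 and so on, the same XOR folding can be extended to any length that is a multiple of 4. This is only a sketch under the same assumption as above, namely that the input bytes really are uniformly random; for structured data a mixing hash is the safer choice. load_u32_le, xor_fold_hash and kSample are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Read 4 bytes as a little-endian 32-bit value, independent of host byte order.
constexpr std::uint32_t load_u32_le(const unsigned char *p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// XOR all 32-bit words of the buffer together.
// `length` is assumed to be a multiple of 4; trailing bytes are ignored.
constexpr std::uint32_t xor_fold_hash(const unsigned char *data, std::size_t length) {
    std::uint32_t hash = 0;
    for (std::size_t i = 0; i + 4 <= length; i += 4) {
        hash ^= load_u32_le(data + i);
    }
    return hash;
}

// Sample input: two words, 0x04030201 and 0x40302010.
constexpr unsigned char kSample[8] = {0x01, 0x02, 0x03, 0x04, 0x10, 0x20, 0x30, 0x40};
```

For kSample the result is 0x04030201 ^ 0x40302010 = 0x44332211. Note that XOR folding is order-insensitive at word granularity: swapping two 32-bit words in the input gives the same hash.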
I have successfully used MurmurHash3 in one of my projects.
Pros:
It is fast. Very fast.
It supposedly has a low collision rate.
Cons:
It's not suitable for cryptography applications.
It's not standardized in any shape or form.
It's not portable to non-x86 platforms. However, it's small enough that you should be able to port it if you really need to - I was able to port it to Java, although that's not nearly the same thing.
It's a good possibility for use in e.g. a fast hash-table implementation...

checksum calculation

To calculate CRC I found a piece of code but I am not understanding the concept.
Here is the code:
count =128 and ptr=some value;
unsigned short calcrc(unsigned char *ptr, int count)
{
    unsigned short crc;
    unsigned char i;
    crc = 0;
    while (--count >= 0)
    {
        crc = crc ^ (unsigned short)*ptr++ << 8;
        i = 8;
        do
        {
            if (crc & 0x8000)
                crc = crc << 1 ^ 0x1021;
            else
                crc = crc << 1;
        } while(--i);
    }
    return (crc);
}
Could anybody please explain the logic?
This looks like a CRC (specifically it looks like CRC-16-CCITT, used by things like 802.15.4, X.25, V.41, CDMA, Bluetooth, XMODEM, HDLC, PPP and IrDA). You might want to read up on the CRC theory on the linked-to Wikipedia page, to gain some more insight. Or you can view this as a "black box" that just solves the problem of computing a checksum.
You will probably need to know that in C, the ^ operator is bitwise XOR and the << operator is the left shift operator (equivalent to multiplication by 2 to the power of the number on the right of the operator). Also, the expression crc & 0x8000 tests whether the most significant bit of the 16-bit variable crc is set.
This will help you work out a low-level explanation of what happens when the code runs. For a high-level explanation of what a CRC is and why you might need it, read the Wikipedia page or How Stuff Works.
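To see the bit-by-bit logic spelled out, here is a commented sketch of the same loop with fixed-width types. It follows the question's parameters (initial value 0, polynomial 0x1021, most significant bit first), a combination commonly labelled CRC-16/XMODEM. crc16_xmodem, kCheckInput and kOneByte are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint16_t crc16_xmodem(const unsigned char *data, std::size_t count) {
    std::uint16_t crc = 0;
    for (std::size_t n = 0; n < count; ++n) {
        // Feed the next byte into the top 8 bits of the 16-bit register.
        crc = static_cast<std::uint16_t>(crc ^ (data[n] << 8));
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                // Top bit set: shift it out and subtract (XOR) the polynomial.
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                // Top bit clear: just shift.
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// The ASCII digits "123456789", the usual test input for CRC parameter sets.
constexpr unsigned char kCheckInput[9] = {'1', '2', '3', '4', '5', '6', '7', '8', '9'};
constexpr unsigned char kOneByte[1] = {0x01};
```

A single byte 0x01 is shifted left until its bit reaches the top of the register and is then XORed with the polynomial once, so crc16_xmodem(kOneByte, 1) is 0x1021. For kCheckInput the result is 0x31C3, the check value usually published for this variant.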
One famous text on CRCs is "A Painless Guide to CRC Error Detection Algorithms" by Ross Williams. It takes some time to absorb but it's pretty thorough.
Take a look at my answer to
How could I guess a checksum algorithm?