Concatenate binary numbers of different lengths - c++

So I have 3 numbers. One is a char, and the other two are int16_t (also known as shorts, but according to a table I found shorts won't reliably be 16 bits).
I'd like to concatenate them together. So say that the values of them were:
10010001
1111111111111101
1001011010110101
I'd like to end up with a long long containing:
1001000111111111111111011001011010110101000000000000000000000000
Using some solutions I've found online, I came up with this:
long long result;
result = num1;
result = (result << 8) | num2;
result = (result << 24) | num3;
But it doesn't work; it gives me very odd numbers when it's decoded.
In case there's a problem with my decoding code, here it is:
char num1 = num & 0xff;
int16_t num2 = num << 8 & 0xffff;
int16_t num3 = num << 24 & 0xffff;
What's going on here? I suspect it has to do with the size of a long long, but I can't quite wrap my head around it and I want room for more numbers in it later.

To get the exact bit pattern you requested, you should use:
result = num1;
result = (result << 16) | num2;
result = (result << 16) | num3;
result <<= 24;
This yields the exact bit pattern you requested, with the 24 bits at the LSB end left as zero:
1001000111111111111111011001011010110101000000000000000000000000

For that last shift, you should only be shifting by 16, not by 24. 24 is the current length of your binary string, after the combination of num1 and num2. You need to make room for num3, which is 16 bits, so shift left by 16.
Edit:
Just realized the first shift is wrong too. That should be 16 also, for similar reasons.

Yes, you are overflowing the value that can be stored in a long. You can use an arbitrary-precision library such as GMP to store the big number.

If I understand correctly what you are doing, I would use:
result = num1;
result = (result << 16) | num2;
result = (result << 16) | num3;
num1out = (result >> 32) & 0xff;
num2out = (result >> 16) & 0xffff;
num3out = result & 0xffff;
The left shift during building is by the width of the next number to insert. The right shift on extraction is by the total number of bits the field was left shifted during building.
I have tested the above code. long long is wide enough for this task with the g++ compiler, and I believe many others.

Related

Bitwise or before casting to int32

I was messing about with arrays and noticed this, e.g.:
int32_t array[];
int16_t value = -4000;
When I tried to write the value into the top and bottom half of the int32 array value,
array[0] = (value << 16) | value;
the compiler would cast the value into a 32-bit value first, before doing the bit shift and the bitwise OR. Thus, instead of 16-bit -4000 being written into the top and bottom halves, the top value will be -1 and the bottom will be -4000.
Is there a way to OR in the 16 bit value of -4000 so both halves are -4000? It's not really a huge problem. I am just curious to know if it can be done.
Sure thing, just undo the sign-extension:
array[0] = (value << 16) | (value & 0xFFFF);
Don't worry, the compiler should handle this reasonably.
To avoid shifting a negative number:
array[0] = ((value & 0xFFFF) << 16) | (value & 0xFFFF);
Fortunately that extra, strictly redundant & (even more of a no-op than the one on the right) doesn't show up in the generated code.
Left shift on signed types is defined only in some cases. From the standard:
6.5.7/4 [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
According to this definition, it seems what you have is undefined behaviour.
Use an unsigned value:
const uint16_t uvalue = value;
array[0] = (uvalue << 16) | uvalue;
Normally when faced with this kind of issue I first set the resultant value to zero, then bitwise assign the values in.
So the code would be:
int32_t array[1];
int16_t value = -4000;
array[0] = 0x0000FFFF;
array[0] &= value;
array[0] |= (value << 16);
Cast the 16 too. Otherwise the int type is contagious.
array[0] = (value << (int16_t)16) | value;
Edit: I don't have a compiler handy to test. Per the comment below, this may not be right, but it should get you in the right direction.

8-digit BCD check

I have an 8-digit BCD number and need to check whether it is a valid BCD number. How can I do this programmatically (C/C++)?
Ex: 0x12345678 is valid, but 0x00f00abc isn't.
Thanks in advance!
You need to check each 4-bit quantity to make sure it's less than 10. For efficiency you want to work on as many bits as you can at a single time.
Here I break the digits apart to leave a zero between each one, then add 6 to each and check for overflow.
uint32_t highs = (value & 0xf0f0f0f0) >> 4;
uint32_t lows = value & 0x0f0f0f0f;
bool invalid = (((highs + 0x06060606) | (lows + 0x06060606)) & 0xf0f0f0f0) != 0;
Edit: actually we can do slightly better. It doesn't take 4 bits to detect overflow, only 1. If we divide all the digits by 2, it frees a bit and we can check all the digits at once.
uint32_t halfdigits = (value >> 1) & 0x77777777;
bool invalid = ((halfdigits + 0x33333333) & 0x88888888) != 0;
The obvious way to do this is:
/* returns 1 if x is valid BCD */
int
isvalidbcd (uint32_t x)
{
    for (; x; x >>= 4)
    {
        if ((x & 0xf) >= 0xa)
            return 0;
    }
    return 1;
}
This link tells you all about BCD, and recommends something like this as a more optimised solution (reworked to check all the digits at once, hence using a 64-bit data type; untested):
/* returns 1 if x is valid BCD */
int
isvalidbcd (uint32_t x)
{
    return !((((uint64_t)x + 0x66666666ULL) ^ (uint64_t)x) & 0x111111110ULL);
}
For a digit to be invalid, it needs to be 10-15. That in turn means the 8-bit must be set along with the 4-bit or the 2-bit; the low bit doesn't matter at all.
So:
long mask8 = value & 0x88888888;
long mask4 = value & 0x44444444;
long mask2 = value & 0x22222222;
return ((mask8 >> 2) & ((mask4 >> 1) | mask2)) == 0;
Slightly less obvious:
long mask8 = value >> 2;
long mask42 = value | (value >> 1);
return (mask8 & mask42 & 0x22222222) == 0;
By shifting before masking, we don't need 3 different masks.
Inspired by @Mark Ransom:
bool invalid = (0x88888888 & (((value & 0xEEEEEEEE) >> 1) + (0x66666666 >> 1))) != 0;
// or
bool valid = !((((value & 0xEEEEEEEEu) >> 1) + 0x33333333) & 0x88888888);
Mask off each BCD digit's 1's place, shift right, then add 6 and check for BCD digit overflow.
How this works:
By adding 6 to each digit, we look for an overflow * of the 4-bit sum.
abcd
+ 110
-----
*efgd
But the bit value of d does not contribute to the overflow, so first mask off that bit and shift right. Now the overflow bit is in the 8's place. This is all done in parallel, and we mask these carry bits with 0x88888888 and test if any are set.
0abc
+ 11
-----
*efg

Checksum and Bitshift

I'm learning to create a raw packet and send it, following this tutorial. Everything makes sense until I reach the code where the checksum is generated.
unsigned short csum (unsigned short *buf, int nwords)
{
    unsigned long sum;
    for (sum = 0; nwords > 0; nwords--)
        sum += *buf++;
    sum = (sum >> 16) + (sum & 0xffff);
    sum += (sum >> 16);
    return ~sum;
}
It looks like he's summing up all the words in the buffer, but when I hit
sum = (sum >> 16) + (sum & 0xffff);
sum += (sum >> 16);
I get completely lost. It looks like he shifts all the bits right, essentially discarding everything except the carry-out, and then adds it back into the original sum? Why is the & 0xffff necessary? And after all that, why does he add the carry-out bits again? Is it because there might be a second carry?
The line:
sum = (sum >> 16) + (sum & 0xffff);
Adds the left and right 16-bit words in the 32-bit integer. It basically splits the number in half and adds the two halves together. sum>>16 gives you the left half, and sum & 0xffff gives you the right half.
Then when these two are added together, they could possibly overflow. This line:
sum += (sum >> 16);
Adds the overflow back into the original number.
The checksum being computed is 16-bit (unsigned short is very often 16-bit), but the variable sum is unsigned long, and thus probably 32 bit.
So the operation sum >> 16 captures the high word of the sum: all the times that pairs of words have summed to more than 16 bits can hold. This is then mixed with sum & 0xffff, which is just the low word of the sum.
This way, all bits of the sum are "folded in" so that they contribute to the final result.

What does (value << 32) >> 32 mean?

I'm facing code with what is, to me, an extraordinary operation:
return std::pair<T1, T2>(value >> 32, ( (value << 32) >> 32) );
What does this mean: ((value << 32) >> 32)?
Is this the same as just value? (To me that seems reasonable when the size of value's type is 64 bits.)
Is this the same as just value?
No.
Since zeros are shifted in, (value << 32) >> 32 discards the top 32 bits.
(value << 32) >> 32 is the bottom 32 bits.
value >> 32 is the top 32 bits.
[That's assuming you start with a 64 bit type. If you have a 32 bit type, then it's undefined behaviour]
The code is likely aiming to split an unsigned 64-bit integer (uint64_t) into two parts:
The low component (bits 0 .. 31)
And the high component (bits 32 .. 63)
val >> 32 will get the high component of val,
and (val << 32) >> 32 will get the low component of val.
val: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
val >> 32: 00000000000000000000000000000000 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
val << 32: yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy 00000000000000000000000000000000
(val<<32)>>32: 00000000000000000000000000000000 yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
Notice: if val is a signed integer, you may not get the answer you expect.
When value's type is a 64-bit integer, (value << 32) >> 32 will return the "rightmost" 32 bits of it!
return std::pair<T1, T2>(value >> 32, (value << 32) >> 32); actually splits a 64-bit integer into its two 32-bit parts ;)
It looks like this code is attempting to split a 64-bit number into a high 32-bit word and a low 32-bit word.
If we assume that value is an unsigned 64-bit integer then:
value >> 32
is the most significant 32-bit word (the bottom 32 bits spill off the end as the high word is shifted down into the low word position). And:
(value << 32) >> 32
is the least significant 32-bit word rammed up into the high portion (bumping off the existing high portion) and then moved back down into the lower 32-bit section. This could also be achieved by stating:
value & 0xFFFFFFFF
<< and >> are bitwise operators.
They shift the bits of the number to the right >> or to the left <<.
For a 64 bit integer:
value >> 32 = top 32 bits.
(value << 32) >> 32 = bottom 32 bits.
If your value is, as you say, 64 bits long, this expression will truncate value, filling the 'leftmost' 32 bits with 0:
Suppose value is :
0xA3B252A2ADAEACA0
Then value << 32 is :
0xADAEACA000000000
And (value << 32) >> 32 is :
0x00000000ADAEACA0
The first part of your std::pair is value >> 32, which does the opposite of that, returning only the 'leftmost' half of value:
0x00000000A3B252A2
This line will cut your 64-bit value into two consecutive 32-bit values.

Grabbing n bits from a byte

I'm having a little trouble grabbing n bits from a byte.
I have an unsigned integer. Let's say our number in hex is 0x2A, which is 42 in decimal. In binary it looks like this: 0010 1010. How would I grab the first 5 bits which are 00101 and the next 3 bits which are 010, and place them into separate integers?
If anyone could help me that would be great! I know how to extract from one byte which is to simply do
int x = (number >> (8*n)) & 0xff; // n being the byte index
which I saw on another post on stack overflow, but I wasn't sure on how to get separate bits out of the byte. If anyone could help me out, that'd be great! Thanks!
Integers are represented inside a machine as a sequence of bits; fortunately for us humans, programming languages provide a mechanism to show us these numbers in decimal (or hexadecimal), but that does not alter their internal representation.
You should review the bitwise operators &, |, ^ and ~ as well as the shift operators << and >>, which will help you understand how to solve problems like this.
The last 3 bits of the integer are:
x & 0x7
The five bits above those (bits 3 through 7) are:
(x >> 3)   // all but the last three bits
   & 0x1F  // masked down to five bits.
"grabbing" parts of an integer type in C works like this:
You shift the bits you want to the lowest position.
You use & to mask the bits you want - ones means "copy this bit", zeros mean "ignore"
So, in your example, let's say we have a number int x = 42;
first 5 bits:
(x >> 3) & ((1 << 5)-1);
or
(x >> 3) & 31;
To fetch the lower three bits:
(x >> 0) & ((1 << 3)-1)
or:
x & 7;
Say you want hi bits from the top, and lo bits from the bottom. (5 and 3 in your example)
top = (n >> lo) & ((1 << hi) - 1)
bottom = n & ((1 << lo) - 1)
Explanation:
For the top, first get rid of the lower bits (shift right), then mask the remaining with an "all ones" mask (if you have a binary number like 0010000, subtracting one results in 0001111: the same number of 1s as you had 0s in the original number).
For the bottom it's the same, except you don't need the initial shift.
top = (42 >> 3) & ((1 << 5) - 1) = 5 & (32 - 1) = 5 = 00101b
bottom = 42 & ((1 << 3) - 1) = 42 & (8 - 1) = 2 = 010b
You could use bit fields for this. Bit fields are special structs where you can specify variables in bits.
typedef struct {
    unsigned char a:5;
    unsigned char b:3;
} my_bit_t;
unsigned char c = 0x2A;
my_bit_t *n = (my_bit_t *) &c;
int first = n->a;
int sec = n->b;
Bit fields are described in more detail at http://www.cs.cf.ac.uk/Dave/C/node13.html#SECTION001320000000000000000
The charm of bit fields is that you do not have to deal with shift operators etc. The notation is quite easy. As always when manipulating bits, there is a portability issue: which bits each field maps to is implementation-defined.
int x = (number >> 3) & 0x1f;
will give you an integer whose last 5 bits are bits 3 through 7 of number, with zeros in the other bits.
Similarly,
int y = number & 0x7;
will give you an integer with the last 3 bits set to the last 3 bits of number and zeros in the rest.
just get rid of the 8* in your code.
int input = 42;
int high3 = input >> 5;
int low5 = input & (32 - 1); // 32 = 2^5
bool isBit3On = input & 4; // 4 = 2^(3-1)