why can't you shift a uint16_t [duplicate] - c++

I am trying to fill a 64-bit unsigned variable by combining 16-bit and 8-bit values:
uint8_t byte0 = 0x00;
uint8_t byte1 = 0xAA;
uint8_t byte2 = 0x00;
uint8_t byte3 = 0xAA;
uint16_t hword0 = 0xAA00;
uint16_t hword1 = 0xAAAA;
uint64_t result = ( hword0 << 32 ) + ( byte3 << 24 ) +
( byte2 << 16 ) + ( byte1 << 8 ) + ( byte0 << 0 );
This gives me a warning.
left shift count >= width of type [-Wshift-count-overflow]
uint64_t result = ( hword0 << 32 )

hword0 is 16 bits long, and it is promoted to int (32 bits on most systems) before the shift, yet you request a 32-bit shift. Shifting by the width of the promoted type or more is undefined.
The solution is to convert your components to the destination type: uint64_t result = ( ((uint64_t)hword0) << 32 ) + etc.

Contrary to your question title, you can shift a uint16_t. But you cannot shift it (losslessly) by more than the width of the type the shift is performed in.
The result of a shift has the type of the (promoted) left operand, not the type of the variable you assign it to. In your original code hword0 is promoted to int, so hword0 << 32 is an int shifted by 32, which is undefined when int is 32 bits wide (you cannot count on getting 0, let alone the value you wanted). The uint8_t values are promoted to int as well, so none of the shifts is performed in 64 bits.
The solution is simple: before shifting, cast your values to a type wide enough for the shift:
uint64_t result = ( (uint64_t)hword0 << 32 ) +
( (uint32_t)byte3 << 24 ) + ( (uint32_t)byte2 << 16 ) + ( (uint32_t)byte1 << 8 ) + ( (uint32_t)byte0 << 0 );

You can shift a uint16_t. What you can't do is shift an integer value by a number greater than or equal to the size of the type. Doing so invokes undefined behavior. This is documented in section 6.5.7p3 of the C standard regarding bitwise shift operators:
The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand. If
the value of the right operand is negative or is greater than
or equal to the width of the promoted left operand, the behavior is
undefined.
You would think that this means that any shift greater than or equal to 16 on a uint16_t is not valid. However, as mentioned above the operands of the << operator are subject to integer promotion. This means that any value with a rank lower than int is promoted to int before being used in an expression. So if int is 32 bits on your system, then you can left shift up to 31 bits.
This is why ( byte3 << 24 ) + ( byte2 << 16 ) + ( byte1 << 8 ) + ( byte0 << 0 ) doesn't generate a warning even though each byte is a uint8_t, while ( hword0 << 32 ) does. There is still an issue here, however, because of the promotion to int. Because the promoted value is now signed, you run the risk of shifting a 1 into the sign bit. Doing so invokes undefined behavior as well.
To fix this, any value that is shifted left by 32 or more must first be cast to uint64_t so that the value can be operated on properly, as must any value that may end up shifting a 1 into the sign bit:
uint64_t result = ( (uint64_t)hword0 << 32 ) +
( (uint64_t)byte3 << 24 ) + ( (uint64_t)byte2 << 16 ) +
( (uint64_t)byte1 << 8 ) + ( byte0 << 0 );
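As a minimal sketch of the fix above, the casts can be wrapped in a small function. The name pack64 and its parameter order are hypothetical, chosen only for illustration; the point is that every operand is widened to uint64_t before its shift, so no shift is carried out in int.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original post): widens every operand to
// uint64_t before shifting, so no shift is performed in (signed) int.
constexpr std::uint64_t pack64(std::uint16_t hword, std::uint8_t b3,
                               std::uint8_t b2, std::uint8_t b1,
                               std::uint8_t b0) {
    return (static_cast<std::uint64_t>(hword) << 32) |
           (static_cast<std::uint64_t>(b3) << 24) |
           (static_cast<std::uint64_t>(b2) << 16) |
           (static_cast<std::uint64_t>(b1) << 8) |
           static_cast<std::uint64_t>(b0);
}

// Usage with the values from the question.
inline void pack64_example() {
    assert(pack64(0xAA00, 0xAA, 0x00, 0xAA, 0x00) == 0x0000AA00AA00AA00ULL);
}
```

Using | instead of + makes no difference here because the shifted fields do not overlap.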

According to the warning, 32 bits is more or equal to the size of the operand on the target system. The C++ standard says:
[expr.shift]
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
Corresponding rule from the C standard:
Bitwise shift operators
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
According to the quoted rule, the behaviour of your program is undefined whether it is written in C or C++.
You can solve the problem by explicitly converting the left hand operand of the shift to a sufficiently large unsigned type.
P.S. On systems where uint16_t is smaller than int (which is quite typical), a uint16_t operand will be promoted to int when used as an arithmetic operand, and the same holds for uint8_t. As such, byte2 << 16 is not unconditionally† undefined on such systems. You shouldn't rely on this detail, but that explains why you see no warning from the compiler regarding that shift.
† byte2 << 16 can still be undefined if the result is outside the range of representable values of the (signed) int type. It would be well defined if the promoted type was unsigned.
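The promotion described above can be observed at compile time. The following is only a sketch; shift_yields_int is a made-up name, and it simply asks the compiler for the type of a shift expression whose left operand has type T.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: true when shifting a T produces an int, which means
// the left operand was promoted to int before the shift.
template <typename T>
constexpr bool shift_yields_int() {
    return std::is_same<decltype(T{} << 1), int>::value;
}

inline void promotion_example() {
    // uint8_t always fits in int, so it is promoted.
    assert(shift_yields_int<std::uint8_t>());
    // uint64_t is never promoted to int, so the shift stays unsigned.
    assert(!shift_yields_int<std::uint64_t>());
}
```

For uint16_t the answer depends on the platform: it is promoted to int wherever int is wider than 16 bits.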

byte2 << 16
is left-shifting an 8-bit value by 16 bits. The value is first promoted to int, so the shift is carried out in int, not in 64 bits. Per 6.5.7 Bitwise shift operators, paragraph 4 of the C standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Since the promoted operand is a signed int, the result is only defined while it still fits in an int.
EDIT
Per paragraph 3 of the same section, a shift count that reaches the width of the promoted type, as in hword0 << 32 with a 32-bit int, is undefined behavior:
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
You want something like
( ( uint64_t ) byte2 ) << 16
The cast to a 64-bit value will ensure the result doesn't lose bits.

To do what you want to do, the key idea is to use intermediate uint64_t (the final size) in which to shuffle bits.
The following compiles with no warnings:
you can use auto promotion (and no cast)
{
uint64_t b4567 = hword0; // auto promotion
uint64_t b3 = byte3;
uint64_t b2 = byte2;
uint64_t b1 = byte1;
uint64_t b0 = byte0;
uint64_t result = (
(b4567 << 32) |
(b3 << 24) |
(b2 << 16) |
(b1 << 8) |
(b0 << 0) );
}
you can also use static cast (multiple times):
{
uint64_t result = (
(static_cast<uint64_t>(hword0) << 32) |
(static_cast<uint64_t>(byte3) << 24) |
(static_cast<uint64_t>(byte2) << 16) |
(static_cast<uint64_t>(byte1) << 8) |
(static_cast<uint64_t>(byte0) << 0 )
);
cout << "\n " << hex << result << endl;
}
And you can do both by creating a function to a) perform the static cast and b) with a formal parameter to get the compiler to auto-promote.
function looks like:
// vvvvvvvv ---- formal parameter
uint64_t sc (uint64_t ui64) {
return static_cast<uint64_t>(ui64);
}
// using static cast function
{
uint64_t result = (
(sc(hword0) << 32) |
(sc(byte3) << 24) |
(sc(byte2) << 16) |
(sc(byte1) << 8) |
(sc(byte0) << 0)
);
cout << "\n " << hex << result << endl;
}

From a C perspective:
Much discussion here omits that a uint8_t applied to a shift (left or right) is first promoted to an int, and then the shift rules are applied.
Same occurs with uint16_t when int is 32-bit. (17 bit or more)
When int is 32-bit
hword0 << 32 is UB due to the shift amount too great: outside 0 to 31.
byte3 << 24 is UB when attempting to shift into the sign bit. byte3 & 0x80 is true.
Other shifts are OK.
Had int been 64-bit, OP's original code is fine - no UB, including hword0 << 32.
Had int been 16-bit, all of code's shifts (aside from << 0) are UB or potential UB.
To do this, without casting (Something I try to avoid), consider
// uint64_t result = (hword0 << 32) + (byte3 << 24) + (byte2 << 16) + (byte1 << 8) + byte0
// Let an optimizing compiler do its job
uint64_t result = hword0;
result <<= 8;
result += byte3;
result <<= 8;
result += byte2;
result <<= 8;
result += byte1;
result <<= 8;
result += byte0;
Or
uint64_t result = (1ull*hword0 << 32) + (1ul*byte3 << 24) + (1ul*byte2 << 16) +
(1u*byte1 << 8) + byte0;
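The shift-and-add sequence above generalizes to a loop. This is a sketch under the assumption that the bytes are available in an array, most significant first; fold_bytes is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: folds up to 8 bytes into a uint64_t, most significant
// byte first. Every shift is applied to the uint64_t accumulator, never to a
// promoted int, so no cast is needed.
inline std::uint64_t fold_bytes(const std::uint8_t* bytes, std::size_t count) {
    assert(count <= 8);
    std::uint64_t result = 0;
    for (std::size_t i = 0; i < count; ++i) {
        result <<= 8;
        result |= bytes[i];
    }
    return result;
}
```

With the question's values laid out as { 0xAA, 0x00, 0xAA, 0x00, 0xAA, 0x00 } (hword0 first, then byte3 down to byte0), the call fold_bytes(data, 6) gives 0x0000AA00AA00AA00.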

Related

C++ Bitshift 4 int_8t into a normal integer (32 bit )

I had already asked a question how to get 4 int8_t into a 32bit int, I was told that I have to cast the int8_t to a uint8_t first to pack it with bitshifting into a 32bit integer.
int8_t offsetX = -10;
int8_t offsetY = 120;
int8_t offsetZ = -60;
using U = std::uint8_t;
int toShader = (U(offsetX) << 24) | (U(offsetY) << 16) | (U(offsetZ) << 8) | (0 << 0);
std::cout << (int)(toShader >> 24) << " "<< (int)(toShader >> 16) << " " << (int)(toShader >> 8) << std::endl;
My Output is
-10 -2440 -624444
It's not what I expected, of course, does anyone have a solution?
In the shader I want to unpack the int16 later and that is only possible with a 32bit integer because glsl does not have any other data types.
int offsetX = data[gl_InstanceID * 3 + 2] >> 24;
int offsetY = data[gl_InstanceID * 3 + 2] >> 16 ;
int offsetZ = data[gl_InstanceID * 3 + 2] >> 8 ;
What is written in the square bracket does not matter it is about the correct shifting of the bits or casting after the bracket.
If any of the offsets is negative, then the shift results in undefined behaviour.
Solution: Convert the offsets to an unsigned type first.
However, this brings another potential problem: If you convert to unsigned, then negative numbers will have very large values with set bits in most significant bytes, and OR operation with those bits will always result in 1 regardless of offsetX and offsetY. A solution is to convert into a small unsigned type (std::uint8_t), and another is to mask the unused bytes. Former is probably simpler:
using U = std::uint8_t;
int third = U(offsetX) << 24u
| U(offsetY) << 16u
| U(offsetZ) << 8u
| 0u << 0u;
I think you're forgetting to mask the bits that you care about before shifting them.
Perhaps this is what you're looking for:
int32 offsetX = (data[gl_InstanceID * 3 + 2] & 0xFF000000) >> 24;
int32 offsetY = (data[gl_InstanceID * 3 + 2] & 0x00FF0000) >> 16 ;
int32 offsetZ = (data[gl_InstanceID * 3 + 2] & 0x0000FF00) >> 8 ;
if (offsetX & 0x80) offsetX |= 0xFFFFFF00;
if (offsetY & 0x80) offsetY |= 0xFFFFFF00;
if (offsetZ & 0x80) offsetZ |= 0xFFFFFF00;
Without the bit mask, the X part will end up in offsetY, and the X and Y part in offsetZ.
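For the CPU side, packing and unpacking can be sketched as below. The names pack3 and unpack_signed are hypothetical; the sketch packs into an unsigned 32-bit value so that no shift touches a sign bit, and it does the sign extension with plain arithmetic instead of relying on a conversion to int8_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: packs three signed bytes into bits 31..8 of a word.
// Each value goes through uint8_t first, so negative inputs contribute
// exactly 8 bits.
inline std::uint32_t pack3(std::int8_t x, std::int8_t y, std::int8_t z) {
    return (static_cast<std::uint32_t>(static_cast<std::uint8_t>(x)) << 24) |
           (static_cast<std::uint32_t>(static_cast<std::uint8_t>(y)) << 16) |
           (static_cast<std::uint32_t>(static_cast<std::uint8_t>(z)) << 8);
}

// Hypothetical helper: extracts the byte at bit offset `shift` and
// sign-extends it to int.
inline int unpack_signed(std::uint32_t packed, unsigned shift) {
    assert(shift <= 24);
    const int value = static_cast<int>((packed >> shift) & 0xFFu);
    return (value & 0x80) ? value - 256 : value;
}
```

With the question's values, pack3(-10, 120, -60) is 0xF678C400, and unpack_signed returns -10, 120 and -60 for shifts of 24, 16 and 8.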
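On the CPU side you can use a union to avoid bit shifts, bit masking and branches. Note that reading a union member other than the one last written is allowed in C but formally undefined in C++ (common compilers such as GCC support it), and that the resulting byte order depends on the endianness of the machine.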
int8_t x,y,z,w; // your 8bit ints
int32_t i; // your 32bit int
union my_union // just helper union for the casting
{
int8_t i8[4];
int32_t i32;
} a;
// 4x8bit -> 32bit
a.i8[0]=x;
a.i8[1]=y;
a.i8[2]=z;
a.i8[3]=w;
i=a.i32;
// 32bit -> 4x8bit
a.i32=i;
x=a.i8[0];
y=a.i8[1];
z=a.i8[2];
w=a.i8[3];
If you do not like unions the same can be done with pointers...
Beware that this is not possible on the GLSL side (it has neither unions nor pointers), so there you have to use bit shifts and masks as in the other answer.
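If the union's type punning is a concern, std::memcpy gives the same byte-for-byte copy in standard C++. This is a sketch with hypothetical function names; like the union, it places the bytes in memory order, so the numeric value of the 32-bit integer depends on the machine's endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: copy four signed bytes into a 32-bit integer and back
// without reading an inactive union member.
inline std::int32_t bytes_to_int32(const std::int8_t (&in)[4]) {
    std::int32_t out;
    std::memcpy(&out, in, sizeof out);
    return out;
}

inline void int32_to_bytes(std::int32_t in, std::int8_t (&out)[4]) {
    std::memcpy(out, &in, sizeof in);
}

inline void roundtrip_example() {
    const std::int8_t src[4] = {-10, 120, -60, 0};
    std::int8_t dst[4] = {0, 0, 0, 0};
    int32_to_bytes(bytes_to_int32(src), dst);
    assert(dst[0] == -10 && dst[1] == 120 && dst[2] == -60 && dst[3] == 0);
}
```

Compilers typically turn such a fixed-size memcpy into a single load or store.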

Wrong result with bitwise inclusive OR

I can't figure out why inclusive OR returns the wrong result.
char arr[] = { 0x0a, 0xc0 };
uint16_t n{};
n = arr[0]; // I get 0x000a here.
n = n << 8; // Shift to the left and get 0x0a00 here.
n = n | arr[1]; // But now the n value is 0xffc0 instead of 0x0ac0.
What is the mistake in this example? Console app, MVS Community 2017.
The unintended 0xff is caused by sign bit extension of 0xc0.
0xc0 = 0b11000000
Hence the uppermost bit is set, and that is the sign bit when char is signed.
Please, note that all arithmetic and bitwise operations in C++ work with at least int (or unsigned int). Smaller types are promoted before and clipped afterwards.
Please, note also that char may be signed or unsigned. That's compiler implementation dependent. Obviously, it's signed in the case of OP. To prevent the unintended sign extension, the argument has to become unsigned (early enough).
Demonstration:
#include <cstdint>
#include <iostream>
int main()
{
char arr[] = { '\x0a', '\xc0' };
uint16_t n{};
n = arr[0]; // I get 0x000a here.
n = n << 8; // Shift to the left and get 0x0a00 here.
n = n | arr[1]; // But now the n value is 0xffc0 instead of 0x0ac0.
std::cout << std::hex << "n (wrong): " << n << std::endl;
n = arr[0]; // I get 0x000a here.
n = n << 8; // Shift to the left and get 0x0a00 here.
n = n | (unsigned char)arr[1]; // (unsigned char) prevents sign extension
std::cout << std::hex << "n (right): " << n << std::endl;
return 0;
}
Session:
g++ -std=c++11 -O2 -Wall -pthread main.cpp && ./a.out
n (wrong): ffc0
n (right): ac0
Live demo on coliru
Note:
I had to change char arr[] = { 0x0a, 0xc0 }; to char arr[] = { '\x0a', '\xc0' }; to get around serious compiler complaints. I guess these complaints were strongly related to this issue.
I got it to work correctly by doing:
int arr[] = { 0x0a, 0xc0 };
int n{};
n = arr[0]; // I get 0x000a here.
n = n << 8; // Shift to the left and get 0x0a00 here.
n = n | arr[1];
std::cout << n << std::endl;
There was some truncation if you leave the 'arr' array as char.
You have fallen victim to signed integer promotion.
When assigning 0xc0 to the second element (signed char default because of MVS) in the array, this is represented as follows:
arr[1] = 1100 - 0000, or in decimal -64
When this is used in the OR with a uint16_t, it is first promoted to an int with the value -64. Its low 16 bits are:
arr[1] = 1111 - 1111 - 1100 - 0000 = -64
due to the 2's complement implementation of integers.
Therefore:
n = 0000 - 1010 - 0000 - 0000
arr[1] = 1111 - 1111 - 1100 - 0000 (after being promoted)
n | arr[1] = 1111 - 1111 - 1100 - 0000 = 0xffc0
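The fix can be packaged as a small function. This is a sketch with a hypothetical name, make_u16; converting each char to unsigned char before any arithmetic is what keeps the sign bit from being extended.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: combines two chars into a 16-bit value. Going through
// unsigned char first yields 0..255 whether plain char is signed or not.
inline std::uint16_t make_u16(char high, char low) {
    const unsigned hi = static_cast<unsigned char>(high);
    const unsigned lo = static_cast<unsigned char>(low);
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

inline void make_u16_example() {
    // The bytes from the question: the result is 0x0ac0, not 0xffc0.
    assert(make_u16('\x0a', '\xc0') == 0x0ac0);
}
```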

c++ 64 bit network to host translation

I know there are answers to this question on the web using gcc byteswap and other alternatives, but I was wondering why my code below isn't working.
Firstly, I get gcc warnings (which I feel shouldn't appear). The reason I don't want to use byteswap directly is that I need to determine whether my machine is big endian or little endian and use byteswap accordingly, i.e. if my machine is big endian I could memcpy the bytes as is without any translation, otherwise I need to swap them and copy them.
static inline uint64_t ntohl_64(uint64_t val)
{
unsigned char *pp =(unsigned char *)&val;
uint64_t val2 = ( pp[0] << 56 | pp[1] << 48
| pp[2] << 40 | pp[3] << 32
| pp[4] << 24 | pp[5] << 16
| pp[6] << 8 | pp[7]);
return val2;
}
int main()
{
int64_t a=0xFFFF0000;
int64_t b=__const__byteswap64(a);
int64_t c=ntohl_64(a);
printf("\n %lld[%x] [%lld] [%lld]\n ", a, a, b, c);
}
Warnings:-
In function ‘uint64_t ntohl_64(uint64_t)’:
warning: left shift count >= width of type
warning: left shift count >= width of type
warning: left shift count >= width of type
warning: left shift count >= width of type
Output:-
4294901760[00000000ffff0000] 281470681743360[0000ffff00000000] 65535[000000000000ffff]
I am running this on a little endian machine so byteswap and ntohl_64 should result in exactly the same values, but unfortunately I get completely unexpected results. It would be great if someone could point out what's wrong.
The reason your code does not work is that you're shifting unsigned chars. Each one is promoted to int before the shift, so the shift is done in int (typically 32 bits), and any shift count of 32 or more is undefined (some implementations end up with weird results due to the way the machine shift instructions work; x86, for example, uses only the low bits of the count). You have to cast them to whatever you want the final size to be first, like:
((uint64_t)pp[0]) << 56
Your optimal solution with gcc would be to use htobe64. This function does everything for you.
P.S. It's a little bit off topic, but if you want to make the function portable across endianness you could do:
Edit based on Nova Denizen's comment:
static inline uint64_t htonl_64(uint64_t val)
{
union{
uint64_t retVal;
uint8_t bytes[8];
};
// Network order is big endian: bytes[0] receives the most significant byte.
bytes[0] = (val & 0xff00000000000000) >> 56;
bytes[1] = (val & 0x00ff000000000000) >> 48;
bytes[2] = (val & 0x0000ff0000000000) >> 40;
bytes[3] = (val & 0x000000ff00000000) >> 32;
bytes[4] = (val & 0x00000000ff000000) >> 24;
bytes[5] = (val & 0x0000000000ff0000) >> 16;
bytes[6] = (val & 0x000000000000ff00) >> 8;
bytes[7] = (val & 0x00000000000000ff);
return retVal;
}
static inline uint64_t ntohl_64(uint64_t val)
{
union{
uint64_t inVal;
uint8_t bytes[8];
};
inVal = val;
// bytes[0] arrived first and is the most significant byte.
return ((uint64_t)bytes[0]) << 56 |
((uint64_t)bytes[1]) << 48 |
((uint64_t)bytes[2]) << 40 |
((uint64_t)bytes[3]) << 32 |
((uint64_t)bytes[4]) << 24 |
((uint64_t)bytes[5]) << 16 |
((uint64_t)bytes[6]) << 8 |
bytes[7];
}
Assuming the compiler doesn't do something to the uint64_t on its way back through the return, and assuming the user treats the result as an 8-byte value (and not an integer), that code should work on any system. With any luck, your compiler will be able to optimize out the whole expression if you're on a big endian system and use some builtin byte swapping technique if you're on a little endian machine (and it's guaranteed to still work on any other kind of machine).
uint64_t val2 = ( pp[0] << 56 | pp[1] << 48
| pp[2] << 40 | pp[3] << 32
| pp[4] << 24 | pp[5] << 16
| pp[6] << 8 | pp[7]);
pp[0] is an unsigned char and 56 is an int. The unsigned char is promoted to int before the shift, so pp[0] << 56 is performed as an int shift with an int result, and with a 32-bit int a shift count of 56 is undefined. This isn't what you want, because you want all these shifts to have type unsigned long long.
The way to fix this is to cast, like ((unsigned long long)pp[0]) << 56.
Since pp[x] is only 8 bits wide and is promoted to a (typically 32-bit) int, the expression pp[0] << 56 cannot produce the value you want. An alternative is to mask the original value explicitly and then shift:
uint64_t val2 = (( val & 0xff ) << 56 ) |
(( val & 0xff00 ) << 40 ) |
...
In any case, just use compiler built-ins, they usually result in a single byte-swapping instruction.
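For reference, a full mask-and-shift byte swap can be sketched as follows. The function name swap_bytes64 is hypothetical, and this is one possible way to write it out, not necessarily what the answer above had in mind: byte k of the input moves to byte 7 - k of the result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reverses the byte order of a 64-bit value using only
// masks and shifts on the uint64_t itself.
constexpr std::uint64_t swap_bytes64(std::uint64_t v) {
    return ((v & 0x00000000000000FFULL) << 56) |
           ((v & 0x000000000000FF00ULL) << 40) |
           ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x00000000FF000000ULL) << 8) |
           ((v & 0x000000FF00000000ULL) >> 8) |
           ((v & 0x0000FF0000000000ULL) >> 24) |
           ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0xFF00000000000000ULL) >> 56);
}

inline void swap_bytes64_example() {
    // Same input as the question; matches the byteswap output shown there.
    assert(swap_bytes64(0x00000000FFFF0000ULL) == 0x0000FFFF00000000ULL);
}
```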
Casting and shifting works as PlasmaHH suggested, but I don't know why 32-bit shifts upconvert automatically and 64-bit shifts do not.
typedef uint64_t __u64;
static inline uint64_t ntohl_64(uint64_t val)
{
unsigned char *pp =(unsigned char *)&val;
return ((__u64)pp[0] << 56 |
(__u64)pp[1] << 48 |
(__u64)pp[2] << 40 |
(__u64)pp[3] << 32 |
(__u64)pp[4] << 24 |
(__u64)pp[5] << 16 |
(__u64)pp[6] << 8 |
(__u64)pp[7]);
}
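An endianness-independent variant reads the bytes from a buffer instead of from the integer's own storage. This is a sketch; load_be64 is a hypothetical name, and it assumes the buffer holds at least 8 bytes in network (big endian) order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads 8 bytes in network (big endian) order. The
// accumulator is a uint64_t, so every shift is done in 64 bits, and the
// result is the same on little and big endian hosts.
inline std::uint64_t load_be64(const unsigned char* p) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value = (value << 8) | p[i];
    }
    return value;
}

inline void load_be64_example() {
    const unsigned char wire[8] = {0, 0, 0, 0, 0xFF, 0xFF, 0, 0};
    assert(load_be64(wire) == 0x00000000FFFF0000ULL);
}
```

Because the function never inspects how the host stores integers, no endianness check is needed before calling it.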

retrieve last 6 bits from an integer

I need to fetch the last 6 bits of an integer or Uint32. For example, if I have a value of 183, I need the last six bits, which will be 110 111, i.e. 55.
I have written a small piece of code, but it's not behaving as expected. Could you guys please point out where I am making a mistake?
int compress8bitTolessBit( int value_to_compress, int no_of_bits_to_compress )
{
int ret = 0;
while(no_of_bits_to_compress--)
{
std::cout << " the value of bits "<< no_of_bits_to_compress << std::endl;
ret >>= 1;
ret |= ( value_to_compress%2 );
value_to_compress /= 2;
}
return ret;
}
int _tmain(int argc, _TCHAR* argv[])
{
int val = compress8bitTolessBit( 183, 5 );
std::cout <<" the value is "<< val << std::endl;
system("pause>nul");
return 0;
}
You have entered the realm of binary arithmetic. C++ has built-in operators for this kind of thing. The act of "getting certain bits" of an integer is done with an "AND" binary operator.
    0101 0101
AND 0000 1111
    ---------
    0000 0101
In C++ this is:
int n = 0x55 & 0xF;
// n = 0x5
So to get the right-most 6 bits,
int n = original_value & 0x3F;
And to get the right-most N bits,
int n = original_value & ((1 << N) - 1);
I don't get the problem, can't you just use bitwise operators? Eg
u32 trimmed = value & 0x3F;
This will keep just the 6 least significant bits by using the bitwise AND operator.
tl;dr:
int val = x & 0x3F;
int value = input & ((1 << no_of_bits_to_compress) - 1);
This one calculates the nth power of two, 1 << no_of_bits_to_compress, and subtracts 1 to get a mask with the n lowest bits set.
The last k bits of an integer A.
1. A % (1<<k); // simply A % 2^k
2. A - ((A>>k)<<k);
The first method uses the fact that the last k bits are exactly what is trimmed off by k right shifts (division by 2^k), i.e. the remainder of that division.
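The masking approach from the answers above can be sketched as a function. The name low_bits is hypothetical; the sketch works on uint32_t and treats n == 32 separately, because shifting a 32-bit value by 32 would itself be undefined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the n least significant bits of value.
inline std::uint32_t low_bits(std::uint32_t value, unsigned n) {
    assert(n <= 32);
    if (n == 32) {
        return value;  // a shift by the full width would be undefined
    }
    return value & ((std::uint32_t{1} << n) - 1u);
}
```

For the example in the question, low_bits(183, 6) returns 55.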

Unsigned long and bit shifting

I have a problem with bit shifting and unsigned longs. Here's my test code:
char header[4];
header[0] = 0x80;
header[1] = 0x00;
header[2] = 0x00;
header[3] = 0x00;
unsigned long l1 = 0x80000000UL;
unsigned long l2 = ((unsigned long) header[0] << 24) + ((unsigned long) header[1] << 16) + ((unsigned long) header[2] << 8) + (unsigned long) header[3];
cout << l1 << endl;
cout << l2 << endl;
I would expect l2 to also have a value of 2147483648 but instead it prints 18446744071562067968. I assume the bit shifting of the first byte causes problems?
Hopefully somebody can explain why this fails and how I modify the calculation of l2 so that it returns the correct value.
Thanks in advance.
Your value of 0x80 stored in a char is a signed quantity. When you cast this to a wider type, the value is sign-extended so that it keeps the same value in the larger type.
Change the type of char in the first line to unsigned char and you will not get the sign extension happening.
To simplify what is happening in your case, run this:
char c = 0x80;
unsigned long l = c;
cout << l << endl;
You get this output:
18446744073709551488
which is -128 as a 64-bit integer (0x80 is -128 as a 8-bit integer).
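Both the effect and the fix can be sketched with fixed-width types, which removes the dependence on the size of unsigned long. The function names are hypothetical, and the header is assumed to hold the four bytes most significant first, as in the question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: shows the sign extension. A negative signed char keeps
// its value when widened, so -128 becomes 2^64 - 128 as an unsigned value.
inline std::uint64_t widen_signed(signed char c) {
    return static_cast<std::uint64_t>(c);
}

// Hypothetical helper: reads four header bytes, most significant first. The
// cast to unsigned char happens before widening, so nothing is sign-extended.
inline std::uint32_t read_u32_be(const char* header) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value = (value << 8) | static_cast<unsigned char>(header[i]);
    }
    return value;
}

inline void header_example() {
    const char header[4] = {'\x80', 0, 0, 0};
    assert(read_u32_be(header) == 0x80000000u);
    assert(widen_signed(-128) == 18446744073709551488ULL);
}
```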
Same result here (Linux/x86-64, GCC 4.4.5). The behavior depends on the size of unsigned long, which is at least 32 bits, but may be larger.
If you want exactly 32 bits, use a uint32_t instead (from the header <stdint.h>; not in C++03, but part of the standard since C++11 as <cstdint> and widely supported).