C++ Bitshift 4 int8_t into a normal integer (32 bit) - c++

I had already asked a question about how to get 4 int8_t values into a 32-bit int. I was told that I have to cast each int8_t to a uint8_t first and then pack them into a 32-bit integer with bit shifts.
int8_t offsetX = -10;
int8_t offsetY = 120;
int8_t offsetZ = -60;
using U = std::uint8_t;
int toShader = (U(offsetX) << 24) | (U(offsetY) << 16) | (U(offsetZ) << 8) | (0 << 0);
std::cout << (int)(toShader >> 24) << " "<< (int)(toShader >> 16) << " " << (int)(toShader >> 8) << std::endl;
My Output is
-10 -2440 -624444
It's not what I expected, of course, does anyone have a solution?
In the shader I want to unpack the int16 later and that is only possible with a 32bit integer because glsl does not have any other data types.
int offsetX = data[gl_InstanceID * 3 + 2] >> 24;
int offsetY = data[gl_InstanceID * 3 + 2] >> 16 ;
int offsetZ = data[gl_InstanceID * 3 + 2] >> 8 ;
What is written in the square brackets does not matter; the question is about the correct shifting of the bits, or the casting, after the bracket.

If any of the offsets is negative, then the shift results in undefined behaviour.
Solution: Convert the offsets to an unsigned type first.
However, this brings another potential problem: if you convert to a wide unsigned type, negative numbers become very large values with set bits in the most significant bytes, and the OR operation with those bits will always result in 1 regardless of offsetX and offsetY. One solution is to convert into a small unsigned type (std::uint8_t), and another is to mask the unused bytes. The former is probably simpler:
using U = std::uint8_t;
int third = U(offsetX) << 24u
| U(offsetY) << 16u
| U(offsetZ) << 8u
| 0u << 0u;
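For illustration, here is a minimal sketch of the packing step, assuming a 32-bit word. The helper names pack_offsets and unpack_byte are hypothetical and not part of the question. Each offset goes through std::uint8_t and is widened to std::uint32_t before the shift, so no sign bits leak into the neighbouring bytes. The conversion back to std::int8_t is modular since C++20 and on common compilers before that.

```cpp
#include <cstdint>

// Hypothetical helper: packs three signed 8-bit offsets into the upper three
// bytes of a 32-bit word; the lowest byte is left as zero.
constexpr std::uint32_t pack_offsets(std::int8_t x, std::int8_t y, std::int8_t z)
{
    // Going through uint8_t turns a negative value into 0..255 instead of a
    // sign-extended 32-bit pattern; widening first keeps the shift unsigned.
    return (std::uint32_t{static_cast<std::uint8_t>(x)} << 24)
         | (std::uint32_t{static_cast<std::uint8_t>(y)} << 16)
         | (std::uint32_t{static_cast<std::uint8_t>(z)} << 8);
}

// Hypothetical helper: reads the byte at the given bit offset back as a
// signed 8-bit value.
constexpr std::int8_t unpack_byte(std::uint32_t packed, int shift)
{
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(packed >> shift));
}
```

With the values from the question, pack_offsets(-10, 120, -60) produces the bit pattern 0xF678C400, and unpack_byte with shifts 24, 16 and 8 gives the three offsets back.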

I think you're forgetting to mask the bits that you care about before shifting them.
Perhaps this is what you're looking for:
int offsetX = (data[gl_InstanceID * 3 + 2] & 0xFF000000) >> 24;
int offsetY = (data[gl_InstanceID * 3 + 2] & 0x00FF0000) >> 16 ;
int offsetZ = (data[gl_InstanceID * 3 + 2] & 0x0000FF00) >> 8 ;
if (offsetX & 0x80) offsetX |= 0xFFFFFF00;
if (offsetY & 0x80) offsetY |= 0xFFFFFF00;
if (offsetZ & 0x80) offsetZ |= 0xFFFFFF00;
Without the bit mask, the X part will end up in offsetY, and the X and Y part in offsetZ.
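The same mask, shift and sign-extend logic can be checked on the CPU side. The following is only a sketch in C++ that mirrors the shader code; the function name extract_signed_byte is hypothetical. It works on an unsigned word so that the right shift never drags sign bits in, and it assumes the usual two's complement conversion back to a signed type.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the shader logic: isolate one byte of the
// packed word, then sign-extend it from 8 to 32 bits.
constexpr std::int32_t extract_signed_byte(std::uint32_t packed, int shift)
{
    std::uint32_t byte = (packed >> shift) & 0xFFu;  // keep only the wanted 8 bits
    if (byte & 0x80u) {
        byte |= 0xFFFFFF00u;                         // bit 7 set: fill the upper 24 bits
    }
    return static_cast<std::int32_t>(byte);          // read the pattern as two's complement
}
```

For the packed pattern 0xF678C400, shifts of 24, 16 and 8 yield -10, 120 and -60.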

on CPU side you can use union to avoid bit shifts and bit masking and branches ...
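int8_t x,y,z,w; // your 8bit ints
int32_t i; // your 32bit int
union my_union // just helper union for the casting
{
    int8_t i8[4];
    int32_t i32;
} a;
// 4x8bit -> 32bit (byte order follows the platform's endianness)
a.i8[0]=x; a.i8[1]=y; a.i8[2]=z; a.i8[3]=w; i=a.i32;
// 32bit -> 4x8bit
a.i32=i; x=a.i8[0]; y=a.i8[1]; z=a.i8[2]; w=a.i8[3];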
If you do not like unions the same can be done with pointers...
Beware that on the GLSL side this is not possible (neither unions nor pointers), and you have to use bit shifts and masks like in the other answer...
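One caveat on the C++ side: reading a union member other than the one last written is undefined behaviour in standard C++ (it is allowed in C, and several compilers support it as an extension). A sketch of the same idea with std::memcpy, which is well defined, could look like this. The helper names bytes_to_word and word_to_bytes are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: copy the object representation instead of reading an
// inactive union member. The byte order inside the 32-bit value follows the
// host's endianness, just as it would with the union.
inline std::int32_t bytes_to_word(const std::int8_t (&bytes)[4])
{
    std::int32_t word = 0;
    std::memcpy(&word, bytes, sizeof word);
    return word;
}

inline void word_to_bytes(std::int32_t word, std::int8_t (&bytes)[4])
{
    std::memcpy(bytes, &word, sizeof word);
}
```

Because the layout is endianness dependent, the shader would have to unpack with shifts that match the CPU's byte order; the explicit shift-based packing does not have that dependency.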


Difference between bitshifting mask vs unsigned int

For a project, I had to find the individual 8-bit values of an unsigned int. I first tried bit-shifting the mask to find the numbers, but that didn't work, so I tried bit-shifting the value and it worked.
What's the difference between these two? Why didn't the first one work?
// first attempt: shift the mask
void ExampleFunk(unsigned int value){
    for (int i = 0; i < 4; i++) {
        ExampleSubFunk(value & (0x00FF << (i * 8)));
    }
}
// second attempt: shift the value
void ExampleFunk(unsigned int value){
    for (int i = 0; i < 4; i++) {
        ExampleSubFunk((value >> (i * 8)) & 0x00FF);
    }
}
Take the value 0xAABBCCDD as an example.
The expression value & (0xFF << (i * 8)) assumes the values:
0xAABBCCDD & 0x000000FF = 0x000000DD
0xAABBCCDD & 0x0000FF00 = 0x0000CC00
0xAABBCCDD & 0x00FF0000 = 0x00BB0000
0xAABBCCDD & 0xFF000000 = 0xAA000000
While the expression (value >> (i * 8)) & 0xFF assumes the values:
0xAABBCCDD & 0x000000FF = 0x000000DD
0x00AABBCC & 0x000000FF = 0x000000CC
0x0000AABB & 0x000000FF = 0x000000BB
0x000000AA & 0x000000FF = 0x000000AA
As you can see, the results are quite different after i = 0, because the first expression is only "selecting" 8 bits from value, while the second expression is shifting them down to the least significant byte first.
Note that in the first case, the expression (0xFF << (i * 8)) is shifting an int literal (0xFF) left. You should cast the literal to unsigned int to avoid signed integer overflow, which is undefined behavior:
value & ((unsigned int)0xFF << (i * 8))
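To make the difference concrete, here is a small sketch of both expressions as functions. The names select_in_place and select_shifted are hypothetical, and the mask is built from an unsigned value as recommended above.

```cpp
#include <cstdint>

// Shift the mask: the selected byte stays at its original position.
constexpr std::uint32_t select_in_place(std::uint32_t value, int i)
{
    return value & (std::uint32_t{0xFF} << (i * 8));
}

// Shift the value: the selected byte is moved down to the lowest 8 bits.
constexpr std::uint32_t select_shifted(std::uint32_t value, int i)
{
    return (value >> (i * 8)) & 0xFFu;
}
```

For 0xAABBCCDD and i = 2, select_in_place gives 0x00BB0000 while select_shifted gives 0xBB, which matches the two tables above.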
In this code:
void ExampleFunk(unsigned int value){
    for (int i = 0; i < 4; i++) {
        ExampleSubFunk(value & (0x00FF << (i * 8)));
    }
}
You are shifting the bits of 0x00FF itself, producing new masks of 0x00FF, 0xFF00, 0xFF0000, and 0xFF000000, and then you are masking value with each of those masks. The result contains only the 8 bits of value that you are interested in, but those 8 bits are not moving position at all.
In this code:
void ExampleFunk(unsigned int value){
    for (int i = 0; i < 4; i++) {
        ExampleSubFunk((value >> (i * 8)) & 0x00FF);
    }
}
You are shifting the bits of value, thus moving those 8 bits that you want, and then you are masking the result with 0x00FF to extract those 8 bits.

Split an integer into bytes and combine back into the integers results into error

A toy program that splits an integer into 4 bytes and later combines these bytes to get back the input value gives the wrong result. The program works for positive integers, but I am interested in signed integers. Need help.
Expected Output: -12345
Actual Output: -57
#include <iostream>
int main()
{
    int j,i = -12345;
    char b[4];
    b[0] = (i >> 24) & 0xFF;
    b[1] = (i >> 16) & 0xFF;
    b[2] = (i >> 8) & 0xFF;
    b[3] = (i >> 0) & 0xFF;
    j = (int)((b[0] << 24) | (b[1] << 16) | (b[2] << 8) | (b[3] << 0));
    std::cout << j;
    return 0;
}
There are actually two problems that lead to your "error".
The first is that the result of e.g. b[0] << 24 will be an int. When you cast that to a char (and assuming that char is an 8-bit type) then you cut off the top 24 bits of the value, truncating it.
The second problem is that char could be unsigned (it's implementation-defined if char is signed or unsigned). If char is unsigned then the value -1 (0xffffffff) will become 255 (0x000000ff).
When you then bring all that together it will almost certainly result in wrong values.
In general, whenever you feel the need to do a C-style cast (like in (char)(b[0] << 24)) when programming in C++, you should take that as a sign that you're doing something wrong.
One possible way to solve your problem is to always work with explicit unsigned data types.
First you need to copy the original int value to an unsigned int:
unsigned ui;
memcpy(&ui, &i, sizeof ui);
Then use ui instead of i when doing the "split". And explicitly use unsigned char:
unsigned char b[sizeof(unsigned)] = { 0 };
b[0] = (ui >> 24) & 0xFF;
b[1] = (ui >> 16) & 0xFF;
b[2] = (ui >> 8) & 0xFF;
b[3] = (ui >> 0) & 0xFF;
Then to put it all back, again use an explicit unsigned type, and copy it to the resulting variable:
unsigned uj = ((unsigned)b[0] << 24) | ((unsigned)b[1] << 16) | ((unsigned)b[2] << 8) | ((unsigned)b[3] << 0);
memcpy(&j, &uj, sizeof j);
I suggest using unsigned data types here to avoid possible problems that can come from sign-extension during conversion.
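Put together, the unsigned round trip might look like the sketch below. The function name split_and_combine is hypothetical, and it uses static_cast instead of the memcpy shown above; the conversion to unsigned is always modular, and the conversion back to signed is modular since C++20 and on common compilers before that.

```cpp
#include <cstdint>

// Hypothetical helper: split a signed 32-bit value into 4 bytes (most
// significant first) and rebuild it, doing all shifts on unsigned types.
constexpr std::int32_t split_and_combine(std::int32_t i)
{
    const std::uint32_t ui = static_cast<std::uint32_t>(i);
    const unsigned char b[4] = {
        static_cast<unsigned char>((ui >> 24) & 0xFF),
        static_cast<unsigned char>((ui >> 16) & 0xFF),
        static_cast<unsigned char>((ui >> 8) & 0xFF),
        static_cast<unsigned char>((ui >> 0) & 0xFF),
    };
    // Widen each byte to 32 bits before shifting so nothing is sign-extended
    // and no shift overflows a signed int.
    const std::uint32_t uj = (std::uint32_t{b[0]} << 24)
                           | (std::uint32_t{b[1]} << 16)
                           | (std::uint32_t{b[2]} << 8)
                           | (std::uint32_t{b[3]} << 0);
    return static_cast<std::int32_t>(uj);
}
```

With this version the value -12345 from the question survives the round trip unchanged.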
Your code works only for positive numbers! "i" is negative, and by shifting it to the right b[0] becomes positive, so the final deserialization gives the wrong result.
#include <iostream>
int main()
{
    int j, i = -12345;
    const char* bytes = reinterpret_cast<const char*>(&i);
    j = *reinterpret_cast<const int*>(bytes);
    std::cout << j;
    return 0;
}

Correct way to concatenate bitwise operations?

I need to concatenate some bitwise operations, but the current output seems to be wrong. The split operations are similar to this:
unsigned char a = 0x12;
unsigned char x = 0x00;
x = a << 4;
x = x >> 4;
expected result x = 0x02;
current result x = 0x02;
If i try to concatenate the operations the result is not correct:
unsigned char a = 0x12;
unsigned char x = 0x00;
x = (a << 4) >> 4;
expected result x = 0x02;
current result x = 0x12;
Thanks in advance for any suggestion.
The problem is that a is promoted to int (via integral promotion) before the shift, so (0x12 << 4) >> 4 is essentially 0x12.
What you want to do is convert (a << 4) back to unsigned char by using static_cast.
The final code:
unsigned char a = 0x12;
unsigned char x = 0x00;
x = static_cast<unsigned char>(a << 4) >> 4;
The compiler IS applying integral promotions for the >> and << operations
You might think that
x = (a << 4) >> 4;
Would use a byte-wide register for the operation, but the compiler promotes the char a to an int before doing the shift, preserving the bits that are shifted to the left.
You can solve this by doing this:
x = ((a << 4) & 0xff) >> 4;
Again, the issue is that integral promotion preserves the bits until the final cast.
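A compact way to see the effect of the promotion is to write both variants as functions. This is only a sketch and the function names are hypothetical.

```cpp
// Chained shifts: a is promoted to int, so the bits shifted past bit 7
// survive the left shift and come back with the right shift.
constexpr unsigned char low_nibble_promoted(unsigned char a)
{
    return static_cast<unsigned char>((a << 4) >> 4);
}

// Truncating to 8 bits after the left shift discards the high nibble, which
// is what the two separate assignments to x did implicitly.
constexpr unsigned char low_nibble_truncated(unsigned char a)
{
    return static_cast<unsigned char>(static_cast<unsigned char>(a << 4) >> 4);
}
```

For a = 0x12 the first function returns 0x12 and the second returns 0x02.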

Get signed integer from 2 16-bit signed bytes?

So this sensor I have returns a signed value between -500-500 by returning two (high and low) signed bytes. How can I use these to figure out what the actual value is? I know I need to do 2's complement, but I'm not sure how. This is what I have now -
real_velocity = temp.values[0];
if(temp.values[1] != -1)
real_velocity += temp.values[1];
//if high byte > 1, negative number - take 2's complement
if(temp.values[1] > 1) {
    real_velocity = ~real_velocity;
    real_velocity += 1;
}
But it just returns the negative value of what would be a positive. So for instance, -200 returns bytes 255 (high) and 56(low). Added these are 311. But when I run the above code it tells me -311. Thank you for any help.
-200 in hex is 0xFF38,
you're getting two bytes 0xFF and 0x38,
converting these back to decimal you might recognise them
0xFF = 255,
0x38 = 56
your sensor is not returning 2 signed bytes but simply the high and low bytes of a signed 16-bit number.
so your result is
value = (highbyte << 8) + lowbyte
value being a 16 bit signed variable.
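As a sketch of that formula in C++, assuming the two bytes are handled as unsigned 8-bit values and the target uses two's complement (the function name combine_bytes is hypothetical):

```cpp
#include <cstdint>

// Hypothetical helper: glue the high and low byte together and read the
// 16-bit pattern as a signed number.
constexpr std::int16_t combine_bytes(std::uint8_t high, std::uint8_t low)
{
    // Both bytes are promoted to int, so the shift cannot lose bits; the
    // result is then narrowed to exactly 16 bits.
    const std::uint16_t raw = static_cast<std::uint16_t>((high << 8) | low);
    return static_cast<std::int16_t>(raw);
}
```

For the bytes from the question, combine_bytes(0xFF, 0x38) gives -200. Treating the bytes as unsigned matters: a signed low byte of 0x80 or above would be sign-extended and overwrite the high byte.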
Based on the example you gave, it appears that the value is already 2's complement. You just need to shift the high byte left 8 bits and OR the values together.
real_velocity = (short) (temp.values[0] | (temp.values[1] << 8));
You can shift the bits and mask the values.
int main()
{
    char data[2];
    data[0] = 0xFF; //high
    data[1] = 56; //low
    int value = 0;
    if (data[0] & 0x80) //sign
        value = 0xFFFF8000;
    // mask the low byte so a signed char >= 0x80 is not sign-extended
    value |= ((data[0] & 0x7F) << 8) | (data[1] & 0xFF);
}
real_velocity = temp.values[0];
real_velocity = real_velocity << 8;
real_velocity |= temp.values[1];
// And, assuming 32-bit integers
real_velocity <<= 16;
real_velocity >>= 16;
For 8-bit bytes, first just convert to unsigned:
typedef unsigned char Byte;
unsigned const u = (Byte( temp.values[1] ) << 8) | Byte( temp.values[0] );
Then if that is greater than the upper range for 16-bit two's complement, subtract 2^16:
int const i = int(u >= (1u << 15)? u - (1u << 16) : u);
You could do tricks at the bit level, but I don't think there's any point in that.
The above assumes that CHAR_BIT = 8, that unsigned is more than 16 bits, and that the machine and desired result are two's complement.
#include <iostream>
using namespace std;
int main()
{
    typedef unsigned char Byte;
    // char( 255 ) avoids a narrowing error where char is signed
    struct { char values[2]; } const temp = { 56, char( 255 ) };
    unsigned const u = (Byte( temp.values[1] ) << 8) | Byte( temp.values[0] );
    int const i = int(u >= (1u << 15)? u - (1u << 16) : u);
    cout << i << endl;
}

C/C++ - Convert 24-bit signed integer to float

I'm programming in C++. I need to convert a 24-bit signed integer (stored in a 3-byte array) to float (normalizing to [-1.0,1.0]).
The platform is MSVC++ on x86 (which means the input is little-endian).
I tried this:
float convert(const unsigned char* src)
{
    int i = src[2];
    i = (i << 8) | src[1];
    i = (i << 8) | src[0];
    const float Q = 2.0 / ((1 << 24) - 1.0);
    return (i + 0.5) * Q;
}
I'm not entirely sure, but it seems the results I'm getting from this code are incorrect. So, is my code wrong and if so, why?
You are not sign extending the 24 bits into an integer; the upper bits will always be zero. This code will work no matter what your int size is:
if (i & 0x800000)
i |= ~0xffffff;
Edit: Problem 2 is your scaling constant. In simple terms, you want to multiply by the new maximum and divide by the old maximum, assuming that 0 remains at 0.0 after conversion.
const float Q = 1.0 / 0x7fffff;
Finally, why are you adding 0.5 in the final conversion? I could understand if you were trying to round to an integer value, but you're going the other direction.
Edit 2: The source you point to has a very detailed rationale for your choices. Not the way I would have chosen, but perfectly defensible nonetheless. My advice for the multiplier still holds, but the maximum is different because of the 0.5 added factor:
const float Q = 1.0 / (0x7fffff + 0.5);
Because the positive and negative magnitudes are the same after the addition, this should scale both directions correctly.
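Putting the sign extension and the first scaling suggestion (1.0 / 0x7fffff, without the 0.5 offset) together gives roughly the following. This is a sketch only; the function names s24_to_int and s24_to_float are hypothetical, and the input is assumed to be three bytes in little-endian order.

```cpp
#include <cstdint>

// Hypothetical helper: assemble a little-endian 24-bit sample and
// sign-extend it to 32 bits.
constexpr std::int32_t s24_to_int(const unsigned char* src)
{
    std::uint32_t u = (std::uint32_t{src[2]} << 16)
                    | (std::uint32_t{src[1]} << 8)
                    | src[0];
    if (u & 0x800000u) {
        u |= 0xFF000000u;   // bit 23 is the sign bit: fill the top byte
    }
    return static_cast<std::int32_t>(u);
}

// Hypothetical helper: scale so that 0x7fffff maps to exactly 1.0f.
constexpr float s24_to_float(const unsigned char* src)
{
    return static_cast<float>(s24_to_int(src)) / 8388607.0f;  // 0x7fffff
}
```

With this scaling the most negative sample 0x800000 lands very slightly below -1.0, which is the asymmetry the 0.5 offset variant is meant to remove.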
Since you are using a char array, it does not necessarily follow that the input is little endian by virtue of being x86; the char array makes the byte order architecture independent.
Your code is somewhat overcomplicated. A simple solution is to shift the 24 bit data to scale it to a 32bit value (so that the machine's natural signed arithmetic will work), and then use a simple ratio of the result with the maximum possible value (which is INT_MAX less 255 because of the vacant lower 8 bits).
#include <limits.h>
float convert(const unsigned char* src)
{
    // shift as unsigned so a set top bit does not overflow a signed int
    int i = (int)((unsigned)src[2] << 24 | src[1] << 16 | src[0] << 8) ;
    return i / (float)(INT_MAX - 255) ;
}
Test code:
unsigned char* makeS24( unsigned int i, unsigned char* s24 )
{
    s24[2] = (unsigned char)(i >> 16) ;
    s24[1] = (unsigned char)((i >> 8) & 0xff);
    s24[0] = (unsigned char)(i & 0xff);
    return s24 ;
}
#include <iostream>
int main()
{
    unsigned char s24[3] ;
    volatile int x = INT_MIN / 2 ;
    std::cout << convert( makeS24( 0x800000, s24 )) << std::endl ; // -1.0
    std::cout << convert( makeS24( 0x7fffff, s24 )) << std::endl ; // 1.0
    std::cout << convert( makeS24( 0, s24 )) << std::endl ; // 0.0
    std::cout << convert( makeS24( 0xc00000, s24 )) << std::endl ; // -0.5
    std::cout << convert( makeS24( 0x400000, s24 )) << std::endl ; // 0.5
}
Since it's not symmetrical, this is probably the best compromise.
Maps -((2^23)-1) to -1.0 and ((2^23)-1) to 1.0.
(Note: this is the same conversion style used by 24 bit WAV files)
float convert( const unsigned char* src )
{
    int i = ( ( src[ 2 ] << 24 ) | ( src[ 1 ] << 16 ) | ( src[ 0 ] << 8 ) ) >> 8;
    return ( ( float ) i ) / 8388607.0;
}
The solution that works for me:
/**
 * Convert 24 bits that are saved into a char* and represent a float
 * in little endian format to a C float number.
 */
float convert(const unsigned char* src)
{
    float num_float;
    // concatenate the chars (short integers) and
    // save them to a long int
    long int num_integer = (
        ((src[2] & 0xFF) << 16) |
        ((src[1] & 0xFF) << 8) |
        (src[0] & 0xFF)
    );
    // copy the bits from the long int variable
    // to the float.
    memcpy(&num_float, &num_integer, 4);
    return num_float;
}
Works for me:
float convert(const char* stream)
{
    int fromStream =
        (0x00 << 24) +
        (stream[2] << 16) +
        (stream[1] << 8) +
        stream[0];
    return (float)fromStream;
}
Looks like you're treating it as a 24-bit unsigned integer. If the most significant bit is 1, you need to make i negative by setting the remaining 8 bits to 1 as well.
I'm not sure if it's good programming practice, but this seems to work (at least with g++ on 32-bit Linux, haven't tried it on anything else yet) and is certainly more elegant than extracting byte-by-byte from a char array, especially if it's not really a char array but rather a stream (in my case, it's a file stream) that you read from (if it is a char array, you can use memcpy instead of istream::read).
Just load the 24-bit variable into the less significant 3 bytes of a signed 32-bit (signed long). Then shift the long variable one byte to the left, so that the sign bit appears where it's meant to. Finally, just normalize the 32-bit variable, and you're all set.
union _24bit_LE{
    char access;
    signed long _long;
}_24bit_LE_buf;
float getnormalized24bitsample(){
    std::ifstream::read(&_24bit_LE_buf.access+1, 3);
    return (_24bit_LE_buf._long<<8) / (0x7fffffff + .5);
}
(Strangely, it doesn't seem to work when you just read into the 3 more significant bytes right away).
EDIT: it turns out this method seems to have some problems I don't fully understand yet. Better not to use it for the time being.
This one, got from here, works for me.
typedef union {
    struct {
        unsigned short lo;
        unsigned short hi;
    } u16;
    unsigned int u32;
    signed int i32;
    float f;
} Versatype32;
//Bipolar version (-1.0 to ~1.0)
void fInt24_to_float(float* dest, const char* src, size_t length) {
    Versatype32 xTemp;
    while (length--) {
        xTemp.u32 = *(int*)src;
        //Check if Negative by left shifting 8
        xTemp.u32 <<= 8; //(If it's a negative, we'll know) (Props to Norman Wong)
        //Convert to float
        xTemp.f = (float)xTemp.i32;
        //Skip divide down if zero
        if (xTemp.u32 != 0) {
            //Divide by (1<<31 or 2^31)
            xTemp.u16.hi -= 0x0F80; //BAM! Bitmagic!
        }
        *dest = xTemp.f;
        //Move to next set
        dest++; // assumed: advance the output too, otherwise every sample overwrites the first
        src += 3;
    } //Are we done yet?
}