I have a problem with bit shifting and unsigned longs. Here's my test code:
char header[4];
header[0] = 0x80;
header[1] = 0x00;
header[2] = 0x00;
header[3] = 0x00;
unsigned long l1 = 0x80000000UL;
unsigned long l2 = ((unsigned long) header[0] << 24) + ((unsigned long) header[1] << 16) + ((unsigned long) header[2] << 8) + (unsigned long) header[3];
cout << l1 << endl;
cout << l2 << endl;
I would expect l2 to also have a value of 2147483648 but instead it prints 18446744071562067968. I assume the bit shifting of the first byte causes problems?
Hopefully somebody can explain why this fails and how I can modify the calculation of l2 so that it returns the correct value.
Thanks in advance.
Your value of 0x80 stored in a char is a signed quantity. When you convert it to a wider type, it is sign-extended so that it keeps the same value in the larger type.
Change the declaration of header in the first line from char to unsigned char and the sign extension will not happen.
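For example, a minimal sketch of the fix, reusing the asker's byte values (my own example):
#include <iostream>
int main()
{
    unsigned char header[4] = { 0x80, 0x00, 0x00, 0x00 };   // unsigned, so no sign extension
    unsigned long l2 = ((unsigned long) header[0] << 24)
                     + ((unsigned long) header[1] << 16)
                     + ((unsigned long) header[2] << 8)
                     +  (unsigned long) header[3];
    std::cout << l2 << std::endl;   // prints 2147483648
}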
To simplify what is happening in your case, run this:
char c = 0x80;
unsigned long l = c;
cout << l << endl;
You get this output:
18446744073709551488
which is -128 converted to an unsigned 64-bit integer (0x80 is -128 as an 8-bit signed integer).
Same result here (Linux/x86-64, GCC 4.4.5). The behavior depends on the size of unsigned long, which is at least 32 bits, but may be larger.
If you want exactly 32 bits, use a uint32_t instead (from the header <stdint.h>; not part of C++03, but in C++11 as <cstdint> and already widely supported).
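A brief sketch combining the two suggestions (unsigned bytes plus a fixed-width result type); this is my own example, not code from the question:
#include <stdint.h>
#include <iostream>
int main()
{
    unsigned char header[4] = { 0x80, 0x00, 0x00, 0x00 };
    uint32_t l2 = ((uint32_t)header[0] << 24) | ((uint32_t)header[1] << 16)
                | ((uint32_t)header[2] << 8)  |  (uint32_t)header[3];
    std::cout << l2 << std::endl;   // 2147483648, and exactly 32 bits wide
}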
Related
At work I found this code in my codebase, where chars are cast twice:
constexpr unsigned int foo(char ch0, char ch1, char ch2, char ch3)
{
return ((unsigned int)(unsigned char)(ch0)
| ((unsigned int)(unsigned char)(ch1) << 8)
| ((unsigned int)(unsigned char)(ch2) << 16)
| ((unsigned int)(unsigned char)(ch3) << 24))
;
}
Wouldn't one cast to unsigned int be sufficient?
And in that case, wouldn't it be better to make it a static_cast<unsigned int>?
Yes, there is a difference. Consider what happens if one of the chars has the value -1. When you do
(unsigned int)(unsigned char)-1
you get 255 for an 8-bit char, since the conversion is done modulo 2^8 first. If you instead used
(unsigned int)-1
then you would get 4294967295 for a 32-bit int, since the conversion is now done modulo 2^32.
So the first cast guarantees the result is representable in 8 bits (or whatever the actual size of a char is), and the second cast then promotes it to the wider type.
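A small self-contained demonstration of that difference (my own example, assuming the usual 8-bit char and 32-bit int):
#include <iostream>
int main()
{
    char c = -1;
    std::cout << (unsigned int)(unsigned char)c << '\n';  // 255: first reduced modulo 2^8
    std::cout << (unsigned int)c << '\n';                 // 4294967295: reduced modulo 2^32
}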
You can get rid of the casts to unsigned char if you change the function parameters to that type, like so:
constexpr unsigned int foo(unsigned char ch0, unsigned char ch1,
                           unsigned char ch2, unsigned char ch3)
{
    return static_cast<unsigned int>(ch0)
         | (static_cast<unsigned int>(ch1) << 8)
         | (static_cast<unsigned int>(ch2) << 16)
         | (static_cast<unsigned int>(ch3) << 24);
}
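As a usage example (my own addition, assuming the fixed version above), a compile-time check of the byte order:
static_assert(foo(0x78, 0x56, 0x34, 0x12) == 0x12345678u,
              "ch0 ends up in the low byte, ch3 in the high byte");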
I am trying to fill a 64-bit unsigned variable by combining 16-bit and 8-bit values:
uint8_t byte0 = 0x00;
uint8_t byte1 = 0xAA;
uint8_t byte2 = 0x00;
uint8_t byte3 = 0xAA;
uint16_t hword0 = 0xAA00;
uint16_t hword1 = 0xAAAA;
uint64_t result = ( hword0 << 32 ) + ( byte3 << 24 ) +
( byte2 << 16 ) + ( byte1 << 8 ) + ( byte0 << 0 );
This gives me a warning.
left shift count >= width of type [-Wshift-count-overflow]
uint64_t result = ( hword0 << 32 )
hword0 is 16 bits wide and you are requesting a 32-bit shift. Shifting by more than the width minus one bit is undefined.
The solution is to convert your components to the destination type: uint64_t result = ( ((uint64_t)hword0) << 32 ) + etc.
Contrary to your question title, you can shift a uint16_t. But you cannot shift it (losslessly) by more than its width.
The input operand's type carries over to the result, so in your original code you have a uint16_t << 32, which is 0 (any value shifted left by 32 and then clipped to 16 bits is 0), and the same goes for nearly all of your uint8_t shifts.
The solution is simple: before shifting, cast your values to the appropriate type suitable for shifting:
uint64_t result = ( (uint64_t)hword0 << 32 ) +
( (uint32_t)byte3 << 24 ) + ( (uint32_t)byte2 << 16 ) + ( (uint32_t)byte1 << 8 ) + ( (uint32_t)byte0 << 0 );
You can shift a uint16_t. What you can't do is shift an integer value by a count greater than or equal to the width of the type in bits. Doing so invokes undefined behavior. This is documented in section 6.5.7p3 of the C standard regarding bitwise shift operators:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
You would think that this means that any shift greater than or equal to 16 on a uint16_t is not valid. However, as mentioned above the operands of the << operator are subject to integer promotion. This means that any value with a rank lower than int is promoted to int before being used in an expression. So if int is 32 bits on your system, then you can left shift up to 31 bits.
This is why ( byte3 << 24 ) + ( byte2 << 16 ) + ( byte1 << 8 ) + ( byte0 << 0 ) doesn't generate a warning even though the bytes are uint8_t, while ( hword0 << 32 ) does. There is still an issue here, however, because of the promotion to int: since the promoted value is signed, you run the risk of shifting a 1 into the sign bit, which also invokes undefined behavior.
To fix this, any value that is shifted left by 32 or more must first be cast to uint64_t so that it can be operated on properly, as must any value that may end up shifting a 1 into the sign bit:
uint64_t result = ( (uint64_t)hword0 << 32 ) +
( (uint64_t)byte3 << 24 ) + ( (uint64_t)byte2 << 16 ) +
( (uint64_t)byte1 << 8 ) + ( byte0 << 0 );
According to the warning, 32 is greater than or equal to the width of the promoted operand on the target system. The C++ standard says:
[expr.shift]
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
Corresponding rule from the C standard:
Bitwise shift operators
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
According to the quoted rule, the behaviour of your program is undefined whether it is written in C or C++.
You can solve the problem by explicitly converting the left-hand operand of the shift to a sufficiently large unsigned type.
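As a concrete sketch of that conversion, using the asker's variable names (my own formulation):
uint64_t result = (static_cast<uint64_t>(hword0) << 32)
                + (static_cast<uint64_t>(byte3) << 24)
                + (static_cast<uint64_t>(byte2) << 16)
                + (static_cast<uint64_t>(byte1) << 8)
                +  static_cast<uint64_t>(byte0);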
P.S. On systems where uint16_t is smaller than int (which is quite typical), a uint16_t operand will be promoted to int when used as an arithmetic operand. As such, byte2 << 16 is not unconditionally† undefined on such systems. You shouldn't rely on this detail, but that explains why you see no warning from the compiler regarding that shift.
† byte2 << 16 can still be undefined if the result is outside the range of representable values of the (signed) int type. It would be well defined if the promoted type was unsigned.
byte2 << 16
is left-shifting an 8-bit value by 16 bits. That won't work the way you expect. Per 6.5.7 Bitwise shift operators, paragraph 4 of the C standard:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Since you're using a left shift on unsigned values, you get zero.
EDIT
Per paragraph 3 of the same section, it's actually undefined behavior:
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
You want something like
( ( uint64_t ) byte2 ) << 16
The cast to a 64-bit value will ensure the result doesn't lose bits.
To do what you want, the key idea is to use an intermediate uint64_t (the final size) in which to shuffle the bits.
The following compiles with no warnings:
You can use automatic promotion (and no cast):
{
uint64_t b4567 = hword0; // auto promotion
uint64_t b3 = byte3;
uint64_t b2 = byte2;
uint64_t b1 = byte1;
uint64_t b0 = byte0;
uint64_t result = (
(b4567 << 32) |
(b3 << 24) |
(b2 << 16) |
(b1 << 8) |
(b0 << 0) );
}
You can also use static_cast (multiple times):
{
uint64_t result = (
(static_cast<uint64_t>(hword0) << 32) |
(static_cast<uint64_t>(byte3) << 24) |
(static_cast<uint64_t>(byte2) << 16) |
(static_cast<uint64_t>(byte1) << 8) |
(static_cast<uint64_t>(byte0) << 0 )
);
cout << "\n " << hex << result << endl;
}
And you can combine both approaches by creating a function that a) performs the static cast and b) has a uint64_t formal parameter, so the compiler auto-promotes the argument.
The function looks like:
// vvvvvvvv ---- formal parameter
uint64_t sc (uint64_t ui64) {
return static_cast<uint64_t>(ui64);
}
// using static cast function
{
uint64_t result = (
(sc(hword0) << 32) |
(sc(byte3) << 24) |
(sc(byte2) << 16) |
(sc(byte1) << 8) |
(sc(byte0) << 0)
);
cout << "\n " << hex << result << endl;
}
From a C perspective:
Much of the discussion here omits that a uint8_t operand of a shift (left or right) is first promoted to int, and only then are the shift rules applied.
The same occurs with uint16_t whenever int is wider than 16 bits (17 bits or more).
When int is 32-bit:
hword0 << 32 is UB because the shift amount is too great: it is outside the range 0 to 31.
byte3 << 24 is UB when it shifts a 1 into the sign bit, i.e. when byte3 & 0x80 is true.
The other shifts are OK.
Had int been 64-bit, OP's original code would be fine - no UB, including hword0 << 32.
Had int been 16-bit, all of the code's shifts (aside from << 0) would be UB or potential UB.
To do this without casting (something I try to avoid), consider:
// uint64_t result = (hword0 << 32) + (byte3 << 24) + (byte2 << 16) + (byte1 << 8) + byte0
// Let an optimizing compiler do its job
uint64_t result = hword0;
result <<= 8;
result += byte3;
result <<= 8;
result += byte2;
result <<= 8;
result += byte1;
result <<= 8;
result += byte0;
Or
uint64_t result = (1ull*hword0 << 32) + (1ul*byte3 << 24) + (1ul*byte2 << 16) +
(1u*byte1 << 8) + byte0;
I am new to C++ programming. I am trying to implement code that builds a single integer value from 6 or more individual bytes.
I have implemented the same thing for 4 bytes and it works.
My Code for 4 bytes:
char *command = "\x42\xa0\x82\xa1\x21\x22";
__int64 value;
value = (__int64)(((unsigned char)command[2] << 24) + ((unsigned char)command[3] << 16) + ((unsigned char)command[4] << 8) + (unsigned char)command[5]);
printf("%x %x %x %x %x",command[2], command[3], command[4], command[5], value);
Using this code, the value of value is 82a12122, but when I try the same for 6 bytes the result is wrong.
Code for 6 Bytes:
char *command = "\x42\xa0\x82\xa1\x21\x22";
__int64 value;
value = (__int64)(((unsigned char)command[0] << 40) + ((unsigned char)command[1] << 32) + ((unsigned char)command[2] << 24) + ((unsigned char)command[3] << 16) + ((unsigned char)command[4] << 8) + (unsigned char)command[5]);
printf("%x %x %x %x %x %x %x", command[0], command[1], command[2], command[3], command[4], command[5], value);
The output value of value is 82a163c2, which is wrong; I need 42a082a12122.
So can anyone tell me how to get the expected output and what is wrong with the 6-byte code?
Thanks in Advance.
Just cast each byte to a sufficiently large unsigned type before shifting. Even after the integral promotions (to int or unsigned int), the type is not wide enough to be shifted by 32 or more bits (in the usual case, which seems to apply to you).
See here for demonstration: https://godbolt.org/g/x855XH
unsigned long long large_ok(char x)
{
return ((unsigned long long)x) << 63;
}
unsigned long long large_incorrect(char x)
{
return ((unsigned long long)x) << 64;
}
unsigned long long still_ok(char x)
{
return ((unsigned char)x) << 31;
}
unsigned long long incorrect(char x)
{
return ((unsigned char)x) << 32;
}
In simpler terms:
The shift operators promote their operands to int/unsigned int automatically. This is why your four-byte version works: unsigned int is large enough for all of those shifts. However (in your implementation, as in most common ones), it holds only 32 bits, and the compiler will not automatically choose a 64-bit type when you shift by 32 or more bits (it has no way of knowing that you wanted a wider result).
If you use large enough integral types for the shift operands, the shift will have the larger type as the result and the shifts will do what you expect.
If you turn on warnings, your compiler will probably also complain to you that you are shifting by more bits than the type has and thus always getting zero (see demonstration).
(The bit counts mentioned are of course implementation defined.)
A final note: Types beginning with double underscores (__) or underscore + capital letter are reserved for the implementation - using them is not technically "safe". Modern C++ provides you with types such as uint64_t that should have the stated number of bits - use those instead.
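Putting those points together, here is a sketch of the asker's 6-byte combination done with uint64_t and explicit casts (my own example, not the original code):
#include <stdint.h>
#include <stdio.h>
int main()
{
    const unsigned char command[6] = { 0x42, 0xa0, 0x82, 0xa1, 0x21, 0x22 };
    uint64_t value = ((uint64_t)command[0] << 40) | ((uint64_t)command[1] << 32)
                   | ((uint64_t)command[2] << 24) | ((uint64_t)command[3] << 16)
                   | ((uint64_t)command[4] << 8)  |  (uint64_t)command[5];
    printf("%llx\n", (unsigned long long)value);   // prints 42a082a12122
}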
Your shifts overflow the width of the promoted operands, and you are not printing the 64-bit value correctly.
This code works:
(Take note of the print format and how the shifts are done in uint64_t)
#include <stdio.h>
#include <cstdint>
int main()
{
const unsigned char *command = (const unsigned char *)"\x42\xa0\x82\xa1\x21\x22";
uint64_t value=0;
for (int i=0; i<6; i++)
{
value <<= 8;
value += command[i];
}
printf("%x %x %x %x %x %x %llx",
command[0], command[1], command[2], command[3], command[4], command[5], value);
}
Here is code for a little-endian bit shift; I want to convert it to a big-endian bit shift.
Please help me out. This is LZW decompression code that uses a little-endian shift,
but I want the big-endian version.
unsigned int input_code(FILE *input)
{
unsigned int val;
static int bitcount=0;
static unsigned long inbitbuf=0L;
while (bitcount <= 24)
{
inbitbuf |=(unsigned long) getc(input) << (24-bitcount);
bitcount += 8;
}
val=inbitbuf >> (32-BITS);
inbitbuf <<= BITS;
bitcount -= BITS;
return(val);
}
void output_code(FILE *output,unsigned int code)
{
static int output_bit_count=0;
static unsigned long output_bit_buffer=0L;
output_bit_buffer |= (unsigned long) code << (32-BITS-output_bit_count);
output_bit_count += BITS;
while (output_bit_count >= 8)
{
putc(output_bit_buffer >> 24,output);
output_bit_buffer <<= 8;
output_bit_count -= 8;
}
}
You probably want something like:
unsigned char raw[4];
unsigned int val;
if (4 != fread(raw, 1, 4, input)) {
// error condition, return early or throw or something
}
val = static_cast<unsigned int>(raw[3])
| static_cast<unsigned int>(raw[2]) << 8
| static_cast<unsigned int>(raw[1]) << 16
| static_cast<unsigned int>(raw[0]) << 24;
If you were doing little-endian, reverse the indexes and everything else stays the same.
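For completeness, a sketch of the little-endian reading of the same four bytes, mirroring the snippet above:
val = static_cast<unsigned int>(raw[0])
| static_cast<unsigned int>(raw[1]) << 8
| static_cast<unsigned int>(raw[2]) << 16
| static_cast<unsigned int>(raw[3]) << 24;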
A good rant on endianness and the code that people seem to write, if you want more.
It's a good idea to mask (perform a bitwise AND against) the bytes one at a time before shifting them. Obviously, if you are shifting a 16-bit integer, the unmasked bits will just be pushed off either end into oblivion. But for integers larger than 16 bits (I actually had to use 24-bit integers once) it's best to mask each byte before shifting and recombining them (with a bitwise OR).
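A short sketch of what that looks like for a 32-bit value, splitting it into bytes with a mask and then recombining them (my own example):
#include <stdint.h>
// split a 32-bit value into bytes (masking each with 0xFF), then recombine them
uint32_t roundtrip(uint32_t v)
{
    uint8_t b3 = (v >> 24) & 0xFF;
    uint8_t b2 = (v >> 16) & 0xFF;
    uint8_t b1 = (v >> 8)  & 0xFF;
    uint8_t b0 =  v        & 0xFF;
    return ((uint32_t)b3 << 24) | ((uint32_t)b2 << 16) | ((uint32_t)b1 << 8) | (uint32_t)b0;
}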
I scan through the byte representation of an int variable and get a somewhat unexpected result.
If I do
int a = 127;
cout << (unsigned int) *((char *)&a);
I get 127 as expected. If I do
int a = 256;
cout << (unsigned int) *((char *)&a + 1);
I get 1 as expected. But if I do
int a = 128;
cout << (unsigned int) *((char *)&a);
I get 4294967168, which is, well… quite fancy.
The question is: is there a way to get 128 when looking at first byte of an int variable which value is 128?
For the same reason that (unsigned int)(char)128 is 4294967168: char is signed by default on most commonly used systems. 128 cannot fit in a signed 8-bit quantity, so when you cast it to char, you get -128 (0x80 in hex).
Then, when you cast -128 to an unsigned int, you get 2^32 - 128, which is 4294967168.
If you want to get +128, then use an unsigned char instead of char.
char is signed here. In your second example, *((char *)&a + 1) reads the second byte of a == 256, which is 1; as an unsigned int that is 0b00000000000000000000000000000001, so it prints as 1.
In your third example, *((char *)&a) reads the first byte of a == 128; as a signed char that byte is -128, and converted to an unsigned int it becomes 0b11111111111111111111111110000000, i.e. 2^32 - 128, which is 4294967168.
As the comments have pointed out, it looks like what's happening here is that you are running into an oddity of two's complement. In your last cast, since you are not using an unsigned char, the highest-order bit of the byte is being used to indicate positive or negative values. You then only have 7 of the full 8 bits to represent your magnitude, giving you a range of 0 to 127 for positive numbers (-128 to 127 overall).
If you exceed this range, the value wraps and you get -128, which, when converted back to an unsigned int, results in that abnormally large value.
int a = 128;
cout << (unsigned int) *((unsigned char *)&a);
Also, all of your code depends on running on a little-endian machine.
Here's how you should probably be doing these things:
int a = 127;
cout << (unsigned)(unsigned char)(0xFF & a);
int a = 256;
cout << (unsigned)(unsigned char)(0xFF & (a>>8));
int a = 128;
cout << (unsigned)(unsigned char)(0xFF & a);
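A generic version of the same idea (my own sketch): extract byte i of an unsigned value with a shift and a mask, independent of host endianness.
#include <iostream>
// hypothetical helper: byte i of a value, with i == 0 the least significant byte
unsigned byte_at(unsigned value, unsigned i)
{
    return (value >> (8 * i)) & 0xFFu;
}
int main()
{
    std::cout << byte_at(127, 0) << '\n';  // 127
    std::cout << byte_at(256, 1) << '\n';  // 1
    std::cout << byte_at(128, 0) << '\n';  // 128
}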