int value = 0xffffffff;
int len = 32;
int result = value << len; // result will be 0xffffffff
result = value << 32; // result will be 0x0
Why does it make a difference?
Edit:
Sorry, I made a mistake. In the example above, both results are 0xffffffff.
So look at this:
unsigned int value = 0xffffffff;
unsigned int len = 32;
printf("0x%x\n", value << len); //will print 0xffffffff
printf("0x%x\n", 0xffffffff << 32); //will print 0x0
If the size of an int is 32 bits or less, your code contains
undefined behavior. The shift count must be greater than or
equal to 0, and strictly less than the number of bits in what
is being shifted.
What is probably happening in practice: for the variable, the
compiler simply passes the count to a machine instruction which
considers only the 5 low-order bits (which are 0 in the case of
32); when the shift count is a constant, the compiler evaluates
the expression internally, likely in long long, and then
truncates it. But this is just one possible behavior; anything
might happen as far as the language is concerned.
If len >= CHAR_BIT * sizeof(int) (i.e. the number of bits in an int) or len < 0, the code contains undefined behaviour.
See this answer for more details.
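For illustration, here is one way to make the operation fully defined when the count can reach the width; safe_shl is a hypothetical helper name, and the fallback value is an arbitrary choice:

#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: keeps the shift count in the range the
   standard requires, 0 <= len < width of the shifted type. */
unsigned int safe_shl(unsigned int value, unsigned int len)
{
    unsigned int width = sizeof(unsigned int) * CHAR_BIT;
    if (len >= width)
        return 0; /* explicit, defined fallback for out-of-range counts */
    return value << len;
}

int main(void)
{
    printf("0x%x\n", safe_shl(0xffffffffu, 32)); /* 0x0, from the fallback */
    printf("0x%x\n", safe_shl(0xffffffffu, 4));  /* 0xfffffff0 */
    return 0;
}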
Related
Here is some code in C:
#include <stdio.h>

int main()
{
    printf("Int width = %zu\n", sizeof(unsigned int)); // 4 bytes = 32 bits on my computer

    unsigned int n = 32;
    unsigned int y = ((unsigned int)1 << n) - 1; // This is line 8
    printf("%u\n", y);

    unsigned int x = ((unsigned int)1 << 32) - 1; // This is line 11
    printf("%u", x);
    return 0;
}
It outputs:
main.c:11:39: warning: left shift count >= width of type [-Wshift-count-overflow]
Int width = 4
0
4 294 967 295 (= 2^32-1)
The warning for line 11 is expected, as explained in these links: wiki.sei.cmu.edu and https://stackoverflow.com/a/11270515
left-shift operations [...] if the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
There is no warning for line 8, but I expected the same warning as for line 11. Furthermore, the results are entirely different! What am I missing?
This behaviour is similar for C++:
#include <iostream>
#include <cstdint>

using namespace std;

int main()
{
    cout << "Int width = " << sizeof(uint64_t) << "\n"; // 8 bytes = 64 bits on my computer

    int n = 64;
    uint64_t y = ((uint64_t)1 << n) - 1; // This is line 8
    cout << "y = " << y;

    uint64_t x = ((uint64_t)1 << 64) - 1; // This is line 11
    cout << "\nx = " << x;
    return 0;
}
Which outputs:
main.cpp:11:34: warning: left shift count >= width of type [-Wshift-count-overflow]
Int width = 8
y = 0
x = 18 446 744 073 709 551 615 (= 2^64-1)
I used OnlineGDB's C compiler for the C code and its C++ compiler for the C++ code.
Here are the links to the code: C code and C++ code.
For line 8, the compiler would have to prove that in ((unsigned int)1 << n), n is 32 or more. That can be difficult since n is not const, so its value could change. The compiler would have to do more static analysis to give you the warning.
On the other hand, with (unsigned int)1 << 32 the compiler knows that the count is 32 or more and can easily warn. This requires almost no time to detect, since the shifted value and the shift count are both compile-time "literals".
If you switch to using const int n = 64; in your C++ code, then you will get an error at OnlineGDB. You can see that here. I tried that with the C version, but it still doesn't warn.
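For illustration, a minimal sketch of the const variant described above; with the count visible as a compile-time constant, g++ diagnoses it:

#include <cstdint>

int main()
{
    const int n = 64;                    // compile-time constant now
    uint64_t y = ((uint64_t)1 << n) - 1; // g++ warns: left shift count >= width of type
    (void)y;
    return 0;
}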
There is no warning for line 8, but I expected the same warning as for line 11.
The C standard does not require a compiler to diagnose an excessive shift amount. (Generally, it does not require C implementations to diagnose errors other than those explicitly listed in “Constraints” clauses.)
The compiler you are using diagnoses the error with the integer constant expression (32), as this is easy. It does not diagnose the error with the variable n, as that involves more work and the compiler authors have not implemented it.
Furthermore, the results are entirely different!
With the integer constant expression, the compiler evaluates the shift during compilation, using whatever software is built into it. That apparently produces zero for (unsigned int) 1 << 32. With the variable, the compiler generates an instruction to perform the shift during program execution. That instruction likely uses only the low five bits of the right operand, so an operand of 32 (binary 100000) yields a shift of zero bits, so shifting (unsigned int) 1 produces one.
Both behaviors are allowed by the C standard.
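A small sketch that makes the two code paths visible. Nothing here is guaranteed, since both shifts are undefined behavior; the comments describe one common outcome on x86, where the shift instruction masks the count to its low five bits:

#include <stdio.h>

int main(void)
{
    volatile unsigned int n = 32; /* volatile forces a runtime shift */
    unsigned int one = 1;
    printf("%u\n", one << n);     /* often 1: hardware shifts by 32 & 31 == 0 */
    /* printf("%u\n", one << 32);    constant: typically folded to 0, with a warning */
    return 0;
}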
It's likely because n is a variable: the compiler doesn't verify it, and since it doesn't know its value, it doesn't issue a warning. If you turn it into a constant, i.e. const int n = 64;, the warning is issued.
https://godbolt.org/z/4s5jz6
As for the results, undefined behavior is what it is; for the sake of curiosity you can analyze a particular case and try to figure out what the compiler did, but the results can't be reasoned about, because there is no correct result.
Even the warnings are optional; gcc is nice enough to warn you when a constant or constant literal is used, but it doesn't have to.
Undefined behaviour (UB) means undefined behaviour. Literally anything can happen. Compilers are not required to tell you of UB, but are permitted to.
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
So the maximum unsigned value, or zero, or formatting your SSD while installing viruses on your cloud-stored files and emailing your browser history to your contact list are all permitted ways for your compiler to translate a shift of a 32-bit integer by 32 bits.
As a Quality of Implementation issue, the last is rare.
In one case, the compiler is optimizing (1<<x)-1 into "x one-bits", not even doing a shift operation. Within the bounds of defined shift operations this is equivalent, so it is a valid optimization. So when you pass 32, it writes 0xffffffff.
In the other case, it may be setting the nth bit, reading only the low 5 bits of the shift count to see which bit to set. Also valid within the range of defined behaviour, yet utterly different.
Welcome to UB.
I would expect further changes based on optimization level.
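If the goal is actually "a mask of x one-bits", a formulation that is defined for every count avoids the problem; mask_bits is a hypothetical helper name:

#include <stdint.h>

/* Defined for all x: the shift only executes when x < 32. */
uint32_t mask_bits(unsigned x)
{
    return (x >= 32) ? 0xFFFFFFFFu : (((uint32_t)1 << x) - 1);
}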
This code gives meaningful output:
#include <iostream>

int main() {
    unsigned int ui = 100;
    unsigned int negative_ui = -22u;
    std::cout << ui + negative_ui << std::endl;
}
Output:
78
The variable negative_ui stores -22, but is an unsigned int.
My question is: why does unsigned int negative_ui = -22u; work?
How can an unsigned int store a negative number? Is it safe to use, or does this yield undefined behaviour?
I use the Intel compiler 18.0.3. With the option -Wall, no warnings occurred.
P.S. I have read What happens if I assign a negative value to an unsigned variable? and Why unsigned int contained negative number
How can an unsigned int store a negative number?
It doesn't. Instead, it stores a representable number that is congruent with that negative number modulo the number of all representable values. The same is also true with results that are larger than the largest representable value.
Is it safe to use or does this yield undefined behaviour?
There is no UB. Unsigned arithmetic overflow is well defined.
It is safe to rely on the result. However, it can be brittle. For example, if you add -22u and 100ull, then you get UINT_MAX + 79 (a large value, assuming unsigned long long is a wider type than unsigned), which is congruent with 78 modulo UINT_MAX + 1; that value is representable in unsigned long long but not in unsigned.
Note that signed arithmetic overflow is undefined.
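A small sketch of the modular arithmetic at work; the printed values assume a 32-bit unsigned int:

#include <iostream>

int main() {
    // -22 converted to unsigned int: -22 + (UINT_MAX + 1) = 4294967274
    unsigned int negative_ui = -22u;
    std::cout << negative_ui << "\n";        // 4294967274
    std::cout << 100u + negative_ui << "\n"; // 4294967374 mod 2^32 = 78
}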
Signed/unsigned is a convention. It uses the highest bit of the variable (bit 31 in the case of an x86 int). What you store in the variable occupies the full bit length.
It's the calculations that follow that either take the top bit as a sign indicator or ignore it. Therefore, any "unsigned" variable can contain a signed value, which will be converted to the unsigned form when the unsigned variable participates in a calculation.
unsigned int x = -1; // x is now 0xFFFFFFFF.
x -= 1;              // x is now 0xFFFFFFFE.
if (x < 0)           // false: x is compared as 0xFFFFFFFE.

int x = -1; // x is stored as 0xFFFFFFFF.
x -= 1;     // x is stored as 0xFFFFFFFE.
if (x < 0)  // true: x is compared as -2.
Technically valid, bad programming.
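The snippets above, assembled into a runnable program (assuming a 32-bit int; most compilers warn that the unsigned comparison is always false):

#include <stdio.h>

int main(void)
{
    unsigned int u = -1;   /* stored as 0xFFFFFFFF */
    u -= 1;                /* 0xFFFFFFFE */
    printf("%d\n", u < 0); /* 0: compared as a large unsigned value */

    int s = -1;            /* same bit pattern, 0xFFFFFFFF */
    s -= 1;                /* -2 */
    printf("%d\n", s < 0); /* 1: compared as a negative signed value */
    return 0;
}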
On my Arduino, the following code produces output I don't understand:
void setup(){
    Serial.begin(9600);
    int a = 250;
    Serial.println(a, BIN);
    a = a << 8;
    Serial.println(a, BIN);
    a = a >> 8;
    Serial.println(a, BIN);
}

void loop(){}
The output is:
11111010
11111111111111111111101000000000
11111111111111111111111111111010
I do understand the first line: leading zeros are not printed to the serial terminal. However, after shifting the bits the data type of a seems to have changed from int to long (32 bits are printed). The expected behaviour is that bits are shifted to the left, and that bits which are shifted "out" of the 16 bits an int has are simply dropped. Shifting the bits back does not turn the "32bit" variable to "16bit" again.
Shifting by 7 or less positions does not show this effect.
I probably should say that I am not using the Arduino IDE, but the Makefile from https://github.com/sudar/Arduino-Makefile.
What is going on? I almost expect this to be "normal", but I don't get it. Or is it something in the printing routine which simply adds 16 "1"'s to the output?
Enno
In addition to the other answers: integers might be stored in 16 bits or 32 bits depending on which Arduino you have.
The function printing numbers in Arduino is defined in /arduino-1.0.5/hardware/arduino/cores/arduino/Print.cpp
size_t Print::printNumber(unsigned long n, uint8_t base) {
    char buf[8 * sizeof(long) + 1]; // Assumes 8-bit chars plus zero byte.
    char *str = &buf[sizeof(buf) - 1];

    *str = '\0';

    // prevent crash if called with base == 1
    if (base < 2) base = 10;

    do {
        unsigned long m = n;
        n /= base;
        char c = m - base * n;
        *--str = c < 10 ? c + '0' : c + 'A' - 10;
    } while(n);

    return write(str);
}
All other print functions rely on this one, so yes, your int gets promoted to an unsigned long when you print it, not when you shift it.
However, the library is correct. By shifting left 8 positions, the sign bit of the integer becomes '1', so when the int value is promoted to unsigned long the runtime correctly pads it with 16 extra '1's instead of '0's.
If you are using such a value not as a number but to contain some flags, use unsigned int instead of int.
ETA: for completeness, I'll add further explanation for the second shifting operation.
Once you have touched the 'sign bit' inside the int, shifting to the right makes the runtime pad the number with '1's in order to preserve its negative value: shifting right by k positions corresponds to dividing the number by 2^k, and since the number is negative to start with, the result must remain negative.
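For comparison, a sketch of the original program using unsigned int, which avoids the sign extension; the outputs in the comments assume a 16-bit int, as on an Uno:

void setup(){
    Serial.begin(9600);
    unsigned int a = 250;
    Serial.println(a, BIN); // 11111010
    a = a << 8;             // high bit becomes 1, but the type is unsigned
    Serial.println(a, BIN); // 1111101000000000 (no sign extension)
    a = a >> 8;             // logical shift: zero-filled from the left
    Serial.println(a, BIN); // 11111010 again
}

void loop(){}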
uint32_t a = 0xFF << 8;
uint32_t b = 0xFF;
uint32_t c = b << 8;
I'm compiling for the Uno (1.0.x and 1.5), and it would seem obvious that a and c should be the same value, but they are not... at least not when running on the target. I compile the same code on the host and have no issues.
Right shift works fine; left shift only works when I'm shifting a variable rather than a constant.
Can anyone confirm this?
I'm using Visual Micro with VS2013. Compiling with either 1.0.x or 1.5 Arduino results in the same failure.
EDIT:
On the target:
A = 0xFFFFFF00
C = 0x0000FF00
The problem is related to the implicit signed/unsigned conversion.
With uint32_t a = 0xFF << 8; you mean:
- 0xFF is declared; it is a signed char;
- there is a << operation, so that value is converted to int. Since it was a signed char (and so its value was -1), it is padded with 1s to preserve the sign. So the value is 0xFFFFFFFF;
- it is shifted, so a = 0xFFFFFF00.
NOTE: this is slightly wrong, see below for the "more correct" version
If you want to reproduce the same behaviour, try this code:
uint32_t a = 0xFF << 8;
uint32_t b = (signed char)0xFF;
uint32_t c = b << 8;
Serial.println(a, HEX);
Serial.println(b, HEX);
Serial.println(c, HEX);
The result is
FFFFFF00
FFFFFFFF
FFFFFF00
Or, going the other way, if you write
uint32_t a = (unsigned)0xFF << 8;
you get that a = 0x0000FF00.
There are just two weird things with the compiler:
uint32_t a = (unsigned char)0xFF << 8; returns a = 0xFFFFFF00
uint32_t a = 0x000000FF << 8; returns a = 0xFFFFFF00 too.
Maybe it's a wrong cast in the compiler....
EDIT:
As phuclv pointed out, the above explanation is slightly wrong. The correct explanation is that, with uint32_t a = 0xFF << 8;, the compiler does these operations:
- 0xFF is declared; it is an int;
- there is a << operation, and thus this becomes 0xFF00; it was an int, so it is negative;
- it is then promoted to uint32_t. Since it was negative, 1s are prepended, resulting in 0xFFFFFF00.
The difference with the above explanation is that if you write uint32_t a = 0xFF << 7; you get 0x7F80 rather than 0xFFFFFF80.
This also explains the two "weird" things I wrote in the end of the previous answer.
For reference, the thread linked in the comment has some more explanations of how the compiler interprets literals. In particular, in this answer there is a table with the types the compiler assigns to literals. In this case (no suffix, hexadecimal value) the compiler assigns the smallest type from the following list that fits the value:
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
This leads to some more considerations:
uint32_t a = 0x7FFF << 8; this means that the literal is interpreted as a signed integer; the promotion to the bigger integer extends the sign, and so the result is 0xFFFFFF00
uint32_t b = 0xFFFF << 8; the literal in this case is interpreted as an unsigned integer. The result of the promotion to the 32-bit integer is therefore 0x0000FF00
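A sketch of the simplest fix, assuming the Uno's 16-bit int: give the literal an unsigned type with the U suffix so the shift never produces a negative value:

uint32_t a = 0xFFU << 8; // unsigned int shift: 0xFF00, zero-extended to 0x0000FF00
uint32_t b = 0xFF << 8;  // int shift: 0xFF00 is negative, sign-extends to 0xFFFFFF00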
The most important thing here is that in Arduino int is a 16-bit type. That'll explain everything.
For uint32_t a = 0xFF << 8: 0xFF is of type int [1]. 0xFF << 8 results in 0xFF00, which is a signed negative value in 16-bit int [2]. When assigning the int value to a uint32_t variable it'll be sign-extended [3] when upcasting, thus the result becomes 0xFFFFFF00U.
For the following lines
uint32_t b = 0xFF;
uint32_t c = b << 8;
0xFF is positive in 16-bit int, therefore b also contains 0xFF. Shifting it left 8 bits then results in 0x0000FF00, because b << 8 is a uint32_t expression; it's wider than int, so no promotion to int happens here.
Similarly, with uint32_t a = (unsigned)0xFF << 8 the output is 0x0000FF00, because the positive 0xFF, when converted to unsigned int, is still positive. Upcasting unsigned int to uint32_t does a zero extension, but the sign bit is already zero, so even if you do int32_t b = 0xFF; uint32_t c = b << 8 the high bits are still zero. The same goes for the "weird" uint32_t a = 0x000000FF << 8. Instead of (unsigned)0xFF you can just use the exact equivalent (but shorter) 0xFFU.
OTOH, if you declare b as uint8_t b = 0xFF or int8_t b = 0xFF, then things will be different: integer promotion occurs, and the result will be similar to the first line (0xFFFFFF00U). And if you cast 0xFF to signed char like this
uint32_t b = (signed char)0xFF;
uint32_t c = b << 8;
then upon promotion to int it'll be sign-extended to 0xFFFF. Similarly, casting it to int32_t or uint32_t will result in a sign extension from signed char to the 32-bit wide value 0xFFFFFFFF.
If you cast to unsigned char instead, as in uint32_t a = (unsigned char)0xFF << 8;, then (unsigned char)0xFF will be promoted to int using zero extension [4]; therefore the result will be exactly the same as uint32_t a = 0xFF << 8;.
In summary: when in doubt, consult the standard. The compiler rarely lies to you.
[1] Type of integer literals not int by default?
The type of an integer constant is the first of the corresponding list in which its value can be represented.
Suffix    Decimal Constant          Octal or Hexadecimal Constant
-----------------------------------------------------------------
none      int                       int
          long int                  unsigned int
          long long int             long int
                                    unsigned long int
                                    long long int
                                    unsigned long long int
[2] Strictly speaking, shifting into the sign bit like that is undefined behavior:
1 << 31 produces the error, "The result of the '<<' expression is undefined"
Defining (1 << 31) or using 0x80000000? Result is different
[3] The rule is to add UINT_MAX + 1:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Signed to unsigned conversion in C - is it always safe?
[4] A cast will always preserve the input value if the value fits in the target type, so casting a signed type to a wider signed type will be done by a sign extension, and casting an unsigned type to a wider type will be done by a zero extension.
[Credit goes to Mats Petersson]
Using a cast operator to force the compiler to treat the 0xFF as a uint32_t addresses the issue. It seems the Arduino cross-compiler treats constants a little differently, since I've never had to cast before a shift.
Thanks!
How do I get the number of bits in type char?
I know about CHAR_BIT from climits. This is described as »The macro yields the maximum value for the number of bits used to represent an object of type char.« in Dinkumware's C Reference. That means the number of bits in a char, doesn't it?
Can I get the same result with std::numeric_limits somehow? std::numeric_limits<char>::digits returns 7 here, but unfortunately only because this value respects the signedness of the 8-bit char…
CHAR_BIT is, by definition, number of bits in the object representation of type [signed/unsigned] char.
numeric_limits<>::digits is the number of non-sign bits in the value representation of the given type.
Which one do you need?
If you are looking for the number of bits in the object representation, then the correct approach is to take the sizeof of the type and multiply it by CHAR_BIT (of course, there's no point in multiplying by sizeof in the specific case of char types, since their size is always 1, and since CHAR_BIT by definition already contains what you need).
If you are talking about value representation then numeric_limits<> is the way to go.
For unsigned char type the bit-size of object representation (CHAR_BIT) is guaranteed to be the same as bit-size of value representation, so you can use numeric_limits<unsigned char>::digits and CHAR_BIT interchangeably, but this might be questionable from the conceptual point of view.
If you want to be overly specific, you can do this:
sizeof(char) * CHAR_BIT
If you know you are definitely going to take the sizeof of char, it's a bit of overkill, as sizeof(char) is guaranteed to be 1.
But if you move to a different type such as wchar_t, that will be important.
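Putting the two views side by side; the values in the comments are typical, and only unsigned char is guaranteed to have digits equal to CHAR_BIT:

#include <climits>
#include <iostream>
#include <limits>

int main() {
    // Object representation: bytes per object times bits per byte.
    std::cout << CHAR_BIT << "\n";                                   // 8, typically
    std::cout << sizeof(wchar_t) * CHAR_BIT << "\n";                 // e.g. 32 on Linux
    // Value representation: digits excludes the sign bit.
    std::cout << std::numeric_limits<unsigned char>::digits << "\n"; // 8
    std::cout << std::numeric_limits<signed char>::digits << "\n";   // 7
}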
Looking at the snippets archive for this code here, here's an adapted version (I do not claim this code):
int countbits(char ch){
    int n = 0;
    if (ch){
        do n++;
        while (0 != (ch = ch & (ch - 1)));
    }
    return n;
}
Hope this helps,
Best regards,
Tom.
A non-efficient way:
unsigned char c; /* unsigned: shifting into the sign bit of a signed char is not portable */
int bits;
for ( c = 1, bits = 0; c; c <<= 1, bits++ )
    ;
printf( "bits = %d\n", bits );
No reputation here yet, so I'm not allowed to reply to @t0mm13b's answer, but I wanted to point out that there's a problem with the code:
int countbits(char ch){
    int n = 0;
    if (ch){
        do n++;
        while (0 != (ch = ch & (ch - 1)));
    }
    return n;
}
The above won't count the number of bits in a character; it will count the number of set bits (1 bits).
For example, the following call will return 4:
char c = 'U';
countbits(c);
The code:
ch = ch & (ch - 1)
is a trick to strip off the rightmost (least significant) bit that's set to 1. So it glosses over any bits set to 0 and doesn't count them.
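A short demonstration of the distinction, assuming ASCII, where 'U' is 0x55 (binary 01010101):

#include <limits.h>
#include <stdio.h>

/* Counts SET bits (popcount), as explained above. */
int countbits(char ch){
    int n = 0;
    if (ch){
        do n++;
        while (0 != (ch = ch & (ch - 1)));
    }
    return n;
}

int main(void)
{
    printf("set bits in 'U': %d\n", countbits('U')); /* 4 */
    printf("bits in a char:  %d\n", CHAR_BIT);       /* 8 on typical platforms */
    return 0;
}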