Is the following undefined and why?
int i = 0xFF;
unsigned int r = i << 24;
The behaviour is technically undefined unless the int type has more than 32 bits.
From C++11, 5.8/2 (describing an expression E1 << E2):
if E1 has a signed type and non-negative value, and E1×2E2 is representable
in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The result type of i << 24 is (signed) int; if that has 32 bits or less, then 0xff * 2^24 == 0xff000000 is not representable (the maximum representable 32-bit signed value being 0x7fffffff), so behaviour is undefined as specified in that clause.
According to N3242 section 5.8 Shift operators:
The shift operators << and >> group left-to-right.
shift-expression: additive-expression shift-expression << additive-expression shift-expression >> additive-expression
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
So my answer? Depends on the number of bits in your left operand (which depends on your system).
Related
int main()
{
int x = -2;
cout << (1<<x) << endl;
cout << (1<<-2) << endl;
}
Here the (1<<x) prints 1073741824 (how is this calculated)
Whereas (1<<-2) prints a garbage value.
And why do these two return different answers?
According to the C Standard (6.5.7 Bitwise shift operators)
3 The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand. If the
value of the right operand is negative or is greater than or equal to
the width of the promoted left operand, the behavior is undefined
The same is written in the C++ Standard (C++20, 7.6.7 Shift operators)
... The operands shall be of integral or unscoped enumeration type and
integral promotions are performed. The type of the result is that
of the promoted left operand. The behavior is undefined if the right
operand is negative, or greater than or equal to the width of the
promoted left operand.
In the standard, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
Page 118, Section 5.8.1:
The behavior is undefined if the right operand is negative, or
greater than or equal to the length in bits of the promoted left
operand
Meaning the compiler can do whatever it wants here - all bets are off.
According to cppreference,
For signed a, the value of a << b is a * 2^b if it is
representable [in the unsigned version of the (since C++14)] return
type [(which is then converted to signed: this makes it legal to
create INT_MIN as 1 << 31) (since C++14)], otherwise the behavior
is undefined.
I can't quite understand this specification. What does it mean exactly by if it is representable in the unsigned version of the return type? Does it apply only to the case when the signed value is non-negative? Are INT_MIN << 1 and -1 << 31 well-defined?
If we go to source, the C++14 Standard, this is what we find (with the part about unsigned types highlighted):
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2 , reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
For a platform in which std::numeric_limits<int>::digits is 31, it is legal to perform 1 << 31. The resulting value in unsigned int will be 0x80000000 or 2147483648. However, that number is beyond the valid range of value of int for such a platform. However, that bit pattern, when treated as a two's complement representation int is equal to -2147483648, which is the same as INT_MIN for such a platform.
If you use
int a = 2 << 31;
you invoke undefined behavior since 2 * 231 is not representable as an unsigned int in that platform.
Are INT_MIN << 1 and -1 << 31 well-defined?
No, they are not. Bitwise left shifting negative numbers is undefined behavior. Notice the use of non-negative value in the above highlighted text.
I am trying to understand exaclty how integral promotion works with arithmetic shifts operators. Particularly, I would like to know, which values of a, b, c, d, e, f, g, h are exactly defined according to the C++14 standard, and which ones can depend on the platform/hardware/compiler (assuming that sizeof(int) == 4).
int a = true << 3;
int b = true >> 3;
int c = true << 3U;
int d = true >> 3U;
int e = true << 31;
int f = true >> 31;
int g = true << 31U;
int h = true >> 31U;
From [expr.shift]:
The type of the result is that of the promoted left operand. The behavior is undefined if the right operand
is negative, or greater than or equal to the length in bits of the promoted left operand.
The result type of shifting a bool is always int, regardless of what's on the right hand side. We're never shifting by at least 32 or by a negative number, so we're ok there on all accounts.
For the left-shifts (E1 << E2):
Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value; otherwise, the behavior is undefined.
1×231 is representable by unsigned int, and that's the largest left-shift we're doing, so we're ok there on all accounts too.
For the right-shifts (E1 >> E2):
If E1 has a signed type and a negative value, the resulting value is implementation-defined.
E1 is never negative, so we're ok there too! No undefined or implementation-defined behavior anywhere.
Following is mainly a complement to Barry's answer, that clearly explains the rules for left and right shifting.
At least fo C++11, the integral promotion of a bool gives 0 for false and 1 for true : 4.5 Integral promotions [conv.prom] § 6
A prvalue of type bool can be converted to a prvalue of type int, with false becoming zero and true becoming one.
So in original examples, b, d, f and h will all get a 0 value, a and c both get a 8 value: only perfectly defined behaviour until here.
But e and g will receive the unsigned value 0x80000000, so it would be fine if you affected it to an unsigned int variable, but you are using signed 32 bits integers. So you get an integral conversion: 4.7 Integral conversions [conv.integral] §3
If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
And unsigned 0x80000000 is not representable in a signed 64 bits integer so the result is implementation defined for e and g.
I was wondering what this function actually performs.
To my understanding it should return pSrc[1].
So why does it bother left-shifting pSrc[0] by 8 bits, which zeroes out those 8 bits.
And when these zeroes are ORed with pSrc[1], pSrc[1] is not affected so you get pSrc[1] anyway as if the bitwise OR had never happened.
/*
* Get 2 big-endian bytes.
*/
INLINE u2 get2BE(unsigned char const* pSrc)
{
return (pSrc[0] << 8) | pSrc[1];
}
This function is from the source code of the dalvik virtual machine.
https://android.googlesource.com/platform/dalvik/+/android-4.4.4_r1/vm/Bits.h
Update:
OK, now I got it thanks to all the answers here.
(1) pSrc[0] is originally an unsigned char (1 byte).
(2) When it is left-shifted (pSrc[0] << 8) with the literal 8 of int type, pSrc[0] is therefore int-promoted to a signed int (4 byte).
(3) The result of pSrc[0] << 8 is that the interested 8 bits in pSrc[0] are shifted over to the second byte of the 4 bytes of the signed int, thereby leaving zeroes in the other bytes(1st,3rd and 4th bytes).
(4) And when it is ORed ( intermediate result from step (3) | pSrc[1]), pSrc[1] is then int-promoted to a signed int (4 bytes).
(5) The result of ( intermediate result from step (3) | pSrc[1]) leaves the first two least significant bytes the way we want with zeroes all in the two most significant bytes.
(6) return only the first two least significant bytes to get the 2 big-endian bytes by returning the result as a u2 type.
For arithmetic operations like this, the unsigned char is converted via a process called integral promotions.
C++11 - N3485 §5.8 [expr.shift]/1:
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand.
And §13.6 [over.built]/17:
For every pair of promoted integral types L and R, there exist candidate operator functions of the form
LR operator%(L , R );
LR operator&(L , R );
LR operator^(L , R );
LR operator|(L , R );
L operator<<(L , R );
L operator>>(L , R );
where LR is the result of the usual arithmetic conversions between types L and R.
When integral promotions are done (§4.5 [conv.prom]/1):
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion
rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned
int.
By integral promotions, the unsigned char will be promoted to int. The other operand is already int, so no changes in type are made to it. The return type then becomes int as well.
Thus, what you have is the first unsigned char's bits shifted left, but still in the now-bigger int, and then the second unsigned char's bits at the end.
You'll notice that the return type of operator| is the result of usual arithmetic conversions between the two operands. At this point, those are the int from the shift and the second unsigned char.
This conversion is defined as follows (§5 [expr]/10):
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
…
Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following
rules shall be applied to the promoted operands:
…
If both operands have the same type, no further conversion is needed.
Since L and R, being promoted before this, are already int, the promotion leaves them the same and the overall return type of the expression is thus int, which is then converted to u2, whatever that happens to be.
There are no operations (other than type conversions) on
unsigned char. Before any operation, integral promotion
occurs, which converts the unsigned char to an int. So the
operation is shifting an int left, not an unsigned char.
C11 6.5.7 Bitwise shift operators
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of
the right operand is negative or is greater than or equal to the
width of the promoted left operand, the behavior is undefined.
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with
zeros. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo
one more than the maximum value representable in the result type. If E1 has a signed
type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined.
So pSrc[0] is integer promoted to an int. The literal 8 is already an int, so no integer promotion takes place. The usual arithmetic converstions do not apply to shift operators: they are a special case.
Since the original variable was an unsigned char which gets left shifted 8 bits, we also encounter the issue where "E1" (our promoted variable) is signed and potentially the result cannot be representable in the result type, which leads to undefined behavior if this is a 16 bit system.
In plain English: if you shift something into the sign bits of a signed variable, anything can happen. In general: relying on implicit type promotions is bad programming and dangerous practice.
You should fix the code to this:
((unsigned int)pSrc[0] << 8) | (unsigned int)pSrc[1]
Is left-shifting a negative int Undefined Behavior in C++11?
The relevant Standard passages here are from 5.8:
2/The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are zero-filled. If E1 has an unsigned type, the value of the
result is E1 × 2E2, reduced modulo one more than the maximum value
representable in the result type. Otherwise, if E1 has a signed type
and non-negative value, and E1×2E2 is representable in the result
type, then that is the resulting value; otherwise, the behavior is
undefined.
The part that confuses me is:
Otherwise, if E1 has a signed type and non-negative value, and E1×2E2
is representable in the result type, then that is the resulting value;
otherwise, the behavior is undefined.
Should this be interpreted to mean that left-shifting any negative number is UB? Or does it only mean if you LS a negative and the result doesn't fit in the result type, then it's UB?
Moreover, the preceding clause says:
1/The shift operators << and >> group left-to-right.
shift-expression:
additive-expression
shift-expression << additive-expression
shift-expression >> additive-expression
The operands shall be of integral or unscoped enumeration type and
integral promotions are performed.
The type of the result is that of the promoted left operand. The
behavior is undefined if the right operand is negative, or greater
than or equal to the length in bits of the promoted left operand.
This makes it explicit that using a negative number for one of the operands is UB. If it were UB to use a negative for the other operand, I would expect that to be made clear here as well.
So, bottom line, is:
-1 << 1
Undefined Behavior?
#Angew provided a psudocode interpretation of the Standardese which succinctly expresses one possible (likely) valid interpretation. Others have questioned whether this question is really about the applicability of the language "behavior is undefined" versus our (StackOverflow's) use of the phrase "Undefined Behavior." This edit is to provide some more clarification on what I'm trying to ask.
#Angew's interpretation of the Standardese is:
if (typeof(E1) == unsigned integral)
value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
value = E1 * 2^E2;
else
value = undefined;
What this question really boils down to is this -- is the correct interpretation actually:
value = E1 left-shift-by (E2)
switch (typeof(E1))
{
case unsigned integral :
value = E1 * 2^E2 % blah blah;
break;
case signed integral :
if (E1 >= 0)
{
if (representable(E1 * 2^E2))
{
value = E1 * 2^E2;
}
else
{
value = undefined;
}
}
break;
}
?
Sidenote, in looking at this in terms of psudocode makes it fairly clear in my mind that #Agnew's interpretation is the correct one.
Yes, I would say it's undefined. If we translate the standardese to pseudo-code:
if (typeof(E1) == unsigned integral)
value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
value = E1 * 2^E2;
else
value = undefined;
I'd say the reason why they're explicit about the right-hand operand and not about the left-hand one is that the paragrpah you quote (the one with the right-hand operand case) applies to both left and right shifts.
For the left-hand operand, the ruling differs. Left-shifting a negative is undefined, right-shifting it is implementation-defined.
Should this be interpreted to mean that left-shifting any negative number is UB?
Yes, the behavior is undefined when given any negative number. The behavior is only defined when both of the following are true:
the number is non-negative
E1 × 2E2 is representable in the result type
That's literally what "if E1 has a signed type and non-negative value, and E1×2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined," is saying:
if X and Y
then Z
else U
Answer as per the Question:
The question really is:
Can we equate the term "behavior is undefined" equates exactly to the term "Undefined Behavior".
As it is currently worded it means "Undefined Behavior."
Personal comment about the situation
But I am not convinced that is the authors intention.
If it is the authors intention, then we should probably have a note explaining why. But I am more inclined to believe the author meant that the result of that operation is undefined because the representation of negative numbers is not explicitly defined by the standard. If the representation of negative numbers is not explicitly defined for negatives, then moving the bits around would lead to an undefined value.
Either way, the wording (or explanation) needs to be tightened/expanded to make it less ambiguous.