Is left-shifting a negative int Undefined Behavior in C++11?
The relevant Standard passages here are from 5.8:
2/The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are zero-filled. If E1 has an unsigned type, the value of the
result is E1 × 2^E2, reduced modulo one more than the maximum value
representable in the result type. Otherwise, if E1 has a signed type
and non-negative value, and E1 × 2^E2 is representable in the result
type, then that is the resulting value; otherwise, the behavior is
undefined.
The part that confuses me is:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2
is representable in the result type, then that is the resulting value;
otherwise, the behavior is undefined.
Should this be interpreted to mean that left-shifting any negative number is UB? Or does it only mean that left-shifting a negative number is UB when the result doesn't fit in the result type?
Moreover, the preceding clause says:
1/The shift operators << and >> group left-to-right.
shift-expression:
additive-expression
shift-expression << additive-expression
shift-expression >> additive-expression
The operands shall be of integral or unscoped enumeration type and
integral promotions are performed.
The type of the result is that of the promoted left operand. The
behavior is undefined if the right operand is negative, or greater
than or equal to the length in bits of the promoted left operand.
This makes it explicit that using a negative number for one of the operands is UB. If it were UB to use a negative for the other operand, I would expect that to be made clear here as well.
So, bottom line, is:
-1 << 1
Undefined Behavior?
@Angew provided a pseudocode interpretation of the Standardese which succinctly expresses one possible (likely) valid interpretation. Others have questioned whether this question is really about the applicability of the language "behavior is undefined" versus our (StackOverflow's) use of the phrase "Undefined Behavior." This edit is to provide some more clarification on what I'm trying to ask.
@Angew's interpretation of the Standardese is:
if (typeof(E1) == unsigned integral)
    value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
    value = E1 * 2^E2;
else
    value = undefined;
What this question really boils down to is this -- is the correct interpretation actually:
value = E1 left-shift-by (E2)
switch (typeof(E1))
{
case unsigned integral :
    value = E1 * 2^E2 % blah blah;
    break;
case signed integral :
    if (E1 >= 0)
    {
        if (representable(E1 * 2^E2))
        {
            value = E1 * 2^E2;
        }
        else
        {
            value = undefined;
        }
    }
    break;
}
?
Sidenote: looking at this in terms of pseudocode makes it fairly clear in my mind that @Angew's interpretation is the correct one.
Yes, I would say it's undefined. If we translate the standardese to pseudo-code:
if (typeof(E1) == unsigned integral)
    value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
    value = E1 * 2^E2;
else
    value = undefined;
I'd say the reason why they're explicit about the right-hand operand and not about the left-hand one is that the paragraph you quote (the one with the right-hand operand case) applies to both left and right shifts.
For the left-hand operand, the ruling differs. Left-shifting a negative is undefined, right-shifting it is implementation-defined.
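The pseudocode above can be turned into a small runnable check. This is only a sketch of the C++11 wording for `int` operands: the name `shl_is_defined_cxx11` is made up for illustration, and it assumes `int` has no padding bits, so that its length in bits is `digits + 1`.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical helper (not part of any library): reports whether `e1 << e2`
// has a defined result for int operands under the C++11 wording of 5.8.
// Assumption: int has no padding bits, so its length in bits is digits + 1.
bool shl_is_defined_cxx11(int e1, int e2) {
    const int bits = std::numeric_limits<int>::digits + 1;
    if (e2 < 0 || e2 >= bits) return false;  // 5.8/1: shift count out of range
    if (e1 < 0) return false;                // 5.8/2: negative left operand
    // e1 * 2^e2 fits in int exactly when e1 <= INT_MAX / 2^e2.
    return e1 <= (INT_MAX >> e2);
}

// Small self-check covering each branch of the rule.
void check_shl_is_defined_cxx11() {
    assert(shl_is_defined_cxx11(1, 1));         // 2 is representable
    assert(!shl_is_defined_cxx11(-1, 1));       // negative left operand
    assert(!shl_is_defined_cxx11(INT_MAX, 1));  // result does not fit
    assert(!shl_is_defined_cxx11(1, -1));       // negative shift count
}
```

Note that the check itself never performs the questionable shift; it only right-shifts the non-negative `INT_MAX`, which is defined.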
Should this be interpreted to mean that left-shifting any negative number is UB?
Yes, the behavior is undefined when given any negative number. The behavior is only defined when both of the following are true:
the number is non-negative
E1 × 2^E2 is representable in the result type
That's literally what "if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined," is saying:
if X and Y
then Z
else U
Answer as per the Question:
The question really is:
Can we equate the term "behavior is undefined" exactly with the term "Undefined Behavior"?
As it is currently worded it means "Undefined Behavior."
Personal comment about the situation
But I am not convinced that is the author's intention.
If it is the author's intention, then we should probably have a note explaining why. But I am more inclined to believe the author meant that the result of that operation is undefined because the representation of negative numbers is not explicitly defined by the standard. If the representation of negative numbers is not explicitly defined, then moving the bits around would lead to an undefined value.
Either way, the wording (or explanation) needs to be tightened/expanded to make it less ambiguous.
Related
In C++20, signed integers are now defined to use two's complement,
see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html
This is a welcome change, however one of the bullet-points caught my eye:
Change Left-shift on signed integer types produces the same
results as left-shift on the corresponding unsigned integer type.
This seems like a strange change. Will this not shift away the sign bit?
The C++17 wording for signed left shifts (E1 << E2) was:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
Note that it speaks of being representable in "the corresponding unsigned type". So if you have a 32-bit signed integer whose value is 0x7FFFFFFF, and you left-shift it by 1, the result of the shift is representable in a 32-bit unsigned integer (0xFFFFFFFE). But then this unsigned value gets converted into the result type. And converting an unsigned integer whose value is too big for the corresponding signed type is implementation-defined.
Overall, in C++17, left-shifting into the sign bit could happen through implementation-defined behavior, and even then only if you don't shift beyond the unsigned result type's size. Going past that is explicitly UB.
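The three possible outcomes under the C++17 wording can be spelled out in code. The following is a sketch for `int` operands only; `ShiftOutcome` and `classify_shl_cxx17` are hypothetical names, and the code assumes that `int` and `unsigned` have the same number of bits with no padding.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical classification of `e1 << e2` under the C++17 wording.
enum class ShiftOutcome { Defined, ImplementationDefined, Undefined };

// Assumption: int and unsigned have the same width and no padding bits.
ShiftOutcome classify_shl_cxx17(int e1, int e2) {
    const int bits = std::numeric_limits<int>::digits + 1;
    if (e2 < 0 || e2 >= bits) return ShiftOutcome::Undefined;  // bad count
    if (e1 < 0) return ShiftOutcome::Undefined;                // negative E1
    // Fits in int: the conversion to the result type leaves it unchanged.
    if (e1 <= (INT_MAX >> e2)) return ShiftOutcome::Defined;
    // Fits only in unsigned: the conversion back to int is
    // implementation-defined.
    if (static_cast<unsigned>(e1) <= (UINT_MAX >> e2))
        return ShiftOutcome::ImplementationDefined;
    // Does not even fit in the corresponding unsigned type.
    return ShiftOutcome::Undefined;
}

// Small self-check with one case per outcome.
void check_classify_shl_cxx17() {
    assert(classify_shl_cxx17(3, 1) == ShiftOutcome::Defined);
    assert(classify_shl_cxx17(INT_MAX, 1) == ShiftOutcome::ImplementationDefined);
    assert(classify_shl_cxx17(INT_MAX, 2) == ShiftOutcome::Undefined);
    assert(classify_shl_cxx17(-1, 1) == ShiftOutcome::Undefined);
}
```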
The C++20 wording, for both signed and unsigned integers, is:
The value of E1 << E2 is the unique value congruent to E1 × 2^E2 modulo 2^N, where N is the width of the type of the result.
Integer congruence modulo a number basically means cutting off the bits beyond the modulo number. The "width" of an integer is explicitly defined as:
The range of representable values for a signed integer type is −2^(N−1) to 2^(N−1)−1 (inclusive), where N is called the width of the type.
This means that for a 32-bit signed integer, the width is 32: the range is −2^31 to 2^31−1, so N−1 = 31 and the sign bit counts toward the width. The result of a shift is therefore taken modulo 2^32, which keeps exactly the bits that the corresponding unsigned shift would keep, and the sign bit is one of them.
So in C++20, we have a harder guarantee: a signed left shift into the sign bit yields the same result on every implementation. This differs from C++17 in that the implementation variance/UB has been replaced by a single defined result.
So a left shift into the sign bit was at best implementation-defined in C++17, and is fully defined in C++20.
That quote probably refers to the fact that left shift on a negative number is now valid, that the result is well-defined for any left operand as long as the shift count is in range (a negative or too-large E2 is still undefined), and that the wording for signed and unsigned shifting is now the same.
Yes, the left shifting signed integer behavior changed with C++20.
With C++17, left-shifting a positive signed integer into the sign bit invokes implementation-defined behavior [1]. Example:
int i = INT_MAX;
int j = i << 1; // implementation defined behavior with std < C++20
C++20 changed this to defined behavior because it mandates two's complement representation for signed integers [2][3].
With C++17, shifting a negative signed integer invokes undefined behavior [1]. Example:
int i = -1;
int j = i << 1; // undefined behavior with std < C++20
In C++20, this changed as well and this operation now also invokes defined behavior [3].
This seems like a strange change. Will this not shift away the sign bit?
Yes, a signed left shift shifts away the sign bit. Example:
int i = 1 << (sizeof(int)*8-1); // C++20: defined behavior, set most significant bit
int j = i << 1; // C++20: defined behavior, set to 0
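The results that C++20 assigns to the examples above can be reproduced without relying on C++20 itself, by shifting in `unsigned` and mapping the bit pattern back by hand. This is a sketch only: `shl_like_cxx20` is a made-up name, the code assumes a two's complement `int` with the same width as `unsigned`, and the shift count must still be in range (an out-of-range count remains undefined in C++20).

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: computes the value C++20 assigns to `e1 << e2`
// using only operations that are defined in earlier standards too.
// Assumptions: two's complement int, same width as unsigned,
// and 0 <= e2 < width of int.
int shl_like_cxx20(int e1, int e2) {
    // Conversion to unsigned and unsigned shifts are modular, never UB.
    const unsigned shifted = static_cast<unsigned>(e1) << e2;
    const unsigned sign_bit = static_cast<unsigned>(INT_MAX) + 1u;
    if (shifted < sign_bit) return static_cast<int>(shifted);
    // Sign bit set: map the pattern to the congruent negative value.
    return static_cast<int>(shifted - sign_bit) + INT_MIN;
}

// Small self-check mirroring the examples in the text.
void check_shl_like_cxx20() {
    assert(shl_like_cxx20(-1, 1) == -2);       // negative left operand
    assert(shl_like_cxx20(INT_MAX, 1) == -2);  // shifts into the sign bit
    assert(shl_like_cxx20(INT_MIN, 1) == 0);   // shifts the sign bit away
}
```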
The main reason for specifying something as undefined or implementation-defined behavior is to allow for efficient implementations on different hardware.
Nowadays, since all CPUs implement two's complement, it's natural that the C++ standard mandates it. And if you mandate two's complement, it's only logical to make the above operations defined behavior, because this is also how left shift behaves in all two's complement instruction set architectures (ISAs).
In other words, leaving it implementation-defined and undefined wouldn't buy you anything.
Or, if you liked the previous undefined behavior why would you care if it gets changed to defined behavior? You can still avoid this operation as before. You wouldn't have to change your code.
[1]
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned
type, the value of the result is E1 × 2**E2, reduced modulo one more than the maximum value representable in
the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2**E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value; otherwise, the behavior is undefined.
(C++17 final working draft, Section 8.8 Shift operators [expr.shift], Paragraph 2, page 132 - emphasis mine)
[2]
[..] For each value x of a signed integer type, the value of the
corresponding unsigned integer type congruent to x modulo 2**N has the same value of corresponding bits in
its value representation. This is also known as two's complement representation. [..]
(C++20 latest working draft, Section 6.8.1 Fundamental types [basic.fundamental], Paragraph 3, page 66)
[3]
The value of E1 << E2 is the unique value congruent to E1 × 2**E2 modulo 2**N, where N is the width of the
type of the result. [Note: E1 is left-shifted E2 bit positions; vacated bits are zero-filled. — end note]
(C++20 latest working draft, Section 7.6.7 Shift operators [expr.shift], Paragraph 2, page 129, link mine)
Is it legal to do the following in C11, C++11 and C++14?
static_assert(((-4) >> 1) == -2, "my code assumes sign-extending right shift");
or the C equivalent:
_Static_assert(((-4) >> 1) == -2, "my code assumes sign-extending right shift");
I don't know the rules for constant-expressions regarding whether you can use implementation-defined operations like the above.
I'm aware that the opposite, signed shift left of negative numbers, is undefined regardless of machine type.
Yes. The C++11 standard says in [expr.shift]/3:
The value of E1 >> E2 is E1 right-shifted E2 bit positions. If
E1 has an unsigned type or if E1 has a signed type and a non-negative
value, the value of the result is the integral part of the quotient of
E1/2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
And nowhere in [expr.const]/2 is it said that such a shift, or expressions with implementation-defined values in general, are not constant expressions.
You will thus get a constant expression that has an implementation-defined value.
This is legal, insofar as it doesn't cause undefined behaviour.
The behaviour of right-shift of negative values is implementation-defined. The C and C++ standards do not guarantee it to be either arithmetic or logical; although so far as I know there has never been a CPU that didn't pick one or the other.
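If code needs a sign-extending shift without depending on the implementation-defined result, one option is to avoid right-shifting a negative value at all. The following is a sketch under the assumption of a two's complement `int` and an in-range shift count; `arithmetic_shr` is a hypothetical name.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: sign-extending (arithmetic) right shift that only
// ever right-shifts non-negative values, so the result does not depend on
// the implementation-defined case.
// Assumptions: two's complement int and 0 <= n < width of int.
int arithmetic_shr(int x, int n) {
    // For negative x, ~x is non-negative; shifting it and complementing
    // again rounds toward negative infinity, like a sign-extending shift.
    return x < 0 ? ~(~x >> n) : x >> n;
}

// Small self-check, including the value from the static_assert above.
void check_arithmetic_shr() {
    assert(arithmetic_shr(-4, 1) == -2);
    assert(arithmetic_shr(-5, 1) == -3);  // rounds down, not toward zero
    assert(arithmetic_shr(8, 2) == 2);
}
```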
In all versions of C and C++ prior to 2014, writing
1 << (CHAR_BIT * sizeof(int) - 1)
caused undefined behaviour, because left-shifting is defined as being equivalent to successive multiplication by 2, and this shift causes signed integer overflow:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
However in C++14 the text has changed for << but not for multiplication:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. [...] Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
The behaviour is now the same as for out-of-range assignment to signed type, i.e. as covered by [conv.integral]/3:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
This means it's still non-portable to write 1 << 31 (on a system with 32-bit int). So why was this change made in C++14?
The relevant issue is CWG 1457, where the justification is that the change allows 1 << 31 to be used in constant expressions:
The current wording of 5.8 [expr.shift] paragraph 2 makes it undefined
behavior to create the most-negative integer of a given type by
left-shifting a (signed) 1 into the sign bit, even though this is not
uncommonly done and works correctly on the majority of
(twos-complement) architectures:
...if E1 has a signed type and non-negative value, and E1 * 2^E2 is
representable in the result type, then that is the resulting value;
otherwise, the behavior is undefined.
As a result, this technique
cannot be used in a constant expression, which will break a
significant amount of code.
Constant expressions can't contain undefined behavior, which means that using an expression containing UB in a context requiring a constant expression makes the program ill-formed. libstdc++'s numeric_limits::min, for example, once failed to compile in clang due to this.
I understand the reason why C++ defines INT_MIN as (-2147483647 - 1), but why don't they just use 1<<31? That prevents the overflow and is also easy to understand.
That prevents the overflow and is also easy to understand
How could it prevent the overflow, if by left-shifting a positive number you are trying to obtain a negative one? ;)
Keep in mind, that signed integer overflow is Undefined Behavior. Per paragraph 5.8/2 of the C++11 Standard:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. [...] Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value; otherwise, the behavior is undefined.
Also, per paragraph 5/4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of
representable values for its type, the behavior is undefined. [...]
Because 1 << 31 invokes undefined behaviour (assuming 32-bit int).
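For comparison, the arithmetic form used for INT_MIN stays within the range of `int` at every step, so it is usable in a constant expression. A minimal sketch, assuming a two's complement `int` (so that the minimum is one below the negated maximum); `int_min_without_shift` is a made-up name.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Hypothetical helper: builds the most negative int without any shift.
// -max() is representable, and subtracting 1 lands exactly on the minimum,
// so no intermediate value overflows.
// Assumption: two's complement int (min == -max - 1).
constexpr int int_min_without_shift() {
    return -std::numeric_limits<int>::max() - 1;
}

// Evaluated at compile time; undefined behavior here would be rejected.
static_assert(int_min_without_shift() == INT_MIN,
              "expected a two's complement int");

// Runtime self-check of the same property.
void check_int_min_without_shift() {
    assert(int_min_without_shift() == std::numeric_limits<int>::min());
}
```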
Is the following undefined and why?
int i = 0xFF;
unsigned int r = i << 24;
The behaviour is technically undefined unless the int type has more than 32 bits.
From C++11, 5.8/2 (describing an expression E1 << E2):
if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable
in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The result type of i << 24 is (signed) int; if that has 32 bits or fewer, then 0xff * 2^24 == 0xff000000 is not representable (the maximum representable 32-bit signed value being 0x7fffffff), so the behaviour is undefined as specified in that clause.
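One way to sidestep the problem is to convert to an unsigned type before shifting, since unsigned shifts are reduced modulo 2^N and are never undefined for an in-range count. A minimal sketch; `byte_to_top` is a hypothetical name, and `std::uint32_t` is assumed to be available on the target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: moves the low byte of `i` into bits 24..31.
// The shift is done on an unsigned 32-bit value, so the "not representable
// in int" case from the clause above cannot arise.
std::uint32_t byte_to_top(int i) {
    return static_cast<std::uint32_t>(i) << 24;
}

// Small self-check using the value from the question.
void check_byte_to_top() {
    assert(byte_to_top(0xFF) == 0xFF000000u);
    assert(byte_to_top(0x01) == 0x01000000u);
}
```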
According to N3242 section 5.8 Shift operators:
The shift operators << and >> group left-to-right.
shift-expression:
additive-expression
shift-expression << additive-expression
shift-expression >> additive-expression
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
So my answer? Depends on the number of bits in your left operand (which depends on your system).