I am trying to understand exactly how integral promotion works with the arithmetic shift operators. In particular, I would like to know which values of a, b, c, d, e, f, g, h are exactly defined according to the C++14 standard, and which ones can depend on the platform/hardware/compiler (assuming that sizeof(int) == 4).
int a = true << 3;
int b = true >> 3;
int c = true << 3U;
int d = true >> 3U;
int e = true << 31;
int f = true >> 31;
int g = true << 31U;
int h = true >> 31U;
From [expr.shift]:
The type of the result is that of the promoted left operand. The behavior is undefined if the right operand
is negative, or greater than or equal to the length in bits of the promoted left operand.
The result type of shifting a bool is always int, regardless of what's on the right hand side. We're never shifting by at least 32 or by a negative number, so we're ok there on all accounts.
For the left-shifts (E1 << E2):
Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value; otherwise, the behavior is undefined.
1×2^31 is representable by unsigned int, and that's the largest left-shift we're doing, so we're ok there on all accounts too.
For the right-shifts (E1 >> E2):
If E1 has a signed type and a negative value, the resulting value is implementation-defined.
E1 is never negative, so we're ok there too! No undefined or implementation-defined behavior anywhere.
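The conclusions above can be checked at compile time. The following is a minimal sketch, assuming a platform where int is 32 bits wide; shl and shr are hypothetical helper names, not anything from the question. Only the shifts whose results fit in an int are evaluated here.

```cpp
#include <type_traits>

// The promoted left operand decides the result type: bool promotes to int,
// whatever the type of the right operand is.
static_assert(std::is_same<decltype(true << 3), int>::value, "bool << int is int");
static_assert(std::is_same<decltype(true << 3U), int>::value, "bool << unsigned is int");
static_assert(std::is_same<decltype(true >> 31U), int>::value, "bool >> unsigned is int");

// Hypothetical helpers so the shifts can be evaluated in constant expressions.
constexpr int shl(bool value, unsigned count) { return value << count; }
constexpr int shr(bool value, unsigned count) { return value >> count; }

static_assert(shl(true, 3) == 8, "a and c are 8");
static_assert(shr(true, 3) == 0, "b and d are 0");
static_assert(shr(true, 31) == 0, "f and h are 0");
```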
Following is mainly a complement to Barry's answer, that clearly explains the rules for left and right shifting.
At least for C++11, the integral promotion of a bool gives 0 for false and 1 for true: 4.5 Integral promotions [conv.prom] §6
A prvalue of type bool can be converted to a prvalue of type int, with false becoming zero and true becoming one.
So in the original examples, b, d, f and h all get the value 0, and a and c both get the value 8: perfectly defined behaviour so far.
But e and g receive the unsigned value 0x80000000. That would be fine if you assigned it to an unsigned int variable, but you are using signed 32-bit integers, so you get an integral conversion: 4.7 Integral conversions [conv.integral] §3
If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
And unsigned 0x80000000 is not representable in a signed 32-bit integer, so the result is implementation-defined for e and g.
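If the bit pattern 0x80000000 is what is wanted, one way to stay clear of the conversion to a signed type is to do the shift in an unsigned type. A minimal sketch, assuming std::uint32_t is available on the target; high_bit is a hypothetical name used only for illustration.

```cpp
#include <cstdint>

// Shifting an unsigned 1 into bit 31 is fully defined. The outer cast back to
// std::uint32_t covers platforms where the operand would be promoted to a wider int.
constexpr std::uint32_t high_bit() {
    return static_cast<std::uint32_t>(static_cast<std::uint32_t>(1) << 31);
}

static_assert(high_bit() == 0x80000000u, "bit 31 set, no signed conversion involved");
```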
So I was playing around with types and came up with the weird result below. Debugging it made no sense, and the only remaining option was to check the C++ specs, which didn't help much. I was wondering if you know what exactly is happening here, and whether it is a 32-bit and/or 64-bit specific issue.
#include <iostream>

int main() {
    unsigned int u = 1;
    signed int i = 1;
    long long lu = -1 * u;
    long long li = -1 * i;
    std::cout << "this is a weird " << lu << " " << li << std::endl;
    return 0;
}
Where the output is
this is a weird 4294967295 -1
The key observation is that the expression -1 * u is of type unsigned int. That is because the rules for arithmetic conversions* say that if one operand is unsigned int and the other is signed int, then the latter operand is converted to unsigned int. The arithmetic expressions are ultimately only defined for homogeneous operands, so the conversions happen before the operation proper.
The result of the conversion of -1 to unsigned int is a large, positive number, which is representable as a long long int, and which is the number you see in the output.
*Currently, that's [expr]/(11.5.3).
The type of -1 is signed int. When you perform an arithmetic operation between objects of different fundamental types, one or both of the arguments will be converted so that both have the same type. (For non-fundamental types, there may be operator overloads for mixed operands.) In this case, the signed value is converted to unsigned, following the conversion rules†.
So -1 was converted to unsigned. But negative numbers cannot be represented by unsigned types. What happens is that the resulting value is the smallest non-negative value representable by the unsigned type that is congruent to the original signed value modulo 2^n, where n is the number of bits of the unsigned type (that is, modulo one more than the maximum representable value). On your platform, that value is 4294967295.
†The rules ([expr], standard draft):
... rules that apply to non-integers ...
Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following
rules shall be applied to the promoted operands:
— If both operands have the same type, no further conversion is needed.
— Otherwise, if both operands have signed integer types or both have unsigned integer types, the
operand with the type of lesser integer conversion rank shall be converted to the type of the
operand with greater rank.
— Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type. (this applies to your case)
— Otherwise, if the type of the operand with signed integer type can represent all of the values of
the type of the operand with unsigned integer type, the operand with unsigned integer type shall
be converted to the type of the operand with signed integer type.
— Otherwise, both operands shall be converted to the unsigned integer type corresponding to the
type of the operand with signed integer type.
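The effect of the third bullet can be observed directly. A minimal sketch, assuming long long is wider than unsigned int as on the asker's platform; the helper names unsigned_product and signed_product are made up for illustration.

```cpp
#include <limits>
#include <type_traits>

// int * unsigned int: the int operand is converted, so the result type is unsigned int.
static_assert(std::is_same<decltype(-1 * 1u), unsigned int>::value,
              "-1 * u has type unsigned int");
// int * int stays int.
static_assert(std::is_same<decltype(-1 * 1), int>::value, "-1 * i has type int");

// Hypothetical helpers mirroring the two initialisations from the question.
constexpr long long unsigned_product(unsigned int u) { return -1 * u; }
constexpr long long signed_product(int i) { return -1 * i; }

static_assert(unsigned_product(1u) == std::numeric_limits<unsigned int>::max(),
              "the unsigned product is UINT_MAX, then widened to long long");
static_assert(signed_product(1) == -1, "the signed product is -1");
```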
The evaluation of
-1 * i
is trivial multiplication of two int types: nothing strange there. And a long long must be capable of holding any int.
First, note that there is no such thing as a negative literal in C++, so
-1 * u
is evaluated as (-1) * u due to operator precedence. The type of (-1) must be int. But this will be converted to unsigned int due to C++'s rule of argument conversion, as the other argument is an unsigned int. In doing that it is converted modulo UINT_MAX + 1, so you end up with UINT_MAX multiplied by 1, which is the number you observe, albeit converted to a long long type.
As a final note, the behaviour of this conversion is subject to the rules of conversion from an unsigned to a signed type: if unsigned int and long long were both 64 bits on your platform then the behaviour is implementation-defined.
The bit pattern "0xFFFFFFFF" corresponds to "-1" when interpreted as a 32-bit signed integer and to "4294967295" when interpreted as a 32-bit unsigned integer.
If -2 is used, the result is "4294967294".
If -3 is used, the result is "4294967293".
If -4 is used, the result is "4294967292".
....
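The pattern in these values follows from the conversion to unsigned being done modulo 2^n. A small sketch that relies only on the standard library; to_unsigned and kMax are hypothetical names.

```cpp
#include <limits>

// Converting a negative int to unsigned int adds 2^n (n = number of bits of
// unsigned int), so -k maps to UINT_MAX - (k - 1).
constexpr unsigned int to_unsigned(int value) {
    return static_cast<unsigned int>(value);
}

constexpr unsigned int kMax = std::numeric_limits<unsigned int>::max();

static_assert(to_unsigned(-1) == kMax, "-1 maps to UINT_MAX");
static_assert(to_unsigned(-2) == kMax - 1, "-2 maps to UINT_MAX - 1");
static_assert(to_unsigned(-4) == kMax - 3, "-4 maps to UINT_MAX - 3");
```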
According to cppreference,
For signed a, the value of a << b is a * 2^b if it is
representable [in the unsigned version of the (since C++14)] return
type [(which is then converted to signed: this makes it legal to
create INT_MIN as 1 << 31) (since C++14)], otherwise the behavior
is undefined.
I can't quite understand this specification. What does it mean exactly by if it is representable in the unsigned version of the return type? Does it apply only to the case when the signed value is non-negative? Are INT_MIN << 1 and -1 << 31 well-defined?
If we go to source, the C++14 Standard, this is what we find (with the part about unsigned types highlighted):
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
For a platform in which std::numeric_limits<int>::digits is 31, it is legal to perform 1 << 31. The resulting value as an unsigned int is 0x80000000, or 2147483648. That number is beyond the valid range of int on such a platform, but the same bit pattern, when treated as a two's complement int, is equal to -2147483648, which is INT_MIN on such a platform.
If you use
int a = 2 << 31;
you invoke undefined behavior since 2 * 2^31 is not representable as an unsigned int on that platform.
Are INT_MIN << 1 and -1 << 31 well-defined?
No, they are not. Bitwise left shifting negative numbers is undefined behavior. Notice the use of non-negative value in the above highlighted text.
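The C++14 rule quoted above can be turned into a small predicate. This is only a sketch of that wording; it assumes int and unsigned int have the same width with no padding bits, and shift_left_is_defined and kTopBit are hypothetical names.

```cpp
#include <limits>

// True when `value << count` is well-defined under the C++14 wording:
// non-negative value, count within the width, and value * 2^count
// representable in unsigned int.
constexpr bool shift_left_is_defined(int value, int count) {
    return value >= 0
        && count >= 0
        && count < std::numeric_limits<unsigned int>::digits
        && static_cast<unsigned int>(value)
               <= (std::numeric_limits<unsigned int>::max() >> count);
}

// Index of the highest bit of unsigned int (31 when int is 32 bits wide).
constexpr int kTopBit = std::numeric_limits<unsigned int>::digits - 1;

static_assert(shift_left_is_defined(1, kTopBit), "1 << 31 is allowed");
static_assert(!shift_left_is_defined(2, kTopBit), "2 << 31 is undefined");
static_assert(!shift_left_is_defined(-1, kTopBit), "-1 << 31 is undefined");
static_assert(!shift_left_is_defined(std::numeric_limits<int>::min(), 1),
              "INT_MIN << 1 is undefined");
```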
I was wondering what this function actually performs.
To my understanding it should return pSrc[1].
So why does it bother left-shifting pSrc[0] by 8 bits, which zeroes out those 8 bits.
And when these zeroes are ORed with pSrc[1], pSrc[1] is not affected so you get pSrc[1] anyway as if the bitwise OR had never happened.
/*
* Get 2 big-endian bytes.
*/
INLINE u2 get2BE(unsigned char const* pSrc)
{
    return (pSrc[0] << 8) | pSrc[1];
}
This function is from the source code of the dalvik virtual machine.
https://android.googlesource.com/platform/dalvik/+/android-4.4.4_r1/vm/Bits.h
Update:
OK, now I got it thanks to all the answers here.
(1) pSrc[0] is originally an unsigned char (1 byte).
(2) When it is left-shifted (pSrc[0] << 8), pSrc[0] is first promoted to a signed int (4 bytes) by integral promotion.
(3) The result of pSrc[0] << 8 is that the 8 bits of interest in pSrc[0] are moved to the second byte of the 4-byte signed int, leaving zeroes in the other bytes (1st, 3rd and 4th).
(4) When it is ORed (intermediate result from step (3) | pSrc[1]), pSrc[1] is likewise promoted to a signed int (4 bytes).
(5) The result of (intermediate result from step (3) | pSrc[1]) has the two least significant bytes the way we want, with zeroes in the two most significant bytes.
(6) Returning the result as a u2 keeps only the two least significant bytes, which gives the 2 big-endian bytes.
For arithmetic operations like this, the unsigned char is converted via a process called integral promotions.
C++11 - N3485 §5.8 [expr.shift]/1:
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand.
And §13.6 [over.built]/17:
For every pair of promoted integral types L and R, there exist candidate operator functions of the form
LR operator%(L , R );
LR operator&(L , R );
LR operator^(L , R );
LR operator|(L , R );
L operator<<(L , R );
L operator>>(L , R );
where LR is the result of the usual arithmetic conversions between types L and R.
When integral promotions are done (§4.5 [conv.prom]/1):
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion
rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned
int.
By integral promotions, the unsigned char will be promoted to int. The other operand is already int, so no changes in type are made to it. The return type then becomes int as well.
Thus, what you have is the first unsigned char's bits shifted left, but still in the now-bigger int, and then the second unsigned char's bits at the end.
You'll notice that the return type of operator| is the result of usual arithmetic conversions between the two operands. At this point, those are the int from the shift and the second unsigned char.
This conversion is defined as follows (§5 [expr]/10):
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
…
Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following
rules shall be applied to the promoted operands:
…
If both operands have the same type, no further conversion is needed.
Since L and R, being promoted before this, are already int, the promotion leaves them the same and the overall return type of the expression is thus int, which is then converted to u2, whatever that happens to be.
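The chain of promotions described above can be checked with a sketch like the following. It assumes u2 is a 16-bit unsigned type (modelled here with std::uint16_t) and that int is wider than 16 bits. The hypothetical get2be takes the two bytes as separate arguments purely so that it can be evaluated at compile time, which differs from the original pointer-based signature.

```cpp
#include <cstdint>
#include <type_traits>

// unsigned char promotes to int (where int can hold all of its values), so both
// the shift and the OR are int operations.
static_assert(std::is_same<decltype(static_cast<unsigned char>(0) << 8), int>::value,
              "unsigned char << int is int");
static_assert(std::is_same<decltype((static_cast<unsigned char>(0) << 8)
                                    | static_cast<unsigned char>(0)), int>::value,
              "int | unsigned char is int");

// Hypothetical stand-in for get2BE: the int result is narrowed to 16 bits on return.
constexpr std::uint16_t get2be(unsigned char hi, unsigned char lo) {
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

static_assert(get2be(0x12, 0x34) == 0x1234, "high byte lands in bits 8..15");
```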
There are no operations (other than type conversions) on
unsigned char. Before any operation, integral promotion
occurs, which converts the unsigned char to an int. So the
operation is shifting an int left, not an unsigned char.
C11 6.5.7 Bitwise shift operators
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of
the right operand is negative or is greater than or equal to the
width of the promoted left operand, the behavior is undefined.
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with
zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo
one more than the maximum value representable in the result type. If E1 has a signed
type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined.
So pSrc[0] is integer promoted to an int. The literal 8 is already an int, so no integer promotion takes place. The usual arithmetic conversions do not apply to shift operators: they are a special case.
Since the original variable was an unsigned char which gets left-shifted 8 bits, we also encounter the issue where "E1" (our promoted variable) is signed and the result may not be representable in the result type, which leads to undefined behavior if this is a 16-bit system.
In plain English: if you shift something into the sign bits of a signed variable, anything can happen. In general: relying on implicit type promotions is bad programming and dangerous practice.
You should fix the code to this:
((unsigned int)pSrc[0] << 8) | (unsigned int)pSrc[1]
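With the casts in place, every operation is carried out in unsigned int, so no bit can end up in a sign bit. A sketch, where get2be_unsigned and its two-argument form are illustrative only and not the Dalvik code:

```cpp
#include <type_traits>

// After the cast the left operand is already unsigned int, so the shift is an
// unsigned operation and its result type is unsigned int.
static_assert(std::is_same<decltype(static_cast<unsigned int>(0) << 8),
                           unsigned int>::value,
              "unsigned int << int is unsigned int");

// Hypothetical two-argument variant of the suggested fix.
constexpr unsigned int get2be_unsigned(unsigned char hi, unsigned char lo) {
    return (static_cast<unsigned int>(hi) << 8) | static_cast<unsigned int>(lo);
}

static_assert(get2be_unsigned(0xFF, 0xFF) == 0xFFFFu, "both bytes kept, no sign bit involved");
```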
I'm trying to wrap my head around integer promotion and overflow in C++. I'm a bit confused with several points:
a) If I have the following code segment:
int i = -15;
unsigned j = 10;
std::cout << i + j;
I get out -5 % UINT_MAX. Is this because the expression i + j is automatically promoted to an unsigned? I was trying to read the standard (4.13):
— The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type.
I'm not sure if I'm reading this incorrectly, but if that is true, why is i + j ending up as unsigned?
b) Adding onto the previous segment, I now have:
int k = j + i;
That is getting evaluated to -5. Shouldn't the expression j + i be evaluated first, giving 4294967291 on my system, and setting that equal to k? That should be out of bounds, so is this behavior undefined? I'm not sure why I get -5.
c) If I change the segment from a) slightly using short, I have:
short i = -15;
unsigned short j = 10;
std::cout << i + j;
I figured when I did this, I would get the same result as a), just with -5 % USHRT_MAX. However, when I execute this, I get -5. Why does using short give a different value than int?
d) I have always learned that the overflow behavior of a signed integral is undefined. For example: int r = ++INT_MAX would be undefined.
However, if there was an unsigned overflow, the quantity would be defined. For example: unsigned a = ++UINT_MAX, then a would be 0. Is that correct?
However, the standard didn't seem to say anything about it. Is that true? If so, why is that?
a) From §5/9:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
If either operand is of type long double, the other shall be
converted to long double.
Otherwise, if either operand is double, the other shall be converted to double.
Otherwise, if either operand is float, the other shall be converted to float.
Otherwise, the integral promotions (4.5) shall be performed on both operands.
Then, if either operand is unsigned long the other shall be converted to unsigned long.
Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int.
Otherwise, if either operand is long, the other shall be converted to long.
Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
[Note: otherwise, the only remaining case is that both operands are int]
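Therefore, since j is unsigned, i is converted to unsigned and the addition is performed using unsigned int arithmetic.
b) This is not UB. The result of the addition is unsigned int (as per (a)), and its value does not fit in an int, so the conversion in the assignment yields an implementation-defined value (§4.7/3); on typical two's complement implementations that value is -5.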
c) From §4.5/1:
An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be converted to an rvalue of type int if int can represent all the values of the source type; otherwise, the source rvalue can be converted to an rvalue of type unsigned int.
Therefore, since a 4-byte int can represent any value in a 2-byte short or unsigned short, both are promoted to int (per §5/9's integral promotions rule), and then added as ints.
d) From §3.9.1/4:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
Therefore, UINT_MAX+1 is legal (not UB) and equal to 0.
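Points (a), (c) and (d) can be checked at compile time. A minimal sketch, assuming int is wider than short (as on the asker's platform); the helper names are made up for illustration.

```cpp
#include <limits>
#include <type_traits>

// (a) int + unsigned int is computed in unsigned int.
static_assert(std::is_same<decltype(0 + 0u), unsigned int>::value,
              "int + unsigned int is unsigned int");
// (c) short + unsigned short: both operands are promoted to int first.
static_assert(std::is_same<decltype(static_cast<short>(0)
                                    + static_cast<unsigned short>(0)), int>::value,
              "short + unsigned short is int");

// Hypothetical helpers mirroring the expressions from the question.
constexpr unsigned int add_mixed(int i, unsigned int j) { return i + j; }
constexpr int add_shorts(short i, unsigned short j) { return i + j; }
constexpr unsigned int wrap_increment(unsigned int v) { return v + 1u; }

constexpr unsigned int kUintMax = std::numeric_limits<unsigned int>::max();

static_assert(add_mixed(-15, 10u) == kUintMax - 4, "(a) wraps to UINT_MAX - 4");
static_assert(add_shorts(-15, 10) == -5, "(c) is plain int arithmetic");
static_assert(wrap_increment(kUintMax) == 0u, "(d) unsigned overflow wraps to 0");
```

With a 32-bit unsigned int, UINT_MAX - 4 is 4294967291, the value mentioned in part (b) of the question.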
Is the following undefined and why?
int i = 0xFF;
unsigned int r = i << 24;
The behaviour is technically undefined unless the int type has more than 32 bits.
From C++11, 5.8/2 (describing an expression E1 << E2):
if E1 has a signed type and non-negative value, and E1×2^E2 is representable
in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The result type of i << 24 is (signed) int; if that has 32 bits or less, then 0xff * 2^24 == 0xff000000 is not representable (the maximum representable 32-bit signed value being 0x7fffffff), so behaviour is undefined as specified in that clause.
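Doing the shift on an unsigned operand sidesteps the problem. A minimal sketch, assuming std::uint32_t exists on the target; byte_to_top is a hypothetical name.

```cpp
#include <cstdint>

// The left operand is unsigned (or, on a platform with a wider int, promoted to an
// int large enough to hold the result), so the shift is defined and the byte ends
// up in bits 24..31 after the cast back to std::uint32_t.
constexpr std::uint32_t byte_to_top(std::uint32_t byte) {
    return static_cast<std::uint32_t>(byte << 24);
}

static_assert(byte_to_top(0xFFu) == 0xFF000000u, "0xFF moved to the top byte");
```

In the question's code this would correspond to converting i to an unsigned type before shifting, for example unsigned int r = static_cast<unsigned int>(i) << 24;.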
According to N3242 section 5.8 Shift operators:
The shift operators << and >> group left-to-right.
shift-expression:
    additive-expression
    shift-expression << additive-expression
    shift-expression >> additive-expression
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
So my answer? Depends on the number of bits in your left operand (which depends on your system).