Right shift and signed integer - c++

On my compiler, the following pseudo code (values replaced with binary):
sint32 word = (10000000 00000000 00000000 00000000);
word >>= 16;
produces a word with a bitfield that looks like this:
(11111111 11111111 10000000 00000000)
Can I rely on this behaviour for all platforms and C++ compilers?
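For reference, here is a compilable version of the pseudocode (a sketch; whether the vacated bits are filled with ones is exactly the behaviour in question):
#include <bitset>
#include <cstdint>
#include <iostream>

int main() {
    std::int32_t word = INT32_MIN;  // bit pattern 10000000 00000000 00000000 00000000
    word >>= 16;                    // sign extension is not guaranteed before C++20
    // Conversion to uint32_t is well-defined (modulo 2^32), so the bits can be inspected:
    std::cout << std::bitset<32>(static_cast<std::uint32_t>(word)) << '\n';
    // An arithmetic-shift implementation prints 11111111111111111000000000000000.
}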

From the following link:
INT34-C. Do not shift an expression by a negative number of bits or by greater than or equal to the number of bits that exist in the operand
Noncompliant Code Example (Right Shift)
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation defined and can be either an arithmetic (signed) shift or a logical (unsigned) shift.
This noncompliant code example fails to test whether the right operand is greater than or equal to the width of the promoted left operand, allowing undefined behavior.
unsigned int ui1;
unsigned int ui2;
unsigned int uresult;
/* Initialize ui1 and ui2 */
uresult = ui1 >> ui2;
Making assumptions about whether a right shift is implemented as an arithmetic (signed) shift or a logical (unsigned) shift can also lead to vulnerabilities. See recommendation INT13-C. Use bitwise operators only on unsigned operands.
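The fix CERT has in mind is to test the shift count against the width of the promoted left operand before shifting. A minimal sketch of such a guard (the error branch is just a placeholder) might be:
#include <climits>

unsigned int ui1;
unsigned int ui2;
unsigned int uresult;

/* Initialize ui1 and ui2 */
if (ui2 >= sizeof(unsigned int) * CHAR_BIT) {
    /* handle error condition */
} else {
    uresult = ui1 >> ui2;
}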

From the latest C++20 draft:
Right-shift on signed integral types is an arithmetic right shift, which performs sign-extension.
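For example, in C++20 and later:
static_assert((-1 >> 1) == -1);  // the sign bit is copied into the vacated positions
static_assert((-4 >> 1) == -2);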

No, you can't rely on this behaviour. Right shifting of negative quantities (which I assume your example is dealing with) is implementation defined.
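If what you actually need is the sign-extending (arithmetic) result on every implementation, one option before C++20 is to never right-shift a negative value directly. A minimal sketch, assuming two's complement and a shift count smaller than the width (asr is a hypothetical helper name):
#include <cstdint>

std::int32_t asr(std::int32_t x, unsigned n) {
    if (x >= 0)
        return x >> n;    // well-defined for non-negative values
    // ~x is non-negative, so ~x >> n is well-defined; complementing again
    // reproduces the sign-extended (arithmetic shift) result.
    return ~(~x >> n);
}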

In C++, no. It is implementation and/or platform dependent.
In some other languages, yes. In Java, for example, the >> operator is precisely defined to always fill using the left most bit (thereby preserving sign). The >>> operator fills using 0s. So if you want reliable behavior, one possible option would be to change to a different language. (Although obviously, this may not be an option depending on your circumstances.)

AFAIK, before C++20 integers could even be represented as sign-magnitude in C++, in which case a right shift of a negative value need not fill with 1s at all. So you can't rely on this.

Related

Do signed integers now behave differently with regards to left shift?

In c++20, signed integers are now defined to use two's complement,
see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html
This is a welcome change, however one of the bullet-points caught my eye:
Change: Left-shift on signed integer types produces the same results as left-shift on the corresponding unsigned integer type.
This seems like a strange change. Will this not shift away the sign bit?
The C++17 wording for signed left shifts (E1 << E2) was:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
Note that it speaks of being representable in "the corresponding unsigned type". So if you have a 32-bit signed integer whose value is 0x7FFFFFFF, and you left-shift it by 1, the resulting value is representable in a 32-bit unsigned integer (0xFFFFFFFE). But then this unsigned value gets converted into the result type. And converting an unsigned integer whose value is too big for the corresponding signed type is implementation-defined.
Overall, in C++17, left-shifting into the sign bit could happen through implementation-defined behavior, and even then only if you don't shift beyond the unsigned result type's size. Going past that is explicitly UB.
The C++20 wording, for both signed and unsigned integers, is:
The value of E1 << E2 is the unique value congruent to E1 × 2^E2 modulo 2^N, where N is the width of the type of the result.
Integer congruence modulo a number basically means cutting off the bits beyond the modulo number. The "width" of an integer is explicitly defined as:
The range of representable values for a signed integer type is −2^(N−1) to 2^(N−1) − 1 (inclusive), where N is called the width of the type.
This means that for a 32-bit signed integer (two's complement, no padding bits) the width N is 32, so the result of a shift is reduced modulo 2^32: bits shifted out of the top are discarded, and the remaining pattern may perfectly well land in the sign bit. For example, 1 << 31 is now the well-defined value INT_MIN.
So in C++20 we have a stronger guarantee, but it points the other way: a signed left shift always produces the two's complement wrap of E1 × 2^E2, on every implementation. The implementation variance/UB of C++17 has been replaced by a single defined result.
So left-shifting into the sign bit was implementation-defined (or undefined) in C++17, and is simply well-defined in C++20.
What that bullet point means is that a left shift on a negative number is now valid, that the result no longer becomes undefined merely because it overflows the signed range, and that the wording for signed and unsigned shifts is now the same. (Shifting by a negative count, or by the width of the promoted operand or more, is still undefined.)
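Concretely, assuming a 32-bit int with no padding bits, the C++20 modular rule gives:
#include <climits>

int a = INT_MAX;  // 0x7FFFFFFF
int b = a << 1;   // C++20: reduced modulo 2^32, so b == -2 (bit pattern 0xFFFFFFFE)
int c = 1 << 31;  // C++20: c == INT_MIN; the shift can reach the sign bit
Neither initializer had a single portable result before C++20.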
Yes, the left shifting signed integer behavior changed with C++20.
With C++17, left-shifting a positive signed integer into the sign bit invokes implementation-defined behavior [1]. Example:
int i = INT_MAX;
int j = i << 1; // implementation defined behavior with std < C++20
C++20 changed this to defined behavior because it mandates two's complement representation for signed integers [2][3].
With C++17, left-shifting a negative signed integer invokes undefined behavior [1]. Example:
int i = -1;
int j = i << 1; // undefined behavior with std < C++20
In C++20, this changed as well and this operation now also invokes defined behavior [3].
This seems like a strange change. Will this not shift away the sign bit?
Yes, a signed left shift shifts away the sign bit. Example:
int i = 1 << (sizeof(int)*8-1); // C++20: defined behavior, set most significant bit
int j = i << 1; // C++20: defined behavior, set to 0
The main reason for specifying something as undefined or implementation defined behavior is to allow for efficient implementations on different hardware.
Nowadays, since all CPUs implement two's complement it's natural that the C++ standard mandates it. And if you mandate two's complement it's only consequential that you make the above operations defined behavior because this is also how left shift behaves in all two's complement instruction set architectures (ISAs).
IOW, leaving it implementation defined and undefined wouldn't buy you anything.
Or, if you liked the previous undefined behavior, why would you care that it gets changed to defined behavior? You can still avoid the operation exactly as before; you wouldn't have to change your code.
[1]
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned
type, the value of the result is E1 × 2**E2, reduced modulo one more than the maximum value representable in
the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2**E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value; otherwise, the behavior is undefined.
(C++17 final working draft, Section 8.8 Shift operators [expr.shift], Paragraph 2, page 132 - emphasis mine)
[2]
[..] For each value x of a signed integer type, the value of the corresponding unsigned integer type congruent to x modulo 2^N has the same value of corresponding bits in its value representation. This is also known as two's complement representation. [..]
(C++20 latest working draft, Section 6.8.1 Fundamental types [basic.fundamental], Paragraph 3, page 66)
[3]
The value of E1 << E2 is the unique value congruent to E1 × 2**E2 modulo 2**N, where N is the width of the
type of the result. [Note: E1 is left-shifted E2 bit positions; vacated bits are zero-filled. — end note]
(C++20 latest working draft, Section 7.6.7 Shift operators [expr.shift], Paragraph 2, page 129, link mine)

Bitwise operators and signed types

I'm reading C++ Primer and I'm slightly confused by a few comments which talk about how Bitwise operators deal with signed types. I'll quote:
Quote #1
(When talking about Bitwise operators) "If the operand is signed and
its value is negative, then the way that the “sign bit” is handled in
a number of the bitwise operations is machine dependent. Moreover,
doing a left shift that changes the value of the sign bit is
undefined"
Quote #2
(When talking about the rightshift operator) "If that operand is
unsigned, then the operator inserts 0-valued bits on the left; if it
is a signed type, the result is implementation defined—either copies
of the sign bit or 0-valued bits are inserted on the left."
The bitwise operators promote small integers (such as char) to signed ints. Isn't there an issue with this promotion to signed ints when bitwise operators often gives undefined or implementation-defined behaviour on signed operator types? Why wouldn't the standard promote char to unsigned int?
Edit: Here is the question I took out, but I've placed it back for context with some answers below.
An exercise later asks
"What is the value of ~'q' << 6 on a machine with 32-bit ints and 8 bit chars, that uses Latin-1 character set in which 'q' has the bit pattern 01110001?"
Well, 'q' is a character literal and would be promoted to int, giving
~'q' == ~00000000 00000000 00000000 01110001 == 11111111 11111111 11111111 10001110
The next step is to apply a left shift operator to the bits above, but as quote #1 mentions
"doing a left shift that changes the value of the sign bit is
undefined"
well I don't exactly know which bit is the sign bit, but surely the answer is undefined?
You're quite correct -- the expression ~'q' << 6 is undefined behavior according to the standard. It's even worse than you state, as the ~ operator is defined as computing "the one's complement" of the value, which is meaningless for a signed (two's complement) integer -- the term "one's complement" only really means anything for an unsigned integer.
When doing bitwise operations, if you want strictly well-defined (according to the standard) results, you generally have to ensure that the values being operated on are unsigned. You can do that either with explicit casts, or by using explicitly unsigned constants (U-suffix) in binary operations. Doing a binary operation with a signed and unsigned int is done as unsigned (the signed value is converted to unsigned).
C and C++ are subtly different with the integer promotions, so you need to be careful here -- C++ will convert a smaller-than-int unsigned value to int (signed) before comparing with the other operand to see what should be done, while C will compare operands first.
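For the exercise above, here is a version where every step is well-defined (a sketch assuming 8-bit char and 32-bit unsigned int, as in the exercise):
#include <iostream>

int main() {
    unsigned int q = static_cast<unsigned char>('q');  // 0x00000071
    unsigned int r = (~q) << 6;                         // ~q == 0xFFFFFF8E, r == 0xFFFFE380
    std::cout << std::hex << r << '\n';                 // prints ffffe380
}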
It might be simplest to read the exact text of the Standard, instead of a summary like the one in C++ Primer. (The summary has to leave out detail by virtue of being a summary!)
The relevant portions are:
[expr.shift]
The shift operators << and >> group left-to-right.
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
[expr.unary.op]/10
The operand of ~ shall have integral or unscoped enumeration type; the result is the one's complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand.
Note that neither of these performs the usual arithmetic conversions (which is the conversion to a common type that is done by most of the binary operators).
The integral promotions:
[conv.prom]/1
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
(There are other entries for the types in the "other than" list, I have omitted them here but you can look it up in a Standard draft).
The thing to remember about the integer promotions is that they are value-preserving: if you have a char of value -30, then after promotion it will be an int of value -30. You don't need to think about things like "sign extension".
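Both facts are easy to check directly (a small sketch, C++17 or later):
#include <type_traits>

// The promoted type of a char operand of ~ (or of <<) is int, and the promotion
// keeps the value: a signed char holding -30 promotes to an int holding -30.
static_assert(std::is_same_v<decltype(~'q'), int>);
static_assert(+static_cast<signed char>(-30) == -30);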
Your initial analysis of ~'q' is correct, and the result has type int (because int can represent all the values of char on normal systems).
It turns out that any int whose most significant bit is set represents a negative value (there are rules about this in another part of the standard that I haven't quoted here), so ~'q' is a negative int.
Looking at [expr.shift]/2 we see that this means left-shifting it causes undefined behaviour (it's not covered by any of the earlier cases in that paragraph).
Of course, by editing the question, my answer is now partly answering a different question than the one posed, so here goes an attempt to answer the "new" question:
The promotion rules (what gets converted to what) are well defined in the standard. The type char may be either signed or unsigned - in some compilers you can even give a flag to the compiler to say "I want unsigned char type" or "I want signed char type" - but most compilers just define char as either signed or unsigned.
A constant such as 6 is a signed int by default. When an operation such as 'q' << 6 appears in the code, the compiler promotes the smaller type (in general, char is converted to int before any arithmetic), so 'q' becomes the integer value of 'q'. If you want to avoid that, use an explicit cast, such as static_cast<unsigned>('q') << 6 - that way you are sure the left operand is converted to unsigned rather than signed. (Note that for shifts, unlike most binary operators, the type of the right operand does not affect the type of the result, so a 6u suffix on its own does not help here.)
The operations are undefined because different hardware behaves differently, and there are architectures with "strange" numbering systems, which means that the standards committee has to choose between "ruling out/making operations extremely inefficient" or "defining the standard in a way that isn't very clear". In a few architectures, overflowing integers may also be a trap, and if you shift such that you change the sign on the number, that typically counts as an overflow - and since trapping typically means "your code no longer runs", that would not be what your average programmer expects -> falls under the umbrella of "undefined behaviour". Most processors don't, and nothing really bad will happen if you do that.
Old answer:
So the solution to avoid this is to always cast your signed values (including char) to unsigned before shifting them (or accept that your code may not work on another compiler, the same compiler with different options, or the next release of the same compiler).
It is also worth noting that the resulting value is "nearly always what you expect" (in that the compiler/processor will just perform the left or right shift on the value, on right shifts using the sign bit to shift down), it's just undefined or implementation defined because SOME machine architectures may not have hardware to "do this right", and C compilers still need to work on those systems.
The sign bit is the highest bit in a two's complement representation, and in this particular case shifting left by 6 does not change it:
11111111 11111111 11111111 10001110 << 6 =
111111 11111111 11111111 11100011 10000000
^^^^^^--- goes away.
result=11111111 11111111 11100011 10000000
Or as a hex number: 0xffffe380.

C++: Are left/right bitshifts for negative and large values defined?

My question is, within C++, is the following code defined? Some of it? And if it is, what's it supposed to do in these four scenarios?
word << 100;
word >> 100;
word << -100;
word >> -100;
word is a uint32_t
(This is for a bottleneck in a 3D lighting renderer. One of the more minor improvements I want to make in the innermost loop is eliminating needless conditional branches. One of those branches checks whether a left shift should be done on several 32-bit words as part of a Hamming weight count. If the left shift accepts absurd values, the checks don't need to be done at all.)
In the C++0X draft N3290, §5.8:
The behavior is undefined if the right operand is negative,
or greater than or equal to the length in bits of the promoted left
operand.
Note: the above paragraph is identical in the C++03 standard.
So the last two are undefined. The others, I believe, would depend on whether word is signed or not, but only if word were at least 101 bits wide. Since word is "smaller" than 101 bits, the above applies and the behavior is undefined.
Here are the next two sections of that paragraph in C++0X (these do differ in C++03):
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a non-negative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
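Since word here is a uint32_t, all four expressions fall under the undefined cases, so the count has to be kept in range one way or another. A minimal sketch of a guard (shl_or_zero is a hypothetical name; compilers typically turn the ternary into a conditional move rather than a branch):
#include <cstdint>

std::uint32_t shl_or_zero(std::uint32_t word, unsigned n) {
    // Shifting a 32-bit value by 32 or more is undefined, so clamp to zero instead.
    return n < 32 ? word << n : 0u;
}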
The C standard doesn't say what should happen when the shift count is negative, or greater than or equal to the width of the (promoted) operand.
The reason is that the standard didn't want to impose a behavior that would require extra code to be emitted for a variable shift count. Since different CPUs do different things, the standard says that anything can happen.
With x86 hardware the shift instructions only use the last 5 bits of the shift count for 32-bit operands (this can be seen in the CPU reference manual), so this is what will most probably happen with any C or C++ compiler on that platform - but it is still not something the language guarantees.
See also this answer for a similar question.

Applying bitwise shift operators to signed types: UB and Impl. defined

The C++03 standard tells us that the result of applying the bitwise shift operators to signed types can be UB or implementation-defined for negative values. My question is the following: why does operator << have undefined behaviour, while operator >> is merely implementation-defined? Is there a strict reason why the result of << couldn't be implementation-defined as well?
Thanks in advance.
According to 5.8/2 (admittedly in C++98, which is all I have access to):
The value of E1 << E2 is E1 (interpreted as a bit pattern) left shifted E2 bit positions; vacated bits are zero filled. If E1 has an unsigned type, the value of the result is E1 multiplied by the quantity 2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1 otherwise.
From this it looks to me like it's perfectly well defined for left shift. What's not defined is the representation of signed values (such as two's complement) that is used, so the numeric value of the result is implementation-defined for negative values.
This is in contrast to right-shifting where the vacated bits may be zero or one filled depending on the representation of signed values.
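A concrete way to see the asymmetry (assuming a 32-bit two's complement int):
int a = -8;
int b = a >> 1;   // right shift of a negative value: implementation-defined (commonly -4, i.e. sign bits shifted in)
int c = a << 1;   // left shift of a negative value: undefined in C++11 through C++17 (-16 under the C++20 rules)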

Shift Operators in C++

If the value after the shift operator is greater than the number of bits in the left-hand operand, the result is undefined. If the left-hand operand is unsigned, the right shift is a logical shift so the upper bits will be filled with zeros. If the left-hand operand is signed, the right shift may or may not be a logical shift (that is, the behavior is undefined).
Can somebody explain to me what the above lines mean?
It doesn't matter too much what those lines mean, they are substantially incorrect.
"If the value after the shift operator
is greater than the number of bits in
the left-hand operand, the result is
undefined."
Is true, but should say "greater than or equal to". 5.8/1:
... the behavior is undefined if the right hand operand is negative, or greater than or equal to the length in bits of the promoted left operand.
Undefined behavior means "don't do it" (see later). That is, if int is 32 bits on your system, then you can't validly do any of the following:
int a = 0; // this is OK
a >> 32; // undefined behavior
a >> -1; // UB
a << 32; // UB
a = (0 << 32); // Either UB, or possibly an ill-formed program. I'm not sure.
"If the left-hand operand is unsigned, the right shift is a logical shift so the upper bits will be filled with zeros."
This is true. 5.8/3 says:
If E1 has unsigned type or if E1 has a signed type and a nonnegative value, the result is the integral part of the quotient of E1 divided by the quantity 2 raised to the power E2
if that makes any more sense to you. >>1 is the same as dividing by 2, >>2 dividing by 4, >>3 by 8, and so on. In a binary representation of a positive value, dividing by 2 is the same as moving all the bits one to the right, discarding the smallest bit, and filling in the largest bit with 0.
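For instance:
static_assert((20u >> 2) == 20u / 4, "shifting right by 2 divides by 4");
static_assert((7u >> 1) == 3u, "integer division discards the remainder");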
"If the left-hand operand is signed, the right shift may or may not be a logical shift (that is, the behavior is undefined)."
First part is true (it may or may not be a logical shift - it is on some compilers/platforms but not others. I think by far the most common behaviour is that it is not). Second part is false, the behavior is not undefined. Undefined behavior means that anything is permitted to happen - a crash, demons flying out of your nose, a random value, whatever. The standard doesn't care. There are plenty of cases where the C++ standard says behavior is undefined, but this is not one of them.
In fact, if the left hand operand is signed, and the value is positive, then it behaves the same as an unsigned shift.
If the left hand operand is signed, and the value is negative, then the resulting value is implementation-defined. It isn't allowed to crash or catch fire. The implementation must produce a result, and the documentation for the implementation must contain enough information to define what the result will be. In practice, the "documentation for the implementation" starts with the compiler documentation, but that might refer you implicitly or explicitly to other docs for the OS and/or the CPU.
Again from the standard, 5.8/3:
If E1 has signed type and negative value, the resulting value is implementation-defined.
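One practical consequence of "implementation-defined" (as opposed to undefined) is that the operation is still allowed in a constant expression, so you can probe what your implementation does (a small sketch, C++11 or later):
// true on implementations whose signed right shift is arithmetic (sign-extending)
constexpr bool arithmetic_right_shift = (-1 >> 1) == -1;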
I'm assuming you know what shifting means. Let's say you're dealing with 8-bit chars (strictly speaking the operands are promoted to int before the shift, so the relevant width is really that of int; treat these as shifts on 8-bit quantities for the sake of illustration):
unsigned char c;
c >> 9;
c >> 4;
signed char c;
c >> 4;
The first shift: treating the operand as 8 bits wide, the compiler is free to do whatever it wants, because 9 > 8 (the number of bits). Undefined behavior means all bets are off; there is no way of knowing what will happen. The second shift is well defined: you get 0s on the left, so 11111111 becomes 00001111. The third shift is the questionable one.
Note that in this third case, what the quotes care about is the type of the variable, not its value: signed char c = 5 and signed char c = -5 are both signed. However, a right shift of a signed operand is only implementation-defined when the value is actually negative; for a non-negative value it behaves just like the unsigned case.
If the value after the shift operator is greater than the number of bits in the left-hand operand, the result is undefined.
It means (unsigned int)x >> 33 can do anything[1].
If the left-hand operand is unsigned, the right shift is a logical shift so the upper bits will be filled with zeros.
It means 0xFFFFFFFFu >> 4 must be 0x0FFFFFFFu
If the left-hand operand is signed, the right shift may or may not be a logical shift (that is, the behavior is undefined).
It means 0xFFFFFFFF >> 4 can be 0xFFFFFFFF (arithmetic shift) or 0x0FFFFFFF (logical shift) or, taking the quote at face value, anything-allowed-by-physical-law. (Strictly, the standard makes this case implementation-defined rather than undefined - see the other answers.)
[1]: on 32-bit machine with a 32-bit int.
If the value after the shift operator is greater than the number of bits in the left-hand operand, the result is undefined.
If you try to shift a 32-bit integer by 33, the result is undefined; i.e., it may or may not be all zeros - it could be anything.
If the left-hand operand is unsigned, the right shift is a logical shift so the upper bits will be filled with zeros.
An unsigned data type will be padded with zeros when right-shifting, so 1100 >> 1 == 0110.
If the left-hand operand is signed, the right shift may or may not be a logical shift (that is, the behavior is undefined).
If the data type is signed, the result is not pinned down by the standard (for negative values it is implementation-defined). Signed data types are stored in a format where the leftmost bit indicates positive or negative, so right-shifting a signed integer may not do what you expect. See the Wikipedia article on logical shifts for details.
http://en.wikipedia.org/wiki/Logical_shift
To give some context, here's the start of that paragraph:
The shift operators also manipulate bits. The left-shift operator (<<) produces the operand to the left of the operator shifted to the left by the number of bits specified after the operator. The right-shift operator (>>) produces the operand to the left of the operator shifted to the right by the number of bits specified after the operator.
Now the rest, with explanations:
If the value after the shift operator is greater than the number of bits in the left-hand operand, the result is undefined.
If you have a 32 bit integer and you try to bit shift 33 bits, that's not allowed and the result is undefined. In other words, the result could be anything, or your program could crash.
If the left-hand operand is unsigned, the right shift is a logical shift so the upper bits will be filled with zeros.
This says that it's defined to write a >> b when a is an unsigned int. As you shift right, the least significant bits are removed, other bits are shifted down, and the most significant bits become zero.
In other words:
This: 110101000101010 >> 1
becomes: 011010100010101
If the left-hand operand is signed, the right shift may or may not be a logical shift (that is, the behavior is undefined).
Actually I believe that the behaviour here is implementation defined when a is negative and defined when a is positive rather than undefined as suggested in the quote. This means that if you do a >> b when a is a negative integer, there are many different things that might happen. To see which you get, you should read the documentation for your compiler. A common implementation is to shift in zeros if the number is positive, and ones if the number is negative, but you shouldn't rely on this behaviour if you wish to write portable code.
I suppose the key word is "undefined", which means that the specification does not say what should happen. Most compilers will do something sensible in such cases, but you cannot depend on any behaviour generally. It is usually best to avoid invoking undefined behavior unless the documentation for the compiler you are using states what it does in the specific case.
The first sentence says it's undefined if you try to shift, for example, a 32-bit value by 32 or more bits.
The second says that if you shift an unsigned int right, the left-hand bits will get filled with zeros.
The third says that if you shift a signed int right, what is put in the left-hand bits is not fixed by the standard (it is implementation-defined).