What happens when you bit shift beyond the end of a variable? - c++

If you have some variable (on the stack) and you left- or right-shift it beyond its end, what happens?
i.e.
byte x = 1;
x >> N;
What if x is a pointer to memory cast to a byte and you do the same thing?
byte* x = obtain pointer from somewhere;
*x = 1;
*x >> N;

It does not (necessarily) become zero. The behavior is undefined (C99 §6.5.7, "Bitwise shift operators"):
If the value of the right operand is
negative or is greater than or equal
to the width of the promoted left
operand, the behavior is undefined.
(C++0x §5.8, "Shift operators"):
The behavior is undefined if the right
operand is negative, or greater than
or equal to the length in bits of the
promoted left operand.
The storage of the value being shifted has no effect on any of this.

I think you're confused about what bitshifts do. They are arithmetic operators equivalent to multiplication or division by a power of 2 (modulo some weirdness about how C treats negative numbers). They do not move any bits in memory. The only way the contents of any variable/memory get changed is if you assign the result of the expression back somewhere.
As for what happens when the right-hand operand of a bitshift operator is greater than or equal to the width of the promoted type of the left-hand operand, the behavior is undefined.

I think you're confused. x >> y does not actually change x in the first place. It calculates a new value.
As Stephen noted, y must not be negative, and it must be less than "the width of the promoted left operand" (read up on type promotion). But otherwise, bits that shift "off the end" are simply discarded. 1 >> 2 (notice that 2 is not negative, and it is less than the number of bits used to represent 1, which is probably 32 but certainly at least 16) evaluates to 0.
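To make the "it calculates a new value" point concrete, here is a minimal sketch. The helper names shift_right_value and shift_right_in_place are made up for illustration, and the precondition check assumes the usual rule that the count must be smaller than the width of the promoted operand (int here).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Width of int in bits (value bits plus the sign bit).
constexpr unsigned kIntWidth =
    static_cast<unsigned>(std::numeric_limits<int>::digits) + 1u;

// Hypothetical helper: evaluates x >> n and returns the new value.
// x is promoted to int before the shift, and the caller's variable is untouched.
int shift_right_value(std::uint8_t x, unsigned n) {
    assert(n < kIntWidth);  // otherwise the behavior is undefined
    return x >> n;
}

// Hypothetical helper: only an assignment (here the compound form) stores the result.
void shift_right_in_place(std::uint8_t& x, unsigned n) {
    assert(n < kIntWidth);
    x >>= n;
}
```

Calling shift_right_value(x, 2) with x == 1 yields 0 and leaves x at 1; only shift_right_in_place(x, 2) changes x.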

Related

strange c++ behaviour for shift operator [duplicate]

If the value after the shift operator
is greater than the number of bits in
the left-hand operand, the result is
undefined. If the left-hand operand is
unsigned, the right shift is a logical
shift so the upper bits will be filled
with zeros. If the left-hand operand
is signed, the right shift may or may
not be a logical shift (that is, the
behavior is undefined).
Can somebody explain to me what the above lines mean?
It doesn't matter too much what those lines mean, they are substantially incorrect.
"If the value after the shift operator
is greater than the number of bits in
the left-hand operand, the result is
undefined."
Is true, but should say "greater than or equal to". 5.8/1:
... the behavior is undefined if the
right hand operand is negative, or
greater than or equal to the length in
bits of the promoted left operand.
Undefined behavior means "don't do it" (see later). That is, if int is 32 bits on your system, then you can't validly do any of the following:
int a = 0; // this is OK
a >> 32; // undefined behavior
a >> -1; // UB
a << 32; // UB
a = (0 << 32); // Either UB, or possibly an ill-formed program. I'm not sure.
"If the left-hand operand is unsigned, the right shift is a logical shift so the upper bits will be filled with zeros."
This is true. 5.8/3 says:
If E1 has unsigned type or if E1 has a
signed type and a nonnegative value,
the result is the integral part of the
quotient of E1 divided by the quantity
2 raised to the power E2
if that makes any more sense to you. >>1 is the same as dividing by 2, >>2 dividing by 4, >>3 by 8, and so on. In a binary representation of a positive value, dividing by 2 is the same as moving all the bits one to the right, discarding the smallest bit, and filling in the largest bit with 0.
"If the left-hand operand is signed, the right shift may or may not be a logical shift (that is, the behavior is undefined)."
First part is true (it may or may not be a logical shift - it is on some compilers/platforms but not others. I think by far the most common behaviour is that it is not). Second part is false, the behavior is not undefined. Undefined behavior means that anything is permitted to happen - a crash, demons flying out of your nose, a random value, whatever. The standard doesn't care. There are plenty of cases where the C++ standard says behavior is undefined, but this is not one of them.
In fact, if the left hand operand is signed, and the value is positive, then it behaves the same as an unsigned shift.
If the left hand operand is signed, and the value is negative, then the resulting value is implementation-defined. It isn't allowed to crash or catch fire. The implementation must produce a result, and the documentation for the implementation must contain enough information to define what the result will be. In practice, the "documentation for the implementation" starts with the compiler documentation, but that might refer you implicitly or explicitly to other docs for the OS and/or the CPU.
Again from the standard, 5.8/3:
If E1 has signed type and negative
value, the resulting value is
implementation-defined.
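The "right shift is division by a power of two" reading of 5.8/3 can be checked directly for unsigned operands. This is only a sketch: shift_matches_division is a made-up name, and the same identity holds for non-negative signed values as long as the divisor is representable.

```cpp
#include <cassert>
#include <limits>

// Hypothetical check: for an unsigned value and an in-range count,
// v >> n equals the integral part of v / 2^n.
bool shift_matches_division(unsigned v, unsigned n) {
    // The count must be smaller than the width of unsigned int.
    assert(n < static_cast<unsigned>(std::numeric_limits<unsigned>::digits));
    return (v >> n) == v / (1u << n);
}
```

For example, 12u >> 1 is 6u, which is 12u / 2u.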
I'm assuming you know what it means by shifting. Let's say you're dealing with 8-bit chars
unsigned char c;
c >> 9;
c >> 4;
signed char c;
c >> 4;
The first shift looks out of range because 9 > 8 [the number of bits in a char], but c is promoted to int before the shift, so the count is compared against the width of int, not of char. With a 16-bit or wider int the first shift is therefore well defined and yields 0; it would be undefined only if the count were negative or at least the width of int, in which case all bets are off and there is no way of knowing what will happen. The second shift is well defined. You get 0s on the left: 11111111 becomes 00001111. The third shift is well defined when c is non-negative and implementation-defined when c is negative.
Note that, in this third case, it does matter what the value of c is. signed char c = 5 and signed char c = -5 are both signed, but shifting 5 to the right behaves exactly like an unsigned shift, while the result of shifting -5 to the right is implementation-defined (not undefined).
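Integral promotion is easy to overlook with char operands, so here is a small sketch of it. It assumes a platform where int can represent every unsigned char value (true on mainstream systems); shift_uchar is a made-up helper name.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// The result of a shift has the type of the promoted left operand.
// Assumption: unsigned char promotes to int on this platform.
static_assert(std::is_same<decltype(static_cast<unsigned char>(0) >> 1), int>::value,
              "unsigned char is promoted to int before shifting");

// Hypothetical helper: the count is checked against the width of int, not of char.
int shift_uchar(unsigned char c, unsigned n) {
    assert(n < static_cast<unsigned>(std::numeric_limits<int>::digits) + 1u);
    return c >> n;
}
```

With this, shift_uchar(0xFF, 9) is a defined operation that returns 0, even though 9 exceeds the number of bits in a char.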
If the value after the shift operator is greater than the number of bits in the left-hand operand, the result is undefined.
It means (unsigned int)x >> 33 can do anything[1].
If the left-hand operand is unsigned, the right shift is a logical shift so the upper bits will be filled with zeros.
It means 0xFFFFFFFFu >> 4 must be 0x0FFFFFFFu
If the left-hand operand is signed, the right shift may or may not be a logical shift (that is, the behavior is undefined).
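It means that for a negative signed value such as -1 (bit pattern 0xFFFFFFFF in two's complement), -1 >> 4 can be 0xFFFFFFFF (arithmetic shift) or 0x0FFFFFFF (logical shift), whichever the implementation documents; strictly, the result is implementation-defined rather than undefined. (The literal 0xFFFFFFFF itself has type unsigned int when int is 32 bits, so 0xFFFFFFFF >> 4 is a logical shift.)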
[1]: on 32-bit machine with a 32-bit int.
If the value after the shift operator
is greater than the number of bits in
the left-hand operand, the result is
undefined.
If you try to shift a 32-bit integer by 33, the behavior is undefined; that is, the result may or may not be all zeros.
If the left-hand operand is unsigned,
the right shift is a logical shift so
the upper bits will be filled with
zeros.
Unsigned data type will be padded with zeros when right shifting.
so 1100 >> 1 == 0110
If the left-hand operand is signed,
the right shift may or may not be a
logical shift (that is, the behavior
is undefined).
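If the data type is signed, the result depends on the value: a non-negative value shifts exactly like an unsigned one, while for a negative value the result is implementation-defined (not undefined). Signed types use their leftmost bit to indicate positive or negative, so an implementation may copy that bit into the vacated positions (arithmetic shift) or fill them with zeros (logical shift). So shifting a signed integer may not do what you expect. See the Wikipedia article for details.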
http://en.wikipedia.org/wiki/Logical_shift
To give some context, here's the start of that paragraph:
The shift operators also manipulate bits. The left-shift operator (<<) produces the operand to the left of the operator shifted to the left by the number of bits specified after the operator. The right-shift operator (>>) produces the operand to the left of the operator shifted to the right by the number of bits specified after the operator.
Now the rest, with explanations:
If the value after the shift operator is greater than the number of bits in the left-hand operand, the result is undefined.
If you have a 32 bit integer and you try to bit shift 33 bits, that's not allowed and the result is undefined. In other words, the result could be anything, or your program could crash.
If the left-hand operand is unsigned, the right shift is a logical shift so the upper bits will be filled with zeros.
This says that it's defined to write a >> b when a is an unsigned int. As you shift right, the least significant bits are removed, other bits are shifted down, and the most significant bits become zero.
In other words:
This: 110101000101010 >> 1
becomes: 011010100010101
If the left-hand operand is signed, the right shift may or may not be a logical shift (that is, the behavior is undefined).
Actually I believe that the behaviour here is implementation defined when a is negative and defined when a is positive rather than undefined as suggested in the quote. This means that if you do a >> b when a is a negative integer, there are many different things that might happen. To see which you get, you should read the documentation for your compiler. A common implementation is to shift in zeros if the number is positive, and ones if the number is negative, but you shouldn't rely on this behaviour if you wish to write portable code.
I suppose the key word is "undefined", which means that the specification does not say what should happen. Most compilers will do something sensible in such cases, but you cannot depend on any behaviour generally. It is usually best to avoid invoking undefined behavior unless the documentation for the compiler you are using states what it does in the specific case.
The first sentence says it's undefined if you try to shift, for example, a 32 bit value by more than 32 bits.
The second says that if you shift an unsigned int right, the left hand bits will get filled with zeros.
The third says that if you shift a signed int right, it is not defined what will be put in the left hand bits.
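If you need zeros shifted in regardless of sign, one common approach is to convert to the unsigned type first, since unsigned right shifts are always logical. A minimal sketch, with logical_shr as a made-up name and 32-bit operands assumed:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: forces a logical (zero-filling) right shift of a signed value.
// Converting a negative int32_t to uint32_t is well defined (reduction modulo 2^32).
std::uint32_t logical_shr(std::int32_t v, unsigned n) {
    assert(n < 32u);  // the count must stay below the operand width
    return static_cast<std::uint32_t>(v) >> n;
}
```

Here logical_shr(-1, 4) gives 0x0FFFFFFF on every conforming implementation, whereas -1 >> 4 is left to the implementation.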

How does shift operator work with negative numbers in c++

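#include <iostream>
using namespace std;

int main()
{
    int x = -2;
    cout << (1<<x) << endl;   // shift count held in a variable
    cout << (1<<-2) << endl;  // shift count is a literal
}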
Here the (1<<x) prints 1073741824 (how is this calculated)
Whereas (1<<-2) prints a garbage value.
And why do these two return different answers?
According to the C Standard (6.5.7 Bitwise shift operators)
3 The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand. If the
value of the right operand is negative or is greater than or equal to
the width of the promoted left operand, the behavior is undefined
The same is written in the C++ Standard (C++20, 7.6.7 Shift operators)
... The operands shall be of integral or unscoped enumeration type and
integral promotions are performed. The type of the result is that
of the promoted left operand. The behavior is undefined if the right
operand is negative, or greater than or equal to the width of the
promoted left operand.
In the standard, http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3690.pdf
Page 118, Section 5.8.1:
The behavior is undefined if the right operand is negative, or
greater than or equal to the length in bits of the promoted left
operand
Meaning the compiler can do whatever it wants here - all bets are off.
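As for where 1073741824 might come from: one plausible explanation, assuming the shift was executed by a CPU instruction that keeps only the low 5 bits of the count (as x86 does for 32-bit operands), is that -2 was treated as 30. The sketch below models that masking explicitly with defined operations; shl_masked_count is a made-up name, and nothing in the standard guarantees this outcome for the original code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 32-bit shifter that ignores all but the low 5 bits
// of the count. Every operation here is well defined.
std::uint32_t shl_masked_count(std::uint32_t value, int count) {
    const std::uint32_t masked = static_cast<std::uint32_t>(count) & 31u;  // -2 becomes 30
    return value << masked;
}

// Usage: reproduces the value reported in the question (1 << 30).
void check_question_value() {
    assert(shl_masked_count(1u, -2) == 1073741824u);
}
```

The literal case (1<<-2) is likely evaluated by the compiler at compile time instead, which would explain why it printed something different.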

Is left-shifting (<<) a negative integer undefined behavior in C++11?

Is left-shifting a negative int Undefined Behavior in C++11?
The relevant Standard passages here are from 5.8:
2/The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are zero-filled. If E1 has an unsigned type, the value of the
result is E1 × 2^E2, reduced modulo one more than the maximum value
representable in the result type. Otherwise, if E1 has a signed type
and non-negative value, and E1 × 2^E2 is representable in the result
type, then that is the resulting value; otherwise, the behavior is
undefined.
The part that confuses me is:
Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2
is representable in the result type, then that is the resulting value;
otherwise, the behavior is undefined.
Should this be interpreted to mean that left-shifting any negative number is UB? Or does it only mean if you LS a negative and the result doesn't fit in the result type, then it's UB?
Moreover, the preceding clause says:
1/The shift operators << and >> group left-to-right.
shift-expression:
additive-expression
shift-expression << additive-expression
shift-expression >> additive-expression
The operands shall be of integral or unscoped enumeration type and
integral promotions are performed.
The type of the result is that of the promoted left operand. The
behavior is undefined if the right operand is negative, or greater
than or equal to the length in bits of the promoted left operand.
This makes it explicit that using a negative number for one of the operands is UB. If it were UB to use a negative for the other operand, I would expect that to be made clear here as well.
So, bottom line, is:
-1 << 1
Undefined Behavior?
@Angew provided a pseudocode interpretation of the Standardese which succinctly expresses one possible (likely) valid interpretation. Others have questioned whether this question is really about the applicability of the language "behavior is undefined" versus our (StackOverflow's) use of the phrase "Undefined Behavior." This edit is to provide some more clarification on what I'm trying to ask.
@Angew's interpretation of the Standardese is:
if (typeof(E1) == unsigned integral)
    value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
    value = E1 * 2^E2;
else
    value = undefined;
What this question really boils down to is this -- is the correct interpretation actually:
value = E1 left-shift-by (E2)
switch (typeof(E1))
{
case unsigned integral :
    value = E1 * 2^E2 % blah blah;
    break;
case signed integral :
    if (E1 >= 0)
    {
        if (representable(E1 * 2^E2))
        {
            value = E1 * 2^E2;
        }
        else
        {
            value = undefined;
        }
    }
    break;
}
?
As a side note, looking at this in terms of pseudocode makes it fairly clear in my mind that @Angew's interpretation is the correct one.
Yes, I would say it's undefined. If we translate the standardese to pseudo-code:
if (typeof(E1) == unsigned integral)
    value = E1 * 2^E2 % blah blah;
else if (typeof(E1) == signed integral && E1 >= 0 && representable(E1 * 2^E2))
    value = E1 * 2^E2;
else
    value = undefined;
I'd say the reason why they're explicit about the right-hand operand and not about the left-hand one is that the paragraph you quote (the one with the right-hand operand case) applies to both left and right shifts.
For the left-hand operand, the ruling differs. Left-shifting a negative is undefined, right-shifting it is implementation-defined.
Should this be interpreted to mean that left-shifting any negative number is UB?
Yes, the behavior is undefined when given any negative number. The behavior is only defined when both of the following are true:
the number is non-negative
E1 × 2^E2 is representable in the result type
That's literally what "if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined," is saying:
if X and Y
then Z
else U
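The "if X and Y then Z else U" reading can be turned into a runtime guard. This is a conservative sketch that follows the C++11 wording quoted above for a signed int; the name checked_shl and the bool-plus-out-parameter shape are my own choices, and as far as I know later revisions of the standard (C++20 in particular) relax these rules.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: stores e1 << e2 in `out` and returns true only when the
// quoted C++11 rules define the result; otherwise returns false and never
// evaluates the shift.
bool checked_shl(int e1, int e2, int& out) {
    const int width = std::numeric_limits<int>::digits + 1;  // value bits + sign bit
    if (e2 < 0 || e2 >= width) return false;  // 5.8/1: count out of range
    if (e1 < 0) return false;                 // 5.8/2: negative left operand
    // e1 * 2^e2 is representable exactly when e1 <= INT_MAX / 2^e2.
    if (e1 > (std::numeric_limits<int>::max() >> e2)) return false;
    out = e1 << e2;
    return true;
}

// Usage: the case from the question, -1 << 1, is rejected.
void checked_shl_examples() {
    int out = 0;
    assert(checked_shl(1, 3, out) && out == 8);
    assert(!checked_shl(-1, 1, out));
    assert(!checked_shl(1, -2, out));
}
```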
Answer as per the Question:
The question really is:
Can we equate the term "behavior is undefined" exactly with the term "Undefined Behavior"?
As it is currently worded it means "Undefined Behavior."
Personal comment about the situation
But I am not convinced that is the author's intention.
If it is the author's intention, then we should probably have a note explaining why. But I am more inclined to believe the author meant that the result of that operation is undefined because the representation of negative numbers is not explicitly defined by the standard. If the representation of negative numbers is not explicitly defined, then moving the bits around would lead to an undefined value.
Either way, the wording (or explanation) needs to be tightened/expanded to make it less ambiguous.

C++: Are left/right bitshifts for negative and large values defined?

My question is, within C++, is the following code defined? Some of it? And if it is, what's it supposed to do in these four scenarios?
word << 100;
word >> 100;
word << -100;
word >> -100;
word is a uint32_t
(This is for a bottleneck in a 3D lighting renderer. One of the more minor improvements I want to make in the innermost loop is eliminating needless conditional branches. One of those branches checks whether a left shift should be done on several 32-bit words as part of a Hamming weight count. If the left shift accepts absurd values, the checks don't need to be done at all.)
In the C++0X draft N3290, §5.8:
The behavior is undefined if the right operand is negative,
or greater than or equal to the length in bits of the promoted left
operand.
Note: the above paragraph is identical in the C++03 standard.
So the last two are undefined. The first two are undefined as well here: word is a uint32_t, and 100 is greater than the width of the promoted operand (typically 32 bits). They would be defined only if the promoted type of word were at least 101 bits wide, with the result then depending on whether word is signed.
Here are the next two sections of that paragraph in C++0X (these do differ in C++03):
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned
type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable
in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable
in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed
type and a non-negative value, the value of the result is the integral part of the quotient of E1/2^E2. If E1
has a signed type and a negative value, the resulting value is implementation-defined.
The C standard doesn't say what should happen when the shift count is negative or greater than or equal to the width of the promoted left operand.
The reason is that the C standard didn't want to impose a behavior that would require extra code to be emitted for a shift by a variable count. Since different CPUs do different things, the standard says that anything can happen.
On x86 hardware the shift instructions use only the low 5 bits of the shift count for 32-bit operands (this can be seen by reading the CPU reference manual), so this is most probably what will happen with a C or C++ compiler on that platform when the count is not known at compile time.
See also this answer for a similar question.
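Since the raw shift cannot take absurd counts, the usual workaround is a tiny wrapper that defines the out-of-range case yourself. Both functions below are sketches with made-up names (shl_or_zero, shl_via_64). Whether either is actually cheaper than the existing checks in the renderer is something only profiling can tell; compilers often turn the ternary into a conditional move rather than a branch, but that is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: left shift that yields 0 when the count is 32 or more.
// A negative count passed by the caller converts to a huge unsigned value,
// so it also lands in the "yield 0" case.
std::uint32_t shl_or_zero(std::uint32_t word, std::uint32_t count) {
    return count < 32u ? (word << count) : 0u;
}

// Alternative without a comparison, valid only for count < 64:
// widen to 64 bits, shift, then truncate back to 32 bits.
std::uint32_t shl_via_64(std::uint32_t word, std::uint32_t count) {
    assert(count < 64u);  // a 64-bit shift has its own limit
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(word) << count);
}
```

With these, shl_or_zero(word, 100) is simply 0 instead of undefined behavior.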
