Bit shifting and assignment [duplicate] - c++

This question already has answers here:
Why doesn't left bit-shift, "<<", for 32-bit integers work as expected when used more than 32 times?
(10 answers)
Closed 9 years ago.
This is sort of driving me crazy.
int a = 0xffffffff;
int b = 32;
cout << (a << b) << "\n";
cout << (0xffffffff << 32) << "\n";
My output is
-1
0
Why am I not getting
0
0

Undefined behavior occurs when you shift a value by a number of bits that is not less than its width (e.g., 32 or more bits for a 32-bit integer). You've just encountered an example of that undefined behavior.

The short answer is that, since you're using an implementation with 32-bit int, the language standard says that a 32-bit shift is undefined behavior. Both the C standard (section 6.5.7) and the C++ standard (section 5.8) say
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
But if you want to know why:
Many computers have an instruction that can shift the value in a register, but the hardware only handles shift values that are actually needed. For instance, when shifting a 32-bit word, only 5 bits are needed to represent a shift value of 0 ... 31, so the hardware may ignore the higher-order bits of the count, and it does on x86 machines (except for the 8086). So that compiler implementations could just use the instruction without generating extra code to check whether the shift value is too big, the authors of the C Standard (many of whom represented compiler vendors) ruled that the result of shifting by larger amounts is undefined.
Your first shift is performed at run time and it encounters this situation ... only the low order 5 bits of b are considered by your machine, and they are 0, so no shift happens. Your second shift is done at compile time, and the compiler calculates the value differently and actually does the 32-bit shift.
If you want to shift by an amount that may be larger than the number of bits in the thing you're shifting, you need to check the range of the value yourself. One possible way to do that is
#define LEFT_SHIFT(a, b) ((b) >= CHAR_BIT * sizeof(a)? 0 : (a) << (b))
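The same range check can be written as a function instead of a macro. This is only a sketch: the name safe_shl is made up for illustration, it is fixed to 32-bit unsigned values, and returning 0 for an out-of-range count is a policy choice, not something the standard prescribes.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper (not from the answer above): a left shift that returns 0
// for an out-of-range count instead of invoking undefined behavior.
inline std::uint32_t safe_shl(std::uint32_t value, int count) {
    const int width = static_cast<int>(sizeof(value) * CHAR_BIT);  // 32 bits
    if (count < 0 || count >= width) {
        return 0;  // assumed policy: every bit has been shifted out
    }
    return value << count;  // here 0 <= count < width, so the shift is defined
}
```

With this helper, safe_shl(0xffffffffu, 32) gives 0 whether the count is a constant or a run-time value.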

The C++ standard says:
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
GCC has no option to make shifts by negative amounts, or by amounts outside the width of the type, behave predictably or trap; they are always treated as undefined.
So the behavior is not defined.

Related

Recursive function for finding power returns zero for large exponents

The function below works fine for a small positive exponent and base. If the exponent is large, memory should run out and the program should be terminated. Instead, if the function is called with a large exponent, zero is returned. Why? One guess is that a multiplication by zero occurred, but there is no such case.
One example where zero is returned is power(2,64) .
unsigned long long int power(unsigned long long int base, int exp) {
    if (exp == 0 && base != 0)
        return 1;
    return base * power(base, exp - 1);
}
Aside from filling memory, you should also worry about overflowing the result. 2^64 is 1<<64, which is 1 bit above the size of a 64-bit integer, so due to unsigned modulo arithmetic, that bit ceases to exist, and you end up with 0.
Not relevant in your case, as you shift by smaller amounts, but left-shifting by a number of places equal to or greater than the width of the (promoted) left operand is undefined behaviour, for both signed and unsigned types (see e.g. this link and these comments). This contrasts with overflow due to smaller shifts (or other arithmetic operations), which is well-defined as wrapping modulo 2^N for unsigned types but invokes UB for signed types.
You are seeing integer overflow, and that happens to hit zero at some point.
pow(2ULL, 64) is equal to (1ULL << 64) (if (1ULL << 64) were defined).
It is trivial to see that if you bitshift 1ULL 64 bits to the left there is no data left in a 64 bit unsigned long long. This is called overflow.
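The wrap-around can be observed without recursion. The sketch below is an iterative stand-in for the question's function; the name power_mod64 is hypothetical, and it relies only on the fact that unsigned arithmetic is reduced modulo 2^64 for a 64-bit type.

```cpp
#include <cstdint>

// Hypothetical iterative variant of the question's power().
// Unsigned multiplication wraps modulo 2^64, which is well-defined.
inline std::uint64_t power_mod64(std::uint64_t base, unsigned exp) {
    std::uint64_t result = 1;
    for (unsigned i = 0; i < exp; ++i) {
        result *= base;  // reduced modulo 2^64 on every step
    }
    return result;
}
```

power_mod64(2, 63) still has its top bit set, while power_mod64(2, 64) is 0, because the only set bit has moved past the end of the type.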

Why doesn't the bit-shift of a variable and a number have the same result?

I'm shifting some bits and just realized that doing the operation using a variable doesn't have the same result as using a number. See the example below.
int a = 97;
int b = 0;
b = 1 << a;
printf("%d\n", b);
// 2
b = 1 << 97;
printf("%d\n", b);
// 0 - warning: shift count >= width of type [-Wshift-count-overflow]
Since the result of a left shift with a right operand greater than or equal to the width in bits of the (promoted) left operand is undefined, any result is possible from the expression.
In the variable case (1 << a), since a is 97 (larger than the number of bits in an int), the most likely results are 1 << (97 % 32) == 1 << 1 == 2 or 0, typically depending on how the hardware (CPU) handles these shifts.
With a constant (1 << 97), the compiler knows you're shifting too far, issues the warning (which is not required), and defines the result as 0 (also not required).
The warning you are seeing is a compile-time warning. You can clearly see that your int b is a 32-bit variable, which will overflow if left-shifted 97 times, so it's a valid concern. But the compiler can only detect this overflow for a constant shift count, because that expression is evaluated during compilation and the compiler immediately knows that it will overflow.
In the case of a variable shift count, the compiler isn't smart enough to know what value int a will possess by the time the shift is executed. So the compiler leaves it up to you.
The undefined behavior is outlined in the C++ standard here.
http://eel.is/c++draft/expr.shift
The behavior is undefined if the right operand is negative, or greater
than or equal to the width of the promoted left operand.
You'll get different results depending on compiler and optimization level. If you turn on optimization, the compiler will easily optimize out the first shifting operation and then make it 0 as well.
Why exactly does it act like that though? The x86 instruction for shifting by a variable is SAL (shift-arithmetic-left). You can see the instruction list for bit shifting operations here:
https://c9x.me/x86/html/file_module_x86_id_285.html
The one that would be used in an unoptimized build would be SAL r/m32, CL. The CL register is 8 bits, but the processor masks it to 5 bits internally:
The destination operand can be a register or a memory location. The count operand can be an immediate value or register CL. The count is masked to 5 bits, which limits the count range to 0 to 31. A special opcode encoding is provided for a count of 1.
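The masking described in that quote can be modelled in portable code. This is a sketch of what the hardware does for a 32-bit operand, under the assumption that the compiler emits a plain shift instruction; shl_masked is a made-up name, and nothing here is guaranteed by the C++ standard for an expression like 1 << 97.

```cpp
#include <cstdint>

// Hypothetical model of the x86 behavior for a 32-bit operand:
// only the low 5 bits of the count take part in the shift.
inline std::uint32_t shl_masked(std::uint32_t value, unsigned count) {
    return value << (count & 31u);  // the masked count is always 0..31
}
```

Under this model a count of 97 behaves like a count of 1, which matches the 2 printed for the variable case in the question.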

Typecasting char to long

Say I have a variable, a
char a = 0x01;
and I want to cast this to a long, as in
long b;
b = (long)a;
Will the upper 3 bytes in b be guaranteed to be 0? With my setup they are 0, but I'm not sure if this is compiler-dependent.
Yes, b is guaranteed to have the value 0x1 after this assignment, even without the cast. The assignment operator in C++ is generally semantic, or value-driven: it copies the value or state rather than performing a bitwise copy (even if the two are sometimes equivalent, such as for trivial types).
In some cases, especially because of operator overloading, this may not hold. Developers are very strongly encouraged to keep to this concept when they design new types, but a careless programmer could overload the assignment operator for non-fundamental types to do anything he/she wants.
As a long can represent all values for a char (be it signed or unsigned) the conversion is guaranteed to not change the value.
If you initially have a non-negative value, either because char is unsigned on your architecture or because the char value is between 0 and 127 (assuming 8-bit characters), the resulting long is guaranteed to be non-negative and less than 256. So on an architecture where long is 4 bytes wide, the 3 high-order bytes are guaranteed to be 0.
If char is signed and the initial value is negative, things will be different! The value will be unchanged and will still be negative. On a common two's complement architecture, the 3 high-order bytes will each be 0xFF.
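The two cases can be checked with explicitly signed and unsigned character types, which avoids depending on whether plain char is signed. This is a sketch; widen_signed and widen_unsigned are hypothetical names, and the comments about high-order bytes assume 8-bit bytes.

```cpp
// Hypothetical helpers: the implicit conversion preserves the value.
inline long widen_signed(signed char c) { return c; }
inline long widen_unsigned(unsigned char c) { return c; }

// Value bits of a long above the low 8, viewed through unsigned arithmetic.
// Converting a negative long to unsigned long is well-defined (modulo 2^N).
inline unsigned long high_bits(long v) {
    return static_cast<unsigned long>(v) & ~0xFFUL;
}
```

widen_signed(-1) is -1 and all of its high bits are set, while widen_unsigned(0xFF) is 255 and its high bits are all zero.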
The answer already given is right, but I thought I'd add that for C++, it is recommended to use one of the C++-specific casting notations, to make it abundantly clear what you are doing. Here, you would use:
long b;
b = static_cast<long>(a);
This makes it very clear what you are doing (a conversion to long whose exact form is resolved at compile time), and you know that the "right" sort of cast will be performed.
char a = 0x01;
long b;
b = (long)a;
C and C++ are two different (but closely related) languages. Their rules happen to be the same in this case.
The cast (not "typecast") is not necessary. The assignment could, and probably should, be written as:
b = a;
which causes an implicit conversion from char to long. Since the value being converted is within the representable range of type long, the result of the conversion is 1. The result of the conversion is specified in terms of values, not representations.
The representation of the value 1 in type long probably has a 1 in the low-order bit, and 0s in all the other bits. (And the position of the low-order bit can vary; some systems are big-endian, some are little-endian, and there are other possibilities.)
There is no guarantee that type long even has three high-order bytes. Type long is at least 32 bits wide, but a byte can be wider than 8 bits. It's even possible that there are values of type char that exceed LONG_MAX (if plain char is unsigned and long is 1 byte, which implies CHAR_BIT >= 32).
It's also possible that the representation of type long includes padding bits, bits that do not contribute to the value. It's guaranteed that the sign bit is 0, the low-order value bit is 1, and all other value bits are 0, but if there are padding bits their values are not guaranteed. (Some combinations of padding bits can result in a trap representation that does not represent any value, but that can't happen in this particular case.)
Most of these exotic possibilities are very unlikely to occur in real life. C implementations for some DSPs do have bytes wider than 8 bits, but any system you're using almost certainly has 8-bit bytes.
The point is that the result of the conversion is defined in terms of values, not representations, and 99% of the time that's all you need to care about. If you write:
char a = 1; /* same as 0x01 */
long b = a;
printf("b = %ld\n", b);
it will print b = 1, even if you're using some exotic system where the value 1 is represented strangely.
b will be 1; this is always, compiler and endianness-independent, true. Additionally, the following expressions will be true:
b == 1
b == 01
b == 0x1
b == 0x00000001
b == 0x00000000000000000000000000000000000000000000000000001
The right hand side in all cases is an int constant with the value 1; not more, not less. Note that the zeroes do not represent bytes in memory (an int most likely does not have the number of bytes the last expression appears to suggest). The hexadecimal notation is just another way to write down a 1, exactly like 1.
In particular, we don't know where in memory the byte with the value 1 is located, because that is architecture dependent. It may be the one at the address of the int, or it may be the other end, or even in between.
Now comes the sweet thing: C does not care how the memory in an int is laid out. None of the ways to write an integer constant is architecture dependent. That seems self-evident with decimal constants: did we expect that the meaning of int i = 1 is architecture dependent? Certainly not. Nor is int i = 0x00000001;. The same is true for the bit shift operators: << shifts towards more significant bits, >> towards less significant bits. The digits in (decimal or hexadecimal) integer constants are ordered so that the most significant digits are on the left side, aligning with the "direction" indicated by the arrow-like bit shift operators. That may or may not reflect your machine's int representation; on a PC it does not.
Bottom line: If you use the standard C (or C++) means to test the "upper 3 bytes", you are home free, and the following is always true, independent of the implementation or architecture:
char a = 0x01;
long b = a;
(b & 0xFF) == 1 // least significant byte is 1
(b & 0x000000FF) == 1 // exactly the same as above
(b & 0xFFFFFF00) == 0 // the three more significant bytes are all 0
It's possible that your long has more bits, but that is implementation dependent. However many more there are, they are all zero; only the least significant bit is set.

C++ shift left with big value

I'm wondering how to left-shift a value by a large amount in C++.
For example:
1 << 180
and I believe the result of that should be:
1532495540865888858358347027150309183618739122183602176
(tested in python [1 << 180]);
Python supports arbitrary precision arithmetic, C++ does not.
Moreover, according to the Standard [expr.shift]:
The behavior is undefined if the right operand is negative, or greater
than or equal to the length in bits of the promoted left operand.
In order to use big integers in C++ you may use the Boost library, which provides wrappers around several arbitrary-precision arithmetic libraries:
#include <boost/multiprecision/gmp.hpp>
#include <iostream>

int main()
{
    boost::multiprecision::mpz_int one(1);
    std::cout << (one << 180) << std::endl;
    return 0;
}
Prints
1532495540865888858358347027150309183618739122183602176
You can do this using a std::bitset:
std::bitset<200> bits = 1; // 200 bits long
bits <<= 180;
How useful that is depends on what you want to do with it. It can't be converted to a single built-in type because none of them is large enough. But there are other potentially useful operations that can be performed on it.
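A few of those operations, as a sketch: the helper name one_shifted is made up, and 200 bits is just the size used in the snippet above. Unlike the built-in operator, std::bitset defines a shift by the full width or more: the result is all zeros.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical helper: the bit pattern of 1 << n in a 200-bit bitset.
// For n >= 200 the result is all zeros, which std::bitset defines.
inline std::bitset<200> one_shifted(std::size_t n) {
    std::bitset<200> bits(1);  // only bit 0 set
    bits <<= n;
    return bits;
}
```

one_shifted(180).test(180) is true and count() reports a single set bit; to_string() gives the pattern in binary if it needs to be printed.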
In C++ (as in C) a left-shift by a value greater than or equal to the number of bits in the shifted operand's (promoted) type gives undefined behaviour.
In this case you are shifting an int value which is most likely 32 bits in size left by a value greater than 32, hence, the behaviour is undefined.
If you need to deal with integers larger than the word size on your machine, you're probably going to need to use a library. GMP is one option.
Integers (or longs) are typically stored in 32 or 64 bits and therefore cannot be shifted by 180.
If you need the exact value, try to write/download a class that manages big integers.
Otherwise, use a double and call pow(2,180). A double has a precision of about 15 significant decimal digits.
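For a power of two specifically, a full big-integer class is not needed. The sketch below doubles a decimal digit string n times; pow2_decimal is a hypothetical helper written for illustration, not part of any library, and it is far slower than a real arbitrary-precision package.

```cpp
#include <string>

// Hypothetical helper: 2^n as a decimal string, by repeated doubling.
inline std::string pow2_decimal(unsigned n) {
    std::string digits = "1";  // stored least significant digit first
    for (unsigned i = 0; i < n; ++i) {
        int carry = 0;
        for (char& d : digits) {
            const int v = (d - '0') * 2 + carry;  // double one digit
            d = static_cast<char>('0' + v % 10);
            carry = v / 10;
        }
        if (carry != 0) {
            digits.push_back(static_cast<char>('0' + carry));  // new top digit
        }
    }
    return std::string(digits.rbegin(), digits.rend());  // most significant first
}
```

pow2_decimal(180) returns the same 55-digit value that Python printed in the question.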

Confused by undefined C++ shift operator behavior and wrapping "pattern space"

I'm confused by something I read in the Shift Operators section of an article on undefined C++ behavior.
On the ARM architecture, the shift operators always behave as if they take place in a 256-bit pattern space, regardless of the operand size--that is, the pattern repeats, or "wraps around", only every 256 positions. Another way of thinking of this is that the pattern is shifted the specified number of positions modulo 256. Then, of course, the result contains just the least-significant bits of the pattern space.
The tables are especially strange:
Given a 32-bit integer with a value of 1:
+-----------------------------------+
| Shift left ARM x86 x64 |
+-----------------------------------+
| 32 0 1 1 |
| 48 0 32768 32768 |
| 64 0 1 1 |
+-----------------------------------+
What are these values, and why do they matter?
The shift operators don't wrap. According to the C++ specification if you shift a 32-bit value left by 32, the result is always 0. (EDIT: I'm wrong, see the answers!) So what is this article getting at? What is the undefined behavior?
When I run this code on x86 I get 0:
printf("%d", 1 << 32);
Supposedly this code snippet illustrates the problem:
// C4293.cpp
// compile with: /c /W1
unsigned __int64 combine (unsigned lo, unsigned hi) {
    return (hi << 32) | lo; // C4293
    // try the following line instead
    // return ( (unsigned __int64)hi << 32) | lo;
}
I would expect the returned value to be lo, since the programmer shifted away all the hi bits. A warning is nice, since this was probably a mistake, but I don't see any undefined behavior...
If you use the x86 or x64 machine instructions for shifting the value, they will mask off the shift amount and only use the lower bits for the actual shift. Some other hardware might not do that.
That's why it is undefined.
In your example with literals 1 << 32, it is likely that the compiler computes the value and that's why it is 0. Trying the operation on real x86 hardware, you would get 1.
The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
This is from §5.8/1 of C++11. If your ints are 32-bit, you can't shift by 32 (right or left shift, regardless of the signedness of the left operand).
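The commented-out line in the question's snippet is the fix: widen hi before shifting, so the count of 32 is smaller than the width of the promoted operand. A portable sketch of the same idea, using std::uint64_t in place of the Microsoft-specific unsigned __int64 and assuming the inputs are 32-bit values:

```cpp
#include <cstdint>

// Sketch of the corrected combine(): the shift happens in a 64-bit type,
// so shifting by 32 is well-defined.
inline std::uint64_t combine(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

combine(1, 2) yields 0x0000000200000001, with hi in the upper half and lo in the lower half.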
According to the C++ specification if you shift a 32-bit value left by 32, the result is always 0
No, that's not what the standard says. It is undefined behavior to shift a 32-bit type by 32 or more (5.8/1)
Since it is undefined behavior to shift by 256 bits on ARM (which has no 257-bit-or-greater type), the CPU is perfectly entitled to wrap at that point.
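The "256-bit pattern space" from the quoted article can be modelled the same way. This is a sketch of that description only, for a 32-bit operand: shl_arm_like is a made-up name, and it is not a statement about what any particular compiler emits.

```cpp
#include <cstdint>

// Hypothetical model of the ARM behavior quoted in the question:
// the count is taken modulo 256, then the low 32 bits of the result are kept.
inline std::uint32_t shl_arm_like(std::uint32_t value, unsigned count) {
    const unsigned effective = count & 0xFFu;  // count modulo 256
    if (effective >= 32u) {
        return 0u;  // every bit has left the 32-bit result
    }
    return value << effective;
}
```

For a value of 1 this gives 0 for counts of 32, 48 and 64, as in the ARM column of the table, and the pattern only repeats at a count of 256.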