c/c++ left shift unsigned vs signed - c++

I have this code.
#include <iostream>
int main()
{
unsigned long int i = 1U << 31;
std::cout << i << std::endl;
unsigned long int uwantsum = 1 << 31;
std::cout << uwantsum << std::endl;
return 0;
}
It prints out.
2147483648
18446744071562067968
on Arch Linux 64 bit, gcc, ivy bridge architecture.
The first result makes sense, but I don't understand where the second number came from. 1 represented as a 4byte int signed or unsigned is
00000000000000000000000000000001
When you shift it 31 times to the left, you end up with
10000000000000000000000000000000
no? I know shifting left for positive numbers is essentially 2^k where k is how many times you shift it, assuming it still fits within bounds. Why is it I get such a bizarre number?

Presumably you're interested in why this: unsigned long int uwantsum = 1 << 31; produces a "strange" value.
The problem is pretty simple: 1 is a plain int, so the shift is done on a plain int, and only after it's complete is the result converted to unsigned long.
In this case, however, 1<<31 overflows the range of a 32-bit signed int, so the result is undefined1. After conversion to unsigned, the result remains undefined.
That said, in most typical cases, what's likely to happen is that 1<<31 will give a bit pattern of 10000000000000000000000000000000. When viewed as a signed 2's complement2 number, this is -2147483648. Since that's negative, when it's converted to a 64-bit type, it'll be sign extended, so the top 32 bits will be filled with copies of what's in bit 31. That gives: 1111111111111111111111111111111110000000000000000000000000000000 (33 1-bits followed by 31 0-bits).
If we then treat that as an unsigned 64-bit number, we get 18446744071562067968.
§5.8/2:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
In theory, the computer could use 1's complement or signed magnitude for signed numbers--but 2's complement is currently much more common than either of those. If it did use one of those, we'd expect a different final result.

The literal 1 with no U is a signed int, so when you shift << 31, you get integer overflow, generating a negative number (under the umbrella of undefined behavior).
Assigning this negative number to an unsigned long causes sign extension, because long has more bits than int, and it translates the negative number into a large positive number by taking its modulus with 264, which is the rule for signed-to-unsigned conversion.

It's not "bizarre".
Try printing the number in hex and see if it's any more recognizable:
std::cout << std::hex << i << std::endl;
And always remember to qualify your literals with "U", "L" and/or "LL" as appropriate:
http://en.cppreference.com/w/cpp/language/integer_literal
unsigned long long l1 = 18446744073709550592ull;
unsigned long long l2 = 18'446'744'073'709'550'592llu;
unsigned long long l3 = 1844'6744'0737'0955'0592uLL;
unsigned long long l4 = 184467'440737'0'95505'92LLU;

I think it is compiler dependent .
It gives same value
2147483648
2147483648
on my machiene (g++) .
Proof : http://ideone.com/cvYzxN
And if overflow is there , then because uwantsum is unsigned long int and unsigned values are ALWAYS positive , conversion is done from signed to unsigned by using (uwantsum)%2^64 .
Hope this helps !

Its in the way you printed it out.
using formar specifier %lu should represent a proper long int

Related

Left-shift bit operation for multiplying int-variable: Limited Range for multiplying. Arithmetic pattern after exceeding?

My actual concern is about this:
The left-shift bit operation is used to multiply values of integer variables quickly.
But an integer variable has a defined range of available integers it can store, which is obviously very logical due to the place in bytes which is reserved for it.
Depending on 16-bit or 32-bit system, it preserves either 2 or 4 bytes, which range the available integers from
-32,768 to 32,767 [for signed int] (2 bytes), or
0 to 65,535 [for unsigned int] (2 bytes) on 16-bit
OR
-2,147,483,648 to 2,147,483,647 [for signed int] (4 bytes), or
0 to 4,294,967,295 [for unsigned int] (4 bytes) on 32-bit
My thought is, it should´t be able to multiply the values over the exact half of the maximum integer of the according range.
But what happens then to the values if you proceed the bitwise operation after the value has reached the integer value of the half of the max int value?
Is there an arithmetic pattern which will be applied to it?
One example (in case of 32-bit system):
unsigned int redfox_1 = 2147483647;
unsigned int redfox_2;
redfox_2 = redfox_1 << 1;
/* Which value has redfox_2 now? */
redfox_2 = redfox_1 << 2;
/* Which value has redfox_2 now? */
redfox_2 = redfox_1 << 3;
/* Which value has redfox_2 now? */
/* And so on and on */
/* Is there a arithmetic pattern what will be applied to the value of redfox_2 now? */
the value stored inside redfox_2 shouldn´t be able to go over 2.147.483.647 because its datatype is unsigned int, which can handle only integers up to 4,294,967,295.
What will happen now with the value of redfox_2?
And Is there a arithmetic pattern in what will happen to the value of redfox_2?
Hope you can understand what i mean.
Thank you very much for any answers.
Per the C 2018 standard, 6.5.7 4:
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
So, for unsigned integer types, the bits are merely shifted left, and vacated bit positions are filled with zeroes. For signed integer types, the consequences of overflow are not defined by the C standard.
Many C implementations will, in signed shifts, slavishly shift the bits, including shifting value bits into the sign bit, resulting in various positive or negative values that a naïve programmer might not expect. However, since the behavior is not defined by the C standard, a C implementation could also:
Clamp the result at INT_MAX or INT_MIN (for int, or the corresponding maxima for the particular type).
Shift the value bits without affecting the sign bit.
Generate a trap.
Transform the program, when the undefined shift is recognized during compilation and optimization, in arbitrary ways, such as removing the entire code path that performs the shift.
If you really want to see the pattern, then just write a program that prints it:
#include <iostream>
#include <ios>
#include <bitset>
int main()
{
unsigned int redfox = 2147483647;
std::bitset<32> b;
for (int i = 0; i < 32; ++i)
{
redfox = redfox << 1;
b = redfox;
std::cout << std::dec << redfox << ", " << std::hex << redfox << ", " << b << std::endl;
}
}
This produces:
4294967294, fffffffe, 11111111111111111111111111111110
4294967292, fffffffc, 11111111111111111111111111111100
4294967288, fffffff8, 11111111111111111111111111111000
4294967280, fffffff0, 11111111111111111111111111110000
4294967264, ffffffe0, 11111111111111111111111111100000
4294967232, ffffffc0, 11111111111111111111111111000000
4294967168, ffffff80, 11111111111111111111111110000000
4294967040, ffffff00, 11111111111111111111111100000000
4294966784, fffffe00, 11111111111111111111111000000000
4294966272, fffffc00, 11111111111111111111110000000000
4294965248, fffff800, 11111111111111111111100000000000
4294963200, fffff000, 11111111111111111111000000000000
4294959104, ffffe000, 11111111111111111110000000000000
4294950912, ffffc000, 11111111111111111100000000000000
4294934528, ffff8000, 11111111111111111000000000000000
4294901760, ffff0000, 11111111111111110000000000000000
4294836224, fffe0000, 11111111111111100000000000000000
4294705152, fffc0000, 11111111111111000000000000000000
4294443008, fff80000, 11111111111110000000000000000000
4293918720, fff00000, 11111111111100000000000000000000
4292870144, ffe00000, 11111111111000000000000000000000
4290772992, ffc00000, 11111111110000000000000000000000
4286578688, ff800000, 11111111100000000000000000000000
4278190080, ff000000, 11111111000000000000000000000000
4261412864, fe000000, 11111110000000000000000000000000
4227858432, fc000000, 11111100000000000000000000000000
4160749568, f8000000, 11111000000000000000000000000000
4026531840, f0000000, 11110000000000000000000000000000
3758096384, e0000000, 11100000000000000000000000000000
3221225472, c0000000, 11000000000000000000000000000000
2147483648, 80000000, 10000000000000000000000000000000
0, 0, 00000000000000000000000000000000

Assign a negative number to an unsigned int

This code gives the meaningful output
#include <iostream>
int main() {
unsigned int ui = 100;
unsigned int negative_ui = -22u;
std::cout << ui + negative_ui << std::endl;
}
Output:
78
The variable negative_ui stores -22, but is an unsigned int.
My question is why does unsigned int negative_ui = -22u; work.
How can an unsigned int store a negative number? Is it save to be used or does this yield undefined behaviour?
I use the intel compiler 18.0.3. With the option -Wall no warnings occurred.
Ps. I have read What happens if I assign a negative value to an unsigned variable? and Why unsigned int contained negative number
How can an unsigned int store a negative number?
It doesn't. Instead, it stores a representable number that is congruent with that negative number modulo the number of all representable values. The same is also true with results that are larger than the largest representable value.
Is it save to be used or does this yield undefined behaviour?
There is no UB. Unsigned arithmetic overflow is well defined.
It is safe to rely on the result. However, it can be brittle. For example, if you add -22u and 100ull, then you get UINT_MAX + 79 (i.e. a large value assuming unsigned long long is a larger type than unsigned) which is congruent with 78 modulo UINT_MAX + 1 that is representable in unsigned long long but not representable in unsigned.
Note that signed arithmetic overflow is undefined.
Signed/Unsigned is a convention. It uses the last bit of the variable (in case of x86 int, the last 31th bit). What you store in the variable takes the full bit length.
It's the calculations that follow that take the upper bit as a sign indicator or ignore it. Therefore, any "unsigned" variable can contain a signed value which will be converted to the unsigned form when the unsigned variable participates in a calculation.
unsigned int x = -1; // x is now 0xFFFFFFFF.
x -= 1; // x is now 0xFFFFFFFE.
if (x < 0) // false. x is compared as 0xFFFFFFFE.
int x = -1; // x stored as 0xFFFFFFFF
x -= 1; // x stored as 0xFFFFFFFE
if (x < 0) // true, x is compared as -2.
Technically valid, bad programming.

Is a negative integer summed with a greater unsigned integer promoted to unsigned int?

After getting advised to read "C++ Primer 5 ed by Stanley B. Lipman" I don't understand this:
Page 66. "Expressions Involving Unsigned Types"
unsigned u = 10;
int i = -42;
std::cout << i + i << std::endl; // prints -84
std::cout << u + i << std::endl; // if 32-bit ints, prints 4294967264
He said:
In the second expression, the int value -42 is converted to unsigned before the addition is done. Converting a negative number to unsigned behaves exactly as if we had attempted to assign that negative value to an unsigned object. The value “wraps around” as described above.
But if I do something like this:
unsigned u = 42;
int i = -10;
std::cout << u + i << std::endl; // Why the result is 32?
As you can see -10 is not converted to unsigned int. Does this mean a comparison occurs before promoting a signed integer to an unsigned integer?
-10 is being converted to a unsigned integer with a very large value, the reason you get a small number is that the addition wraps you back around. With 32 bit unsigned integers -10 is the same as 4294967286. When you add 42 to that you get 4294967328, but the max value is 4294967296, so we have to take 4294967328 modulo 4294967296 and we get 32.
Well, I guess this is an exception to "two wrongs don't make a right" :)
What's happening is that there are actually two wrap arounds (unsigned overflows) under the hood and the final result ends up being mathematically correct.
First, i is converted to unsigned and as per the wrap around behavior the value is std::numeric_limits<unsigned>::max() - 9.
When this value is summed with u the mathematical result would be std::numeric_limits<unsigned>::max() - 9 + 42 == std::numeric_limits<unsigned>::max() + 33 which is an overflow and we get another wrap around. So the final result is 32.
As a general rule in an arithmetic expression if you only have unsigned overflows (no matter how many) and if the final mathematical result is representable in the expression data type, then the value of the expression will be the mathematically correct one. This is a consequence of the fact that unsigned integers in C++ obey the laws of arithmetic modulo 2n (see bellow).
Important notice. According to C++ unsigned arithmetic does not overflow:
§6.9.1 Fundamental types [basic.fundamental]
Unsigned integers shall obey the laws of arithmetic modulo 2n where n
is the number of bits in the value representation of that particular
size of integer 49
49) This implies that unsigned arithmetic does not overflow because a
result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting unsigned integer type.
I will however leave "overflow" in my answer to express values that cannot be represented in regular arithmetic.
Also what we colloquially call "wrap around" is in fact just the arithmetic modulo nature of the unsigned integers. I will however use "wrap around" also because it is easier to understand.
i is in fact promoted to unsigned int.
Unsigned integers in C and C++ implement arithmetic in ℤ / 2nℤ, where n is the number of bits in the unsigned integer type. Thus we get
[42] + [-10] ≡ [42] + [2n - 10] ≡ [2n + 32] ≡ [32],
with [x] denoting the equivalence class of x in ℤ / 2nℤ.
Of course, the intermediate step of picking only non-negative representatives of each equivalence class, while it formally occurs, is not necessary to explain the result; the immediate
[42] + [-10] ≡ [32]
would also be correct.
"In the second expression, the int value -42 is converted to unsigned before the addition is done"
yes this is true
unsigned u = 42;
int i = -10;
std::cout << u + i << std::endl; // Why the result is 32?
Supposing we are in 32 bits (that change nothing in 64b, this is just to explain) this is computed as 42u + ((unsigned) -10) so 42u + 4294967286u and the result is 4294967328u truncated in 32 bits so 32. All was done in unsigned
This is part of what is wonderful about 2's complement representation. The processor doesn't know or care if a number is signed or unsigned, the operations are the same. In both cases, the calculation is correct. It's only how the binary number is interpreted after the fact, when printing, that is actually matters (there may be other cases, as with comparison operators)
-10 in 32BIT binary is FFFFFFF6
42 IN 32bit BINARY is 0000002A
Adding them together, it doesn't matter to the processor if they are signed or unsigned, the result is: 100000020. In 32bit, the 1 at the start will be placed in the overflow register, and in c++ is just disappears. You get 0x20 as the result, which is 32.
In the first case, it is basically the same:
-42 in 32BIT binary is FFFFFFD6
10 IN 32bit binary is 0000000A
Add those together and get FFFFFFE0
FFFFFFE0 as a signed int is -32 (decimal). The calculation is correct! But, because it is being PRINTED as an unsigned, it shows up as 4294967264. It's about interpreting the result.

When is int value ordinarily converted to unsigned?

Here is from C++ Prime 5th:
if we use both unsigned and int values in an arithmetic expression, the int value ordinarily is converted to unsigned.
And I tried the following code:
int i = -40;
unsigned u = 42;
unsigned i2 = i;
std::cout << i2 << std::endl; // 4294967256
std::cout << i + u << std::endl; // 2
For i + u, if i is converted to unsigned, it shall be 4294967256, and then i + u cannot equal to 2.
Is int value always converted to unsigned if we use both unsigned and int values in an arithmetic expression?
Yes.
What you're experiencing is integer overflow. The sign of a negative number is given by the leading sign bit, which is either 0 (positive) or 1 (negative). However, for unsigned integers, this leading bit just gives you one more bit of integer precision, rather than an indication of sign. "Negative" signed numbers end up counting backwards from the maximum unsigned integer size.
How does this relate to your particular code? Well, -40 does get turned into 4294967256. However, 4294967256+42 can't fit in an unsigned integer, as if you remember -40 just turned into the max unsigned integer minus 39. So by adding 42, you exceed the capacity of your data-type. The would-be 33rd bit just gets chopped off, and you start over from 0. Hence, you still get 2 as an answer.
Signed: -42, -41, ..., -1, 0, 1, 2
Unsigned: 4294967256, 4294967257, ..., 4294967295, 0, 1, 2, ...
^--^ Overflow!
You'd probably be interested in reading about Two's Complement

Wrap around range of unsigned int in C++?

I am going through C++ Primer 5th Edition and am currently doing the signed/unsigned section. A quick question I have is when there is a wrap-around, say, in this block of code:
unsigned u = 10;
int i = -42;
std::cout << i + i << std::endl; // prints -84
std::cout << u + i << std::endl; // if 32-bit ints, prints 4294967264
I thought that the max range was 4294967295 with the 0 being counted, so I was wondering why the wrap-around seems to be done from 4294967296 in this problem.
Unsigned arithmetic is modulo (maximum value of the type plus 1).
If maximum value of an unsigned is 4294967295 (2^32 - 1), the result will be mathematically equal to (10-42) modulo 4294967296 which equals 10-42+4294967296 i.e. 4294967264
When an out-of-range value is converted to an unsigned type, the result is the remainder of it modulo the number of values the target unsigned type can hold. For instance, the result of n converted to unsigned char is n % 256, because unsigned char can hold values 0 to 255.
It's similar in your example, the wrap-around is done using 4294967296, the number of values that a 32-bit unsigned integer can hold.
Given unsigned int that is 32 bits you're correct that the range is [0, 4294967295].
Therefore -1 is 4294967295. Which is logically equivalent to 4294967296 - 1 which should explain the behavior you're seeing.