I am trying to understand the meaning of the statement:
(int)(unsigned)-1 == -1;
To my current understanding the following things happen:
-1 is a signed int and is cast to unsigned int. The result of this is that, due to wrap-around (modular) behavior, we get the maximum value that can be represented by the unsigned type.
Next, this unsigned-type maximum value that we got in step 1 is cast to signed int. But note that this maximum value is out of range of the signed type. And since signed integer overflow is undefined behavior, the program will result in undefined behavior.
My questions are:
Is my above explanation correct? If not, then what is actually happening?
Is this undefined behavior as I suspected, or implementation-defined behavior?
PS: I know that if it is undefined behavior (as opposed to implementation-defined) then we cannot rely on the output of the program, so we cannot say whether we will always get true or false.
The cast to unsigned int wraps around; this part is legal.
The result of an out-of-range cast to int is well-defined starting from C++20, and was implementation-defined before (but worked correctly in practice anyway). There's no UB here.
The two casts cancel each other out (again, guaranteed in C++20, implementation-defined before, but worked in practice anyway).
Signed overflow is normally UB, yes, but that only applies to overflow caused by a computation. Overflow caused by a conversion is different.
From cppreference:
If the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is:
(until C++20) implementation-defined
(since C++20) the unique value of the destination type equal to the source value modulo 2^n, where n is the number of bits used to represent the destination type.
(Note that this is different from signed integer arithmetic overflow, which is undefined).
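To see the round trip concretely, here is a minimal sketch (assuming a typical implementation where int and unsigned int are the same width; UINT_MAX comes from <climits>):

#include <cassert>
#include <climits>

int main() {
    unsigned u = (unsigned)-1; // modular conversion: always UINT_MAX
    int i = (int)u;            // well-defined since C++20; implementation-defined before
    assert(u == UINT_MAX);     // guaranteed on every implementation
    assert(i == -1);           // the two casts cancel out on common implementations
    return 0;
}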
More specifics on how the conversions work.
Let's say int and unsigned int occupy N bits.
The values that are representable by both int and unsigned int are unchanged by the conversion. All other values are increased or decreased by 2^N to fit into the range.
This conveniently doesn't change the binary representation of the values.
E.g. int -1 corresponds to unsigned int 2^N - 1 (the largest unsigned int value), and both are represented as 11...11 in binary. Similarly, int -2^(N-1) (the smallest int value) corresponds to unsigned int 2^(N-1) (the largest int value + 1).
int: [-2^(n-1)] ... [-1] [0] [1] ... [2^(n-1)-1]
| | | | |
| '---|---|-----------|-----------------------.
| | | | |
'---------------|---|-----------|----------. |
| | | | |
V V V V V
unsigned int: [0] [1] ... [2^(n-1)-1] [2^(n-1)] ... [2^n-1]
Edit: Versions prior to C++20 do not require two's complement.
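As a quick illustration of the mapping above, a small sketch (assuming 32-bit int, so N = 32):

#include <climits>
#include <iostream>

int main() {
    std::cout << (unsigned)-1 << '\n';      // 4294967295, i.e. 2^32 - 1
    std::cout << (unsigned)INT_MIN << '\n'; // 2147483648, i.e. 2^31
    return 0;
}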
It might help to understand the binary representation of -1, which is:
0b1111'1111'1111'1111'1111'1111'1111'1111. Note that this is a 4-byte representation of -1; different machines might represent int differently.
You are casting to unsigned and then back to int, which changes how you interpret the 1's and 0's. In decimal (sometimes called denary), what you're doing is the equivalent of casting it to 4294967295 and then back to -1.
The type of the integer literal is the first type in which the value
can fit, from the list of types which depends on which numeric base
and which integer-suffix was used
Because -1 fits into int, that is how it's interpreted.
See: https://en.cppreference.com/w/cpp/language/integer_literal
this is out of range of the signed type. And since signed integer
overflow
This is not integer overflow; this is binary representation. Integer overflow would be:
int int_max = 2147483647; // the maximum int value on my machine
int one = 1;
int overflow = int_max + one; // undefined behavior; on my machine the result happens to be -2147483648
The above is an example of integer overflow because the result exceeds the maximum value an int can represent.
Related
According to the rules on implicit conversions between signed and unsigned integer types, discussed here and here, when summing an unsigned int with an int, the signed int is first converted to an unsigned int.
Consider, e.g., the following minimal program
#include <iostream>

int main()
{
    unsigned int n = 2;
    int x = -1;
    std::cout << n + x << std::endl;
    return 0;
}
The output of the program is, nevertheless, 1 as expected: x is first converted to an unsigned int, and the sum with n wraps around, giving the "right" answer.
In a code like the previous one, if I know for sure that n + x is positive, can I assume that the sum of unsigned int n and int x gives the expected value?
In a code like the previous one, if I know for sure that n + x is positive, can I assume that the sum of unsigned int n and int x gives the expected value?
Yes.
First, the signed value is converted to unsigned using modulo arithmetic:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type).
Then two unsigned values will be added using modulo arithmetic:
Unsigned integers shall obey the laws of arithmetic modulo 2^n, where n is the number of bits in the value representation of that particular size of integer.
This means that you'll get the expected answer.
Even if the result would be negative in the mathematical sense, the result in C++ will be a number congruent to that negative value modulo 2^n.
Note that I've assumed here that you're adding two same-sized integers.
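For instance, a small variation of the program above (assuming 32-bit unsigned int) makes the modular steps explicit:

#include <iostream>

int main() {
    unsigned int n = 5;
    int x = -3;
    // x converts to 2^32 - 3 = 4294967293; 4294967293 + 5 = 4294967298,
    // which reduces modulo 2^32 to 2, the expected value of 5 - 3.
    std::cout << n + x << std::endl; // prints 2
    return 0;
}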
I think you can be sure, and it is not implementation-defined, although this statement requires some interpretation of the standard when it comes to systems that do not use two's complement to represent negative values.
First, let's state the things that are clear: unsigned integrals do not overflow but take on a value modulo 2^n, where n is the number of bits (cf. this online C++ standard draft):
6.7.1 Fundamental types
(7) Unsigned integers shall obey the laws of arithmetic modulo 2^n
where n is the number of bits in the value representation of that
particular size of integer.
So it's just a matter of whether a negative value nv is converted correctly into an unsigned integral bit pattern nv(conv) such that x + nv(conv) will always be the same as x - nv. For the case of a system using two's complement, things are clear, since the two's complement is actually designed such that this arithmetic works immediately.
For systems using other representations of negative values, we'll have to read the standard carefully:
7.8 Integral conversions
(2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type). [ Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
As the footnote explicitly says that in a two's complement representation there is no change in the bit pattern, we may assume that on systems other than two's complement a real conversion takes place, such that x + nv(conv) == x - nv.
So due to 7.8 (2), I'd say that your assumption is valid.
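A quick sanity check of the identity x + nv(conv) == x - nv (the names mirror the notation above; the identity holds on any conforming implementation):

#include <cassert>

int main() {
    unsigned int x = 10;
    int nv = 3;                          // magnitude to subtract
    unsigned int conv = (unsigned)(-nv); // -3 converted modulo 2^n
    assert(x + conv == x - nv);          // both reduce to 7 modulo 2^n
    return 0;
}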
In C or C++ it is said that the maximum number a size_t (an unsigned integer data type) can hold is the same as casting -1 to that data type. For example, see Invalid Value for size_t.
Why?
I mean (talking about 32-bit ints), AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 to form a negative number). Then 1 is 0x00000001, and 0x7FFFFFFF is the greatest positive number an int data type can hold.
Then, AFAIK the binary representation of -1 int should be 0x80000001 (perhaps I'm wrong). Why/how is this binary value converted to something completely different (0xFFFFFFFF) when casting ints to unsigned? Or: how is it possible to form a binary -1 out of 0xFFFFFFFF?
I have no doubt that in C: ((unsigned int)-1) == 0xFFFFFFFF or ((int)0xFFFFFFFF) == -1 is as true as 1 + 1 == 2; I'm just wondering why.
C and C++ can run on many different architectures and machine types. Consequently, they can have different representations of numbers, two's complement and ones' complement being the most common. In general you should not rely on a particular representation in your program.
For unsigned integer types (size_t being one of those), the C standard (and the C++ standard too, I think) specifies precise overflow rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression
(size_t) (SIZE_MAX + 1)
is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.
Note that the above holds true:
for all unsigned types,
even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.
Also, the above means that you can't rely on specific representations for signed types.
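A short sketch of these guarantees (SIZE_MAX is from <cstdint>):

#include <cstddef>
#include <cstdint>
#include <iostream>

int main() {
    std::cout << ((std::size_t)-1 == SIZE_MAX) << '\n'; // 1: guaranteed by the standard
    std::cout << (std::size_t)(SIZE_MAX + 1) << '\n';   // 0: one past the max wraps to 0
    return 0;
}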
Edit: In order to answer some of the comments:
Let's say we have a code snippet like:
int i = -1;
long j = i;
There is a type conversion in the assignment to j. Assuming that int and long have different sizes (most [all?] 64-bit systems), the bit patterns at the memory locations for i and j are going to be different. The compiler makes sure that the values of i and j are both -1.
Similarly, when we do:
size_t s = (size_t) -1;
There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.
If we do:
long j = LONG_MAX;
int i = j;
If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).
It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.
As to why it's done this way, Wikipedia says:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.
Your first question, about why (unsigned)-1 gives the largest possible unsigned value, is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is that the standard says unsigned types "obey the laws of arithmetic modulo 2^n, where n is the number of bits in the value representation of that particular size of integer."
Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.
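The invert-and-add-one rule is easy to check with unsigned arithmetic (a sketch assuming 32-bit unsigned int):

#include <iostream>

int main() {
    unsigned int one = 1u;
    unsigned int neg = ~one + 1u;               // invert all bits, then add 1
    std::cout << std::hex << neg << '\n';       // ffffffff
    std::cout << (neg == (unsigned)-1) << '\n'; // 1
    return 0;
}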
Two's complement is very nice for doing subtraction just like addition :)
11111110 (254 or -2)
+00000001 ( 1)
---------
11111111 (255 or -1)
11111111 (255 or -1)
+00000001 ( 1)
---------
100000000 ( 0 + 256)
That is two's complement encoding.
The main bonus is that you get the same encoding whether you are using an unsigned or signed int. If you subtract 1 from 0 the integer simply wraps around. Therefore 1 less than 0 is 0xFFFFFFFF.
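The wrap-around below zero can be seen directly (assuming 32-bit unsigned int):

#include <iostream>

int main() {
    unsigned int z = 0;
    --z; // 0 - 1 wraps around to 2^32 - 1
    std::cout << std::hex << z << '\n'; // ffffffff
    return 0;
}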
Because the bit pattern for an int -1 is 0xFFFFFFFF in hexadecimal, or 11111111111111111111111111111111 in binary.
In an int, the first (most significant) bit signifies whether the value is negative.
In an unsigned int, that first bit is just another value bit, because an unsigned int cannot be negative. The extra bit lets an unsigned int store bigger numbers.
So for an unsigned int, 11111111111111111111111111111111 (binary), or 0xFFFFFFFF (hexadecimal), is the biggest number a uint can store.
Unsigned ints can be risky because if a computation would go below zero, the value wraps around to the biggest number.
I understand that I am assigning a signed int a value larger than what it can handle. Also, I should be using %d for signed and %u for unsigned. Similarly, I should not be assigning a negative value to unsigned. But if I make such assignments and use printf as below, I get the results shown below.
My understanding is that in each case, the number is converted to its two's complement binary representation, which is the same for -1 and 4294967295. That is why %u for signed prints 4294967295 by ignoring the negative leftmost bit. When %d is used for a signed int, it uses the leftmost bit as a negative flag and prints -1. Similarly, %u for unsigned prints the unsigned value, but %d causes it to treat the number as signed and thus prints -1. Is that correct?
signed int si = 4294967295;
unsigned int ui = 4294967295;
printf("si = u=%u d=%d\n", si, si);
printf("ui = u=%u d=%d\n", ui, ui);
Output:
si = u=4294967295 d=-1
ui = u=4294967295 d=-1
signed int si = -1;
unsigned int ui = -1;
printf("si = u=%u d=%d\n", si, si);
printf("ui = u=%u d=%d\n", ui, ui);
Output:
si = u=4294967295 d=-1
ui = u=4294967295 d=-1
That is why %u for signed prints 4294967295 by ignoring the negative leftmost bit. When %d is used for a signed int, it uses the leftmost bit as a negative flag and prints -1.
In the case of unsigned, the "leftmost" or most significant bit is not ignored, and is not negative; rather it has a place value of 2^31.
In the negative case, the sign bit is not a flag; instead it is a bit with a place value of -2^31.
In both cases the value of the integer is equal to the sum of the place values of all the binary digits (bits) set to 1.
The encoding of signed values in this way is known as two's complement. It is not the only possible encoding; what you described is known as sign and magnitude for example, and one's complement is another possibility. However, these alternative encodings are seldom encountered in practice, not least because two's complement is how arithmetic works on modern hardware in all but perhaps the most arcane architectures.
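As a worked example of the place-value view (for 32-bit two's complement, where the top bit is worth -2^31): with all 32 bits set, the value is -2^31 + (2^31 - 1) = -1. A sketch verifying the sum with 64-bit arithmetic:

#include <cstdint>
#include <iostream>

int main() {
    // Place value of the sign bit plus the sum of the 31 lower place values.
    std::int64_t value = -(INT64_C(1) << 31) + ((INT64_C(1) << 31) - 1);
    std::cout << value << '\n'; // -1
    return 0;
}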
There are a few things going on here. Let's start out by saying that using the incorrect format specifier with printf is undefined behavior, which means the results of your program are unpredictable; what actually happens will depend on many factors, including your compiler, architecture, optimization level, etc.
For signed/unsigned conversions, that is defined by the respective standards: both C and C++ make it implementation-defined behavior to convert a value that is larger than can be stored in a signed integer type. From the C++ draft standard:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
for example gcc chooses to use the same convention as unsigned:
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
When you assign -1 to an unsigned value in both C and C++ the result will always be the maximum unsigned value of the type, from the draft C++ standard:
If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2^n, where n is
the number of bits used to represent the unsigned type). [ Note: In a
two’s complement representation, this conversion is conceptual and
there is no change in the bit pattern (if there is no truncation).
—end note ]
The wording from C99 is easier to digest:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
So we have the following:
-1 + (UNSIGNED_MAX + 1)
the result of which is UNSIGNED_MAX
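That computation can be checked directly (UINT_MAX is from <climits>):

#include <climits>
#include <iostream>

int main() {
    unsigned int ui = -1; // converted per the rule quoted above
    std::cout << (ui == UINT_MAX) << '\n'; // 1: -1 + (UINT_MAX + 1) == UINT_MAX
    return 0;
}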
As for printf and the incorrect format specifier, we can see from the draft C99 standard, section 7.19.6.1 The fprintf function, which says:
If a conversion specification is invalid, the behavior is
undefined.248) If any argument is not the correct type for the
corresponding conversion specification, the behavior is undefined.
fprintf covers printf with respect to format specifiers, and C++ falls back on C with respect to printf.
I am new to C++, I am confused with C++'s behavior for the code below:
#include <iostream>

void hello(unsigned int x, unsigned int y) {
    std::cout << x << std::endl;
    std::cout << y << std::endl;
    std::cout << x + y << std::endl;
}

int main() {
    int a = -1;
    int b = 3;
    hello(a, b);
    return 1;
}
The x in the output is a very large integer: 4294967295. I know that a negative integer converted to unsigned will behave like this. But why is x + y in the output 2?
Contrary to the other answers, there is no undefined behavior here, and there is no overflow. Unsigned integers use modulo 2^n arithmetic.
Section 4.7 paragraph 2 of the standard says "If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type)." This dictates that -1 is equal to the largest possible unsigned int (modulo 2^n).
Section 3.9.1 paragraph 4 says "Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer." To make it clear what this means, the footnote to this clause says "This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type."
In other words, converting -1 to 4294967295 is not just defined behavior, it is required behavior (assuming 32-bit integers). Similarly, adding 3 to that value and yielding 2 as a result is also required behavior. In this case, the value of n is irrelevant: the third value printed by hello() must be 2, or the implementation is not compliant with the standard.
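Spelled out for 32-bit unsigned int:

-1 converts to 2^32 - 1 = 4294967295
4294967295 + 3 = 4294967298 = 2^32 + 2, which reduces modulo 2^32 to 2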
Because unsigned ints wrap around. In other words, a = -1 (signed) converts to the maximum value for unsigned ints, 4294967295.
Then you add 3; the unsigned int wraps past its maximum and starts again at 0, so -1 + 3 = 2.
Passing a negative number to an unsigned int parameter gives you undefined behavior. int is signed by default; it has a range of -2,147,483,648 to 2,147,483,647, while the unsigned int range is 0 to 4,294,967,295.
This isn't so much about C++ as about how computers represent signed and unsigned numbers.
This is a good source on that. Basically, signed numbers are (usually) represented using two's complement, in which the most significant bit has a place value of -2^(n-1) for an n-bit number. In effect, what this means is that the positive numbers are represented the same in two's complement as they are in regular unsigned binary.
-1 is represented as all ones, which when interpreted as an unsigned integer will be the largest integer that can be represented (4294967295, when dealing with 32 bits).
One of the great things about using two's complement to represent signed numbers is that you can perform addition and subtraction in exactly the same way as with unsigned numbers, and it will work out correctly, as long as the result does not exceed the bounds that can be represented. This isn't as easy with other forms such as sign-magnitude.
So, because the result of -1 + 3 is 2, and because 2 is positive, it is interpreted the same as if it were unsigned. Thus, it prints 2.