I wonder about type conversion from a smaller signed integer to a larger unsigned integer. It appears that the compiler first converts the signed integer to a signed integer of the same size as the destination, then to the unsigned integer.
Regard the following C++ code:
#include <assert.h>
#include <iostream>
typedef int sint;
typedef unsigned __int64 luint;
int main(int, char**) {
assert(sizeof(luint) > sizeof(sint));
sint i = -10;
luint j = i;
std::cout << std::hex << j;
}
Under Visual C++ this yields: fffffffffffffff6.
This is as I like it. Can I be sure that all compilers will behave this way? If the signed integer would be converted to an unsigned first and then to the new size, the result would have been fffffff6.
Signed to unsigned conversion uses modulo 2n arithmetic. From the C++11 Standard, section 4.7 Integral conversions [conv.integral] (§4.7/2):
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2n where n is the number of bits used to represent the unsigned type).
So j takes the value 264 − 10, which is 0xfffffffffffffff6.
Related
I have this simple C program.
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
bool foo (unsigned int a) {
return (a > -2L);
}
bool bar (unsigned long a) {
return (a > -2L);
}
int main() {
printf("foo returned = %d\n", foo(99));
printf("bar returned = %d\n", bar(99));
return 0;
}
Output when I run this -
foo returned = 1
bar returned = 0
Recreated in godbolt here
My question is why does foo(99) return true but bar(99) return false.
To me it makes sense that bar would return false. For simplicity lets say longs are 8 bits, then (using twos complement for signed value):
99 == 0110 0011
-2 == unsigned 254 == 1111 1110
So clearly the CMP instruction will see that 1111 1110 is bigger and return false.
But I dont understand what is going on behind the scenes in the foo function. The assembly for foo seems to hardcode to always return mov eax,0x1. I would have expected foo to do something similar to bar. What is going on here?
This is covered in C classes and is specified in the documentation. Here is how you use documents to figure this out.
In the 2018 C standard, you can look up > or “relational expressions” in the index to see they are discussed on pages 68-69. On page 68, you will find clause 6.5.8, which covers relational operators, including >. Reading it, paragraph 3 says:
If both of the operands have arithmetic type, the usual arithmetic conversions are performed.
“Usual arithmetic conversions” is listed in the index as defined on page 39. Page 39 has clause 6.3.1.8, “Usual arithmetic conversions.” This clause explains that operands of arithmetic types are converted to a common type, and it gives rules determining the common type. For two integer types of different signedness, such as the unsigned long and the long int in bar (a and -2L), it says that, if the unsigned type has rank greater than or equal to the rank of the other type, the signed type is converted to the unsigned type.
“Rank” is not in the index, but you can search the document to find it is discussed in clause 6.3.1.1, where it tells you the rank of long int is greater than the rank of int, and the any unsigned type has the same rank as the corresponding type.
Now you can consider a > -2L in bar, where a is unsigned long. Here we have an unsigned long compared with a long. They have the same rank, so -2L is converted to unsigned long. Conversion of a signed integer to unsigned is discussed in clause 6.3.1.3. It says the value is converted by wrapping it modulo ULONG_MAX+1, so converting the signed long −2 produces a ULONG_MAX+1−2 = ULONG_MAX−1, which is a large integer. Then comparing a, which has the value 99, to a large integer with > yields false, so zero is returned.
For foo, we continue with the rules for the usual arithmetic conversions. When the unsigned type does not have rank greater than or equal to the rank of the signed type, but the signed type can represent all the values of the type of the operand with unsigned type, the operand with the unsigned type is converted to the operand of the signed type. In foo, a is unsigned int and -2L is long int. Presumably in your C implementation, long int is 64 bits, so it can represent all the values of a 32-bit unsigned int. So this rule applies, and a is converted to long int. This does not change the value. So the original value of a, 99, is compared to −2 with >, and this yields true, so one is returned.
In the first function
bool foo (unsigned int a) {
return (a > -2L);
}
the both operands of the expression a > -2L have the type long (the first operand is converted to the type long due to the usual arithmetic conversions because the rank of the type long is greater than the rank of the type unsigned int and all values of the type unsigned int in the used system can be represented by the type long). And it is evident that the positive value 99L is greater than the negative value -2L.
The first function could produce the result 0 provided that sizeof( long ) is equal to sizeof( unsigned int ). In this case the type long is unable to represent all (positive) values of the type unsigned int. As a result due to the usual arithmetic conversions the both operands will be converted to the type unsigned long.
For example running the function foo using MS VS 2019 where sizeof( long ) is equal to 4 as sizeof( unsigned int ) you will get the result 0.
Here is a demonstration program written in C++ that visually shows the reason why the result of a call of the function foo using MS VS 2019 can be equal to 0.
#include <iostream>
#include <iomanip>
#include <type_traits>
int main()
{
unsigned int x = 0;
long y = 0;
std::cout << "sizeof( unsigned int ) = " << sizeof( unsigned int ) << '\n';
std::cout << "sizeof( long ) = " << sizeof(long) << '\n';
std::cout << "std::is_same_v<decltype( x + y ), unsigned long> is "
<< std::boolalpha
<< std::is_same_v<decltype( x + y ), unsigned long>
<< '\n';
}
The program output is
sizeof( unsigned int ) = 4
sizeof( long ) = 4
std::is_same_v<decltype( x + y ), unsigned long> is true
That is in general the result of the first function is implementation defined.
In the second functions
bool bar (unsigned long a) {
return (a > -2L);
}
the both operands have the type unsigned long (again due to the usual arithmetic conversions and ranks of the types unsigned long and signed long are equal each other, so an object of the type signed long is converted to the type unsigned long) and -2L interpreted as unsigned long is greater than 99.
The reason for this has to do with the rules of integer conversions.
In the first case, you compare an unsigned int with a long using the > operator, and in the second case you compare a unsigned long with a long.
These operands must first be converted to a common type using the usual arithmetic conversions. These are spelled out in section 6.3.1.8p1 of the C standard, with the following excerpt focusing on integer conversions:
If both operands have the same type, then no further conversion is
needed.
Otherwise, if both operands have signed integer types or both have
unsigned integer types, the operand with the type of lesser integer
conversion rank is converted to the type of the operand with greater
rank.
Otherwise, if the operand that has unsigned integer type has rank
greater or equal to the rank of the type of the other operand, then
the operand with signed integer type is converted to the type of the
operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can
represent all of the values of the type of the operand with unsigned
integer type, then the operand with unsigned integer type is converted
to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
In the case of comparing an unsigned int with a long the second bolded paragraph applies. long has higher rank and (assuming long is 64 bit and int is 32 bit) can hold all values than an unsigned int can, so the unsigned int operand a is converted to a long. Since the value in question is in the range of long, section 6.3.1.3p1 dictates how the conversion happens:
When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged
So the value is preserved and we're left with 99 > -2 which is true.
In the case of comparing an unsigned long with a long, the first bolded paragraph applies. Both types are of the same rank with different signs, so the long constant -2L is converted to unsigned long. -2 is outside the range of an unsigned long so a value conversion must happen. This conversion is specified in section 6.3.1.3p2:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
So the long value -2 will be converted to the unsigned long value 264-2, assuming unsigned long is 64 bit. So we're left with 99 > 264-2, which is false.
I think what is happening here is implicit promotion by the compiler. When you perform comparison on two different primitives, the compiler will promote one of them to the same type as the other. I believe the rules are that the type with the larger possible value is used as the standard.
So in foo() you are implicitly promoting your argument to a signed long type and the comparison works as expected.
In bar() your argument is an unsigned long, which has a larger maximum value than signed long. Here the compiler promotes -2L to unsigned long, which turns into a very large number.
I have a simple program. Notice that I use an unsigned fixed-width integer 1 byte in size.
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 12;
std::cout << (x << 1) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
My output is the following.
24
-13
I tested larger numbers and operator << always gives me positive numbers, while operator ~ always gives me negative numbers. I then used sizeof() and found...
When I use the left shift bitwise operator(<<), I receive an unsigned 4 byte integer.
When I use the bitwise not operator(~), I receive a signed 4 byte integer.
It seems that the bitwise not operator(~) does a signed integral promotion like the arithmetic operators do. However, the left shift operator(<<) seems to promote to an unsigned integral.
I feel obligated to know when the compiler is changing something behind my back. If I'm correct in my analysis, do all the bitwise operators promote to a 4 byte integer? And why are some signed and some unsigned? I'm so confused!
Edit: My assumption of always getting positive or always getting negative values was wrong. But from being wrong, I understand what was really happening thanks to the great answers below.
[expr.unary.op]
The operand of ~ shall have integral or unscoped enumeration type; the
result is the one’s complement of its operand. Integral promotions are
performed.
[expr.shift]
The shift operators << and >> group left-to-right. [...] The operands shall be of integral or unscoped enumeration type and integral promotions are performed.
What's the integral promotion of uint8_t (which is usually going to be unsigned_char behind the scenes)?
[conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank (4.13) is less than the rank of
int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be
converted to a prvalue of type unsigned int.
So int, because all of the values of a uint8_t can be represented by int.
What is int(12) << 1 ? int(24).
What is ~int(12) ? int(-13).
For performance reasons the C and C++ language consider int to be the "most natural" integer type and instead types that are "smaller" than an int are considered sort of "storage" type.
When you use a storage type in an expression it gets automatically converted to an int or to an unsigned int implicitly. For example:
// Assume a char is 8 bit
unsigned char x = 255;
unsigned char one = 1;
int y = x + one; // result will be 256 (too large for a byte!)
++x; // x is now 0
what happened is that x and one in the first expression have been implicitly converted to integers, the addition has been computed and the result has been stored back in an integer. In other words the computation has NOT been performed using two unsigned chars.
Likewise if you have a float value in an expression the first thing the compiler will do is promoting it to a double (in other words float is a storage type and double is instead the natural size for floating point numbers). This is the reason for which if you use printf to print floats you don't need to say %lf int the format strings and %f is enough (%lf is needed for scanf however because that function stores a result and a float can be smaller than a double).
C++ complicated the matter quite a bit because when passing parameters to functions you can discriminate between ints and smaller types. Thus it's not ALWAYS true that a conversion is performed in every expression... for example you can have:
void foo(unsigned char x);
void foo(int x);
where
unsigned char x = 255, one = 1;
foo(x); // Calls foo(unsigned char), no promotion
foo(x + one); // Calls foo(int), promotion of both x and one to int
I tested larger numbers and operator << always gives me positive
numbers, while operator ~ always gives me negative numbers. I then
used sizeof() and found...
Wrong, test it:
uint8_t v = 1;
for (int i=0; i<32; i++) cout << (v<<i) << endl;
gives:
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
uint8_t is an 8-bit long unsigned integer type, which can represent values in the range [0,255], as that range in included in the range of int it is promoted to int (not unsigned int). Promotion to int has precedence over promotion to unsigned.
Look into two's complement and how computer stores negative integers.
Try this
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 1;
int shiftby=0;
shiftby=8*sizeof(int)-1;
std::cout << (x << shiftby) << '\n'; // or std::cout << (x << 31) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
The output is -2147483648
In general if the first bit of a signed number is 1 it is considered negative. when you take a large number and shift it. If you shift it so that the first bit is 1 it will be negative
** EDIT **
Well I can think of a reason why shift operators would use unsigned int. Consider right shift operation >> if you right shift -12 you will get 122 instead of -6. This is because it adds a zero in the beginning without considering the sign
I'm facing a problem where signed integers should be converted to unsigneds, preserving their range and order.
Given the following definition:
#include <limits>
#define MIN(X) std::numeric_limits<X>::min();
#define MAX(X) std::numeric_limits<X>::max();
What is the fastest and correct way to map the signed range [MIN(T), MAX(T)] to the unsigned range [0, MAX(U)]?
where:
T is a signed integer type
U is an unsigned integer type
sizeof(T) == sizeof(U)
I tried various bit twiddling and numeric methods to come up with a solution, without success.
unsigned int signedToUnsigned(signed int s) {
unsigned int u = 1U + std::numeric_limits<int>::max();
u += s;
return u;
}
Live example here
This will add signed_max + 1 to signed int to ensure [MIN(int), MAX(int)] is mapped to [0, MAX(unsigned int)]
Why would this answer work and map correctly:
When you add a signed integral number to an unsigned, the signed number is promoted to unsigned type. From Section 4.7 [conv.integral]
If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2n
where n is the number of bits used to represent the unsigned type). [
Note: In a two’s complement representation, this conversion is
conceptual and there is no change in the bit pattern (if there is no
truncation). —end note ]
I have a simple program. Notice that I use an unsigned fixed-width integer 1 byte in size.
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 12;
std::cout << (x << 1) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
My output is the following.
24
-13
I tested larger numbers and operator << always gives me positive numbers, while operator ~ always gives me negative numbers. I then used sizeof() and found...
When I use the left shift bitwise operator(<<), I receive an unsigned 4 byte integer.
When I use the bitwise not operator(~), I receive a signed 4 byte integer.
It seems that the bitwise not operator(~) does a signed integral promotion like the arithmetic operators do. However, the left shift operator(<<) seems to promote to an unsigned integral.
I feel obligated to know when the compiler is changing something behind my back. If I'm correct in my analysis, do all the bitwise operators promote to a 4 byte integer? And why are some signed and some unsigned? I'm so confused!
Edit: My assumption of always getting positive or always getting negative values was wrong. But from being wrong, I understand what was really happening thanks to the great answers below.
[expr.unary.op]
The operand of ~ shall have integral or unscoped enumeration type; the
result is the one’s complement of its operand. Integral promotions are
performed.
[expr.shift]
The shift operators << and >> group left-to-right. [...] The operands shall be of integral or unscoped enumeration type and integral promotions are performed.
What's the integral promotion of uint8_t (which is usually going to be unsigned_char behind the scenes)?
[conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank (4.13) is less than the rank of
int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be
converted to a prvalue of type unsigned int.
So int, because all of the values of a uint8_t can be represented by int.
What is int(12) << 1 ? int(24).
What is ~int(12) ? int(-13).
For performance reasons the C and C++ language consider int to be the "most natural" integer type and instead types that are "smaller" than an int are considered sort of "storage" type.
When you use a storage type in an expression it gets automatically converted to an int or to an unsigned int implicitly. For example:
// Assume a char is 8 bit
unsigned char x = 255;
unsigned char one = 1;
int y = x + one; // result will be 256 (too large for a byte!)
++x; // x is now 0
what happened is that x and one in the first expression have been implicitly converted to integers, the addition has been computed and the result has been stored back in an integer. In other words the computation has NOT been performed using two unsigned chars.
Likewise if you have a float value in an expression the first thing the compiler will do is promoting it to a double (in other words float is a storage type and double is instead the natural size for floating point numbers). This is the reason for which if you use printf to print floats you don't need to say %lf int the format strings and %f is enough (%lf is needed for scanf however because that function stores a result and a float can be smaller than a double).
C++ complicated the matter quite a bit because when passing parameters to functions you can discriminate between ints and smaller types. Thus it's not ALWAYS true that a conversion is performed in every expression... for example you can have:
void foo(unsigned char x);
void foo(int x);
where
unsigned char x = 255, one = 1;
foo(x); // Calls foo(unsigned char), no promotion
foo(x + one); // Calls foo(int), promotion of both x and one to int
I tested larger numbers and operator << always gives me positive
numbers, while operator ~ always gives me negative numbers. I then
used sizeof() and found...
Wrong, test it:
uint8_t v = 1;
for (int i=0; i<32; i++) cout << (v<<i) << endl;
gives:
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
uint8_t is an 8-bit long unsigned integer type, which can represent values in the range [0,255], as that range in included in the range of int it is promoted to int (not unsigned int). Promotion to int has precedence over promotion to unsigned.
Look into two's complement and how computer stores negative integers.
Try this
#include <cstdint>
#include <iostream>
#include <limits>
int main()
{
uint8_t x = 1;
int shiftby=0;
shiftby=8*sizeof(int)-1;
std::cout << (x << shiftby) << '\n'; // or std::cout << (x << 31) << '\n';
std::cout << ~x;
std::cin.clear();
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
std::cin.get();
return 0;
}
The output is -2147483648
In general if the first bit of a signed number is 1 it is considered negative. when you take a large number and shift it. If you shift it so that the first bit is 1 it will be negative
** EDIT **
Well I can think of a reason why shift operators would use unsigned int. Consider right shift operation >> if you right shift -12 you will get 122 instead of -6. This is because it adds a zero in the beginning without considering the sign
I understand that, regarding implicit conversions, if we have an unsigned type operand and a signed type operand, and the type of the unsigned operand is the same as (or larger) than the type of the signed operand, the signed operand will be converted to unsigned.
So:
unsigned int u = 10;
signed int s = -8;
std::cout << s + u << std::endl;
//prints 2 because it will convert `s` to `unsigned int`, now `s` has the value
//4294967288, then it will add `u` to it, which is an out-of-range value, so,
//in my machine, `4294967298 % 4294967296 = 2`
What I don't understand - I read that if the signed operand has a larger type than the unsigned operand:
if all values in the unsigned type fit in the larger type then the unsigned operand is converted to the signed type
if the values in the unsigned type don't fit in the larger type, then the signed operand will be converted to the unsigned type
so in the following code:
signed long long s = -8;
unsigned int u = 10;
std::cout << s + u << std::endl;
u will be converted to signed long long because int values can fit in signed long long??
If that's the case, in what scenario the smaller type values won't fit in the larger one?
Relevant quote from the Standard:
5 Expressions [expr]
10 Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
[2 clauses about equal types or types of equal sign omitted]
— Otherwise, if the operand that has unsigned integer type has rank
greater than or equal to the rank of the type of the other operand,
the operand with signed integer type shall be converted to the type of
the operand with unsigned integer type.
— Otherwise, if the type of
the operand with signed integer type can represent all of the values
of the type of the operand with unsigned integer type, the operand
with unsigned integer type shall be converted to the type of the
operand with signed integer type.
— Otherwise, both operands shall be
converted to the unsigned integer type corresponding to the type of
the operand with signed integer type.
Let's consider the following 3 example cases for each of the 3 above clauses on a system where sizeof(int) < sizeof(long) == sizeof(long long) (easily adaptable to other cases)
#include <iostream>
signed int s1 = -4;
unsigned int u1 = 2;
signed long int s2 = -4;
unsigned int u2 = 2;
signed long long int s3 = -4;
unsigned long int u3 = 2;
int main()
{
std::cout << (s1 + u1) << "\n"; // 4294967294
std::cout << (s2 + u2) << "\n"; // -2
std::cout << (s3 + u3) << "\n"; // 18446744073709551614
}
Live example with output.
First clause: types of equal rank, so the signed int operand is converted to unsigned int. This entails a value-transformation which (using two's complement) gives te printed value.
Second clause: signed type has higher rank, and (on this platform!) can represent all values of the unsigned type, so unsigned operand is converted to signed type, and you get -2
Third clause: signed type again has higher rank, but (on this platform!) cannot represent all values of the unsigned type, so both operands are converted to unsigned long long, and after the value-transformation on the signed operand, you get the printed value.
Note that when the unsigned operand would be large enough (e.g. 6 in these examples), then the end result would give 2 for all 3 examples because of unsigned integer overflow.
(Added) Note that you get even more unexpected results when you do comparisons on these types. Lets consider the above example 1 with <:
#include <iostream>
signed int s1 = -4;
unsigned int u1 = 2;
int main()
{
std::cout << (s1 < u1 ? "s1 < u1" : "s1 !< u1") << "\n"; // "s1 !< u1"
std::cout << (-4 < 2u ? "-4 < 2u" : "-4 !< 2u") << "\n"; // "-4 !< 2u"
}
Since 2u is made unsigned explicitly by the u suffix the same rules apply. And the result is probably not what you expect when comparing -4 < 2 when writing in C++ -4 < 2u...
signed int does not fit into unsigned long long. So you will have this conversion:
signed int -> unsigned long long.
Note that the C++11 standard doesn't talk about the larger or smaller types here, it talks about types with lower or higher rank.
Consider the case of long int and unsigned int where both are 32-bit. The long int has a larger rank than the unsigned int, but since long int and unsigned int are both 32-bit, long int can't represent all the values of unsigned int.
Therefore we fall into to the last case (C++11: 5.6p9):
Otherwise, both operands shall be converted to the unsigned integer type corresponding to the
type of the operand with signed integer type.
This means that both the long int and the unsigned int will be converted to unsigned long int.