C++ Implicit Conversion (Signed + Unsigned) - c++

I understand that, regarding implicit conversions, if we have an unsigned type operand and a signed type operand, and the type of the unsigned operand is the same as (or larger than) the type of the signed operand, the signed operand will be converted to unsigned.
So:
unsigned int u = 10;
signed int s = -8;
std::cout << s + u << std::endl;
//prints 2 because it will convert `s` to `unsigned int`, now `s` has the value
//4294967288, then it will add `u` to it, which is an out-of-range value, so,
//on my machine, `4294967298 % 4294967296 = 2`
What I don't understand - I read that if the signed operand has a larger type than the unsigned operand:
if all values in the unsigned type fit in the larger type then the unsigned operand is converted to the signed type
if the values in the unsigned type don't fit in the larger type, then the signed operand will be converted to the unsigned type
so in the following code:
signed long long s = -8;
unsigned int u = 10;
std::cout << s + u << std::endl;
u will be converted to signed long long because unsigned int values can fit in signed long long??
If that's the case, in what scenario would the smaller type's values not fit in the larger one?

Relevant quote from the Standard:
5 Expressions [expr]
10 Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
[2 clauses about equal types or types of equal sign omitted]
— Otherwise, if the operand that has unsigned integer type has rank
greater than or equal to the rank of the type of the other operand,
the operand with signed integer type shall be converted to the type of
the operand with unsigned integer type.
— Otherwise, if the type of
the operand with signed integer type can represent all of the values
of the type of the operand with unsigned integer type, the operand
with unsigned integer type shall be converted to the type of the
operand with signed integer type.
— Otherwise, both operands shall be
converted to the unsigned integer type corresponding to the type of
the operand with signed integer type.
Let's consider the following 3 example cases for each of the 3 above clauses on a system where sizeof(int) < sizeof(long) == sizeof(long long) (easily adaptable to other cases)
#include <iostream>
signed int s1 = -4;
unsigned int u1 = 2;
signed long int s2 = -4;
unsigned int u2 = 2;
signed long long int s3 = -4;
unsigned long int u3 = 2;
int main()
{
std::cout << (s1 + u1) << "\n"; // 4294967294
std::cout << (s2 + u2) << "\n"; // -2
std::cout << (s3 + u3) << "\n"; // 18446744073709551614
}
First clause: types of equal rank, so the signed int operand is converted to unsigned int. This entails a value-transformation which (using two's complement) gives the printed value.
Second clause: signed type has higher rank, and (on this platform!) can represent all values of the unsigned type, so unsigned operand is converted to signed type, and you get -2
Third clause: signed type again has higher rank, but (on this platform!) cannot represent all values of the unsigned type, so both operands are converted to unsigned long long, and after the value-transformation on the signed operand, you get the printed value.
Note that if the unsigned operand were large enough (e.g. 6 in these examples), the end result would be 2 in all 3 examples: the unsigned additions wrap around, and the signed addition in the second example is simply -4 + 6.
(Added) Note that you get even more unexpected results when you do comparisons on these types. Let's consider the above example 1 with <:
#include <iostream>
signed int s1 = -4;
unsigned int u1 = 2;
int main()
{
std::cout << (s1 < u1 ? "s1 < u1" : "s1 !< u1") << "\n"; // "s1 !< u1"
std::cout << (-4 < 2u ? "-4 < 2u" : "-4 !< 2u") << "\n"; // "-4 !< 2u"
}
Since 2u is made unsigned explicitly by the u suffix, the same rules apply. The result is probably not what you expect from comparing -4 < 2 when what you actually wrote in C++ is -4 < 2u...

signed int does not fit into unsigned long long. So you will have this conversion:
signed int -> unsigned long long.

Note that the C++11 standard doesn't talk about the larger or smaller types here, it talks about types with lower or higher rank.
Consider the case of long int and unsigned int where both are 32-bit. The long int has a higher rank than the unsigned int, but since long int and unsigned int are both 32-bit, long int can't represent all the values of unsigned int.
Therefore we fall into the last case (C++11: 5p9):
Otherwise, both operands shall be converted to the unsigned integer type corresponding to the
type of the operand with signed integer type.
This means that both the long int and the unsigned int will be converted to unsigned long int.

Related

Comparing unsigned integer with negative literals

I have this simple C program.
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
bool foo (unsigned int a) {
return (a > -2L);
}
bool bar (unsigned long a) {
return (a > -2L);
}
int main() {
printf("foo returned = %d\n", foo(99));
printf("bar returned = %d\n", bar(99));
return 0;
}
Output when I run this -
foo returned = 1
bar returned = 0
My question is why does foo(99) return true but bar(99) return false.
To me it makes sense that bar would return false. For simplicity let's say longs are 8 bits, then (using two's complement for signed value):
99 == 0110 0011
-2 == unsigned 254 == 1111 1110
So clearly the CMP instruction will see that 1111 1110 is bigger and return false.
But I don't understand what is going on behind the scenes in the foo function. The assembly for foo seems to be hardcoded to always return 1 (mov eax,0x1). I would have expected foo to do something similar to bar. What is going on here?
This is covered in C classes and is specified in the documentation. Here is how you use documents to figure this out.
In the 2018 C standard, you can look up > or “relational expressions” in the index to see they are discussed on pages 68-69. On page 68, you will find clause 6.5.8, which covers relational operators, including >. Reading it, paragraph 3 says:
If both of the operands have arithmetic type, the usual arithmetic conversions are performed.
“Usual arithmetic conversions” is listed in the index as defined on page 39. Page 39 has clause 6.3.1.8, “Usual arithmetic conversions.” This clause explains that operands of arithmetic types are converted to a common type, and it gives rules determining the common type. For two integer types of different signedness, such as the unsigned long and the long int in bar (a and -2L), it says that, if the unsigned type has rank greater than or equal to the rank of the other type, the signed type is converted to the unsigned type.
“Rank” is not in the index, but you can search the document to find it is discussed in clause 6.3.1.1, where it tells you the rank of long int is greater than the rank of int, and that any unsigned type has the same rank as the corresponding signed type.
Now you can consider a > -2L in bar, where a is unsigned long. Here we have an unsigned long compared with a long. They have the same rank, so -2L is converted to unsigned long. Conversion of a signed integer to unsigned is discussed in clause 6.3.1.3. It says the value is converted by wrapping it modulo ULONG_MAX+1, so converting the signed long −2 produces ULONG_MAX+1−2 = ULONG_MAX−1, which is a large integer. Then comparing a, which has the value 99, to a large integer with > yields false, so zero is returned.
For foo, we continue with the rules for the usual arithmetic conversions. When the unsigned type does not have rank greater than or equal to the rank of the signed type, but the signed type can represent all the values of the type of the operand with unsigned type, the operand with the unsigned type is converted to the type of the operand with the signed type. In foo, a is unsigned int and -2L is long int. Presumably in your C implementation, long int is 64 bits, so it can represent all the values of a 32-bit unsigned int. So this rule applies, and a is converted to long int. This does not change the value. So the original value of a, 99, is compared to −2 with >, and this yields true, so one is returned.
In the first function
bool foo (unsigned int a) {
return (a > -2L);
}
both operands of the expression a > -2L have the type long (the first operand is converted to the type long by the usual arithmetic conversions, because the rank of the type long is greater than the rank of the type unsigned int and, on the system used, all values of the type unsigned int can be represented by the type long). And it is evident that the positive value 99L is greater than the negative value -2L.
The first function could produce the result 0 provided that sizeof( long ) is equal to sizeof( unsigned int ). In this case the type long is unable to represent all (positive) values of the type unsigned int. As a result, due to the usual arithmetic conversions, both operands will be converted to the type unsigned long.
For example, running the function foo with MS VS 2019, where sizeof( long ) is equal to 4, the same as sizeof( unsigned int ), you will get the result 0.
Here is a demonstration program written in C++ that visually shows the reason why the result of a call of the function foo using MS VS 2019 can be equal to 0.
#include <iostream>
#include <iomanip>
#include <type_traits>
int main()
{
unsigned int x = 0;
long y = 0;
std::cout << "sizeof( unsigned int ) = " << sizeof( unsigned int ) << '\n';
std::cout << "sizeof( long ) = " << sizeof(long) << '\n';
std::cout << "std::is_same_v<decltype( x + y ), unsigned long> is "
<< std::boolalpha
<< std::is_same_v<decltype( x + y ), unsigned long>
<< '\n';
}
The program output is
sizeof( unsigned int ) = 4
sizeof( long ) = 4
std::is_same_v<decltype( x + y ), unsigned long> is true
That is, in general the result of the first function is implementation-defined.
In the second function
bool bar (unsigned long a) {
return (a > -2L);
}
both operands have the type unsigned long (again due to the usual arithmetic conversions: the ranks of the types unsigned long and signed long are equal, so the object of the type signed long is converted to the type unsigned long), and -2L interpreted as unsigned long is greater than 99.
The reason for this has to do with the rules of integer conversions.
In the first case, you compare an unsigned int with a long using the > operator, and in the second case you compare an unsigned long with a long.
These operands must first be converted to a common type using the usual arithmetic conversions. These are spelled out in section 6.3.1.8p1 of the C standard, with the following excerpt focusing on integer conversions:
If both operands have the same type, then no further conversion is
needed.
Otherwise, if both operands have signed integer types or both have
unsigned integer types, the operand with the type of lesser integer
conversion rank is converted to the type of the operand with greater
rank.
Otherwise, if the operand that has unsigned integer type has rank
greater or equal to the rank of the type of the other operand, then
the operand with signed integer type is converted to the type of the
operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can
represent all of the values of the type of the operand with unsigned
integer type, then the operand with unsigned integer type is converted
to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
In the case of comparing an unsigned int with a long, the fourth paragraph above applies (the signed type can represent all values of the unsigned type). long has higher rank and (assuming long is 64 bit and int is 32 bit) can hold all values that an unsigned int can, so the unsigned int operand a is converted to a long. Since the value in question is in the range of long, section 6.3.1.3p1 dictates how the conversion happens:
When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged
So the value is preserved and we're left with 99 > -2 which is true.
In the case of comparing an unsigned long with a long, the third paragraph above applies (the unsigned type has rank greater than or equal to the rank of the signed type). Both types are of the same rank with different signs, so the long constant -2L is converted to unsigned long. -2 is outside the range of an unsigned long so a value conversion must happen. This conversion is specified in section 6.3.1.3p2:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
So the long value -2 will be converted to the unsigned long value 2^64-2, assuming unsigned long is 64 bit. So we're left with 99 > 2^64-2, which is false.
I think what is happening here is implicit promotion by the compiler. When you perform comparison on two different primitives, the compiler will promote one of them to the same type as the other. I believe the rules are that the type with the larger possible value is used as the standard.
So in foo() you are implicitly promoting your argument to a signed long type and the comparison works as expected.
In bar() your argument is an unsigned long, which has a larger maximum value than signed long. Here the compiler promotes -2L to unsigned long, which turns into a very large number.

Strange type deduction

Today I saw a really strange type deduction. Here is the code:
unsigned int y = 15;
int k = 5;
auto t = k - y / 2;
Since k is int, I assumed that type of t should be int too. But to my surprise, its type is unsigned int. I cannot find why type is deduced as unsigned int. Any idea why?
Due to the usual arithmetic conversions, if two operands have the same conversion rank and one of them has an unsigned integer type, then the expression has that unsigned integer type.
From the C++ 17 Standard (5 Expressions, p.#10)
— Otherwise, if the operand that has unsigned integer type has rank
greater than or equal to the rank of the type of the other operand,
the operand with signed integer type shall be converted to the type of
the operand with unsigned integer type.
Note that the conversion rank of the type unsigned int is equal to the rank of the type int (signed int). From the C++ 17 Standard (4.13 Integer conversion rank, p.#1)
— The rank of any unsigned integer type shall equal the rank of the
corresponding signed integer type
A more interesting example is the following. Let's assume that there are two declarations
unsigned int x = 0;
long y = 0;
and the width of both types is the same, for example 4 bytes. As is known, the rank of the type long is greater than the rank of the type unsigned int. The question arises: what is the type of the expression
x + y
The type of the expression is unsigned long.:)
Here is a demonstration program, but instead of the types long and unsigned int it uses the types long long and unsigned long.
#include <iostream>
#include <iomanip>
#include <type_traits>
int main()
{
unsigned long int x = 0;
long long int y = 0;
std::cout << "sizeof( unsigned long ) = "
<< sizeof( unsigned long )
<< '\n';
std::cout << "sizeof( long long ) = "
<< sizeof( long long )
<< '\n';
std::cout << std::boolalpha
<< std::is_same<unsigned long long, decltype( x + y )>::value
<< '\n';
return 0;
}
The program output is
sizeof( unsigned long ) = 8
sizeof( long long ) = 8
true
That is the type of the expression x + y is unsigned long long though neither operand of the expression has this type.

Why is the output of fixed width unsigned integer negative while unsigned integer output wraps around as expected?

#include <iostream>
#include <cstdint> // uint16_t
#define TRY_INT
void testRun()
{
#ifdef TRY_INT //test with unsigned
unsigned int value1{1}; //define some unsigned variables
unsigned int value2{1};
unsigned int value3{2};
#else //test with fixed width
uint16_t value1{1}; //define fixed width unsigned variables
uint16_t value2{1};
uint16_t value3{2};
#endif
if ( value1 > value2 - value3 )
{
std::cout << value1 << " is bigger than: " << value2 - value3 << "\n";
}
else
{
std::cout << value1 << " is smaller than: " << value2 - value3 << "\n";
}
}
int main()
{
testRun();
return 0;
}
with unsigned integers I get:
1 is smaller than: 4294967295
with fixed width unsigned int, output is:
1 is smaller than: -1
My expectation was it would wrap around as well, does this have something to do with std::cout?
I guess it is caused by integral promotion. Citing from cppreference:
...arithmetic operators do not accept types smaller than int as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable.
unsigned char, char8_t (since C++20) or unsigned short can be converted to int if it can hold its entire value range...
Consequently, if uint16_t is just an alias for unsigned short on your implementation, value2 - value3 is calculated with int type and the result is also int, that's why -1 is shown.
With unsigned int, no promotion is applied and the whole calculation is performed in this type.
In the latest online C++ Draft, see [conv.prom/1]:
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
On typical platforms unsigned int has the same width as uint32_t and unsigned short int the same width as uint16_t (the standard does not guarantee this).
Therefore, if you use unsigned short int instead of unsigned int you will get the same behavior as for uint16_t.
Why do you get -1?
Integral promotion will try to convert unsigned short int to int if int can hold all possible values of unsigned short int. On the other hand, if that is not the case, integral promotion to unsigned int will be performed.
Therefore the subtraction is most likely done in the type int, not uint16_t.

C++ Standard: Strange signed/unsigned arithmetic division behavior for 32 and 64 bits

I encountered wrong behavior in my code. Investigating it led me to a short example which shows the problem:
//g++ 5.4.0
#include <iostream>
#include <vector>
int main()
{
std::vector<short> v(20);
auto D = &v[5] - &v[10];
auto C = D / sizeof(short);
std::cout << "C = " << C;
}
The example is quite common. What will it print?
C = 9223372036854775805
Tested here: https://rextester.com/l/cpp_online_compiler_gcc
Tested also for Clang C++, VS C++ and C. Result the same.
Discussing with colleagues I was pointed to the document https://en.cppreference.com/w/cpp/language/operator_arithmetic#Conversions .
It tells:
If both operands are signed or both are unsigned, the operand with lesser conversion rank is converted to the operand with the greater integer conversion rank
Otherwise, if the unsigned operand's conversion rank is greater or equal to the conversion rank of the signed operand, the signed operand is converted to the unsigned operand's type.
Otherwise, if the signed operand's type can represent all values of the unsigned operand, the unsigned operand is converted to the signed operand's type
It seems the second rule is working here. But it is not true.
To confirm the second rule, I have tested such example:
//g++ 5.4.0
#include <iostream>
#include <cstdint> // uint32_t, int32_t
int main()
{
typedef uint32_t u_t; // uint64_t, uint32_t, uint16_t uint8_t
typedef int32_t i_t; // int64_t, int32_t, int16_t int8_t
const u_t B = 2;
const i_t X = -1;
const i_t A1 = X * B;
std::cout << "A1 = X * B = " << A1 << "\n";
const i_t C = A1 / B; // signed / unsigned division
std::cout << "A1 / B = " << C << "\n";
}
with different rank combinations of u_t and i_t and found that it works correctly for any combination, EXCEPT for 32 and 64 bits (int64_t/uint64_t and int32_t/uint32_t). So the second rule DOES NOT work for 16 and 8 bits.
Note: the multiplication operation works correctly in all cases. So it is only a division problem.
Also the SECOND rule sounds like it is wrong:
the signed operand is converted to the unsigned operand's type
The signed cannot be converted to unsigned - it is an !! error !! for NEGATIVE values!!
But opposite conversion is correct - the unsigned operand is converted to the signed operand's type
Looking at this I can note that here is a possible mistake in the C++ Standard Arithmetic operations.
Instead of:
Otherwise, if the unsigned operand's conversion rank is greater or equal to the conversion rank of the signed operand, the signed operand is converted to the unsigned operand's type.
it SHALL be:
Otherwise, if the signed operand's conversion rank is greater or equal to the conversion rank of the unsigned operand, the unsigned operand is converted to the signed operand's type.
In my opinion, when signed and unsigned operands meet in a multiplication/division, the unsigned operand should be converted to signed and only then cast to the correct rank. At least x86 assembler follows this.
Please explain to me where the error is. I want the first test in this post to work correctly for any type used in place of auto, but right now that is not possible, and the C++ Standard says this is the correct behavior.
Sorry for the strange question, but I am stuck on this problem. I have been coding in C/C++ for 30 years, and this is the first problem I cannot explain clearly - whether it is a bug or expected behavior.
There's a lot to chew on here... I'll address only one point as you forgot to actually ask a question.
In your second code snippet:
const u_t B = 2;
const i_t X = -1;
const i_t A1 = X * B;
you see that A1 is -2 and conclude that in the expression X * B both operands are promoted to signed integers. This is not true.
In X * B, both operands are converted to unsigned integers, as per the Standard, but the result is then converted to a signed integer by the initialization const i_t A1 = ....
You can easily check that:
const u_t B = 2;
const i_t X = -1;
const auto A1 = X * B; // unsigned
You can also play with decltype(expression) and std::is_signed:
#include <iostream>
#include <iomanip>
#include <type_traits>
int main()
{
signed s = 1;
unsigned u = 1;
std::cout << std::boolalpha
<< " signed * signed is signed? " << std::is_signed_v<decltype(s * s)> << "\n"
<< " signed * unsigned is signed? " << std::is_signed_v<decltype(s * u)> << "\n"
<< "unsigned * signed is signed? " << std::is_signed_v<decltype(u * s)> << "\n"
<< "unsigned * unsigned is signed? " << std::is_signed_v<decltype(u * u)> << "\n";
}
/*
signed * signed is signed? true
signed * unsigned is signed? false
unsigned * signed is signed? false
unsigned * unsigned is signed? false
*/

Bit wise '&' with signed vs unsigned operand

I faced an interesting scenario in which I got different results depending on the right operand type, and I can't really understand the reason for it.
Here is the minimal code:
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0x8123U;
uint64_t new_check = (check & 0xFFFF) << 16;
std::cout << std::hex << new_check << std::endl;
new_check = (check & 0xFFFFU) << 16;
std::cout << std::hex << new_check << std::endl;
return 0;
}
I compiled this code with g++ (gcc version 4.5.2) on Linux 64bit: g++ -std=c++0x -Wall example.cpp -o example
The output was:
ffffffff81230000
81230000
I can't really understand the reason for the output in the first case.
Why at some point would any of the temporal calculation results be promoted to a signed 64bit value (int64_t) resulting in the sign extension?
I would accept a result of '0' in both cases if a 16bit value is shifted 16 bits left in the first place and then promoted to a 64bit value. I also do accept the second output if the compiler first promotes the check to uint64_t and then performs the other operations.
But how come & with 0xFFFF (int32_t) vs. 0xFFFFU (uint32_t) would result in those two different outputs?
That's indeed an interesting corner case. It only occurs here because you use uint16_t for the unsigned type while your architecture uses 32 bits for int.
Here is an extract from Clause 5 Expressions from draft n4296 for C++14 (emphasis mine):
10 Many binary operators that expect operands of arithmetic or enumeration type cause conversions ...
This pattern is called the usual arithmetic conversions, which are defined as follows:
...(10.5.3) — Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
(10.5.4) — Otherwise, if the type of the operand with signed integer type can represent all of the values of
the type of the operand with unsigned integer type, the operand with unsigned integer type shall
be converted to the type of the operand with signed integer type.
You are in the 10.5.4 case:
uint16_t is only 16 bits while int is 32
int can represent all the values of uint16_t
So the uint16_t check = 0x8123U operand is converted to the signed 0x8123 and result of the bitwise & is still 0x8123.
But the shift (which is bitwise, so it happens at the representation level) produces the intermediate unsigned value 0x81230000, which gives a negative value when converted to an int (technically this is implementation-defined, but it is what common implementations do)
5.8 Shift operators [expr.shift]...Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable
in the corresponding unsigned type of the result type, then that value, converted to the result type, is the
resulting value;...
and
4.7 Integral conversions [conv.integral]...
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
(beware this was true undefined behaviour in C++11...)
So you end with a conversion of the signed int 0x81230000 to an uint64_t which as expected gives 0xFFFFFFFF81230000, because
4.7 Integral conversions [conv.integral]...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
TL/DR: There is no undefined behaviour here in C++14; what causes the result is the conversion of a negative signed 32-bit int to an unsigned 64-bit int. The only questionable part is the shift that overflows into the sign bit: it was undefined behaviour in C++11 but is implementation-defined in C++14, and all common implementations produce the same result.
Of course, if you force the second operand to be unsigned, everything is unsigned and you get the expected 0x81230000 result.
[EDIT] As explained by MSalters, the result of the shift is only implementation defined since C++14, but was indeed undefined behaviour in C++11. The shift operator paragraph said:
...Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable
in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Let's take a look at
uint64_t new_check = (check & 0xFFFF) << 16;
Here, 0xFFFF is a signed constant, so (check & 0xFFFF) gives us a signed integer by the rules of integer promotion.
In your case, with 32-bit int type, the MSbit for this integer after the left shift is 1, and so the extension to 64-bit unsigned will do a sign extension, filling the bits to the left with 1's. Interpreted as a two's complement representation that gives the same negative value.
In the second case, 0xFFFFU is unsigned, so we get unsigned integers and the left shift operator works as expected.
If your toolchain supports __PRETTY_FUNCTION__, a most-handy feature, you can quickly determine how the compiler perceives expression types:
#include <iostream>
#include <cstdint>
template<typename T>
void typecheck(T const& t)
{
std::cout << __PRETTY_FUNCTION__ << '\n';
std::cout << t << '\n';
}
int main()
{
uint16_t check = 0x8123U;
typecheck(0xFFFF);
typecheck(check & 0xFFFF);
typecheck((check & 0xFFFF) << 16);
typecheck(0xFFFFU);
typecheck(check & 0xFFFFU);
typecheck((check & 0xFFFFU) << 16);
return 0;
}
Output
void typecheck(const T &) [T = int]
65535
void typecheck(const T &) [T = int]
33059
void typecheck(const T &) [T = int]
-2128412672
void typecheck(const T &) [T = unsigned int]
65535
void typecheck(const T &) [T = unsigned int]
33059
void typecheck(const T &) [T = unsigned int]
2166554624
The first thing to realize is that binary operators like a&b for built-in types only work if both sides have the same type. (With user-defined types and overloads, anything goes). This might be realized via implicit conversions.
Now, in your case, there definitely is such a conversion, because there simply isn't a binary operator & that takes a type smaller than int. Both sides are converted to at least int size, but what exact types?
As it happens, on your GCC int is indeed 32 bits. This is important, because it means that all values of uint16_t can be represented as an int. There is no overflow.
Hence, check & 0xFFFF is a simple case. The right side is already an int, the left side promotes to int, so the result is int(0x8123). This is perfectly fine.
Now, the next operation is 0x8123 << 16. Remember, on your system int is 32 bits, and INT_MAX is 0x7FFF'FFFF. In the absence of overflow, 0x8123 << 16 would be 0x81230000, but that clearly is bigger than INT_MAX so there is in fact overflow.
Signed integer overflow in C++11 is Undefined Behavior. Literally any outcome is correct, including purple or no output at all. At least you got a numerical value, but GCC is known to outright eliminate code paths which unavoidably cause overflow.
[edit]
Newer GCC versions support C++14, where this particular form of overflow has become implementation-defined - see Serge's answer.
0xFFFF is a signed int. So after the & operation, we have a 32-bit signed value:
#include <stdint.h>
#include <type_traits>
uint64_t foo(uint16_t a) {
auto x = (a & 0xFFFF);
static_assert(std::is_same<int32_t, decltype(x)>::value, "not an int32_t");  // passes when int is 32 bits
static_assert(std::is_same<uint16_t, decltype(x)>::value, "not a uint16_t"); // fires: x is an int, not a uint16_t
return x;
}
http://ideone.com/tEQmbP
Your original 16 bits are then left-shifted which results in 32-bit value with the high-bit set (0x80000000U) so it has a negative value. During the 64-bit conversion sign-extension occurs, populating the upper words with 1s.
This is the result of integer promotion. Before the & operation happens, any operand that is "smaller" than an int (for that architecture) is promoted to int if all of its values fit into a signed int.
This means that the first expression will be equivalent to (on a platform with 32-bit int):
// check is uint16_t, but it fits into int32_t, so it is promoted to int.
// the constant 0xFFFF is already a (positive) int; nothing is sign-extended yet
((int32_t)check & (int32_t)0x0000FFFF)
while in the other one the unsigned constant makes both operands unsigned:
// check is promoted to int, then converted to unsigned int
// to match the unsigned constant
((uint32_t)check & (uint32_t)0x0000FFFFU)
If you explicitly cast check to an unsigned int, then the result will be the same in both cases (unsigned & signed will result in unsigned):
((uint32_t)check & 0xFFFF) << 16
will be equal to:
((uint32_t)check & 0xFFFFU) << 16
Your platform has 32-bit int.
Your code is exactly equivalent to
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0x8123U;
auto a1 = (check & 0xFFFF) << 16;
uint64_t new_check = a1;
std::cout << std::hex << new_check << std::endl;
auto a2 = (check & 0xFFFFU) << 16;
new_check = a2;
std::cout << std::hex << new_check << std::endl;
return 0;
}
What's the type of a1 and a2?
For a2, the result is promoted to unsigned int.
More interestingly, for a1 the result is promoted to int, and then it gets sign-extended as it's widened to uint64_t.
Here's a shorter demonstration, in decimal so that the difference between signed and unsigned types is apparent:
#include <iostream>
#include <cstdint>
int main()
{
uint16_t check = 0;
std::cout << check
<< " " << (int)(check + 0x80000000)
<< " " << (uint64_t)(int)(check + 0x80000000) << std::endl;
return 0;
}
On my system (also 32-bit int), I get
0 -2147483648 18446744071562067968
showing where the promotion and sign-extension happens.
The & operation has two operands. The first is an unsigned short, which will undergo the usual promotions to become an int. The second is a constant, in one case of type int, in the other case of type unsigned int. The result of the & is therefore int in one case, unsigned int in the other case. That value is shifted to the left, resulting either in an int with the sign bit set, or an unsigned int. Casting a negative int to uint64_t will give a large positive integer.
Of course you should always follow the rule: If you do something, and you don't understand the result, then don't do that!