I'm running this piece of code, and I'm getting the output value as (converted to hex) 0xFFFFFF93 and 0xFFFFFF94.
#include <iostream>
using namespace std;

int main()
{
    char x = 0x91;
    char y = 0x02;
    unsigned out;
    out = x + y;
    cout << out << endl;
    out = x + y + 1;
    cout << out << endl;
}
I'm confused about the arithmetic going on here. Is it because all the higher bits in out are taken to be 1 by default?
When I typecast out to an int, I get the answers as (in int) -109 and -108. Any idea why this is happening?
So there are a couple of things going on here. One, char can be either signed or unsigned; in your case it is signed. Two, assignment will convert the right-hand side to the type of the left-hand side. Using the right warning flags would help; clang with the -Wconversion flag warns:
warning: implicit conversion changes signedness: 'int' to 'unsigned int' [-Wsign-conversion]
out = x + y;
~ ~~^~~
In this case, to do this conversion the implementation effectively adds or subtracts the unsigned maximum plus one to bring the number into range.
We can see the same results using the limits header:
#include <limits>
//....
std::cout << std::hex << (std::numeric_limits<unsigned>::max() + 1) + (x+y) << std::endl;
//...
and the result is:
ffffff93
For reference the draft C++ standard section 5.17 Assignment and compound assignment operators says:
If the left operand is not of class type, the expression is implicitly converted (Clause 4) to the cv-unqualified type of the left operand.
Clause 4 under 4.7 Integral conversions says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
which is equivalent to adding or subtracting UMAX + 1.
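To see that congruence concretely, here is a minimal sketch (assuming a 32-bit unsigned int) showing -109 landing at -109 + 2^32 = 0xFFFFFF93:

#include <iostream>

int main()
{
    int sum = -109;     // what x + y evaluates to after promotion to int
    unsigned out = sum; // converted modulo 2^32: -109 + 4294967296 = 0xFFFFFF93
    std::cout << std::hex << out << '\n'; // prints ffffff93
}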
A plain char is usually a signed type, too! For compatibility reasons with C, this isn't specified further by the standard and is implementation dependent. You can always get well-defined signed or unsigned arithmetic behavior by explicitly specifying the signed / unsigned keyword.
Try replacing your char definitions like this
unsigned char x = 0x91;
unsigned char y = 0x02;
to get the results you expect!
See the fully working sample here.
Negative numbers are represented internally in two's complement, and hence their most significant bit is a 1. When you store them into a wider type and print in hex, those high bits show up as leading 1's, producing numbers like the ones you showed.
C++ doesn't specify whether char is signed or unsigned. Here they are signed, so when they are promoted to int, the negative value is used, which is then converted to unsigned. Use or cast to unsigned char.
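As a sketch of that fix (assuming the common behavior where char x = 0x91 stores the bit pattern 0x91), casting back to unsigned char before the addition gives the 8-bit values you probably intended:

#include <iostream>

int main()
{
    char x = 0x91; // implementation-defined: commonly stores the bit pattern 0x91
    char y = 0x02;
    // Casting to unsigned char yields 0x91 + 0x02 = 0x93, with no sign extension:
    unsigned out = static_cast<unsigned char>(x) + static_cast<unsigned char>(y);
    std::cout << std::hex << out << '\n'; // prints 93
}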
I have a simple program. Notice that I use an unsigned fixed-width integer 1 byte in size.
#include <cstdint>
#include <iostream>
#include <limits>

int main()
{
    uint8_t x = 12;
    std::cout << (x << 1) << '\n';
    std::cout << ~x;
    std::cin.clear();
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::cin.get();
    return 0;
}
My output is the following.
24
-13
I tested larger numbers and operator << always gives me positive numbers, while operator ~ always gives me negative numbers. I then used sizeof() and found...
When I use the left shift bitwise operator(<<), I receive an unsigned 4 byte integer.
When I use the bitwise not operator(~), I receive a signed 4 byte integer.
It seems that the bitwise not operator(~) does a signed integral promotion like the arithmetic operators do. However, the left shift operator(<<) seems to promote to an unsigned integral.
I feel obligated to know when the compiler is changing something behind my back. If I'm correct in my analysis, do all the bitwise operators promote to a 4 byte integer? And why are some signed and some unsigned? I'm so confused!
Edit: My assumption of always getting positive or always getting negative values was wrong. But from being wrong, I understand what was really happening thanks to the great answers below.
[expr.unary.op]
The operand of ~ shall have integral or unscoped enumeration type; the
result is the one’s complement of its operand. Integral promotions are
performed.
[expr.shift]
The shift operators << and >> group left-to-right. [...] The operands shall be of integral or unscoped enumeration type and integral promotions are performed.
What's the integral promotion of uint8_t (which is usually going to be unsigned char behind the scenes)?
[conv.prom]
A prvalue of an integer type other than bool, char16_t, char32_t, or
wchar_t whose integer conversion rank (4.13) is less than the rank of
int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be
converted to a prvalue of type unsigned int.
So int, because all of the values of a uint8_t can be represented by int.
What is int(12) << 1 ? int(24).
What is ~int(12) ? int(-13).
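So both operators work on int. If you want an 8-bit result back, you can truncate the promoted value explicitly; a small sketch:

#include <cstdint>
#include <iostream>

int main()
{
    uint8_t x = 12;
    // Both operations happen in int; uint8_t(...) truncates back to 8 bits,
    // and the outer unsigned(...) keeps cout from printing a character:
    std::cout << unsigned(uint8_t(x << 1)) << '\n'; // 24
    std::cout << unsigned(uint8_t(~x)) << '\n';     // 243 rather than -13
}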
For performance reasons, the C and C++ languages consider int to be the "most natural" integer type, and types that are "smaller" than an int are treated as a sort of "storage" type.
When you use a storage type in an expression it gets automatically converted to an int or to an unsigned int implicitly. For example:
// Assume a char is 8 bit
unsigned char x = 255;
unsigned char one = 1;
int y = x + one; // result will be 256 (too large for a byte!)
++x; // x is now 0
What happened is that x and one in the first expression were implicitly converted to int, the addition was computed, and the result was stored back in an int. In other words, the computation was NOT performed on two unsigned chars.
Likewise, if you have a float value in an expression, the first thing the compiler will do is promote it to a double (in other words, float is a storage type and double is the natural size for floating-point numbers). This is the reason why, if you use printf to print floats, you don't need to say %lf in the format string and %f is enough (%lf is needed for scanf, however, because that function stores a result and a float can be smaller than a double).
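A quick illustration of the printf point (the float argument is promoted to double when passed as a variadic argument):

#include <cstdio>

int main()
{
    float f = 1.5f;
    // f is promoted to double in the variadic call, so %f (which
    // reads a double) prints it correctly; no %lf is needed here:
    std::printf("%f\n", f); // prints 1.500000
}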
C++ complicated the matter quite a bit because when passing parameters to functions you can discriminate between ints and smaller types. Thus it's not ALWAYS true that a conversion is performed in every expression... for example you can have:
void foo(unsigned char x);
void foo(int x);
where
unsigned char x = 255, one = 1;
foo(x); // Calls foo(unsigned char), no promotion
foo(x + one); // Calls foo(int), promotion of both x and one to int
I tested larger numbers and operator << always gives me positive
numbers, while operator ~ always gives me negative numbers. I then
used sizeof() and found...
Wrong, test it:
uint8_t v = 1;
for (int i=0; i<32; i++) cout << (v<<i) << endl;
gives:
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
2097152
4194304
8388608
16777216
33554432
67108864
134217728
268435456
536870912
1073741824
-2147483648
uint8_t is an 8-bit unsigned integer type, which can represent values in the range [0,255]; as that range is included in the range of int, it is promoted to int (not unsigned int). Promotion to int takes precedence over promotion to unsigned int.
Look into two's complement and how computers store negative integers.
Try this
#include <cstdint>
#include <iostream>
#include <limits>

int main()
{
    uint8_t x = 1;
    int shiftby = 8 * sizeof(int) - 1;
    std::cout << (x << shiftby) << '\n'; // or std::cout << (x << 31) << '\n';
    std::cout << ~x;
    std::cin.clear();
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    std::cin.get();
    return 0;
}
The output is -2147483648
In general, if the most significant bit of a signed number is 1, the number is considered negative. So when you take a value and shift it left far enough that its most significant bit becomes 1, the result will be negative.
** EDIT **
Well, I can think of a reason why the shift operators would use unsigned int. Consider the right-shift operation >>: if you right-shift the 8-bit pattern of -12 as an unsigned value, you get 122 instead of -6. This is because a zero is shifted in at the top without considering the sign.
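A small sketch of that difference (right-shifting a negative signed value is implementation-defined before C++20, though common implementations shift the sign bit in):

#include <cstdint>
#include <iostream>

int main()
{
    int8_t  s = -12;                     // bit pattern 0xF4
    uint8_t u = static_cast<uint8_t>(s); // same bit pattern, value 244
    std::cout << (s >> 1) << '\n';       // -6 on common implementations (sign bit shifted in)
    std::cout << (u >> 1) << '\n';       // 122 (a zero shifted in at the top)
}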
I faced an interesting scenario in which I got different results depending on the right operand type, and I can't really understand the reason for it.
Here is the minimal code:
#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0x8123U;
    uint64_t new_check = (check & 0xFFFF) << 16;
    std::cout << std::hex << new_check << std::endl;
    new_check = (check & 0xFFFFU) << 16;
    std::cout << std::hex << new_check << std::endl;
    return 0;
}
I compiled this code with g++ (gcc version 4.5.2) on Linux 64bit: g++ -std=c++0x -Wall example.cpp -o example
The output was:
ffffffff81230000
81230000
I can't really understand the reason for the output in the first case.
Why at some point would any of the intermediate results be promoted to a signed 64-bit value (int64_t), resulting in the sign extension?
I would have accepted a result of 0 in both cases, if a 16-bit value were shifted 16 bits left in the first place and only then promoted to a 64-bit value. I would also accept the second output, if the compiler first promoted check to uint64_t and then performed the other operations.
But how come & with 0xFFFF (int32_t) vs. 0xFFFFU (uint32_t) would result in those two different outputs?
That's indeed an interesting corner case. It only occurs here because you use uint16_t for the unsigned type on an architecture where int is 32 bits.
Here is an extract from Clause 5, Expressions, of draft n4296 for C++14 (emphasis mine):
10 Many binary operators that expect operands of arithmetic or enumeration type cause conversions ...
This pattern is called the usual arithmetic conversions, which are defined as follows:
...(10.5.3) — Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
(10.5.4) — Otherwise, if the type of the operand with signed integer type can represent all of the values of
the type of the operand with unsigned integer type, the operand with unsigned integer type shall
be converted to the type of the operand with signed integer type.
You are in the 10.5.4 case:
uint16_t is only 16 bits while int is 32
int can represent all the values of uint16_t
So the uint16_t check = 0x8123U operand is converted to the signed int 0x8123, and the result of the bitwise & is still 0x8123.
But the shift (a bitwise operation, so it happens at the representation level) makes the result the intermediate unsigned 0x81230000, which converted to an int gives a negative value (technically it is implementation-defined, but this conversion is common usage):
5.8 Shift operators [expr.shift]...Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value;...
and
4.7 Integral conversions [conv.integral]...
3 If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
(beware this was true undefined behaviour in C++11...)
So you end with a conversion of the signed int 0x81230000 to an uint64_t which as expected gives 0xFFFFFFFF81230000, because
4.7 Integral conversions [conv.integral]...
2 If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
TL/DR: There is no undefined behaviour here; what causes the result is the conversion of a signed 32-bit int to an unsigned 64-bit int. The only part that was undefined behaviour is the shift that would cause a sign overflow, but all common implementations agree on it, and it is implementation-defined in the C++14 standard.
Of course, if you force the second operand to be unsigned everything is unsigned and you get evidently the correct 0x81230000 result.
[EDIT] As explained by MSalters, the result of the shift is only implementation defined since C++14, but was indeed undefined behaviour in C++11. The shift operator paragraph said:
...Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
Let's take a look at
uint64_t new_check = (check & 0xFFFF) << 16;
Here, 0xFFFF is a signed constant, so (check & 0xFFFF) gives us a signed integer by the rules of integer promotion.
In your case, with a 32-bit int type, the most significant bit of this integer after the left shift is 1, so the extension to 64-bit unsigned does a sign extension, filling the bits to the left with 1's. Interpreted as a two's complement representation, that gives the same negative value.
In the second case, 0xFFFFU is unsigned, so we get unsigned integers and the left shift operator works as expected.
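If you want the 0x81230000 result without relying on the U suffix, one common fix (a sketch) is to widen check to the destination type before masking and shifting, so no signed intermediate appears:

#include <cstdint>
#include <iostream>

int main()
{
    uint16_t check = 0x8123U;
    // uint64_t{check} makes every intermediate result unsigned 64-bit:
    uint64_t new_check = (uint64_t{check} & 0xFFFF) << 16;
    std::cout << std::hex << new_check << std::endl; // 81230000
}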
If your toolchain supports __PRETTY_FUNCTION__, a most-handy feature, you can quickly determine how the compiler perceives expression types:
#include <iostream>
#include <cstdint>

template<typename T>
void typecheck(T const& t)
{
    std::cout << __PRETTY_FUNCTION__ << '\n';
    std::cout << t << '\n';
}

int main()
{
    uint16_t check = 0x8123U;
    typecheck(0xFFFF);
    typecheck(check & 0xFFFF);
    typecheck((check & 0xFFFF) << 16);
    typecheck(0xFFFFU);
    typecheck(check & 0xFFFFU);
    typecheck((check & 0xFFFFU) << 16);
    return 0;
}
Output
void typecheck(const T &) [T = int]
65535
void typecheck(const T &) [T = int]
33059
void typecheck(const T &) [T = int]
-2128412672
void typecheck(const T &) [T = unsigned int]
65535
void typecheck(const T &) [T = unsigned int]
33059
void typecheck(const T &) [T = unsigned int]
2166554624
The first thing to realize is that binary operators like a&b for built-in types only work if both sides have the same type. (With user-defined types and overloads, anything goes). This might be realized via implicit conversions.
Now, in your case, there definitely is such a conversion, because there simply isn't a binary operator & that takes a type smaller than int. Both sides are converted to at least int size, but what exact types?
As it happens, on your GCC int is indeed 32 bits. This is important, because it means that all values of uint16_t can be represented as an int. There is no overflow.
Hence, check & 0xFFFF is a simple case. The right side is already an int, the left side promotes to int, so the result is int(0x8123). This is perfectly fine.
Now, the next operation is 0x8123 << 16. Remember, on your system int is 32 bits, and INT_MAX is 0x7FFF'FFFF. In the absence of overflow, 0x8123 << 16 would be 0x81230000, but that clearly is bigger than INT_MAX so there is in fact overflow.
Signed integer overflow in C++11 is Undefined Behavior. Literally any outcome is correct, including purple or no output at all. At least you got a numerical value, but GCC is known to outright eliminate code paths which unavoidably cause overflow.
[edit]
Newer GCC versions support C++14, where this particular form of overflow has become implementation-defined - see Serge's answer.
0xFFFF is a signed int. So after the & operation, we have a 32-bit signed value:
#include <stdint.h>
#include <type_traits>

uint64_t foo(uint16_t a) {
    auto x = (a & 0xFFFF);
    // This assertion passes (on platforms where int32_t is int):
    static_assert(std::is_same<int32_t, decltype(x)>::value, "not an int32_t");
    // This assertion fires, demonstrating that x is not a uint16_t:
    static_assert(std::is_same<uint16_t, decltype(x)>::value, "not a uint16_t");
    return x;
}
http://ideone.com/tEQmbP
Your original 16 bits are then left-shifted, which results in a 32-bit value with the high bit set (0x80000000U), so it has a negative value. During the 64-bit conversion, sign extension occurs, populating the upper bits with 1s.
This is the result of integer promotion. Before the & operation happens, if the operands are "smaller" than an int (for that architecture), the compiler will promote both operands to int, because they both fit into a signed int.
This means that the first expression will be equivalent to (on a 32-bit architecture):
// check is uint16_t, but it fits into int32_t.
// the constant 0xFFFF is already a signed int with value 0x0000FFFF
((int32_t)check & (int32_t)0x0000FFFF)
while in the second expression the promoted check is converted to match the unsigned constant:
// check is uint16_t; after promotion to int it is converted
// to unsigned int, because 0xFFFFU is unsigned
((uint32_t)check & (uint32_t)0x0000FFFFU)
If you explicitly cast check to an unsigned int, then the result will be the same in both cases (an unsigned and a signed operand of the same rank produce an unsigned result):
((uint32_t)check & 0xFFFF) << 16
will be equal to:
((uint32_t)check & 0xFFFFU) << 16
Your platform has 32-bit int.
Your code is exactly equivalent to
#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0x8123U;
    auto a1 = (check & 0xFFFF) << 16;
    uint64_t new_check = a1;
    std::cout << std::hex << new_check << std::endl;
    auto a2 = (check & 0xFFFFU) << 16;
    new_check = a2;
    std::cout << std::hex << new_check << std::endl;
    return 0;
}
What's the type of a1 and a2?
For a2, the result is promoted to unsigned int.
More interestingly, for a1 the result is promoted to int, and then it gets sign-extended as it's widened to uint64_t.
Here's a shorter demonstration, in decimal so that the difference between signed and unsigned types is apparent:
#include <iostream>
#include <cstdint>

int main()
{
    uint16_t check = 0;
    std::cout << check
              << " " << (int)(check + 0x80000000)
              << " " << (uint64_t)(int)(check + 0x80000000) << std::endl;
    return 0;
}
On my system (also 32-bit int), I get
0 -2147483648 18446744071562067968
showing where the promotion and sign-extension happens.
The & operation has two operands. The first is an unsigned short, which will undergo the usual promotions to become an int. The second is a constant, in one case of type int, in the other case of type unsigned int. The result of the & is therefore int in one case, unsigned int in the other case. That value is shifted to the left, resulting either in an int with the sign bit set, or an unsigned int. Casting a negative int to uint64_t will give a large negative integer.
Of course you should always follow the rule: If you do something, and you don't understand the result, then don't do that!
Consider the following code:
int32_t x = -2;
cout << uint64_t(x) << endl;
The cast in the second line contains basically two atomic steps: the increase in bit width from 32 bits to 64 bits, and the change of interpretation from signed to unsigned. If one compiles this with g++ and executes it, one gets 18446744073709551614. This suggests that the increase in bit width is processed first (as a sign extension) and the change in signed/unsigned interpretation thereafter, i.e. that the code above is equivalent to writing:
int32_t x = -2;
cout << uint64_t(int64_t(x)) << endl;
What confuses me is that one could also interpret x as an unsigned 32-bit bit vector first and then zero-extend it to 64 bits, i.e.
int32_t x = -2;
cout << uint64_t(uint32_t(x)) << endl;
This would yield 4294967294. Would someone please confirm that the behavior of g++ is required by the standard and is not implementation-defined? I would be most excited if you could point me to the wording in the standard that actually concerns the issue at hand. I tried to find it but failed bitterly.
Thanks in advance!
You are looking for Standard section 4.7. In particular, paragraph 2 says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
In the given example, we have that 18446744073709551614 = -2 mod 2^64.
As said by @aschepler, standard 4.7 §2 (Integral conversions) ensures that the result will be the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
So in your case, it will be 0xFFFFFFFFFFFFFFFE == 18446744073709551614
But this is a one-step conversion as specified by the standard (what the compiler actually does internally is out of scope).
If you want first unsigned conversion to uint32_t and then conversion to uint64_t, you have to specify 2 conversions : static_cast<uint64_t>(static_cast<uint32_t>(-2))
Per 4.7 §2, the first conversion will give 0xFFFFFFFE = 4294967294, but as this number is already a valid uint64_t, it is unchanged by the second conversion.
What you observed is required by the standard and will be observable on any conformant compiler (provided uint32_t and uint64_t are defined, because this part is not required ...)
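Putting the one-step and two-step conversions side by side, a small self-contained check (assuming uint32_t and uint64_t are defined):

#include <cstdint>
#include <iostream>

int main()
{
    int32_t x = -2;
    std::cout << uint64_t(x) << '\n';           // 18446744073709551614: -2 mod 2^64
    std::cout << uint64_t(uint32_t(x)) << '\n'; // 4294967294: -2 mod 2^32, then widened
}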
This is an old question, but I recently ran into this problem. I was using char, which happened to be signed on my computer. I wanted to multiply two values:
char a, b;
uint16 ans = uint16(a) * uint16(b);
However, because of the conversion, when a < 0, the answer is wrong.
Since the signedness of char is implementation-dependent, maybe we should use uint8 instead of char whenever possible.
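A sketch of the fix using the standard fixed-width names (uint8 and uint16 above look like project typedefs): convert through unsigned char first, so the bit pattern rather than the sign-extended value is widened.

#include <cstdint>
#include <iostream>

int main()
{
    char a = -96, b = 2; // assumes char is signed here, as in the question
    uint16_t wrong = uint16_t(a) * uint16_t(b);                   // a converts to 0xFFA0: wrong
    uint16_t right = uint16_t(uint8_t(a)) * uint16_t(uint8_t(b)); // a's bit pattern 0xA0 is used
    std::cout << wrong << ' ' << right << '\n';                   // prints 65344 320
}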
I've got the following code:
uint8_t value = 0xF0;
uint16_t shift = 0x10;
uint32_t result = value << shift;
cout << "The Result is: " << result << endl;
I expected the output to be 0, but instead it was 15728640.
My Questions
Isn't the expression evaluated to 0 and after that implicitly converted to the type of result?
Is this behaviour standard?
Where can I get further information about this kind of behaviour?
When you write value << shift, value is implicitly converted to int before the shift. This is called "integer promotion". No other rules apply to this particular case.
Integer promotion applies to any type whose range fits within an int. If it can't fit in an int, then unsigned int is tried. If it can't fit in either, then integer promotion does nothing.
For more information... well, I just read the spec.
A shift operation involves an "integer promotion", which means to convert types narrower than int to int. The promotion is done before the shift, so the behavior you are seeing is expected.
If you're working on a 32- or 64-bit machine, the shift will be done in a 32- or 64-bit register. That is, the 8-bit value is loaded into the register, shifted, and then (if necessary) truncated when stored into the final variable.
That's why the width of "int" is implementation dependent.
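If the 8-bit wrap-around the questioner expected is actually desired, a sketch is to truncate the promoted result back to uint8_t explicitly:

#include <cstdint>
#include <iostream>

int main()
{
    uint8_t value = 0xF0;
    uint16_t shift = 0x10;
    // The shift happens in int (0xF0 << 16 == 0xF00000); truncating
    // that to uint8_t afterwards yields the expected 0:
    uint32_t result = uint8_t(value << shift);
    std::cout << "The Result is: " << result << std::endl; // prints 0
}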
In the code below, why 1-byte anUChar is automatically converted into 4 bytes to produce the desired result 0x300 (instead of 0x0 if anUChar would remain 1 byte in size):
unsigned char anUChar = 0xc0; // only the two most significant bits are set
int anInt = anUChar << 2; // 0x300 (correct)
But in this code, aimed at a 64-bit result, no automatic conversion into 8 bytes happens:
unsigned int anUInt = 0xc0000000; // only the two most significant bits are set
long long aLongLong = anUInt << 2; // 0x0 (wrong, means no conversion occurred)
And only placing an explicit type cast works:
unsigned int anUInt = 0xc0000000;
long long aLongLong = (long long)anUInt << 2; // 0x300000000 (correct)
And most importantly, would this behavior be the same in a program that targets 64-bit machines?
By the way, which of the two is most right and portable: (type)var << 1 or ((type)var) << 1?
char always gets promoted to int during arithmetic. I think this is specified behavior in the C standard.
However, int is not automatically promoted to long long.
In some situations, some compilers (Visual Studio) will actually warn you about this if you try to left-shift a smaller integer and store it into a larger one.
By the way, which of the two is most right and portable: (type)var <<
1 or ((type)var) << 1?
Both are fine and portable. Though I prefer the first one since it's shorter. Casting has higher precedence than shift.
Conversion does happen. The problem is that the result of the expression anUInt << 2 is an unsigned int, because anUInt is an unsigned int.
Casting anUInt to a long long (actually, this is a conversion in this particular case) is the correct thing to do.
Neither (type)var << 1 nor ((type)var) << 1 is more correct or portable, because operator precedence is strictly defined by the Standard. However, the latter is probably better because it's easier to understand for humans looking at the code casually. Others may disagree with this assertion.
EDIT:
Note that in your first example:
unsigned char anUChar = 0xc0;
int anInt = anUChar << 2;
...the result of the expression anUChar << 2 is not an unsigned char as you might expect, but an int because of Integral Promotion.
The operands of operator<< are of integral or enumeration type (see Standard 5.8/1). When a binary operator that expects operands of arithmetic or enumeration type is called, the compiler attempts to convert both operands to the same type, so that the expression may yield a common type. In this case, integral promotion is performed on both operands (5/9). When an unsigned char takes part in integral promotion, it will be converted to an int if your platform can accommodate all possible values of unsigned char in an int; otherwise it will be converted to an unsigned int (4.5/1).
Shorter integral types are promoted to an int type for bitshift operations. This has nothing to do with the type to which you assign the result of the shift.
On 64-bit machines, your second piece of code would be equally problematic since the int types are usually also 32 bit wide. (Between x86 and x64, long long int is typically always 64 and int 32 bits, only long int depends on the platform.)
(In the spirit of C++, I would write the conversion as (unsigned long long int)(anUInt) << 2, evocative of the conversion-constructor syntax. The first set of parentheses is purely because the type name consists of several tokens.)
I would also prefer to do bitshifting exclusively on unsigned types, because only unsigned types can be considered equivalent (in terms of values) to their own bit pattern value.
Because of integer promotions. For most operators (e.g. <<), char operands are promoted to int first.
This has nothing to do with where the result of the calculation is going. In other words, the fact that your second example assigns to a long long does not affect the promotion of the operands.
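To make that concrete, a short sketch contrasting a shift done in unsigned int with one done in long long (assuming a 32-bit unsigned int):

#include <iostream>

int main()
{
    unsigned int anUInt = 0xc0000000;
    long long a = anUInt << 2;            // shift performed in unsigned int: wraps to 0
    long long b = (long long)anUInt << 2; // shift performed in long long: 0x300000000
    std::cout << std::hex << a << ' ' << b << std::endl; // prints 0 300000000
}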