Casting both bitwidth and signed/unsigned, which conversion is executed first? - c++

Consider the following code:
int32_t x = -2;
cout << uint64_t(x) << endl;
The cast in the second line basically consists of two atomic steps: the increase in bit width from 32 to 64 bits, and the change of interpretation from signed to unsigned. If one compiles this with g++ and runs it, one gets 18446744073709551614. This suggests that the increase in bit width is performed first (as a sign extension) and the change in signed/unsigned interpretation afterwards, i.e. that the code above is equivalent to writing:
int32_t x = -2;
cout << uint64_t(int64_t(x)) << endl;
What confuses me is that one could also first interpret x as an unsigned 32-bit bit vector and then zero-extend it to 64 bits, i.e.
int32_t x = -2;
cout << uint64_t(uint32_t(x)) << endl;
This would yield 4294967294. Would someone please confirm that the behavior of g++ is required by the standard and is not implementation-defined? I would be most excited if you could point me to the passage in the standard that actually covers this. I tried to find it but failed bitterly.
Thanks in advance!

You are looking for Standard section 4.7. In particular, paragraph 2 says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type).
In the given example, we have that 18446744073709551614 = -2 mod 2^64.

As said by @aschepler, standard 4.7 §2 (Integral conversions) ensures that the result will be the least unsigned integer congruent to the source integer (modulo 2^n, where n is the number of bits used to represent the unsigned type).
So in your case, it will be 0xFFFFFFFFFFFFFFFE == 18446744073709551614
But this is a one-step conversion as specified by the standard (what the compiler actually does internally is out of scope).
If you want a conversion to uint32_t first and then a conversion to uint64_t, you have to write two conversions: static_cast<uint64_t>(static_cast<uint32_t>(-2))
Per 4.7 §2, the first gives 0xFFFFFFFE = 4294967294, and since this number is already a valid uint64_t, it is unchanged by the second conversion.
What you observed is required by the standard and will be observable on any conforming compiler (provided uint32_t and uint64_t are defined; the standard does not require these exact-width types to exist).

This is an old question, but I recently ran into this problem. I was using char, which happened to be signed on my computer. I wanted to multiply two values like this:
char a, b;
uint16_t ans = uint16_t(a) * uint16_t(b);
However, because of the conversion, the answer is wrong when a < 0: the negative char is converted directly to a large uint16_t value (modulo 2^16) instead of being read as a byte value in the range 0..255.
Since the signedness of char is implementation-defined, maybe we should use uint8_t instead of char whenever possible.

Related

Is Sign Extension in C++ a compiler option, or compiler dependent or target dependent?

The following code has been compiled on 3 different compilers and 3 different processors and gave 2 different results:
typedef unsigned long int u32;
typedef signed long long s64;
int main ()
{ u32 Operand1,Operand2;
s64 Result;
Operand1=95;
Operand2=100;
Result= (s64)(Operand1-Operand2);}
Result produces 2 results:
either
-5 or 4294967291
I do understand that (Operand1-Operand2) is computed as a 32-bit unsigned operation, and that when the result is cast to s64, sign extension was done correctly in the first case but not in the second.
My question is whether the sign extension is possible to be controlled via compiler options, or it is compiler-dependent or maybe it is target-dependent.
Your issue is that you assume unsigned long int to be 32 bits wide and signed long long to be 64 bits wide. This assumption does not hold on every platform; for example, unsigned long int is 64 bits wide on many 64-bit systems.
We can visualize what's going on by using types that have a guaranteed (by the standard) bit width:
#include <cstdint>
#include <iostream>
int main() {
{
uint32_t large = 100, small = 95;
int64_t result = (small - large);
std::cout << "32 and 64 bits: " << result << std::endl;
} // 4294967291
{
uint32_t large = 100, small = 95;
int32_t result = (small - large);
std::cout << "32 and 32 bits: " << result << std::endl;
} // -5
{
uint64_t large = 100, small = 95;
int64_t result = (small - large);
std::cout << "64 and 64 bits: " << result << std::endl;
} // -5
return 0;
}
In each of these three cases, the expression small - large yields a value of unsigned integer type (of the corresponding width). This value is calculated using modular arithmetic.
In the first case, because that unsigned value can be represented in the wider signed integer type, the conversion leaves the value unchanged.
In the other cases the value cannot be represented in the signed integer type. Thus an implementation-defined conversion is performed, which in practice means interpreting the bit pattern of the unsigned value as a signed value. Because the value is "large", its highest bits are set, which, when treated as a signed value (under two's complement), is equivalent to a "small" negative value.
To highlight the comment from Lưu Vĩnh Phúc:
Operand1-Operand2 is unsigned therefore when casting to s64 it's always zero extension. [..]
An extension of the bit pattern only happens in the first case, because only there is a widening conversion, and it is indeed always zero extension.
Quotes from the standard. Regarding small - large:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [..]
§ 4.7/2
Regarding the conversion from unsigned to signed:
If the destination type [of the integral conversion] is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
§ 4.7/3
Sign extension is platform dependent, where platform is a combination of a compiler, target hardware architecture and operating system.
Moreover, as Paul R mentioned, width of built-in types (like unsigned long) is platform-dependent too. Use types from <cstdint> to get fixed-width types. Nevertheless, they are just platform-dependent definitions, so their sign extension behavior still depends on the platform.
Here is a good almost-duplicate question about type sizes. And here is a good table about type size relations.
Type promotions, and the corresponding sign-extensions are specified by the C++ language.
What's not specified, but is platform-dependent, is the range of integer types provided. It's even Standard-compliant for char, short int, int, long int and long long int all to have the same range, provided that range satisfies the C++ Standard requirements for long long int. On such a platform, no widening or narrowing would ever happen, but signed<->unsigned conversion could still alter values.

Signed arithmetic

I'm running this piece of code, and I'm getting the output value as (converted to hex) 0xFFFFFF93 and 0xFFFFFF94.
#include <iostream>
using namespace std;
int main()
{
char x = 0x91;
char y = 0x02;
unsigned out;
out = x + y;
cout << out << endl;
out = x + y + 1;
cout << out << endl;
}
I'm confused about the arithmetic going on here. Is it because all the higher bits in out are taken to be 1 by default?
When I typecast out to an int, I get the answers as (in int) -109 and -108. Any idea why this is happening?
So there are a couple of things going on here. One, char can be either signed or unsigned; in your case it is signed. Two, assignment converts the right-hand side to the type of the left-hand side. Using the right warning flags would help; clang with the -Wconversion flag warns:
warning: implicit conversion changes signedness: 'int' to 'unsigned int' [-Wsign-conversion]
out = x + y;
~ ~~^~~
In this case, the conversion effectively adds UINT_MAX + 1 to (or subtracts it from) the value being converted until the result is in range.
We can see the same results using the limits header:
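#include <limits>
//....
// UINT_MAX + 1 is computed in long long; in plain unsigned arithmetic it would wrap to 0
std::cout << std::hex << (static_cast<long long>(std::numeric_limits<unsigned>::max()) + 1) + (x+y) << std::endl ;
//...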
and the result is:
ffffff93
For reference the draft C++ standard section 5.17 Assignment and compound assignment operators says:
If the left operand is not of class type, the expression is implicitly converted (Clause 4) to the cv-unqualified type of the left operand.
Clause 4 under 4.7 Integral conversions says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
which is equivalent to adding or subtracting UMAX + 1.
On many platforms, plain char is also a signed type! For compatibility with C, its signedness is not specified further and is implementation-defined. You can always get well-defined signed or unsigned arithmetic behavior by explicitly specifying the signed / unsigned keywords.
Try replacing your char definitions like this
unsigned char x = 0x91;
unsigned char y = 0x02;
to get the results you expect!
Negative numbers are represented internally in two's complement, so their most significant bit is 1. When such a value is widened and printed in hex, the sign-extended high bits show up as the leading f digits in numbers like the ones you observed.
C++ doesn't specify whether char is signed or unsigned. Here it is signed, so when the chars are promoted to int, the negative value is used, which is then converted to unsigned. Use unsigned char, or cast to it.

When does the implicit type conversion take place during an assignment

I've got the following code:
uint8_t value = 0xF0;
uint16_t shift = 0x10;
uint32_t result = value << shift;
cout << "The Result is: " << result << endl;
I expected the output to be 0, but instead it was 15728640.
My Questions
Isn't the expression evaluated first (=> 0) and only after that implicitly converted to the type of result?
Is this behaviour standard?
Where can I get further information about this kind of behaviour?
When you write value << shift, value is implicitly converted to int before the shift. This is called "integer promotion". No other rules apply to this particular case.
Integer promotion applies to any type whose range fits within an int. If it can't fit in an int, then unsigned int is tried. If it can't fit in either, then integer promotion does nothing.
For more information... well, I just read the spec.
A shift operation involves an "integer promotion", which means to convert types narrower than int to int. The promotion is done before the shift, so the behavior you are seeing is expected.
If you're working on a 32 or 64 bit word machine, the shift will be in a 32 or 64 bit register. That is, the 8 bit "value" is loaded into the register, shifted, and then (if necessary) truncated to the final variable.
That's why the length of "int" is implementation dependent.

Curious arithmetic error- 255x256x256x256=18446744073692774400

I encountered a strange thing when I was programming under c++. It's about a simple multiplication.
Code:
unsigned __int64 a1 = 255*256*256*256;
unsigned __int64 a2 = 255 << 24; // same as the above
std::cerr << "a1 is:" << a1;
std::cerr << "a2 is:" << a2;
interestingly the result is:
a1 is: 18446744073692774400
a2 is: 18446744073692774400
whereas it should be (a calculator confirms):
4278190080
Can anybody tell me how could it be possible?
255*256*256*256
All operands are int, so you are overflowing int. Overflow of a signed integer is undefined behavior in C and C++.
EDIT:
note that the expression 255 << 24 in your second declaration also invokes undefined behavior if your int type is 32-bit. 255 x (2^24) is 4278190080 which cannot be represented in a 32-bit int (the maximum value is usually 2147483647 on a 32-bit int in two's complement representation).
C and C++ both say for E1 << E2 that if E1 is of a signed type and positive and that E1 x (2^E2) cannot be represented in the type of E1, the program invokes undefined behavior. Here ^ is the mathematical power operator.
Your literals are int. This means that all the operations are actually performed on int, and promptly overflow. This overflowed value, when converted to an unsigned 64bit int, is the value you observe.
It is perhaps worth explaining what happened to produce the number 18446744073692774400. Technically speaking, the expressions you wrote trigger "undefined behavior" and so the compiler could have produced anything as the result; however, assuming int is a 32-bit type, which it almost always is nowadays, you'll get the same "wrong" answer if you write
uint64_t x = (int) (255u*256u*256u*256u);
and that expression does not trigger undefined behavior. (The conversion from unsigned int to int involves implementation-defined behavior, but as nobody has produced a ones-complement or sign-and-magnitude CPU in many years, all implementations you are likely to encounter define it exactly the same way.) I have written the cast in C style because everything I'm saying here applies equally to C and C++.
First off, let's look at the multiplication. I'm writing the right hand side in hex because it's easier to see what's going on that way.
255u * 256u = 0x0000FF00u
255u * 256u * 256u = 0x00FF0000u
255u * 256u * 256u * 256u = 0xFF000000u (= 4278190080)
That last result, 0xFF000000u, has the highest bit of a 32-bit number set. Casting that value to a signed 32-bit type therefore causes it to become negative, as if 2^32 had been subtracted from it (that's the implementation-defined operation I mentioned above).
(int) (255u*256u*256u*256u) = 0xFF000000 = -16777216
I write the hexadecimal number there, sans u suffix, to emphasize that the bit pattern of the value does not change when you convert it to a signed type; it is only reinterpreted.
Now, when you assign -16777216 to a uint64_t variable, it is back-converted to unsigned as if by adding 2^64. (Unlike the unsigned-to-signed conversion, this semantic is prescribed by the standard.) This does change the bit pattern, setting all of the high 32 bits of the number to 1 instead of 0 as you had expected:
(uint64_t) (int) (255u*256u*256u*256u) = 0xFFFFFFFFFF000000u
And if you write 0xFFFFFFFFFF000000 in decimal, you get 18446744073692774400.
As a closing piece of advice, whenever you get an "impossible" integer from C or C++, try printing it out in hexadecimal; it's much easier to see the oddities of two's-complement fixed-width arithmetic that way.
The answer is simple: overflow.
Here the overflow occurred in int, and when you assign the result to an unsigned __int64 it is converted to 18446744073692774400 instead of 4278190080.

Implicit type casts in expressions with bit shifts

In the code below, why is the 1-byte anUChar automatically converted into 4 bytes to produce the desired result 0x300 (instead of 0x0, which it would be if anUChar remained 1 byte in size)?
unsigned char anUChar = 0xc0; // only the two most significant bits are set
int anInt = anUChar << 2; // 0x300 (correct)
But in this code, aimed at a 64-bit result, no automatic conversion into 8 bytes happens:
unsigned int anUInt = 0xc0000000; // only the two most significant bits are set
long long aLongLong = anUInt << 2; // 0x0 (wrong, means no conversion occurred)
And only placing an explicit type cast works:
unsigned int anUInt = 0xc0000000;
long long aLongLong = (long long)anUInt << 2; // 0x300000000 (correct)
And most importantly, would this behavior be the same in a program that targets 64-bit machines?
By the way, which of the two is most right and portable: (type)var << 1 or ((type)var) << 1?
char always gets promoted to int during arithmetic. I think this is specified behavior in the C standard.
However, int is not automatically promoted to long long.
Under some situations, some compilers (Visual Studio) will actually warn you about this if you try to left-shift a smaller integer and store it into a larger one.
By the way, which of the two is most right and portable: (type)var << 1 or ((type)var) << 1?
Both are fine and portable, though I prefer the first one since it's shorter. Casting has higher precedence than shifting.
Conversion does happen. The problem is the result of the expression anUInt << 2 is an unsigned int because anUInt is an unsigned int.
Casting anUInt to a long long (actually, this is conversion in this particular case) is the correct thing to do.
Neither (type)var << 1 nor ((type)var) << 1 is more correct or portable, because operator precedence is strictly defined by the Standard. However, the latter is probably better because it's easier for humans looking at the code casually to understand. Others may disagree with this assertion.
EDIT:
Note that in your first example:
unsigned char anUChar = 0xc0;
int anInt = anUChar << 2;
...the result of the expression anUChar << 2 is not an unsigned char as you might expect, but an int because of Integral Promotion.
The operands of operator<< must be of integral or unscoped enumeration type, and integral promotion is performed on each operand (see Standard 5.8/1). Unlike most binary arithmetic operators, the shift operators do not convert both operands to a common type; the result has the type of the promoted left operand. When an unsigned char takes part in integral promotion, it will be converted to an int if your platform can accommodate all possible values of unsigned char in an int, else it will be converted to an unsigned int (4.5/1).
Shorter integral types are promoted to an int type for bitshift operations. This has nothing to do with the type to which you assign the result of the shift.
On 64-bit machines, your second piece of code would be equally problematic since the int types are usually also 32 bit wide. (Between x86 and x64, long long int is typically always 64 and int 32 bits, only long int depends on the platform.)
(In the spirit of C++, I would write the conversion as (unsigned long long int)(anUInt) << 2, evocative of the conversion-constructor syntax. The first set of parentheses is purely because the type name consists of several tokens.)
I would also prefer to do bitshifting exclusively on unsigned types, because only unsigned types can be considered equivalent (in terms of values) to their own bit pattern value.
Because of integer promotions. For most operators (e.g. <<), char operands are promoted to int first.
This has nothing to do with where the result of the calculation is going. In other words, the fact that your second example assigns to a long long does not affect the promotion of the operands.