64 bit integer initialization error in Visual Studio 2010 SP1 - c++

I have a weird bug/error/self stupidity concern. I'm developing a small application in C, and I'm using Visual Studio 2010 SP1. The offending code:
uint64_t sum_squared_X = 65535*65535*64;
int64_t sum_squared_Y = 65535*65535*64;
When debugging, I get these results:
sum_squared_X = 18446744073701163072;
sum_squared_Y = -8388544;
Question is, why? An uint64_t has a maximum value of 2^64-1 or 18446744073709551615, and an int64_t a maximum value of 2^63-1 or 9223372036854775807.
65535*65535*64 = 274869518400, which is lower than both maximums. Then why am I getting these results?
I'm completely lost here, some help would be appreciated.

Short answer: 65535 is multiplied by 65535 using 32-bit signed arithmetic, which overflows and, on this implementation, wraps to -131,071. That is then multiplied by 64, giving -8,388,544, which is converted to uint64_t (producing a huge positive value because the conversion wraps modulo 2^64) or int64_t (preserving the value -8,388,544).
Long answer:
The type of an unsuffixed integer decimal constant depends on its value. It is the first of this list that can represent its value: int, long int, long long int. (Adding a suffix or using an octal or hexadecimal constant changes the list.) Because these types depend on the C implementation, the behavior depends on the C implementation.
It is likely that, on your machine, int is 32 bits. Therefore, the type of “65535” is int, and so is the type of “64”.
Your expression starts with “65535*65535”. This multiplies 65,535 by 65,535. The mathematical result is 4,294,836,225 (in hex, 0xfffe0001). With a 32-bit signed int, this overflows the representable values. This is undefined behavior in the C standard. What commonly happens in many implementations is that the value “wraps around” from 2^31-1 (the highest representable value) to -2^31 (the lowest representable value). Another view of the same behavior is that the bits of the mathematical result, 0xfffe0001, are interpreted as the encoding of a 32-bit signed int. In two’s complement, 0xfffe0001 is -131,071.
Then your expression multiplies by 64. -131,071*64 does not overflow; the result is -8,388,544 (with encoding 0xff800040).
Finally, you use the result to initialize a uint64_t or int64_t object. This initialization causes a conversion to the destination type.
The int64_t conversion is straightforward; the input to the conversion is -8,388,544, and this is exactly representable in int64_t, so the result is -8,388,544, which the compiler likely implements simply by extending the sign bit (producing the encoding 0xffffffffff800040).
The uint64_t conversion is troublesome, because -8,388,544 cannot be represented in a uint64_t. According to the 1999 C standard, 6.3.1.3 2, “the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.” For uint64_t, “one more than the maximum value that can be represented in the new type” is 2^64. So the result is -8,388,544 + 2^64, which is 18,446,744,073,701,163,072. This also has the encoding 0xffffffffff800040.
For conversion from a narrower width to a wider width, this operation of adding one more than the maximum value is equivalent to copying the sign bit of the old type to all higher bits in the new type (sign extension). For conversion from a wider width to a narrower width, it is equivalent to discarding the high bits. In either case, the result is the residue modulo 2^n, where n is the number of bits in the new type.
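For what it's worth, here is a minimal C sketch of just that final conversion step, assuming a two's-complement platform with 32-bit int (the variable names are mine); it starts from the already-wrapped 32-bit value rather than repeating the undefined overflow:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    int32_t wrapped = -8388544;               /* what 65535*65535*64 typically wraps to in 32-bit int */
    int64_t as_signed = wrapped;              /* value-preserving conversion: still -8388544 */
    uint64_t as_unsigned = (uint64_t)wrapped; /* -8388544 + 2^64 = 18446744073701163072 */
    printf("int64_t:  %" PRId64 "\n", as_signed);
    printf("uint64_t: %" PRIu64 "\n", as_unsigned);
    return 0;
}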

When I compile your example, I clearly get an integer constant overflow warning for each of those lines. This is because the right-hand side, the constants, is evaluated as plain int before the result is stored. You have to make the arithmetic itself happen in a wider type to keep an overflow condition from happening. To fix this, do the following instead:
uint64_t sum_squared_X = (uint64_t)65535*65535*64;
int64_t sum_squared_Y = (int64_t)65535*65535*64;
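Equivalently (just a sketch of alternatives, not from the original answer), you can force the arithmetic into a 64-bit type with a suffix or the <stdint.h> constant macros rather than a cast:
uint64_t sum_squared_X = 65535ULL * 65535 * 64;        /* first operand is unsigned long long, so no 32-bit overflow */
int64_t  sum_squared_Y = INT64_C(65535) * 65535 * 64;  /* INT64_C from <stdint.h> makes an at-least-64-bit constant */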

Related

Unsigned and Signed int and printf

I understand that I am assigning a signed int a value that is larger than it can handle. Also, I should be using %d for signed and %u for unsigned, and I should not be assigning a negative value to an unsigned. But if I make such assignments and use printf as below, I get the results shown below.
My understanding is that in each case the number is converted to its two's complement binary representation, which is the same for -1 and 4294967295. That is why %u for the signed int prints 4294967295, ignoring the negative leftmost bit. When %d is used for the signed int, it treats the leftmost bit as a negative flag and prints -1. Similarly, %u for the unsigned prints the unsigned value, but %d causes it to treat the number as signed and thus prints -1. Is that correct?
signed int si = 4294967295;
unsigned int ui = 4294967295;
printf("si = u=%u d=%d\n", si, si);
printf("ui = u=%u d=%d\n", ui, ui);
Output:
si = u=4294967295 d=-1
ui = u=4294967295 d=-1
signed int si = -1;
unsigned int ui = -1;
printf("si = u=%u d=%d\n", si, si);
printf("ui = u=%u d=%d\n", ui, ui);
Output:
si = u=4294967295 d=-1
ui = u=4294967295 d=-1
That is why %u for the signed int prints 4294967295, ignoring the negative leftmost bit. When %d is used for the signed int, it treats the leftmost bit as a negative flag and prints -1.
In the case of unsigned, the "leftmost" or most significant bit is not ignored, and is not negative; rather it has a place value of 2^31.
In the negative case, the sign bit is not a flag; instead it is a bit with a place value of -2^31.
In both cases the value of the integer is equal to the sum of the place values of all the binary digits (bits) set to 1.
The encoding of signed values in this way is known as two's complement. It is not the only possible encoding; what you described is known as sign and magnitude for example, and one's complement is another possibility. However, these alternative encodings are seldom encountered in practice, not least because two's complement is how arithmetic works on modern hardware in all but perhaps the most arcane architectures.
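As an illustration (a sketch of my own, assuming 32-bit values), you can compute the same bit pattern under both readings by summing place values explicitly:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t bits = 0xFFFFFFFFu;          /* all 32 bits set */
    uint32_t low31 = bits & 0x7FFFFFFFu;  /* the 31 low-order value bits */
    uint32_t top = bits >> 31;            /* the most significant bit */
    /* unsigned reading: the top bit has place value +2^31 */
    uint64_t as_unsigned = (uint64_t)top * 2147483648u + low31;
    /* two's-complement reading: the top bit has place value -2^31 */
    int64_t as_signed = -(int64_t)top * 2147483648LL + low31;
    printf("%llu\n", (unsigned long long)as_unsigned); /* 4294967295 */
    printf("%lld\n", (long long)as_signed);            /* -1 */
    return 0;
}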
There are a few things going on here. Let's start by saying that using the incorrect format specifier with printf is undefined behavior, which means the results of your program are unpredictable; what actually happens will depend on many factors, including your compiler, architecture, optimization level, etc.
For signed/unsigned conversions, that is defined by the respective standards. Both C and C++ make it implementation-defined behavior to convert a value that is larger than can be stored into a signed integer type; from the C++ draft standard:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
For example, gcc chooses to use the same convention as unsigned:
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
When you assign -1 to an unsigned value in both C and C++ the result will always be the maximum unsigned value of the type, from the draft C++ standard:
If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2^n where n is
the number of bits used to represent the unsigned type). [ Note: In a
two’s complement representation, this conversion is conceptual and
there is no change in the bit pattern (if there is no truncation).
—end note ]
The wording from C99 is easier to digest:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
So we have the following:
-1 + (UNSIGNED_MAX + 1)
the result of which is UNSIGNED_MAX
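A tiny sketch of that arithmetic (assuming UINT_MAX is 4294967295):
unsigned int u = -1;   /* -1 + (4294967295 + 1) = 4294967295, i.e. UINT_MAX */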
As for printf and the incorrect format specifier, we can see from the draft C99 standard, section 7.19.6.1 The fprintf function:
If a conversion specification is invalid, the behavior is
undefined.248) If any argument is not the correct type for the
corresponding conversion specification, the behavior is undefined.
fprintf covers printf with respect to format specifiers, and C++ falls back on C with respect to printf.
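If the goal is to see both interpretations without invoking undefined behavior, a sketch like the following keeps every format specifier matched to its argument's type (the cast is mine):
#include <stdio.h>

int main(void) {
    int si = -1;
    unsigned int ui = -1;                              /* well defined: becomes UINT_MAX */
    printf("si = d=%d u=%u\n", si, (unsigned int)si);  /* signed-to-unsigned conversion is well defined */
    printf("ui = u=%u\n", ui);
    return 0;
}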

Using sizeof to get maximum power of two

Is there a way in C/C++ to compute the maximum power of two that is representable by a certain data type using the sizeof operator?
For example, say I have an unsigned short int. Its values can range between 0 and 65535.
Therefore the maximum power of two that an unsigned short int can contain is 32768.
I pass this unsigned short int to a function and I have (at the moment) an algorithm that looks like this:
if (ushortParam > 32768) {
    ushortParam = 32768; // Bad hardcoded literals
}
However, in the future, I may want to change the variable type to incorporate larger powers of two. Is there a type-independent formula using sizeof() that can achieve the following:
if (param > /*Some function...*/sizeof(param))
{
    param = /*Some function...*/sizeof(param);
}
Note the parameter will never require floating-point precision - integers only.
Setting the most significant bit of a variable of that parameter's size will give you the highest power of 2:
1 << (8*sizeof(param)-1)
What about:
const T max_power_of_two = (std::numeric_limits<T>::max() >> 1) + 1;
To get the highest power of 2 representable by a certain integer type you may use limits.h instead of the sizeof operator. For instance:
#include <stdlib.h>
#include <stdio.h>
#include <limits.h>

int main() {
    int max = INT_MAX;
    int hmax = max >> 1;
    int mpow2 = max ^ hmax;
    printf("The maximum representable integer is %d\n", max);
    printf("The maximum representable power of 2 is %d\n", mpow2);
    return 0;
}
This should always work, as the right shift of a positive integer is always defined. Quoting from the C standard, section 6.5.7.5 (Bitwise shift operators):
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1
has an unsigned type or if E1 has a signed type and a nonnegative
value, the value of the result is the integral part of the quotient of
E1 divided by the quantity, 2 raised to the power E2.
If the use of sizeof is mandatory you can use:
1 << (CHAR_BIT*sizeof(param)-1)
for unsigned integer types and:
1 << (CHAR_BIT*sizeof(param)-2)
for signed integer types. The lines above will work only for integer types without padding bits. The part of the C standard ensuring that these lines work is in section 6.2.6.2. In particular:
For unsigned integer types other than unsigned char, the bits of the
object representation shall be divided into two groups: value bits and
padding bits (there need not be any of the latter). If there are N
value bits, each bit shall represent a different power of 2 between 1
and 2^(N-1), so that objects of that type shall be capable of
representing values from 0 to 2^N - 1 using a pure binary
representation; this shall be known as the value representation.
guarantees the first method to work while:
For signed integer types, the bits of the object representation shall
be divided into three groups: value bits, padding bits, and the sign
bit. There need not be any padding bits; there shall be exactly one
sign bit.
...
A valid (non-trap) object representation of a signed integer type
where the sign bit is zero is a valid object representation of the
corresponding unsigned type, and shall represent the same value.
explains why the second line gives the right answer.
The accepted answer will probably work on Posix platforms, but is not general C/C++. It assumes that CHAR_BIT is 8, doesn't specify the type, and assumes that the type has no padding bits.
Here are more general versions for any/all unsigned integer types and don't require including any headers, dependencies, etc.:
#define MAX_VAL(UNSIGNED_TYPE) ((UNSIGNED_TYPE) -1)
#define MAX_POW2(UNSIGNED_TYPE) (~(MAX_VAL(UNSIGNED_TYPE) >> 1))
#define MAX_POW2_VER2(UNSIGNED_TYPE) (MAX_VAL(UNSIGNED_TYPE) ^ (MAX_VAL(UNSIGNED_TYPE) >> 1))
#define MAX_POW2_VER3(UNSIGNED_TYPE) ((MAX_VAL(UNSIGNED_TYPE) >> 1) + 1)
The standards, even C90, guarantee that casting -1 to an unsigned type always yields the maximum value that type can represent. From there, all of the bitwise operators above are well defined.
http://c0x.coding-guidelines.com/6.3.1.3.html
6.3.1.3 Signed and unsigned integers
682 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
683 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
684 Otherwise, the new type is signed and the value cannot be represented in it;
685 either the result is implementation-defined or an implementation-defined signal is raised.
The maximum value of an unsigned type is one less than a power of 2 and has all value bits set. The above expressions result in the highest bit alone being set, which is the maximum power of 2 that the type can represent.
http://c0x.coding-guidelines.com/6.2.6.2.html
6.2.6.2 Integer types
593 For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter).
594 If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N - 1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation;
595 this shall be known as the value representation.
596 The values of any padding bits are unspecified.
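A quick usage sketch of the macros above (the printed values assume a 32-bit unsigned int and a 64-bit unsigned long long, which is typical but not guaranteed):
#include <stdio.h>

#define MAX_VAL(UNSIGNED_TYPE) ((UNSIGNED_TYPE) -1)
#define MAX_POW2(UNSIGNED_TYPE) (~(MAX_VAL(UNSIGNED_TYPE) >> 1))

int main(void) {
    printf("unsigned int:       max = %u, top power of 2 = %u\n",
           MAX_VAL(unsigned int), MAX_POW2(unsigned int));             /* 4294967295, 2147483648 */
    printf("unsigned long long: max = %llu, top power of 2 = %llu\n",
           MAX_VAL(unsigned long long), MAX_POW2(unsigned long long)); /* 18446744073709551615, 9223372036854775808 */
    return 0;
}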

Curious arithmetic error- 255x256x256x256=18446744073692774400

I encountered a strange thing when programming in C++. It's about a simple multiplication.
Code:
unsigned __int64 a1 = 255*256*256*256;
unsigned __int64 a2= 255 << 24; // same as the above
cerr << "a1 is: " << a1 << endl;
cerr << "a2 is: " << a2 << endl;
interestingly the result is:
a1 is: 18446744073692774400
a2 is: 18446744073692774400
whereas it should be (a calculator confirms):
4278190080
Can anybody tell me how could it be possible?
255*256*256*256
All operands are int, so you are overflowing int. The overflow of a signed integer is undefined behavior in C and C++.
EDIT:
Note that the expression 255 << 24 in your second declaration also invokes undefined behavior if your int type is 32-bit. 255 x (2^24) is 4278190080, which cannot be represented in a 32-bit int (the maximum value is usually 2147483647 for a 32-bit int in two's complement representation).
C and C++ both say for E1 << E2 that if E1 is of a signed type and positive and that E1 x (2^E2) cannot be represented in the type of E1, the program invokes undefined behavior. Here ^ is the mathematical power operator.
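A few sketches of ways to get the intended 4278190080 without overflow (the variable names are mine; uint64_t assumes <stdint.h>):
#include <stdint.h>

uint64_t b1 = 255ULL * 256 * 256 * 256;  /* arithmetic done in unsigned long long: 4278190080 */
uint64_t b2 = (uint64_t)255 << 24;       /* shift performed on a 64-bit operand: 4278190080 */
uint64_t b3 = 255u << 24;                /* unsigned 32-bit shift, well defined: 4278190080 */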
Your literals are int. This means that all the operations are actually performed on int, and promptly overflow. This overflowed value, when converted to an unsigned 64bit int, is the value you observe.
It is perhaps worth explaining what happened to produce the number 18446744073692774400. Technically speaking, the expressions you wrote trigger "undefined behavior" and so the compiler could have produced anything as the result; however, assuming int is a 32-bit type, which it almost always is nowadays, you'll get the same "wrong" answer if you write
uint64_t x = (int) (255u*256u*256u*256u);
and that expression does not trigger undefined behavior. (The conversion from unsigned int to int involves implementation-defined behavior, but as nobody has produced a ones-complement or sign-and-magnitude CPU in many years, all implementations you are likely to encounter define it exactly the same way.) I have written the cast in C style because everything I'm saying here applies equally to C and C++.
First off, let's look at the multiplication. I'm writing the right hand side in hex because it's easier to see what's going on that way.
255u * 256u = 0x0000FF00u
255u * 256u * 256u = 0x00FF0000u
255u * 256u * 256u * 256u = 0xFF000000u (= 4278190080)
That last result, 0xFF000000u, has the highest bit of a 32-bit number set. Casting that value to a signed 32-bit type therefore causes it to become negative, as if 2^32 had been subtracted from it (that's the implementation-defined operation I mentioned above).
(int) (255u*256u*256u*256u) = 0xFF000000 = -16777216
I write the hexadecimal number there, sans u suffix, to emphasize that the bit pattern of the value does not change when you convert it to a signed type; it is only reinterpreted.
Now, when you assign -16777216 to a uint64_t variable, it is back-converted to unsigned as if by adding 2^64. (Unlike the unsigned-to-signed conversion, this semantic is prescribed by the standard.) This does change the bit pattern, setting all of the high 32 bits of the number to 1 instead of 0 as you had expected:
(uint64_t) (int) (255u*256u*256u*256u) = 0xFFFFFFFFFF000000u
And if you write 0xFFFFFFFFFF000000 in decimal, you get 18446744073692774400.
As a closing piece of advice, whenever you get an "impossible" integer from C or C++, try printing it out in hexadecimal; it's much easier to see oddities of twos-complement fixed-width arithmetic that way.
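Following that advice, a small sketch (assuming a typical two's-complement platform with 32-bit int) that prints the surprising value in both decimal and hex:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint64_t a1 = (uint64_t)(int)(255u * 256u * 256u * 256u); /* no undefined behavior, same "wrong" value */
    printf("decimal: %" PRIu64 "\n", a1);                     /* 18446744073692774400 */
    printf("hex:     0x%" PRIX64 "\n", a1);                   /* 0xFFFFFFFFFF000000 */
    return 0;
}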
The answer is simple -- it overflowed.
Overflow occurred in int arithmetic, and when you assign the result to an unsigned 64-bit integer it is converted to 18446744073692774400 instead of 4278190080.

Why unsigned int 0xFFFFFFFF is equal to int -1?

In C or C++ it is said that the maximum number a size_t (an unsigned integer type) can hold is the same as casting -1 to that data type; for example, see Invalid Value for size_t.
Why?
I mean (talking about 32-bit ints), AFAIK the most significant bit holds the sign in a signed data type (that is, bit 0x80000000 to form a negative number). Then 1 is 0x00000001, and 0x7FFFFFFF is the greatest positive number an int data type can hold.
Then, AFAIK the binary representation of -1 as an int should be 0x80000001 (perhaps I'm wrong). Why/how is this binary value converted to something completely different (0xFFFFFFFF) when casting an int to unsigned? Or, how is it possible to form a binary -1 out of 0xFFFFFFFF?
I have no doubt that in C ((unsigned int)-1) == 0xFFFFFFFF or ((int)0xFFFFFFFF) == -1 is as true as 1 + 1 == 2; I'm just wondering why.
C and C++ can run on many different architectures, and machine types. Consequently, they can have different representations of numbers: Two's complement, and Ones' complement being the most common. In general you should not rely on a particular representation in your program.
For unsigned integer types (size_t being one of those), the C standard (and the C++ standard too, I think) specifies precise overflow rules. In short, if SIZE_MAX is the maximum value of the type size_t, then the expression
(size_t) (SIZE_MAX + 1)
is guaranteed to be 0, and therefore, you can be sure that (size_t) -1 is equal to SIZE_MAX. The same holds true for other unsigned types.
Note that the above holds true:
for all unsigned types,
even if the underlying machine doesn't represent numbers in Two's complement. In this case, the compiler has to make sure the identity holds true.
Also, the above means that you can't rely on specific representations for signed types.
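A minimal check of that guarantee (SIZE_MAX needs <stdint.h> and C99 or later; the assertions are my own sketch):
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stddef.h>

int main(void) {
    assert((size_t)-1 == SIZE_MAX);        /* guaranteed by the unsigned conversion rules */
    assert((unsigned int)-1 == UINT_MAX);  /* same rule for every unsigned type */
    assert((unsigned char)-1 == UCHAR_MAX);
    return 0;
}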
Edit: In order to answer some of the comments:
Let's say we have a code snippet like:
int i = -1;
long j = i;
There is a type conversion in the assignment to j. Assuming that int and long have different sizes (most [all?] 64-bit systems), the bit-patterns at memory locations for i and j are going to be different, because they have different sizes. The compiler makes sure that the values of i and j are -1.
Similarly, when we do:
size_t s = (size_t) -1;
There is a type conversion going on. The -1 is of type int. It has a bit-pattern, but that is irrelevant for this example because when the conversion to size_t takes place due to the cast, the compiler will translate the value according to the rules for the type (size_t in this case). Thus, even if int and size_t have different sizes, the standard guarantees that the value stored in s above will be the maximum value that size_t can take.
If we do:
long j = LONG_MAX;
int i = j;
If LONG_MAX is greater than INT_MAX, then the value in i is implementation-defined (C89, section 3.2.1.2).
It's called two's complement. To make a negative number, invert all the bits then add 1. So to convert 1 to -1, invert it to 0xFFFFFFFE, then add 1 to make 0xFFFFFFFF.
As to why it's done this way, Wikipedia says:
The two's-complement system has the advantage of not requiring that the addition and subtraction circuitry examine the signs of the operands to determine whether to add or subtract. This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic.
Your first question, about why (unsigned)-1 gives the largest possible unsigned value, is only accidentally related to two's complement. The reason -1 cast to an unsigned type gives the largest value possible for that type is because the standard says the unsigned types "follow the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer."
Now, for 2's complement, the representation of the largest possible unsigned value and -1 happen to be the same -- but even if the hardware uses another representation (e.g. 1's complement or sign/magnitude), converting -1 to an unsigned type must still produce the largest possible value for that type.
Two's complement is very nice for doing subtraction just like addition :)
11111110 (254 or -2)
+00000001 ( 1)
---------
11111111 (255 or -1)
11111111 (255 or -1)
+00000001 ( 1)
---------
100000000 ( 0 + 256)
That is two's complement encoding.
The main bonus is that you get the same encoding whether you are using an unsigned or signed int. If you subtract 1 from 0 the integer simply wraps around. Therefore 1 less than 0 is 0xFFFFFFFF.
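For example (a sketch; the hex value assumes a 32-bit unsigned int):
unsigned int x = 0u;
x = x - 1u;           /* well-defined wrap-around: x == UINT_MAX, i.e. 0xFFFFFFFF on 32-bit */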
Because the bit pattern for an int of -1 is 0xFFFFFFFF in hexadecimal, or 11111111111111111111111111111111 in binary, when the bits are read as unsigned.
In an int, the first bit signifies whether the value is negative.
In an unsigned int, that first bit is just another value bit, because an unsigned int cannot be negative; the extra bit lets an unsigned int store bigger numbers.
So for an unsigned int, 11111111111111111111111111111111 (binary) or 0xFFFFFFFF (hexadecimal) is the biggest number it can store.
Unsigned ints are not recommended because if they go below zero they wrap around to the biggest number.