int plus unsigned int returns an unsigned int. Should it be so?
Consider this code:
#include <boost/static_assert.hpp>
#include <boost/typeof/typeof.hpp>
#include <boost/type_traits/is_same.hpp>
class test
{
static const int si = 0;
static const unsigned int ui = 0;
typedef BOOST_TYPEOF(si + ui) type;
BOOST_STATIC_ASSERT( ( boost::is_same<type, int>::value ) ); // fails
};
int main()
{
return 0;
}
If by "should it be" you mean "does my compiler behave according to the standard": yes.
C++2003: Clause 5, paragraph 9:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
blah
Otherwise, blah,
Otherise, blah, ...
Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
If by "should it be" you mean "would the world be a better place if it didn't": I'm not competent to answer that.
Unsigned integer types mostly behave as members of a wrapping abstract algebraic ring of values which are equivalent mod 2^N; one might view an N-bit unsigned integer not as representing a particular integer, but rather the set of all integers with a particular value in the bottom N bits. For example, if one adds together two binary numbers whose last 4 digits are ...1001 and ...0101, the result will be ...1110. If one adds ...1111 and ...0001, the result will be ...0000; if one subtracts ...0001 from ...0000 the result will be ...1111. Note that concepts of overflow or underflow don't really mean anything, since the upper-bit values of the operands are unknown and the upper-bit values of the result are of no interest. Note also that adding a signed integer whose upper bits are known to one whose upper bits are "don't know/don't care" should yield a number whose upper bits are "don't know/don't care" (which is what unsigned integer types mostly behave as).
The only places where unsigned integer types fail to behave as members of a wrapping algebraic ring is when they participate in comparisons, are used in numerical division (which implies comparisons), or are promoted to other types. If the only way to convert an unsigned integer type to something larger was to use an operator or function for that purpose, the use of such an operator or function could make clear that it was making assumptions about the upper bits (e.g. turning "some number whose lower bits are ...00010110" into "the number whose lower bits are ...00010110 and whose upper bits are all zeroes). Unfortunately, C doesn't do that. Adding a signed value to an unsigned value of equal size yields a like-size unsigned value (which makes sense with the interpretation of unsigned values above), but adding a larger signed integer to an unsigned type will cause the compiler to silently assume that all upper bits of the latter are zeroes. This behavior can be especially vexing in cases where, depending upon a compilers' promotion rules, some compilers may deem two expressions as having the same size while others may view them as different sizes.
It is likely that the behavior stems from the logic behind pointer types (memory location, e.g. std::size_t) plus a memory location difference (std::ptrdiff_t) is also a memory location.
In other words, std::size_t = std::size_t + std::ptrdiff_t.
When this logic is translated to underlaying types this means, unsigned long = unsigned long + long, or unsigned = unsigned + int.
The "other" explanation from #supercat is also possibly correct.
What is clear is that unsigned integer were not designed or should not be interpreted to be mathematical positive numbers, no even in principle. See https://www.youtube.com/watch?v=wvtFGa6XJDU
Related
I have the below simple program:
#include <iostream>
#include <stdio.h>
void SomeFunction(int a)
{
std::cout<<"Value in function: a = "<<a<<std::endl;
}
int main(){
size_t a(0);
std::cout<<"Value in main: "<<a-1<<std::endl;
SomeFunction(a-1);
return 0;
}
Upon executing this I get:
Value in main: 18446744073709551615
Value in function: a = -1
I think I roughly understand why the function gets the 'correct' value of -1: there is an implicit conversion from the unsigned type to the signed one i.e. 18446744073709551615(unsigned) = -1(signed).
Is there any situation where the function will not get the 'correct' value?
Since size_t type is unsigned, subtracting 1 is well defined:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
However, the resultant value of 264-1 is out of ints range, so you get implementation-defined behavior:
[when] the new type is signed and the value cannot be represented in it, either the result is implementation-defined or an implementation-defined signal is raised.
Therefore, the answer to your question is "yes": there are platforms where the value of a would be different; there are also platforms where instead of calling SomeFunction the program will raise a signal.
Not on your computer... but technically yes, there is a situation where things can go wrong.
All modern PCs use the "two's complement" system for signed integer arithmetic (read Wikipedia for details). Two's complement has many advantages, but one of the biggest is this: unsaturated addition and subtraction of signed integers is identical to that of unsigned integers. As long as overflow/underflow causes the result to "wrap around" (i.e., 0-1 = UINT_MAX), the computer can add and subtract without even knowing whether you're interpreting the numbers as signed or unsigned.
BUT! C/C++ do not technically require two's complement for signed integers. There are two other permissible systems, known as "sign-magnitude" and "one's complement". These are unusual systems, never found outside antique architectures and embedded processors (and rarely even there). But in those systems, signed and unsigned arithmetic do not match up, and (signed)(a+b) will not necessarily equal (signed)a + (signed) b.
There's another, more mundane caveat when you're also narrowing types, as is the case between size_t and int on x64, because C/C++ don't require compilers to follow a particular rule when narrowing out-of-range values to signed types. This is likewise more a matter of language lawyering than actual unsafeness, though: VC++, GCC, Clang, and all other compilers I'm aware of narrow through truncation, leading to the expected behavior.
It's easier to compare and contrast signed and insigned of type same basic type, say signed int and unsigned int.
On a system that uses 32 bits for int, the range of unsigned int is [0 - 4294967295] and the range of signed int is [-2147483647 - 2147483647].
Say you have a variable of type unsigned int and its value is greater than 2147483647. If you pass such a variable to SomeFunction, you will see an incorrect value in the function.
Conversely, say you have a variable of type signed int and its value is less than zero. If you pass such a variable to a function that expects an unsigned int, you will see an incorrect value in the function.
I was looking at this video. Bjarne Stroustrup says that unsigned ints are error prone and lead to bugs. So, you should only use them when you really need them. I've also read in one of the question on Stack Overflow (but I don't remember which one) that using unsigned ints can lead to security bugs.
How do they lead to security bugs? Can someone clearly explain it by giving an suitable example?
One possible aspect is that unsigned integers can lead to somewhat hard-to-spot problems in loops, because the underflow leads to large numbers. I cannot count (even with an unsigned integer!) how many times I made a variant of this bug
for(size_t i = foo.size(); i >= 0; --i)
...
Note that, by definition, i >= 0 is always true. (What causes this in the first place is that if i is signed, the compiler will warn about a possible overflow with the size_t of size()).
There are other reasons mentioned Danger – unsigned types used here!, the strongest of which, in my opinion, is the implicit type conversion between signed and unsigned.
One big factor is that it makes loop logic harder: Imagine you want to iterate over all but the last element of an array (which does happen in the real world). So you write your function:
void fun (const std::vector<int> &vec) {
for (std::size_t i = 0; i < vec.size() - 1; ++i)
do_something(vec[i]);
}
Looks good, doesn't it? It even compiles cleanly with very high warning levels! (Live) So you put this in your code, all tests run smoothly and you forget about it.
Now, later on, somebody comes along an passes an empty vector to your function. Now with a signed integer, you hopefully would have noticed the sign-compare compiler warning, introduced the appropriate cast and not have published the buggy code in the first place.
But in your implementation with the unsigned integer, you wrap and the loop condition becomes i < SIZE_T_MAX. Disaster, UB and most likely crash!
I want to know how they lead to security bugs?
This is also a security problem, in particular it is a buffer overflow. One way to possibly exploit this would be if do_something would do something that can be observed by the attacker. They might be able to find what input went into do_something, and that way data the attacker should not be able to access would be leaked from your memory. This would be a scenario similar to the Heartbleed bug. (Thanks to ratchet freak for pointing that out in a comment.)
I'm not going to watch a video just to answer a question, but one issue is the confusing conversions which can happen if you mix signed and unsigned values. For example:
#include <iostream>
int main() {
unsigned n = 42;
int i = -42;
if (i < n) {
std::cout << "All is well\n";
} else {
std::cout << "ARITHMETIC IS BROKEN!\n";
}
}
The promotion rules mean that i is converted to unsigned for the comparison, giving a large positive number and a surprising result.
Although it may only be considered as a variant of the existing answers: Referring to "Signed and unsigned types in interfaces," C++ Report, September 1995 by Scott Meyers, it's particularly important to avoid unsigned types in interfaces.
The problem is that it becomes impossible to detect certain errors that clients of the interface could make (and if they could make them, they will make them).
The example given there is:
template <class T>
class Array {
public:
Array(unsigned int size);
...
and a possible instantiation of this class
int f(); // f and g are functions that return
int g(); // ints; what they do is unimportant
Array<double> a(f()-g()); // array size is f()-g()
The difference of the values returned by f() and g() might be negative, for an awful number of reasons. The constructor of the Array class will receive this difference as a value that is implicitly converted to be unsigned. Thus, as the implementor of the Array class, one can not distinguish between an erreonously passed value of -1, and a very large array allocation.
The big problem with unsigned int is that if you subtract 1 from an unsigned int 0, the result isn't a negative number, the result isn't less than the number you started with, but the result is the largest possible unsigned int value.
unsigned int x = 0;
unsigned int y = x - 1;
if (y > x) printf ("What a surprise! \n");
And this is what makes unsigned int error prone. Of course unsigned int works exactly as it is designed to work. It's absolutely safe if you know what you are doing and make no mistakes. But most people make mistakes.
If you are using a good compiler, you turn on all the warnings that the compiler produces, and it will tell you when you do dangerous things that are likely to be mistakes.
The problem with unsigned integer types is that depending upon their size they may represent one of two different things:
Unsigned types smaller than int (e.g. uint8) hold numbers in the range 0..2ⁿ-1, and calculations with them will behave according to the rules of integer arithmetic provided they don't exceed the range of the int type. Under present rules, if such a calculation exceeds the range of an int, a compiler is allowed to do anything it likes with the code, even going so far as to negate the laws of time and causality (some compilers will do precisely that!), and even if the result of the calculation would be assigned back to an unsigned type smaller than int.
Unsigned types unsigned int and larger hold members of the abstract wrapping algebraic ring of integers congruent mod 2ⁿ; this effectively means that if a calculation goes outside the range 0..2ⁿ-1, the system will add or subtract whatever multiple of 2ⁿ would be required to get the value back in range.
Consequently, given uint32_t x=1, y=2; the expression x-y may have one of two meanings depending upon whether int is larger than 32 bits.
If int is larger than 32 bits, the expression will subtract the number 2 from the number 1, yielding the number -1. Note that while a variable of type uint32_t can't hold the value -1 regardless of the size of int, and storing either -1 would cause such a variable to hold 0xFFFFFFFF, but unless or until the value is coerced to an unsigned type it will behave like the signed quantity -1.
If int is 32 bits or smaller, the expression will yield a uint32_t value which, when added to the uint32_t value 2, will yield the uint32_t value 1 (i.e. the uint32_t value 0xFFFFFFFF).
IMHO, this problem could be solved cleanly if C and C++ were to define new unsigned types [e.g. unum32_t and uwrap32_t] such that a unum32_t would always behave as a number, regardless of the size of int (possibly requiring the right-hand operation of a subtraction or unary minus to be promoted to the next larger signed type if int is 32 bits or smaller), while a wrap32_t would always behave as a member of an algebraic ring (blocking promotions even if int were larger than 32 bits). In the absence of such types, however, it's often impossible to write code which is both portable and clean, since portable code will often require type coercions all over the place.
Numeric conversion rules in C and C++ are a byzantine mess. Using unsigned types exposes yourself to that mess to a much greater extent than using purely signed types.
Take for example the simple case of a comparison between two variables, one signed and the other unsigned.
If both operands are smaller than int then they will both be converted to int and the comparison will give numerically correct results.
If the unsigned operand is smaller than the signed operand then both will be converted to the type of the signed operand and the comparison will give numerically correct results.
If the unsigned operand is greater than or equal in size to the signed operand and also greater than or equal in size to int then both will be converted to the type of the unsigned operand. If the value of the signed operand is less than zero this will lead to numerically incorrect results.
To take another example consider multiplying two unsigned integers of the same size.
If the operand size is greater than or equal to the size of int then the multiplication will have defined wraparound semantics.
If the operand size is smaller than int but greater than or equal to half the size of int then there is the potential for undefined behaviour.
If the operand size is less than half the size of int then the multiplication will produce numerically correct results. Assigning this result back to a variable of the original unsigned type will produce defined wraparound semantics.
In addition to range/warp issue with unsigned types. Using mix of unsigned and signed integer types impact significant performance issue for processor. Less then floating point cast, but quite a lot to ignore that. Additionally compiler may place range check for the value and change the behavior of further checks.
This question already has answers here:
sizeof() operator in if-statement
(5 answers)
Closed 4 years ago.
#include <stdio.h>
int arr[] = {1,2,3,4,5,6,7,8};
#define SIZE (sizeof(arr)/sizeof(int))
int main()
{
printf("SIZE = %d\n", SIZE);
if ((-1) < SIZE)
printf("less");
else
printf("more");
}
The output after compiling with gcc is "more". Why the if condition fails even when -1 < 8?
The problem is in your comparison:
if ((-1) < SIZE)
sizeof typically returns an unsigned long, so SIZE will be unsigned long, whereas -1 is just an int. The rules for promotion in C and related languages mean that -1 will be converted to size_t before the comparison, so -1 will become a very large positive value (the maximum value of an unsigned long).
One way to fix this is to change the comparison to:
if (-1 < (long long)SIZE)
although it's actually a pointless comparison, since an unsigned value will always be >= 0 by definition, and the compiler may well warn you about this.
As subsequently noted by #Nobilis, you should always enable compiler warnings and take notice of them: if you had compiled with e.g. gcc -Wall ... the compiler would have warned you of your bug.
TL;DR
Be careful with mixed signed/unsigned operations (use -Wall compiler warnings). The Standard has a long section about it. In particular, it is often but not always true that signed is value-converted to unsigned (although it does in your particular example). See this explanation below (taken from this Q&A)
Relevant quote from the C++ Standard:
5 Expressions [expr]
10 Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
[2 clauses about equal types or types of equal sign omitted]
— Otherwise, if the operand that has unsigned integer type has rank
greater than or equal to the rank of the type of the other operand,
the operand with signed integer type shall be converted to the type of
the operand with unsigned integer type.
— Otherwise, if the type of
the operand with signed integer type can represent all of the values
of the type of the operand with unsigned integer type, the operand
with unsigned integer type shall be converted to the type of the
operand with signed integer type.
— Otherwise, both operands shall be
converted to the unsigned integer type corresponding to the type of
the operand with signed integer type.
Your actual example
To see into which of the 3 cases your program falls, modify it slightly to this
#include <stdio.h>
int arr[] = {1,2,3,4,5,6,7,8};
#define SIZE (sizeof(arr)/sizeof(int))
int main()
{
printf("SIZE = %zu, sizeof(-1) = %zu, sizeof(SIZE) = %zu \n", SIZE, sizeof(-1), sizeof(SIZE));
if ((-1) < SIZE)
printf("less");
else
printf("more");
}
On the Coliru online compiler, this prints 4 and 8 for the sizeof() of -1 and SIZE, respectively, and selects the "more" branch (live example).
The reason is that the unsigned type is of greater rank than the signed type. Hence, clause 1 applies and the signed type is value-converted to the unsigned type (on most implementation, typically by preserving the bit-representation, so wrapping around to a very large unsigned number), and the comparison then proceeds to select the "more" branch.
Variations on a theme
Rewriting the condition to if ((long long)(-1) < (unsigned)SIZE) would take the "less" branch (live example).
The reason is that the signed type is of greater rank than the unsigned type and can also accomodate all the unsigned values. Hence, clause 2 applies and the unsigned type is converted to the signed type, and the comparison then proceeds to select the "less" branch.
Of course, you would never write such a contrived if() statement with explicit casts, but the same effect could happen if you compare variables with types long long and unsigned. So it illustrates the point that mixed signed/unsigned arithmetic is very subtle and depends on the relative sizes ("ranking" in the words of the Standard). In particular, there is no fixed rules saying that signed will always be converted to unsigned.
When you do comparison between signed and unsigned where unsigned has at least an equal rank to that of the signed type (see TemplateRex's answer for the exact rules), the signed is converted to the type of the unsigned.
With regards to your case, on a 32bit machine the binary representation of -1 as unsigned is 4294967295. So in effect you are comparing if 4294967295 is smaller than 8 (it isn't).
If you had enabled warnings, you would have been warned by the compiler that something fishy is going on:
warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Since the discussion has shifted a bit on how appropriate the use of unsigned is, let me put a quote by James Gosling with regards to the lack of unsigned types in Java (and I will shamelessly link to another post of mine on the subject):
Gosling: For me as a language designer, which I don't really count
myself as these days, what "simple" really ended up meaning was could
I expect J. Random Developer to hold the spec in his head. That
definition says that, for instance, Java isn't -- and in fact a lot of
these languages end up with a lot of corner cases, things that nobody
really understands. Quiz any C developer about unsigned, and pretty
soon you discover that almost no C developers actually understand what
goes on with unsigned, what unsigned arithmetic is. Things like that
made C complex. The language part of Java is, I think, pretty simple.
The libraries you have to look up.
This is an historical design bug of C that was also repeated in C++.
It dates back to 16-bit computers and the error was deciding to use all 16 bits to represent sizes up to 65536 giving up the possibility to represent negative sizes.
This in se wouldn't have been an error if unsigned meaning was "non-negative integer" (a size cannot logically be negative) but it's a problem with the conversion rules of the language.
Given the conversion rules of the language the unsigned type in C doesn't represent a non-negative number, but it's instead more like a bitmask (the mathematical term is actually "a member of the ℤ/n ring"). To see why consider that for the C and C++ language
unsigned - unsigned gives an unsigned result
signed + unsigned gives and unsigned result
both of them clearly make no sense at all if you read unsigned as "non-negative number".
Of course saying that the size of an object is a member of ℤ/n ring doesn't make any sense at all and here it's where the error resides.
Practical implications:
Every time you deal with the size of an object be careful because the value is unsigned and that type in C/C++ has a lot of properties that are illogical for a number. Please always remember that unsigned doesn't mean "non-negative integer" but "member of ℤ/n algebraic ring" and that, most dangerous, in case of a mixed operation an int is converted to unsigned int and not the opposite.
For example:
void drawPolyline(const std::vector<P2d>& pts) {
for (int i=0; i<pts.size()-1; i++) {
drawLine(pts[i], pts[i+1]);
}
}
is buggy, because if passed an empty vector of points it will do illegal (UB) operations. The reason is that pts.size() is an unsigned.
The rules of the language will convert 1 (an integer) to 1{mod n}, will perform the subtraction in ℤ/n resulting in (size-1){mod n}, will convert i also to a {mod n} representation and will do the comparison in ℤ/n.
C/C++ actually defines a < operator in ℤ/n (rarely done in math) and you will end up accessing pts[0], pts[1] ... and so on until huge numbers even if the input vector was empty.
A correct loop could be
void drawPolyline(const std::vector<P2d>& pts) {
for (int i=1; i<pts.size(); i++) {
drawLine(pts[i-1], pts[i]);
}
}
but I normally prefer
void drawPolyline(const std::vector<P2d>& pts) {
for (int i=0,n=pts.size(); i<n-1; i++) {
drawLine(pts[i], pts[i+1]);
}
}
in other words getting rid of unsigned as soon as possible, and just working with regular ints.
Never use unsigned to represent size of containers or counters because unsigned means "member of ℤ/n" and the size of a container is not one of those things. Unsigned types are useful, but NOT to represent size of objects.
The standard C/C++ library unfortunately made this wrong choice, and it's too late to fix it. You are not forced to do the same mistake however.
In the words of Bjarne Stroustrup:
Using an unsigned instead of an int to gain one more bit to represent
positive integers is almost never a good idea. Attempts to ensure that
some values are positive by declaring variables unsigned will
typically be defeated by the implicit conversion rules
well, i'm not going to repeat the strong words Paul R said, but when you are comparing unsigned and integers you are going to experience dome bad things.
do if ((-1) < (int)SIZE)
instead of your if condition
Convert the unsigned type returned from sizeof operator to signed
when you compare two unsigned and signed number compiler implicitly converts signed to unsigned.
-1 signed representation in 4 byte int is 11111111 11111111 11111111 11111111 when converted to unsigned this representation would refer to 2^16-1
So basically your are comparing that 2^16-1>SIZE, which would be true.
You have to override that by explicitly casting the unsigned value to signed.
Since sizeof operator returns unsigned long long you should cast it to signed long long
if((-1)<(signed long long)SIZE)
use this if condition in your code
I've spent some time poring over the standard references, but I've not been able to find an answer to the following:
is it technically guaranteed by the C/C++ standard that, given a signed integral type S and its unsigned counterpart U, the absolute value of each possible S is always less than or equal to the maximum value of U?
The closest I've gotten is from section 6.2.6.2 of the C99 standard (the wording of the C++ is more arcane to me, I assume they are equivalent on this):
For signed integer types, the bits of the object representation shall be divided into three
groups: value bits, padding bits, and the sign bit. (...) Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and Nin the unsigned type, then M≤N).
So, in hypothetical 4-bit signed/unsigned integer types, is anything preventing the unsigned type to have 1 padding bit and 3 value bits, and the signed type having 3 value bits and 1 sign bit? In such a case the range of unsigned would be [0,7] and for signed it would be [-8,7] (assuming two's complement).
In case anyone is curious, I'm relying at the moment on a technique for extracting the absolute value of a negative integer consisting of first a cast to the unsigned counterpart, and then the application of the unary minus operator (so that for instance -3 becomes 4 via cast and then 3 via unary minus). This would break on the example above for -8, which could not be represented in the unsigned type.
EDIT: thanks for the replies below Keith and Potatoswatter. Now, my last point of doubt is on the meaning of "subrange" in the wording of the standard. If it means a strictly "less-than" inclusion, then my example above and Keith's below are not standard-compliant. If the subrange is intended to be potentially the whole range of unsigned, then they are.
For C, the answer is no, there is no such guarantee.
I'll discuss types int and unsigned int; this applies equally to any corresponding pair of signed and unsigned types (other than char and unsigned char, neither of which can have padding bits).
The standard, in the section you quoted, implicitly guarantees that UINT_MAX >= INT_MAX, which means that every non-negative int value can be represented as an unsigned int.
But the following would be perfectly legal (I'll use ** to denote exponentiation):
CHAR_BIT == 8
sizeof (int) == 4
sizeof (unsigned int) == 4
INT_MIN = -2**31
INT_MAX = +2**31-1
UINT_MAX = +2**31-1
This implies that int has 1 sign bit (as it must) and 31 value bits, an ordinary 2's-complement representation, and unsigned int has 31 value bits and one padding bit. unsigned int representations with that padding bit set might either be trap representations, or extra representations of values with the padding bit unset.
This might be appropriate for a machine with support for 2's-complement signed arithmetic, but poor support for unsigned arithmetic.
Given these characteristics, -INT_MIN (the mathematical value) is outside the range of unsigned int.
On the other hand, I seriously doubt that there are any modern systems like this. Padding bits are permitted by the standard, but are very rare, and I don't expect them to become any more common.
You might consider adding something like this:
#if -INT_MIN > UINT_MAX
#error "Nope"
#endif
to your source, so it will compile only if you can do what you want. (You should think of a better error message than "Nope", of course.)
You got it. In C++11 the wording is more clear. §3.9.1/3:
The range of non-negative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.
But, what really is the significance of the connection between the two corresponding types? They are the same size, but that doesn't matter if you just have local variables.
In case anyone is curious, I'm relying at the moment on a technique for extracting the absolute value of a negative integer consisting of first a cast to the unsigned counterpart, and then the application of the unary minus operator (so that for instance -3 becomes 4 via cast and then 3 via unary minus). This would break on the example above for -8, which could not be represented in the unsigned type.
You need to deal with whatever numeric ranges the machine supports. Instead of casting to the unsigned counterpart, cast to whatever unsigned type is sufficient: one larger than the counterpart if necessary. If no large enough type exists, then the machine may be incapable of doing what you want.
This question already has answers here:
Implicit type conversion rules in C++ operators
(9 answers)
Closed 4 years ago.
Consider the following programs:
// http://ideone.com/4I0dT
#include <limits>
#include <iostream>
int main()
{
int max = std::numeric_limits<int>::max();
unsigned int one = 1;
unsigned int result = max + one;
std::cout << result;
}
and
// http://ideone.com/UBuFZ
#include <limits>
#include <iostream>
int main()
{
unsigned int us = 42;
int neg = -43;
int result = us + neg;
std::cout << result;
}
How does the + operator "know" which is the correct type to return? The general rule is to convert all of the arguments to the widest type, but here there's no clear "winner" between int and unsigned int. In the first case, unsigned int must be being chosen as the result of operator+, because I get a result of 2147483648. In the second case, it must be choosing int, because I get a result of -1. Yet I don't see in the general case how this is decidable. Is this undefined behavior I'm seeing or something else?
This is outlined explicitly in §5/9:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
If either operand is of type long double, the other shall be converted to long double.
Otherwise, if either operand is double, the other shall be converted to double.
Otherwise, if either operand is float, the other shall be converted to float.
Otherwise, the integral promotions shall be performed on both operands.
Then, if either operand is unsigned long the other shall be converted to unsigned long.
Otherwise, if one operand is a long int and the other unsigned int, then if a long int can represent all the values of an unsigned int, the unsigned int shall be converted to a long int; otherwise both operands shall be converted to unsigned long int.
Otherwise, if either operand is long, the other shall be converted to long.
Otherwise, if either operand is unsigned, the other shall be converted to unsigned.
[Note: otherwise, the only remaining case is that both operands are int]
In both of your scenarios, the result of operator+ is unsigned. Consequently, the second scenario is effectively:
int result = static_cast<int>(us + static_cast<unsigned>(neg));
Because in this case the value of us + neg is not representable by int, the value of result is implementation-defined – §4.7/3:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.
Before C was standardized, there were differences between compilers -- some followed "value preserving" rules, and others "sign preserving" rules. Sign preserving meant that if either operand was unsigned, the result was unsigned. This was simple, but at times gave rather surprising results (especially when a negative number was converted to an unsigned).
C standardized on the rather more complex "value preserving" rules. Under the value preserving rules, promotion can/does depend on the actual ranges of the types, so you can get different results on different compilers. For example, on most MS-DOS compilers, int is the same size as short and long is different from either. On many current systems int is the same size as long, and short is different from either. With value preserving rules, these can lead to the promoted type being different between the two.
The basic idea of value preserving rules is that it'll promote to a larger signed type if that can represent all the values of the smaller type. For example, a 16-bit unsigned short can be promoted to a 32-bit signed int, because every possible value of unsigned short can be represented as a signed int. The types will be promoted to an unsigned type if and only if that's necessary to preserve the values of the smaller type (e.g., if unsigned short and signed int are both 16 bits, then a signed int can't represent all possible values of unsigned short, so an unsigned short will be promoted to unsigned int).
When you assign the result as you have, the result will get converted to the destination type anyway, so most of this makes relatively little difference -- at least in most typical cases, where it'll just copy the bits into the result, and it's up to you to decide whether to interpret that as signed or unsigned.
When you don't assign the result such as in a comparison, things can get pretty ugly though. For example:
unsigned int a = 5;
signed int b = -5;
if (a > b)
printf("Of course");
else
printf("What!");
Under sign preserving rules, b would be promoted to unsigned, and in the process become equal to UINT_MAX - 4, so the "What!" leg of the if would be taken. With value preserving rules, you can manage to produce some strange results a bit like this as well, but 1) primarily on the DOS-like systems where int is the same size as short, and 2) it's generally harder to do it anyway.
It's choosing whatever type you put your result into or at least cout is honoring that type during output.
I don't remember for sure but I think C++ compilers generate the same arithmetic code for both, it's only compares and output that care about sign.