Explain integer comparison with promotion - c++

I'm trying to understand how integer promotion and comparison work in a C++ application.
#include <cstdint>
int main(void)
{
uint32_t foo = 20;
uint8_t a = 2;
uint8_t b = 1;
uint8_t c = 5;
if(foo == b*c) {}
if(foo == a) {}
if(foo == a + c) {}
if(foo == a + b*c) {}
return 0;
}
Only for the last comparison I get a compiler warning: "comparison between signed and unsigned integer expressions [-Wsign-compare]".
Why does this only happen in the last case but not in the others?

Since the operands have different types, a set of implicit conversions takes place to reach a common type.
For the binary operators (except shifts), if the promoted operands
have different types, additional set of implicit conversions is
applied, known as usual arithmetic conversions with the goal to
produce the common type (also accessible via the std::common_type type
trait)
Because the operands here have integral types, the integral conversions branch of the following rules applies:
If either operand has scoped enumeration type, no conversion is performed: the other operand and the return type must have the same
type
Otherwise, if either operand is long double, the other operand is converted to long double
Otherwise, if either operand is double, the other operand is converted to double
Otherwise, if either operand is float, the other operand is converted to float
Otherwise, the operand has integer type (because bool, char, char8_t, char16_t, char32_t, wchar_t, and unscoped enumeration
were promoted at this point) and integral conversions are
applied to produce the common type, as follows:
If both operands are signed or both are unsigned, the operand with lesser conversion rank is converted to the operand with the
greater integer conversion rank
Otherwise, if the unsigned operand's conversion rank is greater or equal to the conversion rank of the signed
operand, the signed operand is converted to the unsigned
operand's type.
Otherwise, if the signed operand's type can represent all values of the unsigned operand, the unsigned operand is converted to the
signed operand's type.
Otherwise, both operands are converted to the unsigned counterpart of the signed operand's type.
The same arithmetic conversions apply to comparison operators too.
From all this one can conclude that, since the right-hand-side operands are all uint8_t, they are promoted to int and each right-hand side ends up as an int; then, since the left-hand side is uint32_t, the common type for the == operator is uint32_t.
But for some reason I don't understand, gcc doesn't seem to do the last conversion while clang does. See the gcc type conversion for the + operator on godbolt.
It also could happen that the warning is a false warning and the conversion took place, as it happened for + operator.
See how clang sees the last if (via cppinsights):
if(foo == static_cast<unsigned int>(static_cast<int>(a) + (static_cast<int>(b) * static_cast<int>(c))))
Update:
I couldn't find a difference in the assembly generated by the two compilers and would agree with @M.M, so, IMO, it's a gcc bug.
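The type reasoning above can be checked at compile time. The following is only a sketch, assuming a 32-bit int; `product_t`, `sum_t` and `equals_sum` are hypothetical names introduced for illustration and do not appear in the question.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Assumption: int is 32 bits wide, so uint32_t is not promoted and wins over int.
static_assert(sizeof(int) * CHAR_BIT == 32, "sketch assumes a 32-bit int");

// uint8_t operands are promoted to int, so every right-hand side is an int.
using product_t = decltype(std::uint8_t{1} * std::uint8_t{5});                    // b*c
using sum_t     = decltype(std::uint8_t{2} + std::uint8_t{1} * std::uint8_t{5});  // a + b*c
static_assert(std::is_same<product_t, int>::value, "b*c has type int");
static_assert(std::is_same<sum_t, int>::value, "a + b*c has type int");

// The == operator then converts the int side to the unsigned common type.
static_assert(std::is_same<std::common_type<std::uint32_t, int>::type, std::uint32_t>::value,
              "uint32_t == int compares as uint32_t");

// Hypothetical helper: the last comparison with the final conversion written out.
constexpr bool equals_sum(std::uint32_t foo, std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    return foo == static_cast<std::uint32_t>(a + b * c);
}
```

With the cast written out, both operands of == already have the same unsigned type, which matches what cppinsights reports for clang.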

It's a compiler "bug". To elaborate on this:
In general, comparison between signed and unsigned relies on implementation-defined quantities (the sizes/ranges of types). For example USHRT_MAX == -1 is true on common 16-bit systems, and false on common 32-bit systems. The answer by "oblivion" goes into more technical detail about this.
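As a rough illustration of the USHRT_MAX example, the sketch below assumes a 16-bit unsigned short and a 32-bit int (the "common 32-bit" case). The 16-bit case cannot be reproduced on such a platform, so it is imitated with UINT_MAX, which has an unsigned type here just as USHRT_MAX would there. The function names are hypothetical.

```cpp
#include <climits>
#include <type_traits>

// Assumption: 16-bit unsigned short and 32-bit int.
static_assert(USHRT_MAX == 65535 && INT_MAX == 2147483647, "sketch assumes 16-bit short, 32-bit int");

// Here USHRT_MAX fits in int, so the macro has type int and the comparison is int == int.
static_assert(std::is_same<decltype(USHRT_MAX), int>::value, "USHRT_MAX has type int");

constexpr bool ushrt_max_equals_minus_one() {
    return USHRT_MAX == -1;  // 65535 == -1 as plain ints
}

// When the macro has an unsigned type instead, -1 is converted to that type first.
// The cast spells out the conversion that `UINT_MAX == -1` would perform implicitly.
constexpr bool uint_max_equals_minus_one() {
    return UINT_MAX == static_cast<unsigned int>(-1);
}
```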
All of your code examples are well-defined and behave the same on all (conforming) systems.
The intent of this warning is twofold:
to alert you to code that might behave differently on other systems.
to alert you to code that might not behave as the coder intended.
However, in general, it's not such a simple job for the compiler's static analysis to sort out the first case, let alone the second case which is rather subjective.
IMO the warning, for your code, is a bug because the code is well-defined and there is nothing to warn about.
Personally I don't enable this warning: I'm familiar with the rules for signed-unsigned comparison and prefer to avoid mangling my code to suppress the warning.
Going to the opposite extreme, some people prefer to avoid all signed-unsigned comparisons in their code even when it is well-defined; and they would consider it a bug that the compiler doesn't warn about your first three code examples.
GCC has tended to err on the side of warning too much, but they are in the situation that they can't please everyone.

Related

Why is signed and unsigned addition converted differently for 16 and 32 bit integers?

It seems the GCC and Clang interpret addition between a signed and unsigned integers differently, depending on their size. Why is this, and is the conversion consistent on all compilers and platforms?
Take this example:
#include <cstdint>
#include <iostream>
int main()
{
std::cout <<"16 bit uint 2 - int 3 = "<<uint16_t(2)+int16_t(-3)<<std::endl;
std::cout <<"32 bit uint 2 - int 3 = "<<uint32_t(2)+int32_t(-3)<<std::endl;
return 0;
}
Result:
$ ./out.exe
16 bit uint 2 - int 3 = -1
32 bit uint 2 - int 3 = 4294967295
In both cases the result is -1, but in one case it was interpreted as an unsigned integer and wrapped around. I would have expected both to be converted in the same way.
So again, why do the compilers convert these so differently, and is this guaranteed to be consistent? I tested this with g++ 11.1.0, clang 12.0. and g++ 11.2.0 on Arch Linux and Debian, getting the same result.
When you do uint16_t(2)+int16_t(-3), both operands are types that are smaller than int. Because of this, each operand is promoted to an int and signed + signed results in a signed integer and you get the result of -1 stored in that signed integer.
When you do uint32_t(2)+int32_t(-3), since both operands are the size of an int or larger, no promotion happens and now you are in a case where you have unsigned + signed which results in a conversion of the signed integer into an unsigned integer, and the unsigned value of -1 wraps to being the largest value representable.
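The two cases can be confirmed without running anything. This is a sketch that assumes a 32-bit int; the alias and constant names are made up for the example.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Assumption: int is 32 bits wide.
static_assert(sizeof(int) * CHAR_BIT == 32, "sketch assumes a 32-bit int");

// Both 16-bit operands are promoted to int, so the sum is a signed int.
using sum16_t = decltype(std::uint16_t(2) + std::int16_t(-3));
static_assert(std::is_same<sum16_t, int>::value, "16-bit operands add as int");

// The 32-bit operands are not promoted; the signed one is converted to unsigned.
using sum32_t = decltype(std::uint32_t(2) + std::int32_t(-3));
static_assert(std::is_same<sum32_t, std::uint32_t>::value, "32-bit operands add as uint32_t");

constexpr sum16_t sum16 = std::uint16_t(2) + std::int16_t(-3);  // signed result
constexpr sum32_t sum32 = std::uint32_t(2) + std::int32_t(-3);  // wraps modulo 2^32
```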
So again, why do the compilers convert these so differently,
Standard quotes for [language-lawyer]:
[expr.arith.conv]
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way.
The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
...
Otherwise, the integral promotions ([conv.prom]) shall be performed on both operands.
Then the following rules shall be applied to the promoted operands:
If both operands have the same type, no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
[conv.prom]
A prvalue of an integer type other than bool, char8_t, char16_t, char32_t, or wchar_t whose integer conversion rank ([conv.rank]) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
These conversions are called integral promotions.
std::uint16_t type may have a lower conversion rank than int in which case it will be promoted when used as an operand. int may be able to represent all values of std::uint16_t in which case the promotion will be to int. The common type of two int is int.
std::uint32_t type may have the same or a higher conversion rank than int in which case it won't be promoted. The common type of an unsigned type and a signed of same rank is an unsigned type.
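One way to observe the promotion step on its own is the unary + operator, whose result has the promoted type of its operand. The sketch below assumes a 32-bit int; `promoted_t` is a hypothetical alias, not something from the standard library.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Assumption: int is 32 bits wide.
static_assert(sizeof(int) * CHAR_BIT == 32, "sketch assumes a 32-bit int");

// Hypothetical probe: unary + applies the integral promotions and nothing else.
template <typename T>
using promoted_t = decltype(+T{});

// Lower rank than int and representable in int: promoted to int.
static_assert(std::is_same<promoted_t<std::uint16_t>, int>::value, "uint16_t promotes to int");
static_assert(std::is_same<promoted_t<std::int8_t>, int>::value, "int8_t promotes to int");

// Same rank as int here: left unchanged by promotion.
static_assert(std::is_same<promoted_t<std::uint32_t>, std::uint32_t>::value, "uint32_t is not promoted");

// Signed and unsigned of the same rank: the common type is the unsigned one.
static_assert(std::is_same<std::common_type<int, std::uint32_t>::type, std::uint32_t>::value,
              "int and uint32_t meet at uint32_t");
```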
For an explanation of why this conversion behaviour was chosen, see chapter "6.3.1.1 Booleans, characters, and integers" of "Rationale for International Standard - Programming Languages - C". I won't quote the entire chapter here.
is this guaranteed to be consistent?
The consistency depends on the relative sizes of the integer types, which are implementation-defined.
Why is this,
C (and hence C++) has a rule that effectively says when a type smaller than int is used in an expression it is first promoted to int (the actual rule is a little more complex than that to allow for multiple distinct types of the same size).
Section 6.3.1.1 of the Rationale for International Standard Programming Languages C claims that in early C compilers there were two versions of the promotion rule. "unsigned preserving" and "value preserving" and talks about why they chose the "value preserving" option. To summarise they believed it would produce correct results in a greater proportion of situations.
It does not however explain why the concept of promotion exists in the first place. I would speculate that it existed because on many processors, including the PDP-11 for which C was originally designed, arithmetic operations only operated on words, not on units smaller than words. So it was simpler and more efficient to convert everything smaller than a word to a word at the start of an expression.
On most platforms today int is 32 bits. So both uint16_t and int16_t are promoted to int. The arithmetic proceeds to produce a result of type int with a value of -1.
OTOH uint32_t and int32_t are not smaller than int, so they retain their original size and signedness through the promotion step. The rules for when the operands to an arithmetic operator are of different types come into play and since the operands are the same size the signed operand is converted to unsigned.
The rationale does not seem to talk about this rule, which suggests it goes back to pre-standard C.
and is the conversion consistent on all compilers and platforms?
On an ANSI C or ISO C++ platform it depends on the size of int. With 16-bit int both examples would give large positive values. With 64-bit int both examples would give -1.
On pre-standard implementations it's possible that both expressions might return large positive numbers.
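A platform with a 64-bit int is hard to come by, but the effect of a wider signed type can be approximated on an ordinary platform by making the signed operand int64_t. This is not the same thing as a 64-bit int (no promotion is involved); it is only a sketch of the rule that a signed type able to represent every value of the unsigned operand wins. It assumes a 32-bit int, and the names are hypothetical.

```cpp
#include <climits>
#include <cstdint>
#include <type_traits>

// Assumption: int is 32 bits wide, so uint32_t is unsigned int.
static_assert(sizeof(int) * CHAR_BIT == 32, "sketch assumes a 32-bit int");

// int64_t can represent every uint32_t value, so the unsigned operand is
// converted to the signed type and the arithmetic is signed.
using wide_sum_t = decltype(std::uint32_t(2) + std::int64_t(-3));
static_assert(std::is_same<wide_sum_t, std::int64_t>::value, "uint32_t + int64_t is int64_t");

constexpr wide_sum_t wide_sum = std::uint32_t(2) + std::int64_t(-3);
```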
A 16-bit unsigned int can be promoted to a 32-bit int without any lost values due to range differences, so that's what happens. Not so for the 32-bit integers.

C++: Does Comparing different sized integers cause UB? [duplicate]

So this is probably a really simple question and if it was not about C++ I would just go ahead and check if it works on my computer or not, but unfortunately in C++ things usually tend to work on a couple of systems while still being UB and therefore not working on other systems.
Consider the following code snippet:
unsigned long long int a = std::numeric_limits< unsigned long long int >::max();
unsigned int b = 12;
bool test = a > b;
My question is: Can we compare integers of different size with one another without explicitly casting the smaller type to the bigger one using e.g. static_cast without running into undefined behavior (UB)?
In general there are three ways I can imagine this turning out:
The smaller type is implicitly cast to the bigger type before comparison (either via a real cast or by some clever way of being able to "pretend" it had been cast)
The bigger type is truncated to the size of the smaller one before comparison
This is not defined and one needs to add in an explicit cast in order to arrive at defined behavior
This is not undefined behavior. This is covered by the usual arithmetic conversions which are detailed in section 8p11.5 of the C++17 standard:
The integral promotions (7.6) shall be performed on both operands.
Then the following rules shall be applied to the promoted operands:
(11.5.1) If both operands have the same type, no further conversion is needed.
(11.5.2) Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser
integer conversion rank shall be converted to the type of the operand
with greater rank.
(11.5.3) Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other
operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
(11.5.4) Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with
unsigned integer type, the operand with unsigned integer type shall be
converted to the type of the operand with signed integer type.
(11.5.5) Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed
integer type.
Rule (11.5.2) is what applies here. Since both types are unsigned, the smaller type is converted to the larger type, as the former can hold only a subset of the values the latter can hold.
This is safe. C++ has what are called the Usual arithmetic conversions and they handle how to implicitly convert the objects passed to the built in binary operators.
In this case, b is converted to unsigned long long int for you and then operator > is evaluated. Strictly speaking this is an integral conversion rather than an integral promotion, since unsigned int is not narrower than int.
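The common type for the snippet in the question can be inspected with std::common_type or decltype. This is a small sketch; `big_t`, `small_t` and `max_exceeds` are illustrative names that are not part of the question.

```cpp
#include <limits>
#include <type_traits>

using big_t   = unsigned long long int;
using small_t = unsigned int;

// Both operands are unsigned, so the one of lesser rank is converted to the other.
static_assert(std::is_same<std::common_type<big_t, small_t>::type, big_t>::value,
              "common type is unsigned long long");
static_assert(std::is_same<decltype(big_t{} + small_t{}), big_t>::value,
              "arithmetic is carried out in unsigned long long");

// Hypothetical helper mirroring `a > b` from the question; b is widened, nothing is truncated.
constexpr bool max_exceeds(small_t b) {
    return std::numeric_limits<big_t>::max() > b;
}
```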

C++ Unexpected Integer Promotion

I was writing some code recently that was actually supposed to test other code, and I stumbled upon a surprising case of integer promotion. Here's the minimal testcase:
#include <cstdint>
#include <limits>
int main()
{
std::uint8_t a, b;
a = std::numeric_limits<std::uint8_t>::max();
b = a;
a = a + 1;
if (a != b + 1)
return 1;
else
return 0;
}
Surprisingly this program returns 1. Some debugging and a hunch revealed that b + 1 in the conditional was actually returning 256, while a + 1 in assignment produced the expected value of 0.
Section 8.10.6 (on the equality/inequality operators) of the C++17 draft states that
If both operands are of arithmetic or enumeration type, the usual arithmetic conversions are performed on
both operands; each of the operators shall yield true if the specified relationship is true and false if it is
false.
What are "the usual arithmetic conversions", and where are they defined in the standard? My guess is that they implicitly promote smaller integers to int or unsigned int for certain operators (which is also supported by the fact that replacing std::uint8_t with unsigned int yields 0, and further in that the assignment operator lacks the "usual arithmetic conversions" clause).
What are "the usual arithmetic conversions", and where are they defined in the standard?
[expr.arith.conv]/1
Many binary operators that expect operands of arithmetic or
enumeration type cause conversions and yield result types in a similar
way. The purpose is to yield a common type, which is also the type of
the result. This pattern is called the usual arithmetic conversions,
which are defined as follows:
(1.1) If either operand is of scoped enumeration type, no conversions
are performed; if the other operand does not have the same type, the
expression is ill-formed.
(1.2) If either operand is of type long double, the other shall be
converted to long double.
(1.3) Otherwise, if either operand is double, the other shall be
converted to double.
(1.4) Otherwise, if either operand is float, the other shall be
converted to float.
(1.5) Otherwise, the integral promotions ([conv.prom]) shall be
performed on both operands (footnote 59). Then the following rules shall be
applied to the promoted operands:
(1.5.1) If both operands have the same type, no further conversion is
needed.
(1.5.2) Otherwise, if both operands have signed integer types or both
have unsigned integer types, the operand with the type of lesser
integer conversion rank shall be converted to the type of the operand
with greater rank.
(1.5.3) Otherwise, if the operand that has unsigned integer type has
rank greater than or equal to the rank of the type of the other
operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
(1.5.4) Otherwise, if the type of the operand with signed integer type
can represent all of the values of the type of the operand with
unsigned integer type, the operand with unsigned integer type shall be
converted to the type of the operand with signed integer type.
(1.5.5) Otherwise, both operands shall be converted to the unsigned
integer type corresponding to the type of the operand with signed
integer type.
59) As a consequence, operands of type bool, char8_t, char16_t,
char32_t, wchar_t, or an enumerated type are converted to some
integral type.
For uint8_t vs int (for operator+ and operator!= later), #1.5 is applied, uint8_t will be promoted to int, and the result of operator+ is int too.
On the other hand, for unsigned int vs int (for operator+), #1.5.3 is applied, int will be converted to unsigned int, and the result of operator+ is unsigned int.
Your guess is correct. Operands to many operators in C++ (e.g., binary arithmetic and comparison operators) are subject to the usual arithmetic conversions. In C++17, the usual arithmetic conversions are specified in [expr]/11. I'm not going to quote the whole paragraph here because it's rather large (you can just click on the link), but for integral types, the usual arithmetic conversions boil down to integral promotions being applied followed by effectively some more promoting in the sense that if the types of the two operands after the initial integral promotions are not the same, the smaller type is converted to the larger one of the two. The integral promotions basically mean that any type smaller than an int will be promoted to int or unsigned int, whichever of the two can represent all possible values of the original type, which is mainly what is causing the behavior in your example.
As you have already figured out yourself, in your code, the usual arithmetic conversions happen in a = a + 1; and, most noticeably, in the condition of your if
if (a != b + 1)
…
where they cause b to be promoted to int, making the result of b + 1 to be of type int, as well as a being promoted to int and the !=, thus, happening on values of type int, which causes the condition to be true rather than false…
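The behaviour of the test case can be reproduced at compile time. In the sketch below, `stored_increment`, `original_condition` and `narrowed_condition` are hypothetical helpers; the last one shows one possible way to get the wrapped comparison back, and is not necessarily what the original code should do.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// uint8_t + 1 is computed as int, so 255 + 1 is 256 rather than 0.
static_assert(std::is_same<decltype(std::uint8_t{} + 1), int>::value, "uint8_t + int is int");

constexpr std::uint8_t max8 = std::numeric_limits<std::uint8_t>::max();

// What `a = a + 1` stores: the int result converted back to uint8_t.
constexpr std::uint8_t stored_increment(std::uint8_t x) {
    return static_cast<std::uint8_t>(x + 1);
}

// The condition from the question: both sides are ints, the right one is not narrowed.
constexpr bool original_condition(std::uint8_t a, std::uint8_t b) {
    return a != b + 1;
}

// Assumed fix: narrow the right-hand side before comparing.
constexpr bool narrowed_condition(std::uint8_t a, std::uint8_t b) {
    return a != static_cast<std::uint8_t>(b + 1);
}
```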

Why is unsigned short (multiply) unsigned short converted to signed int? [duplicate]

Why is unsigned short * unsigned short converted to int in C++11?
The int is too small to handle max values as demonstrated by this line of code.
cout << USHRT_MAX * USHRT_MAX << endl;
overflows on MinGW 4.9.2
-131071
because (source)
USHRT_MAX = 65535 (2^16-1) or greater*
INT_MAX = 32767 (2^15-1) or greater*
and (2^16-1)*(2^16-1) = ~2^32.
Should I expect any problems with this solution?
unsigned u = static_cast<unsigned>(t*t);
This program
unsigned short t;
cout<<typeid(t).name()<<endl;
cout<<typeid(t*t).name()<<endl;
gives output
t
i
on
gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC)
gcc version 4.8.2 (GCC)
MinGW 4.9.2
with both
g++ p.cpp
g++ -std=c++11 p.cpp
which proves that t*t is converted to int on these compilers.
Useful resources:
Signed to unsigned conversion in C - is it always safe?
Signed & unsigned integer multiplication
https://bytes.com/topic/c-sharp/answers/223883-multiplication-types-smaller-than-int-yields-int
http://www.cplusplus.com/reference/climits
http://en.cppreference.com/w/cpp/language/types
Edit: I have demonstrated the problem on the following image.
You may want to read about implicit conversions, especially the section about numeric promotions where it says
Prvalues of small integral types (such as char) may be converted to prvalues of larger integral types (such as int). In particular, arithmetic operators do not accept types smaller than int as arguments
What the above says is that if you use something smaller than int (like unsigned short) in an expression that involves arithmetic operators (which of course includes multiplication) then the values will be promoted to int.
It's the usual arithmetic conversions in action.
Commonly called argument promotion, although the standard uses that term in a more restricted way (the eternal conflict between reasonable descriptive terms and standardese).
C++11 §5/9:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions […]
The paragraph goes on to describe the details, which amount to conversions up a ladder of more general types, until all arguments can be represented. The lowest rung on this ladder is integral promotion of both operands of a binary operation, so at least that is performed (but the conversion can start at a higher rung). And integral promotion starts with this:
C++11 §4.5/1:
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion
rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
Crucially, this is about types, not arithmetic expressions. In your case the arguments of the multiplication operator * are converted to int. Then the multiplication is performed as an int multiplication, yielding an int result.
As pointed out by Paolo M in comments, USHRT_MAX has type int (this is specified by 5.2.4.2.1/1 of the C standard: all such macros have a type at least as big as int).
So USHRT_MAX * USHRT_MAX is already an int x int, no promotions occur.
This invokes signed integer overflow on your system, causing undefined behaviour.
Regarding the proposed solution:
unsigned u = static_cast<unsigned>(t*t);
This does not help because t*t itself causes undefined behaviour due to signed integer overflow. As explained by the other answers, t is promoted to int before the multiplication occurs, for historical reasons.
Instead you could use:
auto u = static_cast<unsigned int>(t) * t;
which, after integer promotion, is an unsigned int multiplied by an int; and then according to the rest of the usual arithmetic conversions, the int is converted to unsigned int, and a well-defined modular multiplication occurs.
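A compile-time sketch of the cast-first multiplication, assuming a 16-bit unsigned short and a 32-bit unsigned int; `square` is a hypothetical helper name.

```cpp
#include <climits>
#include <type_traits>
#include <utility>

// Assumption: 16-bit unsigned short and 32-bit unsigned int.
static_assert(USHRT_MAX == 65535 && sizeof(unsigned int) * CHAR_BIT == 32,
              "sketch assumes 16-bit short, 32-bit int");

// unsigned int * unsigned short: the short is promoted to int, then converted to unsigned int.
static_assert(std::is_same<decltype(std::declval<unsigned int>() * std::declval<unsigned short>()),
                           unsigned int>::value,
              "the product is computed as unsigned int");

// Hypothetical helper: casting one operand first keeps the whole multiplication unsigned.
constexpr unsigned int square(unsigned short t) {
    return static_cast<unsigned int>(t) * t;
}
```

Because a constant expression may not contain undefined behaviour, a constexpr version of the original `t * t` with t equal to 65535 would be rejected by the compiler, while this one is accepted.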
With integer promotion rules
USHRT_MAX value is promoted to int.
then we do the multiplication of 2 int (with possible overflow).
It seems that nobody has answered this part of the question yet:
Should I expect any problems with this solution?
u = static_cast<unsigned>(t*t);
Yes, there is a problem here: it first computes t*t and allows it to overflow, then it converts the result to unsigned. Integer overflow causes undefined behavior according to the C++ standard (even though it may always work fine in practice). The correct solution is:
u = static_cast<unsigned>(t)*t;
Note that the second t is converted to unsigned before the multiplication (first promoted to int, then converted by the usual arithmetic conversions) because the first operand is unsigned.
As it has been pointed out by other answers, this happens due to integer promotion rules.
The simplest way to avoid converting an unsigned type of smaller rank to a signed type of larger rank is to make sure the conversion is to unsigned int and not int.
This is done by multiplying by the value 1 that is of type unsigned int. Due to 1 being a multiplicative identity, the result will remain unchanged:
unsigned short c = t * 1U * t;
First the operands t and 1U are evaluated. The left operand is promoted to int, which is signed and has the same rank as the unsigned right operand, so it gets converted to the type of the right operand. Then the operands are multiplied, and the same happens with the result and the remaining right operand. The last paragraph in the Standard cited below is used for this conversion.
Otherwise, the integer promotions are performed on both operands. Then the
following rules are applied to the promoted operands:
-If both operands have the same type, then no further conversion is needed.
-Otherwise, if both operands have signed integer types or both have unsigned
integer types, the operand with the type of lesser integer conversion rank is
converted to the type of the operand with greater rank.
-Otherwise, if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
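The multiply-by-1U approach can be checked the same way. This sketch assumes a 16-bit unsigned short and a 32-bit unsigned int; `product_t` and `square_via_one` are hypothetical names.

```cpp
#include <climits>
#include <type_traits>
#include <utility>

// Assumption: 16-bit unsigned short and 32-bit unsigned int.
static_assert(USHRT_MAX == 65535 && sizeof(unsigned int) * CHAR_BIT == 32,
              "sketch assumes 16-bit short, 32-bit int");

// t * 1U is unsigned int, and multiplying that by t again stays unsigned int.
using product_t = decltype(std::declval<unsigned short>() * 1U * std::declval<unsigned short>());
static_assert(std::is_same<product_t, unsigned int>::value, "t * 1U * t is unsigned int");

// Hypothetical helper returning the full unsigned int product.
constexpr unsigned int square_via_one(unsigned short t) {
    return t * 1U * t;
}
```

Note that storing the product in an unsigned short, as in the line `unsigned short c = t * 1U * t;`, keeps only the low 16 bits; the helper here returns unsigned int to preserve the whole value.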

Why does C/C++ automatically convert char/wchar_t/short/bool/enum types to int?

So, if I understood it well, integral promotion provides that: char, wchar_t, bool, enum, short types ALWAYS are converted to int (or unsigned int). Then, if there are different types in an expression, further conversion will be applied.
Am I understanding this well?
And if yes, then my question: why is this good? Doesn't it make char/wchar_t/bool/enum/short unnecessary? I mean for example:
char c1;
char c2;
c1 = c2;
As I described before, char ALWAYS is converted to int, so in this case after automatic converting this looks like this:
int c1;
int c2;
c1 = c2;
But I can't understand why is this good, if I know that char type will be enough for my needs.
Storage types are never automatically converted. You only get automatic integer promotion as soon as you start doing integer arithmetic (+, -, bit shifts, ...) on those variables.
char c1 = 1, c2 = 2; // stored as char
char c3 = c1 + c2; // operands are promoted to int, the int sum is converted back to char
char c4 = (char)((int)c1 + (int)c2); // equivalent, with the conversions written out
The conversions you're asking about are the usual arithmetic conversions and the integer promotions, defined in section 6.3.1.8 of the latest ISO C standard. They're applied to the operands of most binary operators ("binary" meaning that they take two operands, such as +, *, etc.). (The rules are similar for C++. In this answer, I'll just refer to the C standard.)
Briefly the usual arithmetic conversions are:
If either operand is long double, the other operand is converted to long double.
Otherwise, if either operand is double, the other operand is converted to double.
Otherwise, if either operand is float, the other operand is converted to float.
Otherwise, the integer promotions are performed on both operands, and then some other rules are applied to bring the two operands to a common type.
The integer promotions are defined in section 6.3.1.1 of the C standard. For a type narrower than int, if the type int can hold all the values of the type, then an expression of that type is converted to int; otherwise it's converted to unsigned int. (Note that this means that an expression of type unsigned short may be converted either to int or to unsigned int, depending on the relative ranges of the types.)
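The "int if it can hold all the values, otherwise unsigned int" rule can be written down as a type computation and compared against what the compiler actually does. `predicted_promotion_t` is a hypothetical helper meant only for integer types narrower than int; it is a sketch of the rule, not a library facility.

```cpp
#include <climits>
#include <limits>
#include <type_traits>

// Hypothetical helper: predicts the promoted type of a narrow integer type
// by checking whether int can represent its largest value.
template <typename Narrow>
using predicted_promotion_t = typename std::conditional<
    (std::numeric_limits<Narrow>::max() <= INT_MAX), int, unsigned int>::type;

// Unary + yields the promoted type, so it serves as the reference here.
static_assert(std::is_same<predicted_promotion_t<unsigned short>,
                           decltype(+static_cast<unsigned short>(0))>::value,
              "unsigned short promotes as predicted");
static_assert(std::is_same<predicted_promotion_t<short>,
                           decltype(+static_cast<short>(0))>::value,
              "short promotes as predicted");
static_assert(std::is_same<predicted_promotion_t<signed char>,
                           decltype(+static_cast<signed char>(0))>::value,
              "signed char promotes as predicted");
```

On a platform where unsigned short is as wide as int, the first assertion would still hold, with both sides being unsigned int.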
The integer promotions are also applied to function arguments when the declaration doesn't specify the type of the parameter. For example:
short s = 2;
printf("%d\n", s);
promotes the short value to int. This promotion does not occur for non-variadic functions.
The quick answer for why this is done is that the standard says so.
The underlying reason for all this complexity is to allow for the restricted set of arithmetic operations available on most CPUs. With this set of rules, all arithmetic operators (other than the shift operators, which are a special case) are only required to work on operands of the same type. There is no short + long addition operator; instead, the short operand is implicitly converted to long. And there are no arithmetic operators for types narrower than int; if you add two short values, both arguments are promoted to int, yielding an int result (which might then be converted back to short).
Some CPUs can perform arithmetic on narrow operands, but not all can do so. Without this uniform set of rules, either compilers would have to emulate narrow arithmetic on CPUs that don't support it directly, or the behavior of arithmetic expressions would vary depending on what operations the target CPU supports. The current rules are a good compromise between consistency across platforms and making good use of CPU operations.
if I understood it well, integral promotion provides that: char, wchar_t, bool, enum, short types ALWAYS converted to int (or unsigned int).
Your understanding is only partially correct: short types are indeed promoted to int, but only when you use them in expressions. The conversion is done immediately before the use. It is also "undone" when the result is stored back.
The way the values are stored remains consistent with the properties of the type, letting you control the way you use your memory for the variables that you store. For example,
struct Test {
char c1;
char c2;
};
will be a quarter of the size of
struct Test {
int c1;
int c2;
};
on systems with 32-bit ints.
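The size claim and the "storage is not converted" point can both be checked with static_assert. The sketch assumes a 32-bit int, 8-bit char, and no padding in either struct; `CharPair` and `IntPair` are stand-in names for the two versions of Test.

```cpp
#include <type_traits>

// Stand-ins for the two struct definitions discussed above.
struct CharPair { char c1; char c2; };
struct IntPair  { int c1;  int c2;  };

// Assumption: no padding is inserted, which holds on common ABIs.
static_assert(sizeof(CharPair) == 2 * sizeof(char), "two chars, no padding expected");
static_assert(sizeof(IntPair) == 2 * sizeof(int), "two ints, no padding expected");

// The member keeps its declared type; only the arithmetic expression is an int.
static_assert(std::is_same<decltype(CharPair{}.c1), char>::value, "the member is still a char");
static_assert(std::is_same<decltype(CharPair{}.c1 + CharPair{}.c2), int>::value,
              "char + char is computed as int");
```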
The conversion is not performed when you store the value in the variable. The conversion is done if you cast the value explicitly or if you perform some operation on it, such as an arithmetic operation.
It really depends on your underlying microprocessor architecture. For example, if your processor is 32-bit, that is its native integer size. Using its native integer size in integer computations is better optimized.
Type conversion takes place when arithmetic operations, shift operations, or unary operations are performed. See what the standard says about it:
C11; 6.3.1.1 Boolean, characters, and integers:
If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int; otherwise, it is converted to an unsigned
int. These are called the integer promotions.58) All other types are unchanged by the
integer promotions.
58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, [1] as specified by their respective subclauses.
[1] Emphasis is mine.
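The footnote's list of unary and shift operators can be illustrated in C++, where the same promotions apply. This is a sketch assuming an int wider than short and a two's-complement representation; `complement` is a hypothetical helper.

```cpp
#include <type_traits>
#include <utility>

// Unary - and ~ promote their operand, so the result is an int.
static_assert(std::is_same<decltype(-std::declval<short>()), int>::value, "-short is int");
static_assert(std::is_same<decltype(~std::declval<unsigned char>()), int>::value, "~unsigned char is int");

// For shifts each operand is promoted separately; the result has the type of
// the promoted left operand, regardless of the right operand's type.
static_assert(std::is_same<decltype(std::declval<unsigned char>() << std::declval<long>()), int>::value,
              "unsigned char << long is int");

// Hypothetical helper: because ~ works on the promoted int, the result can be
// negative even though the operand is unsigned.
constexpr int complement(unsigned char x) {
    return ~x;
}
```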