const auto min = -std::numeric_limits<T>::max();
T x = min; // conversion from 'const int' to 'short', possible loss of data
T is a template argument, a short in this case. Unary minus apparently performs integral promotion.
Why does unary minus perform integral promotion?
If auto is changed to T no warning is generated, but it should be assigning an int to a short. Why isn't there a warning (it could be VS being fancy)?
Short answer: (now long, because people want to be excessively pedantic about English, which by its very nature is not exact).
It's not the unary minus itself (the mathematical operation) that performs the promotion.
But as part of any operation (and that includes the unary minus operation) on POD data, there is an implicit conversion of the operands: the smallest integer type that can be used in an operation is int, so integral promotion is applied to the operand before the unary minus (the mathematical part, not the operation part) is applied. The result type of any operation on POD data is the same as the type of its operands (after integral promotion). Thus the result here is also an int.
Long answer:
In C (and thus C++) the smallest type that POD operations happen on is int. So before the unary minus is applied the value is converted to int. Then the unary minus is applied. The result of the expression is thus int.
See here: Implicit type conversion rules in C++ operators
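To see the promotion directly, here is a minimal sketch of mine (not from the original answer) using only <type_traits>:

#include <type_traits>

short s = 0;
static_assert(std::is_same<decltype(-s), int>::value,
              "the operand is promoted to int before unary minus is applied");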
Unary minus performs integral promotion because all the arithmetic operators on integral types do it. This convention was probably established for performance reasons (typically int was the size best suited for integral operations on the hardware) and perhaps because it's sometimes useful not to be limited to the range of smaller integral types.
I've seen bugs where people do:
uint64_t i = 1 << 33; // use 64-bit type to ensure enough bits to hold the result
But this is a bug because the result of 1 << 33 doesn't depend on what it's assigned to; with 32-bit int it is in fact undefined behavior. (Today we have warnings about this.) The solution, which is ugly, is to manually force one of the operands to be large enough: uint64_t i = static_cast<uint64_t>(1) << 33;
Integral promotion happens to avoid the same sort of bug with smaller types, which would have been the common case back when C was designed this way:
short a = 30000, b = 20000;
int i = a + b;
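A small check of mine that the addition above really happens at int width, so 30000 + 20000 == 50000 fits even though it exceeds SHRT_MAX:

#include <type_traits>

short a = 30000, b = 20000;
static_assert(std::is_same<decltype(a + b), int>::value,
              "both operands are promoted to int before the addition");
int i = a + b; // 50000, no overflow at int width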
In any case "It doesn't have to be logical. It's the law."
I think this is to account for the idealized machine model that C/C++ assume: calculations are performed in registers, which implies the result's size is a machine word. Hence the "automagical" expansion.
I believe it's (partly) because the negative of a short may not be representable as a short.
(For a 16-bit short, negating SHRT_MIN = -2^15 = -32768 yields +32768, which needs promotion to int if int is bigger.)
Of course, there's no guarantee that int is any bigger, but it's (usually?) the most optimal data type for the machine anyway, so it makes sense to try promoting it to int since there's likely no performance cost anyway.
Related
Why, when I multiply 2 large numbers and store it in a long variable, overflow occurs and I must put (long) at the right hand side?
For example:
#include <iostream>
using namespace std;
int main() {
    long x = 100000 * 90000; // why should I write long x = (long)100000 * 90000; ?
    cout << x;
}
/* main.cpp:18:14: warning: integer overflow in expression [-Woverflow]
       long x=100000*90000;
              ~~~~~~^~~~~~
   output: 410065408 */
Why, when I multiply 2 large numbers and store it in a long variable, overflow occurs and I must put (long) at the right hand side?
More precisely, you multiply 2 large numbers, then overflow occurs, after which you store the product in a long variable. The overflow occurs in the middle of this process, not at the end.
With a bit more detail: you multiply 2 values of type int, overflow what int can store, then convert to long (so you can store the product in a long variable). There is nothing hinting that you want the result to be a long value until evaluation reaches the assignment, so the earlier operation (multiplication) proceeds with what it has (int values).
By casting the first operand to long, you change the order in which things happen. The operands are converted to long, then you multiply and no overflow occurs because the product fits in a long. After that, assigning a value to a variable of the same type has no special consideration.
Note 1: Your cast converts (just) the first operand to long. The second operand is converted to long because the multiplication operator expects both factors to have the same type. Well, there is a more precise, technical explanation involving the rules for integral conversions, but "expects the same type" gives a good feel for a majority of the cases. Just remember that it is a simplification, not a hard rule.
Note 2: Using 100000L would give the same result as using (long)100000 with less typing.
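A minimal sketch of mine contrasting the two orderings, assuming a platform with 64-bit long (with 32-bit long the suffixed line still overflows, as a later answer points out):

#include <iostream>

int main() {
    // int * int overflows at int width before any conversion to long:
    // long bad = 100000 * 90000;   // undefined behavior (signed overflow)

    long ok = 100000L * 90000;      // long * int -> long * long: no overflow
    std::cout << ok << '\n';        // 9000000000 with 64-bit long
}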
The key here is the priority (actually it is called operator precedence) of operations. Since the precedence of multiplication is higher than that of assignment, the type of the result of the sub-expression is determined first, for the multiplication. Since both operands are int, the result will be int, and the product doesn't fit in an int. At this stage of expression evaluation it doesn't matter what the type will be later, when the assignment is evaluated.
When you use (long)100000 or 100000L you explicitly tell the compiler that one of the operands is of type long, and so must be the result's type. In this particular case the cast (long) has higher precedence than the multiplication, so it converts the type first.
Actually, on some platforms you may even need to convert to long long type in order to fit: (long long)100000 or 100000LL.
Another widespread example of this trap is:
double d = 3/2;
which ends up with d = 1.0, for the same reason.
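The same fix applies: make at least one operand have the target type before the operation is evaluated. A tiny sketch of mine:

double d1 = 3 / 2;    // int / int yields 1, then converted: d1 == 1.0
double d2 = 3.0 / 2;  // 2 is converted to double first:     d2 == 1.5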
In addition to others' suggestion to use L: long may be only 32 bits, in which case overflow still happens with 100000L * 90000.
Using L is therefore not guaranteed to help enough.
Consider long long.
long long x = 100000LL * 90000;
Cast vs. suffix
Consider this view: if the C-style cast to long works, then so will the L suffix.
A cast has a weakness: it forces the result to long, which may narrow it. The suffix L will never narrow the constant, only potentially widen it.
(long) 9000000000 // With 32-bit `long`, this is a (long) 410065408
9000000000L // With 32-bit `long`, this is a (long long) 9000000000
The first, unfortunately, is likely to silence a useful warning about the narrowing.
I always assume that dividing a double by an integer will lead to faster code because the compiler will select the better microcode to compute:
double a;
double b = a/3.0;
double c = a/3; // will compute faster than b
For a single operation it does not matter, but for repetitive operations it can make difference. Is my assumption always correct or compiler or CPU dependent or whatever?
The same question applies for multiplication; i.e. will 3 * a be faster than 3.0 * a?
Your assumption is not correct because both your divide operations will be performed with two double operands. In the second case, c = a/3, the integer literal will be converted to a double value by the compiler before any code is generated.
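This can be checked at compile time; a minimal sketch of mine:

#include <type_traits>

double a = 1.0;
static_assert(std::is_same<decltype(a / 3), double>::value,
              "3 is converted to double; this is a double divide");
static_assert(std::is_same<decltype(3 * a), double>::value,
              "same for multiplication");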
From this Draft C++ Standard:
8.3 Usual arithmetic conversions [expr.arith.conv]

1 Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:

…

(1.3) – Otherwise, if either operand is double, the other shall be converted to double.
Note that, in this Draft C11 Standard, §6.3.1.8 (Usual arithmetic conversions) has equivalent (indeed, near-identical) text.
There is no difference. The integer operand is implicitly converted to a double, so they end up practically equivalent.
When I am doing arithmetic operations with size_t type (or unsigned long), how careful should I be with decorating integer constants with type literals. For example,
size_t a = 1111111;
if (a/2 > 0) ...;
What happens when compiler does the division? Does it treat 2 as integer or as unsigned integer? If the former, then what is the resulting type for (unsigned int)/(int)?
Should I always carefully write 'u' literals
if (a/2u > 0) ...;
for (a=amax; a >= 0u; a -= 3u) ...;
or compiler will correctly guess that I want to use operations with unsigned integers?
2 is indeed treated as an int, which is then implicitly converted to size_t. In a mixed operation size_t / int, the unsigned type "wins" and signed type gets converted to unsigned one, assuming the unsigned type is at least as wide as the signed one. The result is unsigned, i.e. size_t in your case. (See Usual arithmetic conversions for details).
It is a better idea to just write it as a / 2. No suffixes, no type casts. Keep the code as type-independent as possible. Type names (and suffixes) belong in declarations, not in statements.
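Either way, the result type can be verified directly. A minimal sketch of mine; it holds wherever size_t is at least as wide as int, i.e. on all common platforms (a later answer discusses the exotic exception):

#include <cstddef>
#include <type_traits>

std::size_t a = 1111111;
static_assert(std::is_same<decltype(a / 2), std::size_t>::value,
              "the literal 2 is converted to size_t; the result is size_t");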
The C standard guarantees that size_t is an unsigned integer.
The literal 2 is always of type int.
The "usual artihmetic converstions guarantee that whenever an unsigned and a signed integer of the same size ("rank") are used as operands in a binary operation, the signed operand gets converted to unsigned type.
So the compiler actually interprets the expression like this:
a/(size_t)2 > (size_t)0
(The result of the > operator or any relational operator is always of type int though, as a special case for that group of operators.)
Should I always carefully write 'u' literals
Some coding standards, most notably MISRA-C, would have you do this, to make sure that no implicit type promotions exist in the code. Implicit promotions or conversions are very dangerous and they are a flaw in the C language.
For your specific case, there is no real danger with implicit promotions. But there are cases when small integer types are used and you might end up with unintentional change of signedness because of implicit type promotions.
There is never any harm in being explicit, although writing a u suffix on every literal in your code may arguably reduce readability.
Now what you really must do as a C programmer to deal with type promotion dangers, is to learn how the integer promotions and usual arithmetic conversions work (here's some example on the topic). Sadly, there are lots of C programmers who don't, veterans included. The result is subtle, but sometimes critical bugs. Particularly when using the bit-wise operators such as shifts, where change of signedness could invoke undefined behavior.
These rules can be somewhat tricky to learn as they aren't really behaving rationally or consistently. But until you know these rules in detail you have to be explicit with types.
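To illustrate the shift pitfall mentioned above, a sketch of mine assuming 32-bit int:

unsigned char b = 0x80;

// b is promoted to *signed* int before the shift; shifting a 1 into the
// sign bit is undefined behavior in C (and at best implementation-defined
// in pre-C++20 C++), even though b itself is unsigned:
// int bad = b << 24;

// Convert to an unsigned type explicitly and the shift is well defined:
unsigned int ok = static_cast<unsigned int>(b) << 24;  // 0x80000000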
EDIT: To be picky, the size of size_t is actually not specified, all the standard says is that it must be large enough to at least hold the value 65535 (2 bytes). So in theory, size_t could be equal to unsigned short, in which case promotions would turn out quite different. But in practice I doubt that scenario is of any interest, since I don't believe there exists any implementation where size_t is smaller than unsigned int.
Both C++ and C promote signed types to unsigned types when evaluating an operator like division which takes two arguments, and one of the arguments is an unsigned type.
So the literal 2 will be converted to an unsigned type.
Personally, I believe it's better to leave promotion to the compiler rather than be explicit: if your code was ever refactored and a became a signed type then a / 2u would cause a to be promoted to an unsigned type, with potentially disastrous consequences.
size_t sz = 11;
sz / 2    // == 5
sz / (-2) // == 0

Why?
sz has type size_t, which is unsigned because a size cannot be negative. When an arithmetic operation mixes an unsigned type and an int, the int is converted to the unsigned type. So -2 is converted to a huge unsigned value, and 11 divided by that huge value is 0.
From "CS dummies"
So, if I understood it well, integral promotion provides that char, wchar_t, bool, enum, and short types are ALWAYS converted to int (or unsigned int). Then, if there are different types in an expression, further conversion will be applied.
Am I understanding this well?
And if yes, then my question: why is it good? Don't char/wchar_t/bool/enum/short become unnecessary? I mean, for example:
char c1;
char c2;
c1 = c2;
As I described before, char is ALWAYS converted to int, so in this case, after automatic conversion, it looks like this:
int c1;
int c2;
c1 = c2;
But I can't understand why this is good, if I know that the char type is enough for my needs.
Storage types are never automatically converted. You only get automatic integer promotion as soon as you start doing integer arithmetic (+, -, bit shifts, ...) on those variables.
char c1, c2; // stores them as char
char c3 = c1 + c2; // equivalent to
char c3 = (char)((int)c1 + (int)c2);
The conversions you're asking about are the usual arithmetic conversions and the integer promotions, defined in section 6.3.1.8 of the latest ISO C standard. They're applied to the operands of most binary operators ("binary" meaning that they take two operands, such as +, *, etc.). (The rules are similar for C++. In this answer, I'll just refer to the C standard.)
Briefly the usual arithmetic conversions are:
If either operand is long double, the other operand is converted to long double.
Otherwise, if either operand is double, the other operand is converted to double.
Otherwise, if either operand is float, the other operand is converted to float.
Otherwise, the integer promotions are performed on both operands, and then some other rules are applied to bring the two operands to a common type.
The integer promotions are defined in section 6.3.1.1 of the C standard. For a type narrower than int, if the type int can hold all the values of the type, then an expression of that type is converted to int; otherwise it's converted to unsigned int. (Note that this means that an expression of type unsigned short may be converted either to int or to unsigned int, depending on the relative ranges of the types.)
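The unsigned short case can be observed with unary +, which applies the integer promotions; a sketch of mine:

#include <type_traits>

unsigned short us = 1;
// In the common case where int is wider than short, unsigned short promotes
// to (signed) int; if they had the same width it would promote to unsigned int.
static_assert(std::is_same<decltype(+us), int>::value,
              "holds wherever int is wider than short");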
The integer promotions are also applied to function arguments when the declaration doesn't specify the type of the parameter. For example:
short s = 2;
printf("%d\n", s);
promotes the short value to int. This promotion does not occur for arguments whose parameter type is declared by a prototype; those are simply converted to the declared type.
The quick answer for why this is done is that the standard says so.
The underlying reason for all this complexity is to allow for the restricted set of arithmetic operations available on most CPUs. With this set of rules, all arithmetic operators (other than the shift operators, which are a special case) are only required to work on operands of the same type. There is no short + long addition operator; instead, the short operand is implicitly converted to long. And there are no arithmetic operators for types narrower than int; if you add two short values, both arguments are promoted to int, yielding an int result (which might then be converted back to short).
Some CPUs can perform arithmetic on narrow operands, but not all can do so. Without this uniform set of rules, either compilers would have to emulate narrow arithmetic on CPUs that don't support it directly, or the behavior of arithmetic expressions would vary depending on what operations the target CPU supports. The current rules are a good compromise between consistency across platforms and making good use of CPU operations.
if I understood it well, integral promotion provides that: char, wchar_t, bool, enum, short types ALWAYS converted to int (or unsigned int).
Your understanding is only partially correct: short types are indeed promoted to int, but only when you use them in expressions. The conversion is done immediately before the use. It is also "undone" when the result is stored back.
The way the values are stored remains consistent with the properties of the type, letting you control the way you use your memory for the variables that you store. For example,
struct Test {
char c1;
char c2;
};
will be one quarter the size of
struct Test {
int c1;
int c2;
};
on systems with 32-bit ints.
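A quick check of mine (structs renamed to avoid the clash between the two definitions above):

struct TestChar { char c1; char c2; };
struct TestInt  { int  c1; int  c2; };

// With 8-bit char and 32-bit int this is 2 bytes vs. 8 bytes; the exact
// factor depends on padding, but the char version is not larger:
static_assert(sizeof(TestChar) < sizeof(TestInt),
              "char members save storage despite promotion in expressions");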
The conversion is not performed when you store the value in the variable. The conversion is done only if you cast the value or if you explicitly perform some operation on it, such as an arithmetic operation.
It really depends on your underlying microprocessor architecture. For example, if your processor is 32-bit, that is its native integer size, and computations on the native integer size are the ones the hardware handles best.
Type conversion takes place when arithmetic operations, shift operations, or unary operations are performed. See what the standard says about it:

C11, 6.3.1.1 Boolean, characters, and integers (the paragraph defining the integer promotions):

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.58) All other types are unchanged by the integer promotions.

58) The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses. (Emphasis mine.)
Testing a couple of compilers (Comeau, g++) confirms that the result of a bitwise operator of some "integer type" is an int:
void foo( unsigned char );
void foo( unsigned short );
unsigned char a, b;
foo (a | b);
I would have expected the type of "a | b" to be an unsigned char, as both operands are unsigned char, but the compilers say that the result is an int, and the call to foo() is ambiguous. Why is the language designed so that the result is an int, or is this implementation dependent?
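(In the meantime, one way to make the call unambiguous, a sketch of mine rather than anything from the original post, is to cast the result back to the intended type:)

void call_it() {  // hypothetical wrapper reusing the declarations above
    foo(static_cast<unsigned char>(a | b)); // unambiguous: picks foo(unsigned char)
}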
This is in fact standard C++ behavior (ISO/IEC 14882):
5.13/1 Bitwise inclusive OR operator

The usual arithmetic conversions are performed; the result is the bitwise inclusive OR function of its operands. The operator applies only to integral or enumeration operands.

5/9 Usual arithmetic conversions

Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:

If either operand is of type long double, the other shall be converted to long double.
Otherwise, if either operand is double, the other shall be converted to double.
Otherwise, if either operand is float, the other shall be converted to float.
Otherwise, the integral promotions shall be performed on both operands.
...

4.5/1 Integral Promotions

An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be converted to an rvalue of type int if int can represent all the values of the source type; otherwise, the source rvalue can be converted to an rvalue of type unsigned int.
I think it has to do with int supposedly being the "natural" size for the execution environment to allow for efficient arithmetic (see Charles Bailey's answer).
I would have expected the type of "a | b" to be an unsigned char, as both operands are unsigned char,
My reading of some beginner C books in the past left me with the impression that bitwise operators were left in the language solely for the purpose of systems programming and should generally be avoided.
The operators are performed by the CPU itself. The CPU uses registers for the operands (which are certainly larger than a char), so the compiler cannot know how many bits of a register would be affected by the operation. To avoid losing the full result of the operation, the compiler promotes the operands to the proper wider type. AFAICT.
Why is the language designed so that the result is an int, or is this implementation dependent?
Bit-level representation of data types is in fact implementation defined. That might be the reason why some aspects of bit-wise operations are apparently implementation defined as well.
Though C99 defines in 6.2.6.2 (Integer types) how they should appear and behave (and, later, how bitwise operations should work), that particular chapter gives a lot of freedom to the implementation.
It seems this is the same thing as in Java:
Short and char (and other integers smaller than an int) are weaker types than int. Every operation on these weaker types therefore automatically promotes them to an int.
If you really want to get a short, you will have to typecast it.
Unfortunately I can't tell you why this was done, but it seems this is a relatively common language decision...
Isn't short the same as short int, in the same way that long is synonymous with long int? E.g., a short is an int taking up less memory than a standard int?
int is supposed to be the natural word size for any given machine architecture and many machines have instructions that only (or at least optimally) perform arithmetic operations on machine words.
If the language were defined without integer promotion, many multistep calculations which could otherwise naturally be mapped directly into machine instructions might have to be interspersed with masking operations performed on the intermediates in order to generate 'correct' results.
Neither C nor C++ ever perform any arithmetic operations on types smaller than int. Any time you specify a smaller operand (any flavor of char or short), the operand gets promoted to either int or unsigned int, depending on the range.
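This is easy to confirm; a minimal sketch of mine (int can represent all unsigned char values on every real-world platform):

#include <type_traits>

unsigned char a = 0, b = 0;
static_assert(std::is_same<decltype(a | b), int>::value,
              "both operands are promoted before the OR; the result is int");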