Why can't floating-point promotion work for arithmetic as well? - c++

I have read a bit about floating-point promotion. I know that it doesn't apply to binary arithmetic operations, only to e.g. overload resolution. But why?
The C++ standard guarantees that double must be at least as precise as float [basic.fundamental.8], and floating-point promotion is required to keep the value unchanged [conv.fpprom]. Yet this question makes it very clear that the promotion does not happen. Stroustrup's 4th edition even has an erratum on the subject (here, Chapter 10, p. 267).
However, I cannot see any reason why the promotion cannot be done in usual arithmetic conversions [expr.10], even if all prerequisites are met. Is there any?
The latest C++14 working draft can be found here, the final version is purchase-only.

Converting a float to a double costs something, and it's likely more expensive than a short to int conversion (it needs several shifts and bit combining operations). And unlike e.g. short, the float type is considered something on which the processor can operate directly (just like it can on int).
Given the facts above, why should floating-point promotion happen when it's not necessary? That is, if you're adding two floats, why convert them to double, add them, and then convert them back to float?(1)
Note that a floating-point promotion will indeed happen when you're adding mixed arguments (e.g. a float + double), by the very ruling in C++14 [expr] you're referring to.
(10.3) Otherwise, if either operand is double, the other shall be converted to double.
As per [conv.fpprom], this conversion from float to double is carried out by floating point promotion.
(1) Of course, it is perfectly possible this will happen internally if the processor cannot operate on floats directly, and [expr].12 explicitly allows that. But that very paragraph says
the types are not changed thereby.
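To make the type behaviour visible, here is a minimal sketch (assuming a C++11 compiler; the variable names are just illustrative) that checks the result types with decltype:
#include <type_traits>
int main() {
    float  a = 1.0f, b = 2.0f;
    double d = 3.0;
    // float + float stays float: no promotion to double happens here.
    static_assert(std::is_same<decltype(a + b), float>::value, "float + float is float");
    // float + double: the float operand is converted to double per (10.3).
    static_assert(std::is_same<decltype(a + d), double>::value, "float + double is double");
}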

It does!
I don't know what you mean by "work", but floating-point promotion and the usual arithmetic conversions have different scopes of application.
usual arithmetic conversions : Apply to binary operators that expect operands of arithmetic or enumeration type.
floating-point promotion : Applies to prvalues of type float.
Some expressions, like a + b, qualify for both, while 1.0f qualifies only as a prvalue.
The standard you linked says (about usual arithmetic conversions)
(10.3) if either operand is double, the other shall be converted to double
...
(10.5) — Otherwise, the integral promotions shall be performed on both operands
It doesn't restrict how the other operand is converted to double, so I would assume that double + float follows the floating-point promotion rule.

Related

Why does the C++ standard specify that a signed integer be cast to unsigned in binary operations with mixed signedness?

The C and C++ standards stipulate that, in binary operations between a signed and an unsigned integer of the same rank, the signed integer is cast to unsigned. There are many questions on SO caused by this... let's call it strange behavior: unsigned to signed conversion, C++ Implicit Conversion (Signed + Unsigned), A warning - comparison between signed and unsigned integer expressions, % (mod) with mixed signedness, etc.
But none of these give any reasons as to why the standard goes this way, rather than casting towards signed ints. I did find a self-proclaimed guru who says it's the obvious right thing to do, but he doesn't give a reasoning either: http://embeddedgurus.com/stack-overflow/2009/08/a-tutorial-on-signed-and-unsigned-integers/.
Looking through my own code, wherever I combine signed and unsigned integers, I always need to cast from unsigned to signed. There are places where it doesn't matter, but I haven't found a single example of code where it makes sense to cast the signed integer to unsigned.
What are cases where casting to unsigned is the correct thing to do? Why is the standard the way it is?
Casting from unsigned to signed results in implementation-defined behaviour if the value cannot be represented. Casting from signed to unsigned is always modulo two to the power of the unsigned's bitsize, so it is always well-defined.
The standard conversion is to the signed type if every possible unsigned value is representable in the signed type. Otherwise, the unsigned type is chosen. This guarantees that the conversion is always well-defined.
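A small sketch of how this rule plays out; it assumes an LP64-style platform where int is 32 bits and long is 64 bits, so the second assertion may fail elsewhere:
#include <type_traits>
int main() {
    unsigned int u = 0;
    int          i = 0;
    long         l = 0;  // assumed wider than int on this platform
    // Same rank, mixed signedness: the signed operand is converted to unsigned.
    static_assert(std::is_same<decltype(u + i), unsigned int>::value, "");
    // long can represent every unsigned int value on the assumed platform,
    // so the common type is the signed type long.
    static_assert(std::is_same<decltype(l + u), long>::value, "");
}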
Notes
As indicated in comments, the conversion algorithm for C++ was inherited from C to maintain compatibility, which is technically the reason it is so in C++.
When this note was written, the C++ standard allowed three binary representations, including sign-magnitude and ones' complement. That's no longer the case, and there's every reason to believe that it won't be the case for C either in the reasonably near future. I'm leaving the footnote as a historical relic, but it says nothing relevant to the current language.
It has been suggested that the decision in the standard to define signed to unsigned conversions and not unsigned to signed conversions is somehow arbitrary, and that the other possible decision would be symmetric. However, the possible conversions are not symmetric.
In both of the non-2's-complement representations contemplated by the standard, an n-bit signed representation can represent only 2^n − 1 values, whereas an n-bit unsigned representation can represent 2^n values. Consequently, a signed-to-unsigned conversion is lossless and can be reversed (although one unsigned value can never be produced). The unsigned-to-signed conversion, on the other hand, must collapse two different unsigned values onto the same signed result.
In a comment, the formula sint = uint > sint_max ? uint - uint_max : uint is proposed. This coalesces the values uint_max and 0; both are mapped to 0. That's a little weird even for non-2s-complement representations, but for 2's-complement it's unnecessary and, worse, it requires the compiler to emit code to laboriously compute this unnecessary conflation. By contrast the standard's signed-to-unsigned conversion is lossless and in the common case (2's-complement architectures) it is a no-op.
If the signed casting were chosen, then a simple a + 1 would always result in a signed type (unless the constant were written as 1U).
Assume a was unsigned int; then this seemingly innocent increment a + 1 could lead to undefined overflow or an out-of-bounds index in the case of arr[a + 1].
Thus, "unsigned casting" seems like the safer approach, because people probably don't even expect a conversion to be happening in the first place when they simply add a constant.
This is sort of a half-answer, because I don't really understand the committee's reasoning.
From the C90 committee's rationale document: https://www.lysator.liu.se/c/rat/c2.html#3-2-1-1
Since the publication of K&R, a serious divergence has occurred among implementations of C in the evolution of integral promotion rules. Implementations fall into two major camps, which may be characterized as unsigned preserving and value preserving. The difference between these approaches centers on the treatment of unsigned char and unsigned short, when widened by the integral promotions, but the decision has an impact on the typing of constants as well (see §3.1.3.2).
... and apparently also on the conversions done to match the two operands for any operator. It continues:
Both schemes give the same answer in the vast majority of cases, and both give the same effective result in even more cases in implementations with twos-complement arithmetic and quiet wraparound on signed overflow --- that is, in most current implementations.
It then specifies a case where ambiguity of interpretation arises, and states:
The result must be dubbed questionably signed, since a case can be made for either the signed or unsigned interpretation. Exactly the same ambiguity arises whenever an unsigned int confronts a signed int across an operator, and the signed int has a negative value. (Neither scheme does any better, or any worse, in resolving the ambiguity of this confrontation.) Suddenly, the negative signed int becomes a very large unsigned int, which may be surprising --- or it may be exactly what is desired by a knowledgable programmer. Of course, all of these ambiguities can be avoided by a judicious use of casts.
and:
The unsigned preserving rules greatly increase the number of situations where unsigned int confronts signed int to yield a questionably signed result, whereas the value preserving rules minimize such confrontations. Thus, the value preserving rules were considered to be safer for the novice, or unwary, programmer. After much discussion, the Committee decided in favor of value preserving rules, despite the fact that the UNIX C compilers had evolved in the direction of unsigned preserving.
Thus, they consider the case of int + unsigned an unwanted situation, and chose conversion rules for char and short that yield as few of those situations as possible, even though most compilers at the time followed a different approach. If I understand right, this choice then forced them to follow the current choice of int + unsigned yielding an unsigned operation.
I still find all of this truly bizarre.
Why does the C++ standard specify that a signed integer be cast to unsigned in binary operations with mixed signedness?
I suppose that you mean converted rather than "cast". A cast is an explicit conversion.
As I'm not the author nor have I encountered documentation about this decision, I cannot promise that my explanation is the truth. However, there is a fairly reasonable potential explanation: because that's how C works, and C++ was based on C. Unless there was an opportunity to improve upon the rules, there would be no reason to change what works and what programmers have been used to. I don't know if the committee even deliberated changing this.
I know what you may be thinking: "Why does the C standard specify signed integer...". Well, I'm also not the author of the C standard, but there is at least a fairly extensive document titled "Rationale for American National Standard for Information Systems - Programming Language - C". As extensive as it is, it unfortunately doesn't cover this question (it does cover the very similar question of how to promote integer types narrower than int, in which regard the standard differs from some of the C implementations that pre-date it).
I don't have access to pre-standard K&R documents, but I did find a passage from the book "Expert C Programming: Deep C Secrets" which quotes the rules from pre-standard K&R C (in the context of comparing them with the standardised ones):
Section 6.6 Arithmetic Conversions
A great many operators cause conversions and yield result types in a similar way. This pattern will be called the "usual arithmetic conversions."
First, any operands of type char or short are converted to int, and any of type float are converted to double. Then if either operand is double, the other is converted to double and that is the type of the result. Otherwise, if either operand is long, the other is converted to long and that is the type of the result. Otherwise, if either operand is unsigned, the other is converted to unsigned and that is the type of the result. Otherwise, both operands must be int, and that is the type of the result.
So, it appears that this has been the rule since before the standardisation of C, and was presumably chosen by the designer himself. Unless someone can find a written rationale, we may never know the answer.
What are cases where casting to unsigned is the correct thing to do?
Here is an extremely simple case:
unsigned u = INT_MAX; // INT_MAX requires <climits>
u + 42;
The type of the literal 42 is signed, so under the proposed "convert to signed" rule, u + 42 would also be signed. This would be quite surprising and would give the program shown undefined behaviour due to signed integer overflow.
Basically, implicit conversion to signed and to unsigned each have their problems.

Widening and narrowing rules in C/C++

I was trying to read through the C/C++ standard for this but I can't find the answer.
Say you have the following snippet:
int8_t m;
int64_t n;
And that at some point you perform m + n. The addition itself is a binary operator, and I think the most likely thing to happen in such a case is:
Widen m to the same size as n; call the widened result m_prime
Perform m_prime + n
Return a result of type int64_t
I was trying to understand, however, whether the result would change if instead of performing m + n I had performed n + m (because maybe there could be a narrowing operation instead of a widening).
I cannot find the part of the standard that clarifies this point (which I understand could sound trivial).
Can anyone point me to where I can find this in the standard, or explain what happens in general in situations like the one I described?
Personally I've been looking at the section "Additive operators", but it doesn't seem to explain what happens; pointer arithmetic is covered a bit, but there's no reference to any implicitly applied conversion rule.
You can assume I'm talking about C++11, but any other standard I guess would apply the same rules.
See Clause 5 Expressions [expr]. Point 10 starts
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield result types in a similar way. The purpose is to yield a common type, which is also the type of the result. This pattern is called the usual arithmetic conversions, which are defined as follows:
The sub-points that follow say things like "If either operand is...", "...the other shall...", "If both operands ..." etc.
For your specific example, see 10.5.2
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
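For the concrete example from the question, a small sketch (C++11, names taken from the question) confirming that the result type is the same regardless of operand order:
#include <cstdint>
#include <type_traits>
int main() {
    int8_t  m = 0;
    int64_t n = 0;
    // m is first promoted to int, then converted to the type of greater rank.
    // The operand order does not affect the resulting type.
    static_assert(std::is_same<decltype(m + n), int64_t>::value, "");
    static_assert(std::is_same<decltype(n + m), int64_t>::value, "");
}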

Is promotion and widening the same thing?

Is there a difference between promotion and widening? I've heard that widening only describes integral promotion.
Widening "typically" refers to integral/floating point types (as in a char going to a long or float to double), but it can also refer to character widening (as in going from a char type to a wchar_t type).
Widening conversions are also known as "promotions" and narrowing conversions are known as "coercion".
The notion of "promotion" and "coercion" can also be used in the OO sense (polymorphism): promotion of a derived type to its base, or coercion of a base type to a derived one. In this sense it's still a "widening" and "narrowing", as the space used for the base is "less" than for the derived type (hence you are widening/promoting your types when "up-casting", or narrowing/coercing your types when "down-casting").
So to answer directly: is there a difference between promotion and widening... no, not really (unless you are feeling pedantic), though I probably wouldn't say "widen that class type" over "promote that class type" if I were talking about non-integrals (just to avoid any possible initial confusion).
It really depends on context, because the term "widening" is an informal term, and the meaning varies a bit depending on who is telling the story. I'll describe some common interpretations (but not the only ones).
Before doing that, it is necessary to describe what promotions are.
The C++ standard describes integral promotions (between integral types) and floating point promotions (between floating point types). Conversion between an integral type and a floating point type is not described as a promotion.
The common features are that promotions are generally value preserving (except from signed to unsigned integral types, which uses modulo arithmetic) but need not involve increasing the size of a variable (or range of values it can represent). For example, a short may be promoted to an int, but a short and an int may also be the same size (albeit that is implementation/compiler dependent).
The C++ standard doesn't use the term "widening" at all (except in some contexts in the library, unrelated to type conversions). A common informal meaning, in context of integral and floating point conversions, is a promotion that is BOTH value preserving AND to a larger type. The implementation is typically setting the additional bits in the result to zero (i.e. making the value wider without fiddling the bits that represent it). So signed char to short, short to long, unsigned char to unsigned short are widening conversions (assuming none of the types are equal size). Similarly, float to double is a widening conversion (the standard guarantees that the values a float can represent is a strict subset of the values that a double can represent). Conversion from int to double is not a widening (e.g. not necessarily value preserving, bits may be fiddled).
Widening is also sometimes used to describe a conversion of a pointer to derived class into a pointer to base class (or between similar references). The reverse is called "narrowing" and - in C++ - can only be forced with an explicit type conversion.
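A brief sketch of both senses of the terms (the class names are only illustrative):
#include <type_traits>
struct Base {};
struct Derived : Base {};
int main() {
    short s = 42;
    // Integral promotion: unary + promotes s, so the expression has type int,
    // even if short and int happen to have the same size on a given platform.
    static_assert(std::is_same<decltype(+s), int>::value, "");
    // "Widening" in the pointer sense: Derived* converts implicitly to Base*;
    // the reverse ("narrowing") needs an explicit cast such as static_cast.
    Derived d;
    Base* b = &d;
    Derived* back = static_cast<Derived*>(b);
    (void)back;
}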

Is isnan preserved when assigning to a different-size FP type?

Basically:
float nanf = std::numeric_limits<float>::signaling_NaN();
double nand = nanf;
assert(std::isnan(nand));
Can the assert fire?
Also, what if I were assigning a double NaN to a float?
From N3337:
4.6 Floating point promotion [conv.fpprom]
1 A prvalue of type float can be converted to a prvalue of type double. The value is unchanged.
4.8 Floating point conversions [conv.double]
1 A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.
and
3.9.1 Fundamental types [basic.fundamental]
8 There are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
Now we should confirm that NaN is in fact a valid value for a floating point type. The definition for isnan refers back to the C Standard. From N1570:
7.12.3.4 The isnan macro
2 The isnan macro determines whether its argument value is a NaN.
So to summarise: yes, going from float to double should preserve NaN-ness. Going from double to float is perhaps a little iffier, but as double supports NaN, we conclude that this conversion also must be preserved, by the "subset of values" wording.
(What the word "value" actually means seems to be somewhat ill-defined.)
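A quick runnable check of both directions, using quiet NaNs to avoid any signaling-NaN traps (this only demonstrates the behaviour of the platform it runs on; it proves nothing about what the standard requires):
#include <cassert>
#include <cmath>
#include <limits>
int main() {
    float  f = std::numeric_limits<float>::quiet_NaN();
    double d = f;                        // float -> double (promotion)
    assert(std::isnan(d));
    double dn = std::numeric_limits<double>::quiet_NaN();
    float  fn = static_cast<float>(dn);  // double -> float (conversion)
    assert(std::isnan(fn));
}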
C++ doesn't require adherence to IEEE-754; however, for platforms that do follow the standard, clause 6.2 governs the behavior for quiet NaNs:
For an operation with quiet NaN inputs, other than maximum and minimum operations, if a floating-point result is to be delivered the result shall be a quiet NaN ...
and signaling NaNs:
Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN.
Signaling NaNs shall be reserved operands that, under default exception handling, signal the invalid operation exception (see 7.2) for every general-computational and signaling-computational operation ...
BoBTFish's answer sounds pretty convincing to me. Now, I don't have normative information on this subject myself, but I want to provide an alternative answer based on a bit of deduction:
It is generally true that:
Any expression involving a NaN must return NaN.
An arithmetic expression might involve casting from/to single and double, maybe multiple times.
I don't see how the compiler could meet both requirements without preserving NaN during a float/double cast.

What does the C++ standard say about results of casting value of a type that lies outside the range of the target type?

Recently I had to perform some data type conversions from float to 16 bit integer. Essentially my code reduces to the following
float f_val = 99999.0;
short int si_val = static_cast<short int>(f_val);
// si_val is now -32768
This input value was a problem, and in my code I had neglected to check the limits of the float value, so I can see my fault. But it made me wonder about the exact rules of the language when one has to do this kind of ungainly cast. I was slightly surprised to find that the value of the cast was -32768. Furthermore, this is the value I get whenever the value of the float exceeds the limits of a 16-bit integer. I have googled this but found a surprising lack of detailed info about it. The best I could find was the following from cplusplus.com:
Converting to int from some smaller integer type, or to double from float is known as promotion, and is guaranteed to produce the exact same value in the destination type. Other conversions between arithmetic types may not always be able to represent the same value exactly:
If the conversion is from a floating-point type to an integer type, the value is truncated (the decimal part is removed).
The conversions from/to bool consider false equivalent to zero (for numeric types) and to null pointer (for pointer types); and true equivalent to all other values.
Otherwise, when the destination type cannot represent the value, the conversion is valid between numerical types, but the value is implementation-specific (and may not be portable).
This suggestion that the results are implementation defined does not surprise me, but I have heard that cplusplus.com is not always reliable.
Finally, when performing the same cast from a 32-bit integer to a 16-bit integer (again with a value outside the 16-bit range) I saw results clearly indicating integer overflow. Although I was not surprised by this, it has added to my confusion due to the inconsistency with the cast from the float type.
I have no access to the C++ standard, but a lot of C++ people here do so I was wondering what the standard says on this issue? Just for completeness, I am using g++ version 4.6.3.
You're right to question what you've read. The conversion has no defined behaviour, which contradicts what you quoted in your question.
4.9 Floating-integral conversions [conv.fpint]
1 A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [ Note: If the destination type is bool, see 4.12. -- end note ]
One potentially useful permitted result that you might get is a crash.
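Since the behaviour is undefined for out-of-range values, the value has to be checked before converting. Here is a minimal sketch of one way to guard the cast (the clamping policy and the treatment of NaN are my own assumptions, not anything the standard prescribes):
#include <cmath>
#include <limits>
// Convert a float to short, clamping values the target type cannot hold
// and mapping NaN to 0 (an arbitrary policy).
short to_short_checked(float f) {
    if (std::isnan(f)) return 0;
    if (f >= static_cast<float>(std::numeric_limits<short>::max()))
        return std::numeric_limits<short>::max();
    if (f <= static_cast<float>(std::numeric_limits<short>::min()))
        return std::numeric_limits<short>::min();
    return static_cast<short>(f);  // now in range: well-defined truncation
}
int main() {
    float f_val = 99999.0f;
    short si_val = to_short_checked(f_val);  // yields 32767 instead of UB
    (void)si_val;
}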