In the following code:
#include <cstdint>
#include <cinttypes>
#include <cstdio>
using namespace std;
int main() {
double xd = 1.18;
int64_t xi = 1000000000;
int64_t res1 = (double)(xi * xd);
double d = xi * xd;
int64_t res2 = d;
printf("%" PRId64"\n", res1);
printf("%" PRId64"\n", res2);
}
Using v4.9.3 g++ -std=c++14 targeting 32-bit Windows I get output:
1179999999
1180000000
Are these values allowed to be different?
I expected that, even if the compiler uses a higher internal precision than double for the computation of xi * xd, it should do this consistently. Loss of precision in floating conversion is implementation-defined, and also the precision of this calculation is implementation-defined - [c.limits]/3 says that FLT_EVAL_METHOD should be imported from C99. IOW I expected that it should not be allowed to use a different precision for xi * xd on one line than it does on another line.
Note: This is intentionally a C++ question and not a C question - I believe the two languages have different rules in this area.
even if the compiler uses a higher internal precision than double for the computation of xi * xd, it should do this consistently
Whether required or not (discussed below), this clearly doesn't happen: Stack Overflow is littered with questions from people who've seen similar-seeming calculations change for no apparent reason within the same program.
The C++ Standard draft n3690 says (emphasis mine):
The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.[62]
62) The cast and assignment operators must still perform their specific conversions as described in 5.4, 5.2.9 and 5.17.
So - in agreement with M.M.'s comment and contrary to my earlier edit - it's the version with the (double) cast that must be rounded to a 64-bit double - which evidently happens to be >= 1180000000 in the run documented in the question - before truncation to integer. The more general case sans 62) leaves the compiler freedom not to round early in the other case.
[c.limits]/3 says that FLT_EVAL_METHOD should be imported from C99. IOW I expected that it should not be allowed to use a different precision for xi * xd on one line than it does on another line.
Check the cppreference page:
Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision (unless #pragma STDC FP_CONTRACT is off)
As tmyklebu comments, it continues:
Cast and assignment strip away any extraneous range and precision: this models the action of storing a value from an extended-precision FPU register into a standard-sized memory location.
This last agrees with the "62)" part of the Standard.
M.M. comments:
STDC FP_CONTRACT does not seem to appear in the C++ Standard and also it's not clear to me exactly to what extent the C99 behaviour is 'imported'
Doesn't appear in the draft I looked at. That suggests C++ doesn't guarantee its availability, leaving the default mentioned above of "any floating-point expression may be contracted", but we know per M.M. comments and the Standard and cppreference quotes above the (double) cast is an exception forcing rounding to 64 bits.
The C++ Standard draft mentioned above says of <cfloat>:
The contents are the same as the Standard C library header <float.h>.
See also: ISO C 7.1.5, 5.2.4.2.2, 5.2.4.2.1.
If one of those C Standards required STDC FP_CONTRACT there's more chance of it being portable for use by C++ programs, but I've not surveyed implementations for support.
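To see what a given build reports, the value of FLT_EVAL_METHOD can be mapped to the meaning C99 assigns to it. This is only a sketch: describe_eval_method and current_eval_method are hypothetical names, and the wording of each description is my paraphrase of C99 5.2.4.2.2, not a quote.

```cpp
#include <cfloat>
#include <string>

// Hypothetical helper: maps a FLT_EVAL_METHOD value to a paraphrase of
// the meaning C99 gives it.
std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "operations evaluated in the range and precision of their type";
        case 1:  return "float and double evaluated as double, long double as long double";
        case 2:  return "all operations evaluated as long double";
        case -1: return "indeterminable";
        default: return "other value (implementation-defined)";
    }
}

// Reports the setting of the current build, if the macro is provided at all.
std::string current_eval_method() {
#ifdef FLT_EVAL_METHOD
    return describe_eval_method(FLT_EVAL_METHOD);
#else
    return "FLT_EVAL_METHOD not defined";
#endif
}
```

Printing current_eval_method() from main is enough to check whether a 32-bit x87 build differs from an SSE2 build on the same compiler.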
Depending on FLT_EVAL_METHOD, xi * xd may be calculated with higher precision than double. If xi were so large that it cannot be represented exactly in double, then I'm not even sure if the compiler would be allowed to convert it exactly to long double or not - probably not, because that conversion happens before anything covered by FLT_EVAL_METHOD. There is no requirement that higher precision must be used consistently.
There are two places where conversion to double must happen: At the point of the cast (double) and at the point of assignment to a double. There have been gcc versions where the cast to double was "optimised" away if a value was already "officially" a double (like xi * xd here) even if in reality it was higher precision; that "optimisation" was always a bug because a cast must convert.
So you may have run into this bug where a cast to double wasn't performed (if the bug is still there), you may have run into inconsistent use of higher precision, which is legal if FLT_EVAL_METHOD allows it, and you may even have run into inconsistent use of higher precision when FLT_EVAL_METHOD didn't allow it at all, which would again be a bug (not the inconsistency, but the use of higher precision in the first place).
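If the goal is simply to get the same integer on both lines regardless of how the compiler treats the cast, one workaround is to force the product through a real double object before truncating. The sketch below is an assumption on my part, not something the Standard spells out: it relies on a volatile store actually writing a 64-bit double to memory, and truncate_after_rounding is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical helper: multiplies, forces the product through a real
// double object, and only then truncates to an integer.
std::int64_t truncate_after_rounding(std::int64_t xi, double xd) {
    // A volatile object has to be written to memory, so any excess
    // precision held in a wider register is rounded away at this point.
    volatile double rounded = static_cast<double>(xi) * xd;
    return static_cast<std::int64_t>(rounded);
}
```

With the question's inputs (1000000000 and 1.18) and an IEEE 754 binary64 double, the product rounds to exactly 1180000000 before truncation, which matches the res2 line of the question.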
Related
In a specific online judge running 32-bit GCC 7.3.0, this:
#include <iostream>
volatile float three = 3.0f, seven = 7.0f;
int main()
{
float x = three / seven;
std::cout << x << '\n';
float y = three / seven;
std::cout << (x == y) << '\n';
}
Outputs
0.428571
0
To me this seems like it violates IEEE 754, since the standard requires basic operations to be correctly rounded. Now I know there are a couple reasons for IEEE 754 floating-point calculations to be non-deterministic as discussed here, but I don't see how any of them applies to this example. Here are some of the things I considered:
Excess precision and contraction: I'm doing a single calculation and assigning the result to a float, which should force both of the values to be rounded to float precision.
Compile-time calculations: three and seven are volatile so both calculations must be done at runtime.
Floating-point flags: The calculations are done in the same thread almost immediately after each other, so the flags should be the same.
Does this necessarily indicate that the online judge system doesn't conform to IEEE 754?
Also, removing the statement printing x, adding a statement to print y, or making y volatile all change the result. This seems to contradict my understanding of the C++ standard, which I think requires the assignments to round off any excess precision.
Thanks to geza for pointing out that this is a known issue. I would still like a definitive answer on whether this conforms to the C++ standard and IEEE 754 though, since the C++ standard appears to require assignments to round off excess precision. Here's the quote from draft N4860 [expr.pre]:
The values of the floating-point operands and the results of floating-point expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.[50]
50) The cast and assignment operators must still perform their specific conversions as described in 7.6.1.3, 7.6.3, 7.6.1.8 and 7.6.19.
Does this necessarily indicate that the online judge system doesn't conform to IEEE 754?
Yes, with minor caveats.
One, C++ cannot just “conform” to IEEE 754. There has to be some specification of how things in C++ bind (connect) to IEEE 754, such as statements that the float format is IEEE-754 binary32, that x / y uses IEEE-754 division, and so on. C++ 2017 draft N4659 refers to LIA-1, but I do not see that it clearly requires LIA-1 be used even if std::numeric_limits<float>::is_iec559 reports true, and LIA-1 apparently only suggests language bindings.
The C++ standard tells us that std::numeric_limits<float>::is_iec559 reporting true means the float type conforms to ISO/IEC/IEEE 60559, which is effectively IEEE 754-2008. But, in addition to the binding problem, I do not see a statement in the C++ standard that nullifies 8 [expr] 13 (“The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.”) when is_iec559 is true. Although it is true that the cast and conversion operators must “perform their specific conversions” (footnote 64), and this forces float y = three / seven; to produce the correct IEEE-754 binary32 results even if binary64 or Intel’s 80-bit floating-point are used for the division, it might not force it to produce the correct result if only a little excess precision is used. (If at least 48 bits of precision are used, no double-rounding errors occur for division when rounded to the 24 bits of the binary32 format. If fewer excess bits are used, there may be some cases that experience double rounding errors.)
I believe the intent of is_iec559 is to indicate a sensible binding, and the behavior shown in the question does violate this. In particular, the defect shown in the question is caused by failing to round the excess precision used in the division to the actual float type; it is not caused by the hypothetical use of less-than-enough excess precision mentioned above.
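The double-rounding remark can be checked on a small scale. The sketch below assumes float is binary32 and double is binary64 (53 bits, above the 48-bit figure quoted above); both function names are hypothetical. One function divides in double and narrows with an explicit cast, the other keeps operands and result in volatile float objects so that the quotient is a genuine float.

```cpp
// Hypothetical helper: divides two floats in double precision, then
// narrows the quotient back to float with an explicit cast.
float divide_via_double(float a, float b) {
    const double wide = static_cast<double>(a) / static_cast<double>(b);
    return static_cast<float>(wide);
}

// Reference: operands and result live in volatile float objects, so the
// stored quotient cannot keep any excess precision.
float divide_as_float(float a, float b) {
    volatile float va = a, vb = b;
    volatile float quotient = va / vb;
    return quotient;
}
```

Under those assumptions the two functions should agree for every pair of finite inputs, including the 3.0f / 7.0f case from the question; a mismatch would point at the kind of missing rounding step described above.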
I am working on floating point determinism and having already studied so many surprising potential causes of indeterminism, I am starting to get paranoid about copying floats:
Does anything in the C++ standard or in general guarantee me that a float lvalue, after being copied to another float variable or when used as a const-ref or by-value parameter, will always be bitwise equivalent to the original value?
Can anything cause a copied float to be bitwise inequivalent to the original value, such as changing the floating point environment or passing it into a different thread?
Here is some sample code based on what I use to check for equivalence of floating point values in my test-cases, this one will fail because it expects FE_TONEAREST:
#include <cfenv>
#include <cstdint>
// MSVC-specific pragmas for floating point control
#pragma float_control(precise, on)
#pragma float_control(except, on)
#pragma fenv_access(on)
#pragma fp_contract(off)
// May make a copy of the floats
bool compareFloats(float resultValue, float comparisonValue)
{
// I was originally doing a bit-wise comparison here but I was made
// aware in the comments that this might not actually be what I want
// so I only check against the equality of the values here now
// (NaN values etc. have to be handled extra)
bool areEqual = (resultValue == comparisonValue);
// Additional outputs if not equal
// ...
return areEqual;
}
int main()
{
std::fesetround(FE_TOWARDZERO);
float value = 1.f / 10;
float expectedResult = 0x1.99999ap-4;
compareFloats(value, expectedResult);
}
Do I have to be worried that if I pass a float by-value into the comparison function it might come out differently on the other side, even though it is an lvalue?
No, there is no such guarantee.
Subnormal, non-normalised floating points, and NaN are all cases where the bit patterns may differ.
I believe that signed negative zero is allowed to become a signed positive zero on assignment, although IEEE754 disallows that.
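The difference between value equality and bit-pattern equality is easy to demonstrate with the two zeros. A minimal sketch, assuming a 32-bit float; float_bits and bitwise_equal are hypothetical helpers, not part of any library.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: returns the object representation of a float.
std::uint32_t float_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copies the bytes without aliasing problems
    return bits;
}

// True only if both floats have the same bit pattern.
bool bitwise_equal(float a, float b) {
    return float_bits(a) == float_bits(b);
}
```

On an IEEE 754 platform 0.0f == -0.0f is true while bitwise_equal(0.0f, -0.0f) is false, so a test that compares bit patterns is stricter than one that uses ==.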
The C++ standard itself has virtually no guarantees on floating point math because it does not mandate IEEE-754 but leaves it up to the implementation (emphasis mine):
[basic.fundamental/12]
There are three floating-point types: float, double, and long double.
The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.
The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
The value representation of floating-point types is implementation-defined.
[ Note: This document imposes no requirements on the accuracy of floating-point operations; see also [support.limits]. — end note ]
The C++ code you write is a high-level abstract description of what you want the abstract machine to do, and it is fully in the hands of the compiler what this gets translated to. "Assignments" is an aspect of the C++ standard, and as shown above, the C++ standard does not mandate the behavior of floating point operations. To verify the statement "assignments leave floating point values unchanged" your compiler would have to specify its floating point behavior in terms of the C++ abstract machine, and I've not seen any such documentation (especially not for MSVC).
In other words: Without nailing down the exact compiler, compiler version, compilation flags etc., it is impossible to say for sure what the floating point semantics of a C++ program are (especially regarding the difficult cases like rounding, NaNs or signed zero). Most compilers differentiate between strict IEEE conformance and relaxing some of those restrictions, but even then you are not necessarily guaranteed that the program has the same outputs in non-optimized vs optimized builds due to, say, constant folding, precision of intermediate results and so on.
Case in point: For gcc, even with -O0, your program in question does not compute 1.f / 10 at run-time but at compile-time and thus your rounding mode settings are ignored: https://godbolt.org/z/U8B6bc
You should not be paranoid about copying floats in particular but paranoid of compiler optimizations for floating point in general.
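For the rounding mode to have any effect, the division has to be carried out at run time. The sketch below is one way to try that, under two assumptions: the platform defines FE_TOWARDZERO, and volatile accesses are enough to keep the compiler from folding or moving the division (in general a compiler may also need an option such as GCC's -frounding-math). divide_in_mode is a hypothetical helper.

```cpp
#include <cfenv>

// Hypothetical helper: divides at run time under the given rounding mode
// and restores the previous mode afterwards.
float divide_in_mode(float a, float b, int mode) {
    const int previous = std::fegetround();
    std::fesetround(mode);
    // volatile operands keep the compiler from evaluating the division
    // at compile time, where the run-time rounding mode is unknown
    volatile float va = a, vb = b;
    volatile float quotient = va / vb;
    std::fesetround(previous);
    return quotient;
}
```

Calling it with 1.0f and 10.0f should give a smaller result under FE_TOWARDZERO than under FE_TONEAREST on an IEEE 754 platform, because 0.1 is not exactly representable and round-to-nearest rounds it up.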
Consider
#include <iostream>
int main()
{
double a = 1.0 / 0;
double b = -1.0 / 0;
double c = 0.0 / 0;
std::cout << a << b << c; // to stop compilers from optimising out the code.
}
I have always thought that a will be +Inf, b will be -Inf, and c will be NaN. But I also hear rumours that strictly speaking the behaviour of floating point division by zero is undefined and therefore the above code cannot be considered to be portable C++. (That theoretically obliterates the integrity of my million line plus code stack. Oops.)
Who's correct?
Note I'm happy with implementation defined, but I'm talking about cat-eating, demon-sneezing undefined behaviour here.
The C++ standard does not mandate IEEE 754, because that depends mostly on the hardware architecture.
If the hardware and compiler implement IEEE 754 correctly, the division will produce the expected INF, -INF and NaN; otherwise... it depends.
Undefined means the compiler implementation decides, and many variables feed into that, such as the hardware architecture, code generation efficiency, compiler developer laziness, etc.
Source:
The C++ standard states that a division by 0.0 is undefined
C++ Standard 5.6.4
... If the second operand of / or % is zero the behavior is undefined
C++ Standard 18.3.2.4
...static constexpr bool is_iec559;
...56. True if and only if the type adheres to IEC 559 standard.217
...57. Meaningful for all floating point types.
C++ detection of IEEE754:
The standard library includes a template to detect if IEEE754 is supported or not:
static constexpr bool is_iec559;
#include <limits>
// is_iec559 is a static data member, not a function
bool isFloatIeee754 = std::numeric_limits<float>::is_iec559;
What if IEEE754 is not supported?
It depends; usually a division by 0 triggers a hardware exception and makes the application terminate.
Quoting cppreference:
If the second operand is zero, the behavior is undefined, except that if floating-point division is taking place and the type supports IEEE floating-point arithmetic (see std::numeric_limits::is_iec559), then:
if one operand is NaN, the result is NaN
dividing a non-zero number by ±0.0 gives the correctly-signed infinity and FE_DIVBYZERO is raised
dividing 0.0 by 0.0 gives NaN and FE_INVALID is raised
We are talking about floating-point division here, so it is actually implementation-defined whether double division by zero is undefined.
If std::numeric_limits<double>::is_iec559 is true, and it is "usually true", then the behaviour is well-defined and produces the expected results.
A pretty safe bet would be to plop down a:
static_assert(std::numeric_limits<double>::is_iec559, "Please use IEEE754, you weirdo");
... near your code.
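Combining that guard with a small helper might look like the sketch below. It assumes the cppreference reading quoted above, namely that division by zero is well-defined when is_iec559 is true; classify_quotient is a hypothetical name.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Refuse to compile where double does not claim IEC 559 (IEEE 754) conformance.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// Hypothetical helper: names the kind of value a division produced.
std::string classify_quotient(double numerator, double denominator) {
    const double q = numerator / denominator;
    if (std::isnan(q)) return "nan";
    if (std::isinf(q)) return std::signbit(q) ? "-inf" : "+inf";
    return "finite";
}
```

On such a platform the three divisions from the question classify as "+inf", "-inf" and "nan" respectively. Options that relax floating-point semantics (for example GCC's -ffast-math) can invalidate the NaN and infinity checks, so the sketch assumes they are not in use.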
Division by zero, both integer and floating point, is undefined behavior per [expr.mul]p4:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division
of the first expression by the second. If the second operand of / or % is zero the behavior is undefined. ...
An implementation can, however, optionally support Annex F, which has well-defined semantics for floating-point division by zero.
We can see from the clang bug report "clang sanitizer regards IEC 60559 floating-point division by zero as undefined" that even though the macro __STDC_IEC_559__ is defined, it is being defined by the system headers; clang itself does not support Annex F, so for clang this remains undefined behavior:
Annex F of the C standard (IEC 60559 / IEEE 754 support) defines the
floating-point division by zero, but clang (3.3 and 3.4 Debian snapshot)
regards it as undefined. This is incorrect:
Support for Annex F is optional, and we do not support it.
#if __STDC_IEC_559__
This macro is being defined by your system headers, not by us; this is
a bug in your system headers. (FWIW, GCC does not fully support Annex
F either, IIRC, so it's not even a Clang-specific bug.)
That bug report and two other bug reports, "UBSan: Floating point division by zero is not undefined" and "clang should support Annex F of ISO C (IEC 60559 / IEEE 754)", indicate that gcc conforms to Annex F with respect to floating-point division by zero.
Though I agree that it isn't up to the C library to define __STDC_IEC_559__ unconditionally, the problem is specific to clang. GCC does not fully support Annex F, but at least its intent is to support it by default and the division is well-defined with it if the rounding mode isn't changed. Nowadays not supporting IEEE 754 (at least the basic features like the handling of division by zero) is regarded as bad behavior.
This is further supported by the gcc wiki page "Semantics of Floating Point Math in GCC", which indicates that -fno-signaling-nans is the default. That agrees with the gcc optimization options documentation, which says:
The default is -fno-signaling-nans.
Interesting to note that UBSan for clang defaults to including float-divide-by-zero under -fsanitize=undefined while gcc does not:
Detect floating-point division by zero. Unlike other similar options, -fsanitize=float-divide-by-zero is not enabled by -fsanitize=undefined, since floating-point division by zero can be a legitimate way of obtaining infinities and NaNs.
See it live for clang and live for gcc.
Division by 0 is undefined behavior.
From section 5.6 of the C++ standard (C++11):
The binary / operator yields the quotient, and the binary % operator
yields the remainder from the division of the first expression by the
second. If the second operand of / or % is zero the behavior is
undefined. For integral operands the / operator yields the algebraic
quotient with any fractional part discarded; if the quotient a/b is
representable in the type of the result, (a/b)*b + a%b is equal to a .
No distinction is made between integer and floating point operands for the / operator. The standard only states that dividing by zero is undefined without regard to the operands.
In [expr]/4 we have
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]
Emphasis mine
So per the standard this is undefined behavior. It does go on to say that some of these cases are actually handled by the implementation and are configurable. So it won't say it is implementation defined but it does let you know that implementations do define some of this behavior.
As to the submitter's question 'Who's correct?', it is perfectly OK to say that both answers are correct. The fact that the C standard describes the behavior as 'undefined' DOES NOT dictate what the underlying hardware actually does; it merely means that if you want your program to be meaningful according to the standard you -may not assume- that the hardware actually implements that operation. But if you happen to be running on hardware that implements the IEEE standard, you will find the operation is in fact implemented, with the results as stipulated by the IEEE standard.
This also depends on the floating point environment.
cppreference has details:
http://en.cppreference.com/w/cpp/numeric/fenv
(no examples though).
This should be available in most desktop/server C++11 and C99 environments. There are also platform-specific variations that predate the standardization of all this.
I would expect that enabling exceptions makes the code run more slowly, so probably for this reason most platforms that I know of disable exceptions by default.
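Since the cppreference page has no example, here is a small sketch of querying the floating-point status flags with <cfenv>. It assumes the platform defines FE_DIVBYZERO and that the volatile accesses keep the division at run time between the two library calls; raises_divbyzero is a hypothetical helper.

```cpp
#include <cfenv>

// Hypothetical helper: reports whether computing a / b sets the
// FE_DIVBYZERO status flag.
bool raises_divbyzero(double a, double b) {
    volatile double va = a, vb = b;      // force a run-time division
    std::feclearexcept(FE_ALL_EXCEPT);   // start from a clean set of flags
    volatile double quotient = va / vb;
    (void)quotient;                      // the value itself is not needed
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

This only inspects the status flag; it does not enable trapping, so the program keeps running and the quotient is still the infinity that IEEE 754 prescribes.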
I'm a bit surprised with MSVC ldexp behavior (it happens in Visual Studio 2013, but also with all older versions at least down to 2003...).
For example:
#include <math.h>
#include <stdio.h>
int main()
{
double g=ldexp(2.75,-1074);
double e=ldexp(3.0,-1074);
printf("g=%g e=%g \n",g,e);
return 0;
}
prints
g=9.88131e-324 e=1.4822e-323
The first one g is strangely rounded...
It is 2.75 * fmin_denormalized, so I definitely expect the second result e.
If I evaluate 2.75*ldexp(1.0,-1074) I correctly get same value as e.
Are my expectations too high, or does Microsoft fail to comply with some standard?
While the question does not explicitly state this, I assume that the output expected by the asker is:
g=1.4822e-323 e=1.4822e-323
This is what we would expect from a C/C++ compiler that promises strict adherence to IEEE-754. The question is tagged both C and C++; I will address C99 here as that is the standard I have in hand.
In Annex F, which describes IEC 60559 floating-point arithmetic (where IEC 60559 is basically another name for IEEE-754) the C99 standard specifies:
An implementation that defines __STDC_IEC_559__ shall conform to the
specifications in this annex. [...] The scalbn and scalbln
functions in <math.h> provide the scalb function recommended in the
Appendix to IEC 60559.
Further down in that annex, section F.9.3.6 specifies:
On a binary system, ldexp(x, exp) is equivalent to scalbn(x, exp).
The appendix referenced by the C99 standard is the appendix of the 1985 version of IEEE-754, where we find the scalb function defined as follows:
Scalb(y, N) returns y × 2^N for integral values N without computing 2^N.
scalb is defined as a multiplication with a power of two, and multiplications must be rounded correctly based on the current rounding mode according to the standard. Therefore, with a conforming C99 compiler ldexp() must return a correctly rounded result if the compiler defines __STDC_IEC_559__. In the absence of a library call setting the rounding mode, the default rounding mode "round to nearest or even" is in effect.
I do not have access to MSVC 2013, so I do not know whether it defines that symbol or not. This could even depend on a compiler flag setting, such as /fp:strict.
After tracking down my copy of the C++11 standard, I cannot find any reference to __STDC_IEC_559__ or any language about IEEE-754 bindings. According to the answer to this question this is because that support is included by referring to the C99 standard.
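The expectation that ldexp behaves like a correctly rounded multiplication by a power of two can be turned into a quick check. This is a sketch under assumptions: double is IEEE 754 binary64, subnormals are not flushed to zero, and 2^exp itself is representable. ldexp_matches_multiplication is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical check: does ldexp(x, exp) agree with x * 2^exp computed by
// an ordinary multiplication?
bool ldexp_matches_multiplication(double x, int exp) {
    volatile double vx = x;                        // keep the work at run time
    const double scaled = std::ldexp(vx, exp);
    const double power = std::ldexp(1.0, exp);     // exact when 2^exp is representable
    volatile double product = vx * power;          // rounded by the hardware multiply
    return scaled == product;
}
```

Calling it with (2.75, -1074) reproduces the comparison made in the question; a false result would indicate the behavior reported there for MSVC.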
This happens because during the ldexp calculation the 2.75 gets truncated to 2, which happens because at that small of a denormalized number the bits that represent the '.75' part get shifted off the end of the representable number and disappear. Whether this is a bug or designed behavior can be debated.
When calculating 2.75*ldexp(1.0,-1074) normal rounding happens, and the 2.75 becomes 3.
EDIT: ldexp should round correctly, and this is a bug.
OP's results do not fail to comply with the C spec, as that spec does not define the precision of calculations.
OP's result may have failed IEEE 754, but that depends on the rounding mode in use at the time, which is not posted. Yet OP reports that 2.75*ldexp(1.0,-1074) worked as expected, implying that the expected rounding mode was in effect at that time.
Using printf("%la",x) aids in seeing clearly what is happening near the limits of double.
I would expect g to "round to nearest - ties to even" with the result of 0x1.8p-1073 - which did occur with gcc on windows.
Ideally g would have the value of 0x1.6p-1073
0x0.0p-1073 Zero
0x0.8p-1073 next higher double DBL_TRUE_MIN
0x1.0p-1073 next higher double
0x1.6p-1073 ideal `g` answer, but not available as a double
0x1.8p-1073 next higher double
To be fair, it could be a processor bug - it has happened before.
Ref
#include <float.h>
#include <math.h>
#include <stdio.h>
int main(void)
{
double g=ldexp(2.75,-1074);
printf("%la\n%la\n", 2.75,ldexp(2.75,-1074));
printf("%la\n%la\n", 3.0 ,ldexp(3.0 ,-1074));
double e=ldexp(3.0,-1074);
printf("%la\n%la\n", g,e);
printf("%la\n%la\n", 9.88131e-324, DBL_TRUE_MIN);
printf("g=%g e=%g \n",g,e);
return 0;
}
0x1.6p+1
0x1.8p-1073
0x1.8p+1
0x1.8p-1073
0x1.8p-1073
0x1.8p-1073
0x1p-1073
0x1p-1074
g=1.4822e-323 e=1.4822e-323
I have recently analyzed an old piece of code compiled with VS2005 because of a different numerical behaviour in "debug" (no optimizations) and "release" (/O2 /Oi /Ot options) compilations. The (reduced) code looks like:
#include <math.h>
#include <stdio.h>
void f(double x1, double y1, double x2, double y2)
{
double a1, a2, d;
a1 = atan2(y1,x1);
a2 = atan2(y2,x2);
d = a1 - a2;
if (d == 0.0) { // NOTE: I know that == on reals is "evil"!
printf("EQUAL!\n");
}
}
The function f is expected to print "EQUAL" if invoked with identical pairs of values (e.g. f(1,2,1,2)), but this doesn't always happen in "release". Indeed it happened that the compiler has optimized the code as if it were something like d = a1-atan2(y2,x2) and removed completely the assignment to the intermediate variable a2. Moreover, it has taken advantage of the fact that the second atan2()'s result is already on the FPU stack, so reloaded a1 on FPU and subtracted the values. The problem is that the FPU works at extended precision (80 bits) while a1 was "only" double (64 bits), so saving the first atan2()'s result in memory has actually lost precision. Eventually, d contains the "conversion error" between extended and double precision.
I know perfectly that identity (== operator) with float/double should be avoided. My question is not about how to check proximity between doubles. My question is about how "contractual" an assignment to a local variable should be considered. By my "naive" point of view, an assignment should force the compiler to convert a value to the precision represented by the variable's type (double, in my case). What if the variables were "float"? What if they were "int" (weird, but legal)?
So, in short, what does the C standard say about those cases?
By my "naive" point of view, an assignment should force the compiler to convert a value to the precision represented by the variable's type (double, in my case).
Yes, this is what the C99 standard says. See below.
So, in short, what does the C standard say about those cases?
The C99 standard allows, in some circumstances, floating-point operations to be computed at a higher precision than that implied by the type: look for FLT_EVAL_METHOD and FP_CONTRACT in the standard; these are the two constructs related to excess precision. But I am not aware of any words that could be interpreted as meaning that the compiler is allowed to arbitrarily reduce the precision of a floating-point value from the computing precision to the type precision. This should, in a strict interpretation of the standard, only happen in specific spots, such as assignments and casts, in a deterministic fashion.
The best is to read Joseph S. Myers's analysis of the parts relevant to FLT_EVAL_METHOD:
C99 allows evaluation with excess range and precision following
certain rules. These are outlined in 5.2.4.2.2 paragraph 8:
Except for assignment and cast (which remove all extra range and
precision), the values of operations with floating operands and
values subject to the usual arithmetic conversions and of floating
constants are evaluated to a format whose range and precision may
be greater than required by the type. The use of evaluation
formats is characterized by the implementation-defined value of
FLT_EVAL_METHOD:
Joseph S. Myers goes on to describe the situation in GCC before the patch that accompanies his post. The situation was just as bad as it is in your compiler (and countless others):
GCC defines FLT_EVAL_METHOD to 2 when using x87 floating point. Its
implementation, however, does not conform to the C99 requirements for
FLT_EVAL_METHOD == 2, since it is implemented by the back end
pretending that the processor supports operations on SFmode and
DFmode:
Sometimes, depending on optimization, a value may be spilled to
memory in SFmode or DFmode, so losing excess precision unpredictably
and in places other than when C99 specifies that it is lost.
An assignment will not generally lose excess precision, although
-ffloat-store may make it more likely that it does.
…
The C++ standard inherits the definition of math.h from C99, and math.h is the header that defines FLT_EVAL_METHOD. For this reason you might expect C++ compilers to follow suit, but they do not seem to be taking the issue as seriously. Even G++ still does not support -fexcess-precision=standard, although it uses the same back-end as GCC (which has supported this option since Joseph S. Myers' post and accompanying patch).
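Where the compiler cannot be told to honor assignments, a common workaround is to make the intermediate variables volatile so that each result is stored as a real double before it is used again. The sketch below is that workaround applied to the question's function, not something taken from Myers's post; same_angle is a hypothetical name, and the extra stores cost some speed.

```cpp
#include <cmath>

// Hypothetical variant of the question's f(): both angles are stored in
// volatile doubles, so the subtraction only ever sees values that have
// been rounded to double.
bool same_angle(double x1, double y1, double x2, double y2) {
    volatile double a1 = std::atan2(y1, x1);
    volatile double a2 = std::atan2(y2, x2);
    return (a1 - a2) == 0.0;
}
```

With identical input pairs, such as same_angle(1, 2, 1, 2), both stored values are the same double, so the difference is exactly zero regardless of the precision used inside atan2.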