Compilers are allowed to make several assumptions that would lead to undefined behaviour (such as assuming addition doesn't overflow). May they make such an assumption with regards to floating point NaN?
For example:
double a = some_calc();
double b = a;
if( a == b )
    do_something();
Can the optimizer remove the conditional statement and assume that it is always true? Or is it bound to the platform floating point rules (IEEE) and forced to do the check in case the value is NaN?
That is, can the compiler optimize based on the assumption that a double does not contain NaN? As the C++ standard doesn't say a lot about how floating point actually works on the platform, I'm not clear if this is actually fully specified.
Or is it bound to the platform floating point rules (IEEE)
Not necessarily. The implementation is only bound to IEEE 754 if it actually uses IEEE 754 floating point numbers, in which case std::numeric_limits<double>::is_iec559 is set to true.
and forced to do the check in case the value is NaN?
If the implementation does use IEEE 754, the result of arithmetic operations must match IEEE floating point rules, but as far as comparison goes, it can be optimised. If the body of some_calc is available for the analysis by the compiler in the same translation unit (or during link-time code generation) and it can conclude that it never returns NaN (i.e. returns a constant), it can be optimised, as the semantics of the code don't change.
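For concreteness, here is a minimal sketch (using quiet_NaN() as a stand-in for a some_calc() that returns NaN) showing both the is_iec559 query and the fact that a copied NaN compares unequal to itself, so the comparison cannot simply be assumed true:

#include <iostream>
#include <limits>

int main()
{
    // Whether the implementation claims IEEE 754 (IEC 559) semantics for double.
    std::cout << std::boolalpha
              << std::numeric_limits<double>::is_iec559 << '\n';

    // Under IEEE 754, NaN compares unequal to everything, including itself,
    // so copying a NaN and comparing the copies yields false.
    double a = std::numeric_limits<double>::quiet_NaN();
    double b = a;
    std::cout << (a == b) << '\n';   // false when a is NaN
}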
Related
I am working on floating point determinism and having already studied so many surprising potential causes of indeterminism, I am starting to get paranoid about copying floats:
Does anything in the C++ standard or in general guarantee me that a float lvalue, after being copied to another float variable or when used as a const-ref or by-value parameter, will always be bitwise equivalent to the original value?
Can anything cause a copied float to be bitwise inequivalent to the original value, such as changing the floating point environment or passing it into a different thread?
Here is some sample code based on what I use to check for equivalence of floating point values in my test-cases, this one will fail because it expects FE_TONEAREST:
#include <cfenv>
#include <cstdint>
// MSVC-specific pragmas for floating point control
#pragma float_control(precise, on)
#pragma float_control(except, on)
#pragma fenv_access(on)
#pragma fp_contract(off)
// May make a copy of the floats
bool compareFloats(float resultValue, float comparisonValue)
{
    // I was originally doing a bit-wise comparison here but I was made
    // aware in the comments that this might not actually be what I want
    // so I only check against the equality of the values here now
    // (NaN values etc. have to be handled extra)
    bool areEqual = (resultValue == comparisonValue);

    // Additional outputs if not equal
    // ...

    return areEqual;
}

int main()
{
    std::fesetround(FE_TOWARDZERO);
    float value = 1.f / 10;
    float expectedResult = 0x1.99999ap-4;
    compareFloats(value, expectedResult);
}
Do I have to be worried that if I pass a float by-value into the comparison function it might come out differently on the other side, even though it is an lvalue?
No, there is no such guarantee.
Subnormal or otherwise non-normalised floating point values and NaNs are all cases where the bit patterns may differ.
I believe that a signed negative zero is also allowed to become a signed positive zero on assignment, although IEEE 754 disallows that.
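To illustrate the distinction the question is asking about, here is a hedged sketch (the helper bitwiseEqual is hypothetical, not part of the question's code) contrasting value equality with bitwise equality; signed zeros and NaNs are the classic cases where the two disagree:

#include <cstdint>
#include <cstring>
#include <iostream>
#include <limits>

// Sketch of the difference between value equality and bitwise equality.
bool bitwiseEqual(float a, float b)
{
    std::uint32_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua);   // inspect the object representation
    std::memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}

int main()
{
    std::cout << std::boolalpha;

    // Value-equal, but the sign bit differs.
    std::cout << (0.0f == -0.0f) << ' ' << bitwiseEqual(0.0f, -0.0f) << '\n';   // true false

    // Bitwise-equal, but never value-equal.
    float nan = std::numeric_limits<float>::quiet_NaN();
    std::cout << (nan == nan) << ' ' << bitwiseEqual(nan, nan) << '\n';         // false true
}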
The C++ standard itself has virtually no guarantees on floating point math because it does not mandate IEEE-754 but leaves it up to the implementation (emphasis mine):
[basic.fundamental/12]
There are three floating-point types: float, double, and long double.
The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.
The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
The value representation of floating-point types is implementation-defined.
[ Note: This document imposes no requirements on the accuracy of floating-point operations; see also [support.limits]. — end note ]
The C++ code you write is a high-level abstract description of what you want the abstract machine to do, and it is fully in the hands of the compiler what this gets translated to. "Assignment" is an aspect of the C++ standard, and as shown above, the standard does not mandate the behavior of floating point operations. To verify the statement "assignments leave floating point values unchanged", your compiler would have to specify its floating point behavior in terms of the C++ abstract machine, and I've not seen any such documentation (especially not for MSVC).
In other words: Without nailing down the exact compiler, compiler version, compilation flags etc., it is impossible to say for sure what the floating point semantics of a C++ program are (especially regarding the difficult cases like rounding, NaNs or signed zero). Most compilers differentiate between strict IEEE conformance and relaxing some of those restrictions, but even then you are not necessarily guaranteed that the program has the same outputs in non-optimized vs optimized builds due to, say, constant folding, precision of intermediate results and so on.
Case in point: for gcc, even with -O0, your program in question does not compute 1.f / 10 at run time but at compile time, and thus your rounding mode settings are ignored: https://godbolt.org/z/U8B6bc
You should not be paranoid about copying floats in particular but paranoid of compiler optimizations for floating point in general.
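As a non-authoritative illustration of that point, one common compiler-specific trick (not guaranteed by the standard) is to route an operand through a volatile object so the division cannot be constant-folded and the rounding mode you set actually applies:

#include <cfenv>
#include <cstdio>

int main()
{
    // Sketch only: volatile usually (but not provably, per the standard)
    // prevents the division from being folded at compile time. Many
    // compilers additionally need a flag such as -frounding-math before
    // they respect run-time rounding mode changes at all.
    std::fesetround(FE_TOWARDZERO);
    volatile float ten = 10.0f;
    float value = 1.0f / ten;
    std::printf("%a\n", value);   // compare against 0x1.99999ap-4 (round-to-nearest)
}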
It is common knowledge that one has to be careful when comparing floating point values. Usually, instead of using ==, we use some epsilon or ULP based equality testing.
However, I wonder, are there any cases, when using == is perfectly fine?
Look at this simple snippet, which cases are guaranteed to succeed?
void fn(float a, float b) {
    float l1 = a/b;
    float l2 = a/b;

    if (l1==l1) { }          // case a)
    if (l1==l2) { }          // case b)
    if (l1==a/b) { }         // case c)
    if (l1==5.0f/3.0f) { }   // case d)
}

int main() {
    fn(5.0f, 3.0f);
}
Note: I've checked this and this, but they don't cover (all of) my cases.
Note 2: It seems that I have to add some more information so that the answers can be useful in practice. I'd like to know:
what the C++ standard says
what happens, if a C++ implementation follows IEEE-754
This is the only relevant statement I found in the current draft standard:
The value representation of floating-point types is implementation-defined. [ Note: This document imposes no requirements on the accuracy of floating-point operations; see also [support.limits]. — end note ]
So, does this mean, that even "case a)" is implementation defined? I mean, l1==l1 is definitely a floating-point operation. So, if an implementation is "inaccurate", then could l1==l1 be false?
I think this question is not a duplicate of Is floating-point == ever OK?. That question doesn't address any of the cases I'm asking. Same subject, different question. I'd like to have answers specifically to case a)-d), for which I cannot find answers in the duplicated question.
However, I wonder, are there any cases, when using == is perfectly fine?
Sure there are. One category of examples are usages that involve no computation, e.g. setters that should only execute on changes:
void setRange(float min, float max)
{
    if (min == m_fMin && max == m_fMax)
        return;

    m_fMin = min;
    m_fMax = max;

    // Do something with min and/or max
    emit rangeChanged(min, max);
}
See also Is floating-point == ever OK?.
Contrived cases may "work". Practical cases may still fail. One additional issue is that optimisation will often cause small variations in the way the calculation is done, so that symbolically the results should be equal but numerically they are different. The example above could, theoretically, fail in such a case. Some compilers offer an option to produce more consistent results at a cost to performance. I would advise "always" avoiding equality comparisons of floating point numbers.
Equality of physical measurements, as well as of digitally stored floats, is often meaningless. So if you are comparing floats for equality in your code, you are probably doing something wrong. You usually want greater than, less than, or within a tolerance. Often code can be rewritten so these types of issues are avoided.
Only a) and b) are guaranteed to succeed in any sane implementation (see the legalese below for details), as they compare two values that have been derived in the same way and rounded to float precision. Consequently, both compared values are guaranteed to be identical to the last bit.
Case c) and d) may fail because the computation and subsequent comparison may be carried out with higher precision than float. The different rounding of double should be enough to fail the test.
Note that cases a) and b) may still fail if NaNs are involved, though, since NaN compares unequal to everything, including itself.
Legalese
Using the N3242 C++11 working draft of the standard, I find the following:
In the text describing the assignment expression, it is explicitly stated that type conversion takes place, [expr.ass] 3:
If the left operand is not of class type, the expression is implicitly converted (Clause 4) to the cv-unqualified type of the left operand.
Clause 4 refers to the standard conversions [conv], which contain the following on floating point conversions, [conv.double] 1:
A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.
(Emphasis mine.)
So we have the guarantee that the result of the conversion is actually defined, unless we are dealing with values outside the representable range (like float a = 1e300, which is UB).
When people think about "internal floating point representation may be more precise than visible in code", they think about the following sentence in the standard, [expr] 11:
The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.
Note that this applies to operands and results, not to variables. This is emphasized by the attached footnote 60:
The cast and assignment operators must still perform their specific conversions as described in 5.4, 5.2.9 and 5.17.
(I guess, this is the footnote that Maciej Piechotka meant in the comments - the numbering seems to have changed in the version of the standard he's been using.)
So, when I say float a = some_double_expression;, I have the guarantee that the result of the expression is actually rounded to be representable by a float (invoking UB only if the value is out-of-bounds), and a will refer to that rounded value afterwards.
An implementation could indeed specify that the result of the rounding is random, and thus break the cases a) and b). Sane implementations won't do that, though.
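Assuming such a sane implementation, here is a minimal sketch of why cases a) and b) hold for conversions and plain copies:

#include <cassert>

int main()
{
    double d = 0.1;     // some double value not exactly representable as float
    float f1 = d;       // rounded once to float precision ([conv.double])
    float f2 = d;       // the same conversion of the same value
    assert(f1 == f2);   // identical rounding, identical result

    float g = f1;       // float-to-float copy: no conversion, value unchanged
    assert(g == f1);
}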
Assuming IEEE 754 semantics, there are definitely some cases where you can do this. Conventional floating point number computations are exact whenever they can be, which for example includes (but is not limited to) all basic operations where the operands and the results are integers.
So if you know for a fact that you don't do anything that would result in something unrepresentable, you are fine. For example
float a = 1.0f;
float b = 1.0f;
float c = 2.0f;
assert(a + b == c); // you can safely expect this to succeed
The situation only really gets bad if you have computations with results that aren't exactly representable (or that involve operations which aren't exact) and you change the order of operations.
Note that the C++ standard itself doesn't guarantee IEEE 754 semantics, but that's what you can expect to be dealing with most of the time.
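As a small illustration of the order-of-operations caveat above (assuming typical IEEE 754 double semantics), regrouping an inexact sum changes which intermediate results get rounded, and therefore the final value:

#include <iostream>

int main()
{
    // Neither 0.1, 0.2 nor 0.3 is exactly representable, so the grouping
    // determines which rounding errors accumulate.
    double left  = (0.1 + 0.2) + 0.3;   // typically 0.6000000000000001
    double right = 0.1 + (0.2 + 0.3);   // typically 0.6
    std::cout << std::boolalpha << (left == right) << '\n';   // false
}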
Case (a) fails if a and b are both 0.0. In this case, the division yields NaN, and by definition (IEEE, not C) NaN ≠ NaN.
Cases (b) and (c) can fail in parallel computation when floating-point round modes (or other computation modes) are changed in the middle of this thread's execution. Seen this one in practice, unfortunately.
Case (d) can be different because the compiler (on some machine) may choose to constant-fold the computation of 5.0f/3.0f and replace it with the constant result (of unspecified precision), whereas a/b must be computed at runtime on the target machine (which might be radically different). In fact, intermediate calculations may be performed in arbitrary precision. I've seen differences on old Intel architectures when intermediate computation was performed in 80-bit floating-point, a format that the language didn't even directly support.
In my humble opinion, you should not rely on the == operator because it has many corner cases. The biggest problems are rounding and extended precision. On x86, floating point operations can be done with greater precision than you can store in variables (if you use the x87 coprocessor; IIRC SSE operations use the same precision as storage).
This is usually a good thing, but it causes problems like:
1./2 != 1./2, because one value comes from a variable and the other from a floating point register. In the simplest cases it will work, but if you add other floating point operations the compiler could decide to spill some variables to the stack, changing their values and thus the result of the comparison.
To have 100% certainty, you need to look at the assembly and see what operations were done on both values beforehand. Even the order can change the result in non-trivial cases.
Overall, what is the point of using ==? You should use algorithms that are stable, meaning they work even if values are not exactly equal and still give the same results. About the only place I know where == could be useful is serializing/deserializing, where you know exactly what result you want and can alter the serialization to achieve your goal.
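The question above mentions epsilon- or ULP-based testing as the usual alternative to ==; for reference, here is one possible sketch of such a tolerance-based comparison. The helper nearlyEqual and its default tolerances are purely illustrative, not a universal recipe:

#include <algorithm>
#include <cassert>
#include <cmath>

// Values count as equal when their difference is within a relative tolerance
// scaled by the larger magnitude, with an absolute floor for values near zero.
bool nearlyEqual(double a, double b,
                 double relTol = 1e-9, double absTol = 1e-12)
{
    return std::fabs(a - b) <=
           std::max(absTol, relTol * std::max(std::fabs(a), std::fabs(b)));
}

int main()
{
    assert(nearlyEqual(0.1 + 0.2, 0.3));   // within tolerance despite rounding
    assert(!nearlyEqual(0.3, 0.4));        // genuinely different values
}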
One of the big gotchas of floating point numbers is that some of them cannot be exactly represented in binary. This can make them difficult to work with. However, what I'm curious about is whether subtle or not-so-subtle errors in floating point are deterministic. Can somebody predict them, for example? Here's one example of a random number generator that could take advantage of floating point errors:
#include <cmath>

float constant = M_PI;

float generate()
{
    static float state = 1;
    state = state * constant;
    return state;
}
One would have to know the implementation, the hardware, the compiler settings and so on, which makes it quite difficult to predict what the results would be. Or is my thinking flawed?
Floating point "errors" are deterministic. There is a 1:1 mapping between input and output values for a given operation. Your example will produce the same output sequence every time.
That said, there could be a floating-point implementation or ten out there that will produce different sequences, but this is not something you can consider "random" (i.e. a source of entropy).
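A minimal sketch of that claim, reusing the question's M_PI-based generator: two instances stepped in lockstep on the same implementation with the same settings stay bit-identical.

#include <cassert>
#include <cmath>   // M_PI as in the question's snippet (may need _USE_MATH_DEFINES on some toolchains)

// The same operation on the same operands yields the same result, so
// re-running the generator from the same seed reproduces the sequence.
float step(float state) { return state * static_cast<float>(M_PI); }

int main()
{
    float s1 = 1.0f;
    float s2 = 1.0f;
    for (int i = 0; i < 100; ++i) {
        s1 = step(s1);
        s2 = step(s2);
        assert(s1 == s2);   // identical on every iteration
    }
}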
Every floating point representation defines the composition of a floating point variable (which part is the mantissa, which part is the exponent, which part is the sign, etc) and the behaviour of every operation.
In any implementation you might choose, it is therefore possible to predict the result of every floating point operation, if you know its operand (or operands). That characteristic is the definition of determinism.
So, yes, floating point operations are deterministic.
Different implementations (compilers, host systems, etc) do support different floating point representations. So there is some variation of results between implementations. However, it is still possible to predict the result of any floating point operation, if you know how floating point variables are represented, and how operations work.
The fact that not everyone knows enough about floating point types and operations on them does not make them non-deterministic. Nor does the fact that not everyone can describe the complete set of operations in a complex algorithm. The knowledge is readily available and, with enough effort, understandable well enough that the effects of all operations on all possible operands can be reliably predicted before doing the operation.
There are buggy implementations of floating point out there which do not comply with their own documentation. For example, look up the Pentium FDIV bug, where some early Pentium CPUs implemented floating point division incorrectly. Even those turned out to be deterministic, once it was understood what the operations actually do.
I would like to know if I can assume that the value of a float will not change if I just pass it around functions, without any further calculations. I would like to write some tests for that kind of functions using hardcoded values.
Example :
float identity(float f) { return f; }
Can I write the following test :
TEST() {
    EXPECT(identity(1.8f) == 1.8f);
}
In general the C++ standard doesn't make guarantees if it's known that the guarantee will lead to sub-optimal code for some processor architecture.
The legacy x86 floating point processing uses 80-bit registers for calculations. The mere act of moving a value from one of those registers into 64 bits of memory causes rounding to occur.
If you're not performing any lossy operation and just passing the floating point data around, it should be safe to assume (assuming there are no interferences or optimization bugs) that the values will remain the same. Just make sure you're not comparing floating point values with literal values interpreted as double (EXPECT(identity(1.8f) == 1.8);) or vice-versa.
/paranoid_level on
However, you should always check your target architecture's floating point behavior, especially with respect to the IEEE 754 standard: on a system which allows exceptions to IEEE 754 under specific circumstances (e.g. the flush-to-zero flags often used on GPUs) you might end up with results inconsistent with your expectations (possibly in combination with aggressive compiler optimizations), since results might be handled internally in a different manner. An example is an architecture which applies a denormals-are-zero (DAZ) policy to every floating point operation.
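As a rough, non-authoritative probe for that kind of behaviour: halving the smallest normal float should yield a subnormal under full IEEE 754 but flushes to zero under FTZ/DAZ. The volatile is only there to discourage constant folding; this is a sketch, not a guaranteed test.

#include <iostream>
#include <limits>

int main()
{
    // Keep the multiplication at run time so it reflects the run-time FP mode.
    volatile float smallestNormal = std::numeric_limits<float>::min();
    float half = smallestNormal * 0.5f;   // subnormal under full IEEE 754

    std::cout << std::boolalpha
              << (half > 0.0f) << '\n';   // true with subnormal support,
                                          // typically false under FTZ/DAZ
}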
That is perfectly fine and the value will be identical.
As pointed out in the comments, if you were to write:
TEST() {
    EXPECT(identity(1.8) == 1.8f);
    EXPECT(identity(1.8l) == 1.8f);
}
you could end up having problems, due to the implicit conversions to and from double/long double.
However, if you compare floats with floats, you're perfectly safe.
reference
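A small sketch of the pitfall described above, reusing the question's identity function: comparing a float against a float literal holds, while comparing it against a double literal typically does not.

#include <iostream>

float identity(float f) { return f; }

int main()
{
    std::cout << std::boolalpha;

    // float against a float literal: both sides are the same float value.
    std::cout << (identity(1.8f) == 1.8f) << '\n';   // true

    // float against a double literal: the float is widened to double, and the
    // float approximation of 1.8, once widened, is not the same double as 1.8,
    // so this typically prints false.
    std::cout << (identity(1.8f) == 1.8) << '\n';
}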
When comparing doubles for equality, we need to give a tolerance level, because floating-point computation might introduce errors. For example:
double x;
double y;
x = f();
y = g();
if (fabs(x-y) < epsilon) {
    // they are equal!
} else {
    // they are not!
}
However, if I simply assign a constant value, without any computation, do I still need to check the epsilon?
double x = 1;
double y = 1;
if (x == y) {
    // they are equal!
} else {
    // no they are not!
}
Is == comparison good enough? Or I need to do fabs(x-y)<epsilon again? Is it possible to introduce error in assigning? Am I too paranoid?
How about casting (double x = static_cast<double>(100))? Is that gonna introduce floating-point error as well?
I am using C++ on Linux, but if it differs by language, I would like to understand that as well.
Actually, it depends on the value and the implementation. The C++ standard (draft n3126) has this to say in 2.14.4 Floating literals:
If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
In other words, if the value is exactly representable (and 1 is, in IEEE754, as is 100 in your static cast), you get the value. Otherwise (such as with 0.1) you get an implementation-defined close match (a). Now I'd be very worried about an implementation that chose a different close match based on the same input token but it is possible.
(a) Actually, that paragraph can be read in two ways: either the implementation is free to choose the closest higher or closest lower value regardless of which is actually closer, or it must choose the value closest to the desired one.
If the latter, it doesn't change this answer, however, since all you have to do is hardcode a floating point value exactly at the midpoint between two representable values and the implementation is once again free to choose either.
For example, it might alternate between the next higher and next lower for the same reason banker's rounding is applied - to reduce the cumulative errors.
No, if you assign literals they should be the same :)
Also if you start with the same value and do the same operations, they should be the same.
Floating point values are non-exact, but the operations should produce consistent results :)
Both cases are ultimately subject to implementation defined representations.
Storage of floating point values and their representations take on many forms - load by address or constant? optimized out by fast math? what is the register width? is it stored in an SSE register? Many variations exist.
If you need precise behavior and portability, do not rely on this implementation defined behavior.
IEEE-754, a standard that common implementations of floating point numbers abide by, requires floating-point operations to produce a result that is the nearest representable value to an infinitely-precise result. Thus the only imprecision you will face is rounding after each operation you perform, as well as the propagation of rounding errors from operations performed earlier in the chain. Floats are not inexact per se. And by the way, epsilon can and should be computed; you can consult any numerics book on that.
Floating point numbers can represent integers precisely up to the length of their mantissa. So, for example, casting from an int to a double will always be exact, but casting an int into a float will no longer be exact for very large integers.
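A minimal sketch of that boundary (assuming the usual 32-bit int and IEEE 754 single precision): every integer up to 2^24 converts exactly to float, while 2^24 + 1 does not.

#include <cassert>

int main()
{
    // A float's 24-bit significand represents every integer up to 2^24 exactly...
    int exact = 16777216;                              // 2^24
    assert(static_cast<float>(exact) == 16777216.0);

    // ...but 2^24 + 1 is not representable as float and must round to a neighbour.
    int tooBig = 16777217;                             // 2^24 + 1
    assert(static_cast<float>(tooBig) != 16777217.0);  // compared as double, which is exact here
}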
There is one major example of extensive use of floating point numbers as a substitute for integers: the Lua scripting language, which has no built-in integer type and uses floating-point numbers extensively for logic, flow control and so on. The performance and storage penalty of using floating-point numbers turns out to be smaller than the penalty of resolving multiple types at run time, and it makes the implementation lighter. Lua has been used extensively not only on PCs, but also on game consoles.
Now, many compilers have an optional switch that disables IEEE-754 compatibility. Then compromises are made. Denormalized numbers (very very small numbers where the exponent has reached smallest possible value) are often treated as zero, and approximations in implementation of power, logarithm, sqrt, and 1/(x^2) can be made, but addition/subtraction, comparison and multiplication should retain their properties for numbers which can be exactly represented.
The easy answer: For constants == is ok.
There are two exceptions which you should be aware of:
First exception:
0.0 == -0.0
There is a negative zero which compares equal under the IEEE 754 standard. This means
1/INFINITY == 1/-INFINITY, which breaks f(x) == f(y) => x == y
Second exception:
NaN != NaN
This is a special property of Not-a-Number (NaN) which allows one to find out whether a number is a NaN
on systems which do not have a test function available (yes, that happens).
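A short sketch of both exceptions (using infinity() rather than a literal division by zero, which the C++ standard itself leaves undefined):

#include <cmath>
#include <iostream>
#include <limits>

int main()
{
    std::cout << std::boolalpha;

    // Exception 1: positive and negative zero compare equal...
    double inf = std::numeric_limits<double>::infinity();
    std::cout << (1.0 / inf == 1.0 / -inf) << '\n';                              // true: +0.0 == -0.0
    // ...even though they are distinguishable, e.g. by their sign bit.
    std::cout << (std::signbit(1.0 / inf) == std::signbit(1.0 / -inf)) << '\n';  // false

    // Exception 2: NaN compares unequal to itself, the classic self-test
    // for NaN where no dedicated test function is available.
    double nan = std::numeric_limits<double>::quiet_NaN();
    std::cout << (nan != nan) << '\n';                                           // true
}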