For floating-point values, is it guaranteed that a + b is the same¹ as b + a?
I believe this is guaranteed in IEEE754, however the C++ standard does not specify that IEEE754 must be used. The only relevant text seems to be from [expr.add]#3:
The result of the binary + operator is the sum of the operands.
The mathematical operation "sum" is commutative. However, the mathematical operation "sum" is also associative, whereas floating point addition is definitely not associative. So, it seems to me that we cannot conclude that the commutativity of "sum" in mathematics means that this quote specifies commutativity in C++.
Footnote 1:
"Same" as in bitwise identical, like memcmp rather than ==, to distinguish +0 from -0. IEEE754 treats +0.0 == -0.0 as true, but also has specific rules for signed zero. +0 + -0 and -0 + +0 both produce +0 in IEEE754, same for addition of opposite-sign values with equal magnitude. An == that followed IEEE semantics would hide non-commutativity of signed-zero if that was the criterion.
Also, a+b == b+a is false with IEEE754 math if either input is NaN.
memcmp will say whether two NaNs have the same bit-pattern (including payload), although we can consider NaN propagation rules separately from commutativity of valid math operations.
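To make the footnote concrete, here is a minimal sketch (assuming IEEE-754 binary32 floats and the quiet NaN returned by std::nanf; the variable names are just illustrative) showing how == and memcmp disagree:

#include <cmath>
#include <cstdio>
#include <cstring>

int main() {
    float pz = +0.0f, nz = -0.0f;
    // IEEE-style == treats the two zeros as equal...
    std::printf("pz == nz: %d\n", pz == nz);
    // ...but their object representations differ, which memcmp can see.
    std::printf("memcmp(&pz, &nz): %d\n", std::memcmp(&pz, &nz, sizeof pz));
    // A quiet NaN never compares equal to anything, including itself.
    float qnan = std::nanf("");
    std::printf("qnan == qnan: %d\n", qnan == qnan);
}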
It is not even required that a + b == a + b. One of the subexpressions may hold the result of the addition with more precision than the other one, for example when the use of multiple additions requires one of the subexpressions to be temporarily stored in memory, while the other subexpression can be kept in a register (with higher precision).
If a + b == a + b is not guaranteed, a + b == b + a cannot be guaranteed. If a + b does not have to return the same value each time, and the values are different, one of them necessarily will not be equal to one particular evaluation of b + a.
No, the C++ language generally wouldn't make such a requirement of the hardware. Only the grouping (syntactic associativity) of operators is defined.
All kinds of crazy things do happen in floating-point arithmetic. Perhaps, on some machine, adding zero to a denormal number produces zero. It is conceivable that a machine could avoid updating memory in the case of adding a zero-valued register to a denormal in memory, and possible that a really dumb compiler would always put the LHS in memory and the RHS in a register.
Note, though, that a machine with non-commutative addition would need to specifically define how expressions map to instructions, if you're going to have any control over which operation you get. Does the left-hand side go into the first machine operand or the second?
Such an ABI specification, mentioning the construction of expressions and instructions in the same breath, would be quite pathological.
The C++ standard very specifically does not guarantee IEEE 754. The library does have some support for IEC 559 (which is basically just the IEC's version of the IEEE 754 standard), so you can check whether the underlying implementation uses IEEE 754/IEC 559 though (and when it does, you can depend on what it guarantees, of course).
For the most part, the C and C++ standards assume that such basic operations will be implemented however the underlying hardware works. For something as common as IEEE 754, they'll let you detect whether it's present, but still don't require it.
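For example, a compile-time check could look like the following minimal sketch (the assertion message is just illustrative):

#include <limits>

// is_iec559 is true only if the implementation documents that this type
// uses the IEC 559 / IEEE 754 format and semantics.
static_assert(std::numeric_limits<double>::is_iec559,
              "this code assumes IEEE-754 doubles");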
I am reading C++ Concurrency in Action Chapter 5.
This chapter says,
Note that although you can use std::atomic<float> or std::atomic<double>, because the built-in floating point types do satisfy the criteria for use with memcpy and memcmp, the behavior may be surprising in the case of compare_exchange_strong. The operation may fail even though the old stored value was equal in value to the comparand, if the stored value had a different representation. Note that there are no atomic arithmetic operations on floating-point values. You’ll get similar behavior with compare_exchange_strong if you use std::atomic<> with a user-defined type that has an equality-comparison operator defined, and that operator differs from the comparison using memcmp—the operation may fail because the otherwise-equal values have a different representation.
But I don't understand why this is.
If float and double can be used with memcpy and memcmp, what is the problem with atomic operations like compare_exchange_strong?
Can't I use compare_exchange_weak either?
In the paragraph above, what does "different representation" mean?
IEEE-754 floats use a sign/magnitude format: there is a sign bit, and the remaining bits represent a non-negative magnitude. If the sign bit is set, the number is interpreted as negative.
Of course, if all of the numeric (and exponent) bits are 0, and the sign bit is set, then that yields the "number" -0.0. As far as real-world human math is concerned, there is no "negative 0". So logically, such a number should be treated as equivalent to +0.0. And normal floating-point comparison logic does exactly that. But these two equivalent values are stored with different binary representations; one has the sign bit set; the other does not.
So if you do a comparison of the binary representation of two floating point values (which is what the atomic_compare_* operations will do), then -0.0 will not be considered equal to +0.0.
There are other cases where IEEE-754 equality testing can go wrong, but this is the simplest one to explain.
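A minimal sketch of the surprise described in the book (assuming the platform stores +0.0 and -0.0 with different bit patterns, as IEEE-754 does):

#include <atomic>
#include <cstdio>

int main() {
    std::atomic<float> a(-0.0f);
    float expected = +0.0f;
    // By value, expected == stored is true (+0.0 == -0.0), but
    // compare_exchange_strong compares object representations (as if by
    // memcmp), so the exchange fails and 'expected' is reloaded with -0.0f.
    bool ok = a.compare_exchange_strong(expected, 1.0f);
    std::printf("exchanged: %d, expected is now: %g\n", ok, expected);
}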
In C++, for two doubles a and b, is it true that if a > b, then a - b > 0? Assume a and b are not NaN, just normal numbers.
No, not by the requirements of the C++ standard alone. C++ allows implementations to use floating-point types without support for subnormal numbers. In such a type, you could have a equal to 1.0001₂ · 2^m, where m is the minimum normal exponent, and b equal to 1.0000₂ · 2^m. Then a > b but the exact value of a - b is not representable (it would be subnormal), so the computed result is zero.
This does not happen with IEEE-754 arithmetic, which supports subnormals. However, some C++ implementations, possibly not conforming to the C++ standard, use IEEE-754 formats but flush subnormal results to zero. So this will also produce a > b but a-b == 0.
Supposing an implementation does support subnormals and is conforming to the C++ standard, another issue is that the standard allows implementations to use extra precision within floating-point expressions. If your a and b are actually expressions rather than single objects, it is possible that the extra precision could cause a > b to be true while a - b > 0 is false. Implementations are permitted to use or not use such extra precision at will, so a > b could be true in some evaluations and not others, and similarly for a - b > 0. If a and b are single objects, this will not happen, because the standard requires implementations to "discard" the excess precision in assignments and casts.
Yes, this is true in IEEE-754 arithmetic, in all rounding modes, as long as your C++ compiler uses IEEE-754 semantics (whether in software or hardware). But see Eric Postpischil's answer for cases where particular implementations might diverge from IEEE-754; it is best to check your compiler's documentation.
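A small sketch of the subnormal corner case (assuming IEEE-754 binary64 doubles; the factor 1.5 is just a convenient choice):

#include <cfloat>
#include <cstdio>

int main() {
    double a = DBL_MIN * 1.5;   // 1.1000...(binary) * 2^(minimum normal exponent)
    double b = DBL_MIN;         // 1.0000...(binary) * 2^(minimum normal exponent)
    // With IEEE-754 subnormals, a - b is a tiny but nonzero subnormal, so
    // a > b and a - b > 0 agree. If the implementation flushes subnormal
    // results to zero (some fast-math / FTZ modes), a > b can be true
    // while a - b == 0.
    std::printf("a > b: %d, a - b > 0: %d, a - b = %g\n",
                a > b, (a - b) > 0.0, a - b);
}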
It is common knowledge that one has to be careful when comparing floating point values. Usually, instead of using ==, we use some epsilon or ULP based equality testing.
However, I wonder, are there any cases, when using == is perfectly fine?
Look at this simple snippet: which cases are guaranteed to succeed?
void fn(float a, float b) {
    float l1 = a/b;
    float l2 = a/b;
    if (l1==l1) { } // case a)
    if (l1==l2) { } // case b)
    if (l1==a/b) { } // case c)
    if (l1==5.0f/3.0f) { } // case d)
}

int main() {
    fn(5.0f, 3.0f);
}
Note: I've checked this and this, but they don't cover (all of) my cases.
Note 2: It seems that I have to add some more information, so that the answers can be useful in practice. I'd like to know:
what the C++ standard says
what happens if a C++ implementation follows IEEE-754
This is the only relevant statement I found in the current draft standard:
The value representation of floating-point types is implementation-defined. [ Note: This document imposes no requirements on the accuracy of floating-point operations; see also [support.limits]. — end note ]
So, does this mean that even case a) is implementation-defined? I mean, l1==l1 is definitely a floating-point operation. So, if an implementation is "inaccurate", could l1==l1 then be false?
I think this question is not a duplicate of Is floating-point == ever OK?. That question doesn't address any of the cases I'm asking. Same subject, different question. I'd like to have answers specifically to case a)-d), for which I cannot find answers in the duplicated question.
However, I wonder, are there any cases, when using == is perfectly fine?
Sure there are. One category of examples are usages that involve no computation, e.g. setters that should only execute on changes:
void setRange(float min, float max)
{
    if (min == m_fMin && max == m_fMax)
        return;

    m_fMin = min;
    m_fMax = max;

    // Do something with min and/or max
    emit rangeChanged(min, max);
}
See also Is floating-point == ever OK?.
Contrived cases may "work". Practical cases may still fail. One additional issue is that often optimisation will cause small variations in the way the calculation is done so that symbolically the results should be equal but numerically they are different. The example above could, theoretically, fail in such a case. Some compilers offer an option to produce more consistent results at a cost to performance. I would advise "always" avoiding the equality of floating point numbers.
Equality of physical measurements, as well as of digitally stored floats, is often meaningless. So if you are comparing floats for equality in your code, you are probably doing something wrong. You usually want greater than, less than, or within a tolerance. Often code can be rewritten so these kinds of issues are avoided.
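For the "within a tolerance" case, one common pattern looks roughly like the sketch below (the function name and the default tolerances are arbitrary illustrative choices, not a universal recipe):

#include <algorithm>
#include <cmath>

// Combined relative/absolute tolerance test: treat x and y as equal when
// their difference is small compared to their magnitudes, or absolutely tiny.
bool approximatelyEqual(double x, double y,
                        double relTol = 1e-9, double absTol = 1e-12) {
    return std::fabs(x - y) <=
           std::max(absTol, relTol * std::max(std::fabs(x), std::fabs(y)));
}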
Only a) and b) are guaranteed to succeed in any sane implementation (see the legalese below for details), as they compare two values that have been derived in the same way and rounded to float precision. Consequently, both compared values are guaranteed to be identical to the last bit.
Case c) and d) may fail because the computation and subsequent comparison may be carried out with higher precision than float. The different rounding of double should be enough to fail the test.
Note that the cases a) and b) may still fail if infinities or NaNs are involved, though.
Legalese
Using the N3242 C++11 working draft of the standard, I find the following:
In the text describing the assignment expression, it is explicitly stated that type conversion takes place, [expr.ass] 3:
If the left operand is not of class type, the expression is implicitly converted (Clause 4) to the cv-unqualified type of the left operand.
Clause 4 refers to the standard conversions [conv], which contain the following on floating point conversions, [conv.double] 1:
A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.
(Emphasis mine.)
So we have the guarantee that the result of the conversion is actually defined, unless we are dealing with values outside the representable range (like float a = 1e300, which is UB).
When people think about "internal floating point representation may be more precise than visible in code", they think about the following sentence in the standard, [expr] 11:
The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.
Note that this applies to operands and results, not to variables. This is emphasized by the attached footnote 60:
The cast and assignment operators must still perform their specific conversions as described in 5.4, 5.2.9 and 5.17.
(I guess, this is the footnote that Maciej Piechotka meant in the comments - the numbering seems to have changed in the version of the standard he's been using.)
So, when I say float a = some_double_expression;, I have the guarantee that the result of the expression is actually rounded to be representable by a float (invoking UB only if the value is out-of-bounds), and a will refer to that rounded value afterwards.
An implementation could indeed specify that the result of the rounding is random, and thus break the cases a) and b). Sane implementations won't do that, though.
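A small sketch of what this buys you for the snippet from the question (assuming the implementation performs the conversions described above; this is an illustration of the argument, not a guarantee beyond what is quoted):

#include <cstdio>

void fn(float a, float b) {
    float l1 = a / b;                      // the assignment rounds the result to float
    float l2 = static_cast<float>(a / b);  // the cast forces the same rounding to float
    // Both values have been converted to float, so on a sane, conforming
    // implementation they compare equal even if the divisions were carried
    // out in extended precision.
    std::printf("%d\n", l1 == l2);
}

int main() { fn(5.0f, 3.0f); }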
Assuming IEEE 754 semantics, there are definitely some cases where you can do this. Conventional floating point number computations are exact whenever they can be, which for example includes (but is not limited to) all basic operations where the operands and the results are integers.
So if you know for a fact that you don't do anything that would result in something unrepresentable, you are fine. For example
#include <cassert>

int main() {
    float a = 1.0f;
    float b = 1.0f;
    float c = 2.0f;
    assert(a + b == c); // you can safely expect this to succeed
}
The situation only really gets bad if you have computations with results that aren't exactly representable (or that involve operations which aren't exact) and you change the order of operations.
Note that the C++ standard itself doesn't guarantee IEEE 754 semantics, but that's what you can expect to be dealing with most of the time.
Case (a) fails if a and b are both 0.0. In that case the division yields NaN, and by definition (IEEE, not C++) NaN ≠ NaN.
Cases (b) and (c) can fail in parallel computation when floating-point rounding modes (or other computation modes) are changed in the middle of this thread's execution. Seen this one in practice, unfortunately.
Case (d) can be different because the compiler (on some machine) may choose to constant-fold the computation of 5.0f/3.0f and replace it with the constant result (of unspecified precision), whereas a/b must be computed at runtime on the target machine (which might be radically different). In fact, intermediate calculations may be performed in arbitrary precision. I've seen differences on old Intel architectures when intermediate computation was performed in 80-bit floating-point, a format that the language didn't even directly support.
In my humble opinion, you should not rely on the == operator because it has many corner cases. The biggest problem is rounding and extended precision. In the case of x86, floating-point operations can be done with greater precision than you can store in a variable (when the x87 coprocessor is used; IIRC SSE operations use the same precision as the stored type).
This is usually a good thing, but it causes problems like:
1./2 != 1./2, because one value comes from a variable and the other from a floating-point register. In the simplest cases it will work, but if you add other floating-point operations, the compiler could decide to spill some variables to the stack, changing their values and thus changing the result of the comparison.
To have 100% certainty you need to look at the assembly and see what operations were performed on both values beforehand. Even the order of operations can change the result in non-trivial cases.
Overall, what is the point of using ==? You should use algorithms that are stable, meaning they work even if values are not exactly equal and still give the same results. The only place I know of where == could be useful is serializing/deserializing, where you know exactly what result you want and can adjust the serialization to achieve your goal.
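Short of reading the assembly, one hint about whether the implementation evaluates intermediates with extra precision is the standard FLT_EVAL_METHOD macro; a minimal sketch:

#include <cfloat>
#include <cstdio>

int main() {
    // FLT_EVAL_METHOD describes how floating-point expressions are evaluated:
    //  0: to the range and precision of the operand types
    //  1: float and double operations evaluated in double
    //  2: all operations evaluated in long double (typical of x87)
    // -1: indeterminable
    std::printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
}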
I did a quick test using the following:
float x = std::numeric_limits<float>::max();
x += 0.1;
that resulted in x == std::numeric_limits<float>::max(), so it didn't get any bigger than the limit.
Is this guaranteed behavior across compilers and platforms though? What about HLSL?
Is this guaranteed behavior across compilers and platforms though?
No, the behavior is undefined. The standard says (emphasis mine):
5 Expressions ...
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. — end note ]
As @user2079303 mentioned, in practice we can be a little less strict:
the behavior is not undefined if std::numeric_limits<float>::has_infinity (which is often true); in that case, the result is merely unspecified.
The value of std::numeric_limits<T>::max() is defined to be the maximum finite value representable by type T (see 18.3.2.4 [numeric.limits.members] paragraph 4). Thus, the question actually becomes multiple subquestions:
Is it possible to create a value bigger than std::numeric_limits<T>::max(), i.e., is there an infinity?
If so, which value needs to be added to std::numeric_limits<T>::max() to get the infinity?
If not, is the behavior defined?
C++ does not specify the floating-point format, and different formats may disagree on what the result is. In particular, I don't think floating-point formats need to define a value for infinity. For example, IBM hexadecimal floating point does not have an infinity. On the other hand, IEEE 754 does have an infinity representation.
Overflow of arithmetic types may be undefined behavior (see 5 [expr] paragraph 4), and I don't see any exclusion for floating-point types, so the behavior would be undefined if there is no infinity. At least it can be tested whether a type does have an infinity (see 18.3.2.3 [numeric.limits] paragraph 35), in which case the operation can't overflow.
If there is an infinity I think adding any value to std::numeric_limits<T>::max() would get you infinity. However, determining whether that is, indeed, the case would require to dig through the respective floating point specification. I could imagine that IEEE 754 might ignore additions if the value is too small to be relevant as is the case for adding 0.1 to std::numeric_limits<T>::max(). I could also imagine that it decides that it always overflows to infinity.
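A small sketch of how this plays out under IEEE-754 round-to-nearest (an assumption, since the standard alone does not require it):

#include <cmath>
#include <cstdio>
#include <limits>

int main() {
    using lim = std::numeric_limits<float>;
    if (lim::has_infinity) {
        float m = lim::max();
        float sum = m + 0.1f;    // 0.1 is far below half a ulp of max(), so
                                 // round-to-nearest gives back max()
        float prod = m * 2.0f;   // far above max(), so it overflows to +infinity
        std::printf("sum == m: %d, isinf(prod): %d\n",
                    sum == m, (int)std::isinf(prod));
    }
}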
Regarding division by zero, the standards say:
C99 6.5.5p5 - The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined.
C++03 5.6.4 - The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second. If the second operand of / or % is zero the behavior is undefined.
If we were to take the above paragraphs at face value, the answer is clearly Undefined Behavior for both languages. However, if we look further down in the C99 standard we see the following paragraph which appears to be contradictory(1):
C99 7.12p4 - The macro INFINITY expands to a constant expression of type float representing positive or unsigned infinity, if available;
Do the standards have some sort of golden rule where Undefined Behavior cannot be superseded by a (potentially) contradictory statement? Barring that, I don't think it's unreasonable to conclude that if your implementation defines the INFINITY macro, division by zero is defined to be such. However, if your implementation does not define such a macro, the behavior is Undefined.
I'm curious what the consensus is (if any) on this matter for each of the two languages. Would the answer change if we are talking about integer division int i = 1 / 0 versus floating point division float i = 1.0 / 0.0 ?
Note (1) The C++03 standard talks about the <cmath> library which includes the INFINITY macro.
I don't see any contradiction. Division by zero is undefined, period. There is no mention of "... unless INFINITY is defined" anywhere in the quoted text.
Note that nowhere in mathematics is it defined that 1 / 0 = infinity. One might interpret it that way, but that is a personal, "shortcut"-style interpretation rather than a sound fact.
1 / 0 is not infinity; only lim 1/x = ∞ as x → 0⁺.
This was not a math purist question, but a C/C++ question.
According to the IEEE 754 standard, which essentially all modern C compilers and FPUs use, we have
3.0 / 0.0 = INF
0.0 / 0.0 = NaN
-3.0 / 0.0 = -INF
The FPU will have a status flag that you can set to generate an exception if so desired, but this is not the norm.
INF can be quite useful for avoiding branching when INF is a useful result. See the discussion here:
http://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
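A quick runtime sketch of those rules (assuming IEEE-754 with the default non-trapping behavior; strictly speaking, the C++ standard still leaves this undefined unless the implementation claims IEC 559 conformance):

#include <cmath>
#include <cstdio>

int main() {
    // volatile keeps the compiler from constant-folding (and possibly
    // diagnosing) the divisions, so they happen at run time on the FPU.
    volatile double zero = 0.0;
    std::printf("3.0/0.0  = %g\n", 3.0 / zero);                       // inf
    std::printf("-3.0/0.0 = %g\n", -3.0 / zero);                      // -inf
    std::printf("0.0/0.0 is NaN: %d\n", (int)std::isnan(0.0 / zero)); // 1
}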
Why would it?
That doesn't make sense mathematically; it's not as if 1/x is defined as ∞ in mathematics in general. Also, you would at least need two more cases: -1/x and 0/x can't also equal ∞.
See division by zero in general, and the section about computer arithmetic in particular.
Implementations which define __STDC_IEC_559__ are required to abide by the requirements given in Annex F, which in turn requires floating-point semantics consistent with IEC 60559. The Standard imposes no requirements on the behavior of floating-point division by zero on implementations which do not define __STDC_IEC_559__, but does for those which do define it. In cases where IEC 60559 specifies a behavior but the C Standard does not, a compiler which defines __STDC_IEC_559__ is required by the C Standard to behave as described in the IEC standard.
As defined by IEC 60559 (or the US standard IEEE-754): division of zero by zero yields NaN, division of a nonzero floating-point number by positive zero (or a literal constant zero) yields an INF value with the same sign as the dividend, and division of a nonzero floating-point number by negative zero yields an INF with the opposite sign.
I've only got the C99 draft. In §7.12/4 it says:
The macro INFINITY expands to a constant expression of type float representing positive or unsigned infinity, if available; else to a positive constant of type float that overflows at translation time.
Note that INFINITY can be defined in terms of floating-point overflow, not necessarily divide-by-zero.
For the INFINITY macro: there is an explicit encoding for +/- infinity in the IEEE 754 standard: all exponent bits set and all fraction bits cleared (with all exponent bits set, a nonzero fraction represents a NaN instead).
With my compiler, (int) INFINITY == -2147483648, so an expression such as int i = 1/0 would definitely produce wrong results if INFINITY were returned.
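A sketch of how to inspect that encoding (assuming 32-bit IEEE-754 floats; the expected pattern 0x7F800000 is the IEEE-754 binary32 value, not something guaranteed by C or C++ alone):

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <limits>

int main() {
    float inf = std::numeric_limits<float>::infinity();
    std::uint32_t bits;
    std::memcpy(&bits, &inf, sizeof bits);  // read the object representation
    // IEEE-754 binary32 +inf: sign 0, exponent bits all 1, fraction all 0,
    // i.e. 0x7F800000; with that exponent, any nonzero fraction is a NaN.
    std::printf("+inf bits: 0x%08X\n", (unsigned)bits);
}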
Bottom line: C99 (as per your quotes) does not say anything about INFINITY in the context of "implementation-defined" behavior. Secondly, what you quoted does not show an inconsistent meaning of "undefined behavior".
[Quoting Wikipedia's Undefined Behavior page] "In C and C++, implementation-defined behavior is also used, where the language standard does not specify the behavior, but the implementation must choose a behavior and needs to document and observe the rules it chose."
More precisely, the standard means "implementation-defined" only when it uses those exact words for the statement in question, since "implementation-defined" is a specific term of art in the standard. The quoted C99 7.12p4 does not mention "implementation-defined".
[From C99 std (late draft)] "undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements"
Note there is "no requirement" imposed for undefined behavior!
[C99 ..] "implementation-defined behavior: unspecified behavior where each implementation documents how the choice is made"
[C99 ..] "unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance"
Documentation is a requirement for implementation-defined behavior.