Compile-time floating-point division by zero in C++

It is well known that if one divides a floating-point number by zero at run time, the result will be either infinity or not-a-number (the latter if the dividend was also zero). But is it allowed to divide by zero in C++ constexpr expressions (at compile time)? For example:
#include <iostream>

int main() {
    double x = 0.;
    // run-time division: all compilers print "-nan"
    std::cout << 0./x << std::endl;
    // compile-time division: diverging results
    constexpr double y = 0.;
    std::cout << 0./y << std::endl;
}
In this program the first printed number is obtained from a division performed at run time, and all compilers are pretty consistent in printing -nan. (Side question: why not +nan?)
But in the second case the compilers diverge. MSVC simply stops compilation with the error:
error C2124: divide or mod by zero
GCC still prints -nan, while Clang changes the sign and prints a “positive” nan; demo: https://gcc.godbolt.org/z/eP744er8n
Does the language standard permit all three compiler behaviors for compile-time division: 1) reject the program, 2) produce the same result as at run time, 3) produce a result that differs (in the sign bit)?

Division by zero is undefined behavior. (So anything goes)
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4713.pdf
Section 8.5.5 [expr.mul], paragraph 4
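A quick way to see why all three behaviors are conforming (a sketch, not from the original answer): when the division is forced into a context that requires constant evaluation, the undefined behavior makes the program ill-formed and every compiler must diagnose it; in an ordinary expression, the compiler may fold the division at compile time (choosing either sign of NaN) or defer it to run time:

constexpr double y = 0.;
// constexpr double z = 0. / y;  // ill-formed: UB (division by zero) is not
//                               // allowed in a constant expression, so all
//                               // compilers must reject this line
double w = 0. / y;               // ordinary expression: the compiler may
                                 // constant-fold it or emit a runtime
                                 // division; the standard makes no promise
                                 // either way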

Related

Is this floating-point optimization allowed?

I tried to find out where float loses the ability to exactly represent large integers, so I wrote this little snippet:
int main() {
    for (int i = 0; ; i++) {
        if ((float)i != i) {
            return i;
        }
    }
}
This code seems to work with all compilers except clang, which generates a simple infinite loop (Godbolt).
Is this allowed? If yes, is it a QoI issue?
Note that the built-in operator != requires its operands to be of the same type, and will achieve that using promotions and conversions if necessary. In other words, your condition is equivalent to:
(float)i != (float)i
That should never fail, and so the code will eventually overflow i, giving your program Undefined Behaviour. Any behaviour is therefore possible.
To correctly check what you want to check, you should cast the result back to int:
if ((int)(float)i != i)
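Putting that together, a corrected version of the loop might look like this (a sketch; on an implementation with 32-bit int and IEEE binary32 float it prints 16777217, as discussed below):

#include <cstdio>

int main() {
    for (int i = 0; ; i++) {
        // Round-trip through float, then compare as int, so the
        // comparison is no longer float-vs-float on both sides.
        if ((int)(float)i != i) {
            std::printf("%d\n", i);  // first integer float can't represent
            return 0;
        }
    }
}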
As @Angew pointed out, the != operator needs the same type on both sides.
(float)i != i results in promotion of the RHS to float as well, so we have (float)i != (float)i.
g++ also generates an infinite loop, but it doesn't optimize away the work inside it. You can see it converts int->float with cvtsi2ss and does ucomiss xmm0,xmm0 to compare (float)i with itself. (That was your first clue that your C++ source doesn't mean what you thought it did, as @Angew's answer explains.)
x != x is only true when it's "unordered" because x was NaN. (INFINITY compares equal to itself in IEEE math, but NaN doesn't. NAN == NAN is false, NAN != NAN is true).
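That property is easy to demonstrate directly (a small sketch using the NAN and INFINITY macros from <cmath>):

#include <cmath>
#include <iostream>

int main() {
    std::cout << std::boolalpha
              << (NAN == NAN) << '\n'             // false: NaN is unordered
              << (NAN != NAN) << '\n'             // true
              << (INFINITY == INFINITY) << '\n';  // true: inf equals itself
}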
gcc7.4 and older correctly optimizes your code to jnp as the loop branch (https://godbolt.org/z/fyOhW1): keep looping as long as the operands to x != x weren't NaN. (gcc8 and later also checks je to break out of the loop, failing to optimize based on the fact that the comparison is always equal for any non-NaN input.) x86 FP compares set PF on unordered.
And BTW, that means clang's optimization is also safe: it just has to CSE (float)i != (implicit conversion to float)i as being the same, and prove that i -> float is never NaN for the possible range of int.
(Although given that this loop will hit signed-overflow UB, it's allowed to emit literally any asm it wants, including a ud2 illegal instruction, or an empty infinite loop regardless of what the loop body actually was.) But ignoring the signed-overflow UB, this optimization is still 100% legal.
GCC fails to optimize away the loop body even with -fwrapv to make signed-integer overflow well-defined (as 2's complement wraparound). https://godbolt.org/z/t9A8t_
Even enabling -fno-trapping-math doesn't help. (GCC's default is unfortunately to enable -ftrapping-math even though GCC's implementation of it is broken/buggy.) int->float conversion can cause an FP inexact exception (for numbers too large to be represented exactly), so with exceptions possibly unmasked it's reasonable not to optimize away the loop body. (Because converting 16777217 to float could have an observable side-effect if the inexact exception is unmasked.)
But with -O3 -fwrapv -fno-trapping-math, it's 100% missed optimization not to compile this to an empty infinite loop. Without #pragma STDC FENV_ACCESS ON, the state of the sticky flags that record masked FP exceptions is not an observable side-effect of the code. No int->float conversion can result in NaN, so x != x can't be true.
These compilers are all optimizing for C++ implementations that use IEEE 754 single-precision (binary32) float and 32-bit int.
The bugfixed (int)(float)i != i loop would have UB on C++ implementations with narrow 16-bit int and/or wider float, because you'd hit signed-integer overflow UB before reaching the first integer that wasn't exactly representable as a float.
But UB under a different set of implementation-defined choices doesn't have any negative consequences when compiling for an implementation like gcc or clang with the x86-64 System V ABI.
BTW, you could statically calculate the result of this loop from FLT_RADIX and FLT_MANT_DIG, defined in <cfloat>. Or at least you can in theory, if float actually fits the model of an IEEE float rather than some other kind of real-number representation like a Posit / unum.
I'm not sure how much the ISO C++ standard nails down about float behaviour and whether a format that wasn't based on fixed-width exponent and significand fields would be standards compliant.
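For what it's worth, here is how that static calculation could look (a sketch, assuming FLT_RADIX == 2, in line with the caveat above):

#include <cfloat>

// With a binary float, every integer up to 2^FLT_MANT_DIG is exactly
// representable; the first gap is at 2^FLT_MANT_DIG + 1.
static_assert(FLT_RADIX == 2, "sketch assumes a binary float format");
constexpr long long first_unrepresentable = (1LL << FLT_MANT_DIG) + 1;
// 16777217 when float is IEEE binary32 (FLT_MANT_DIG == 24)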
In comments:
@geza I would be interested to hear the resulting number!
@nada: it's 16777216
Are you claiming you got this loop to print / return 16777216?
Update: since that comment has been deleted, I think not. Probably the OP is just quoting the float before the first integer that can't be exactly represented as a 32-bit float. https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limits_on_integer_values i.e. what they were hoping to verify with this buggy code.
The bugfixed version would of course print 16777217, the first integer that's not exactly representable, rather than the value before that.
(All the higher float values are exact integers, but they're multiples of 2, then 4, then 8, etc. for exponent values higher than the significand width. Many higher integer values can be represented, but 1 unit in the last place (of the significand) is greater than 1 so they're not contiguous integers. The largest finite float is just below 2^128, which is too large for even int64_t.)
If any compiler did exit the original loop and print that, it would be a compiler bug.
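The spacing described in the parenthetical above can be observed with std::nextafter (a sketch; assumes IEEE binary32 float):

#include <cmath>
#include <cstdio>

int main() {
    float f = 16777216.0f;  // 2^24: last of the contiguous integers
    // Above 2^24, one unit in the last place is 2, so the next
    // representable value after 2^24 is 2^24 + 2, skipping 2^24 + 1.
    std::printf("%.1f\n", std::nextafter(f, INFINITY));  // 16777218.0
}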

Can 'inf' be assigned to a variable like regular numeric values in C++?

When I wrote the following code, instead of causing a runtime error, it output 'inf'. Now, is there any way to assign this value ('inf') to a variable, like regular numeric values? And how do I check whether a division yields 'inf'?
#include <iostream>

int main() {
    double a = 1, b = 0;
    std::cout << a / b << std::endl;
    return 0;
}
C++ does not require implementations to support infinity or division by zero. Many implementations will, as many implementations use IEEE 754 formats even if they do not fully support IEEE 754 semantics.
When you want to use infinity as a value (that is, you want to refer to infinity in source code), you should not generate it by dividing by zero. Instead, include <limits> and use std::numeric_limits<T>::infinity() with T specified as double.
Returns the special value "positive infinity", as represented by the floating-point type T. Only meaningful if std::numeric_limits<T>::has_infinity == true.
(You may also see code that includes <cmath> and uses INFINITY, which is inherited from C.)
When you want to check if a number is finite, include <cmath> and use std::isfinite. Note that computations with infinite values tend to ultimately produce NaNs, and std::isfinite(x) is generally more convenient than !std::isinf(x) && !std::isnan(x).
A final warning in case you are using unsafe compiler flags: In case you use, e.g., GCC's -ffinite-math-only (included in -ffast-math) then std::isfinite does not work.
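A small sketch combining both recommendations (the literals and checks here are illustrative, not from the original answer):

#include <cmath>
#include <iostream>
#include <limits>

int main() {
    double inf = std::numeric_limits<double>::infinity();
    double x = 1.0 / 0.0;  // inf on implementations with IEEE semantics

    std::cout << std::boolalpha
              << (x == inf) << '\n'         // true: infinities compare equal
              << std::isfinite(x) << '\n'   // false
              << std::isinf(x) << '\n';     // true
}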
It appears that I can:
#include <iostream>

int main() {
    double a = 1, b = 0, c = 1 / 0.0;
    std::cout << a / b << std::endl;
    if (a / b == c) std::cout << "Yes you can.\n";
    return 0;
}

Why does Clang optimize away x * 1.0 but NOT x + 0.0?

Why does Clang optimize away the loop in this code
#include <time.h>
#include <stdio.h>

static size_t const N = 1 << 27;
static double arr[N] = { /* initialize to zero */ };

int main()
{
    clock_t const start = clock();
    for (int i = 0; i < N; ++i) { arr[i] *= 1.0; }
    printf("%u ms\n", (unsigned)(clock() - start) * 1000 / CLOCKS_PER_SEC);
}
but not the loop in this code?
#include <time.h>
#include <stdio.h>

static size_t const N = 1 << 27;
static double arr[N] = { /* initialize to zero */ };

int main()
{
    clock_t const start = clock();
    for (int i = 0; i < N; ++i) { arr[i] += 0.0; }
    printf("%u ms\n", (unsigned)(clock() - start) * 1000 / CLOCKS_PER_SEC);
}
(Tagging as both C and C++ because I would like to know if the answer is different for each.)
The IEEE 754-2008 Standard for Floating-Point Arithmetic and the ISO/IEC 10967 Language Independent Arithmetic (LIA) Standard, Part 1 answer why this is so.
IEEE 754 § 6.3 The sign bit
When either an input or result is NaN, this standard does not interpret the sign of a NaN. Note, however, that operations on bit strings — copy, negate, abs, copySign — specify the sign bit of a NaN result, sometimes based upon the sign bit of a NaN operand. The logical predicate totalOrder is also affected by the sign bit of a NaN operand. For all other operations, this standard does not specify the sign bit of a NaN result, even when there is only one input NaN, or when the NaN is produced from an invalid operation.
When neither the inputs nor result are NaN, the sign of a product or quotient is the exclusive OR of the operands’ signs; the sign of a sum, or of a difference x − y regarded as a sum x + (−y), differs from at most one of the addends’ signs; and the sign of the result of conversions, the quantize operation, the roundToIntegral operations, and the roundToIntegralExact (see 5.3.1) is the sign of the first or only operand. These rules shall apply even when operands or results are zero or infinite.
When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 in all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0. However, x + x = x − (−x) retains the same sign as x even when x is zero.
The Case of Addition
Under the default rounding mode (Round-to-Nearest, Ties-to-Even), we see that x+0.0 produces x, EXCEPT when x is -0.0: in that case we have a sum of two operands with opposite signs whose sum is zero, and §6.3 paragraph 3 rules that this addition produces +0.0.
Since +0.0 is not bitwise identical to the original -0.0, and -0.0 is a legitimate value that may occur as input, the compiler is obliged to emit code that transforms potential negative zeros into +0.0.
The summary: Under the default rounding mode, in x+0.0, if x
is not -0.0, then x itself is an acceptable output value.
is -0.0, then the output value must be +0.0, which is not bitwise identical to -0.0.
The Case of Multiplication
Under the default rounding mode, no such problem occurs with x*1.0. If x:
is a (sub)normal number, x*1.0 == x always.
is +/- infinity, then the result is +/- infinity of the same sign.
is NaN, then according to
IEEE 754 § 6.2.3 NaN Propagation
An operation that propagates a NaN operand to its result and has a single NaN as an input should produce a NaN with the payload of the input NaN if representable in the destination format.
which means that the exponent and mantissa (though not the sign) of NaN*1.0 are recommended to be unchanged from the input NaN. The sign is unspecified in accordance with §6.3p1 above, but an implementation may specify it to be identical to the source NaN.
is +/- 0.0, then the result is a 0 with its sign bit XORed with the sign bit of 1.0, in agreement with §6.3p2. Since the sign bit of 1.0 is 0, the output value is unchanged from the input. Thus, x*1.0 == x even when x is a (negative) zero.
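Both of those sign rules (the addition rule from §6.3p3 and the multiplication rule from §6.3p2) can be observed with std::signbit (a sketch assuming IEEE 754 doubles and the default rounding mode):

#include <cmath>
#include <iostream>

int main() {
    double nz = -0.0;
    double sum  = nz + 0.0;  // §6.3p3: an exact zero sum gets sign +0.0
    double prod = nz * 1.0;  // §6.3p2: sign is the XOR of the signs: -0.0

    std::cout << std::boolalpha
              << std::signbit(sum)  << '\n'   // false: the +0.0 result
              << std::signbit(prod) << '\n';  // true:  -0.0 preserved
}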
The Case of Subtraction
Under the default rounding mode, the subtraction x-0.0 is also a no-op, because it is equivalent to x + (-0.0). If x:
is NaN, then §6.3p1 and §6.2.3 apply in much the same way as for addition and multiplication.
is +/- infinity, then the result is +/- infinity of the same sign.
is a (sub)normal number, x-0.0 == x always.
is -0.0, then by §6.3p2 we have "[...] the sign of a sum, or of a difference x − y regarded as a sum x + (−y), differs from at most one of the addends’ signs;". This forces us to assign -0.0 as the result of (-0.0) + (-0.0), because -0.0 differs in sign from none of the addends, while +0.0 differs in sign from two of the addends, in violation of this clause.
is +0.0, then this reduces to the addition case (+0.0) + (-0.0) considered above in The Case of Addition, which by §6.3p3 is ruled to give +0.0.
Since for all cases the input value is legal as the output, it is permissible to consider x-0.0 a no-op, and x == x-0.0 a tautology.
Value-Changing Optimizations
The IEEE 754-2008 Standard has the following interesting quote:
IEEE 754 § 10.4 Literal meaning and value-changing optimizations
[...]
The following value-changing transformations, among others, preserve the literal meaning of the source code:
Applying the identity property 0 + x when x is not zero and is not a signaling NaN and the result has the same exponent as x.
Applying the identity property 1 × x when x is not a signaling NaN and the result has the same exponent as x.
Changing the payload or sign bit of a quiet NaN.
[...]
Since all NaNs and all infinities share the same exponent, and the correctly rounded result of x+0.0 and x*1.0 for finite x has exactly the same magnitude as x, their exponent is the same.
sNaNs
Signaling NaNs are floating-point trap values: special NaN values whose use as a floating-point operand results in an invalid-operation exception (SIGFPE). If a loop that triggers an exception were optimized out, the software would no longer behave the same.
However, as user2357112 points out in the comments, the C11 Standard explicitly leaves undefined the behaviour of signaling NaNs (sNaN), so the compiler is allowed to assume they do not occur, and thus that the exceptions that they raise also do not occur. The C++11 standard omits describing a behaviour for signaling NaNs, and thus also leaves it undefined.
Rounding Modes
In alternate rounding modes, the permissible optimizations may change. For instance, under Round-to-Negative-Infinity mode, the optimization x+0.0 -> x becomes permissible, but x-0.0 -> x becomes forbidden.
To prevent GCC from assuming default rounding modes and behaviours, the experimental flag -frounding-math can be passed to GCC.
Conclusion
Clang and GCC, even at -O3, remain IEEE-754 compliant. This means they must keep to the above rules of the IEEE-754 standard. x+0.0 is not bit-identical to x for all x under those rules, but x*1.0 may be chosen to be so: namely, when we
Obey the recommendation to pass unchanged the payload of x when it is a NaN.
Leave the sign bit of a NaN result unchanged by * 1.0.
Obey the order to XOR the sign bit during a quotient/product, when x is not a NaN.
To enable the IEEE-754-unsafe optimization (x+0.0) -> x, the flag -ffast-math needs to be passed to Clang or GCC.
x += 0.0 isn't a no-op if x is -0.0. The optimizer could strip out the whole loop anyway since the results aren't used, though. In general, it's hard to tell why an optimizer makes the decisions it does.

Result of converting DBL_MAX to int differs from converting std::numeric_limits<double>::max() to int

While doing a conversion test I encountered some strange behavior in C++.
Context
The online C++ reference indicates that the return value of std::numeric_limits<double>::max() (defined in <limits>) should be DBL_MAX (defined in <float.h>). In my test, when I print these values out, both are indeed exactly the same. However, when I cast them from double to int, strange things came out.
'Same' input, different results?
int32_t t1 = (int) std::numeric_limits<double>::max(); sets t1 to INT_MIN, but int32_t t2 = (int) DBL_MAX; sets t2 to INT_MAX. The same is true when the cast is done using static_cast<int>.
'Same' input, same results in similar situation
However, if I define a function
int32_t doubleToInt(double dvalue) {
    return (int) dvalue;
}
both doubleToInt(std::numeric_limits<double>::max()) and doubleToInt(DBL_MAX) return INT_MIN.
To help make sense of things, I implemented a similar program in Java. There, all casts returned the value of INT_MAX, regardless of being in a function or not.
Can someone point out the reason why in C++ the result is INT_MIN in some cases, and INT_MAX in the others? What should the expected behaviour be like when casting DBL_MAX to int in C++?
Sample Code for C++
#include <iostream>
#include <limits>
#include <float.h>
#include <stdlib.h>
#include <stdio.h>

using namespace std;

template <typename T, typename D> D cast(T a, D b) { return (D) a; }

int main()
{
    int32_t t1 = 9;
    std::cout << std::numeric_limits<double>::max() << std::endl;
    std::cout << DBL_MAX << std::endl;
    std::cout << (int32_t) std::numeric_limits<double>::max() << std::endl;
    std::cout << (int32_t) DBL_MAX << std::endl;
    std::cout << cast(std::numeric_limits<double>::max(), t1) << std::endl;
    std::cout << cast(DBL_MAX, t1) << std::endl;
    return 0;
}
For completeness: I am using cygwin gcc and java 8.
Attempting to convert a floating point number greater than INT_MAX to an int is undefined behaviour:
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. (§4.9 [conv.fpint], para. 1)
So a compiler can produce any value (or even do something else, like throw an exception) for the conversion. Different compilers can do different things. The same compiler can do different things at different times.
There is no real point attempting to understand why a particular instance of undefined behaviour shows the result it shows (unless you are trying to reverse engineer the compiler, and even then UB is not usually particularly interesting). Rather, you need to concentrate on avoiding undefined behaviour.
For example, since any out-of-range cast of a floating value to an integer is undefined, you need to ensure that such casts do not involve out-of-range values. Unlike some other languages [note 1], the C++ standard does not provide an easily-recognizable result which can be tested for, so you need to test before doing the cast.
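For example, a checked conversion might look like this (a sketch, not from the original answer; it assumes IEEE binary64 double and 32-bit int, so INT_MIN and INT_MAX are exactly representable as double, and it requires C++17 for std::optional):

#include <limits>
#include <optional>

std::optional<int> checked_to_int(double d) {
    // NaN fails both comparisons, so it is rejected along with
    // out-of-range values.
    if (d >= static_cast<double>(std::numeric_limits<int>::min()) &&
        d <= static_cast<double>(std::numeric_limits<int>::max())) {
        return static_cast<int>(d);  // in range: conversion is well-defined
    }
    return std::nullopt;
}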
Note that DBL_MAX is a macro whose substitution is a literal representing an approximation of the largest representable floating-point number; std::numeric_limits<double>::max(), on the other hand, is the precise largest representable floating-point number.
The difference should not normally be noticeable, but (as indicated by a note in the standard in §5.20 [expr.const], para 6):
Since this International Standard imposes no restrictions on the accuracy of floating-point operations, it is unspecified whether the evaluation of a floating-point expression during translation yields the same result as the evaluation of the same expression (or the same operations on the same values) during program execution.
Although std::numeric_limits<double>::max() is declared constexpr, the cast to int is not a constant expression (as per §5.20/p2.5) precisely because its behaviour is undefined.
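That gives a way to surface the problem at compile time instead (a sketch): forcing the conversion into a constant expression makes the program ill-formed, so the compiler must diagnose it rather than silently produce INT_MIN or INT_MAX:

#include <limits>

int main() {
    // constexpr int t = (int)std::numeric_limits<double>::max();
    // error: not a constant expression, because the out-of-range
    // conversion has undefined behavior
    int t = (int)std::numeric_limits<double>::max();  // UB at run time
    (void)t;
}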
Notes
In Java, for example, the conversion is well defined. See the Java Language Specification for details.

Log base 2 precision error in C++

Please explain the output of the code below. I am getting different values of c in the two cases, i.e.,
Case 1 : Value of n is taken from the standard input.
Case 2 : Value of n is directly written in the code.
link : http://www.ideone.com/UjYFQd
#include <iostream>
#include <cstdio>
#include <math.h>

using namespace std;

int main()
{
    int c;
    int n;
    scanf("%d", &n);        // n = 64
    c = (log(n) / log(2));
    cout << c << endl;      // OUTPUT = 5
    n = 64;
    c = (log(n) / log(2));
    cout << c << endl;      // OUTPUT = 6
    return 0;
}
You may see this because of how the floating point number is stored:
double result = log(n) / log(2); // where you input n as 64
int c = (int)result; // this will truncate result. If result is 5.99999999999999, you will get 5
When you hardcode the value, the compiler will optimize it for you:
double result = log(64) / log(2); // which is the same as 6 * log(2) / log(2)
int c = (int)result;
Will more than likely be replaced entirely with:
int c = 6;
Because the compiler will see that you are using a bunch of compile-time constants to store the value in a variable (it will go ahead and crunch the value at compile time).
If you want to get the integer result for the operation, you should use std::round instead of just casting to an int.
int c = std::round(log(n) / log(2));
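A complete corrected version might look like this (a sketch; std::round lives in <cmath>):

#include <cmath>
#include <cstdio>

int main() {
    int n = 64;
    // Round to the nearest integer instead of truncating toward zero,
    // so a result like 5.9999999999999991 still yields 6.
    int c = (int)std::round(std::log(n) / std::log(2));
    std::printf("%d\n", c);  // 6
}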
The first time around, log(n)/log(2) is computed and the result is very close to 6 but slightly less. This is just how floating point computation works: neither log(64) nor log(2) have an infinitely precise representation in binary floating point, so you can expect the result of dividing one by the other to be slightly off from the true mathematical value. Depending on the implementation you can expect to get 5 or 6.
In the second computation:
n = 64;
c = (log(n) / log(2));
The value assigned to c can be inferred to be a compile-time constant and can be computed by the compiler. The compiler does the computation in a different environment than the program while it runs, so you can expect to get slightly different results from computations performed at compile-time and at runtime.
For example, a compiler generating code for x86 may choose to use x87 floating-point instructions that use 80-bit floating-point arithmetic, while the compiler itself uses standard 64-bit floating-point arithmetic to compute compile-time constants.
Check the assembler output from your compiler to confirm this. Using GCC 4.8 I get 6 from both computations.
The difference in output can be explained by the fact that gcc is optimizing out the calls to log in the constant cases for example, in this case:
n = 64;
c = (log(n) / log(2));
both calls to log are done at compile time, and this compile-time evaluation can produce different results. This is documented in the gcc manual, in the Other Built-in Functions Provided by GCC section, which says:
GCC includes built-in versions of many of the functions in the standard C library. The versions prefixed with __builtin_ are always treated as having the same meaning as the C library function even if you specify the -fno-builtin option. (See C Dialect Options.) Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted.
and log is one of the many functions that has built-in versions. If I build with -fno-builtin, all four calls to log are made, but without it only one call to log is emitted. You can check this by building with the -S flag, which outputs the assembly gcc generates.