_mm_max_ss has different behavior between clang and gcc - c++

I'm trying to cross-compile a project using clang and gcc, but I'm seeing some odd differences when using _mm_max_ss, e.g.
__m128 a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
__m128 b = _mm_set_ss(2.0f);
__m128 c = _mm_max_ss(a,b);
__m128 d = _mm_max_ss(b,a);
Now, I expected std::max-style behavior when NaNs are involved, but clang and gcc give different results:
Clang: (what I expected)
c: 2.000000 0.000000 0.000000 0.000000
d: nan 0.000000 0.000000 0.000000
GCC: (seems to ignore order)
c: nan 0.000000 0.000000 0.000000
d: nan 0.000000 0.000000 0.000000
_mm_max_ps does the expected thing when I use it. I've tried -ffast-math and -fno-fast-math, but neither seems to have an effect. Any ideas for making the behavior consistent across compilers?
Godbolt link here

My understanding is that IEEE-754 requires (NaN cmp x) to return false for all cmp operators {==, <, <=, >, >=}, except {!=}, which returns true. An implementation of a max() function might be defined in terms of any of the inequality operators.
So, the question is: how is _mm_max_ss implemented? With {<, <=, >, >=}, or with a bit comparison?
Interestingly, when disabling optimization in your link, the corresponding maxss instruction is used by both gcc and clang. Both yield:
2.000000 0.000000 0.000000 0.000000
nan 0.000000 0.000000 0.000000
This suggests, given max(NaN, 2.0f) -> 2.0f, that max(a, b) = (a op b) ? a : b, where op is one of {>, >=} (with {<, <=} the same ternary would compute min). With IEEE-754 rules, the result of this comparison is always false when a NaN is involved, so:
(NaN op val) is always false, returning (val),
(val op NaN) is always false, returning (NaN)
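In C++ terms, here is a minimal model of that scalar behaviour (a sketch with op taken as >; maxss_model is my name for it, not an intrinsic):
#include <cstdio>
#include <limits>
// Model of maxss: returns b whenever the comparison is false,
// which includes every comparison involving a NaN.
float maxss_model(float a, float b)
{
    return (a > b) ? a : b;
}
int main()
{
    const float nan = std::numeric_limits<float>::quiet_NaN();
    std::printf("%f\n", maxss_model(nan, 2.0f)); // 2.000000 (comparison false, returns b)
    std::printf("%f\n", maxss_model(2.0f, nan)); // nan      (comparison false, returns b)
}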
With optimization on, the compiler is free to precompute (c) and (d) at compile time. It appears that clang evaluates the results as the maxss instruction would - correct 'as-if' behaviour. GCC is either falling back on another implementation of max() - it uses the GMP and MPFR libraries for compile-time numerics - or is just being careless with the _mm_max_ss semantics.
GCC is still getting it wrong with 10.2 and trunk versions on godbolt. So I think you've found a bug! I haven't answered the second part, because I can't think of an all-purpose hack that will efficiently work around this.
From Intel's ISA reference:
If the values being compared are both 0.0s (of either sign), the value
in the second source operand is returned. If a value in the second
source operand is an SNaN, that SNaN is returned unchanged to the
destination (that is, a QNaN version of the SNaN is not returned).
If only one value is a NaN (SNaN or QNaN) for this instruction, the
second source operand, either a NaN or a valid floating-point value,
is written to the result. If instead of this behavior, it is required
that the NaN from either source operand be returned, the action of
MAXSS can be emulated using a sequence of instructions, such as, a
comparison followed by AND, ANDN and OR.
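For reference, here is one way Intel's suggested comparison + AND/ANDN/OR sequence might look with intrinsics. This is my own sketch of a NaN-propagating max (using the packed _ps forms for simplicity; max_ps_propagate_nan is a made-up name), not code from the question:
#include <xmmintrin.h>
// Propagates a NaN from either operand, unlike raw maxps/maxss.
__m128 max_ps_propagate_nan(__m128 a, __m128 b)
{
    __m128 a_nan = _mm_cmpunord_ps(a, a);      // all-ones in lanes where a is NaN
    __m128 m     = _mm_max_ps(a, b);           // hardware max: yields b where either input is NaN
    return _mm_or_ps(_mm_and_ps(a_nan, a),     // keep a's NaN where a is NaN,
                     _mm_andnot_ps(a_nan, m)); // otherwise keep the hardware max
}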

Related

Neglecting NaN terms in a summation in Fortran

Is there a way to combine NaN and ordinary numbers differently than is usually done in Fortran?
I have several summations which contain 'safe' terms, which cannot be NaN, and some other terms which can be NaN.
I would like the evaluation of the expression to neglect the addends when they are NaN.
I cannot just get rid of them by multiplying them by a zero factor when they are NaN, as NaN x 0 gives NaN anyway.
Ideas?
Thanks
There is no arithmetic operation that does not propagate NaN. So ideas like multiplying by 0 will not work.
Your only solution is to miss out the NaN terms in the sum. Do that with something based on
IF (IEEE_IS_NAN(x))
(IEEE_IS_NAN comes from the intrinsic module IEEE_ARITHMETIC).
If you are not using IEEE754 or are using an older standard of FORTRAN, then you can use
IF(x .NE. x)
which will be TRUE if and only if x is NaN.
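For comparison, the same idea in C++ (a sketch, since the question itself is about Fortran): skip the NaN addends rather than trying to neutralise them arithmetically.
#include <cmath>
#include <vector>
double sum_skipping_nan(const std::vector<double>& terms)
{
    double s = 0.0;
    for (double x : terms)
        if (!std::isnan(x))  // equivalently: if (x == x)
            s += x;          // NaN terms are simply left out of the sum
    return s;
}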

The behaviour of floating point division by zero

Consider
#include <iostream>
int main()
{
double a = 1.0 / 0;
double b = -1.0 / 0;
double c = 0.0 / 0;
std::cout << a << b << c; // to stop compilers from optimising out the code.
}
I have always thought that a will be +Inf, b will be -Inf, and c will be NaN. But I also hear rumours that, strictly speaking, the behaviour of floating point division by zero is undefined and therefore the above code cannot be considered portable C++. (That theoretically obliterates the integrity of my million line plus code stack. Oops.)
Who's correct?
Note I'm happy with implementation defined, but I'm talking about cat-eating, demon-sneezing undefined behaviour here.
The C++ standard does not mandate the IEEE 754 standard, because that depends mostly on the hardware architecture.
If the hardware/compiler correctly implement the IEEE 754 standard, the division will provide the expected INF, -INF and NaN; otherwise... it depends.
Undefined means the compiler implementation decides, and there are many variables to that, like the hardware architecture, code generation efficiency, compiler developer laziness, etc.
Source:
The C++ standard states that a division by 0.0 is undefined
C++ Standard 5.6.4
... If the second operand of / or % is zero the behavior is undefined
C++ Standard 18.3.2.4
...static constexpr bool is_iec559;
...True if and only if the type adheres to IEC 559 standard.
...Meaningful for all floating point types.
C++ detection of IEEE754:
The standard library's std::numeric_limits template can be used to detect whether IEEE754 is supported:
static constexpr bool is_iec559;
#include <limits>
bool isFloatIeee754 = std::numeric_limits<float>::is_iec559;
What if IEEE754 is not supported?
It depends; usually a division by 0 triggers a hardware exception and makes the application terminate.
Quoting cppreference:
If the second operand is zero, the behavior is undefined, except that if floating-point division is taking place and the type supports IEEE floating-point arithmetic (see std::numeric_limits::is_iec559), then:
if one operand is NaN, the result is NaN
dividing a non-zero number by ±0.0 gives the correctly-signed infinity and FE_DIVBYZERO is raised
dividing 0.0 by 0.0 gives NaN and FE_INVALID is raised
We are talking about floating-point division here, so it is actually implementation-defined whether double division by zero is undefined.
If std::numeric_limits<double>::is_iec559 is true, and it is "usually true", then the behaviour is well-defined and produces the expected results.
A pretty safe bet would be to plop down a:
static_assert(std::numeric_limits<double>::is_iec559, "Please use IEEE754, you weirdo");
... near your code.
Division by zero, both integer and floating-point, is undefined behavior per [expr.mul]p4:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division
of the first expression by the second. If the second operand of / or % is zero the behavior is undefined. ...
An implementation can, however, optionally support Annex F, which has well-defined semantics for floating point division by zero.
We can see from the clang bug report clang sanitizer regards IEC 60559 floating-point division by zero as undefined that even though the macro __STDC_IEC_559__ is defined, it is defined by the system headers; clang does not support Annex F, and so division by zero remains undefined behavior for clang:
Annex F of the C standard (IEC 60559 / IEEE 754 support) defines the
floating-point division by zero, but clang (3.3 and 3.4 Debian snapshot)
regards it as undefined. This is incorrect:
Support for Annex F is optional, and we do not support it.
#if __STDC_IEC_559__
This macro is being defined by your system headers, not by us; this is
a bug in your system headers. (FWIW, GCC does not fully support Annex
F either, IIRC, so it's not even a Clang-specific bug.)
That bug report and two other bug reports, UBSan: Floating point division by zero is not undefined and clang should support Annex F of ISO C (IEC 60559 / IEEE 754), indicate that gcc is conforming to Annex F with respect to floating point divide by zero.
Though I agree that it isn't up to the C library to define __STDC_IEC_559__ unconditionally, the problem is specific to clang. GCC does not fully support Annex F, but at least its intent is to support it by default and the division is well-defined with it if the rounding mode isn't changed. Nowadays not supporting IEEE 754 (at least the basic features like the handling of division by zero) is regarded as bad behavior.
This is further supported by the gcc wiki page Semantics of Floating Point Math in GCC, which indicates that -fno-signaling-nans is the default, agreeing with the gcc optimization options documentation, which says:
The default is -fno-signaling-nans.
It is interesting to note that clang's UBSan defaults to including float-divide-by-zero under -fsanitize=undefined, while gcc does not:
Detect floating-point division by zero. Unlike other similar options, -fsanitize=float-divide-by-zero is not enabled by -fsanitize=undefined, since floating-point division by zero can be a legitimate way of obtaining infinities and NaNs.
See it live for clang and live for gcc.
Division by 0 is undefined behavior.
From section 5.6 of the C++ standard (C++11):
The binary / operator yields the quotient, and the binary % operator
yields the remainder from the division of the first expression by the
second. If the second operand of / or % is zero the behavior is
undefined. For integral operands the / operator yields the algebraic
quotient with any fractional part discarded; if the quotient a/b is
representable in the type of the result, (a/b)*b + a%b is equal to a.
No distinction is made between integer and floating point operands for the / operator. The standard only states that dividing by zero is undefined without regard to the operands.
In [expr]/4 we have
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]
Emphasis mine
So per the standard this is undefined behavior. The standard does go on to say that some of these cases are actually handled by the implementation and are configurable. It doesn't say the behavior is implementation-defined, but it does let you know that implementations define some of this behavior.
As to the submitter's question 'Who's correct?', it is perfectly OK to say that both answers are correct. The fact that the C standard describes the behavior as 'undefined' DOES NOT dictate what the underlying hardware actually does; it merely means that if you want your program to be meaningful according to the standard you may not assume that the hardware actually implements that operation. But if you happen to be running on hardware that implements the IEEE standard, you will find the operation is in fact implemented, with the results as stipulated by the IEEE standard.
This also depends on the floating point environment.
cppreference has details:
http://en.cppreference.com/w/cpp/numeric/fenv
(no examples though).
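Since that page has no examples, here is a minimal sketch (assuming an IEEE 754 target, where 0.0/0.0 yields NaN and raises FE_INVALID):
#include <cfenv>
#include <cstdio>
int main()
{
    // (strictly, #pragma STDC FENV_ACCESS ON should be in effect; many compilers ignore it)
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double num = 0.0, den = 0.0; // volatile: force the division at run time
    double r = num / den;                 // NaN; raises FE_INVALID on IEEE targets
    std::printf("r = %f\n", r);
    if (std::fetestexcept(FE_INVALID))
        std::puts("FE_INVALID was raised");
}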
This should be available in most desktop/server C++11 and C99 environments. There are also platform-specific variations that predate the standardization of all this.
I would expect that enabling exceptions makes the code run more slowly, so probably for this reason most platforms that I know of disable exceptions by default.

Float in c++ smaller range than IEEE 754

I'm trying to compute the division 1/16777216, which is equal to 5.96046448e-8,
but this:
printf("number: %f \n", 1.0f / 16777216.0f);
always gives me 0.000000 instead of the answer I would expect.
I looked up the ranges because I thought float might simply be too small to handle such a number, but IEEE 754 states it to be ±1.18×10⁻³⁸.
Am I missing something, and is that why the result is not the expected one?
When using fixed formatting (%f) you get a format with a decimal point and 6 digits after it. Since the value you used rounds to a value smaller than 0.000001, it seems reasonable to have 0.000000 printed. You can either use more digits (%.10f, for example) or change the format to use either scientific notation (%e) or the "better" of both options (%g).
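To illustrate the difference between those specifiers (the %.10f precision is just one choice):
#include <cstdio>
int main()
{
    float x = 1.0f / 16777216.0f; // 2^-24, about 5.96e-8
    std::printf("%f\n",    x);    // 0.000000   (6 digits after the point)
    std::printf("%.10f\n", x);    // 0.0000000596
    std::printf("%e\n",    x);    // 5.960464e-08
    std::printf("%g\n",    x);    // 5.96046e-08
}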

value changes while reinterpret_cast

While converting an integer to a float using reinterpret_cast, the content of memory changes.
For example,
float fDest = 0;
__int32 nTempDest = -4808638;
fDest = *reinterpret_cast<float*>(&nTempDest);
The hexadecimal representation of the value of nTempDest is '42 a0 b6 ff', but after the reinterpret_cast the content of fDest is '42 a0 f6 ff'.
Can anyone explain why the third byte changed from b6 to f6?
In pure C++ this is in fact undefined behavior. Nevertheless there is an explanation for what you see.
I assume that the hexadecimal representations you give are from a bytewise view of memory. Obviously you are on a little endian architecture. So the 32-bit quantity we are starting from is 0xffb6a042, which is indeed the two's complement representation of -4808638.
As an IEC 60559 single-precision floating point number (also 32 bits), 0xffb6a042 is a negative, signaling NaN. NaNs in this representation have the form (in binary)
s1111111 1qxxxxxx xxxxxxxx xxxxxxxx
Here s is the sign, the x are arbitrary and q=1 for a quiet NaN and q=0 for a signaling NaN.
Now you are using the signaling NaN in that you assign it to fDest. This would raise a floating point invalid exception if floating point signaling is active. By default such exceptions are simply ignored and signaling NaN values are 'quieted' when propagated on.
So: In assigning to fDest, the NaN is propagated and the implementation converts it to a quiet NaN by setting bit 22. This is the change you observe.
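For completeness, here is a sketch of the well-defined way to do this kind of bit-level inspection: memcpy (or std::bit_cast in C++20) instead of the pointer cast. Whether the printed pattern still shows the signaling NaN or the quieted one depends on the platform, as described above:
#include <cstdint>
#include <cstdio>
#include <cstring>
int main()
{
    const std::int32_t n = -4808638;     // bit pattern 0xffb6a042: a signaling NaN
    float f;
    std::memcpy(&f, &n, sizeof f);       // well-defined reuse of the bytes
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // read the bytes back out
    std::printf("0x%08x\n", bits);       // 0xffb6a042, or 0xfff6a042 if the NaN was quieted
}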
Your code produces undefined behavior (UB).
reinterpret_cast only guarantees that if you cast a pointer from one type to another and then cast it back to the original pointer type, you get the original data. Anything other than that produces UB [Note 1].
This is UB because you cannot rely on the fact that:
sizeof(float) == sizeof(nTempDest)
This is not guaranteed to be true on all implementations, and even where the sizes do match, reading an int object through a float lvalue violates the strict aliasing rules. Either way, what you get is Undefined Behavior.
[Note 1] There are exceptions to this rule; if you need to rely on these corner cases, you are swimming in rough waters, so be absolutely sure of what you are doing.

glsl infinity constant

Does GLSL have any pre-defined constants for +/-infinity or NaN? I'm doing this as a workaround but I wonder if there is a cleaner way:
// GLSL FRAGMENT SHADER
#version 410
<snip>
const float infinity = 1. / 0.;
void main ()
{
<snip>
}
I am aware of the isinf function but I need to assign infinity to a variable so that does not help me.
Like Nicol mentioned, there are no pre-defined constants.
However, from OpenGL 4.1 on, your solution is at least guaranteed to work and correctly generate an infinite value.
See for example in glsl 4.4:
4.7.1 Range and Precision
...
However, dividing a non-zero by 0 results in the
appropriately signed IEEE Inf: If both positive and negative zeros are implemented, the correctly signed
Inf will be generated, otherwise positive Inf is generated.
Be careful when you use an older version of OpenGL though:
For example in glsl 4.0 it says:
4.1.4 Floats
...
Similarly, treatment of conditions such as divide by 0 may lead to an unspecified result, but in no case should such a condition lead to the interruption or termination of processing.
There are no pre-defined constants for it, but there is the isinf function to test if something is infinity.
While I'm at it, are there constants for other things like FLT_MAX FLT_EPSILON etc the way there are in C?
No, there are not.
This might work?
const float pos_infinity = uintBitsToFloat(0x7F800000u);
const float neg_infinity = uintBitsToFloat(0xFF800000u);
"If the encoding of a floating point infinity is passed in parameter x, the resulting floating-point value is the corresponding (positive or negative) floating point infinity"