Are floatN_t in stdfloat guaranteed to be IEEE compliant? - c++

Unlike fundamental types – float, double and long double – are the new floatN_t types in <stdfloat> introduced in C++23 going to be always IEEE standard binary floating point types?
The cppreference page for fixed width floating-point types does mention the bits of precision and exponent, which match the IEEE standard. But that page doesn't explicitly mention IEEE standards anywhere. IEEE-compliant floating point means that the types not only have the same bits of precision and exponent; the standard also lists many operations that have to be supported in a standard-compliant way. So do these types strictly follow that?

Yes. The relevant section from the latest Draft for the C++23 Standard (cited below) makes explicit mention of the ISO/IEC/IEEE 60559 floating-point standard for the float*_t types. That is identical to the IEEE-754 standard according to Wikipedia:
The international standard ISO/IEC/IEEE 60559:2011 (with content
identical to IEEE 754-2008) has been approved for adoption through
ISO/IEC JTC 1/SC 25 under the ISO/IEEE PSDO Agreement and published.
Here is the first part of the relevant section from the Draft C++23 Standard (the definitions for other 'precision' types are similar):
6.8.3 Optional extended floating-point types    [basic.extended.fp]
1    If the implementation supports an
extended floating-point type ([basic.fundamental]) whose properties
are specified by the ISO/IEC/IEEE 60559 floating-point interchange
format binary16, then the typedef-name std::float16_t is
defined in the header <stdfloat> and names such a type, the macro
__STDCPP_FLOAT16_T__ is defined ([cpp.predefined]), and the
floating-point literal suffixes f16 and F16 are supported
([lex.fcon]).
…
(… And similarly for float32_t, float64_t, etc.)
Note: In terms of whether the cited paragraph demands that operations on such a type conform to the IEEE/ISO Standard, I would argue that it does. The "properties" of such a type include its behaviour, and not just its representation format.

Yes, they are.
[stdfloat.syn] states that
The header defines type aliases for the optional extended floating-point types that are specified in [basic.extended.fp].
In turn, [basic.extended.fp] references types which are specified by ISO/IEC/IEEE 60559 floating-point interchange format
ISO/IEC/IEEE 60559 is the newer version of 754
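To see what a particular toolchain actually provides, something like the following sketch can help. It checks for the <stdfloat> header and the __STDCPP_FLOAT32_T__ macro mentioned in [basic.extended.fp], then asks std::numeric_limits whether the type is reported as IEC 60559. The function name describe_float32 and the helper macro HAVE_STDFLOAT_HEADER are made up for this illustration, and the sketch assumes a compiler that understands __has_include.

```cpp
#include <limits>
#include <string>

// Only pull in <stdfloat> when the header exists (it is a C++23 addition).
#if defined(__has_include)
#  if __has_include(<stdfloat>)
#    include <stdfloat>
#    define HAVE_STDFLOAT_HEADER 1  // hypothetical helper macro
#  endif
#endif

// Hypothetical helper: describes whether std::float32_t is available and
// whether the implementation reports it as IEC 60559 conformant.
std::string describe_float32() {
#if defined(HAVE_STDFLOAT_HEADER) && defined(__STDCPP_FLOAT32_T__)
    if (std::numeric_limits<std::float32_t>::is_iec559)
        return "float32_t: available, is_iec559 = true";
    return "float32_t: available, is_iec559 = false";
#else
    return "float32_t: not provided by this implementation";
#endif
}
```

Printing describe_float32() from main shows which branch a given compiler and standard-library combination takes. The types are optional, so the "not provided" result is perfectly conforming.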

Related

Are there any floating-point requirements concerning precision and range in the C++ standard

The C++ Standard makes the following statements:
[basic.fundamental]: There are three floating-point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined. [Note: This document imposes no requirements on the accuracy of floating-point operations; see also [support.limits]. — end note] Integral and floating-point types are collectively called arithmetic types. Specialisations of the standard library template std::numeric_limits shall specify the maximum and minimum values of each arithmetic type for an implementation.
[support.limits]: The headers <limits> ([limits.syn]), <climits> ([climits.syn]), and <cfloat> ([cfloat.syn]) supply characteristics of implementation-dependent arithmetic types ([basic.fundamental]).
[cfloat.syn]: The header <cfloat> defines all macros the same as the C standard library header <float.h>. See also: ISO C 5.2.4.2.2
This all seems to point towards the fact that the C++ standard does not want to make any statement about what a float, double or long double should minimally be. However, the last quoted point references <float.h> of the C standard, which does define minimal requirements on the precision and range of floating-point numbers.
Question: Does the statement in [cfloat.syn] only imply that the same macros are defined with an identical meaning, or does it go one step further and also imply that the minimal requirements defined in the C standard are followed?
5.2.4.2.2 Characteristics of floating types:
365 The values given in the following list shall be replaced by constant expressions with implementation-defined values that are greater or equal in magnitude (absolute value) to those shown, with the same sign:
...
FLT_DIG 6
DBL_DIG 10
LDBL_DIG 10
...
Related questions:
Does the C++ standard specify anything on the representation of floating point numbers?
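Whatever the answer turns out to be, the C minimums quoted above can at least be checked on a given implementation at compile time. This is only a sketch: the helper name limits_match_cfloat is made up, and the asserted values are just the ones from the quoted list in ISO C 5.2.4.2.2.

```cpp
#include <cfloat>
#include <limits>

// Minimum values from ISO C 5.2.4.2.2, as quoted above.
static_assert(FLT_DIG >= 6, "float has fewer decimal digits than ISO C requires");
static_assert(DBL_DIG >= 10, "double has fewer decimal digits than ISO C requires");
static_assert(LDBL_DIG >= 10, "long double has fewer decimal digits than ISO C requires");

// Hypothetical helper: true when <limits> and <cfloat> report the same
// number of decimal digits for each of the three standard types.
constexpr bool limits_match_cfloat() {
    return std::numeric_limits<float>::digits10 == FLT_DIG
        && std::numeric_limits<double>::digits10 == DBL_DIG
        && std::numeric_limits<long double>::digits10 == LDBL_DIG;
}

static_assert(limits_match_cfloat(), "<limits> and <cfloat> disagree");
```

If any of these fire, the build stops. That does not settle what the standard guarantees, but it documents what the code relies on.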

The behaviour of floating point division by zero

Consider
#include <iostream>
int main()
{
double a = 1.0 / 0;
double b = -1.0 / 0;
double c = 0.0 / 0;
std::cout << a << b << c; // to stop compilers from optimising out the code.
}
I have always thought that a will be +Inf, b will be -Inf, and c will be NaN. But I also hear rumours that strictly speaking the behaviour of floating point division by zero is undefined and therefore the above code cannot be considered portable C++. (That theoretically obliterates the integrity of my million line plus code stack. Oops.)
Who's correct?
Note I'm happy with implementation defined, but I'm talking about cat-eating, demon-sneezing undefined behaviour here.
The C++ standard does not mandate IEEE 754, because that depends mostly on the hardware architecture.
If the hardware and compiler correctly implement the IEEE 754 standard, the division will produce the expected INF, -INF and NaN; otherwise... it depends.
Undefined means the compiler implementation decides, and there are many variables in that, such as the hardware architecture, code generation efficiency, compiler developer laziness, etc.
Source:
The C++ standard states that a division by 0.0 is undefined
C++ Standard 5.6.4
... If the second operand of / or % is zero the behavior is undefined
C++ Standard 18.3.2.4
...static constexpr bool is_iec559;
...56. True if and only if the type adheres to IEC 559 standard.
...57. Meaningful for all floating point types.
C++ detection of IEEE754:
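The standard library provides the std::numeric_limits template, whose static data member is_iec559 reports whether a type conforms to IEEE754:
static constexpr bool is_iec559;
#include <limits>
// is_iec559 is a data member, not a function, so there are no parentheses
bool isFloatIeee754 = std::numeric_limits<float>::is_iec559;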
What if IEEE754 is not supported?
It depends; usually a division by 0 triggers a hardware exception and makes the application terminate.
Quoting cppreference:
If the second operand is zero, the behavior is undefined, except that if floating-point division is taking place and the type supports IEEE floating-point arithmetic (see std::numeric_limits::is_iec559), then:
if one operand is NaN, the result is NaN
dividing a non-zero number by ±0.0 gives the correctly-signed infinity and FE_DIVBYZERO is raised
dividing 0.0 by 0.0 gives NaN and FE_INVALID is raised
We are talking about floating-point division here, so it is actually implementation-defined whether double division by zero is undefined.
If std::numeric_limits<double>::is_iec559 is true, and it is "usually true", then the behaviour is well-defined and produces the expected results.
A pretty safe bet would be to plop down a:
static_assert(std::numeric_limits<double>::is_iec559, "Please use IEEE754, you weirdo");
... near your code.
Division by zero, both integer and floating point, is undefined behavior [expr.mul]p4:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division
of the first expression by the second. If the second operand of / or % is zero the behavior is undefined. ...
An implementation can, however, optionally support Annex F, which has well-defined semantics for floating point division by zero.
We can see from the clang bug report "clang sanitizer regards IEC 60559 floating-point division by zero as undefined" that even though the macro __STDC_IEC_559__ is defined, it is being defined by the system headers. Clang itself does not support Annex F, so for clang this remains undefined behavior:
Annex F of the C standard (IEC 60559 / IEEE 754 support) defines the
floating-point division by zero, but clang (3.3 and 3.4 Debian snapshot)
regards it as undefined. This is incorrect:
Support for Annex F is optional, and we do not support it.
#if __STDC_IEC_559__
This macro is being defined by your system headers, not by us; this is
a bug in your system headers. (FWIW, GCC does not fully support Annex
F either, IIRC, so it's not even a Clang-specific bug.)
That bug report and two other bug reports, "UBSan: Floating point division by zero is not undefined" and "clang should support Annex F of ISO C (IEC 60559 / IEEE 754)", indicate that gcc is conforming to Annex F with respect to floating point divide by zero.
Though I agree that it isn't up to the C library to define __STDC_IEC_559__ unconditionally, the problem is specific to clang. GCC does not fully support Annex F, but at least its intent is to support it by default and the division is well-defined with it if the rounding mode isn't changed. Nowadays not supporting IEEE 754 (at least the basic features like the handling of division by zero) is regarded as bad behavior.
This is further supported by the gcc "Semantics of Floating Point Math in GCC" wiki, which indicates that -fno-signaling-nans is the default; this agrees with the gcc optimization options documentation, which says:
The default is -fno-signaling-nans.
Interesting to note that UBSan for clang defaults to including float-divide-by-zero under -fsanitize=undefined while gcc does not:
Detect floating-point division by zero. Unlike other similar options, -fsanitize=float-divide-by-zero is not enabled by -fsanitize=undefined, since floating-point division by zero can be a legitimate way of obtaining infinities and NaNs.
See it live for clang and live for gcc.
Division by 0 is undefined behavior.
From section 5.6 of the C++ standard (C++11):
The binary / operator yields the quotient, and the binary % operator
yields the remainder from the division of the first expression by the
second. If the second operand of / or % is zero the behavior is
undefined. For integral operands the / operator yields the algebraic
quotient with any fractional part discarded; if the quotient a/b is
representable in the type of the result, (a/b)*b + a%b is equal to a .
No distinction is made between integer and floating point operands for the / operator. The standard only states that dividing by zero is undefined without regard to the operands.
In [expr]/4 we have
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [ Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. —end note ]
Emphasis mine
So per the standard this is undefined behavior. It does go on to say that some of these cases are actually handled by the implementation and are configurable. So it won't say it is implementation defined but it does let you know that implementations do define some of this behavior.
As to the submitter's question 'Who's correct?', it is perfectly OK to say that both answers are correct. The fact that the C standard describes the behavior as 'undefined' DOES NOT dictate what the underlying hardware actually does; it merely means that if you want your program to be meaningful according to the standard you -may not assume- that the hardware actually implements that operation. But if you happen to be running on hardware that implements the IEEE standard, you will find the operation is in fact implemented, with the results as stipulated by the IEEE standard.
This also depends on the floating point environment.
cppreference has details:
http://en.cppreference.com/w/cpp/numeric/fenv
(no examples though).
This should be available in most desktop/server C++11 and C99 environments. There are also platform-specific variations that predate the standardization of all this.
I would expect that enabling exceptions makes the code run more slowly, so probably for this reason most platforms that I know of disable exceptions by default.
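Since the cppreference page has no example, here is a minimal sketch of inspecting the floating point environment after a division. The helper name raises_divbyzero is made up. The sketch assumes the platform defines FE_DIVBYZERO and that the compiler does not move the division across the <cfenv> calls; the volatile variables are there to discourage compile-time folding.

```cpp
#include <cfenv>

// Hypothetical helper: performs num / den and reports whether the
// FE_DIVBYZERO status flag was raised by that division.
bool raises_divbyzero(double num, double den) {
    // volatile keeps the compiler from evaluating the division at compile time
    volatile double n = num;
    volatile double d = den;
    std::feclearexcept(FE_ALL_EXCEPT);   // start from a clean set of flags
    volatile double q = n / d;           // the operation under test
    (void)q;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

On an IEEE 754 platform with default settings, raises_divbyzero(1.0, 0.0) is expected to return true and raises_divbyzero(1.0, 2.0) false. Note that this only reads a status flag; it does not enable trapping.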

Integer to float conversions with IEEE FP

What are the guarantees regarding conversions from integral to floating-point types in a C++ implementation supporting IEEE-754 FP arithmetic?
Specifically, is it always well-defined behaviour to convert any integral value to any floating-point type, possibly resulting in a value of +-inf? Or are there situations in which this would result in undefined behaviour?
(Note, I am not asking about exact conversion, just if performing the conversion is always legal from the point of view of the language standard)
IEC 60559 (the current successor standard to IEEE 754) makes integer-to-float conversion well-defined in all cases, as discussed in Franck's answer, but it is the language standard that has the final word on the subject.
In the base standard, C++11 section 4.9 "Floating-integral conversions", paragraph 2, makes out-of-range integer-to-floating-point conversions undefined behavior. (Quotation is from document N3337, which is the closest approximation to the official 2011 C++ standard that is publicly available at no charge.)
A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating
point type. The result is exact if possible. If the value being converted is in the range of values that can
be represented but the value cannot be represented exactly, it is an implementation-defined choice of either
the next lower or higher representable value. [ Note: Loss of precision occurs if the integral value cannot
be represented exactly as a value of the floating type. — end note ] If the value being converted is outside
the range of values that can be represented, the behavior is undefined. If the source type is bool, the value
false is converted to zero and the value true is converted to one.
Emphasis mine. The C standard says the same thing in different words (section 6.3.1.4 paragraph 2).
The C++ standard does not discuss what it would mean for an implementation of C++ to supply IEC 60559-conformant floating-point arithmetic. However, the C standard (closest approximation to C11 available online at no charge is N1570) does discuss this in its Annex F, and C++ implementors do tend to turn to C for guidance when C++ leaves something unspecified. There is no explicit discussion of integer to floating point conversion in Annex F, but there is this sentence in F.1p1:
Since negative and positive infinity are representable in IEC 60559 formats, all real numbers lie
within the range of representable values.
Putting that sentence together with 6.3.1.4p2 suggests to me that the C committee meant for integer-to-floating conversion to produce ±Inf when the integer's magnitude is outside the range of representable finite numbers. And that interpretation is consistent with the IEC 60559-specified behavior of conversions, so we can be reasonably confident that that's what an implementation of C that claimed to conform to Annex F would do.
However, applying any interpretation of the C standard to C++ is risky at best; C++ has not been defined as a superset of C for a very long time.
If your implementation of C++ predefines the macro __STDC_IEC_559__ and/or documents compliance with IEC 60559 in some way and you don't use the "be sloppy about floating-point math in the name of speed" compilation mode (which may be on by default), you can probably rely on out-of-range conversions to produce ±Inf. Otherwise, it's UB.
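In practice the out-of-range case is rare, and it can be ruled out at compile time for a given pair of types. The sketch below uses a sufficient (not necessary) condition for binary formats: an integer type with N value bits never exceeds 2^N in magnitude, so if N is smaller than the floating type's max_exponent, every value lies within the finite range. The name always_in_range is made up for this illustration.

```cpp
#include <limits>

// Hypothetical helper: true when every value of Int is guaranteed to lie
// within the finite range of Float, so the conversion cannot be out of range.
// Sufficient condition only; assumes a radix-2 floating-point format.
template <class Int, class Float>
constexpr bool always_in_range() {
    static_assert(std::numeric_limits<Int>::is_integer, "Int must be an integer type");
    static_assert(std::numeric_limits<Float>::radix == 2, "sketch assumes a binary format");
    // |value| <= 2^digits, and 2^(max_exponent - 1) is representable in Float.
    return std::numeric_limits<Int>::digits < std::numeric_limits<Float>::max_exponent;
}

// Document the assumptions the surrounding code relies on.
static_assert(always_in_range<long long, float>(), "long long -> float may overflow");
static_assert(always_in_range<unsigned long long, double>(), "unsigned long long -> double may overflow");
```

With a 64-bit long long and IEEE binary32 float these assertions pass, so the undefined case only becomes relevant for wider integers (for example a 128-bit unsigned type converted to float) or narrower floating-point formats.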
Section 7.4 of the IEEE-754 (2008) standard makes this case well-defined. But that only concerns IEEE-754 itself; C/C++ implementations are free to respect it or not (see zwol's answer).
The overflow exception shall be signaled if and only if the destination format’s largest finite number is exceeded in magnitude by what would have been the rounded floating-point result were the exponent range unbounded. The default result shall be determined by the rounding-direction attribute and the sign of the intermediate result as follows:
a) roundTiesToEven and roundTiesToAway carry all overflows to ∞ with the sign of the intermediate result.
b) roundTowardZero carries all overflows to the format’s largest finite number with the sign of the intermediate result.
c) roundTowardNegative carries positive overflows to the format’s largest finite number, and carries negative overflows to −∞.
d) roundTowardPositive carries negative overflows to the format’s most negative finite number, and carries positive overflows to +∞.
All cases of these 4 points provide a deterministic result for the conversion from integral to floating-point type.
All other cases (no overflow) are also well-defined with deterministic results given by the IEEE-754 standard.

What does the standard say about cmath functions like std::pow, std::log etc.?

Does the standard guarantee that functions return the exact same result across all implementations?
Take for example pow(float,float) for 32bit IEEE floats. Is the result across all implementations identical if the same two floats are passed in?
Or is there some flexibility that the standard allows with regard to tiny differences depending on the algorithm used to implement pow?
No, the C++ standard doesn't require the results of cmath functions to be the same across all implementations. For starters, you may not get IEEE-754/IEC 60559 floating point arithmetic.
That said, if an implementation does use IEC 60559 and defines __STDC_IEC_559__, then it must adhere to Annex F of the C standard (yes, your question is about C++, but the C++ standard defers to the C standard for C headers like math.h). Annex F states:
The float type matches the IEC 60559 single format.
The double type matches the IEC 60559 double format.
The long double type matches an IEC 60559 extended format, else a
non-IEC 60559 extended format, else the IEC 60559 double format.
Further, it says normal arithmetic must follow the IEC 60559 standard:
The +, −, *, and / operators provide the IEC 60559 add, subtract, multiply, and divide operations.
It further requires sqrt to follow IEC 60559:
The sqrt functions in <math.h> provide the IEC 60559 square root operation.
It then goes on to describe the behavior of several other floating-point functions, most of which you probably aren't interested in for this question.
Finally, it gets to the math.h header, and specifies how the various math functions (i.e. sin, cos, atan2, exp, etc.) should handle special cases (i.e. asin(±0) returns ±0, atanh(x) returns a NaN and raises the "invalid" floating-point exception for |x| > 1, etc.). But it never nails down the exact computation for normal inputs, which means you can't rely on all implementations producing the exact same computation.
So no, it doesn't require these functions to behave the same across all implementations, even if the implementations all define __STDC_IEC_559__.
This is all from a theoretical perspective. In practice, things are even worse. CPUs generally implement IEC 60559 arithmetic, but that can have different modes for rounding (so results will differ from computer to computer), and the compiler (depending on optimization flags) might make some assumptions that aren't strictly standards conforming in regards to your floating point arithmetic.
So in practice, it's even less strict than it is in theory, and you're very likely to see two computers produce slightly different results at some point or another.
A real world example of this is in glibc, the GNU C library implementation. They have a table of known error limits for their math functions across different CPUs. If all C math functions were bit-exact, those tables would all show 0 ULPs of error. But they don't. The tables show there are indeed varying amounts of error in their C math functions. I think this sentence is the most interesting summary:
Except for certain functions such as sqrt, fma and rint whose results are fully specified by reference to corresponding IEEE 754 floating-point operations, and conversions between strings and floating point, the GNU C Library does not aim for correctly rounded results for functions in the math library[...]
The only things that are bit-exact in glibc are the things that are required to be bit-exact by Annex F of the C standard. And as you can see in their table, most things aren't.

What is the size of float and double in C and C++? [duplicate]

I was looking to see if there is any standard type similar to uint32_t which always would map into a 32-bit unsigned integral type but I could not find any.
Is the size of float always 4 byte on all platform?
Is the size of double always 8?
Does either standard say anything on the matter?
I want to make sure that my size is always the same on all platforms (x86 and x64) so I am using standard int types, but I could not find any similar typedef for float and double.
Excerpt from the C99 standard, normative annex F (The C++-standard does not explicitly mention this annex, though it includes all affected functions without change per reference. Also, the types have to match for compatibility.):
IEC 60559 floating-point arithmetic
F.1 Introduction
1 This annex specifies C language support for the IEC 60559 floating-point standard. The
IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for
microprocessor systems, second edition (IEC 60559:1989), previously designated
IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point
Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove
dependencies on radix and word length. IEC 60559 generally refers to the floating-point
standard, as in IEC 60559 operation, IEC 60559 format, etc. An implementation that
defines __STDC_IEC_559__ shall conform to the specifications in this annex.
Where a binding between the C language and IEC 60559 is indicated, the
IEC 60559-specified behavior is adopted by reference, unless stated otherwise. Since
negative and positive infinity are representable in IEC 60559 formats, all real numbers lie
within the range of representable values.
So, include <math.h> (or in C++ maybe <cmath>), and test for __STDC_IEC_559__.
If the macro is defined, not only are the types better specified (float being 32bits and double being 64bits among others), but also the behavior of builtin operators and standard-functions is more specified.
Lack of the macro does not give any guarantees.
For x86 and x86_64 (amd64), you can rely on the types float and double being IEC-60559-conformant, though functions using them and operations on them might not be.
Does not say anything about the size.
3.9.1.8
There are three floating point types: float, double, and long double.
The type double provides at least as much precision as float, and the
type long double provides at least as much precision as double. The
set of values of the type float is a subset of the set of values of
the type double; the set of values of the type double is a subset of
the set of values of the type long double. The value representation of
floating-point types is implementation-defined. Integral and floating
types are collectively called arithmetic types. Specializations of the
standard template std::numeric_limits (18.3) shall specify the maximum
and minimum values of each arithmetic type for an implementation.
The C++ standard doesn't say anything, but on most platforms C++ uses the single and double precision formats from IEEE, which define single precision as 4 bytes and double precision as 8 bytes.
http://en.wikipedia.org/wiki/Single-precision_floating-point_format
http://en.wikipedia.org/wiki/Double-precision_floating-point_format
I'm not sure about the exceptions for these cases.
As floating point operations are implemented at a low level by CPUs, the C++ standard does not mandate a size for float, double or long double. All it says is that, in the order listed, each type provides at least as much precision as the one before it.
Your best bet is to use static_assert, sizeof, typedef and #define carefully in order to define cross platform floating point types.
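A minimal sketch of that idea follows. The alias names f32 and f64 are made up for this example, and the checks simply encode the assumptions (8-bit bytes, IEC 60559 formats of the expected width) instead of silently relying on them.

```cpp
#include <climits>
#include <limits>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

// Fail the build if float/double are not the 32-bit/64-bit IEC 60559 formats.
static_assert(sizeof(float) * CHAR_BIT == 32 &&
              std::numeric_limits<float>::is_iec559,
              "float is not a 32-bit IEC 60559 type on this platform");
static_assert(sizeof(double) * CHAR_BIT == 64 &&
              std::numeric_limits<double>::is_iec559,
              "double is not a 64-bit IEC 60559 type on this platform");

// Hypothetical alias names, in the spirit of uint32_t.
using f32 = float;
using f64 = double;
```

On a platform where the assumptions do not hold, the build fails with a clear message, which is usually preferable to a silent size mismatch in serialized data.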
I want to point out that even if you have floats of the same size, you cannot be sure they are interpreted identically on different platforms. You can read a lot of papers about 'floats over network'. Floating-point non-determinism is a known problem.
You can try to use a library offering cross-platform data types compatibility.
"The integral types C++ inherited from C are a cross-platform hazard. int, long and friends have different sizes on different platforms (32-bit and 64-bit on today's systems, maybe 128-bit later). For some applications it might seem irrelevant because they never approach the 32-bit limit (or rather 31-bit if you use unsigned integers), but if you serialize your objects on a 64-bit system and deserialize on a 32-bit system you might be unpleasantly surprised.
APR provides a set of typedefs for basic types that might be different on different platforms. These typedefs provide a guaranteed size and avoid the fuzzy built-in types. However, for some applications (mostly numerical) it is sometimes important to use the native machine word size (typically what int stands for) to achieve maximal performance."
Gigi SAYFAN - Building Your Own Plugin Framework (From http://philippe.ameline.free.fr/techytechy/071125_PluginFramework.htm)
In the case of X86, even if using IEEE single and double precision numbers, the internal calculations are affected by a floating point control word (FCW). The internal calculations are normally 64 bit or 80 bit (long double). You can override this using inline assembly code, but there's no guarantee that some double precision library function won't set it back.
Microsoft supported 80 bit long doubles with their 16 bit compilers, but dropped support for them with their 32 bit and 64 bit compilers, and long doubles are now the same as doubles at 64 bits.