We have special functions like std::nanl to make a NaN with a payload. Currently here's what I have to do to print it back:
#include <iostream>
#include <cmath>
#include <cstring>
#include <cstdint>

int main()
{
    const auto x=std::nanl("1311768467463790325");
    std::uint64_t y;
    std::memcpy(&y,&x,sizeof y);
    std::cout << (y&~(3ull<<62)) << "\n";
}
This relies on the particular representation of long double, namely on it being the 80-bit extended-precision type of the x87 FPU. Is there any standard way to achieve this without relying on such an implementation detail?
C++ imports nan* functions from ISO C. ISO C states in 7.22.1.3:
the meaning of the n-char sequence is implementation-defined
with a comment
An implementation may use the n-char sequence to determine extra information to be represented in the NaN’s significand.
There is no method to get the stored information.
I stumbled across this one here in 2023. Things haven't improved much.
C11 supports nan*() functions (if QNaN is supported on your target processor), but
MSVC 2022 does not actually implement the payload part,
The payload must be specified as a string anyway, and
There is still no Standard way to get the data.
(C23 proposes the GNU extension getPayload(), but it returns yet another double, which is far less interesting than an integer would have been.)
However
It has always been possible to get a QNaN payload, assuming you have a proper IEEE 754 QNaN with payload data. It has been put to good use on systems that do, in things like JavaScript and Lua, for example.[citation needed]
According to Wikipedia, after discussing some dinosaurs: [link]
It may therefore appear strange that the widespread IEEE 754 floating-point standard does not specify endianness.[3] Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e., implementing IEEE 754), one may safely assume that the endianness is the same for floating-point numbers as for integers, making the conversion straightforward regardless of data type. Small embedded systems using special floating-point formats may be another matter, however.
(Emphasis added.)
So as long as you aren't leaking abstractions outside of internal use or playing with specialized (or ancient) hardware, you should be good to stuff data into your QNaNs.
As this question is tagged C++ we will have to resort to slightly uglier code than strictly necessary in C, as type-punning with a union is (probably) UB in C++.[more link] The following should work in both C and C++ and produce just as well-optimized code either way.
Da codez or go home
qnan.h
#ifndef QNAN_H
#define QNAN_H
// Copyright stackoverflow.com
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// https://www.boost.org/LICENSE_1_0.txt )
#include <assert.h>
#include <math.h>
#include <stdint.h>
#ifndef NAN
#error "IEEE 754 Quiet NaN is required."
#endif
#ifndef UINT64_MAX
#error "uint64_t required."
#endif
static_assert( sizeof(double) == 8, "IEEE 754 64-bit double-precision is required" );
double qnan ( unsigned long long payload );
unsigned long long qnan_payload ( double qnan );
#endif
qnan.c
// Copyright stackoverflow.com
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file LICENSE_1_0.txt or copy at
// https://www.boost.org/LICENSE_1_0.txt )
#include <string.h>
#include "qnan.h"
double
qnan( unsigned long long payload )
{
    double qnan = NAN;                      // start from a quiet NaN
    uint64_t n;
    memcpy( &n, &qnan, 8 );                 // read its bit pattern
    n |= payload & 0x7FFFFFFFFFFFFULL;      // OR in the low 51 bits of the payload
    memcpy( &qnan, &n, 8 );                 // write the bits back into the double
    return qnan;
}

unsigned long long
qnan_payload( double qnan )
{
    uint64_t n;
    memcpy( &n, &qnan, 8 );                 // read the NaN's bit pattern
    return n & 0x7FFFFFFFFFFFFULL;          // keep only the 51 payload bits
}
These two functions allow you access to all 51 bits of payload data as an unsigned integer.
Note, however, that unlike the weird-o getPayload() function the qnan_payload() function does not bother to fact-check you about your choice of input — it assumes you have given it an actual QNaN.
If you are unsure what kind of double you have, the isnan() function from <math.h> works just fine to check for QNaN-ness.
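For example, a minimal round-trip sketch using the two functions above (the payload value is arbitrary; anything above the low 51 bits is silently dropped by the mask):
#include <cmath>
#include <iostream>
#include "qnan.h"
int main()
{
    const double d = qnan( 123456789ULL );                  // stuff a payload into a quiet NaN
    std::cout << std::boolalpha
              << "is NaN:  " << std::isnan( d ) << "\n"     // true: still a NaN
              << "payload: " << qnan_payload( d ) << "\n";  // prints 123456789
}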
Similar code will give you access to a four-byte float or an N-byte long double (which is probably just an 8-byte double, unless it isn't, and is probably more trouble to support than it's worth).
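As a rough illustration, a float variant might look like this (a sketch only, not part of qnan.h above; it assumes a 4-byte IEEE 754 binary32 float, which leaves 22 payload bits after the quiet bit):
#include <math.h>
#include <stdint.h>
#include <string.h>
float qnanf( unsigned long payload )
{
    float qnan = NAN;
    uint32_t n;
    memcpy( &n, &qnan, 4 );
    n |= payload & 0x3FFFFFUL;     // keep only the 22 payload bits
    memcpy( &qnan, &n, 4 );
    return qnan;
}
unsigned long qnanf_payload( float qnan )
{
    uint32_t n;
    memcpy( &n, &qnan, 4 );
    return n & 0x3FFFFFUL;
}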
Related
I've built a custom version of frexp:
#include <cstdint>
#include <cstring>
#include <limits>
#include <utility>

auto frexp(float f) noexcept
{
    static_assert(std::numeric_limits<float>::is_iec559);
    constexpr std::uint32_t ExpMask = 0xff;
    constexpr std::int32_t ExpOffset = 126;
    constexpr int MantBits = 23;
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(float)); // well defined bit transformation from float to int
    int exp = ((u >> MantBits) & ExpMask) - ExpOffset; // extract the 8 bits of the exponent (it has an offset of 126)
    // divide by 2^exp (leaving mantissa intact while placing "0" into the exponent)
    u &= ~(ExpMask << MantBits); // zero out the exponent bits
    u |= ExpOffset << MantBits;  // place 126 into exponent bits (representing 0)
    std::memcpy(&f, &u, sizeof(float)); // copy back to f
    return std::make_pair(exp, f);
}
By checking is_iec559 I'm making sure that float fulfills the requirements of the IEC 559 (IEEE 754) standard.
My question is: Does this mean that the bit operations I'm doing are well defined and do what I want? If not, is there a way to fix it?
I tested it for some random values and it seems to be correct, at least on Windows 10 compiled with MSVC and on wandbox. Note, however, that (on purpose) I'm not handling the edge cases of subnormals, NaN, and inf.
If anyone wonders why I'm doing this: in benchmarks I found that this version of frexp is up to 15 times faster than std::frexp on Windows 10. I haven't tested other platforms yet. But I want to make sure that this doesn't just work by coincidence and won't break in the future.
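A sketch of such a spot check (it compares the custom frexp above against std::frexp for random normal values; the seed and range are arbitrary illustrative choices):
#include <cassert>
#include <cmath>
#include <random>
int main()
{
    std::mt19937 gen(42);
    std::uniform_real_distribution<float> dist(1e-30f, 1e30f);
    for (int i = 0; i < 1000; ++i)
    {
        const float x = dist(gen);
        int ref_exp = 0;
        const float ref_mant = std::frexp(x, &ref_exp);  // reference result
        const auto [exp, mant] = frexp(x);               // custom version from above
        assert(exp == ref_exp && mant == ref_mant);
    }
}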
Edit:
As mentioned in the comments, endianness could be an issue. Does anybody know?
"Does this mean that the bit operations I'm doing are well defined..."
The TL;DR, by the strict definition of "well defined": no.
Your assumptions are likely correct but not well defined, because there are no guarantees about the bit width, or the implementation of float. From § 3.9.1:
there are three floating point types: float, double, and long double. The type double provides at least as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined.
The is_iec559 clause only qualifies with:
True if and only if the type adheres to IEC 559 standard
If a literal genie wrote you a terrible compiler, and made float = binary16, double = binary32, and long double = binary64, and made is_iec559 true for all the types, it would still adhere to the standard.
does that mean that I can extract exponent and mantissa in a well defined way?
The TL;DR, by the limited guarantees of the C++ standard: no.
Assume you use float32_t, that is_iec559 is true, that you logically deduced from all the rules that it could only be binary32 with no trap representations, and that you correctly argued that memcpy is well defined for conversion between arithmetic types of the same width and won't break strict aliasing. Even with all those assumptions, it is only likely, not guaranteed, that you can extract the mantissa this way.
The IEEE 754 standard and 2's complement regard bit string encodings, and the behavior of memcpy is described using bytes. While it's plausible to assume the bit string of uint32_t and float32_t would be encoded the same way (e.g. same endianness), there's no guarantee in the standard for that. If the bit strings are stored differently and you shift and mask the copied integer representation to get the mantissa, the answer will be incorrect, despite the memcpy behavior being well defined.
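If you want to guard against that possibility in practice, a cheap check against a known bit pattern is one option. A sketch, assuming a 32-bit float and using 1.0f, whose IEEE 754 binary32 encoding is 0x3F800000:
#include <cstdint>
#include <cstring>
// Returns true if the float's bytes, read back through memcpy, match the
// expected binary32 pattern, i.e. the integer and floating-point
// representations agree on byte order.
bool float_bits_as_expected()
{
    float f = 1.0f;
    std::uint32_t u = 0;
    std::memcpy(&u, &f, sizeof u);
    return u == 0x3F800000u;
}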
As mentioned in the comments, endianness could be an issue. Does anybody know?
At least a few architectures have used different endianness for floating point registers and integer registers. The same link says that except for small embedded processors, this isn't a concern. I trust Wikipedia entirely for all topics and refuse to do any further research.
I am new to Boost. I have a 128-bit integer (int128_t from boost/multiprecision/cpp_int.hpp) in my project, which I need to divide by a floating point number. On my current platform I have limitations and can't use boost/multiprecision/float128.hpp. It's still not supported in clang: https://github.com/boostorg/math/issues/181
Is there any way to do this with the Boost math lib?
Although you can't use float128, Boost has several other implementations of long floating-point types:
cpp_bin_float
cpp_dec_float
gmp_float
mpfr_float
In particular, if you need binary high-precision floating-point type without dependencies on external libraries like GMP, you can use cpp_bin_float. Example:
#include <iomanip>
#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>

int main()
{
    using LongFloat=boost::multiprecision::cpp_bin_float_quad;
    const auto x=boost::multiprecision::int128_t(1234123521);
    const auto y=LongFloat(34532.52346246234);
    const auto z=LongFloat(x)/y;
    std::cout << "Ratio: " << std::setprecision(10) << z << "\n";
}
Here we've used a built-in typedef for a 113-bit floating-point number, which has the same precision and range as IEEE 754 binary128. You can choose other parameters for the precision and range; see the docs I've linked to above for details.
Note though, that int128_t has more precision than any kind of float128, because some bits of the latter are used to store its exponent. If that's an issue, be sure to use higher precision.
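If you do need every bit of the int128_t to survive the conversion, one option is a wider cpp_bin_float. A sketch using the cpp_bin_float_50 typedef (50 decimal digits, comfortably more than the ~39 digits a 128-bit integer can hold); the particular values are arbitrary:
#include <iomanip>
#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
int main()
{
    using BigFloat = boost::multiprecision::cpp_bin_float_50;
    const auto x = boost::multiprecision::int128_t("170141183460469231731687303715884105727"); // 2^127 - 1
    const auto y = BigFloat("34532.52346246234");
    const BigFloat z = BigFloat(x) / y;
    std::cout << std::setprecision(40) << z << "\n";
}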
Perhaps split the int128 into 64-bit numbers?
i128 = h64 * (1<<64) + l64
Then you could easily load those values, shift and sum them in 64-bit floating point to get the equivalent number (see the sketch below).
Or, as the floating point hardware only has 64-bit precision anyway, you could just shift your int128 down until it fits in 64 bits, cast that to float, and then shift it back up; but the former may actually be faster because it is simpler.
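A sketch of the splitting approach (it assumes a non-negative value; convert_to is Boost.Multiprecision's conversion member function, and 18446744073709551616.0 is 2^64):
#include <iomanip>
#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>
// Convert a non-negative int128_t to double by splitting it into two
// 64-bit halves, converting each half, and recombining in floating point.
double int128_to_double(const boost::multiprecision::int128_t& v)
{
    const boost::multiprecision::int128_t hi = v >> 64;
    const boost::multiprecision::int128_t lo = v & 0xFFFFFFFFFFFFFFFFULL;
    return hi.convert_to<double>() * 18446744073709551616.0 + lo.convert_to<double>();
}
int main()
{
    const boost::multiprecision::int128_t x("12345678901234567890123456789");
    std::cout << std::setprecision(17) << int128_to_double(x) << "\n";
}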
I want to use an exponential function which returns an IEEE_FLOAT64 value.
Currently I am using the expf function, but I am still getting lots of warnings.
value = IEEEPosOne - (IEEE_FLOAT64)expf(value1);
From man 3 exp:
NAME
exp, expf, expl - base-e exponential function
SYNOPSIS
#include <math.h>
double exp(double x);
float expf(float x);
long double expl(long double x);
Link with -lm.
So just use exp().
// C++
#include <cmath>

double x = 7.0;              // float64
auto y = std::exp(x);        // exp(float64)

The C++ standard provides appropriate overloads. No need to reflect the operand type in the function name.
This answer does not apply to C, only to C++.
For C see #iBug's answer.
The C++ standard does not require an implementation to use the IEEE standard. Though this is usually the easiest floating point implementation to use as the chips are relatively standard in modern machines.
The standard provides a way to check using std::numeric_limits.
So if it is a requirement then you should validate.
#include <climits>
#include <limits>

int main()
{
    static_assert(sizeof(double) * CHAR_BIT == 64, "double must be 64 bits wide");
    static_assert(std::numeric_limits<double>::is_iec559, "double must adhere to IEC 559");
    // If the above compiles your double is an IEEE 64 bit value.
    // Or IEEE_754 compliant: https://en.wikipedia.org/wiki/IEEE_754_revision
}
Now that you have established you are using IEEE values you can look at the cmath header to see that the functions there all take and return a double value.
exp
log
etc....
Note: You should note that Windows machines (usually) use an 80-bit floating point register (not 64). So things can get super wacky if you need strict compliance.
Note: Do NOT use:
expf() for float,
expl() for long double
These are for C library users where the language does not do the correct type checking. In C++ the language uses overloading to use the correct version of the function. If you look at the standard for exp:
Defined in header <cmath>
* float exp( float arg );
* double exp( double arg );
* long double exp( long double arg );
* double exp( Integral arg );
The above are all in the standard namespace.
std::cout << std::exp(100000.1) << "\n";
Notice that exp() can take any floating point type float, double or long double and generate the appropriate result type.
For example:
float a = 3.14159f;
If I were to inspect the bits in this number (or any other normalized floating point number), what are the chances that the bits are different in some other platform/compiler combination, and is that even possible?
Not necessarily: the C++ standard doesn't define the floating point representation (it doesn't even define the representation of signed integers), although most platforms probably orient themselves on the same IEEE standard (IEEE 754-2008?).
Your question can be rephrased as: Will the final assertion in the following code always be upheld, no matter what platform you run it on?
#include <cassert>
#include <cstring>
#include <cstdint>
#include <limits>

#if __cplusplus < 201103L // no static_assert prior to C++11
#define static_assert(a,b) assert(a)
#endif

int main() {
    float f = 3.14159f;
    std::uint32_t i = 0x40490fd0; // IEC 559/IEEE 754 representation
    static_assert(std::numeric_limits<float>::is_iec559, "floating point must be IEEE 754");
    static_assert(sizeof(f) == sizeof(i), "float must be 32 bits wide");
    assert(std::memcmp(&f, &i, sizeof(f)) == 0);
}
Answer: There's nothing in the C++ standard that guarantees that the assertion will be upheld. Yet, on most sane platforms the assertion will hold and the code won't abort, no matter if the platform is big- or little-endian. As long as you only care that your code works on some known set of platforms, it'll be OK: you can verify that the tests pass there :)
Realistically speaking, some compilers might use a sub-par decimal-to-IEEE-754 conversion routine that doesn't properly round the result, so if you specify f to enough digits of precision, it might be a couple of LSBs of mantissa off from the value that would be nearest to the decimal representation. And then the assertion won't hold anymore. For such platforms, you might wish to test a couple mantissa LSBs around the desired one.
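A sketch of such a tolerant check (the expected pattern is the one from the snippet above, and the two-LSB allowance is an arbitrary illustrative choice):
#include <cassert>
#include <cstdint>
#include <cstring>
int main()
{
    float f = 3.14159f;
    std::uint32_t i;
    std::memcpy(&i, &f, sizeof i);
    // Allow the compiler's decimal-to-binary conversion to be a couple of
    // mantissa LSBs away from the expected IEEE 754 pattern.
    const std::uint32_t expected = 0x40490fd0u;
    const std::uint32_t diff = i > expected ? i - expected : expected - i;
    assert(diff <= 2);
}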
If I test my code with the following:
#ifndef __STDC_IEC_559__
#error Warning: __STDC_IEC_559__ not defined. The code assumes we're using the IEEE 754 floating point for binary serialization of floats and doubles.
#endif
...such as is described here, am I guaranteed that this:
float myFloat = ...;
unsigned char *data = reinterpret_cast<unsigned char*>(&myFloat);
unsigned char buffer[4];
std::memcpy(&buffer[0], data, sizeof(float));
...would safely serialize the float for writing to a file or network packet?
If not, how can I safely serialize floats and doubles?
Also, who's responsible for byte ordering - my code or the Operating System?
To clarify my question: Can I cast floats to 4 bytes and doubles to 8 bytes, and safely serialize to and from files or across networks, if I:
Assert that we're using IEC 559
Convert the resulting bytes to/from a standard byte order (such as network byte order)?
__STDC_IEC_559__ is a macro defined by C99/C11; I didn't find any reference saying whether C++ guarantees to support it.
A better solution is to use std::numeric_limits< float >::is_iec559 or std::numeric_limits< double >::is_iec559
C++11 18.2.1.1 Class template numeric_limits
static const bool is_iec559 ;
52 True if and only if the type adheres to IEC 559 standard.210)
53 Meaningful for all floating point types.
In the footnote:
210) International Electrotechnical Commission standard 559 is the same as IEEE 754.
About your second assumption: I don't think you can say any byte order is "standard", but if the byte order is the same between machines (little or big endian), then yes, I think you can serialize like that.
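For instance, a sketch of that approach (big-endian on the wire plays the role of "network byte order" here; it assumes is_iec559 and a 4-byte float):
#include <array>
#include <cstdint>
#include <cstring>
// Serialize a float as 4 bytes in big-endian order; deserialize by
// reversing the steps. The byte swapping is done on the integer image.
std::array<unsigned char, 4> serialize_float(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return { static_cast<unsigned char>(u >> 24),
             static_cast<unsigned char>(u >> 16),
             static_cast<unsigned char>(u >> 8),
             static_cast<unsigned char>(u) };
}
float deserialize_float(const std::array<unsigned char, 4>& b)
{
    const std::uint32_t u = (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16)
                          | (std::uint32_t(b[2]) << 8)  |  std::uint32_t(b[3]);
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}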
How about considering standard serialization like XDR [used in Unix RPC] or CDR, etc.?
http://en.wikipedia.org/wiki/External_Data_Representation
for example:
bool_t xdr_float(XDR *xdrs, float *fp); from linux.die.net/man/3/xdr
or a C++ library
http://xstream.sourceforge.net/
You might also be interested in CDR [used by CORBA]; ACE [Adaptive Communication Environment] has CDR classes (but it's a very heavy library).