Specify floating point constant value bitwise in C/C++

This is what I am trying to do:
//Let Bin2Float be a magic macro that packages specified bit pattern into float as a constant
const float MyInf = Bin2Float(01111111,10000000,00000000,00000000);
We all know how to package bit patterns into integers ("binary constant" hacks), and the input to this prototype macro is the same as it would be for the corresponding 32-bit integer binary constant macro. Packaging the bits into an integer constant is not a problem. But after playing with pointer and union punning, I realized that type-punning the integer into a float leads to many issues (some on the MSVC side, some on the gcc side). So here is the list of requirements:
Must compile under gcc (C mode), g++, MSVC (even if I have to use conditional compiling to do two separate versions)
Must compile for both C and C++
In resulting assembly code, must compile into a hardcoded constant, not be dynamically computed
Must not use memcpy
Must not use static or global variables
Must not use pointer-based type punning, to avoid issues with strict aliasing

First, there is rarely a need to specify floating-point constants in this way. For infinity, use INFINITY. For a NaN, use either NAN or nanf(string). These are defined in <math.h>. The compiler is likely to compile INFINITY and NAN to some sort of assembly-language constant (it could be in the read-only data section, it could be formed in immediate fields of instructions, et cetera). However, this cannot be guaranteed except by the compiler implementors, since the C standard does not guarantee it. nanf is likely to result in a function call, although the compiler is free to optimize it to a constant if the string is a constant.
For finite numbers, use hexadecimal floating-point constants (e.g., “0x3.4p5”). The only IEEE 754 floating-point objects you cannot completely specify this way, down to the last bit, are NaNs. The nan and nanf functions are not fully specified by the C standard, so you do not have full control of the significand bits unless the implementation provides it.
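For instance, a minimal sketch of those facilities (assuming an IEEE 754 single-precision float; hexadecimal float constants are C99, and C++17 or a compiler extension in C++):

#include <math.h>

void examples(void)
{
    /* Sketch, assuming IEEE 754 single-precision float. */
    float inf     = INFINITY;          /* +infinity, from <math.h>           */
    float qnan    = NAN;               /* a quiet NaN, payload unspecified   */
    float three   = 0x1.8p1f;          /* hex float constant: 1.5 * 2^1 = 3  */
    float flt_max = 0x1.fffffep+127f;  /* largest finite float               */
    (void)inf; (void)qnan; (void)three; (void)flt_max;
}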
I am unfamiliar with the binary constant hacks you allude to. Supposing you have a macro Bin2Unsigned that provides an unsigned int, then you can use this:
const float MyInf = (union { unsigned u; float f; }) { Bin2Unsigned(…) } .f;
That is, believe it or not, standard C syntax and semantics up to the point where the bits are reinterpreted as a float. Obviously, the interpretation of the bits depends on the implementation. However, the compound literal and reinterpreting through a union is specified by the C standard.
I tested with gcc version 4.2.1 (Apple Inc. build 5666), targeting x86_64, with -O3 and default options otherwise, and the resulting assembly code used a constant, .long 2139095040.
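For illustration, here is the same idea with the infinity bit pattern written out as a plain hexadecimal constant in place of the Bin2Unsigned(…) macro (a sketch; C99 syntax, accepted by gcc and clang in C++ only as an extension):

#include <stdio.h>

int main(void)
{
    /* 0x7F800000 is the IEEE 754 single-precision bit pattern for +infinity;
     * it stands in here for the result of the Bin2Unsigned(...) macro. */
    const float MyInf = (union { unsigned u; float f; }) { 0x7F800000u } .f;
    printf("%f\n", MyInf);  /* prints "inf" on an IEEE 754 implementation */
    return 0;
}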

Related

Is it guaranteed that the copy of a float variable will be bitwise equivalent to the original?

I am working on floating point determinism and having already studied so many surprising potential causes of indeterminism, I am starting to get paranoid about copying floats:
Does anything in the C++ standard or in general guarantee me that a float lvalue, after being copied to another float variable or when used as a const-ref or by-value parameter, will always be bitwise equivalent to the original value?
Can anything cause a copied float to be bitwise inequivalent to the original value, such as changing the floating point environment or passing it into a different thread?
Here is some sample code based on what I use to check for equivalence of floating point values in my test-cases, this one will fail because it expects FE_TONEAREST:
#include <cfenv>
#include <cstdint>

// MSVC-specific pragmas for floating point control
#pragma float_control(precise, on)
#pragma float_control(except, on)
#pragma fenv_access(on)
#pragma fp_contract(off)

// May make a copy of the floats
bool compareFloats(float resultValue, float comparisonValue)
{
    // I was originally doing a bit-wise comparison here but I was made
    // aware in the comments that this might not actually be what I want
    // so I only check against the equality of the values here now
    // (NaN values etc. have to be handled extra)
    bool areEqual = (resultValue == comparisonValue);

    // Additional outputs if not equal
    // ...

    return areEqual;
}

int main()
{
    std::fesetround(FE_TOWARDZERO);
    float value = 1.f / 10;
    float expectedResult = 0x1.99999ap-4;
    compareFloats(value, expectedResult);
}
Do I have to be worried that if I pass a float by-value into the comparison function it might come out differently on the other side, even though it is an lvalue?
No, there is no such guarantee.
Subnormal, non-normalised floating points, and NaN are all cases where the bit patterns may differ.
I believe that signed negative zero is allowed to become a signed positive zero on assignment, although IEEE754 disallows that.
The C++ standard itself has virtually no guarantees on floating point math because it does not mandate IEEE-754 but leaves it up to the implementation (emphasis mine):
[basic.fundamental/12]
There are three floating-point types: float, double, and long double.
The type double provides at least as much precision as float, and the type long double provides at least as much precision as double.
The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double.
The value representation of floating-point types is implementation-defined.
[ Note: This document imposes no requirements on the accuracy of floating-point operations; see also [support.limits]. — end note ]
The C++ code you write is a high-level abstract description of what you want the abstract machine to do, and it is fully in the hands of the compiler what this gets translated to. Assignment is an operation of that abstract machine, and as shown above, the C++ standard does not mandate the behavior of floating-point operations. To verify the statement "assignments leave floating point values unchanged" your compiler would have to specify its floating-point behavior in terms of the C++ abstract machine, and I've not seen any such documentation (especially not for MSVC).
In other words: Without nailing down the exact compiler, compiler version, compilation flags etc., it is impossible to say for sure what the floating point semantics of a C++ program are (especially regarding the difficult cases like rounding, NaNs or signed zero). Most compilers differentiate between strict IEEE conformance and relaxing some of those restrictions, but even then you are not necessarily guaranteed that the program has the same outputs in non-optimized vs optimized builds due to, say, constant folding, precision of intermediate results and so on.
Case in point: for gcc, even with -O0, the program in question does not compute 1.f / 10 at run time but at compile time, and thus your rounding mode settings are ignored: https://godbolt.org/z/U8B6bc
You should not be paranoid about copying floats in particular but paranoid of compiler optimizations for floating point in general.
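One way to see this (a sketch, not from the original question): marking an operand volatile keeps the compiler from folding 1.f / 10 at compile time, so the rounding mode set with std::fesetround has a chance to take effect; whether it is honored still depends on the compiler and flags such as -frounding-math.

#include <cfenv>
#include <cstdio>

int main()
{
    std::fesetround(FE_TOWARDZERO);
    volatile float ten = 10.0f;   // volatile defeats compile-time constant folding
    float value = 1.0f / ten;     // now computed at run time
    std::printf("%a\n", value);   // 0x1.999998p-4 if rounding toward zero is honored,
                                  // 0x1.99999ap-4 under the default round-to-nearest
    return 0;
}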

Operations on "double" and optimization in C

I have recently analyzed an old piece of code compiled with VS2005 because of a different numerical behaviour in "debug" (no optimizations) and "release" (/O2 /Oi /Ot options) compilations. The (reduced) code looks like:
#include <math.h>
#include <stdio.h>

void f(double x1, double y1, double x2, double y2)
{
    double a1, a2, d;
    a1 = atan2(y1, x1);
    a2 = atan2(y2, x2);
    d = a1 - a2;
    if (d == 0.0) { // NOTE: I know that == on reals is "evil"!
        printf("EQUAL!\n");
    }
}
The function f is expected to print "EQUAL" if invoked with identical pairs of values (e.g. f(1,2,1,2)), but this doesn't always happen in "release". Indeed it happened that the compiler has optimized the code as if it were something like d = a1-atan2(y2,x2) and removed completely the assignment to the intermediate variable a2. Moreover, it has taken advantage of the fact that the second atan2()'s result is already on the FPU stack, so reloaded a1 on FPU and subtracted the values. The problem is that the FPU works at extended precision (80 bits) while a1 was "only" double (64 bits), so saving the first atan2()'s result in memory has actually lost precision. Eventually, d contains the "conversion error" between extended and double precision.
I know perfectly that identity (== operator) with float/double should be avoided. My question is not about how to check proximity between doubles. My question is about how "contractual" an assignment to a local variable should be considered. By my "naive" point of view, an assignment should force the compiler to convert a value to the precision represented by the variable's type (double, in my case). What if the variables were "float"? What if they were "int" (weird, but legal)?
So, in short, what does the C standard say about these cases?
By my "naive" point of view, an assignment should force the compiler to convert a value to the precision represented by the variable's type (double, in my case).
Yes, this is what the C99 standard says. See below.
So, in short, what does the C standard say about these cases?
The C99 standard allows, in some circumstances, floating-point operations to be computed at a higher precision than that implied by the type: look for FLT_EVAL_METHOD and FP_CONTRACT in the standard, these are the two constructs related to excess precision. But I am not aware of any words that could be interpreted as meaning that the compiler is allowed to arbitrarily reduce the precision of a floating-point value from the computing precision to the type precision. This should, in a strict interpretation of the standard, only happen in specific spots, such as assignments and casts, in a deterministic fashion.
The best is to read Joseph S. Myers's analysis of the parts relevant to FLT_EVAL_METHOD:
C99 allows evaluation with excess range and precision following
certain rules. These are outlined in 5.2.4.2.2 paragraph 8:
Except for assignment and cast (which remove all extra range and
precision), the values of operations with floating operands and
values subject to the usual arithmetic conversions and of floating
constants are evaluated to a format whose range and precision may
be greater than required by the type. The use of evaluation
formats is characterized by the implementation-defined value of
FLT_EVAL_METHOD:
Joseph S. Myers goes on to describe the situation in GCC before the patch that accompanies his post. The situation was just as bad as it is in your compiler (and countless others):
GCC defines FLT_EVAL_METHOD to 2 when using x87 floating point. Its
implementation, however, does not conform to the C99 requirements for
FLT_EVAL_METHOD == 2, since it is implemented by the back end
pretending that the processor supports operations on SFmode and
DFmode:
Sometimes, depending on optimization, a value may be spilled to
memory in SFmode or DFmode, so losing excess precision unpredictably
and in places other than when C99 specifies that it is lost.
An assignment will not generally lose excess precision, although
-ffloat-store may make it more likely that it does.
…
The C++ standard inherits the definition of math.h from C99, and math.h is the header that defines FLT_EVAL_METHOD. For this reason you might expect C++ compilers to follow suit, but they do not seem to be taking the issue as seriously. Even G++ still does not support -fexcess-precision=standard, although it uses the same back-end as GCC (which has supported this option since Joseph S. Myers' post and accompanying patch).
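As a quick check of what evaluation format a given toolchain claims to use, you can inspect FLT_EVAL_METHOD from <float.h> (a sketch; the value is implementation-defined, so expect it to vary across compilers and flags):

#include <float.h>
#include <stdio.h>

int main(void)
{
    /* 0: evaluate in each type's own format; 1: evaluate float and double as
     * double; 2: evaluate everything as long double; -1: indeterminable. */
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}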

Gcc extension or macro to check the bits used for some fundamental types at compile time

At compile time, using some static_asserts, I would like to check the size in bits of some simple type like unsigned int or char; the important thing is that the check is guaranteed to happen at compile time, given my usage.
I haven't found anything about this in the gcc manual, nor do I have any knowledge of a similar feature offered by clang. Does anyone know how to check the number of bits used by a type?
No sizeof please, my focus is on the bits and compile time.
No sizeof please, my focus is on the bits and compile time.
Keep an open mind ;-P
#include <climits>

static_assert(sizeof(X) * CHAR_BIT == 32, "type X must be 32 bits in size");
1. How to find the number of bits in a type without using the CHAR_BIT macro
If the type is a numeric type (like int and char), you can get the number of significant bits using std::numeric_limits<T>::digits, assuming that T is a binary type (that is, that std::numeric_limits<T>::radix == 2). Those are constexpr so they can be used in static_assert.
It is possible that the implementation is not capable of using all of the stored bits in some numeric type (other than char), in which case the number of significant digits may not relate to the physical size in bits. Also, the sign bit doesn't count, so you need to add std::numeric_limits<T>::is_signed to get the number of non-padding bits.
Since char types are not allowed to have padding and char, signed char and unsigned char are required to be exactly the same size, std::numeric_limits<unsigned char>::digits must be the number of bits in a char, otherwise known as the required macro CHAR_BIT. So you could use the two expressions interchangeably, and consequently the bit-size (physical, not meaningful) of any type T will be sizeof(T)*std::numeric_limits<unsigned char>::digits.
I don't believe that the compiler itself needs to know what the bitsize of char is (although most compilers probably do). It does need to know what sizeof(T) is for every primitive type. There is no standard-mandated way of figuring out what the value of std::numeric_limits<unsigned char>::digits is without including some header file.
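Put together, a sketch of such compile-time checks (the concrete widths asserted here, 32 and 64 bits, are just examples for a typical platform):

#include <limits>

// All of these are constexpr, so they are usable in static_assert.
static_assert(std::numeric_limits<unsigned>::radix == 2,
              "a binary representation is assumed");
static_assert(std::numeric_limits<unsigned>::digits == 32,
              "unsigned int must have 32 value bits");
static_assert(std::numeric_limits<int>::digits + std::numeric_limits<int>::is_signed == 32,
              "int must have 32 non-padding bits");
// Physical size in bits of an arbitrary type T:
static_assert(sizeof(long long) * std::numeric_limits<unsigned char>::digits == 64,
              "long long must occupy 64 bits");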
2. Why you shouldn't worry about it.
In a freestanding environment, <limits> (the header that declares std::numeric_limits) is not required, but <climits> still is, so you can count on CHAR_BIT even in a freestanding environment, while you can only count on std::numeric_limits<unsigned char>::digits in a hosted environment.
In other words, the compiler is obliged to have some way of providing the results of #include <climits>, because that header is required by the standard even in freestanding environments (that is, environments without a standard library or even operating system). That's the "built-in" you are looking for; even if you don't provide <climits> in your standard library implementation, and even if you don't have a standard library handy, the compiler must still arrange for the macro CHAR_BIT to be correctly defined following the occurrence of #include <climits>. How it does that is up to the compiler; <climits> does not have to be an actual file.
Notes
None of the above will work with C, but then neither will static_assert, so I am assuming that tagging this question as C was an oversight. As #mafso points out in a comment, C11 does have a static_assert declaration, but it only works with C11-style constant expressions, not C++-style constant expressions. C++ constant expressions can use things like constexpr functions, which might be built-in. C constant expressions, on the other hand, can only involve integer literals. They are still useful (for non-purists) because macro expansion happens first, and the macro can expand to an integer literal (or even an expression involving several integer literals).
According to this document, the GNU compiler will define these macros for you:
__CHAR_BIT__ // bits
__SIZEOF_INT__ // bytes
__SIZEOF_LONG__
__SIZEOF_LONG_LONG__
etc...
You can define your own Bit macros from the Byte macros by just multiplying by 8.
Edit: Since you apparently need to know the "word size" and consider pointers to be the same size as a "word", use this:
__SIZEOF_POINTER__
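A sketch of how those predefined macros can be used; because they are ordinary preprocessor macros, the check works in both C and C++ and needs no headers (the widths checked here are just example requirements):

/* Fail the build if int is not 32 bits or pointers are not 64 bits wide. */
#if !defined(__CHAR_BIT__) || !defined(__SIZEOF_INT__)
#error "compiler does not provide the GCC-style size macros"
#elif (__SIZEOF_INT__ * __CHAR_BIT__) != 32
#error "int must be 32 bits"
#elif (__SIZEOF_POINTER__ * __CHAR_BIT__) != 64
#error "pointers must be 64 bits"
#endif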

Compile-time vs runtime constants

I'm currently developing my own math lib to improve my C++ skills. I stumbled over Boost's constants header file and I'm asking myself: what is the point of using compile-time constants over constants computed at run time?
const float root_two = 1.414213562373095048801688724209698078e+00;
const float root_two = std::sqrt( 2.0f );
Isn't an error introduced when using the fixed compile-time constant, compared to computing the value at run time with a function?
Wouldn't that error then be avoided if you use runtime constants?
As HansPassant said, it may save you a micro-Watt. However, note that the compiler will sometimes optimize that away by evaluating the expression during compilation and substituting in the literal value. See this answer to my earlier question about this.
Isn't there an error introduced when using the fixed compile-time constant?
If you are using arbitrary-precision data types, perhaps. But it is more efficient to use plain data types like double and these are limited to about 16 decimal digits of precision anyways.
Based on the previous point, your second initialization would not be more precise than your first one. In fact, if you precomputed the value of the square root with an arbitrary-precision calculator, the literal may even be more precise.
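A sketch that makes the comparison concrete (assuming an IEEE 754 double and a correctly-rounded std::sqrt, the two should compare equal; neither assumption is guaranteed by the C++ standard):

#include <cmath>
#include <cstdio>

int main()
{
    // Literal taken from a higher-precision value, rounded by the compiler.
    const double root_two_literal = 1.414213562373095048801688724209698078;
    // Value computed at run time.
    const double root_two_runtime = std::sqrt(2.0);
    std::printf("equal: %d\n", root_two_literal == root_two_runtime);
    return 0;
}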
A library such as Boost must work in tons of environments. The application that uses the library could have put the FPU into flush-to-zero mode, giving you 0.0 for denormalized (tiny) results.
Or the application could have been compiled with the -ffast-math flag, giving inaccurate results.
Furthermore, a runtime computation of (a + b + c) depends on how the compiler-generated code stores intermediate results. It might choose to pop (a + b) from the FPU as a 64-bit double, or it could leave it on the FPU stack as 80 bits. It depends on many things, including algebraic rewrites based on associativity.
All in all, if you mix different processors, operating systems, compilers and the different applications the library is used inside, you will get a different result for each permutation of the above.
In some (rare) situations this is not wanted; you may need an exact constant value.

How to portably check extremal values for SUSv3 data types?

By SUSv3, ssize_t is required to be a signed integer type. If I want to check whether a value I calculate is larger than the maximal value allowed for such a data type, I could compare it to INT_MAX, which isn't nice.
Is there a more portable way to do this comparison - a macro/function f that works as in
f(<typedef'ed datatype>) = {maximum value allowed for <TDDT> on this system}
or a short sequence of such operations to the same effect?
System:
Ubuntu 12.04.
glibc 2.15
Kernel 3.2.0
P.S.: When googling this, I first thought that the gcc extension 'typeof' sounded promising; but it seemed to not help here (or does it?). This is to say I'm fine with anything that might be a gcc extension/attribute/etc.
For an unsigned arithmetic type, (type)-1 is the maximum value. Since you don't know what the relative size of types is, cast to uintmax_t:
#define UNSIGNED_TYPE_MAX(t) ((uintmax_t)(t)-1)
if ((uintmax_t)x > UNSIGNED_TYPE_MAX(size_t)) puts("too large");
There is no such shortcut for signed types. In fact, I don't think there's any way of determining the largest value of a signed type in strictly portable C89 or C99 without using the corresponding constant, such as SSIZE_MAX for ssize_t. C99 specifies constants for each of the arithmetic types defined in stdint.h. For types defined in POSIX but not in standard C, there are many values in limits.h; note that they are the limits on what can be valid values for what the type is intended for, rather than the limits of what can fit in the type. For example, if size_t is a 32-bit type, then SIZE_MAX is guaranteed to be 2^32-1, whereas SSIZE_MAX could be less than 2^31-1 if the implementation doesn't support any byte count larger than that.
With the added assumption that integers are represented in binary and there are no padding bits, which is safe if you're limiting yourself to POSIX (where CHAR_BIT is always 8), you can deduce the maximum value by computing the size of the type: there is one sign bit in a signed type, and everything else is a value bit.
#define SIGNED_TYPE_MAX(t) (((uintmax_t)1 << (sizeof(t) * CHAR_BIT - 1)) - 1)
Note that things like “double until it stops growing” or “shove in the bit pattern 0111…111” are dodgy. The C standard says that the behavior is undefined for signed types, and GCC takes advantage of this to perform optimizations on operations on signed types that can result in the wrong value if an overflow happens. For example, it might perform computations in a larger-size register, so that the overflow turns out not to happen.
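Putting the two macros together, a usage sketch (ssize_t comes from <sys/types.h> on POSIX systems; the value being range-checked is just an example):

#include <limits.h>      /* CHAR_BIT */
#include <stdint.h>      /* uintmax_t */
#include <stdio.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

#define UNSIGNED_TYPE_MAX(t) ((uintmax_t)(t)-1)
#define SIGNED_TYPE_MAX(t) (((uintmax_t)1 << (sizeof(t) * CHAR_BIT - 1)) - 1)

int main(void)
{
    uintmax_t x = (uintmax_t)1 << 40;   /* example value to range-check */

    if (x > UNSIGNED_TYPE_MAX(size_t))
        puts("too large for size_t");
    if (x > SIGNED_TYPE_MAX(ssize_t))
        puts("too large for ssize_t");
    return 0;
}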