Floating point expressions can sometimes be contracted on the processing hardware, e.g. using fused multiply-and-add as a single hardware operation.
Apparently, this isn't merely an implementation detail but is governed by the programming language specification. Specifically, the C89 standard does not allow such contractions, while C99 allows them, controlled by a pragma. See details in this SO answer.
But what about C++? Are floating-point contractions not allowed? Allowed in some standards? Allowed universally?
Summary
Contractions are permitted, but a facility is provided for the user to disable them. Unclear language in the standard clouds the issue of whether disabling them will provide desired results.
I investigated this in the official C++ 2003 standard and the 2017 n4659 draft. C++ citations are from 2003 unless otherwise indicated.
Extra Precision and Range
The text “contract” does not appear in either document. However, clause 5 Expressions [expr] paragraph 10 (same text in 2017’s 8 [expr] 13) says:
The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.
I would prefer that this statement explicitly said whether this extra precision and range can be used freely (the implementation may use it in some expressions, including subexpressions, while not using it in others), must be used uniformly (if the implementation uses extra precision, it must use it in every floating-point expression), or is subject to some other rule (such as using one precision for float and another for double).
If we interpret it permissively, it means that, in a*b+c, a*b could be evaluated with infinite precision and range, and then the addition could be evaluated with whatever precision and range is normal for the implementation. This is mathematically equivalent to contraction, as it has the same result as evaluating a*b+c with a fused multiply-add instruction.
Hence, with this interpretation, implementations may contract expressions.
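To make this concrete, here is a minimal sketch (mine, not from the standard) that can reveal contraction on common compilers; whether it prints zero depends on the compiler and flags such as GCC's -ffp-contract=:

#include <cstdio>

int main() {
    // volatile keeps the compiler from folding the arithmetic away
    volatile double a = 1.0 + 1.0 / (1 << 27);   // 1 + 2^-27, exactly representable
    volatile double b = 1.0 + 1.0 / (1 << 27);
    double p = a * b;       // product rounded to double
    double r = a * b - p;   // candidate for contraction into fma(a, b, -p)
    // Without contraction, a*b rounds to p again, so r == 0.
    // With contraction, the fused multiply is exact, so r is the
    // rounding error of p (0x1p-54 here).
    printf("%a\n", r);
}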
Contractions Inherited From C
17.4.1.2 [lib.headers] 3 (similar text in 2017’s 20.5.1.2 [headers] 3) says:
The facilities of the Standard C Library are provided in 18 additional headers, as shown in Table 12…
Table 12 includes <cmath>, and paragraph 4 indicates this corresponds to math.h. Technically, the C++ 2003 standard refers to the C 1990 standard, but I do not have it in electronic form and do not know where my paper copy is, so I will use the C 2011 standard (but unofficial draft N1570), which the C++ 2017 draft refers to.
The C standard defines, in <math.h>, a pragma FP_CONTRACT:
#pragma STDC FP_CONTRACT on-off-switch
where on-off-switch is ON to allow contraction of expressions or OFF to disallow it. It also says the default state for the pragma is implementation-defined.
The C++ standard does not define “facility” or “facilities.” A dictionary definition of “facility” is “a place, amenity, or piece of equipment provided for a particular purpose” (New Oxford American Dictionary, Apple Dictionary application version 2.2.2 (203)). An amenity is “a desirable or useful feature or facility of a building or place.” A pragma is a useful feature provided for a particular purpose, so it seems to be a facility, so it is included in <cmath>.
Hence, using this pragma should permit or disallow contractions.
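If the pragma is honoured, its use in C++ would look like the following sketch (note that several compilers do not implement it; see the GCC documentation quoted later on this page):

#include <cmath>   // per the reasoning above, the facility should come with this header

#pragma STDC FP_CONTRACT OFF

double axpy(double a, double x, double y) {
    // Under OFF, a conforming implementation must not fuse this
    // expression into a single fused multiply-add instruction.
    return a * x + y;
}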
Conclusions
Contractions are permitted when FP_CONTRACT is on, and it may be on by default.
The text of 8 [expr] 13 can be interpreted to effectively allow contractions even if FP_CONTRACT is off but is insufficiently clear for definitive interpretation.
Yes, it is allowed.
For example, in the Visual C++ compiler, fp_contract is on by default. This tells the compiler to use floating-point contraction instructions where possible. Set fp_contract to off to preserve individual floating-point instructions.
// pragma_directive_fp_contract.cpp
// on x86 and x64 compile with: /O2 /fp:fast /arch:AVX2
// other platforms compile with: /O2
#include <stdio.h>

// remove the following line to enable FP contractions
#pragma fp_contract (off)

int main() {
    double z, b, t;

    for (int i = 0; i < 10; i++) {
        b = i * 5.5;
        t = i * 56.025;
        z = t * i + b;
        printf("out = %.15e\n", z);
    }
}
Detailed information can be found in the Microsoft documentation under Specify Floating-Point Behavior.
Using the GNU Compiler Collection (GCC):
The default state for the FP_CONTRACT pragma (C99 and C11 7.12.2).
This pragma is not implemented. Expressions are currently only contracted if -ffp-contract=fast, -funsafe-math-optimizations or -ffast-math are used.
Related
#include <cmath>
#include <cstdio>

int main() {
    float a = std::asin(-1.f);
    printf("%.10f\n", a);
    return 0;
}
I ran the code above on multiple platforms using clang, g++ and Visual Studio. They all gave me the same answer: -1.5707963705
If I run it on macOS using clang it gives me -1.5707962513.
Clang on macOS is supposed to use libc++, but does macOS have its own implementation of libc++?
If I run clang --version I get:
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin18.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
asin is implemented in libm, which is part of the standard C library, not the C++ standard library. (Technically, the C++ standard library includes C library functions, but in practice both the GNU and LLVM C++ library implementations rely on the underlying platform math library.) The three platforms -- Linux, OS X and Windows -- each have their own implementation of the math library, so if the library function were being used, it could certainly be a different library function and the result might differ in the last bit position (which is what your test shows).
However, it is quite possible that the library function is never being called in all cases. This will depend on the compilers and the optimization options you pass them (and maybe some other options as well). Since the asin function is part of the standard library and thus has known behaviour, it is entirely legitimate for a compiler to compute the value of std::asin(-1.0F) at compile time, as it would for any other constant expression (like 1.0 + 1.0, which almost any compiler will constant-fold to 2.0 at compile time).
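One way to see both behaviours in a single program (my own sketch, not from the question): a volatile argument defeats constant folding, while a literal argument invites it:

#include <cmath>
#include <cstdio>

int main() {
    volatile float x = -1.0f;
    float runtime = std::asin(x);      // forced through the library (or instruction) at run time
    float folded  = std::asin(-1.0f);  // candidate for compile-time constant folding
    printf("runtime: %.10f\n", runtime);
    printf("folded:  %.10f\n", folded);
}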
Since you don't mention what optimization settings you are using, it's hard to tell exactly what's going on, but I did a few tests with http://gcc.godbolt.org to get a basic idea:
GCC constant folds the call to asin without any optimisation flags, but it does not precompute the argument promotion in printf (which converts a to a double in order to pass it to printf) unless you specify at least -O1. (Tested with GCC 8.3).
Clang (7.0) calls the standard library function unless you specify at least -O2. However, if you explicitly call asinf, it constant folds at -O1. Go figure.
MSVC (v19.16) does not constant fold. It either calls a std::asin wrapper or directly calls asinf, depending on optimisation settings. I don't really understand what the wrapper does, and I didn't spend much time investigating.
Both GCC and Clang constant fold the expression to precisely the same binary value (0xBFF921FB60000000 as a double), which is the binary value -1.10010010000111111011011 (trailing zeros truncated).
Note that there is also a difference between the printf implementations on the three platforms (printf is also part of the platform C library). In theory, you could see a different decimal output from the same binary value, but since the argument to printf is promoted to double before printf is called and the promotion is precisely defined and not value-altering, it is extremely unlikely that this has any impact in this particular case.
As a side note, if you really care about the seventh decimal point, use double instead of float. Indeed, you should only use float in very specific applications in which precision is unimportant; the normal floating point type is double.
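For comparison, here is a double version of the probe (my own sketch); it prints a value correct to roughly 16 significant digits:

#include <cmath>
#include <cstdio>

int main() {
    double a = std::asin(-1.0);
    printf("%.16f\n", a);   // -1.5707963267948966, the double closest to -pi/2
}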
The mathematically exact value of asin(-1) would be -pi/2, which of course is irrational and not possible to represent exactly as a float. The binary digits of pi/2 start with
1.1001001000011111101101010100010001000010110100011000010001101..._2
Your first three libraries round this (correctly) to
1.10010010000111111011011_2 = 1.57079637050628662109375_10
On macOS it appears to get truncated to:
1.10010010000111111011010_2 = 1.57079625129699707031250_10
This is an error of less than 1 ULP (unit in the last place). It could be caused by a different implementation, by the FPU being set to a different rounding mode, or perhaps because in some cases the compiler computes the value at compile time.
I don't think the C++ standard really gives any guarantees on the accuracy of transcendental functions.
If you have code which really depends on having (platform/hardware independent) accuracy, I suggest using a library such as MPFR. Otherwise, just live with the difference. Or have a look at the source of the asin function that is called in each case.
I was given some legacy code to compile. Unfortunately I only have access to an f95 compiler and have zero knowledge of Fortran. Some modules compiled, but for others I was getting this error:
Error: Old-style type declaration REAL*16 not supported at (1)
My plan is to at least try to fix this error and see what else happens. So here are my 2 questions.
How likely will it be that my code written for Fortran 75 is compatible in the Fortran 95 compiler? (In /usr/bin my compiler is f95 - so I assume it is Fortran 95)
How do I fix this error that I am getting? I tried googling it but cannot seem to find a clear, crisp answer.
The error you are seeing is due to an old declaration style that was frequent before Fortran 90, but never became standard. Thus, the compiler does not accept the (formally incorrect) code.
In the olden days before Fortran 90, you had only two types of real numbers: REAL and DOUBLE PRECISION. These types were platform dependent, but most compilers nowadays map them to the IEEE754 formats binary32 and binary64.
However, some machines offered different formats, often with extra precision. In order to make them accessible to Fortran code, the type REAL*n was invented, with n an integer literal from a set of compiler-dependent values. This syntax was never standard, so you cannot be sure of what it will mean to a given compiler without reading its documentation.
In the real world, most compilers that have not been asked to be strictly standards-compliant (with some option like -std=f95) will recognize at least REAL*4 and REAL*8, mapping them to the binary32/64 formats mentioned before, but everything else is completely platform dependent. Your compiler may have a REAL*10 type for the 80-bit arithmetic used by the x86 387 FPU, or a REAL*16 type for some 128-bit floating point math. However, it is important to stress that since the syntax is not standard, the meaning of that type could change from compiler to compiler.
Finally, in Fortran 90 a way to refer to different kinds of real and integer types was made standard. The new syntax is REAL(n) or REAL(kind=n) and is supported by all standard-compliant compilers. However, the values of n are still compiler-dependent, but the standard provides three ways to obtain specific, repeatable results:
The SELECTED_REAL_KIND function, which allows you to query the system for the value of n to specify if you want a real type with certain precision and range requirements. Usually, what you do is ask for it once and store the result in an INTEGER, PARAMETER variable that you use when declaring the real variables in question. For example, you would declare a type with at least 15 digits of precision (decimal) and exponent range of at least 100 like this:
INTEGER, PARAMETER :: rk = SELECTED_REAL_KIND(15, 100)
REAL(rk) :: v
In Fortran 2003 and onwards, the ISO_C_BINDING module contains a series of constants that are meant to give you types guaranteed to be equivalent to the C types of the same compiler family (e.g. gcc for gfortran, icc for ifort, etc.). They are called C_FLOAT, C_DOUBLE and C_LONG_DOUBLE. Thus, you could declare a variable equivalent to a C double as REAL(C_DOUBLE) :: d.
In Fortran 2008 and onwards, the ISO_FORTRAN_ENV module contains a different series of constants REAL32, REAL64 and REAL128 that will give you a floating point type of the appropriate width - if some platform does not support one of those types, the constant will be a negative number. Thus, you can declare a 128-bit float as REAL(real128) :: q.
Javier has given an excellent answer to your immediate problem. However I'd just briefly like to address "How likely will it be that my code written for Fortran 77 is compatible in the Fortran 95 compiler?", with the obvious typo corrected.
Fortran, if the programmer adheres to the standard, is amazingly backward compatible. With very, very few exceptions, standard-conforming Fortran 77 is Fortran 2008 conforming. The problem is that it appears in your case the original programmer has not adhered to the international standard: real*8 and similar is not, and has never been, part of any such standard, and the problems you are seeing are precisely why such forms should never be, and should never have been, used. That said, if the original programmer only made this one mistake, it may well be that the rest of the code will be OK; however, without the details it is impossible to tell.
TL;DR: International standards are important, stick to them!
Since we are guessing instead of requesting proper code and full details, I will venture to say that the other two answers are not correct.
The error message
Error: Old-style type declaration REAL*16 not supported at (1)
DOES NOT mean that the REAL*n syntax is not supported.
The error message is misleading. It actually means that 16-byte reals are not supported. Had the OP requested the same real by the kind notation (in any of the many ways that return gfortran's kind 16), the error message would be:
Error: Kind 16 not supported for type REAL at (1)
That can happen in some gfortran versions, especially on MS Windows.
This explanation can be found with just a very quick google search for the error message: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56850#c1
It does not complain that the old syntax is the problem; it just mentions it, because mentioning kinds might be even more confusing (especially for COMPLEX*32, which is kind 16).
The main message is: we really should be closing these kinds of questions and waiting for proper details instead of guessing and upvoting a question where the OP can't even copy the full message the compiler spat out.
If you are using GNU gfortran, try "real(kind=10)", which will give you 10 bytes (80 bits) of extended precision. I'm converting some older F77 code to run floating point math tests - it has a quad precision (the real*16 you mention) defined, but not used, as the other answers correctly point out (extended precision formats tend to be machine/compiler specific). The gfortran 5.2 I am using does not support real(kind=16), surprisingly. But gfortran (available for 64-bit macOS and 64-bit Linux machines) does have the "real(kind=10)", which will give you precision beyond the typical real*8 or "double precision" as it was called in some Fortran compilers.
Be careful if your old Fortran code is calling C programs and/or making assumptions about how precision is handled and floating point numbers are represented. You may have to get deep into exactly what is happening, and check the code carefully, to ensure things are working as expected, especially if Fortran and C routines are calling each other. Here is the URL for the GNU gfortran info on quad precision: https://people.sc.fsu.edu/~jburkardt/f77_src/gfortran_quadmath/gfortran_quadmath.html
Hope this helps.
I'm currently looking at code which does multi-precision floating-point arithmetic. To work correctly, that code requires values to be reduced to their final precision at well-defined points. So even if an intermediate result was computed to an 80 bit extended precision floating point register, at some point it has to be rounded to 64 bit double for subsequent operations.
The code uses a macro INEXACT to describe this requirement, but doesn't have a perfect definition. The gcc manual mentions -fexcess-precision=standard as a way to force well-defined precision for cast and assignment operations. However, it also writes:
‘-fexcess-precision=standard’ is not implemented for languages other than C
Now I'm thinking about porting those ideas to C++ (comments welcome if anyone knows an existing implementation). So it seems I can't use that switch for C++. But what is the g++ default behavior in absence of any switch? Are there more C++-like ways to control the handling of excess precision?
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know. But I'm still curious.
Are there more C++-like ways to control the handling of excess precision?
The C99 standard defines FLT_EVAL_METHOD, a compiler-set macro that defines how excess precision should happen in a C program (many C compilers still behave in a way that does not exactly conform to the most reasonable interpretation of the value of FLT_EVAL_METHOD that they define: older GCC versions generating 387 code, Clang when generating 387 code, …). Subtle points in relation with the effects of FLT_EVAL_METHOD were clarified in the C11 standard.
Since the 2011 standard, C++ defers to C99 for the definition of FLT_EVAL_METHOD (header cfloat).
So GCC should simply allow -fexcess-precision=standard for C++, and hopefully it eventually will. The same semantics as that of C are already in the C++ standard, they only need to be implemented in C++ compilers.
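In the meantime, you can at least inspect what the implementation claims through FLT_EVAL_METHOD (a trivial probe; the macro comes from <cfloat> in C++11 and later):

#include <cfloat>
#include <cstdio>

int main() {
    // 0: evaluate in the type; 1: evaluate float as double;
    // 2: evaluate everything as long double; -1: indeterminable.
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
}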
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know.
That is the usual solution.
Be aware that C99 also defines FP_CONTRACT in math.h that you may want to look at: it relates to the same problem of some expressions being computed at a higher precision, striking from a completely different side (the modern fused-multiply-add instruction instead of the old 387 instruction set). This is a pragma to decide whether the compiler is allowed to replace source-level additions and multiplications with FMA instructions (this has the effect that the multiplication is virtually computed at infinite precision, because this is how this instruction works, instead of being rounded to the precision of the type as it would be with separate multiplication and addition instructions). This pragma has apparently not been incorporated in the C++ standard (as far as I can see).
The default value for this option is implementation-defined and some people argue for the default to be to allow FMA instructions to be generated (for C compilers that otherwise define FLT_EVAL_METHOD as 0).
You should, in C, future-proof your code with:
#include <math.h>
#pragma STDC FP_CONTRACT OFF
And the equivalent incantation in C++ if your compiler documents one.
what is the g++ default behavior in absence of any switch?
I am afraid that the answer to this question is that GCC's behavior, say, when generating 387 code, is nonsensical. See the description of the situation that motivated Joseph Myers to fix the situation for C. If g++ does not implement -fexcess-precision=standard, it probably means that 80-bit computations are randomly rounded to the precision of the type when the compiler happened to have to spill some floating-point registers to memory, leading the program below to print "foo" in some circumstances outside the programmer's control:
if (x == 0.0) return;
... // code that does not modify x
if (x == 0.0) printf("foo\n");
… because the code in the ellipsis caused x, which was held in an 80-bit floating-point register, to be spilled to a 64-bit slot on the stack.
But what is the g++ default behavior in absence of any switch?
I found one answer myself via an experiment, using the following code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    double a = atof("1.2345678");
    double b = a * a;
    printf("%.20e\n", b - 1.52415765279683990130);
    return 0;
}
If b is rounded (-fexcess-precision=standard), then the result is zero. Otherwise (-fexcess-precision=fast) it is something like 8e-17. Compiling with -mfpmath=387 -O3, I could reproduce both cases for gcc-4.8.2. For g++-4.8.2 I get an error for -fexcess-precision=standard if I try that, and without a flag I get the same behavior as -fexcess-precision=fast gives for C. Adding -std=c++11 does not help. So now the suspicion already voiced by Pascal is official: g++ does not necessarily round everywhere it should.
There are mathematical operations that yield real numbers from +/- infinity. For example exp(-infinity) = 0. Is there a standard for mathematical functions in the standard C library that accept IEEE-754 infinities (without throwing, or returning NaN)? I am on a Linux system and would be interested in such a list for glibc. I could not find such a list in their online manual. For instance, their documentation on exp does not mention how it handles the -infinity case. Any help will be much appreciated.
The See Also section of POSIX' math.h definition links to the POSIX definitions of acceptable domains.
E.g. fabs():
If x is ±0, +0 shall be returned.
If x is ±Inf, +Inf shall be returned.
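A trivial check of those two guarantees (my own snippet, using the C++ spellings):

#include <cmath>
#include <cstdio>

int main() {
    printf("%g\n", std::fabs(-0.0));       // +0: fabs(±0) returns +0
    printf("%g\n", std::fabs(-INFINITY));  // +Inf: fabs(±Inf) returns +Inf
}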
I converted the mentioned See Also section to Stack Overflow Markdown:
acos(),
acosh(),
asin(),
atan(),
atan2(),
cbrt(),
ceil(),
cos(),
cosh(),
erf(),
exp(),
expm1(),
fabs(),
floor(),
fmod(),
frexp(),
hypot(),
ilogb(),
isnan(),
j0(),
ldexp(),
lgamma(),
log(),
log10(),
log1p(),
logb(),
modf(),
nextafter(),
pow(),
remainder(),
rint(),
scalb(),
sin(),
sinh(),
sqrt(),
tan(),
tanh(),
y0(),
I contributed search/replace/regex-fu. We now just need someone with cURL-fu.
In C99 it's in Annex F:
F.9.3.1 The exp functions
-- exp(±0) returns 1.
-- exp(-∞) returns +0.
-- exp(+∞) returns +∞.
Annex F is normative and:
An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex.
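These special cases are easy to confirm on an implementation with IEEE-754 support (a quick sketch):

#include <cmath>
#include <cstdio>

int main() {
    printf("exp(-inf) = %g\n", std::exp(-INFINITY));  // 0, per Annex F
    printf("exp(+inf) = %g\n", std::exp(INFINITY));   // inf
}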
Why is there a difference in the output produced when the code is compiled using the two compilers gcc and Turbo C?
#include <stdio.h>

int main()
{
    char *p = "I am a string";
    char *q = "I am a string";

    if (p == q) {
        printf("Optimized");
    }
    else {
        printf("Change your compiler");
    }

    return 0;
}
I get "Optimized" on gcc and "Change your compiler" on turbo c. Why?
Your question has been tagged both C and C++, so I'll answer for both languages.
[C]
From ISO C99 (Section 6.4.5/6)
It is unspecified whether these arrays are distinct provided their elements have the appropriate values.
That means it is unspecified whether p and q point to the same string literal or not. In the case of gcc they both point to "I am a string" (gcc optimizes your code), whereas in Turbo C they do not.
Unspecified Behavior:
Use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance
[C++]
From ISO C++-98 (Section 2.13.4/2)
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined.
In C++ your code invokes implementation-defined behaviour.
Implementation-defined Behavior:
Unspecified Behavior where each implementation documents how the choice is made
Also see this question.
Since your string literal is a constant expression, i.e. you should not modify it via a pointer, there is no real purpose in storing it in memory twice. Being a newer compiler, gcc merges the literals by default while Turbo C does not. It is a sign of gcc's support for the newer language standard, which has the notion of const data.
Please forget the answers along the lines of "It's because Turbo C is SO TOTALLY OLD and they couldn't do it THEN, because it had to be FAST, but GCC is totally NEW and RAD and that's why it does that!".
Both compilers support merging string constants as an option. The GCC option (-fmerge-constants) is turned on at optimization levels, while the Turbo C option (-d) is turned off by default. If you are using the TCC IDE, go to Options|Compiler...|Code Generation... and check "Duplicate strings merged".
From the gcc manual page:
-fmerge-constants
Attempt to merge identical constants (string constants and floating point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
Enabled at levels -O, -O2, -O3, -Os.
Hence the output.
Turbo C was optimized for fast compilation, so it doesn't have any features that would slow it down. Recognizing duplicate strings would be a slow-down, even if only minor.
The compiler may keep two copies of identical literals if it thinks proper. Finding out if that is the case is presumably the point of this program.
In the good old days, assemblers kept all literals in a literal pool, and patching the literal pool was a recognised (if not approved) technique of modifying 'constants' throughout the program.
If by some chance the compiler allowed *p = 'H'; in this case, important differences in behaviour would result.
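To illustrate (my own sketch; the assignment is commented out because it is undefined behaviour):

#include <cstdio>

int main() {
    char *p = "I am a string";
    char *q = "I am a string";
    // *p = 'H';  // undefined behaviour: the literal may be in read-only
    //            // storage, and if the literals were merged, q's text
    //            // would change too
    printf("%s\n%s\n", p, q);
}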
Historical footnote: Since addresses were smaller than floating-point numeric constants, FORTRAN used to handle floating-point constants much like C handles strings. Since memory was precious, identical constants would be allocated the same space. Also, parameter passing was always done by reference. This meant that if one passed a numeric constant to a procedure that modified its argument, other occurrences of that "constant" would change value.
Hence the old saying: "Variables won't; constants aren't."
Incidentally, has anyone noticed the bug in the Turbo C 2.0 printf which would fail when using a format like "%1.1f" to print numbers like 99.99 (outputs 00.0)? Fixed in 2.01, it reminds me of the Windows 3.1 calculator bug.