asin produces different answers on different platforms using Clang - c++

#include <cmath>
#include <cstdio>
int main() {
    float a = std::asin(-1.f);
    printf("%.10f\n", a);
    return 0;
}
I ran the code above on multiple platforms using clang, g++ and Visual Studio. They all gave me the same answer: -1.5707963705
If I run it on macOS using clang it gives me -1.5707962513.
Clang on macOS is supposed to use libc++, but does macOS have its own implementation of libc++?
If I run clang --version I get:
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin18.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

asin is implemented in libm, which is part of the standard C library, not the C++ standard library. (Technically, the C++ standard library includes the C library functions, but in practice both the GNU and LLVM C++ library implementations rely on the underlying platform math library.) The three platforms -- Linux, OS X and Windows -- each have their own implementation of the math library, so if the library function is being used, it could certainly be a different library function on each platform, and the results might differ in the last bit position (which is what your test shows).
However, it is quite possible that the library function is not actually being called in every case. This will depend on the compiler and the optimization options you pass it (and maybe some other options as well). Since the asin function is part of the standard library and thus has known behaviour, it is entirely legitimate for a compiler to compute the value of std::asin(-1.0F) at compile time, as it would for any other constant expression (like 1.0 + 1.0, which almost any compiler will constant-fold to 2.0 at compile time).
Since you don't mention what optimization settings you are using, it's hard to tell exactly what's going on, but I did a few tests with http://gcc.godbolt.org to get a basic idea:
GCC constant folds the call to asin without any optimisation flags, but it does not precompute the argument promotion in printf (which converts a to a double in order to pass it to printf) unless you specify at least -O1. (Tested with GCC 8.3).
Clang (7.0) calls the standard library function unless you specify at least -O2. However, if you explicitly call asinf, it constant folds at -O1. Go figure.
MSVC (v19.16) does not constant fold. It either calls a std::asin wrapper or directly calls asinf, depending on optimisation settings. I don't really understand what the wrapper does, and I didn't spend much time investigating.
Both GCC and Clang constant fold the expression to precisely the same binary value (0xBFF921FB60000000 as a double), which is the binary value -1.10010010000111111011011 (trailing zeros truncated).
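If you want to check what your own toolchain does, one trick is to compare a call the compiler can fold against one it cannot. A minimal sketch (the volatile qualifier is just one hypothetical way to hide the argument from the optimizer; any technique that does so would work):

#include <cmath>
#include <cstdio>

int main() {
    float folded = std::asin(-1.0f);  // a candidate for compile-time constant folding
    volatile float arg = -1.0f;       // volatile hides the value from the optimizer
    float runtime = std::asin(arg);   // should force a call into the platform libm
    printf("folded:  %.10f\nruntime: %.10f\n", folded, runtime);
    return 0;
}

If the two lines differ, the compile-time and library results disagree in the last bit, as described above.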
Note that there is also a difference between the printf implementations on the three platforms (printf is also part of the platform C library). In theory, you could see a different decimal output from the same binary value, but since the argument to printf is promoted to double before printf is called and the promotion is precisely defined and not value-altering, it is extremely unlikely that this has any impact in this particular case.
As a side note, if you really care about the seventh decimal place, use double instead of float. Indeed, you should only use float in very specific applications in which precision is unimportant; the normal floating point type is double.

The mathematically exact value of asin(-1) would be -pi/2, which of course is irrational and not possible to represent exactly as a float. The binary digits of pi/2 start with
1.1001001000011111101101010100010001000010110100011000010001101..._2
Your first three libraries round this (correctly) to
1.10010010000111111011011_2 = 1.57079637050628662109375_10
On macOS it appears to be truncated to:
1.10010010000111111011010_2 = 1.57079625129699707031250_10
This is an error of less than 1 ULP (unit in the last place). It could be caused by a different library implementation, by the FPU being set to a different rounding mode, or by the compiler computing the value at compile time in some cases.
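A quick way to tell which of the two values you actually got is to dump the bit pattern of the result. A small sketch, assuming an IEEE-754 binary32 float:

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float a = std::asin(-1.0f);
    uint32_t bits;
    std::memcpy(&bits, &a, sizeof bits);  // type-pun safely via memcpy
    // 0xBFC90FDB is -pi/2 correctly rounded; 0xBFC90FDA is the truncated value
    printf("%.10f  bits = 0x%08X\n", a, (unsigned)bits);
    return 0;
}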
I don't think the C++ standard really gives any guarantees on the accuracy of transcendental functions.
If you have code which really depends on (platform/hardware independent) accuracy, I suggest using a library such as MPFR, as sketched below. Otherwise, just live with the difference, or have a look at the source of the asin implementation that is called in each case.
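For example, a minimal MPFR sketch (using the C API from C++; the 113-bit working precision is an arbitrary choice for illustration):

#include <mpfr.h>
#include <cstdio>

int main() {
    mpfr_t x, r;
    mpfr_init2(x, 113);            // quad-like working precision
    mpfr_init2(r, 113);
    mpfr_set_si(x, -1, MPFR_RNDN);
    mpfr_asin(r, x, MPFR_RNDN);    // correctly rounded asin(-1) = -pi/2
    mpfr_printf("%.20Rf\n", r);    // same decimal digits on every platform
    mpfr_clear(x);
    mpfr_clear(r);
    return 0;
}

Because MPFR results are correctly rounded at the requested precision, the output is identical on every platform, which is exactly the property the platform libm does not guarantee.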

Related

How to enable __fp16 type on gcc for x86_64

The __fp16 floating-point data type is a well-known extension to the C standard, used notably on ARM processors. I would like to use the IEEE version of it on my x86_64 processor. While I know x86_64 typically lacks hardware support for it, I would be fine with emulating it using "unsigned short" storage (they have the same alignment requirement and storage space) and (hardware) float arithmetic.
Is there a way to request that in gcc?
I assume the rounding might be slightly "incorrect", but that is ok to me.
If this were to work in C++ too that would be ideal.
I did not find a way to do so in gcc (as of gcc 8.2.0).
As for clang, in 6.0.0 the following options showed some success:
clang -cc1 -fnative-half-type -fallow-half-arguments-and-returns
The option -fnative-half-type enables the use of the __fp16 type (instead of promoting it to float). The option -fallow-half-arguments-and-returns allows passing __fp16 by value; as this ABI is non-standard, be careful not to mix different compilers.
That being said, it does not provide math functions using __fp16 types (it will promote them to/from float or double).
It was sufficient for my use case.
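A minimal sketch of what this buys you (the -Xclang spelling to pass the cc1 options through the normal driver is my assumption; adjust for your Clang version):

// clang++ -Xclang -fnative-half-type -Xclang -fallow-half-arguments-and-returns test.cpp
#include <cstdio>

int main() {
    __fp16 h = 1.0f;     // 16 bits of storage
    float f = h * 3.0f;  // h is converted to float by the usual arithmetic conversions
    printf("sizeof(__fp16) = %zu, f = %f\n", sizeof(__fp16), f);
    return 0;
}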
C++23 introduces std::float16_t
#include <stdfloat> // C++23

int main()
{
    std::float16_t f = 0.1F16;
}
_Float16 is the type you should be looking for now in recent versions of clang and gcc.
At least in the compilers I've worked with __fp16 was a limited type that you could only convert to/from binary32 (using hardware where supported) while _Float16 is more like a "real" arithmetic type, not that you should attempt too much in such limited precision.
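For instance (a sketch; _Float16 support on x86-64 depends on the compiler version and target, and arithmetic may be emulated via float under the hood):

#include <cstdio>

int main() {
    _Float16 a = 0.25;         // converted from double at initialization
    _Float16 b = 0.5;
    _Float16 c = a + b;        // a "real" arithmetic type, unlike the old __fp16
    printf("%f\n", (double)c); // promote by hand; printf has no _Float16 conversion
    return 0;
}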

Error: Old-style type declaration REAL*16 not supported

I was given some legacy code to compile. Unfortunately I only have access to an f95 compiler and have zero knowledge of Fortran. Some modules compiled, but with others I was getting this error:
Error: Old-style type declaration REAL*16 not supported at (1)
My plan is to at least try to fix this error and see what else happens. So here are my 2 questions.
How likely will it be that my code written for Fortran 75 is compatible in the Fortran 95 compiler? (In /usr/bin my compiler is f95 - so I assume it is Fortran 95)
How do I fix this error that I am getting? I tried googling it but cannot seem to find a clear, crisp answer.
The error you are seeing is due to an old declaration style that was frequent before Fortran 90, but never became standard. Thus, the compiler does not accept the (formally incorrect) code.
In the olden days before Fortran 90, you had only two types of real numbers: REAL and DOUBLE PRECISION. These types were platform dependent, but most compilers nowadays map them to the IEEE754 formats binary32 and binary64.
However, some machines offered different formats, often with extra precision. In order to make them accessible to Fortran code, the type REAL*n was invented, with n an integer literal from a set of compiler-dependent values. This syntax was never standard, so you cannot be sure of what it will mean to a given compiler without reading its documentation.
In the real world, most compilers that have not been asked to be strictly standards-compliant (with some option like -std=f95) will recognize at least REAL*4 and REAL*8, mapping them to the binary32/64 formats mentioned before, but everything else is completely platform dependent. Your compiler may have a REAL*10 type for the 80-bit arithmetic used by the x86 387 FPU, or a REAL*16 type for some 128-bit floating point math. However, it is important to stress that since the syntax is not standard, the meaning of that type could change from compiler to compiler.
Finally, in Fortran 90 a standard way to refer to different kinds of real and integer types was introduced. The new syntax is REAL(n) or REAL(kind=n) and is supported by all standard-compliant compilers. The values of n are still compiler-dependent, but the standard provides three ways to obtain specific, repeatable results:
The SELECTED_REAL_KIND function, which lets you query the system for the value of n to use if you want a real type with certain precision and range requirements. Usually, you ask for it once, store the result in an INTEGER, PARAMETER named constant, and use that constant when declaring the real variables in question. For example, you would declare a type with at least 15 decimal digits of precision and an exponent range of at least 100 like this:
INTEGER, PARAMETER :: rk = SELECTED_REAL_KIND(15, 100)
REAL(rk) :: v
In Fortran 2003 and onwards, the ISO_C_BINDING module contains a series of constants that are meant to give you types guaranteed to be equivalent to the C types of the same compiler family (e.g. gcc for gfortran, icc for ifort, etc.). They are called C_FLOAT, C_DOUBLE and C_LONG_DOUBLE. Thus, you could declare a variable equivalent to a C double as REAL(C_DOUBLE) :: d.
In Fortran 2008 and onwards, the ISO_FORTRAN_ENV module contains a different series of constants REAL32, REAL64 and REAL128 that will give you a floating point type of the appropriate width - if some platform does not support one of those types, the constant will be a negative number. Thus, you can declare a 128-bit float as REAL(real128) :: q.
Javier has given an excellent answer to your immediate problem. However I'd just briefly like to address "How likely will it be that my code written for Fortran 77 is compatible in the Fortran 95 compiler?", with the obvious typo corrected.
Fortran, if the programmer adheres to the standard, is amazingly backward compatible. With very, very few exceptions, standard-conforming Fortran 77 is also Fortran 2008 conforming. The problem is that, in your case, it appears the original programmer did not adhere to the international standard: real*8 and similar forms are not, and have never been, part of any such standard, and the problems you are seeing are precisely why such forms should never have been used. That said, if the original programmer only made this one mistake, it may well be that the rest of the code will be OK; without the details, however, it is impossible to tell.
TL;DR: International standards are important, stick to them!
Since we are guessing instead of requesting proper code and full details, I will venture to say that the other two answers are not correct.
The error message
Error: Old-style type declaration REAL*16 not supported at (1)
DOES NOT mean that the REAL*n syntax is not supported.
The error message is misleading. It actually means that 16-byte reals are not supported. Had the OP requested the same real by the kind notation (in any of the many ways which return gfortran's kind 16), the error message would be:
Error: Kind 16 not supported for type REAL at (1)
That can happen in some gfortran versions. Especially in MS Windows.
This explanation can be found with just a very quick google search for the error message: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56850#c1
It does not complain that the old syntax is the problem, it just mentions it, because phrasing the message in terms of kinds might be even more confusing (especially for COMPLEX*32, which is kind 16).
The main message is: We really should be closing these kinds of questions and wait for proper details instead of guessing and upvoting the question where the OP can't even copy the full message the compiler has spit.
If you are using GNU gfortran, try real(kind=10), which will give you 10-byte (80-bit) extended precision. I'm converting some older F77 code to run floating point math tests - it has a quad precision type (the real*16 you mention) defined but not used, and as the other answers correctly point out, extended precision formats tend to be machine/compiler specific. The gfortran 5.2 I am using does not support real(kind=16), surprisingly. But gfortran (available for 64-bit macOS and 64-bit Linux machines) does have real(kind=10), which will give you precision beyond the typical real*8 or "double precision", as it was called in some Fortran compilers.
Be careful if your old Fortran code calls C programs and/or makes assumptions about how precision is handled and how floating point numbers are represented. You may have to dig into exactly what is happening and check the code carefully to ensure things are working as expected, especially if Fortran and C routines call each other. Here is the URL for the GNU gfortran info on quad precision: https://people.sc.fsu.edu/~jburkardt/f77_src/gfortran_quadmath/gfortran_quadmath.html
Hope this helps.

C++ handling of excess precision

I'm currently looking at code which does multi-precision floating-point arithmetic. To work correctly, that code requires values to be reduced to their final precision at well-defined points. So even if an intermediate result was computed to an 80 bit extended precision floating point register, at some point it has to be rounded to 64 bit double for subsequent operations.
The code uses a macro INEXACT to describe this requirement, but doesn't have a perfect definition. The gcc manual mentions -fexcess-precision=standard as a way to force well-defined precision for cast and assignment operations. However, it also writes:
‘-fexcess-precision=standard’ is not implemented for languages other than C
Now I'm thinking about porting those ideas to C++ (comments welcome if anyone knows an existing implementation). So it seems I can't use that switch for C++. But what is the g++ default behavior in the absence of any switch? Are there more C++-like ways to control the handling of excess precision?
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know. But I'm still curious.
Are there more C++-like ways to control the handling of excess precision?
The C99 standard defines FLT_EVAL_METHOD, a compiler-set macro that defines how excess precision should happen in a C program (many C compilers still behave in a way that does not exactly conform to the most reasonable interpretation of the value of FLT_EVAL_METHOD that they define: older GCC versions generating 387 code, Clang when generating 387 code, …). Subtle points in relation with the effects of FLT_EVAL_METHOD were clarified in the C11 standard.
Since the 2011 standard, C++ defers to C99 for the definition of FLT_EVAL_METHOD (header cfloat).
So GCC should simply allow -fexcess-precision=standard for C++, and hopefully it eventually will. The same semantics as that of C are already in the C++ standard, they only need to be implemented in C++ compilers.
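In the meantime, the most C++-like thing you can do is inspect the macro and act accordingly; a small sketch:

#include <cfloat>
#include <cstdio>

int main() {
    // -1: indeterminable
    //  0: operations are evaluated in the range/precision of their type
    //  1: float and double operations are evaluated as double
    //  2: everything is evaluated as long double (typical of 387 code)
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}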
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know.
That is the usual solution.
Be aware that C99 also defines FP_CONTRACT in math.h, which you may want to look at: it relates to the same problem of some expressions being computed at a higher precision, but approaches it from a completely different side (the modern fused multiply-add instruction instead of the old 387 instruction set). This is a pragma that decides whether the compiler is allowed to replace source-level additions and multiplications with FMA instructions. The effect is that the multiplication is virtually computed at infinite precision, because that is how the instruction works, instead of being rounded to the precision of the type as it would be with separate multiplication and addition instructions. This pragma has apparently not been incorporated into the C++ standard (as far as I can see).
The default value for this option is implementation-defined and some people argue for the default to be to allow FMA instructions to be generated (for C compilers that otherwise define FLT_EVAL_METHOD as 0).
You should, in C, future-proof your code with:
#include <math.h>
#pragma STDC FP_CONTRACT OFF
And the equivalent incantation in C++ if your compiler documents one.
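As a quick way to see whether your compiler contracts a multiply-add, here is a small test; the hexadecimal float constants are chosen so that the separately rounded result differs from the fused one:

#include <cmath>
#include <cstdio>

int main() {
    double a = 1.0 + 0x1p-27;            // 1 + 2^-27
    double b = 1.0 - 0x1p-27;            // 1 - 2^-27, so a*b = 1 - 2^-54 exactly
    double c = -1.0;
    double fused    = std::fma(a, b, c); // exact result: -2^-54
    double separate = a * b + c;         // 0.0 if the product is rounded before the add
    printf("separate = %a, fused = %a\n", separate, fused);
    return 0;
}

If separate prints -0x1p-54 rather than 0x0p+0, the compiler contracted the expression into an FMA.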
what is the g++ default behavior in absence of any switch?
I am afraid that the answer to this question is that GCC's behavior, say, when generating 387 code, is nonsensical. See the description of the situation that motivated Joseph Myers to fix the situation for C. If g++ does not implement -fexcess-precision=standard, it probably means that 80-bit computations are randomly rounded to the precision of the type whenever the compiler happens to spill some floating-point registers to memory, leading the program below to print "foo" in some circumstances outside the programmer's control:
if (x == 0.0) return;
... // code that does not modify x
if (x == 0.0) printf("foo\n");
… because the code in the ellipsis caused x, that was held in an 80-bit floating-point register, to be spilt to a 64-bit slot on the stack.
But what is the g++ default behavior in absence of any switch?
I found one answer myself via an experiment, using the following code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    double a = atof("1.2345678");
    double b = a * a;
    printf("%.20e\n", b - 1.52415765279683990130);
    return 0;
}
If b is rounded (-fexcess-precision=standard), then the result is zero. Otherwise (-fexcess-precision=fast) it is something like 8e-17. Compiling with -mfpmath=387 -O3, I could reproduce both cases for gcc-4.8.2. For g++-4.8.2 I get an error for -fexcess-precision=standard if I try that, and without a flag I get the same behavior as -fexcess-precision=fast gives for C. Adding -std=c++11 does not help. So now the suspicion already voiced by Pascal is official: g++ does not necessarily round everywhere it should.

Can I guarantee the C++ compiler will not reorder my calculations?

I'm currently reading through the excellent Library for Double-Double and Quad-Double Arithmetic paper, and in the first few lines I notice they perform a sum in the following way:
std::pair<double, double> TwoSum(double a, double b)
{
    double s = a + b;
    double v = s - a;
    double e = (a - (s - v)) + (b - v);
    return std::make_pair(s, e);
}
The calculation of the error, e, relies on the fact that the calculation follows that order of operations exactly because of the non-associative properties of IEEE-754 floating point math.
If I compile this with a modern optimizing C++ compiler (e.g. MSVC or gcc), can I be sure that the compiler won't reorder or otherwise alter the way this calculation is done?
Secondly, is this guaranteed anywhere within the C++ standard?
You might like to look at the g++ manual page: http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Optimize-Options.html#Optimize-Options
Particularly -fassociative-math, -ffast-math and -ffloat-store
According to the g++ manual it will not reorder your expression unless you specifically request it.
Yes, that is safe (at least in this case). You only use two "operators" there, the primary expression (something) and the binary something +/- something (additive).
Section 1.9 Program execution (of C++0x N3092) states:
Operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.
In terms of the grouping, 5.1 Primary expressions states:
A parenthesized expression is a primary expression whose type and value are identical to those of the enclosed expression. ... The parenthesized expression can be used in exactly the same contexts as those where the enclosed expression can be used, and with the same meaning, except as otherwise indicated.
I believe the use of the word "identical" in that quote requires a conforming implementation to guarantee that it will be executed in the specified order unless another order can give the exact same results.
And for adding and subtracting, section 5.7 Additive operators has:
The additive operators + and - group left-to-right.
So the standard dictates the results. If the compiler can ascertain that the same results can be obtained with different ordering of the operations then it may re-arrange them. But whether this happens or not, you will not be able to discern a difference.
This is a very valid concern, because Intel's C++ compiler, which is very widely used, defaults to performing optimizations that can change the result.
See http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/copts/common_options/option_fp_model.htm#option_fp_model
I would be quite surprised if any compiler wrongly assumed associativity of arithmetic operators with default optimising options.
But be wary of extended precision of FP registers.
Consult compiler documentation on how to ensure that FP values do not have extended precision.
If you really need to, I think you can make a noinline function no_reorder(float x) { return x; }, and then use it instead of parenthesis. Obviously, it's not a particularly efficient solution though.
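A sketch of that idea applied to the TwoSum function above (the attribute spelling is GCC/Clang-specific, and TwoSumGuarded is a hypothetical name; MSVC spells the attribute __declspec(noinline)):

#include <utility>

__attribute__((noinline)) double no_reorder(double x) { return x; }

std::pair<double, double> TwoSumGuarded(double a, double b) {
    double s = no_reorder(a + b);  // each intermediate passes through an opaque call
    double v = no_reorder(s - a);
    double e = no_reorder(a - (s - v)) + no_reorder(b - v);
    return std::make_pair(s, e);
}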
In general, you should be able to -- the optimizer should be aware of the properties of the real operations.
That said, I'd test the hell out of the compiler I was using.
Yes. The compiler will not change the order of your calculations within a block like that.
Between compiler optimizations and out-of-order execution on the processor, it is almost a guarantee that things will not happen exactly as you ordered them.
HOWEVER, it is also guaranteed that this will NEVER change the result. C++ follows standard order of operations and all optimizations preserve this behavior.
Bottom line: Don't worry about it. Write your C++ code to be mathematically correct and trust the compiler. If anything goes wrong, the problem was almost certainly not the compiler.
As per the other answers you should be able to rely on the compiler doing the right thing -- most compilers allow you to compile and inspect the assembler (use -S for gcc) -- you may want to do that to make sure you get the order of operation you expect.
Different optimization levels (in gcc: -O, -O2, etc.) allow code to be re-arranged (although sequential code like this is unlikely to be affected) - but I would suggest you isolate that particular part of the code into a separate file, so that you can control the optimization level for just that calculation.
The short answer is: the compiler will probably change the order of your calculations, but it will never change the behavior of your program (unless your code makes use of expression with undefined behavior: http://blog.regehr.org/archives/213)
However, you can still influence this behavior by deactivating all compiler optimizations (option "-O0" with gcc). If you still need the compiler to optimize the rest of your code, you may put this function in a separate ".c" file which you can compile with "-O0".
Additionally, you can use some hacks. For instance, if you interleave your code with calls to extern functions, the compiler may consider it unsafe to re-order your code, as those functions may have unknown side effects. Calling "printf" to print the values of your intermediate results will lead to similar behavior.
Anyway, unless you have any very good reason (e.g. debugging) you typically don't want to care about that, and you should trust the compiler.

Output difference in gcc and turbo C

Why is there a difference in the output produced when the code is compiled using the two compilers gcc and Turbo C?
#include <stdio.h>

int main()
{
    char *p = "I am a string";
    char *q = "I am a string";
    if (p == q)
    {
        printf("Optimized");
    }
    else
    {
        printf("Change your compiler");
    }
    return 0;
}
I get "Optimized" on gcc and "Change your compiler" on turbo c. Why?
Your question has been tagged both C and C++, so I'll answer for both languages.
[C]
From ISO C99 (Section 6.4.5/6)
It is unspecified whether these arrays are distinct provided their elements have the appropriate values.
That means it is unspecified whether p and q are pointing to the same string literal or not. In case of gcc they both are pointing to "I am a string" (gcc optimizes your code) whereas in turbo c they are not.
Unspecified Behavior:
Use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance.
[C++]
From ISO C++-98 (Section 2.13.4/2)
Whether all string literals are distinct (that is, are stored in non-overlapping objects) is implementation-defined.
In C++ your code invokes implementation-defined behaviour.
Implementation-defined Behavior:
Unspecified Behavior where each implementation documents how the choice is made
Since your string literal is constant data, i.e. you should not modify it via a pointer, there is no real purpose in storing it in memory twice. Being a newer compiler, gcc merges the literals by default while Turbo C does not. It is a sign of gcc's support for the newer language standards, which have the notion of const data.
Please disregard the answers along the lines of "It's because Turbo C is SO TOTALLY OLD and they couldn't do it THEN, because it had to be FAST, but GCC is totally NEW and RAD and that's why it does that!".
Both compilers support merging string constants as an option. The GCC option (-fmerge-constants) is turned on at optimization levels, while the Turbo C option (-d) is off by default. If you are using the TCC IDE, go to Options|Compiler...|Code Generation... and check "Duplicate strings merged".
From the gcc manual page:
-fmerge-constants
Attempt to merge identical constants (string constants and floating point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
Enabled at levels -O, -O2, -O3, -Os.
Hence the output.
Turbo C was optimized for fast compilation, so it doesn't have any features that would slow it down. Recognizing duplicate strings would be a slow-down, even if only minor.
The compiler may keep two copies of identical literals if it thinks proper. Finding out if that is the case is presumably the point of this program.
In the good old days, assemblers kept all literals in a literal pool, and patching the literal pool was a recognised (if not approved) technique of modifying 'constants' throughout the program.
If by some chance the compiler allows in this case *p = 'H'; then important differences in behaviour would result.
Historical footnote: Since addresses were smaller than floating-point numeric constants, FORTRAN used to handle floating-point constants much like C handles strings. Since memory was precious, identical constants would be allocated the same space. Also, parameter passing was always done by reference. This meant that if one passed a numeric constant to a procedure that modified its argument, other occurrences of that "constant" would change value.
Hence the old saying: "Variables won't; constants aren't."
Incidentally, has anyone noticed the bug in the Turbo C 2.0 printf which would fail when using a format like "%1.1f" to print numbers like 99.99 (outputs 00.0)? Fixed in 2.01, it reminds me of the Windows 3.1 calculator bug.