Comparing floating point numbers in VS C++ vs C++ Builder

I am working with some astronomy code originally compiled in Visual C++. I am compiling it in C++Builder XE4 on the 32bit VCL platform.
In this code, there are a lot of comparisons of very small numbers, all defined as double. The code snippet below shows the headers and some sample comparisons from the VC++ code. I need the results to be the same in VC++ and C++Builder, so I have some questions about comparing floating point numbers:
Does C++Builder compare floating point numbers the same as VC++?
In C++Builder, do I need to rewrite the code using the CompareValue(double, double) function?
Will I get the same result if I switch from #include <cmath> to using #include <math.h> and #include <math.hpp>?
Any suggestions for getting the same results in both compilers would be helpful.
#include "stdafx.h"
#include <cmath>
#include <cassert>
using namespace std;
...
else if ((fgamma > 0.9972) && (fgamma < (1.5433 + details.u)))
{
if ((fgamma > 0.9972) && (fgamma < (0.9972 + fabs(details.u))))
{
if (details.u < 0)
...

Short answer
1. No.
2. Depends on the compiler settings in both + the thread environment.
3. Yes, but see #2.
Long answer
Compiler settings
The most important compiler setting is the target instruction set. Depending on that setting, double-precision floating-point code can be compiled into legacy x87 instructions, into SSE2, or into newer instruction sets (SSE4, AVX, etc.).
The funny thing is, some compilers with some settings compile into both: within the same program, they may use x87 for some things and SSE for others.
There are other relevant compiler switches as well, e.g. /fp in Visual C++.
Thread environment
For x87 code, the interesting part of the thread state is the x87 FPU control register. For Visual C++, see the _controlfp_s API.
The SSE units of the CPU use a similar mechanism, the MXCSR register.
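For illustration, here is a minimal sketch against the Visual C++ CRT (the _controlfp_s API mentioned above; C++Builder offers the similar _control87/_controlfp family) of pinning the per-thread floating-point state to a known configuration before running the comparisons:

#include <float.h>   // _controlfp_s, _MCW_RC, _RC_NEAR, _MCW_PC, _PC_53

// Force a well-defined floating-point environment on the current thread:
// round-to-nearest, and 53-bit (double) precision for any x87 code.
// Call this at the start of every thread that runs the astronomy code,
// so results depend less on whatever state the runtime left behind.
void normalize_fp_state()
{
    unsigned int current;
    _controlfp_s(&current, _RC_NEAR, _MCW_RC);  // rounding mode
    _controlfp_s(&current, _PC_53,   _MCW_PC);  // x87 precision (ignored by SSE code)
}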

Related

Fast floating point model broken on next-generation intel compiler

Description
I'm trying to switch over from using the classic intel compiler from the Intel OneAPI toolkit to the next-generation DPC/C++ compiler, but the default behaviour for handling floating point operations appears broken or different, in that comparison with infinity always evaluates to false in fast floating point modes. The above is both a compiler warning and the behaviour I now experience with ICX, but not a behaviour experienced with the classic compiler (for the same minimal set of compiler flags used).
Minimally reproducible example
#include <iostream>
#include <cmath>

int main()
{
    double a = 1.0/0.0;
    if (std::isinf(a))
        std::cout << "is infinite";
    else
        std::cout << "is not infinite;";
}
Compiler Flags:
-O3 -Wall -fp-model=fast
ICC 2021.5.0 Output:
is infinite
(also tested on several older versions)
ICX 2022.0.0 Output:
is not infinite
(also tested on 2022.0.1)
Live demo on compiler-explorer:
https://godbolt.org/z/vzeYj1Wa3
By default -fp-model=fast is enabled on both compilers. If I manually specify -fp-model=precise I can recover the behaviour but not the performance.
Does anyone know of a potential solution to both maintain the previous behaviour & performance of the fast floating point model using the next-gen compiler?
If you add -fp-speculation=safe to -fp-model=fast, you will still get the warning that you shouldn't use -fp-model=fast if you want to check for infinity, but the condition will evaluate correctly: godbolt.
In the Intel Porting Guide for ICC Users to DPCPP or ICX it is stated that:
FP Strictness: Nothing stricter than the default is supported. There is no support for -fp-model strict, -fp-speculation=safe, #pragma fenv_access, etc. Implementing support for these is a work-in-progress in the open source community.
Even though it works for the current version of the tested compiler (icx 2022.0.0), there is a discrepancy: either the documentation is outdated (more probable), or this feature is working by accident (less probable).

Trouble with large numbers while using <iomanip> in cpp

While sorting and displaying big numbers you usually end up displaying the large numbers in e-notation (scientific notation). I was trying to display the whole number by using the <iomanip> library in cpp, and it fails for very large numbers.
//big sorting
#include<bits/stdc++.h>
#include<iomanip>
using namespace std;

int main()
{
    int n;
    cin >> n;
    double arr[n];
    for (int i = 0; i < n; i++)
        cin >> arr[i];
    sort(arr, arr + n);
    cout << fixed << setprecision(0);
    for (int i = 0; i < n; i++)
        cout << arr[i] << endl;
}
Input:
31415926535897932384626433832795
1
3
10
3
5
Expected output:
1
3
3
5
10
31415926535897932384626433832795
Actual output:
1
3
3
5
10
31415926535897933290036940242944
The last digit is getting messed up.
The double type has only about 15–17 significant decimal digits of precision, so very large whole numbers can't be represented in the double type without loss of precision.
Read more about C++, perhaps the C++11 standard n3337.
Read also the documentation of your C++ compiler, e.g. GCC (invoked as g++) or Clang (invoked as clang++). Read of course a good C++ programming book, since C++ is a very difficult programming language. Use C++ standard containers and smart pointers.
Large numbers do not fit natively in a computer's memory (or in its registers). For example, with C++ code compiled by GCC on Linux/x86-64, an int has just 32 bits.
Consider using arbitrary precision arithmetic. You might be interested in GMPlib.
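For instance, here is a minimal sketch of the same program using GMP's C++ interface (this assumes GMP and its C++ bindings are installed; link with -lgmpxx -lgmp):

#include <gmpxx.h>     // mpz_class: arbitrary-precision integers
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    int n;
    std::cin >> n;
    std::vector<mpz_class> arr(n);   // exact integers, no rounding
    for (auto &x : arr)
        std::cin >> x;
    std::sort(arr.begin(), arr.end());
    for (const auto &x : arr)
        std::cout << x << '\n';
}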
Floating point numbers are weird. Be sure to read the famous floating-point-gui.de website, and see also this answer.
#include<bits/stdc++.h>
is wrong, since it is non-standard. Get into the habit of #include-ing only the headers your translation unit needs, unless you use pre-compiled headers.
Take some time to read more about numbers and arithmetic. Some notion of modular arithmetic is incredibly useful when programming: a lot of computers are computing modulo 2^32 or 2^64.
Study for inspiration the C++ source code of existing open source software (e.g. on github or gitlab, including FLTK). If you use Linux, the fish shell has nice C++ code. You could even glance inside the source code of GCC and of Clang, both being nice C++ open source compilers.
In practice, read also about build automation tools such as GNU make (free software coded in C) or ninja (open source tool coded in C++).
Don't forget to use a version control system (I recommend git).
Read How to debug small programs.
Enable all warnings and debug info when compiling your C++ code (with GCC, use g++ -Wall -Wextra -g).
Read of course the documentation of your favorite debugger.
I am a happy user of GDB.
Consider using static program analysis tools such as the Clang static analyzer or Frama-C++.

c++ exp function different results under x64 on i7-3770 and i7-4790

When I execute a simple x64 application with the following code, I get different results on Windows PCs with a i7-3770 and i7-4790 CPU.
#include <cmath>
#include <iostream>
#include <limits>

int main()
{
    double val = exp(-10.240990982718174);
    std::cout.precision(std::numeric_limits<double>::max_digits10);
    std::cout << val;
}
Result on i7-3770:
3.5677476354876406e-05
Result on i7-4790:
3.5677476354876413e-05
When I modify the code to call
unsigned int control_word;
_controlfp_s(&control_word, _RC_UP, MCW_RC);
before the exp function call, both CPUs deliver the same results.
My questions:
Does anyone have an idea for the reason of the differences between the i7-3770 and i7-4790?
Is there a way to set the floating point precision or consistency in a Visual Studio 2015/2017 C++ project for the whole project and not only for the following function call? The "Floating Point Model" setting (/fp) does not have any influence on the results here.
Assuming that double is coded using IEEE-754, and using this decimal to binary converter, you can see that:
3.5677476354876406e-05 is represented in hexadecimal as 0x3F02B48CC0D0ABA8
3.5677476354876413e-05 is represented in hexadecimal as 0x3F02B48CC0D0ABA9
which differ only in the last bit, probably due to a rounding error.
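If you want to check this without an online converter, a small sketch in standard C++ (nothing assumed beyond the standard library) prints the raw IEEE-754 bit pattern of a double:

#include <cstdint>
#include <cstdio>
#include <cstring>

// Print a double together with its IEEE-754 bit pattern as 16 hex digits.
void print_bits(double d)
{
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // safe type-pun via memcpy
    std::printf("%.17g = 0x%016llX\n", d, static_cast<unsigned long long>(bits));
}

int main()
{
    print_bits(3.5677476354876406e-05);   // 0x3F02B48CC0D0ABA8
    print_bits(3.5677476354876413e-05);   // 0x3F02B48CC0D0ABA9
}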
I did some further investigations and I found out the following facts:
the problem does also occur on Windows with a different compiler (Intel)
on a linux system both values are equal
I also posted this question to the Visual Studio Community. I got the information that Haswell and newer CPUs use FMA3. You can disable this feature with _set_FMA3_enable(0) at the beginning of the program. When I do this, the results are the same.
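A minimal sketch of that workaround (a sketch only: _set_FMA3_enable is specific to the x64 Microsoft CRT and is declared via its <math.h>):

#include <cmath>
#include <iostream>
#include <limits>

int main()
{
#if defined(_MSC_VER) && defined(_M_X64)
    _set_FMA3_enable(0);   // ask the CRT math functions to avoid their FMA3 code paths
#endif
    double val = exp(-10.240990982718174);
    std::cout.precision(std::numeric_limits<double>::max_digits10);
    std::cout << val << '\n';   // with FMA3 disabled, both CPUs print the same value
}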

C++ handling of excess precision

I'm currently looking at code which does multi-precision floating-point arithmetic. To work correctly, that code requires values to be reduced to their final precision at well-defined points. So even if an intermediate result was computed to an 80 bit extended precision floating point register, at some point it has to be rounded to 64 bit double for subsequent operations.
The code uses a macro INEXACT to describe this requirement, but doesn't have a perfect definition. The gcc manual mentions -fexcess-precision=standard as a way to force well-defined precision for cast and assignment operations. However, it also writes:
‘-fexcess-precision=standard’ is not implemented for languages other than C
Now I'm thinking about porting those ideas to C++ (comments welcome if anyone knows an existing implementation). So it seems I can't use that switch for C++. But what is the g++ default behavior in absence of any switch? Are there more C++-like ways to control the handling of excess precision?
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know. But I'm still curious.
Are there more C++-like ways to control the handling of excess precision?
The C99 standard defines FLT_EVAL_METHOD, a compiler-set macro that describes how excess precision is handled in a C program (many C compilers still behave in a way that does not exactly conform to the most reasonable interpretation of the value of FLT_EVAL_METHOD that they define: older GCC versions generating 387 code, Clang when generating 387 code, …). Subtle points in relation with the effects of FLT_EVAL_METHOD were clarified in the C11 standard.
Since the 2011 standard, C++ defers to C99 for the definition of FLT_EVAL_METHOD (header cfloat).
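As a quick check, you can print what your compiler claims (just the standard <cfloat> macro; whether the generated code actually honours it is exactly the problem discussed here):

#include <cfloat>
#include <iostream>

int main()
{
    //  0: operations evaluated in the range and precision of their type
    //  1: float operations promoted to double
    //  2: everything promoted to long double (typical of 387 code)
    // -1: indeterminate
    std::cout << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << '\n';
}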
So GCC should simply allow -fexcess-precision=standard for C++, and hopefully it eventually will. The same semantics as that of C are already in the C++ standard, they only need to be implemented in C++ compilers.
I guess that for my current use case, I'll probably use -mfpmath=sse in any case, which should not incur any excess precision as far as I know.
That is the usual solution.
Be aware that C99 also defines FP_CONTRACT in math.h, which you may want to look at: it relates to the same problem of some expressions being computed at a higher precision, but strikes from a completely different side (the modern fused multiply-add instruction instead of the old 387 instruction set). It is a pragma that decides whether the compiler is allowed to replace source-level additions and multiplications with FMA instructions. This has the effect that the multiplication is virtually computed at infinite precision, because that is how the instruction works, instead of being rounded to the precision of the type as it would be with separate multiplication and addition instructions. This pragma has apparently not been incorporated in the C++ standard (as far as I can see).
The default value for this option is implementation-defined and some people argue for the default to be to allow FMA instructions to be generated (for C compilers that otherwise define FLT_EVAL_METHOD as 0).
You should, in C, future-proof your code with:
#include <math.h>
#pragma STDC FP_CONTRACT OFF
And the equivalent incantation in C++ if your compiler documents one.
what is the g++ default behavior in absence of any switch?
I am afraid that the answer to this question is that GCC's behavior, say, when generating 387 code, is nonsensical. See the description of the situation that motivated Joseph Myers to fix the situation for C. If g++ does not implement -fexcess-precision=standard, it probably means that 80-bit computations are randomly rounded to the precision of the type when the compiler happened to have to spill some floating-point registers to memory, leading the program below to print "foo" in some circumstances outside the programmer's control:
if (x == 0.0) return;
... // code that does not modify x
if (x == 0.0) printf("foo\n");
… because the code in the ellipsis caused x, that was held in an 80-bit floating-point register, to be spilt to a 64-bit slot on the stack.
But what is the g++ default behavior in absence of any switch?
I found one answer myself via an experiment, using the following code:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    double a = atof("1.2345678");
    double b = a * a;
    printf("%.20e\n", b - 1.52415765279683990130);
    return 0;
}
If b is rounded (-fexcess-precision=standard), then the result is zero. Otherwise (-fexcess-precision=fast) it is something like 8e-17. Compiling with -mfpmath=387 -O3, I could reproduce both cases for gcc-4.8.2. For g++-4.8.2, -fexcess-precision=standard produces an error, and without the flag I get the same behavior as -fexcess-precision=fast gives for C. Adding -std=c++11 does not help. So the suspicion already voiced by Pascal is now official: g++ does not necessarily round everywhere it should.

what library do I have to use for 'ctz' command in c++?

Is there a library for 'count trailing zeroes' (the ctz operation)? What is the procedure to do that?
I tried:
#include<iostream>
using namespace std;

int main()
{
    int value = 12;
    cout << ctz(value);
}
The C/C++ standard libraries don't offer that operation (C++20 later added std::countr_zero in <bit>). There are, however, compiler-specific intrinsics for most bitwise operations of this kind.
With gcc/clang, it's __builtin_ctz. You don't need any #include to use it, precisely because it's an intrinsic command. There's a list of GCC intrinsics here, and a list of Clang intrinsics here.
With Visual Studio, you need to #include <intrin.h> and use _BitScanForward, as shown in this answer.
If you want to make your code portable across compilers, you're encouraged to provide your own macros/wrappers.
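As an illustration, here is a minimal sketch of such a wrapper (the helper name is mine, not from any library), covering GCC/Clang and MSVC with a portable fallback:

#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical portable helper: count trailing zeros of a nonzero 32-bit value.
inline unsigned count_trailing_zeros(std::uint32_t value)
{
#if defined(__GNUC__) || defined(__clang__)
    return static_cast<unsigned>(__builtin_ctz(value));
#elif defined(_MSC_VER)
    unsigned long index;
    _BitScanForward(&index, value);   // index of the least significant set bit
    return static_cast<unsigned>(index);
#else
    unsigned n = 0;
    while ((value & 1u) == 0) { value >>= 1; ++n; }   // portable fallback
    return n;
#endif
}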
On POSIX, you can also use the ffs (find first set) function from <strings.h> (not <string.h>!) which is documented as:
int ffs(int i);
The ffs() function shall find the first bit set (beginning with the least significant bit) in i, and return the index of that bit. Bits are numbered starting at one (the least significant bit).
Note that this function is part of the XSI extensions, so you should set the _XOPEN_SOURCE feature test macro before including <strings.h> or any system headers, so the prototype is visible:
#define _XOPEN_SOURCE 700
#include <strings.h>
gcc recognizes ffs and compiles it into a bsf instruction on x86.
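For the original example, a small usage sketch (POSIX assumed; note that ffs numbers bits starting at one, so for nonzero values ctz(x) == ffs(x) - 1):

#define _XOPEN_SOURCE 700
#include <strings.h>     // ffs
#include <iostream>

int main()
{
    int value = 12;                        // binary 1100
    std::cout << ffs(value) - 1 << '\n';   // prints 2: twelve has two trailing zero bits
}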