C++ DLL floating-point determinism

Can the same compilation of a C++ DLL exhibit different floating-point results on different machines?
We have some code in our DLL which performs a < comparison of two doubles. For a particular set of inputs those doubles are expected to be equal. Of course, the < comparison is dubious in this case, but what we didn't expect was to see different results from the comparison on our test machine versus the client's machine.

The same DLL on two different computers could conceivably produce the different results you're seeing, even though both are running Windows XP. These are the reasons that occur to me:
They could be using different versions of the C++ runtime (since that is likely dynamically linked) or of other system DLLs.
I don't know how likely this is, but I believe the floating-point operations on different CPUs could produce results that differ just enough that two series of calculations of a and b give a < b == true on one machine and a < b == false on another.
What I've used in the past to find out what DLLs are being used by an application is Dependency Walker.

Yes, there can be differences in floating point implementations that are significant enough to cause equality comparisons to fail.
You can attribute it to a failure to implement the IEEE standards properly, but I can also see situations where, for example, a different number of guard digits is used in different implementations, so the round-off errors differ. It should be noted, however, that the IEEE standards are rather strict.
Comparisons between floating-point numbers should never use exact equality. Favor an approach that tests whether two numbers are within a small tolerance of each other rather than exactly equal, for example as sketched below.
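A minimal sketch of such a tolerance-based comparison (the function name and tolerance values are illustrative, not from the original question):

#include <algorithm>
#include <cmath>

// Returns true if a and b differ by no more than a tolerance that scales
// with the magnitude of the operands (plus a small absolute floor).
bool nearly_equal( double a, double b,
                   double rel_tol = 1e-9, double abs_tol = 1e-12 )
{
    double diff  = std::fabs( a - b );
    double scale = std::max( std::fabs( a ), std::fabs( b ) );
    return diff <= std::max( rel_tol * scale, abs_tol );
}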
Further Reading
What Every Computer Scientist Should Know About Floating Point Arithmetic

In VS 2003, the MS C++ compiler introduced a new model for floating-point optimization. It provides you with 3 compiler options: /fp:fast, /fp:precise and /fp:strict.
Under /fp:strict, the compiler never performs any optimizations that perturb the accuracy of floating-point computations, so if you want accuracy over speed, you should use this one. The default is /fp:precise. You can change it under project properties -> C++ -> Code Generation.
Please read this: Microsoft Visual C++ Floating-Point Optimization
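As an aside, MSVC also lets you apply this model per function with #pragma float_control, which can be less disruptive than changing the project-wide /fp switch. A minimal sketch (the function itself is purely illustrative):

// Force precise floating-point semantics for this function only (MSVC-specific pragma).
#pragma float_control(precise, on, push)
double accumulate(const double* values, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; ++i)
        sum += values[i];   // optimizations that would change the observable result are disallowed here
    return sum;
}
#pragma float_control(pop)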

Related

Standard math functions reproducibility on different CPU's

I am working on a project with a lot of math calculations. After switching to a new test machine, I noticed that a lot of tests failed. It is also important to note that the tests failed on my development machine and on some other developers' machines as well. After tracing values and comparing them with values from the old machine, I found that some functions (at the moment I have only found cosine) from math.h sometimes return slightly different values (for example: 40965.8966304650828827e-01 and 40965.8966304650828816e-01, -3.3088623618085204e-08 and -3.3088623618085197e-08).
New CPU: Intel Xeon Gold 6230R (Intel64 Family 6 Model 85 Stepping 7)
Old CPU: Exact model is unknown (Intel64 Family 6 Model 42 Stepping 7)
My CPU: Intel Core i7-4790K
Test results do not depend on the Windows version (7 and 10 were tested).
I have tried testing with a binary statically linked against the standard library, to rule out different libraries being loaded for different processes and Windows versions, but the results were the same.
Project compiled with /fp:precise, switching to /fp:strict changed nothing.
MSVC from Visual Studio 2015 is used: 19.00.24215.1 for x64.
How to make calculations fully reproducible?
Since you are on Windows, I am pretty sure the different results arise because the UCRT detects at runtime whether FMA3 (fused multiply-add) instructions are available on the CPU and, if so, uses them in transcendental functions such as cosine. This gives slightly different results. The solution is to place the call _set_FMA3_enable(0); at the very start of your main() or WinMain() function, as described here.
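A minimal sketch of where that call would go (x64 Windows with the UCRT; to my recollection the declaration is provided by the CRT's math header on x64, so treat the include as an assumption):

#include <math.h>   // on x64 MSVC this should declare _set_FMA3_enable

int main()
{
    _set_FMA3_enable(0);   // disable FMA3 code paths in the UCRT math functions
    double c = cos(0.5);   // now takes the same code path as on non-FMA3 CPUs
    // ... rest of the program ...
    return 0;
}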
If you want to have reproducibility also between different operating systems, things become harder or even impossible. See e.g. this blog post.
In response also to the comments stating that you should just use some tolerance: I do not agree with this as a general statement. Certainly, there are many applications where this is the way to go. But I do think that it can be a sensible requirement to get exactly the same floating-point results for some applications, at least when staying on the same OS (Windows, in this case). In fact, we had the very same issue with _set_FMA3_enable a while ago. I am a software developer for a traffic simulation, and minor differences such as 10^-16 often build up and eventually lead to entirely different simulation results.

Naturally, one is supposed to run many simulations with different seeds and average over all of them, making the different behavior irrelevant for the final result. But: sometimes customers have a problem at a specific simulation second for a specific seed (e.g. an application crash or incorrect behavior of an entity), and not being able to reproduce it on our developer machines due to a different CPU makes it much harder to diagnose and fix the issue. Moreover, if the test system consists of a mixture of older and newer CPUs and test cases are not bound to specific resources, tests can sometimes deviate seemingly without reason (flaky tests). This is certainly not desired.

Requiring exact reproducibility also makes writing the tests much easier, because you do not need heuristic thresholds (e.g. a tolerance or some guessed number of samples). Moreover, our customers expect the results to remain stable for a specific version of the program, since they calibrated (more or less...) their traffic networks to real data. This is somewhat questionable, since (again) one should actually look at averages, but the naive expectation usually wins in reality.
IEEE-754 double-precision binary floating point provides roughly 15 significant decimal digits of precision. You are looking at the "noise" of different library implementations and possibly different FPU implementations.
How to make calculations fully reproducible?
That is an X-Y problem. The answer is you can't. But it is the wrong question. You would do better to ask how you can implement valid and robust tests that are sympathetic to this well-known and unavoidable technical issue with floating-point representation. Without providing the test code you are trying to use, it is not possible to answer that directly.
Generally you should avoid comparing floating point values for exact equality, and rather subtract the result from the desired value, and test for some acceptable discrepancy within the supported precision of the FP type used. For example:
#include <math.h>   /* for fabs() */

#define EXPECTED_RESULT  40965.8966304650
#define RESULT_PRECISION 00000.0000000001

double actual_result = test();
bool error = fabs( actual_result - EXPECTED_RESULT ) > RESULT_PRECISION;
First of all, 40965.8966304650828827e-01 cannot be a result of the cos() function: cos(x), for real-valued arguments, always returns a value in the interval [-1.0, 1.0], so the value shown cannot be its output.
Second, you have probably read somewhere that double values carry roughly 17 significant digits in the significand, while you are trying to show 21 digits. You cannot get correct data past the ...508, as you are pushing the result beyond the 17-digit limit.
The reason you get different results on different computers is that whatever is printed beyond those precise digits is essentially noise not defined by the precision of the type, so it is normal to get different values (you could even get different values on different runs of the same program on the same machine).

Looking for datatypes of similar precision in multiple programming languages e.g. C/C++, D, Go

I am trying to implement a program with floating-point numbers, using two or more programming languages. The program does, say, 50k iterations to finally bring the error down to a very small value.
To ensure that my results are comparable, I want to make sure I use data types of the same precision in the different languages. Could you please tell me whether there is a correspondence between float/double in C/C++ and the types in D and Go? I expect C/C++ and D to be quite close in this regard, but I am not sure. Thanks a lot.
Generally, for compiled languages, floating point format and precision comes down to two things:
The library used to implement the floating point functions that aren't directly supported in hardware.
The hardware the system is running on.
It may also depend on what compiler options you give (and how sophisticated the compiler is in general). Many modern processors have vector instructions, and the result may be subtly different than if you use "regular" floating-point instructions (e.g. FPU vs. SSE on x86 processors). You may also see differences, sometimes, because the internal calculations on an x86 FPU are done in 80 bits and stored as 64 bits when the computation is completed.
But generally, given the same hardware, and similar type of compilers, I'd expect to get the same result [and roughly the same performance] from two different [sufficiently similar] languages.
Most languages have either only "double" (typically 64-bit) or "single and double" (e.g. float - typically 32-bit and double - typically 64-bit in C/C++ - and probably D as well, but I'm not that into D).
In Go, floating point types follow the IEEE-754 standard.
Straight from the spec (http://golang.org/ref/spec#Numeric_types)
float32 the set of all IEEE-754 32-bit floating-point numbers
float64 the set of all IEEE-754 64-bit floating-point numbers
I'm not familiar with D, but this page might be of interest: http://dlang.org/float.html.
For C/C++, the standard doesn't require IEEE-754, but in C++ you can check std::numeric_limits<T>::is_iec559 to see whether your implementation uses IEEE-754. See this question: How to check if C++ compiler uses IEEE 754 floating point standard
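For example, a minimal compile-and-run check:

#include <iostream>
#include <limits>

int main()
{
    // true when float/double conform to IEC 559 (IEEE 754) on this implementation
    std::cout << std::boolalpha
              << "float  is IEEE 754: " << std::numeric_limits<float>::is_iec559  << '\n'
              << "double is IEEE 754: " << std::numeric_limits<double>::is_iec559 << '\n';
}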

Same code using floats on two computers gives two different results

I've got some image processing code in C++ which calculates gradients and finds straight lines in them with the hough transformation algorithm. The program does most of the calculations with floats.
When I run this code on the same image on two different computers, one a Pentium IV running the latest Fedora, the other a Core i5 running the latest Ubuntu, both 32-bit, I get slightly different results. E.g. after some lengthy calculation I get 1.3456f for some variable on one machine and 1.3457f on the other. Is this expected behavior or should I search for errors in my program?
My first guess was, that I'm accessing some uninitialized or out-of-bounds memory but I did run the program through valgrind and it can't find any errors, also running multiple times on the same machine always gives the same results.
This is not uncommon and it will depend on your compiler, optimisation settings, math libraries, CPU, and of course the numerical stability of the algorithms that you are using.
You need to have a good idea of your accuracy requirements and if you are not meeting these then you may need to look at your algorithms and e.g. consider using double rather than float where needed.
For background on why given source code might not result in the same output on different computers, see What Every Computer Scientist Should Know About Floating-Point Arithmetic. I doubt this is due to any deficiency of your code unless it performs aggregation in a non-deterministic way, e.g. by centrally collating calculation results from multiple threads.
Floating point behaviour is often tunable per compiler options, even to the level of different CPUs. Check your compiler docs to see if you can reduce or eliminate the discrepancy. On Visual C++ (for example) this is done via /fp.
Is it due to a phenomenon called machine epsilon?
http://en.wikipedia.org/wiki/Machine_epsilon
There are limitations on floating-point numbers. The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
Basically, the same C++ instructions can be compiled to different machine instructions (even on the same CPU and certainly on different CPUs) depending on a large number of factors, and the same machine instructions can lead to different low-level CPU actions depending on a large number of factors. In theory, these are supposed to be semantically equivalent, but with floating-point numbers, there are edge cases where they aren't.
Read "The pitfalls of verifying floating-point computations" by David Monniaux for details.
I will also say that this is very common, and probably not your fault.
I spent a lot of time in the past trying to figure out the same problem.
I would suggest using a decimal type instead of float and double, as long as your numbers do not come from scientific calculations but represent values like prices, quantities, exchange rates, etc.
This is totally normal, unfortunately.
There are libraries which can produce identical results everywhere--see http://www.mpfr.org/ for an example. But the performance cost is substantial and it's probably not worth it unless exact identical results are the most important criterion.
I've actually written a closed-source library which implemented floating-point math in the integer unit, in order to make floats provide identical results on multiple platforms (Intel, AMD, PowerPC) across different compilers. We had an app which simply could not function if floating-point results varied. It was quite a challenge, though. If we could do it again we'd have just designed the original app in fixed-point, but at the time it was too much code to rewrite.
Either this is a difference between the internal representation of the float, making slightly different results, or perhaps it is a difference in the way the float is printed to the screen? I doubt that it is your fault...

How to write portable floating point arithmetic in c++?

Say you're writing a C++ application doing lots of floating-point arithmetic. Say this application needs to be portable across a reasonable range of hardware and OS platforms (say 32- and 64-bit hardware, Windows and Linux both in 32- and 64-bit flavors...).
How would you make sure that your floating-point arithmetic is the same on all platforms? For instance, how can you be sure that a 32-bit floating-point value will really be 32 bits on all platforms?
For integers we have stdint.h, but there doesn't seem to be a floating-point equivalent.
[EDIT]
I got very interesting answers but I'd like to add some detail to the question.
For integers, I can write:
#include <stdint.h>
[...]
int32_t myInt;
and be sure that whatever the (C99-compatible) platform I'm on, myInt is a 32-bit integer.
If I write:
double myDouble;
float myFloat;
am I certain that this will compile to, respectively, 64-bit and 32-bit floating-point numbers on all platforms?
Non-IEEE 754
Generally, you cannot. There's always a trade-off between consistency and performance, and C++ hands that to you.
For platforms that don't have floating-point operations (like embedded and signal-processing processors), you cannot use C++ "native" floating-point operations, at least not portably so. While a software layer would be possible, that's certainly not feasible for this type of device.
For these, you could use 16-bit or 32-bit fixed-point arithmetic (but you might even discover that long is only rudimentarily supported, and frequently div is very expensive). However, this will be much slower than built-in fixed-point arithmetic, and becomes painful after the basic four operations.
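For illustration, a minimal Q16.16 fixed-point sketch (the names and format are just one possible choice, not a recommendation of any particular library; double is used only to print the demo result):

#include <cstdint>
#include <cstdio>

typedef int32_t fix16;                 // Q16.16: 16 integer bits, 16 fractional bits
static const fix16 FIX_ONE = 1 << 16;

static fix16  to_fix(double d) { return (fix16)(d * FIX_ONE); }
static double to_dbl(fix16 f)  { return (double)f / FIX_ONE; }
static fix16  fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) >> 16);   // widen to 64 bits, multiply, then rescale
}

int main()
{
    fix16 a = to_fix(1.5), b = to_fix(2.25);
    std::printf("%f\n", to_dbl(fix_mul(a, b)));   // 1.5 * 2.25 = 3.375
    return 0;
}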
I haven't come across devices that support floating point in a different format than IEEE 754. From my experience, your best bet is to hope for the standard, because otherwise you usually end up building algorithms and code around the capabilities of the device. When sin(x) suddenly costs 1000 times as much, you better pick an algorithm that doesn't need it.
IEEE 754 - Consistency
The only non-portability I found here is when you expect bit-identical results across platforms. The biggest influence is the optimizer. Again, you can trade accuracy and speed for consistency. Most compilers have an option for that - e.g. "floating point consistency" in Visual C++. But note that this is always accuracy beyond the guarantees of the standard.
Why do results become inconsistent?
First, FPU registers often have higher precision than a double (e.g. 80 bits), so as long as the code generator doesn't store the value back to memory, intermediate values are held with higher accuracy.
Second, equivalences like a*(b+c) = a*b + a*c are not exact due to the limited precision. Nonetheless the optimizer, if allowed, may make use of them (a small demonstration follows below).
Also - what I learned the hard way - printing and parsing functions are not necessarily consistent across platforms, probably due to numeric inaccuracies, too.
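A small demonstration of the reordering problem. It uses associativity, (a+b)+c versus a+(b+c), a closely related identity that also fails in finite precision, just like the distributive example above:

#include <cstdio>

int main()
{
    double a = 0.1, b = 0.2, c = 0.3;
    double left  = (a + b) + c;   // this grouping rounds differently than...
    double right = a + (b + c);   // ...this one on IEEE-754 doubles
    std::printf("%.17g\n%.17g\nequal: %d\n", left, right, (int)(left == right));
    return 0;
}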
float
It is a common misconception that float operations are intrinsically faster than double. Working on large float arrays is usually faster mostly because of fewer cache misses.
Be careful with float accuracy. It can be "good enough" for a long time, but I've often seen it fail sooner than expected. Float-based FFTs can be much faster due to SIMD support, but they generate noticeable artefacts quite early for audio processing.
Use fixed point.
However, if you want to approach the realm of possibly making portable floating point operations, you at least need to use controlfp to ensure consistent FPU behavior as well as ensuring that the compiler enforces ANSI conformance with respect to floating point operations. Why ANSI? Because it's a standard.
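A minimal sketch of that idea using the _controlfp_s variant of the API on MSVC (meaningful on x86 only; the precision-control mask is not supported on x64, so treat this as illustrative):

#include <float.h>

// Force the x87 precision control to 53-bit (double) rounding.
void force_double_precision()
{
    unsigned int previous = 0;
    _controlfp_s(&previous, _PC_53, _MCW_PC);
}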
And even then you aren't guaranteeing that you can generate identical floating point behavior; that also depends on the CPU/FPU you are running on.
It shouldn't be an issue; IEEE 754 already defines all details of the layout of floats.
The maximum and minimum storable values should be defined in limits.h (float.h for the floating-point types).
Portable is one thing, generating consistent results on different platforms is another. Depending on what you are trying to do then writing portable code shouldn't be too difficult, but getting consistent results on ANY platform is practically impossible.
I believe "limits.h" will include the C library constants INT_MAX and its brethren. However, it is preferable to use "limits" and the classes it defines:
std::numeric_limits<float>, std::numeric_limits<double>, std::numeric_limits<int>, etc...
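For example, mixing both headers only to show the correspondence:

#include <climits>   // INT_MAX
#include <cfloat>    // FLT_MAX, DBL_EPSILON
#include <limits>
#include <iostream>

int main()
{
    std::cout << INT_MAX << ' ' << FLT_MAX << ' ' << DBL_EPSILON << '\n';
    std::cout << std::numeric_limits<int>::max()        << ' '
              << std::numeric_limits<float>::max()      << ' '
              << std::numeric_limits<double>::epsilon() << '\n';
    return 0;
}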
If you're assuming that you will get the same results on another system, read What could cause a deterministic process to generate floating point errors first. You might be surprised to learn that your floating point arithmetic isn't even the same across different runs on the very same machine!

Different math rounding behaviour between Linux, Mac OS X and Windows

Hi,
I developed some mixed C/C++ code with some intensive numerical calculations. When compiled on Linux and Mac OS X I get very similar results after the simulation ends. On Windows the program compiles as well, but I get very different results and sometimes the program does not seem to work.
I used GNU compilers on all systems. A friend recommended adding -frounding-math, and now the Windows version seems to run more stably, but the Linux and OS X results do not change at all.
Could you recommend other options to get better agreement between the Windows and Linux/OS X versions?
Thanks
P.S. I also tried -O0 (no optimizations) and specified -m32.
I can't speak to the implementation in Windows, but Intel chips contain 80-bit floating point registers, and can give greater precision than that specified in the IEEE-754 floating point standard. You can try calling this routine in the main() of your application (on Intel chip platforms):
#include <fpu_control.h>   // glibc (x86): _FPU_GETCW/_FPU_SETCW and the precision masks

inline void fpu_round_to_IEEE_double()
{
    unsigned short cw = 0;
    _FPU_GETCW(cw);        // get the FPU control word
    cw &= ~_FPU_EXTENDED;  // mask out 80-bit register precision
    cw |= _FPU_DOUBLE;     // mask in 64-bit register precision
    _FPU_SETCW(cw);        // set the FPU control word
}
I think this is distinct from the rounding modes discussed by @Alok.
There are four different types of rounding for floating-point numbers: round toward zero, round up, round down, and round to the nearest number. Depending upon the compiler/operating system, the default may differ between systems. For programmatically changing the rounding method, see fesetround. It is specified by the C99 standard, but may also be available to you in C++.
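A minimal sketch of changing the rounding mode with that C99/C++11 interface (<cfenv>):

#include <cfenv>
#include <cstdio>

int main()
{
    // Other modes: FE_DOWNWARD, FE_UPWARD, FE_TOWARDZERO (where supported).
    std::fesetround(FE_TONEAREST);
    std::printf("current rounding mode id: %d\n", std::fegetround());
    return 0;
}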
You can also try -ffloat-store gcc option. This will try to prevent gcc from using 80-bit floating-point values in registers.
Also, if your results change depending upon the rounding method, and the differences are significant, it means that your calculations may not be stable. Please consider doing interval analysis, or using some other method to find the problem. For more information, see How Futile are Mindless Assessments of Roundoff in Floating-Point Computation? (pdf) and The pitfalls of verifying floating-point computations (ACM link, but you can get PDF from a lot of places if that doesn't work for you).
In addition to the runtime rounding settings that people mentioned, you can control the Visual Studio compiler settings in Properties > C++ > Code Generation > Floating Point Model. I've seen cases where setting this to "Fast" may cause some bad numerical behavior (e.g. iterative methods fail to converge).
The settings are explained here:
http://msdn.microsoft.com/en-us/library/e7s85ffb%28VS.80%29.aspx
The IEEE and C/C++ standards leave some aspects of floating-point math unspecified. Yes, the precise result of adding two floats is determined, but any more complicated calculation is not. For instance, if you add three floats then the compiler can do the evaluation at float precision, double precision, or higher. Similarly, if you add three doubles then the compiler may do the evaluation at double precision or higher.
VC++ defaults to setting the x87 FPU's precision to double. I believe that gcc leaves it at 80-bit precision. Neither is clearly better, but they can easily give different results, especially if there is any instability in your calculations. In particular, 'tiny + large - large' may give very different results if you have extra bits of precision (or if the order of evaluation changes). The implications of varying intermediate precision are discussed here:
http://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/
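A small illustration of the 'tiny + large - large' pattern mentioned above; whether the float line prints 0 or a tiny value depends on how much precision the compiler keeps in the intermediate result (x87 80-bit versus SSE single precision), which is exactly the kind of variation being described:

#include <cstdio>

int main()
{
    float tiny  = 1e-8f;
    float large = 1.0f;

    float  f = tiny + large - large;           // may lose tiny entirely at float precision
    double d = (double)tiny + large - large;   // a double intermediate preserves it

    std::printf("float : %g\ndouble: %g\n", f, d);
    return 0;
}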
The challenges of deterministic floating-point are discussed here:
http://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
Floating-point math is tricky. You need to find out when your calculations diverge and examine the generated code to understand why. Only then can you decide what actions to take.