If two languages follow IEEE 754, will calculations in both languages result in the same answers? - c++

I'm in the process of converting a program from Scilab code to C++. One loop in particular is producing a slightly different result than the original Scilab code (it's a long piece of code so I'm not going to include it in the question but I'll try my best to summarise the issue below).
The problem is, each step of the loop uses calculations from the previous step. Additionally, the difference between calculations only becomes apparent around the 100,000th iteration (out of approximately 300,000).
Note: I'm comparing the output of my C++ program with the outputs of Scilab 5.5.2 using the "format(25);" command. Meaning I'm comparing 25 significant digits. I'd also like to point out I understand how precision cannot be guaranteed after a certain number of bits but read the sections below before commenting. So far, all calculations have been identical up to 25 digits between the two languages.
In attempts to get to the bottom of this issue, so far I've tried:
Examining the data type being used:
I've managed to confirm that Scilab is using IEEE 754 doubles (according to the language documentation). Also, according to Wikipedia, C++ isn't required to use IEEE 754 for doubles, but from what I can tell, everywhere I use a double in C++ it has perfectly matched Scilab's results.
Examining the use of transcendental functions:
I've also read from What Every Computer Scientist Should Know About Floating-Point Arithmetic that IEEE does not require transcendental functions to be exactly rounded. With that in mind, I've compared the results of these functions (sin(), cos(), exp()) in both languages and again, the results appear to be the same (up to 25 digits).
The use of other functions and predefined values:
I repeated the above steps for sqrt() and pow(), as well as for the value of Pi (I'm using M_PI in C++ and %pi in Scilab). Again, the results were the same.
Lastly, I've rewritten the loop (very carefully) in order to ensure that the code is identical between the two languages.
Note: Interestingly, I noticed that for all the above calculations the results between the two languages match farther than the actual result of the calculations (outside of floating point arithmetic). For example:
Value of sin(x) using Wolfram Alpha = 0.123456789.....
Value of sin(x) using Scilab & C++ = 0.12345yyyyy.....
Even once the value computed by Scilab or C++ started to differ from the actual result (from Wolfram), the two languages' results still matched each other. This leads me to believe that most of the values are being calculated in the same way in both languages, even though IEEE 754 doesn't require it.
My original thinking was that one of the first three points above is implemented differently between the two languages. But from what I can tell everything seems to produce identical results.
Is it possible that even though all the inputs to these loops are identical, the results can be different? Possibly because a very small error (past what I can see with 25 digits) is occurring that accumulates over time? If so, how can I go about fixing this issue?

No, the format of the numbering system does not guarantee equivalent answers from functions in different languages.
Functions, such as sin(x), can be implemented in different ways, using the same language (as well as different languages). The sin(x) function is an excellent example. Many implementations will use a look-up table or look-up table with interpolation. This has speed advantages. However, some implementations may use a Taylor Series to evaluate the function. Some implementations may use polynomials to come up with a close approximation.
Having the same numeric format is one hurdle to solve between languages. Function implementation is another.
Remember, you need to consider the platform as well. A program that uses an 80-bit floating point processor will have different results than a program that uses a 64-bit floating point software implementation.

Some architectures provide the capability of using extended precision floating point registers (e.g. 80 bits internally, versus 64-bit values in RAM). So, it's possible to get slightly different results for the same calculation, depending on how the computations are structured, and the optimization level used to compile the code.
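Whether intermediates are kept in extended precision can be queried from the compiler. A minimal sketch, assuming a C++11 compiler that defines FLT_EVAL_METHOD in <cfloat>; the helper names describe_eval_method and current_eval_method are made up for illustration:

```cpp
#include <cfloat>   // FLT_EVAL_METHOD
#include <string>

// Hypothetical helper: put the meaning of an FLT_EVAL_METHOD value into words.
std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "operations are evaluated in the precision of their own type";
        case 1:  return "float and double operations are evaluated as double";
        case 2:  return "all operations are evaluated as long double";
        default: return "evaluation precision is indeterminable or implementation-defined";
    }
}

// Description for the compiler and flags this file was built with.
std::string current_eval_method() {
    return describe_eval_method(FLT_EVAL_METHOD);
}
```

A value of 2 is what you would typically expect from a build that targets the x87 FPU, and 0 from a build that uses SSE2 for doubles, but that depends on the compiler and its flags.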

Yes, it's possible to get different results. It's possible even if you are using exactly the same source code in the same programming language on the same platform. Sometimes a different compiler switch is enough; for example, -ffast-math lets the compiler optimize your code for speed rather than accuracy, and if your computational problem is not well-conditioned to begin with, the result may be significantly different.
For example, suppose you have this code:
x_8th = x*x*x*x*x*x*x*x;
One way to compute this is to perform 7 multiplications. This would be the default behavior for most compilers. However, you may want to speed this up by specifying the compiler option -ffast-math, and the resulting code would have only 3 multiplications:
temp1 = x*x; temp2 = temp1*temp1; x_8th = temp2*temp2;
The result would be slightly different because finite precision arithmetic is not associative, but sufficiently close for most applications and much faster. However, if your computation is not well-conditioned that small error can quickly get amplified into a large one.
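The two evaluation orders can be written out by hand and compared. This is only a sketch: the function names are invented here, and the compiler is assumed not to reassociate the first version itself (that is, no -ffast-math).

```cpp
#include <cmath>

// Seven multiplications, evaluated left to right as the source is written.
double pow8_sequential(double x) {
    return x * x * x * x * x * x * x * x;
}

// Three multiplications by repeated squaring (the rewritten form).
double pow8_squaring(double x) {
    double t1 = x * x;
    double t2 = t1 * t1;
    return t2 * t2;
}

// Relative difference between the two results for a nonzero x.
// It may be exactly zero for some inputs (for example powers of two).
double relative_gap(double x) {
    double a = pow8_sequential(x);
    double b = pow8_squaring(x);
    return std::fabs(a - b) / std::fabs(a);
}
```

For a well-behaved input the gap, when there is one, is a few units in the last place; it only matters when later steps amplify it.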

Note that it is possible that the Scilab and C++ are not using the exact same instruction sequence, or that one uses FPU and the other uses SSE, so there may not be a way to get them to be exactly the same.
As commented by IInspectable, if your compiler has _control87() or something similar, you can use it to change the precision and/or rounding settings. You could try combinations of this to see if it has any effect, but again, even if you manage to get the settings identical for Scilab and C++, differences in the actual instruction sequences may be the issue.
http://msdn.microsoft.com/en-us/library/e9b52ceh.aspx
If SSE is used, I'm not sure what can be adjusted as I don't think SSE has an 80 bit precision mode.
In the case of using FPU in 32 bit mode, and if your compiler doesn't have something like _control87, you could use assembly code. If inline assembly is not allowed, you would need to call an assembly function. This example is from an old test program:
static short fcw; /* 16 bit floating point control word */
/* ... */
/* set precision control to extended precision */
__asm{
fnstcw fcw
or fcw,0300h
fldcw fcw
}
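For comparison, standard C++ has no portable call for the x87 precision field; <cfenv> only exposes the rounding direction. A small sketch that reports the current rounding mode (the helper names are made up for illustration, and this does not replace _control87 or the assembly above):

```cpp
#include <cfenv>
#include <string>

// Hypothetical helper: name a rounding-direction constant from <cfenv>.
std::string rounding_mode_name(int mode) {
    switch (mode) {
#ifdef FE_TONEAREST
        case FE_TONEAREST:  return "to nearest";
#endif
#ifdef FE_UPWARD
        case FE_UPWARD:     return "upward";
#endif
#ifdef FE_DOWNWARD
        case FE_DOWNWARD:   return "downward";
#endif
#ifdef FE_TOWARDZERO
        case FE_TOWARDZERO: return "toward zero";
#endif
        default:            return "unknown";
    }
}

// Rounding direction currently in effect for this thread.
std::string current_rounding_mode() {
    return rounding_mode_name(std::fegetround());
}
```

If the host application (Scilab, MATLAB, a plugin loader) has changed the rounding direction before your code runs, this is one way to notice it.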

Related

Is cos(x) required to return identical values in different C++ implementations that use IEEE-754?

Is there any sort of guarantee - either in the C++ standard or in some other document - that C++ code computing cos(x) will produce identical values when compiled with g++, clang, MSVC, etc., assuming those implementations are using IEEE-754 64-bit doubles and the input value x is exactly equal? My assumption is "yes," but I'd like to confirm that before relying on this behavior.
Context: I'm teaching a course in which students may need to compute trigonometric functions of inputs. I can assure that those inputs are identical when fed into the functions. I'm aware that equality-testing doubles is not a good idea, but in this specific case I was wondering if it was safe to do so.
cos is a transcendental function. Transcendental functions are subject to the table-maker's dilemma. Informally, what this means is: let's say you come up with some iterative algorithm for approximating the cosine of an input value: for example, a Taylor series. When you run this iterative algorithm, you have to decide how much extra precision to keep at the intermediate stages (rounding too early may reduce the accuracy of the final result). But because the function is transcendental, it's very difficult to determine how many extra bits must be carried during the calculation in order to yield a correctly rounded final result, and for some input values, the number of extra bits required might be very large.
For this reason, it is generally not practical to design hardware that guarantees correctly rounded results for transcendental functions such as cos (where "correctly rounded" means that the resulting floating point value is the one that's closest to the true real value of the function). Instead, the hardware designers will implement a calculation technique that performs reasonably well and that, for most practical input values, will yield a result that is within 1 bit of the exact real result. (If you absolutely need a cosine function that always yields a correctly rounded result, then apparently it's possible to implement one: GNU MPFR claims to have done it. But this will perform much worse than hardware.)
IEEE 754 (2008) lists cos as one of the "recommended correctly rounded functions", which means that if you implement IEEE 754's version of cos, then you have to yield a correctly rounded result. But these functions are only "recommended" to be provided, and not required. Therefore, a conforming implementation of IEEE 754 might not provide a correctly rounded cos function, and might instead provide a "practical" cos function as described in the previous paragraph. Therefore, in practice, two implementations of C++ which both claim to be IEEE 754 compliant may not yield the exact same value for a transcendental function such as cos when applied to the same argument.
(Note that IEEE 754 requires implementations to provide a square root function that is correctly rounded. This is not a transcendental function, so correctly rounding it is not nearly as difficult.)
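Given that, a check of students' results is probably safer if it accepts the neighbouring double as well as the exact one. A minimal sketch, assuming a one-ulp tolerance is acceptable for the course; within_one_ulp is a hypothetical helper, not a standard function:

```cpp
#include <cmath>

// True if a and b are the same double or adjacent doubles (at most 1 ulp apart).
bool within_one_ulp(double a, double b) {
    if (std::isnan(a) || std::isnan(b)) return false;
    // nextafter(a, b) is the representable value next to a in the direction of b.
    return a == b || std::nextafter(a, b) == b;
}
```

A grader would then call something like within_one_ulp(std::cos(x), reference) instead of comparing with ==.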

`std::sin` is wrong in the last bit

I am porting some program from Matlab to C++ for efficiency. It is important for the output of both programs to be exactly the same (**).
I am facing different results for this operation:
std::sin(0.497418836818383950) = 0.477158760259608410 (C++)
sin(0.497418836818383950) = 0.47715876025960846000 (Matlab)
N[Sin[0.497418836818383950], 20] = 0.477158760259608433 (Mathematica)
So, as far as I know, both C++ and Matlab use IEEE754-defined double arithmetic. I think I have read somewhere that IEEE754 allows different results in the last bit. Using Mathematica to decide, it seems that C++ is closer to the true result. How can I force Matlab to compute the sin with precision to the last bit included, so that the results are the same?
In my program this behaviour leads to big errors because the numerical differential equation solver keeps amplifying this error in the last bit. However, I am not sure that the ported C++ version is correct. I am guessing that even if IEEE754 allows the last bit to be different, it somehow guarantees that this error does not get bigger when the result is used in further IEEE754-defined double operations (because otherwise, two different programs that are both correct according to the IEEE754 standard could produce completely different outputs). So the other question is: am I right about this?
I would like to get an answer to both bolded questions. Edit: The first question is proving quite controversial, but it is the less important one; can someone comment on the second one?
Note: This is not an error in the printing, just in case you want to check, this is how I obtained these results:
http://i.imgur.com/cy5ToYy.png
Note (**): What I mean by this is that the final output, which is the result of some calculations showing some real numbers with 4 decimal places, needs to be exactly the same. The error I talk about in the question gets bigger (because of more operations, each of which is different in Matlab and in C++), so the final differences are huge. (If you are curious enough to see how the difference starts getting bigger, here is the full output [link soon], but this has nothing to do with the question.)
Firstly, if your numerical method depends on the accuracy of sin to the last bit, then you probably need to use an arbitrary precision library, such as MPFR.
The IEEE754 2008 standard doesn't require that the functions be correctly rounded (it does "recommend" it though). Some C libms do provide correctly rounded trigonometric functions: I believe that the glibc libm does (typically used on most linux distributions), as does CRlibm. Most other modern libms will provide trig functions that are within 1 ulp (i.e. one of the two floating point values either side of the true value), often termed faithfully rounded, which is much quicker to compute.
None of those values you printed could actually arise as IEEE 64bit floating point values (even if rounded): the 3 nearest (printed to full precision) are:
0.477158760259608 405451814405751065351068973541259765625
0.477158760259608 46096296563700889237225055694580078125
0.477158760259608 516474116868266719393432140350341796875
The possible values you could want are:
1. The exact sin of the decimal .497418836818383950, which is
0.477158760259608 433132061388630377105954125778369485736356219...
(this appears to be what Mathematica gives).
2. The exact sin of the 64-bit float nearest .497418836818383950:
0.477158760259608 430531153841011107415427334794384396325832953...
In both cases, the first of the above list is the nearest (though only barely in the case of 1).
The sine of the double constant you wrote is about 0x1.e89c4e59427b173a8753edbcb95p-2, whose nearest double is 0x1.e89c4e59427b1p-2. To 20 decimal places, the two closest doubles are 0.47715876025960840545 and 0.47715876025960846096.
Perhaps Matlab is displaying a truncated value? (EDIT: I now see that the fourth-last digit is a 6, not a 0. Matlab is giving you a result that's still faithfully-rounded, but it's the farther of the two closest doubles to the desired result. And it's still printing out the wrong number.)
I should also point out that Mathematica is probably trying to solve a different problem---compute the sine of the decimal number 0.497418836818383950 to 20 decimal places. You should not expect this to match either the C++ code's result or Matlab's result.
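To see which double a program actually holds, rather than a rounded display of it, the value can be printed in a form that reads back unchanged. A sketch with two invented helpers: 17 significant decimal digits are enough to round-trip any IEEE 754 double, and the "%a" conversion writes the stored bits as a hexadecimal floating-point number.

```cpp
#include <cstdio>
#include <string>

// Decimal form with 17 significant digits, enough to recover the same double.
std::string to_decimal17(double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.17g", value);
    return buf;
}

// Hexadecimal floating-point form, an exact rendering of the stored value.
std::string to_hexfloat(double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", value);
    return buf;
}
```

The exact spelling of the "%a" output can vary between C libraries, but either string, fed back through strtod, should give back the same double.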

Compile-time vs runtime constants

I'm currently developing my own math lib to improve my c++ skills. I stumbled over boost's constants header file and I'm asking myself what is the point of using compile-time constants over runtime declared constants?
const float root_two = 1.414213562373095048801688724209698078e+00;
const float root_two = std::sqrt( 2.0f );
Isn't there an error introduced when using the fixed compile-time constant but calculations while running the application with functions?
Wouldn't the error then be negligible if you use runtime constants?
1. As HansPassant said, it may save you a micro-Watt. However, note that the compiler will sometimes optimize that away by evaluating the expression during compilation and substituting in the literal value. See this answer to my earlier question about this.
2. Isn't there an error introduced when using the fixed compile-time constant?
If you are using arbitrary-precision data types, perhaps. But it is more efficient to use plain data types like double, and these are limited to about 16 decimal digits of precision anyway.
3. Based on (2), your second initialization would not be more precise than your first one. In fact, if you precomputed the value of the square root with an arbitrary precision calculator, the literal may even be more precise.
A library such as Boost must work in tons of environments. The application that uses the library could have set the FPU to flush-to-zero mode, giving you 0.0 for denormalized (tiny) results.
Or the application could have been compiled with the -ffast-math flag, giving inaccurate results.
Furthermore, a runtime computation of (a + b + c) depends on how the compiler-generated code stores intermediate results. It might choose to pop (a + b) from the FPU as a 64-bit double, or it could leave it on the FPU stack as 80 bits. It depends on tons of things, including algebraic rewrites that rely on associativity.
All in all, if you mix different processors, operating systems, compilers and the different applications the library is used inside, you will get a different result for each permutation of the above.
In some (rare) situations this is not wanted; you may need an exact constant value.
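The two initializations from the question can be put side by side. This is only an illustration; the names root_two_literal, root_two_runtime and constants_agree are invented here.

```cpp
#include <cmath>

// Literal form: the compiler converts the decimal digits to a float at build time,
// independent of the FPU state of the program that later uses it.
const float root_two_literal = 1.414213562373095048801688724209698078e+00;

// Runtime form: computed when called, under whatever FPU mode and math flags are active.
float root_two_runtime() {
    return std::sqrt(2.0f);
}

// Whether the two agree in the current build and environment.
bool constants_agree() {
    return root_two_literal == root_two_runtime();
}
```

On an ordinary IEEE 754 build the two would normally be expected to agree, since sqrt is required to be correctly rounded; the point of the literal is that it does not depend on that expectation holding in every environment.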

Is there any way to make sure the output of the float-point the same in different OS?

Here is my code:
int a = 0x451998a0;
float b = *((float *)&a);
printf("coverto float: %f, %.10lf\n", b, b);
In windows the output is:
coverto float: 2457.539063, 2457.5390625000
In linux the output is:
coverto float: 2457.539062, 2457.5390625000
Is there any way to make sure the output is the same?
The behavior you're seeing is just a consequence of the fact that Windows' printf() function is implemented differently from Linux's printf() function. Most likely the difference is in how printf() implements number rounding.
How printf() works under the hood in either system is an implementation detail; thus the system is not likely to provide such fine-grained control on how printf() displays the floating point values.
There are two ways that may work to keep them the same:
Use more precision during calculation than while displaying it. For example, some scientific and graphing calculators use double precision for all internal calculations, but display the results with only float precision.
Use a cross-platform printf() library. Such libraries would most likely have the same behavior on all platforms, as the calculations required to determine what digits to display are usually platform-agnostic.
However, this really isn't as big of a problem as you think it is. The difference between the outputs is 0.000001. That is a ~0.00000004% difference relative to either of the two values. The display error is really quite negligible.
Consider this: the distance between Los Angeles and New York is 2464 miles, which is of the same order of magnitude as the numbers in your display outputs. A difference of 0.000001 miles is 1.61 millimeters. We of course don't measure distances between cities with anywhere near that kind of precision. :-)
If you use the same printf() implementation, there's a good chance they'll show the same output. Depending on what you're up to, it may be easier to use GNU GCC on both OSes, or to get printf() source code and add it to your project (you should have no trouble googling one).
BTW - have you actually checked what that hex number encodes? Should it round up or down? The 625 thing is likely itself rounded, so you shouldn't assume it should round to 63....
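The hex pattern can be decoded by hand. 0x451998a0 has sign 0, exponent field 138 (a scale of 2^11) and fraction 0x1998a0 = 1677472, so the value is (2^23 + 1677472) / 2^12 = 10066080 / 4096 = 2457.5390625 exactly. The seventh decimal is therefore an exact 5, a tie, and the two outputs in the question are consistent with two different tie-breaking rules (round half to even gives ...062, round half away from zero gives ...063). A sketch that reproduces the decoding, using memcpy instead of the pointer cast; float_from_bits is a made-up helper name:

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as an IEEE 754 single via memcpy,
// which avoids the aliasing problem of casting an int* to a float*.
float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Comparing float_from_bits(0x451998a0u) against the literal 2457.5390625f confirms that the stored value sits exactly on the tie.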
The obvious answer is to use less precision in your output. In general,
if there's any calculation involved, you can't even be sure that the
actual floating point values are identical. And how printf and
ostream round is implementation defined, even if the floating point
values are equal.
In general, C++ doesn't guarantee that two implementations produce the
same results. In this particular case, if it's important, you can do
the rounding by hand, before doing the conversion, but you'll still have
occasional problems because the actual floating point values will be
different. This may, in fact, occur even with different levels of
optimization with the same compiler. So anything you try (other than
writing the entire program in assembler) is bound to be a losing battle
in the end.
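"Rounding by hand" could look roughly like the sketch below: scale to an integer with a fixed tie rule, then print the integer parts. The function name format_fixed6 is invented, the range is limited to values whose scaled form fits in a long long, and the multiplication by 1e6 is itself a rounded floating-point operation, so this pins down the tie-breaking rule but not the underlying value.

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Format with six decimals, rounding halves away from zero regardless of the
// platform's printf. Assumes |value| * 1e6 fits in a long long.
std::string format_fixed6(double value) {
    long long scaled = std::llround(value * 1e6);   // llround: ties go away from zero
    bool negative = scaled < 0;
    unsigned long long magnitude =
        negative ? 0ULL - static_cast<unsigned long long>(scaled)
                 : static_cast<unsigned long long>(scaled);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s%llu.%06llu",
                  negative ? "-" : "",
                  magnitude / 1000000ULL, magnitude % 1000000ULL);
    return buf;
}
```

With the value from the question, format_fixed6(2457.5390625) gives "2457.539063" wherever it is compiled, because the only library formatting left is that of integers.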

precision differences in matlab and c++

I am trying to make equivalence tests on an algorithm written in C++ and in Matlab.
The algorithm contains some kind of a loop in time and runs more than 1000 times. It has arithmetic operations and some math functions.
I feed the initial inputs to both platforms by hand (like a=1.767, b=6.65, ...) and when I check the hexadecimal representations of those inputs they are the same. So no problem for inputs. I then pass the outputs of C++ to Matlab through a text file with 16 decimal digits. (I use the "setprecision(32)" statement.)
But here comes the problem: although all the results are exactly the same through the 614th step of both codes, on step 615 I get a difference of about 2.xxx..xxe-19. After this step the error becomes larger and larger, and at the end of the runs it is about 5.xx..xxe-14.
0x3ff1 3e42 a211 6cca--->[C++ function]--->0x3ff4 7619 7005 5a42
0x3ff1 3e42 a211 6cca--->[MATLAB function]--->ans
ans - 0x3ff4 7619 7005 5a42
= 2.xxx..xxe-19
I searched for how Matlab handles numbers and found really interesting things like "denormalized mantissa". While realmin is about e-308, by denormalizing the mantissa Matlab has a smallest real number of about e-324. Further, Matlab holds many more digits for "pi" or "exp(1)" than C++ does.
On the other hand, matlab help says that whatever the format it displays, matlab uses the double precision internally.
So,I'd really appreciate if someone explains what the exact reason is for these differences? How can we make equivalence tests on matlab and c++?
There is one thing in x86 CPU about floating points numbers. Internally, the floating point unit uses registers that are 10 bytes, i.e. 80 bits. Furthermore, the CPU has a setting that tells whether the floating point calculations should be made with 32 bits (float), 64 bits (double) or 80 bits precision. Less precision meaning faster executed floating point operations. (The 32 bits mode used to be popular for video games, where speed takes over precision).
From this I remember tracking down a bug in a calculation library (dll) that, given the same input, did not give the same result depending on whether it was started from a test C++ executable or from MatLab. Furthermore, this did not happen in Debug mode, only in Release!
The final conclusion was that MatLab set the CPU floating point precision to 80 bits, whereas our test executable did not (and left the default 64 bits precision). Furthermore, this calculation mismatch did not happen in Debug mode because all the variables were written to memory as 64-bit double variables and reloaded from there afterward, discarding the additional 16 bits. In Release mode, some variables were optimized out (not written to memory), and all calculations were done in floating point registers only, at 80 bits, keeping the additional 16 bits.
Don't know if this helps, but maybe worth knowing.
A similar discussion occurred before, the conclusion was that IEEE 754 tolerates error in the last bit for transcendental functions (cos, sin, exp, etc..). So you can't expect exactly same results between MATLAB and C (not even same C code compiled in different compilers).
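How a last-bit discrepancy turns into a visible one depends on the loop. As an illustration only, the sketch below uses the logistic map x -> 4x(1-x) as a hypothetical stand-in for an error-amplifying iteration (it is not the algorithm from the question) and tracks two runs whose starting values differ by one ulp:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for an error-amplifying loop body.
double logistic_step(double x) {
    return 4.0 * x * (1.0 - x);
}

// Largest gap seen between two runs whose starting values differ by one ulp.
double max_gap(double x0, int steps) {
    double a = x0;
    double b = std::nextafter(x0, 1.0);   // neighbouring double, one ulp above x0
    double worst = std::fabs(a - b);
    for (int i = 0; i < steps; ++i) {
        a = logistic_step(a);
        b = logistic_step(b);
        worst = std::max(worst, std::fabs(a - b));
    }
    return worst;
}
```

For this map the gap roughly doubles per step on average, so a difference far below what any printout shows becomes visible within a few dozen iterations. A well-conditioned loop would not behave this way, which is why the same one-ulp disagreement is harmless in some programs and fatal to bit-for-bit comparison in others.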
I may be way off track here and you may already have investigated this possibility but it could be possible that there are differences between C++ and Matlab in the way that the mathematical library functions (sin() cos() and exp() that you mention) are implemented internally. Ultimately, some kind of functional approximation must be being used to generate function values and if there is some difference between these methods then presumably it is possible that this manifests itself in the form of numerical rounding error over a large number of iterations.
This question basically covers what I am trying to suggest: How does C compute sin() and other math functions?
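For the equivalence test itself, it may help to exchange and compare values as raw bit patterns, in the same notation the question uses (0x3ff4 7619 7005 5a42), rather than through decimal text, since 16 decimal digits are not always enough to reproduce a double. A sketch with invented helper names; ulps_between is only meaningful for finite values of the same sign:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Raw IEEE 754 bit pattern of a double.
std::uint64_t bits_of(double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// The pattern as 16 lowercase hex digits.
std::string to_hex_bits(double value) {
    char buf[17];
    std::snprintf(buf, sizeof buf, "%016llx",
                  static_cast<unsigned long long>(bits_of(value)));
    return buf;
}

// How many representable doubles apart a and b are (finite, same sign only).
std::uint64_t ulps_between(double a, double b) {
    std::uint64_t x = bits_of(a), y = bits_of(b);
    return x > y ? x - y : y - x;
}
```

Reporting the step-615 discrepancy as a count of ulps rather than as an absolute difference makes it easier to tell a last-bit library difference from a genuine porting bug.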