`std::sin` is wrong in the last bit - C++

I am porting some program from Matlab to C++ for efficiency. It is important for the output of both programs to be exactly the same (**).
I am facing different results for this operation:
std::sin(0.497418836818383950) = 0.477158760259608410 (C++)
sin(0.497418836818383950) = 0.47715876025960846000 (Matlab)
N[Sin[0.497418836818383950], 20] = 0.477158760259608433 (Mathematica)
So, as far as I know, both C++ and Matlab use IEEE 754 double arithmetic. I think I have read somewhere that IEEE 754 allows different results in the last bit. Using Mathematica to decide, it seems that C++ is closer to the true result. **How can I force Matlab to compute the sin with precision to the last bit included, so that the results are the same?**
In my program this behaviour leads to big errors, because the numerical differential equation solver keeps amplifying this last-bit error. However, I am not sure that the C++ port is the correct one. I am guessing that even though IEEE 754 allows the last bit to differ, it somehow guarantees that this error does not grow when the result is used in further IEEE 754 double operations (because otherwise, two different programs, both correct according to the IEEE 754 standard, could produce completely different outputs). So the other question is: **am I right about this?**
I would like to get an answer to both bolded questions. Edit: The first question has turned out to be quite controversial, but it is the less important one; can someone comment on the second one?
Note: This is not an error in the printing, just in case you want to check, this is how I obtained these results:
http://i.imgur.com/cy5ToYy.png
Note (**): What I mean by this is that the final outputs, which are the results of some calculations displayed as real numbers with 4 decimal places, need to be exactly the same. The error I talk about in the question gets bigger (because of more operations, each of which differs between Matlab and C++), so the final differences are huge. (If you are curious enough to see how the difference starts growing, here is the full output [link soon], but this has nothing to do with the question.)

Firstly, if your numerical method depends on the accuracy of sin to the last bit, then you probably need to use an arbitrary precision library, such as MPFR.
The IEEE 754-2008 standard doesn't require that the functions be correctly rounded (it does "recommend" it, though). Some C libms do provide correctly rounded trigonometric functions: I believe that the glibc libm (used on most Linux distributions) does, as does CRlibm. Most other modern libms will provide trig functions that are within 1 ulp (i.e. one of the two floating-point values on either side of the true value), often termed faithfully rounded, which is much quicker to compute.
None of those values you printed could actually arise as IEEE 64-bit floating point values (even if rounded): the 3 nearest (printed to full precision) are:
0.477158760259608 405451814405751065351068973541259765625
0.477158760259608 46096296563700889237225055694580078125
0.477158760259608 516474116868266719393432140350341796875
The possible values you could want are:
1. The exact sin of the decimal .497418836818383950, which is
0.477158760259608 433132061388630377105954125778369485736356219...
(this appears to be what Mathematica gives).
2. The exact sin of the 64-bit float nearest .497418836818383950:
0.477158760259608 430531153841011107415427334794384396325832953...
In both cases, the first of the above list is the nearest (though only barely in the case of 1).
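As a quick way to check this on your own machine, here is a minimal C++ sketch (assuming IEEE 754 doubles and a printf that supports %a) that prints the double std::sin actually returns together with its two neighbouring doubles; if the correctly rounded value from a trusted reference (MPFR, Mathematica, ...) is the returned value or one of its neighbours, the result is faithfully rounded, i.e. within 1 ulp:
#include <cmath>
#include <cstdio>

int main() {
    double x = 0.497418836818383950;
    double y = std::sin(x);
    double down = std::nextafter(y, -INFINITY);
    double up = std::nextafter(y, INFINITY);
    // Compare these three candidates against a trusted external reference.
    std::printf("returned : %.17g  (%a)\n", y, y);
    std::printf("next down: %.17g  (%a)\n", down, down);
    std::printf("next up  : %.17g  (%a)\n", up, up);
}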

The sine of the double constant you wrote is about 0x1.e89c4e59427b173a8753edbcb95p-2, whose nearest double is 0x1.e89c4e59427b1p-2. To 20 decimal places, the two closest doubles are 0.47715876025960840545 and 0.47715876025960846096.
Perhaps Matlab is displaying a truncated value? (EDIT: I now see that the fourth-last digit is a 6, not a 0. Matlab is giving you a result that's still faithfully-rounded, but it's the farther of the two closest doubles to the desired result. And it's still printing out the wrong number.)
I should also point out that Mathematica is probably trying to solve a different problem---compute the sine of the decimal number 0.497418836818383950 to 20 decimal places. You should not expect this to match either the C++ code's result or Matlab's result.

Related

Fortran - want to round to one decimal point

In Fortran I have to round latitude and longitude to one digit after the decimal point.
I am using the gfortran compiler and the nint function, but the following does not work:
print *, nint( 1.40 * 10. ) / 10. ! prints 1.39999998
print *, nint( 1.49 * 10. ) / 10. ! prints 1.50000000
Looking for both general and specific solutions here. For example:
How can we display numbers rounded to one decimal place?
How can we store such rounded numbers in Fortran? It's not possible in a float variable, but are there other ways?
How can we write such numbers to NetCDF?
How can we write such numbers to a CSV or text file?
As others have said, the issue is the use of floating point representation in the NetCDF file. Using nco utilities, you can change the latitude/longitude to short integers with scale_factor and add_offset. Like this:
ncap2 -s 'latitude=pack(latitude, 0.1, 0); longitude=pack(longitude, 0.1, 0);' old.nc new.nc
There is no way to do what you are asking. The underlying problem is that the rounded values you desire are not necessarily able to be represented using floating point.
For example, if you had a value 10.58, this is represented as approximately 1.3225000 x 2^3 = 10.580000 in IEEE754 float32.
When you round this value to one decimal point (however you choose to do so), the result would be 10.6; however, 10.6 does not have an exact representation. The nearest representation is 1.3249999 x 2^3 = 10.599999 in float32. So no matter how you deal with the rounding, there is no way to store 10.6 exactly in a float32 value, and no way to write it as a floating point value into a netCDF file.
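The same point can be seen directly in a couple of lines of C++ (a minimal sketch, assuming IEEE 754 float and double):
#include <cstdio>

int main() {
    float f = 10.6f;    // nearest float32 to 10.6
    double d = 10.6;    // nearest float64 to 10.6
    std::printf("%.10f\n", f);   // prints 10.6000003815 (slightly above 10.6)
    std::printf("%.20f\n", d);   // prints a value slightly below 10.6
}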
YES, IT CAN BE DONE! The "accepted" answer above is correct in its limited range, but is wrong about what you can actually accomplish in Fortran (or various other HGL's).
The only question is what price you are willing to pay if something like a Write with F6.1 fails.
From one perspective, your problem is a particularly trivial variation on the subject of "Arbitrary Precision" computing. How do you imagine cryptography is handled when you need to store, manipulate, and perform "math" with, say, 1024 bit numbers, with exact precision?
A simple strategy in this case would be to separate each number into its constituent "LHSofD" (Left Hand Side of Decimal), and "RHSofD" values. For example, you might have an RLon(i,j) = 105.591, and would like to print 105.6 (or any manner of rounding) to your netCDF (or any normal) file. Split this into RLonLHS(i,j) = 105, and RLonRHS(i,j) = 591.
... at this point you have choices that increase generality, but at some expense. To save "money" the RHS might be retained as 0.591 (but lose generality if you need to do fancier things).
For simplicity, assume the "cheap and cheerful" second strategy.
The LHS is easy (Int()).
Now, for the RHS, multiply by 10 (if, you wish to round to 1 DEC), e.g. to arrive at RLonRHS(i,j) = 5.91, and then apply Fortran "round to nearest Int" NInt() intrinsic ... leaving you with RLonRHS(i,j) = 6.0.
... and Bob's your uncle:
Now you print the LHS and RHS to your netCDF using a suitable Write statement concatenating the "duals", and will create an EXACT representation as per the required objectives in the OP.
... of course later reading-in those values returns to the same issues as illustrated above, unless the read-in also is ArbPrec aware.
... we wrote our own ArbPrec lib, but there are several about, also in VBA and other HGL's ... but be warned a full ArbPrec bit of machinery is a non-trivial matter ... lucky your problem is so simple.
There are several aspects one can consider in relation to "rounding to one decimal place". These relate to: internal storage and manipulation; display and interchange.
Display and interchange
The simplest aspects cover how we report a stored value, regardless of the internal representation used. As covered in depth in other answers and elsewhere we can use a numeric edit descriptor with a single fractional digit:
print '(F0.1,2X,F0.1)', 10.3, 10.17
end
How the output is rounded is a changeable mode:
print '(RU,F0.1,2X,RD,F0.1)', 10.17, 10.17
end
In this example we've chosen to round up and then down, but we could also round to zero or round to nearest (or let the compiler choose for us).
For any formatted output, whether to screen or file, such edit descriptors are available. A G edit descriptor, such as one may use to write CSV files, will also do this rounding.
For unformatted output this concept of rounding is not applicable as the internal representation is referenced. Equally for an interchange format such as NetCDF and HDF5 we do not have this rounding.
For NetCDF your attribute convention may specify something like FORTRAN_format which gives an appropriate format for ultimate display of the (default) real, non-rounded variable.
Internal storage
Other answers and the question itself mention the impossibility of accurately representing (and working with) decimal digits. However, nothing in the Fortran language requires this to be impossible:
integer, parameter :: rk = SELECTED_REAL_KIND(radix=10)
real(rk) x
x = 0.1_rk
print *, x
end
is a Fortran program which has a radix-10 variable and literal constant. See also IEEE_SELECTED_REAL_KIND(radix=10).
Now, you are exceptionally likely to see that selected_real_kind(radix=10) gives you the value -5, but if you want something positive that can be used as a type parameter you just need to find someone offering you such a system.
If you aren't able to find such a thing then you will need to work accounting for errors. There are two parts to consider here.
The intrinsic real numerical types in Fortran are floating point ones. To use a fixed point numeric type, or a system like binary-coded decimal, you will need to resort to non-intrinsic types. Such a topic is beyond the scope of this answer, but pointers are made in that direction by DrOli.
These efforts will not be computationally/programmer-time cheap. You will also need to take care of managing these types in your output and interchange.
Depending on the requirements of your work, you may find simply scaling by (powers of) ten and working on integers suits. In such cases, you will also want to find the corresponding NetCDF attribute in your convention, such as scale_factor.
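For illustration, a minimal sketch of that scaled-integer idea in C++ (the Fortran version would use nint and integer arrays the same way; the variable names here are made up for the example):
#include <cmath>
#include <cstdio>

int main() {
    double lon = 10.17;
    long lon_tenths = std::lround(lon * 10.0);   // 102 tenths, stored exactly as an integer
    // Format the rounded value without ever holding "10.2" in a float
    // (negative values would need extra handling of the sign and remainder).
    std::printf("%ld.%ld\n", lon_tenths / 10, lon_tenths % 10);   // prints 10.2
}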
Relating to our internal representation concerns we have similar rounding issues to output. For example, if my input data has a longitude of 10.17... but I want to round it in my internal representation to (the nearest representable value to) a single decimal digit (say 10.2/10.1999998) and then work through with that, how do I manage that?
We've seen how nint(10.17*10)/10. gives us this, but we've also learned something about how numeric edit descriptors do this nicely for output, including controlling the rounding mode:
character(10) :: intermediate
real :: rounded
write(intermediate, '(RN,F0.1)') 10.17
read(intermediate, *) rounded
print *, rounded ! This may look not "exact"
end
We can track the accumulation of errors here if this is desired.
The expression `round_x = nint(x*10d0)/10d0` rounds x (for abs(x) < 2**31/10; for larger numbers use dnint()) and assigns the rounded value to the round_x variable for further calculations.
As mentioned in the answers above, not all numbers with one significant digit after the decimal point have an exact representation, for example, 0.3 does not.
print *, 0.3d0
Output:
0.29999999999999999
To output a rounded value to a file or to the screen, or to convert it to a string with a single digit after the decimal point, use the edit descriptor 'Fw.1' (w is the field width in characters; w = 0 means variable width). For example:
print '(5(1x, f0.1))', 1.30, 1.31, 1.35, 1.39, 345.46
Output:
1.3 1.3 1.4 1.4 345.5
#JohnE, using 'G10.2' is incorrect: it rounds the result to two significant digits, not to one digit after the decimal point. E.g.:
print '(g10.2)', 345.46
Output:
0.35E+03
P.S.
For NetCDF, rounding should be handled by the NetCDF viewer; however, you can output variables as NC_STRING type:
write(NetCDF_out_string, '(F0.1)') 1.49
Or, alternatively, get "beautiful" NC_FLOAT/NC_DOUBLE numbers:
beautiful_float_x = nint(x*10.)/10. + epsilon(1.)*nint(x*10.)/10./2.
beautiful_double_x = dnint(x*10d0)/10d0 + epsilon(1d0)*dnint(x*10d0)/10d0/2d0
P.P.S. #JohnE
1. The preferred solution is not to round intermediate results in memory or in files; rounding is performed only when the final output of human-readable data is issued. Use print with the edit descriptor 'Fw.1', see above.
2. There are no simple and reliable ways to accurately store rounded numbers (numbers with a decimal fixed point):
2.1. Theoretically, some Fortran implementations could support decimal arithmetic, but I am not aware of implementations in which 'selected_real_kind(4, 4, 10)' returns a value other than -5;
2.2. It is possible to store rounded numbers as strings;
2.3. You can use the Fortran binding of the GMP library; functions with the mpq_ prefix are designed to work with rational numbers.
3. There are no simple and reliable ways to write rounded numbers to a netCDF file while preserving their properties for the reader of that file:
3.1. netCDF supports 'Packed Data Values', i.e. you can define an integer variable with the attributes 'scale_factor' and 'add_offset' and save arrays of integers. But in the file 'scale_factor' will be stored as a single- or double-precision floating-point number, i.e. its value will differ from 0.1. Accordingly, when the netCDF library computes unpacked_data_value = packed_data_value*scale_factor + add_offset on reading, there will be a rounding error. (You can set scale_factor=0.1*(1.+epsilon(1.)) or scale_factor=0.1d0*(1d0+epsilon(1d0)) to avoid a long run of '9' digits.);
3.2. There are C_format and FORTRAN_format attributes, but it is quite difficult to predict which reader will use which attribute, and whether they will be used at all;
3.3. You can store rounded numbers as strings or user-defined types.
4. Use write() with the edit descriptor 'Fw.1', see above.

If two languages follow IEEE 754, will calculations in both languages result in the same answers?

I'm in the process of converting a program from Scilab code to C++. One loop in particular is producing a slightly different result than the original Scilab code (it's a long piece of code so I'm not going to include it in the question but I'll try my best to summarise the issue below).
The problem is, each step of the loop uses calculations from the previous step. Additionally, the difference between calculations only becomes apparent around the 100,000th iteration (out of approximately 300,000).
Note: I'm comparing the output of my C++ program with the outputs of Scilab 5.5.2 using the "format(25);" command. Meaning I'm comparing 25 significant digits. I'd also like to point out I understand how precision cannot be guaranteed after a certain number of bits but read the sections below before commenting. So far, all calculations have been identical up to 25 digits between the two languages.
In attempts to get to the bottom of this issue, so far I've tried:
Examining the data type being used:
I've managed to confirm that Scilab is using IEEE 754 doubles (according to the language documentation). Also, according to Wikipedia, C++ isn't required to use IEEE 754 for doubles, but from what I can tell, everywhere I use a double in C++ it has perfectly matched Scilab's results.
Examining the use of transcendental functions:
I've also read from What Every Computer Scientist Should Know About Floating-Point Arithmetic that IEEE does not require transcendental functions to be exactly rounded. With that in mind, I've compared the results of these functions (sin(), cos(), exp()) in both languages and again, the results appear to be the same (up to 25 digits).
The use of other functions and predefined values:
I repeated the above steps for the use of sqrt() and pow(). As well as the value of Pi (I'm using M_PI in C++ and %pi in Scilab). Again, the results were the same.
Lastly, I've rewritten the loop (very carefully) in order to ensure that the code is identical between the two languages.
Note: Interestingly, I noticed that for all the above calculations the results of the two languages match each other to more digits than they match the true result of the calculation (i.e. the result computed outside of floating-point arithmetic). For example:
Value of sin(x) using Wolfram Alpha = 0.123456789.....
Value of sin(x) using Scilab & C++ = 0.12345yyyyy.....
Even once the value computed using Scilab or C++ started to differ from the actual result (from Wolfram), each language's result still matched the other's. This leads me to believe that most of the values are being calculated in the same way by the two languages, even though they're not required to be by IEEE 754.
My original thinking was that one of the first three points above is implemented differently between the two languages. But from what I can tell everything seems to produce identical results.
Is it possible that even though all the inputs to these loops are identical, the results can be different? Possibly because a very small error (past what I can see with 25 digits) is occurring that accumulates over time? If so, how can I go about fixing this issue?
No, the format of the numbering system does not guarantee equivalent answers from functions in different languages.
Functions, such as sin(x), can be implemented in different ways, using the same language (as well as different languages). The sin(x) function is an excellent example. Many implementations will use a look-up table or look-up table with interpolation. This has speed advantages. However, some implementations may use a Taylor Series to evaluate the function. Some implementations may use polynomials to come up with a close approximation.
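To make that concrete, here is a toy C++ sketch (deliberately not any real libm's algorithm): a truncated Taylor series for sin. For an input near 0.5 it typically agrees with std::sin to within a ulp or two, but nothing forces the two to be bit-identical, which is exactly the kind of last-bit disagreement at issue here:
#include <cmath>
#include <cstdio>

// Truncated Taylor series: sin x = x - x^3/3! + x^5/5! - ...
// The truncation error is far below double epsilon for |x| < 1, but the
// rounding of the partial sums need not match the platform's libm.
static double sin_taylor(double x) {
    double term = x, sum = x;
    for (int n = 1; n <= 8; ++n) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

int main() {
    double x = 0.497418836818383950;
    std::printf("std::sin   : %a\n", std::sin(x));
    std::printf("Taylor sum : %a\n", sin_taylor(x));
}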
Having the same numeric format is one hurdle to solve between languages. Function implementation is another.
Remember, you need to consider the platform as well. A program that uses an 80-bit floating point processor will have different results than a program that uses a 64-bit floating point software implementation.
Some architectures provide the capability of using extended precision floating point registers (e.g. 80 bits internally, versus 64-bit values in RAM). So, it's possible to get slightly different results for the same calculation, depending on how the computations are structured, and the optimization level used to compile the code.
Yes, it's possible to have different results. It's possible even if you are using exactly the same source code in the same programming language for the same platform. Sometimes it's enough to have a different compiler switch; for example -ffast-math would lead the compiler to optimize your code for speed rather than accuracy, and, if your computational problem is not well-conditioned to begin with, the result may be significantly different.
For example, suppose you have this code:
x_8th = x*x*x*x*x*x*x*x;
One way to compute this is to perform 7 multiplications. This would be the default behavior for most compilers. However, you may want to speed this up by specifying the compiler option -ffast-math, and the resulting code would have only 3 multiplications:
temp1 = x*x; temp2 = temp1*temp1; x_8th = temp2*temp2;
The result would be slightly different because finite precision arithmetic is not associative, but sufficiently close for most applications and much faster. However, if your computation is not well-conditioned that small error can quickly get amplified into a large one.
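A two-line demonstration that floating-point addition is not associative (which is all the regrouping above relies on):
#include <cstdio>

int main() {
    double a = 1e16, b = -1e16, c = 1.0;
    std::printf("(a + b) + c = %.1f\n", (a + b) + c);   // prints 1.0
    std::printf("a + (b + c) = %.1f\n", a + (b + c));   // prints 0.0: b + c rounds back to -1e16
}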
Note that it is possible that the Scilab and C++ are not using the exact same instruction sequence, or that one uses FPU and the other uses SSE, so there may not be a way to get them to be exactly the same.
As commented by IInspectable, if your compiler has _control87() or something similar, you can use it to change the precision and/or rounding settings. You could try combinations of this to see if it has any effect, but again, even if you manage to get the settings identical for Scilab and C++, differences in the actual instruction sequences may be the issue.
http://msdn.microsoft.com/en-us/library/e9b52ceh.aspx
If SSE is used, I'm not sure what can be adjusted as I don't think SSE has an 80 bit precision mode.
In the case of using FPU in 32 bit mode, and if your compiler doesn't have something like _control87, you could use assembly code. If inline assembly is not allowed, you would need to call an assembly function. This example is from an old test program:
static short fcw; /* 16 bit floating point control word */
/* ... */
/* set precision control to extended precision */
__asm{
fnstcw fcw
or fcw,0300h
fldcw fcw
}

Rounding error using the floor function in C++

I was asked what will be the output of the following code:
floor((0.7+0.6)*10);
It returns 12.
I know that the floating point representation does not allow to represent all numbers with infinite precision and that I should expect some discrepancies.
My questions are:
How should I know that this piece of code returns 12, not 13? Why is (0.7+0.6)*10 a bit less than 13, not a bit more?
When can I expect the floor function to work incorrectly and when it works correctly for sure?
Note: I'm not asking how floating representation looks like or why the output isn't exactly 13. I'd like to know how should I infer that (0.7+0.6)*10 is a bit less than 13.
How should I know that this piece of code returns 12, not 13? Why is (0.7+0.6)*10 a bit less than 13, not a bit more?
Assume that your compilation platform uses strictly the IEEE 754 standard formats and operations. Then, convert all the constants involved to binary, keeping 53 significant digits, and apply the basic operations, as defined in IEEE 754, by computing the mathematical result and rounding to 53 significant binary digits at each step. A computer does not need to be involved at any stage, but you can make your life easier by using C99's hexadecimal floating-point format for input and output.
When can I expect the floor function to work incorrectly and when it works correctly for sure?
floor() is exact for all positive arguments. It is working correctly in your example. The behavior that surprises you does not originate with floor and has nothing to do with floor. The surprising behavior starts with the fact that 6/10 and 7/10 are not representable exactly as binary floating-point values, and continues with the fact that since these values have long expansions, floating-point operations + and * can produce a slightly rounded result wrt the mathematical result you could expect from the arguments they are actually applied to. floor() is the only place in your code that does not involve approximation.
Example program to see what is happening:
#include <stdio.h>
#include <math.h>
int main(void) {
printf("%a\n%a\n%a\n%a\n%a\n",
0.7,
0.6,
0.7 + 0.6,
(0.7+0.6)*10,
floor((0.7+0.6)*10));
}
Result:
0x1.6666666666666p-1
0x1.3333333333333p-1
0x1.4ccccccccccccp+0
0x1.9ffffffffffffp+3
0x1.8p+3
IEEE 754 double-precision is really defined with respect to binary, but for conciseness the significand is written in hexadecimal. The exponent after p represents a power of two. For instance the last two results are both of the form (number roughly halfway between 1 and 2) * 2^3.
0x1.8p+3 is 12. The next integer, 13, is 0x1.ap+3, but the computation does not quite reach that value, and so the behavior of floor() is to round down to 12.
How should I know that this piece of code returns 12, not 13?
You should know that it can and may be either 12 or 13. You can verify by testing on a given cpu.
You can not know what the value will be, in general, because the C++ standard does not specify the representation of floating point numbers. If you know the format on given architecture (let's say IEEE 754), then you can perform the calculation by hand, but that result would only apply to that particular representation.
Why is (0.7+0.6)*10 a bit less than 13, not a bit more?
It's an implementation detail and not useful knowledge to the programmer. All you need to know that it may be either. Relying on the knowledge that it's one or the other, would make you depend on the implementation detail.
When can I expect the floor function to work incorrectly and when it works correctly for sure?
It always works correctly, that is, according to how it's specified to work.
Now, speaking of the value that you are expecting to see. If you know that your number is very close to an integer, but might be off a little bit due to representation error, you can add 0.5 before flooring.
double calculated_integer = (0.7+0.6)*10;
floor(calculated_integer + 0.5);
That way, you will always get the expected value, unless the error exceeds 0.5, which would be quite a big error.
If you don't know that the result should be an integer, then you simply have to accept the fact that floor and ceil operations increase the maximum error of your calculation to 1.0.
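A quick sketch of that fix applied to the value from the question (std::lround does the same round-to-nearest job in a single call):
#include <cmath>
#include <cstdio>

int main() {
    double v = (0.7 + 0.6) * 10;                 // slightly below 13
    std::printf("%f\n", std::floor(v));          // 12.000000
    std::printf("%f\n", std::floor(v + 0.5));    // 13.000000
    std::printf("%ld\n", std::lround(v));        // 13
}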
There are standards, like the IEEE floating-point standard, which try to make floating-point calculations at least somewhat predictable by defining rules for how operations like addition and rounding should be implemented. To know the result, you need to compute the expression according to the standard's rules. Then you can be sure that it gives the same result on every machine that implements the standard.
How should I know that this piece of code returns 12, not 13?
Since that depends on the numbers involved, by trying.
Why is (0.7+0.6)*10 a bit less than 13, not a bit more?
Well, because that's the result of the calculation.
When can I expect the floor function to work incorrectly and when it works correctly for sure?
Correctly for sure: on multiples of powers of two only, iff your floating point number is represented in binary.
To really take all the confusion out of this:
You cannot know the result without calculating it; it depends on both the machine/algorithmics involved and the numbers.
Very short answer: you cannot. It depends on the platform and the floating-point standard used on that platform.
In general, you can't. The fundamental problem is that the conversion from text representation to floating-point value is often not implemented as accurately as it could be. That's in part momentum, and in part because getting the floating-point value that's closest to the value expressed in text can be expensive, in some cases requiring large integer calculations. So conversions are often off by a few ULPs (i.e., low-end bits) from the ideal value, in ways that you can't predict a priori. So the question of what that code will produce is unanswerable. The question of what it should produce may be a bit more tractable, but it's still an exercise in time-wasting.

Is there any way to make sure the output of the float-point the same in different OS?

Here is my code:
int a = 0x451998a0;
float b = *((float *)&a);
printf("coverto float: %f, %.10lf\n", b, b);
In windows the output is:
coverto float: 2457.539063, 2457.5390625000
In linux the output is:
coverto float: 2457.539062, 2457.5390625000
Is there any way to make sure the output is the same?
The behavior you're seeing is just a consequence of the fact that Windows' printf() function is implemented differently from Linux's printf() function. Most likely the difference is in how printf() implements number rounding.
How printf() works under the hood in either system is an implementation detail; thus the system is not likely to provide such fine-grained control on how printf() displays the floating point values.
There are two ways that may work to keep them the same:
Use more precision during calculation than while displaying it. For example, some scientific and graphing calculators use double precision for all internal calculations, but display the results with only float precision.
Use a cross-platform printf() library. Such libraries would most likely have the same behavior on all platforms, as the calculations required to determine what digits to display are usually platform-agnostic.
However, this really isn't as big of a problem as you think it is. The difference between the outputs is 0.000001. That is a ~0.00000004% difference from either of the two values. The display error is really quite negligible.
Consider this: the distance between Los Angeles and New York is 2464 miles, which is of the same order of magnitude as the numbers in your display outputs. A difference of 0.000001 miles is 1.61 millimeters. We of course don't measure distances between cities with anywhere near that kind of precision. :-)
If you use the same printf() implementation, there's a good chance they'll show the same output. Depending on what you're up to, it may be easier to use GNU GCC on both OSes, or to get printf() source code and add it to your project (you should have no trouble googling one).
BTW - have you actually checked what that hex number encodes? Should it round up or down? The 625 thing is likely itself rounded, so you shouldn't assume it should round to 63....
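For what it's worth, a small sketch of that check (assuming a 32-bit IEEE 754 float; the bit pattern is copied with memcpy to stay within defined behaviour):
#include <cstdio>
#include <cstring>
#include <cstdint>

int main() {
    std::uint32_t bits = 0x451998a0;
    float b;
    std::memcpy(&b, &bits, sizeof b);
    std::printf("%.10f\n", b);   // 2457.5390625000 -- exactly halfway at six decimals
    std::printf("%a\n", b);      // 0x1.33314p+11, the exact value stored
}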
The obvious answer is to use less precision in your output. In general, if there's any calculation involved, you can't even be sure that the actual floating point values are identical. And how printf and ostream round is implementation defined, even if the floating point values are equal.
In general, C++ doesn't guarantee that two implementations produce the same results. In this particular case, if it's important, you can do the rounding by hand, before doing the conversion, but you'll still have occasional problems because the actual floating point values will be different. This may, in fact, occur even with different levels of optimization with the same compiler. So anything you try (other than writing the entire program in assembler) is bound to be a losing battle in the end.

precision differences in matlab and c++

I am trying to make equivalence tests on an algorithm written in C++ and in Matlab.
The algorithm contains some kind of a loop in time and runs more than 1000 times. It has arithmetic operations and some math functions.
I feed the initial inputs to both platforms by hand (like a=1.767, b=6.65, ...) and when I check the hexadecimal representations of those inputs they are the same. So no problem with the inputs. I then pass the outputs of C++ to Matlab via a text file with 16 decimal digits (I use a "setprecision(32)" statement).
But here comes the problem: although through the 614th step of both codes all the results are exactly the same, at step 615 I get a difference of about 2.xxx..xxe-19. And after this step the error becomes larger and larger, and at the end of the runs it is about 5.xx..xxe-14.
0x3ff1 3e42 a211 6cca--->[C++ function]--->0x3ff4 7619 7005 5a42
0x3ff1 3e42 a211 6cca--->[MATLAB function]--->ans
ans - 0x3ff4 7619 7005 5a42
= 2.xxx..xxe-19
I looked into how Matlab handles numbers and found really interesting things like the "denormalized mantissa". While realmin is about e-308, by denormalizing the mantissa Matlab's smallest real number is about e-324. Furthermore, Matlab holds many more digits for "pi" or "exp(1)" than C++ does.
On the other hand, the Matlab help says that whatever format it displays, Matlab uses double precision internally.
So, I'd really appreciate it if someone could explain what the exact reason is for these differences. How can we make equivalence tests between Matlab and C++?
There is one thing about floating-point numbers on x86 CPUs. Internally, the x87 floating-point unit uses registers that are 10 bytes, i.e. 80 bits, wide. Furthermore, the CPU has a setting that tells whether floating-point calculations should be made with 32-bit (float), 64-bit (double) or 80-bit precision; less precision means faster floating-point operations. (The 32-bit mode used to be popular for video games, where speed takes priority over precision.)
From this, I remember tracking down a bug in a calculation library (a DLL) that, given the same input, did not give the same result depending on whether it was called from a test C++ executable or from MatLab. Furthermore, this did not happen in Debug mode, only in Release!
The final conclusion was that MatLab set the CPU floating-point precision to 80 bits, whereas our test executable did not (and left the default 64-bit precision). Furthermore, the calculation mismatch did not happen in Debug mode because all the variables were written to memory as 64-bit double variables and reloaded from there afterwards, discarding the additional 16 bits. In Release mode, some variables were optimized out (not written to memory), and all calculations were done with floating-point registers only, at 80 bits, keeping the additional 16 bits' non-zero value.
Don't know if this helps, but maybe worth knowing.
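One cheap check on the C++ side (this says nothing about what MatLab does internally, only about your own build) is the FLT_EVAL_METHOD macro from <cfloat>, which reports whether intermediate double expressions are kept in extended precision:
#include <cfloat>
#include <cstdio>

int main() {
    // 0: evaluate in the declared type (typical for SSE builds)
    // 1: evaluate float and double in double precision
    // 2: evaluate everything in long double (x87-style extended precision)
    std::printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    std::printf("sizeof(long double) = %zu bytes\n", sizeof(long double));
}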
A similar discussion occurred before; the conclusion was that IEEE 754 tolerates error in the last bit for transcendental functions (cos, sin, exp, etc.). So you can't expect exactly the same results between MATLAB and C (not even the same C code compiled with different compilers).
I may be way off track here, and you may already have investigated this possibility, but it could be that there are differences between C++ and Matlab in the way the mathematical library functions (the sin(), cos() and exp() that you mention) are implemented internally. Ultimately, some kind of functional approximation must be used to generate function values, and if there is some difference between these methods then presumably it is possible that this manifests itself as numerical rounding error over a large number of iterations.
This question basically covers what I am trying to suggest How does C compute sin() and other math functions?