C++ and Matlab floating point computation

Could someone tell me if C++ and Matlab use the same floating point computation implementations? Will I get the same values in C++ as I would in Matlab?
Currently I have these discrepancies from translating my Matlab code into C++:
Matlab: R = 1.0000000001623, I = -3.07178893432791e-010, C = -3.79693498864242e-011
C++: R = 1.00000000340128, I = -3.96890964537988e-009, Z = 2.66864907949582e-009
If not what is the difference and where can I find more about floating point computation implementations?
Thanks!

Although it's not clear what your numbers actually are, the relative difference of the first (and largest) numbers is about 1e-8, which is the relative tolerance of many double precision algorithms.
Floating point numbers are only an approximation of the real number system, and their finite size (64 bits for double precision) limits their precision. Because of this finite precision, operations that involve floating point numbers can incur round-off error, and are thus not strictly associative. What this means is that A+(B+C) != (A+B)+C. The difference between the two is usually small, depending on their relative sizes, but it's not always zero.
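For instance, a minimal C++ illustration, with values chosen to make the effect obvious:
#include <cstdio>

int main()
{
    // Round-off makes addition non-associative: the 1.0 is absorbed by 1e16,
    // so the result depends on the order of the additions.
    double a = 1e16, b = -1e16, c = 1.0;

    printf("%g\n", (a + b) + c);  // prints 1
    printf("%g\n", a + (b + c));  // prints 0
}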
What this means is that you should expect small differences in the relative and absolute values when you compare an algorithm coded in Matlab to one in C++. The difference may be in the libraries (i.e., there's no guarantee that Matlab uses the system math library for routines like sqrt), or it may just be that your C++ and Matlab implementations order their operations differently.
The section on floating point comparison tests in Boost::Test discusses this a bit, and has some good references. In particular, you should probably read What Every Computer Scientist Should Know About Floating-Point Arithmetic and consider picking up a copy of Knuth's TAOCP Vol. II.

Matlab uses double precision floating point by default; a C float is single precision.
The floating-point representation is the same between the two: both use the IEEE 754 formats implemented by the processor. But as mentioned, floating point is extremely fickle; you always have to allow for some tolerance. If you evaluate a compound expression such as the one below, the computed value will often differ slightly from the exact algebraic result (which here is exactly -2.5), because every intermediate operation can round. The machinery under the hood differs in how Matlab and C order and execute the operations, which allows for some differences. Just make sure the results are close.
((3*pi+2)*5-9)/2-7.5*pi-3
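The same expression in C++, as a sketch (the exact algebraic value is -2.5; whether the residual is exactly zero depends on how the intermediate roundings fall):
#include <cstdio>
#include <cmath>

int main()
{
    const double pi = M_PI;  // note: M_PI comes from POSIX, not standard C++

    double result = ((3*pi + 2)*5 - 9)/2 - 7.5*pi - 3;

    printf("result   = %.17g\n", result);
    printf("residual = %.17g\n", result - (-2.5));  // tiny, possibly zero
}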

Related

Are floating point errors deterministic?

One of the big gotchas of floating point numbers is that some of them cannot be represented exactly in binary, which can make them difficult to work with. What I'm curious about, though, is whether the subtle or not-so-subtle errors in floating point are deterministic. Can somebody predict them, for example? Here's one example of a random number generator that could take advantage of floating point errors:
#include <cmath>

const float constant = M_PI;  // note: M_PI comes from POSIX, not standard C++

float generate()
{
    static float state = 1;
    state = state * constant;  // each call multiplies in more round-off
    return state;
}
One would have to know the implementation, the hardware, the compiler settings and so on, which makes it quite difficult to predict what the results would be. Or is my thinking flawed?
Floating point "errors" are deterministic. There is a 1:1 mapping between input and output values for a given operation. Your example will produce the same output sequence every time.
That said, there could be a floating-point implementation or ten out there that will produce different sequences, but this is not something you can consider "random" (i.e. a source of entropy).
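A minimal sketch of that claim (assuming the same binary runs in the same floating-point environment): stepping the recurrence along two independent chains stays bitwise identical.
#include <cassert>
#include <cmath>

float step(float state)
{
    return state * (float)M_PI;  // same operation, same rounding, every time
}

int main()
{
    float a = 1, b = 1;
    for (int i = 0; i < 100; ++i) {
        a = step(a);
        b = step(b);
        assert(a == b);  // never fires: the "errors" are reproducible
    }
}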
Every floating point representation defines the composition of a floating point variable (which part is the mantissa, which part is the exponent, which part is the sign, etc) and the behaviour of every operation.
In any implementation you might choose, it is therefore possible to predict the result of every floating point operation, if you know its operand (or operands). That characteristic is the definition of determinism.
So, yes, floating point operations are deterministic.
Different implementations (compilers, host systems, etc) do support different floating point representations. So there is some variation of results between implementations. However, it is still possible to predict the result of any floating point operation, if you know how floating point variables are represented, and how operations work.
The fact that not everyone knows enough about floating point types and the operations on them does not make them non-deterministic. Nor does the fact that not everyone can describe the complete set of operations in a complex algorithm. The knowledge is readily available and, with enough effort, can be understood well enough that the effect of every operation on every possible operand can be reliably predicted before doing the operation.
There are buggy implementations of floating point out there, which do not comply with their own documentation. For example, look up the Pentium FDIV bug, where some early Pentium CPUs implemented floating point division incorrectly. Even those turned out to be deterministic, once it was understood what the operations actually do.

IEEE-754 floating point: Divide first or multiply first for best precision?

What's better if I want to preserve as much precision as possible in a calculation with IEEE-754 floating point values:
a = b * c / d
or
a = b / d * c
Is there a difference? If there is, does it depend on the magnitudes of the input values? And, if magnitude matters, how is the best ordering determined when general magnitudes of the values are known?
It depends on the magnitude of the values. Obviously if one divides by zero, all bets are off, but if a multiplication or division results in a denormal, subsequent operations can lose precision.
You may find it useful to study Goldberg's seminal paper What Every Computer Scientist Should Know About Floating-Point Arithmetic, which will explain things far better than any answer you're likely to receive here.
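A sketch of the denormal hazard mentioned above (the constants are illustrative): dividing first pushes the intermediate result into the subnormal range, where most significand bits are lost.
#include <cstdio>

int main()
{
    double b = 1e-300, c = 1e20, d = 1e20;

    double mulFirst = b * c / d;  // intermediate 1e-280 stays normal
    double divFirst = b / d * c;  // intermediate 1e-320 is subnormal

    printf("%.17g\n", mulFirst);  // ~1e-300, nearly full precision
    printf("%.17g\n", divFirst);  // visibly off in the low digits
}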
Assuming that none of the operations overflows or underflows, and your input values have uniformly distributed significands, the two orderings are equivalent. Well, I suppose that a rigorous proof would need an exhaustive test (probably not possible in practice for double precision, since there are 2^156 inputs), but if there is a difference in the average error, it is tiny. I could try in low precisions with Sipe.
In any case, in the absence of overflow/underflow, only the exact values of the significands matter, not the exponents.
However, if the result a is added to (or subtracted from) another expression and not reused, then starting with the division may be more interesting, since you can group the multiplication with the following addition by using an FMA (thus with a single rounding).
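A sketch of that grouping, with illustrative values (std::fma requires C++11):
#include <cmath>
#include <cstdio>

int main()
{
    double b = 1.5, c = 3.0, d = 7.0, e = 0.25;

    // Computing b / d * c, then adding e, costs three roundings; fusing the
    // multiply and the add into an FMA reduces it to two.
    double q = b / d;              // one rounding
    double r = std::fma(q, c, e);  // q*c + e with a single rounding

    printf("%.17g\n", r);
}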

Precision of floating point operations

Floating point operations are typically approximations to the corresponding arithmetic operations, because in many cases the precise arithmetic result cannot be represented by the internal number format. But what happens if I have a program where all numbers can actually be represented exactly by IEEE 754 single precision? Consider for example this:
#include <cassert>

int main()
{
    float a = 7;
    float b = a / 32.0;  // 7/32 = 0.21875 is exactly representable
    float c = a + b;     // 7.21875 is exactly representable
    float d = c - a;
    assert( d == b );
}
Is it safe to assume that within this and similar cases, the result of the numerical calculation is identical to the exact arithmetic result? I think this sort of code would work on every actual machine. Is this correct?
Edit: This question is not about what the standard says, but rather about the real world. I'm well aware that in theory one could create a conforming implementation where this would fail. I don't care about that; I wonder whether this works on existing machines.
No, as the C++ standard does not require floating point values to be stored in the IEEE 754 format.
"Real" machines are quite careful in implementing the standard exactly (just remember the Pentium division bug). But be careful to check, the i386 line of machines did use some extra bits in its registers, which were cut off when asigning to/from memory, so computations done only in registers gave different results than if some intermediate results where spilled to memory.
Also check out David Goldberg's What every computer scientist should know about floating point arithmetic.
In any case, it doesn't make any sense to "round" (or otherwise mangle) a number that can be represented exactly.

C++ division - how to get the most accurate outcome?

I want to divide two unsigned long long variables and get the most accurate outcome.
What is the best way to do that?
e.g. 5000034 / 5000000 = 1.0000068
If you want "most accurate precision" - you should avoid floating point arithmetics.
You might want to use some big decimal library [whcih usually implements fixed point arithmetic], and will allow you to define the precision you are seeking.
You should avoid floating point arithmetic because thet are not exact [you have finite number of bits to represent infinite number of numbers in every range, so some slicing must occure...]. Fixed point arithmetic [as usually implemented in big decimal libraries] allows you to allocate more bits "on the fly" to represent the number in the desired accuracy.
More info on the floating point issue can be found in this [a bit advanced] article: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Instead of (double)(N) / D, do 1 + ((double)(N - D) / D), so that the large common part of the two integers cancels exactly before anything is converted to double.
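The rewrite pays off when the operands are too large to be converted to double exactly: N - D is still computed exactly in integer arithmetic, while (double)N is rounded. A sketch with illustrative values near 2^63:
#include <cstdio>

int main()
{
    // Both operands exceed 2^53, so converting n to double rounds it,
    // while n - d is small and exact.
    unsigned long long d = 3ULL << 61;
    unsigned long long n = d + 614;

    double direct  = (double)n / (double)d;
    double shifted = 1.0 + (double)(n - d) / (double)d;

    printf("direct : %.17g\n", direct);   // one ulp high for these values
    printf("shifted: %.17g\n", shifted);  // correctly rounded
}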
I'm afraid that "the most accurate outcome" doesn't mean much. No finite representation can represent all real numbers exactly; how precise the representation can be depends on the size of the type and its internal representation. On most implementations, double will give about 17 significant decimal digits, which is usually several orders of magnitude more precise than the input; for a single multiplication or division, double is usually fine. (Problems occur with addition and subtraction when the difference between the two values is extreme.) There exist packages which offer larger precision (BigDecimal, BigFloat and the like), but they are never exact: in the end, the precision is limited by the amount of memory you're willing to let them use. They're also much slower than double, and generally (slightly) more difficult to use correctly (since they have more options, e.g. just how much precision do you want).
The only real answer to your question is another question: how much precision do you need? And for what sequence of operations? Rounding errors accumulate, so while double may be largely sufficient for a single division, it may cause problems if used naïvely for iterative procedures. Although in such cases, the solution isn't usually to increase the precision, but to change the algorithm in a way that avoids the problems. If double gives you the precision you need, use it in preference to any extended type. If it doesn't, and you don't have a choice, then choose one of the existing arbitrary precision libraries, such as GMP.
(You might also have an issue with the way rounding is handled. For bookkeeping purposes, for example, most jurisdictions have very strict laws concerning how to round monetary values, and their rules are based on decimal arithmetic. In such cases, you'll need a numeric type which does decimal arithmetic in order for the rounding to conform in all cases.)
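As a sketch of the point above about iterative procedures: repeated additions accumulate round-off, while a single operation rounds only once.
#include <cstdio>

int main()
{
    // Ten million additions of 0.1 accumulate round-off; a single
    // multiplication rounds only once.
    double sum = 0.0;
    for (int i = 0; i < 10000000; ++i)
        sum += 0.1;

    printf("sum     = %.17g\n", sum);        // close to, but not exactly, 1e6
    printf("product = %.17g\n", 1e7 * 0.1);  // 1000000
}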
Floating point numbers are probably most accurate for multiplication and division, while integers and fixed point numbers are the best choice for addition and subtraction. This follows from the fact that multiplication and division change the order of magnitude, which floating point numbers handle better, while addition and subtraction move in fixed-size steps, which integers and fixed point numbers handle better.
If you want the best accuracy when dividing integers, implement a RationalNumber class containing the numerator and denominator. This way your result will always be exact if you avoid arithmetic overflow. This requires that you accept output in fractional form.
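A minimal sketch of such a class (the name and members are illustrative; std::gcd requires C++17):
#include <numeric>  // std::gcd
#include <cstdio>

struct RationalNumber {
    unsigned long long num, den;

    RationalNumber(unsigned long long n, unsigned long long d)
        : num(n), den(d)
    {
        // Keep the fraction reduced so intermediate values overflow as
        // late as possible.
        unsigned long long g = std::gcd(n, d);
        num /= g;
        den /= g;
    }
};

int main()
{
    RationalNumber r(5000034, 5000000);  // stored exactly as 2500017/2500000
    printf("%llu / %llu\n", r.num, r.den);
}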

Preventing Rounding Errors

I was just reading about rounding errors in C++. So, if I'm making a math-intense program (or doing any important calculations), should I just drop floats altogether and use only doubles, or is there an easier way to prevent rounding errors?
Obligatory lecture: What Every Programmer Should Know About Floating-Point Arithmetic.
Also, try reading the IEEE floating point standard.
You'll always get rounding errors unless you use an arbitrary precision library such as GMP. You have to decide if your application really needs this kind of effort.
Or, you could use integer arithmetic, converting to floats only when needed. This is still hard to do, you have to decide if it's worth it.
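For example, a common integer technique is to keep monetary amounts in cents and convert only for display (a sketch with illustrative values):
#include <cstdio>

int main()
{
    // All arithmetic stays exact in integers; floating point never enters.
    long long priceCents = 1999;  // $19.99
    long long qty = 3;
    long long totalCents = priceCents * qty;

    printf("$%lld.%02lld\n", totalCents / 100, totalCents % 100);
}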
Lastly, you can use float or double, taking care not to make assumptions about values at the limit of the representation's precision. I wish this Valgrind plugin were implemented (grep for float)...
The rounding errors are normally very insignificant, even using floats. Mathematically-intense programs like games, which do very large numbers of floating-point computations, often still use single-precision.
This might work if your highest number is less than 10 billion and you're using C++ double precision.
if ( ceil(10000*(x + 0.00001)) > ceil(10000*(x - 0.00001))) {
    x = ceil(10000*(x + 0.00004)) / 10000;
}
This should allow at least the last digit to be off +/- 9. I'm assuming dividing by 10000 will always just move a decimal place. If not, then maybe it could be done in binary.
You would have to apply it after every operation that is not +, -, *, or a comparison. For example, you can't do two divisions in the same formula because you'd have to apply it to each division.
If that doesn't work, you could work in integers by scaling the numbers up and always using integer division. If you need advanced functions, maybe there is a package that does deterministic integer math. Integer division is required in a lot of financial settings because round-off error is subject to exploits, like in the movie "Office Space".