I know that I cannot compare two floating point or double numbers for absolute equality in C++/C. If, for some reason, I write an if condition that uses absolute equality, is it guaranteed that the condition will return the same result on different runs of the program for the same data? Or is it purely non-deterministic, so the result can vary?
For the same compiled binary and on the same PC, results should be the same. If you use another compiler or another PC, results may vary.
I once had a unit test that failed on a machine with an Intel CPU but passed on an AMD one. There was probably some difference in rounding, and the test more or less hit the pass/fail criterion head on.
But I would not litter your code with overcomplex tests everywhere just because of that.
Related
I have a question about floating point addition. I understand how compilers and processor architecture can lead to differences in floating point arithmetic results. I have seen many questions on here similar to mine, but they all involve some variation such as a different compiler, different code, or a different machine. However, I am running into an issue when adding doubles in exactly the same way in two different programs, calling the identical function with the same arguments, and getting different results. Both programs are compiled on the same machine with the same compiler and flags. The code looks similar to this:
void function(double tx, double ty, double tz){
double answer;
double x,y;
x = y = answer = 0;
x = tx - ty;
y = ty - tz;
answer = (tx + ty + tz) * (x*y);
}
The values of:
tx,ty,tz
are on the order of [10e-15, 10e-30]. Obviously this is a very simplified version of the functions I am actually using, but is it possible for two programs running identical floating point arithmetic (not just the same function, the exact same code), on the same machine, with the same compiler/flags, to get different results from the function?
Some possibilities:
1. The source code of function is identical in the two programs, but it appears in different contexts, resulting in the compiler compiling it in different ways. For example, the compiler might inline it in one place and not another, and inlining might lead to some expression reduction due to combination with other expressions at the point of the inlined call, and hence different arithmetic is performed. (To test this, move function to a separate source file, compile it separately, and link it with a linker without cross-module optimization. Also, try compiling with optimization disabled.)
2. You think there are identical inputs to function because they appear the same when printed or viewed in the debugger, but they are actually different due to small differences in the low digits that are not printed. (To test this, print the full values using the hexadecimal floating-point format. To do that, insert std::hexfloat into the output stream, followed by the floating-point values. Alternately, use C printf with the %a format. A sketch follows after this list.)
3. Something else in the programs changes the floating-point state, such as the rounding mode.
4. You think you have used an identical compiler, identical sources, identical compilation switches, and so on, but actually have not.
David Schwartz notes that floating-point values can change when they are stored, as occurs when they are simply spilled to the stack. This occurs because some processors and C++ implementations may store floating-point values with extended precision in registers but less precision in memory. Technically, this fits into either 1. (different computation nominally inside function) or 2. (different values passed to function), but it is insidious enough to warrant separate mention.
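A minimal sketch of the printing suggested in point 2 (the dump helper name is just for illustration; std::hexfloat needs C++11):

#include <cstdio>
#include <iostream>

void dump(double tx, double ty, double tz)
{
    // Hexadecimal floating-point output shows every bit of the value,
    // so differences in the lowest digits become visible.
    std::cout << std::hexfloat << tx << ' ' << ty << ' ' << tz << '\n';
    std::printf("%a %a %a\n", tx, ty, tz);   // equivalent C-style formatting
}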
Well, the answer is quite easy. If your computer behaves deterministically, it will always return the same results for the same input. That's the basic idea behind programming languages so far. (Unless we are talking about quantum computers, of course.)
So the question reduces to whether you really have the same input.
Although the above function looks strictly functional, there are often hidden inputs that are not obvious. E.g. you might adjust the rounding mode of your FPU before calling the function, or you might set up different exception behavior. In both cases the function may behave differently for certain inputs.
So even if your computer isn't non-deterministic (i.e. buggy), the above function might return different results, although that is not very likely.
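As an illustration of such a hidden input (my own sketch, not from the question), the very same division yields different last bits depending on the rounding mode; note that whether the compiler honors fesetround without #pragma STDC FENV_ACCESS varies:

#include <cfenv>
#include <cstdio>

int main()
{
    volatile double one = 1.0, three = 3.0;   // volatile keeps the divisions at run time

    std::fesetround(FE_DOWNWARD);
    double down = one / three;

    std::fesetround(FE_UPWARD);
    double up = one / three;

    std::printf("%a\n%a\n", down, up);        // the last hex digit differs
    std::fesetround(FE_TONEAREST);            // restore the default rounding mode
}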
I am starting with dsp programming right now and am writing my first low level classes and functions.
Since I want the functions to be fast (or at least not inefficient), I often wonder what I should use and what I should avoid in functions which get called per sample.
I know that the speed of an instruction varies quite a bit but I think that some of you at least can share a rule of thumb or just experience. :)
conditional statements
If I have to use conditions, switch should be faster than an if / else if block, right?
Are there differences between using two if-statements or an if-else? Somewhere I read that else should be avoided but I don't know why.
Also, compared to a multiplication, is there a rough estimate of how much more time an if-block takes? In some cases, a multiplication by zero or one could be used instead of an if-statement:
//something could be an int either 1 or 0:
if(something) {
signal += something_else;
}
// or:
signal += something*something_else;
functions and function-pointers
Instead of using conditional statements, you could use function pointers. Instead of evaluating a condition on every call, the pointer could be redirected to a specific function once. However, for every call the pointer has to be dereferenced in order to call the right function, so I don't know if this would help or not.
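A rough sketch of that idea (names invented for illustration); whether the indirect call actually beats a well-predicted branch depends on the compiler and CPU, so measure:

#include <cstddef>

static float processClean(float x)     { return x; }
static float processSaturated(float x) { return x > 1.0f ? 1.0f : (x < -1.0f ? -1.0f : x); }

using SampleFn = float (*)(float);

void processBlock(float* buffer, std::size_t n, bool saturate)
{
    SampleFn fn = saturate ? processSaturated : processClean;  // decided once, outside the loop
    for (std::size_t i = 0; i < n; ++i)
        buffer[i] = fn(buffer[i]);   // one indirect call per sample, no per-sample branch on 'saturate'
}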
What I also wonder is whether calling functions has an impact. If so, wrapping code into small functions should be avoided, right?
variables
I would think that defining and using many variables in a function doesn't really have an impact, at least relative to the calculations. Is this true? If not, reusing already declared variables would be better than adding more declarations.
calculations
Is there an order of calculation types in terms of the time they take to execute? I am sure this highly depends on the context, but a rule of thumb would be nice. I often read that people only count the multiplications in an algorithm. Is this because additions are relatively fast?
Is there a difference between multiplication and division (*0.5 versus /2.0)?
I hope that you can share some experience.
Cheers
Here are partial answers:
calculations (talking about the native precision of the processor, for example 32 bits):
Most DSP microprocessors have single-cycle multipliers, which means a multiply costs exactly the same as an addition in terms of cycles. Multiplication is generally faster than division.
conditional statements:
if/else - when looking at the assembly code, you can see that the if branch is usually the one executed by default (the fall-through path), so when using if/else make sure that the condition that happens more frequently is in the if.
But generally, if possible, you should avoid if/else inside a loop to keep the pipeline full.
good luck.
DSP compilers are typically good at optimizing for loops that do not contain function-calls.
Therefore, try to inline every function that you call from within a time-critical for loop.
If your DSP is a fixed-point processor, then floating-point operations are implemented by SW.
This means that every such operation is essentially replaced by the compiler with a library function.
So you should basically avoid performing floating-point operations inside time-critical for loops.
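If fractional math is still needed in such a loop, the usual alternative is fixed-point arithmetic; a minimal Q15 sketch (names invented, saturation omitted for brevity):

#include <cstdint>

// Q15 * Q15 -> Q15: widen to 32 bits, multiply, shift back down.
inline int16_t q15_mul(int16_t a, int16_t b)
{
    return static_cast<int16_t>((static_cast<int32_t>(a) * b) >> 15);
}

void applyGainQ15(int16_t* buf, int n, int16_t gain)
{
    for (int i = 0; i < n; ++i)
        buf[i] = q15_mul(buf[i], gain);   // no software floating-point in the inner loop
}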
The compiler should provide a special #pragma for declaring the number of iterations of a for loop:
Minimum number of iterations
Maximum number of iterations
Multiplicity of the number of iterations
Use this #pragma wherever possible, in order to help the compiler perform loop unrolling; a sketch of the idea follows.
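On TI's C6000 compilers, for example, this pragma is spelled MUST_ITERATE(min, max, multiple); other toolchains use different names, so the sketch below is TI-specific and only illustrative:

void scale(float* out, const float* in, int n, float g)
{
    /* Promise the compiler: at least 8 iterations, at most 1024, trip count a multiple of 8. */
    #pragma MUST_ITERATE(8, 1024, 8)
    for (int i = 0; i < n; ++i)
        out[i] = g * in[i];
}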
Finally, DSPs usually support a set of unique operations for enhanced performance.
As an example, consider _dotpu4 on Texas Instruments C64xx, which computes the scalar-product of two integers src1 and src2: For each pair of 8-bit values in src1 and src2, the 8-bit value from src1 is multiplied with the 8-bit value from src2, and the four products are summed together.
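A sketch of how that intrinsic might be used (assuming the TI toolchain, where it is commonly declared via c6x.h; the function name dot8 is made up):

#include <c6x.h>   /* TI-specific; the intrinsic is provided by the C64xx compiler */

int dot8(const unsigned* a, const unsigned* b, int nWords)
{
    int sum = 0;
    for (int i = 0; i < nWords; ++i)
        sum += _dotpu4(a[i], b[i]);   /* four 8-bit by 8-bit products summed per 32-bit word */
    return sum;
}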
Check the data-sheet of your DSP, and see if you can make use of any of these operations.
The compiler should generate an intermediate file, which you can explore in order to analyze the expected performance of each of the optimized for loops in your code.
Based on that, you can try different assembly operations that might yield better results.
We have numerical code written in C++. Rarely, but under certain specific inputs, some of the computations result in a NaN value.
Is there a standard or recommended method by which we can stop and alert the user when a certain numerical calculation produces a NaN (under debug mode)? Checking each result for NaN seems impractical given the huge sizes of the matrices and vectors.
How do standard numerical libraries handle this situation? Could you throw some light on this?
NaN propagates through numeric operations, so it is enough to check the final result for being a NaN. As for how to do it: if building for >= C++11, there is std::isnan, as Goz noticed. For < C++11, if you want to be bulletproof, I would personally do bit-checking (especially if there may be an optimization involved). The pattern for NaN is
? 11.......1 xx.......x
sign bit ^ ^exponent^ ^fraction^
where ? may be anything, and at least one x must be 1.
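A minimal sketch of that bit-checking approach, assuming IEEE-754 binary64 doubles and a 64-bit unsigned integer type (pre-C++11 compilers typically offer one as an extension):

#include <cstring>

bool isNanBits(double d)
{
    unsigned long long bits;
    std::memcpy(&bits, &d, sizeof bits);               // inspect the raw representation safely
    const unsigned long long exponentMask = 0x7FF0000000000000ULL;
    const unsigned long long fractionMask = 0x000FFFFFFFFFFFFFULL;
    // NaN: all exponent bits set and a non-zero fraction.
    return (bits & exponentMask) == exponentMask && (bits & fractionMask) != 0;
}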
For a platform-dependent solution, there seems to be yet another possibility. There is the function feenableexcept in glibc (probably together with the signal function and the compiler option -fnon-call-exceptions), which turns on generation of SIGFPE signals when an invalid floating point operation occurs. And there is the function _control87 (probably with the _set_se_translator function and compiler option /EHa), which allows pretty much the same in VC.
Although this is a nonstandard extension originally from glibc, on many systems you can use the feenableexcept routine declared in <fenv.h> to request that the machine trap particular floating-point exceptions and deliver SIGFPE to your process. You can use fedisableexcept to mask trapping, and fegetexcept to query the set of exceptions that are unmasked. By default they are all masked.
On older BSD systems without these routines, you can use fpsetmask and fpgetmask from <ieeefp.h> instead, but the world seems to be converging on the glibc API.
Warning: glibc currently has a bug whereby (the C99 standard routine) fegetenv has the unintended side effect of masking all exception traps on x86, so you have to call fesetenv to restore them afterward. (Shows you how heavily anyone relies on this stuff...)
On many architectures, you can unmask the invalid exception, which will cause an interrupt when a NaN would ordinarily be generated by a computation such as 0*infinity. Running in the debugger, you will break on this interrupt and can examine the computation that led to that point. Outside of a debugger, you can install a trap handler to log information about the state of the computation that produced the invalid operation.
On x86, for example, you would clear the Invalid Operation Mask bit in the x87 control word (bit 0) and in MXCSR (bit 7) to enable trapping for invalid operations from x87 and SSE operations, respectively.
Some individual platforms provide a means to write to these control registers from C, but there's no portable interface that works cross-platform.
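A hedged sketch of the glibc route (Linux with g++, where feenableexcept is declared in <fenv.h>; not portable to MSVC, which uses _control87 instead):

#include <fenv.h>     // feenableexcept is a glibc extension
#include <csignal>
#include <cstdio>
#include <cstdlib>

static void onFpe(int)
{
    std::fprintf(stderr, "invalid floating-point operation trapped\n");
    std::abort();     // in real code: log the state or break into a debugger
}

int main()
{
    std::signal(SIGFPE, onFpe);
    feenableexcept(FE_INVALID);           // trap operations that would produce a NaN

    volatile double zero = 0.0;
    volatile double inf  = 1.0 / zero;    // raises only FE_DIVBYZERO, which stays masked
    volatile double bad  = zero * inf;    // 0 * inf raises FE_INVALID and delivers SIGFPE
    std::printf("%f\n", bad);             // not reached
}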
Testing f!=f might give problems using g++ with -ffast-math optimization enabled: Checking if a double (or float) is NaN in C++
The only foolproof way is to check the bitpattern.
As to where to implement the checks, this really depends on the specifics of your calculation and on how frequent NaN errors are, i.e. the performance penalty of continuing tainted calculations versus checking at certain stages.
Recently I changed some code
double d0, d1;
// ... assign things to d0/d1 ...
double result = f(d0, d1);
to
double d[2];
// ... assign things to d[0]/d[1]
double result = f(d[0], d[1]);
I did not change any of the assignments to d, nor the calculations in f, nor anything else apart from the fact that the doubles are now stored in a fixed-length array.
However when compiling in release mode, with optimizations on, result changed.
My question is, why, and what should I know about how I should store doubles? Is one way more efficient, or better, than the other? Are there memory alignment issues? I'm looking for any information that would help me understand what's going on.
EDIT: I will try to get some code demonstrating the problem, however this is quite hard as the process that these numbers go through is huge (a lot of maths, numerical solvers, etc.).
However there is no change when compiled in Debug. I will double check this again to make sure but this is almost certain, i.e. the double values are identical in Debug between version 1 and version 2.
Comparing Debug to Release, results have never ever been the same between the two compilation modes, for various optimization reasons.
You probably have a 'fast math' compiler switch turned on, or are doing something in the "assign things" (which we can't see) which allows the compiler to legally reorder calculations. Even though the sequences are equivalent, it's likely the optimizer is treating them differently, so you end up with slightly different code generation. If it's reordered, you end up with slight differences in the least significant bits. Such is life with floating point.
You can prevent this by not using 'fast math' (if that's turned on), or by forcing ordering through the way you construct the formulas and intermediate values. Even that is hard (impossible?) to guarantee. The question is really "Why is the compiler generating different code for arrays vs numbered variables?", but that's basically an analysis of the code generator.
No, these are equivalent - you have something else wrong.
Check the /fp:precise flag (or equivalent); the processor's floating point hardware can run in a higher-accuracy or a higher-speed mode, and it may have a different default in an optimized build.
With regard to floating-point semantics, these are equivalent. However, it is conceivable that the compiler might decide to generate slightly different code sequences for the two, and that could result in differences in the result.
Can you post a complete code example that illustrates the difference? Without that to go on, anything anyone posts as an answer is just speculation.
To your concerns: memory alignment cannot affect the value of a double, and a compiler should be able to generate equivalent code for either example, so you don't need to worry that you're doing something wrong (at least, not in the limited example you posted).
The first way is more efficient, in a very theoretical way. It gives the compiler slightly more leeway in assigning stack slots and registers. In the second example, the compiler has to pick 2 consecutive slots - except of course if the compiler is smart enough to realize that you'd never notice.
It's quite possible that the double[2] causes the array to be allocated as two adjacent stack slots where it wasn't before, and that in turn can cause code reordering to improve memory access efficiency. IEEE 754 floating point math doesn't obey the usual algebraic rules; in particular, addition is not associative, so (a + b) + c != a + (b + c).
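A small self-contained demonstration of that non-associativity (values chosen to make the difference visible):

#include <cstdio>

int main()
{
    double a = 1e16, b = -1e16, c = 1.0;
    double left  = (a + b) + c;   // 1.0
    double right = a + (b + c);   // 0.0: b + c rounds back to -1e16
    std::printf("%g vs %g\n", left, right);
}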
For my ColorJizz library, I'm expecting slight errors when you do multiple conversions between different color formats. The errors are all very small (i.e. 0.0001 out).
What do you think I should do about these?
I feel like there are 2 real options:
Leave them as they are, with almost 30% of tests failing
Put some kind of 'error range' in my unit tests and pass them if they're within that range. But how do I judge what level of error I should allow?
Here's an example of the kind of failures I'm getting:
http://www.mikeefranklin.co.uk/tests/test/
What would be the best solution?
It seems you are using floating point values, for which rounding errors are a fact of life. I recommend applying an error margin for comparison checks in your unit tests.
Leaving even some of your unit tests failing is not a realistic option - unit tests should pass 100% under normal circumstances. If you let some of them fail regularly, you won't easily notice when there is a new failure, signifying a real bug in your code.
The error range is the standard approach for floating point "equality" tests.
NUnit uses "within":
Assert.That( 2.1 + 1.2, Is.EqualTo( 3.3 ).Within( .0005 ) );
Ruby's test/unit uses assert_in_delta:
assert_in_delta 0.05, (50000.0 / 10**6), 0.00001
And most other test frameworks have something similar. Apparently qunit is one that does not have something similar, but it would be easy enough to modify the source to include one of your design.
As for the actual delta to use, it depends on your application. I would think that 0.01 would be actually pretty restrictive for humans to visually identify color differences, but it would be a fairly lax requirement mathematically.
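Framework aside, the underlying check is just a tolerance comparison; a minimal sketch in C++ (the helper name and the 1e-4 default are purely illustrative):

#include <cmath>

// True when a and b differ by no more than the given absolute tolerance.
bool approximatelyEqual(double a, double b, double tolerance = 1e-4)
{
    return std::fabs(a - b) <= tolerance;
}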
Gallio/MbUnit has a dedicated assertion for that specific test case (Assert.AreApproximatelyEqual)
double a = 5;
double b = 4.999;
Assert.AreApproximatelyEqual(a, b, 0.001); // Pass!