strange results with /fp:fast - c++

We have some code that looks like this:
inline int calc_something(double x) {
if (x > 0.0) {
// do something
return 1;
} else {
// do something else
return 0;
}
}
Unfortunately, when using the flag /fp:fast, we get calc_something(0)==1 so we are clearly taking the wrong code path. This only happens when we use the method at multiple points in our code with different parameters, so I think there is some fishy optimization going on here from the compiler (Microsoft Visual Studio 2008, SP1).
Also, the above problem goes away when we change the interface to
inline int calc_something(const double& x) {
But I have no idea why this fixes the strange behaviour. Can anyone explane this behaviour? If I cannot understand what's going on we will have to remove the /fp:fastswitch, but this would make our application quite a bit slower.

I'm not familiar enough with FPUs to comment with any certainty, but my guess would be that the compiler is letting an existing value that it thinks should be equal to x sit in on that comparison. Maybe you go y = x + 20.; y = y - 20; y is already on the FP stack, so rather than load x the compiler just compares against y. But due to rounding errors, y isn't quite 0.0 like it is supposed to be, and you get the odd results you see.
For a better explanation: Why is cos(x) != cos(y) even though x == y? from the C++FAQ lite. This is part of what I'm trying to get across, I just couldn't remember where exactly I had read it until just now.
Changing to a const reference fixes this because the compiler is worried about aliasing. It forces a load from x because it can't assume its value hasn't changed at some point after creating y, and since x is actually exactly 0.0 [which is representable in every floating point format I'm familiar with] the rounding errors vanish.
I'm pretty sure MS provides a pragma that allows you to set the FP flags on a per-function basis. Or you could move this routine to a separate file and give that file custom flags. Either way, it could prevent your whole program from suffering just to keep that one routine happy.

what are the results of calc_something(0L), or calc_something(0.0f) ? It could be linked to the size of the types before casting. An integer is 4 bytes, a double is 8.
Have you tried looking at the asembled code, to see how the aforementioned conversion is done ?
Googling for 'fp fast', I found this post [social.msdn.microsoft.com]

As I've said in other question, compilers suck at generating floating point code. The article Dennis links to explains the problems well. Here's another: An MSDN article.
If the performance of the code is important, you can easily1 out-perform the compiler by writing your own assembler code. If your algoritm is vectorisable then you can make use of SIMD too (with a slight loss of precision though).
Assuming you understand the way the FPU works.

inline int calc_something(double x) will (probably) use an 80 bits register. inline int calc_something(const double& x) would store the double in memory, where it takes 64 bits. That at least explains the difference between the two.
However, I find your test quite fishy to begin with. The results of calc_something are extremely sensitive to rounding of its input. Your FP algorithms should be robust to rounding. calc_something(1.0-(1.0/3.0)*3) should be the same as calc_something(0.0).

I think the behavior is correct.
You never compare a floating point number up to less than the holding type's precision.
Something that comes from zero may be equal, greater or less than another zero.
See http://floating-point-gui.de/

Related

Adding double precision values yield different results between separate programs in C++

I have a question about floating point addition. I understand how compilers and processor architecture can lead to floating point arithmetic values. I have seen many questions on here similar to my question, but they all have some variation such as different compiler, different code, different machine, etc. However, I'm am running into an issue when adding doubles in the exact same way in two different programs calling the identical function with the same arguments and it is leading to different results. Both programs are compiled on the same machine with the same compiler/tags. The code looks similar to this:
void function(double tx, double ty, double tz){
double answer;
double x,y;
x = y = answer = 0;
x = tx - ty;
y = ty - tz;
answer = (tx + ty + tz) * (x*y)
}
The values of:
tx,ty,tz
are on the order of [10e-15,10e-30]. Obviously this is a very simplified version of the functions I am actually using, but, is it possible for two programs, running identical floating point arithmetic (not just the same function, the exact same code), on the same machine, with the same compiler/tags, to get different results for the function?
Some possibilities:
The source code of function is identical in the two programs, but it appears with different context, resulting in the compiler compiling it in different ways. For example, the compiler might inline it in one place and not another, and inlining might lead to some expression reduction due to combination with other expressions at the point of the inlined call, and hence different arithmetic is performed. (To test this, move function to a separate source file, compile it separately, and link it with a linker without cross-module optimization. Also, try compiling with optimization disabled.)
You think there are identical inputs to function because they appear the same when printed or viewed in the debugger, but they are actually different due to small differences in the low digits that are not printed. (To test this, print the full values using the hexadecimal floating-point format. To do that, insert std::hexfloat into the output stream, followed by floating-point values. Alternately, use a C printf using the %a format.)
Something else in the programs changes floating-point state, such as rounding mode.
You think you have used an identical compiler, identical sources, identical compilation switches, and so on, but actually have not.
David Schwartz notes that floating-point values can change when they are stored, as occurs when they are simply spilled to the stack. This occurs because some processors and C++ implementations may store floating-point values with extended precision in registers but less precision in memory. Technically, this fits into either 1. (different computation nominally inside function) or 2. (different values passed to function), but it is insidious enough to warrant separate mention.
Well the answer is quite easy. If your computer behaves deterministic it will always return the same results for the same input. That's the basic idea behind programming languages so far. (Unless we are talking about quantum computers, of course.)
So the question reduces to whether you really have the same input.
Although the above function looks strictly functional, there are often hidden inputs that are not that obvious. E.g. you might adjust the rounding mode of your FPU before calling the function. Or you might setup different exception behavior. In both cases the function may behave differently for certain inputs.
So even if your computer isn't non-deterministic (i.e. buggy) the above function might return different results. Although it is not that likely.

Set all floating point literals to floats MSVC++

I am writing some numeric code in C++ and I want to be able to swap between using double and float. I have therefore added a #define MYFLT which I can make either a float or a double as needed. However, how do I deal with the various numeric literals.
For example
MYFLT someNumber = 1.2;
MYFLT someOtherNumber = 1.5f;
gives compiler warnings for the first line when MYFLT is a float and for the second line when MYFLT is a double. I know this is a trivial example, but there are other cases where I have longer expresions with literals in and floats can end up being converted to doubles then the result back to floats which I think is costing me significant performance. How should I deal with this?
I could do things like
MYFLT someNumber = MYFLT(1.2);
MYFLT someOtherNumber = MYFLT(1.5);
but this is quite tedious. I'm assuming that in that if I do this the compiler is clever enough to just use a float when needed (can anyone confirm that?). What would be better would be if there was a MSVC++ compiler switch or #define that will tell the compiler to treat all floating point literals as floats instead of doubles. Does such a switch exist?
Even when I wrap all my literals as above my code runs 50% slower when I use float rather than double. I was expecting a performance boost through simd type operations, not a penalty!
Phil
What you'd want is #define MYFLTCONST(x) x##f or #define MYFLTCONST(x) x depending on whether you want a f suffix for float appended.
This is a (not quite complete) answer to my own question.
I found that a small function that was called many times (a fast approximation to sin) didn't have its literals cast as MYFLT. The extra computational hit of this also meant that the compiler wasn't inlining it. This function accounted for most of the difference. Some further profiling seemed to indicate that accessing std::vector<float> was slower than std::vector<double> ( I am using [] to do the access if it matters ). Replacing std::vectors with raw fixed sized arrays sped up the double implementation a little and closed the gap significantly for the float implementation. The float version is now only about 10% slower than the double version. But definitely no speed increase due to either RAM access nor vectorization. I guess I need to think more carefully about my loops to get any benefit there.
I guess the conclusion here (yet again) is that the compiler is pretty good at optimising code - it's much better to work with it and do careful profiling than it is to try and do your own blind "optimisations" which might actually have negative effects, like stopping the compiler performing good inlining.

What is floating point speculation and how does it differ from the compiler's floating point model

The Intel C++ compiler provides two options for controlling floating point:
-fp-speculation (fast/safe/strict/off)
-fp-model (precise/fast/strict and source/double/extended)
I think I understand what fp-model does. But what is fp-speculation and how does it relate to fp-model? I have yet to find any intel doc which explains this!
-fp-model influences how floating-point computations are carried out, and can change the numeric result (by licensing unsafe optimizations or by changing the precision at which intermediate results are evaluated).
-fp-speculation does not change the numerical results, but can effect what floating-point flags are raised by an operation (or what traps are taken if floating-point traps are enabled). 99.99% of programmers don't need care about these things, so you can probably run with the default and not worry about it.
Here's a concrete example; suppose you have the following function:
double foo(double x) {
// lots of computation
if (x >= 0) return sqrt(x);
else return x;
}
sqrt is, relatively speaking, slow. It would be nice to hoist the computation of sqrt(x) like this:
double foo(double x) {
const double sqrtx = sqrt(x);
// lots of computation
if (x >= 0) return sqrtx;
else return x;
}
By doing this, we allow the computation of sqrt to proceed simultaneously with other computations, reducing the latency of our function. However, there's a problem; if x is negative, then sqrt(x) raises the invalid flag. In the original program, this could never happen, because sqrt(x) was only computed if x was non-negative. In the modified program, sqrt(x) is computed unconditionally. Thus, if x is negative, the modified program raises the invalid flag, whereas the original program did not.
The -fp-speculation flag gives you a way to tell the compiler whether or not you care about these cases, so it knows whether or not it is licensed to make such transformations.
Out of order execution and speculative execution can result in extraneous exceptions or raise exceptions at the wrong time.
If that matters to you, you can use the fp-speculation option to control speculation of floating-point instructions.
For (a little bit) more information: http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/copts/common_options/option_fp_speculation.htm
On Windows OS:
1.Intel compiler floating calculation 32 bit application vs 64 bit application , same code Can give to you different result!!!! No matter what flag you choose:)!!!!
2.Visual studio compiler floating calculation 32 bit vs 64 bit application , same code output same result.

How to store doubles in memory

Recently I changed some code
double d0, d1;
// ... assign things to d0/d1 ...
double result = f(d0, d1)
to
double d[2];
// ... assign things to d[0]/d[1]
double result = f(d[0], d[1]);
I did not change any of the assignments to d, nor the calculations in f, nor anything else apart from the fact that the doubles are now stored in a fixed-length array.
However when compiling in release mode, with optimizations on, result changed.
My question is, why, and what should I know about how I should store doubles? Is one way more efficient, or better, than the other? Are there memory alignment issues? I'm looking for any information that would help me understand what's going on.
EDIT: I will try to get some code demonstrating the problem, however this is quite hard as the process that these numbers go through is huge (a lot of maths, numerical solvers, etc.).
However there is no change when compiled in Debug. I will double check this again to make sure but this is almost certain, i.e. the double values are identical in Debug between version 1 and version 2.
Comparing Debug to Release, results have never ever been the same between the two compilation modes, for various optimization reasons.
You probably have a 'fast math' compiler switch turned on, or are doing something in the "assign things" (which we can't see) which allows the compiler to legally reorder calculations. Even though the sequences are equivalent, it's likely the optimizer is treating them differently, so you end up with slightly different code generation. If it's reordered, you end up with slight differences in the least significant bits. Such is life with floating point.
You can prevent this by not using 'fast math' (if that's turned on), or forcing ordering thru the way you construct the formulas and intermediate values. Even that's hard (impossible?) to guarantee. The question is really "Why is the compiler generating different code for arrays vs numbered variables?", but that's basically an analysis of the code generator.
no these are equivalent - you have something else wrong.
Check the /fp:precise flags (or equivalent) the processor floating point hardware can run in more accuracy or more speed mode - it may have a different default in an optimized build
With regard to floating-point semantics, these are equivalent. However, it is conceivable that the compiler might decide to generate slightly different code sequences for the two, and that could result in differences in the result.
Can you post a complete code example that illustrates the difference? Without that to go on, anything anyone posts as an answer is just speculation.
To your concerns: memory alignment cannot effect the value of a double, and a compiler should be able to generate equivalent code for either example, so you don't need to worry that you're doing something wrong (at least, not in the limited example you posted).
The first way is more efficient, in a very theoretical way. It gives the compiler slightly more leeway in assigning stack slots and registers. In the second example, the compiler has to pick 2 consecutive slots - except of course if the compiler is smart enough to realize that you'd never notice.
It's quite possible that the double[2] causes the array to be allocated as two adjacent stack slots where it wasn't before, and that in turn can cause code reordering to improve memory access efficiency. IEEE754 floating point math doesn't obey the regular math rules, i.e. a+b+c != c+b+a

Can cout alter variables somehow?

So I have a function that looks something like this:
float function(){
float x = SomeValue;
return x / SomeOtherValue;
}
At some point, this function overflows and returns a really large negative value. To try and track down exactly where this was happening, I added a cout statement so that the function looked like this:
float function(){
float x = SomeValue;
cout << x;
return x / SomeOtherValue;
}
and it worked! Of course, I solved the problem altogether by using a double. But I'm curious as to why the function worked properly when I couted it. Is this typical, or could there be a bug somewhere else that I'm missing?
(If it's any help, the value stored in the float is just an integer value, and not a particularly big one. I just put it in a float to avoid casting.)
Welcome to the wonderful world of floating point. The answer you get will likely depend on the floating point model you compiled the code with.
This happens because of the difference between the IEEE spec and the hardware the code is running on. Your CPU likely has 80 bit floating point registers that get use to hold the 32-bit float value. This means that there is far more precision while the value stays in a register than when it is forced to a memory address (also known as 'homing' the register).
When you passed the value to cout the compiler had to write the floating point to memory, and this results in a lost of precision and interesting behaviour WRT overflow cases.
See the MSDN documentation on VC++ floating point switches. You could try compiling with /fp:strict and seeing what happens.
Printing a value to cout should not change the value of the paramter in any way at all.
However, I have seen similar behaviour, adding debugging statements causes a change in the value. In those cases, and probably this one as well my guess was that the additional statements were causing the compiler's optimizer to behave differently, so generate different code for your function.
Adding the cout statement means that the vaue of x is used directly. Without it the optimizer could remove the variable, so changing the order of the calculation and therefore changing the answer.
As an aside, it's always a good idea to declare immutable variables using const:
float function(){
const float x = SomeValue;
cout << x;
return x / SomeOtherValue;
}
Among other things this will prevent you from unintentionally passing your variables to functions that may modify them via non-const references.
cout causes a reference to the variable, which often will cause the compiler to force it to spill it to the stack.
Because it is a float, this likely causes its value to be truncated from the double or long double representation it would normally have.
Calling any function (non-inlined) that takes a pointer or reference to x should end up causing the same behavior, but if the compiler later gets smarter and learns to inline it, you'll be equally screwed :)
I dont think the cout has any effect on the variable, the problem would have to be somewhere else.