Different optimization in VS2015 vs VS2013 causes floating point exception - c++

I have a small example of an issue which came up during the transition from VS2013 to VS2015. In VS2015 the code example below causes a floating-point invalid operation exception.
#include <float.h>
#include <math.h>

int main()
{
    unsigned int enableBits = _EM_OVERFLOW | _EM_ZERODIVIDE | _EM_INVALID;
    _clearfp();
    _controlfp_s(0, ~enableBits, enableBits);
    int count = 100;
    float array[100];
    for (int i = 0; i < count; ++i)
    {
        array[i] = (float)pow((float)(count - 1 - i) / count, 4); //this causes exception in VS2015
    }
    return 0;
}
This happens only in release mode, so it's probably caused by different optimization. Is there something wrong with this code or is this a bug in VS 2015?
It's hard to find issues like these across the whole code base, so I am looking for a systematic fix, not a workaround (e.g. using a different variable instead of i, which works).
I also checked the generated assembly code, and it seems that VS2013 uses the whole 128-bit register to perform 4 float operations in one division. VS2015 seems to do only 2 float operations, and the rest of the register is zero (or some garbage), which probably introduces this exception.
The instruction which causes the exception is marked in the pictures.
[Image: VS2013 disassembly]
[Image: VS2015 disassembly]
Any help will be appreciated.
Thanks.

This looks like an interaction between your use of floating-point exceptions and the floating-point optimizations you have enabled.
The code processes 2 iterations at once (loop unrolling) but uses divps, which performs 4 divides at once (on the 4 floats in an XMM register). The upper 2 floats in the XMM register are not used and are zero. Because the results of the divides in those slots aren't used, this doesn't normally matter. However, since you have unmasked the floating-point exceptions, the 0/0 in those slots raises the invalid operation exception that you see, even though it only produces values which won't be used.
Your choices, as I see them, are to set /fp:strict, which disables these optimizations and so makes this work (but will obviously make the code slower), or to remove the _controlfp_s call.
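To make the effect of the unused slots concrete, here is a minimal portable sketch. It does not use divps or _controlfp_s; it divides the "lanes" one at a time and reads the standard <cfenv> status flags instead of trapping. The helper name flags_after_lanewise_divide and the test values are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: divides num[i] / den[i] for each lane, discards the
// results, and returns which of FE_INVALID / FE_DIVBYZERO were raised.
int flags_after_lanewise_divide(const float* num, const float* den, int lanes)
{
    assert(num != nullptr && den != nullptr && lanes > 0);
    std::feclearexcept(FE_ALL_EXCEPT);
    for (int lane = 0; lane < lanes; ++lane) {
        // volatile keeps the compiler from folding or dropping the division
        volatile float n = num[lane];
        volatile float d = den[lane];
        volatile float q = n / d;
        (void)q;  // the quotient is never used, like the upper XMM slots
    }
    return std::fetestexcept(FE_INVALID | FE_DIVBYZERO);
}
```

With zeros in the two unused lanes, the 0/0 sets the invalid flag even though nothing reads the quotient. Once that exception is unmasked, the same flag becomes a trap.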

Related

Same program works on macos but fails on windows [closed]

Ok, so I am working on an interface in Qt, and I am using Qt Creator as an IDE. The thing is that the algorithm works normally on Mac, but on Windows the same program gets an error.
The only difference is the compiler. The compiler I am using on Windows is Visual C++, and on Mac it is clang (I think).
Is it possible that the same algorithm works on Mac but doesn't on Windows? If so, what could the problem be?
EDIT: I see that I got downvoted. I don't know exactly why. I already know what the error means: vector subscript out of range. The thing is that I don't want to waste time trying to find where the error is, because it actually works fine on Mac. Also, the PC is better than the Mac.
EDIT 2: Indeed it looks like the same code works differently on Windows than on Mac. Tomorrow I will test it on Mac to try to understand this, but the code that behaves differently is this one:
vector<double> create_timeVector(double simulationTime, double step) {
vector<double> time;
time.push_back(0);
double i = 0;
do {
++i;
time.push_back(time[i-1] + step);
} while (time[i] < simulationTime);
return time;
}
The vector that is returned is one element larger on Windows than on the Mac. The thing is that I didn't make any changes to the code.
The probable reason why it works differently is that you're using a floating point calculation to determine when the loop is to stop (or keep going, depending on how you look at it).
time.push_back(time[i-1] + step);
} while (time[i] < simulationTime);
You have step as a double, simulationTime as a double, and a vector<double> called time being used. That is a recipe for loops running inconsistently across compilers, compiler optimizations, etc.
Floating point is not exact. The way to make the loops consistent is to not use any floating point calculations in the looping condition.
In other words, by hook or by crook, compute the number of calculations you need by using integer arithmetic. If you need to step 100 times, then it's 100, not a value computed from floating point arithmetic:
For example:
for (float i = 0.01F; i <= 1.0F; i+=0.01F)
{
// use i in some sort of calculation
}
The number of times that loop executes can be 99 times, or it can be 100 times. It depends on the compiler and any floating point optimizations that may apply. To fix this:
for (int i = 1; i <= 100; ++i )
{
float dI = static_cast<float>(i) / 100.0F;
// use dI instead of i in some sort of calculation
}
As long as i isn't changed in the loop, the loop is guaranteed to always do 100 iterations, regardless of the hardware, compiler optimizations, etc.
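Applied to the time vector from the question, the same idea could look like the sketch below. It is only one possible rewrite: the name make_time_vector and the choice to pass an integer step count (instead of a double step size) are assumptions, not the asker's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical replacement for create_timeVector: the caller passes the
// number of steps as an integer, so the loop count never depends on rounding.
std::vector<double> make_time_vector(double simulationTime, int steps)
{
    assert(steps > 0);
    std::vector<double> time;
    time.reserve(static_cast<std::size_t>(steps) + 1);
    for (int i = 0; i <= steps; ++i) {
        // i / steps is exactly 1.0 on the last iteration, so the final
        // sample equals simulationTime exactly.
        time.push_back(simulationTime * (static_cast<double>(i) / steps));
    }
    return time;
}
```

The returned vector always has steps + 1 elements, whatever the compiler or platform, and each sample is computed directly from i rather than by accumulating step, so rounding error does not build up along the vector.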
See this: Any risk of using float variables as loop counters and their fractional increment/decrement for non "==" conditions?
vector subscript out of range means that you used [n] on a vector, and n was less than 0 or greater than or equal to the number of elements in the vector.
This causes undefined behaviour, so different compilers may react in different ways.
To get reliable behaviour in this case, one way is to use .at(n) instead of [] and make sure you catch exceptions. Another way is to check your index values before applying them, so that you never access out of bounds in the first place.
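Both approaches can be sketched in a few lines. The helper names try_get and get_checked are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// First approach: use at() and catch the exception. Returns true and writes
// the element to out when n is in range, false otherwise.
bool try_get(const std::vector<double>& v, std::size_t n, double& out)
{
    try {
        out = v.at(n);  // at() throws std::out_of_range for a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Second approach: validate the index before using operator[].
// The assert is only active in builds without NDEBUG.
double get_checked(const std::vector<double>& v, std::size_t n)
{
    assert(n < v.size());
    return v[n];
}
```

Either way, an out-of-range index is reported in the same manner on every compiler instead of silently reading past the end of the vector.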

How can I find out what's changing the return address of a function in c++

I have a program that behaves weirdly and probably has undefined behaviour. Sometimes, the return address of a function seems to be changed, and I don't know what's causing it.
The return address is always changed to the same address, an assertion inside a function the control shouldn't be able to reach. I've been able to stop the program with a debugger to see that when it's supposed to execute a return statement, it jumps straight to the line with the assertion instead.
This code approximates how my function works.
int foo(Vector t)
{
    double sum = 0;
    for(unsigned int i=0; i<t.size();++i){
        sum += t[i];
    }
    double limit = bar(); // bar returns a value between 0 and 1
    double a=0;
    for(double i=0; i<10; i++){
        a += f(i)/sum; // f(1)/sum + ... + f(10)/sum = 1.0f
        if(a>3)return a;
    }
    // shouldn't get here
    assert(false); // ... then this line is executed
}
This is what I've tried so far:
Switching all std::vector [] operators to .at to prevent accidentally writing into memory
Made sure all return-by-value values are const.
Switched on -Wall and -Werror and -pedantic-errors in gcc
Ran the program with valgrind
I get a couple of invalid read of size 8, but they seem to originate from qt, so I'm not sure what to make of it. Could this be the problem?
The error happens only occasionally when I have run the program for a while and give it certain input values, and more often in a release build than in a debug build.
EDIT:
So I managed to reproduce the problem in a console application (no Qt loaded), and I then managed to simulate the events that caused the problem.
As some of you suggested, it turns out I misjudged what was actually causing it to reach the assertion, probably due to my lack of experience with Qt's debugger. The actual problem was a floating point error in the double i used as a loop condition.
I was implementing softmax, but exp(x) got rounded to zero with particular inputs.
Now that I have solved the problem, I might rephrase it: is there a method for checking problems like rounding errors automatically, i.e. breaking on 0/0 for instance?
The short answer is:
The most portable way of determining if a floating-point exceptional condition has occurred is to use the floating-point exception facilities provided by C in fenv.h.
although, unfortunately, this is far from being perfect.
I suggest you read both
https://www.securecoding.cert.org/confluence/display/seccode/FLP04-C.+Check+floating-point+inputs+for+exceptional+values
and
https://www.securecoding.cert.org/confluence/display/seccode/FLP03-C.+Detect+and+handle+floating-point+errors
which concisely address the exact question you are posing:
Is there a method for checking problems like rounding errors automatically.
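As a rough illustration of the fenv.h approach, the sketch below clears the floating-point status flags, performs the divisions of a normalization step, and then tests whether an invalid operation (0/0) or a division by zero occurred. The function name normalize_checked is hypothetical, and the example assumes the compiler does not reorder floating-point operations around the flag accesses (strictly, that needs FENV_ACCESS support).

```cpp
#include <cassert>
#include <cfenv>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: divides every weight by the sum of all weights and
// returns false if a division raised FE_INVALID (0/0) or FE_DIVBYZERO (x/0).
bool normalize_checked(std::vector<double>& weights)
{
    assert(!weights.empty());
    const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
    std::feclearexcept(FE_ALL_EXCEPT);
    for (std::size_t i = 0; i < weights.size(); ++i) {
        // volatile keeps the division from being folded away
        volatile double q = weights[i] / sum;
        weights[i] = q;
    }
    return std::fetestexcept(FE_INVALID | FE_DIVBYZERO) == 0;
}
```

If every exp(x) has been rounded to zero, the sum is zero and the function reports the 0/0 instead of silently filling the vector with NaN.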

Visual Studio C++ 2008 / 2010 - break on float NaN

Is there any way to set up Visual Studio (just upgraded from 2008 to 2010) to break, as if an assertion failed, whenever any floating point number becomes NaN, QNAN, INF, etc?
Up until now I have just been using the assert(x == x) trick, but I would rather have something implicit, so that I don't have to add assertions everywhere.
Quite surprised I can't find an answer to this via google. Some stuff about 'floating point exceptions', but I'm not sure if they are the same thing, and I've tried enabling them in Visual Studio, but the program doesn't break until something catastrophic happens because of the NaN later on in execution.
1) Go to the project options and enable /fp:strict (C/C++ -> Code Generation -> Floating Point Model).
2) Use _controlfp to set the floating-point control word (see code below).
#include <float.h>
#include <math.h>

// Leave only the inexact exception masked; all other FP exceptions now trap.
unsigned int fp_control_state = _controlfp(_EM_INEXACT, _MCW_EM);

int main () {
    sqrtf(-1.0); // floating point exception
    double x = 0.0;
    double y = 1.0/x; // floating point exception
    return 0;
}
Try enabling fp exceptions
At least on x86, when you generate a NaN etc., one of the FPU status register bits is set. There's a way to configure it so that a hardware exception is thrown when the next FP operation occurs, but that's not quite as soon as you hoped for. I can't recall the reference though.
I am not sure if this is possible the way you want it, but you could create a macro which wraps the code in the marked line in an assert, or which sets a breakpoint for it.
Hope this helps
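The macro idea could be sketched roughly as follows. The names is_usable, check_finite_impl and CHECK_FINITE are invented for this example, and the check relies on std::isfinite, which may not behave as expected under fast-math style compiler options.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true for ordinary numbers, false for NaN and +/-INF.
inline bool is_usable(double x)
{
    return std::isfinite(x);
}

// Asserts that the value is finite and passes it through unchanged.
inline double check_finite_impl(double value)
{
    assert(is_usable(value));
    return value;
}

// Hypothetical macro: wrap any expression whose result should be finite.
// In NDEBUG builds the assert compiles away and only the value remains.
#define CHECK_FINITE(expr) (check_finite_impl(expr))
```

A line such as double y = CHECK_FINITE(1.0 / x); then stops in the debugger at the point where the bad value is produced, rather than where it later causes damage.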

Visual Studio C++ compiler optimizations breaking code?

I've a peculiar issue here, which is happening both with VS2005 and 2010. I have a for loop in which an inline function is called, in essence something like this (C++, for illustrative purposes only):
inline double f(int a)
{
if (a > 100)
{
// This is an error condition that shouldn't happen..
}
// Do something with a and return a double
}
And then the loop in another function:
for (int i = 0; i < 11; ++i)
{
double b = f(i * 10);
}
Now what happens is that in debug build everything works fine. In release build with all the optimizations turned on this is, according to disassembly, compiled so that i is used directly without the * 10 and the comparison a > 100 turns into a > 9, while I guess it should be a > 10. Do you have any leads as to what might make the compiler think that a > 9 is the correct way? Interestingly, even a minor change (a debug printout for example) in the surrounding code makes the compiler use i * 10 and compare that with the literal value of 100.
I know this is somewhat vague, but I'd be grateful for any old idea.
EDIT:
Here's a hopefully reproducible case. I don't consider it too big to be pasted here, so here goes:
__forceinline int get(int i)
{
if (i > 600)
__asm int 3;
return i * 2;
}
int main()
{
for (int i = 0; i < 38; ++i)
{
int j = (i < 4) ? 0 : get(i * 16);
}
return 0;
}
I tested this with VS2010 on my machine, and it seems to behave as badly as the original code I'm having problems with. I compiled and ran this with the IDE's default empty C++ project template, in release configuration. As you see, the break should never be hit (37 * 16 = 592). Note that removing the i < 4 makes this work, just like in the original code.
For anyone interested, it turned out to be a bug in the VS compiler. Confirmed by Microsoft and fixed in a service pack following the report.
First, it'd help if you could post enough code to allow us to reproduce the issue. Otherwise you're just asking for psychic debugging.
Second, it does occasionally happen that a compiler fails to generate valid code at the highest optimization levels, but more likely, you just have a bug somewhere in your code. If there is undefined behavior somewhere in your code, that means the assumptions made by the optimizer may not hold, and then the compiler can end up generating bad code.
But without seeing your actual code, I can't really get any more specific.
The only famous bug I know of with optimization (and only at the highest optimization level) is the occasional modification of the order of priority of operations (due to changes made by the optimizer while looking for the fastest way to compute). You could look in this direction (and add some parentheses even though they are not strictly necessary, which is why more parentheses are never bad), but frankly, those kinds of bugs are quite rare.
As stated, it is difficult to have any precise idea without more code.
Firstly, inline assembly prevents certain optimizations; you should use the __debugbreak() intrinsic for int 3 breakpoints. The compiler sees that the inline function has no effect other than a breakpoint, so it divides the 600 by 16 (note: this is affected by integer truncation) and optimizes the debugbreak to trigger with 38 > i >= 37. So it seems to work on this end.
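The arithmetic behind that transformation can be checked with a small sketch. It compares the original test on the scaled value with a test on the loop counter itself. The function name reduced_compare_matches is made up, and this shows what a correct strength reduction should look like, not what any particular compiler version emits.

```cpp
#include <cassert>

// Checks, for every i in [0, loopEnd), that the original test (i * 16 > 600)
// and the reduced test on the loop counter (i > 600 / 16) agree.
bool reduced_compare_matches(int loopEnd)
{
    assert(loopEnd >= 0);
    const int threshold = 600 / 16;  // integer division truncates 37.5 to 37
    for (int i = 0; i < loopEnd; ++i) {
        if ((i * 16 > 600) != (i > threshold)) {
            return false;
        }
    }
    return true;
}
```

For the loop in the question, i only goes up to 37, and 37 * 16 = 592, so neither form of the test is ever true and the breakpoint should not be reached.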

strange results with /fp:fast

We have some code that looks like this:
inline int calc_something(double x) {
if (x > 0.0) {
// do something
return 1;
} else {
// do something else
return 0;
}
}
Unfortunately, when using the flag /fp:fast, we get calc_something(0)==1 so we are clearly taking the wrong code path. This only happens when we use the method at multiple points in our code with different parameters, so I think there is some fishy optimization going on here from the compiler (Microsoft Visual Studio 2008, SP1).
Also, the above problem goes away when we change the interface to
inline int calc_something(const double& x) {
But I have no idea why this fixes the strange behaviour. Can anyone explain this behaviour? If I cannot understand what's going on, we will have to remove the /fp:fast switch, but this would make our application quite a bit slower.
I'm not familiar enough with FPUs to comment with any certainty, but my guess would be that the compiler is letting an existing value that it thinks should be equal to x sit in on that comparison. Maybe you go y = x + 20.; y = y - 20; y is already on the FP stack, so rather than load x the compiler just compares against y. But due to rounding errors, y isn't quite 0.0 like it is supposed to be, and you get the odd results you see.
For a better explanation: Why is cos(x) != cos(y) even though x == y? from the C++FAQ lite. This is part of what I'm trying to get across, I just couldn't remember where exactly I had read it until just now.
Changing to a const reference fixes this because the compiler is worried about aliasing. It forces a load from x because it can't assume its value hasn't changed at some point after creating y, and since x is actually exactly 0.0 [which is representable in every floating point format I'm familiar with] the rounding errors vanish.
I'm pretty sure MS provides a pragma that allows you to set the FP flags on a per-function basis. Or you could move this routine to a separate file and give that file custom flags. Either way, it could prevent your whole program from suffering just to keep that one routine happy.
what are the results of calc_something(0L), or calc_something(0.0f) ? It could be linked to the size of the types before casting. An integer is 4 bytes, a double is 8.
Have you tried looking at the assembled code, to see how the aforementioned conversion is done?
Googling for 'fp fast', I found this post [social.msdn.microsoft.com]
As I've said in other question, compilers suck at generating floating point code. The article Dennis links to explains the problems well. Here's another: An MSDN article.
If the performance of the code is important, you can easily[1] out-perform the compiler by writing your own assembler code. If your algorithm is vectorisable then you can make use of SIMD too (with a slight loss of precision though).
[1] Assuming you understand the way the FPU works.
inline int calc_something(double x) will (probably) use an 80 bits register. inline int calc_something(const double& x) would store the double in memory, where it takes 64 bits. That at least explains the difference between the two.
However, I find your test quite fishy to begin with. The results of calc_something are extremely sensitive to rounding of its input. Your FP algorithms should be robust to rounding. calc_something(1.0-(1.0/3.0)*3) should be the same as calc_something(0.0).
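One way to make such a test less sensitive to rounding is to treat everything within a small tolerance of zero as zero. The sketch below is only an illustration: the name calc_something_tolerant and the eps parameter are assumptions, and a suitable tolerance depends on the scale of the values in the real application.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical variant of calc_something: values within eps of zero are
// treated as zero, so tiny rounding residue does not flip the branch.
int calc_something_tolerant(double x, double eps)
{
    assert(eps >= 0.0);
    if (std::fabs(x) <= eps) {
        return 0;  // "zero" branch, including small rounding noise
    }
    return x > 0.0 ? 1 : 0;
}
```

With this version, an input such as 0.1 + 0.2 - 0.3, which is a tiny positive number rather than exactly zero in double precision, takes the same path as 0.0.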
I think the behavior is correct.
You never compare a floating point number up to less than the holding type's precision.
Something that comes from zero may be equal, greater or less than another zero.
See http://floating-point-gui.de/