Ok, so I am working on an interface in Qt, and I am using Qt Creator as an IDE. The thing is that the algorithm works normally on Mac, but on Windows the same program gets an error.
The only difference is the compiler. The compiler I am using on Windows is Visual C++, and on Mac it is Clang (I think).
Is it possible that the same algorithm works on Mac but doesn't on Windows? If so, what could the problem be?
EDIT: I see that I got downvoted, and I don't know exactly why. I already know what the error means: vector subscript out of range. The thing is that I don't want to waste time trying to find where the error is, because it actually works fine on Mac. Also, the PC is better than the Mac.
EDIT 2: Indeed, it looks like the same code works differently on Windows than on Mac. Tomorrow I will test it on Mac to try to understand this, but the code that behaves differently is this one:
vector<double> create_timeVector(double simulationTime, double step) {
    vector<double> time;
    time.push_back(0);
    double i = 0;
    do {
        ++i;
        time.push_back(time[i-1] + step);
    } while (time[i] < simulationTime);
    return time;
}
The size of the vector that is returned is one element bigger on Windows than on the Mac. The thing is that I didn't make any changes to the code.
The probable reason why it works differently is that you're using a floating point calculation to determine when the loop is to stop (or keep going, depending on how you look at it).
time.push_back(time[i-1] + step);
} while (time[i] < simulationTime);
You have step as a double, simulationTime as a double, and a vector<double> called time being used. That is a recipe for loops running inconsistently across compilers, compiler optimizations, etc.
Floating point is not exact. The way to make the loops consistent is to not use any floating point calculations in the looping condition.
In other words, by hook or by crook, compute the number of calculations you need by using integer arithmetic. If you need to step 100 times, then it's 100, not a value computed from floating point arithmetic:
For example:
for (float i = 0.01F; i <= 1.0F; i += 0.01F)
{
    // use i in some sort of calculation
}
That loop can execute 99 times or it can execute 100 times. It depends on the compiler and any floating point optimizations that may apply. To fix this:
for (int i = 1; i <= 100; ++i)
{
    float dI = static_cast<float>(i) / 100.0F;
    // use dI instead of i in the calculation
}
As long as i isn't changed in the loop, the loop is guaranteed to always do 100 iterations, regardless of the hardware, compiler optimizations, etc.
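Applying the same idea to the function from the question, a minimal sketch (my own, not from the original post) could look like this; the step count itself still comes from one floating-point division, so ceil() may round differently right at an exact multiple, but the loop itself is now deterministic:

#include <cmath>
#include <cstddef>
#include <vector>
using std::vector;

vector<double> create_timeVector(double simulationTime, double step) {
    // Compute the number of steps once, as an integer, and drive the loop with it.
    const std::size_t steps = static_cast<std::size_t>(std::ceil(simulationTime / step));
    vector<double> time;
    time.reserve(steps + 1);
    for (std::size_t i = 0; i <= steps; ++i) {
        time.push_back(i * step);   // compute each point directly; no accumulated error
    }
    return time;
}

Computing each point as i * step also avoids the rounding error that accumulates when step is added repeatedly.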
See this: Any risk of using float variables as loop counters and their fractional increment/decrement for non "==" conditions?
vector subscript out of range means that you used [n] on a vector, and n was less than 0 or greater than or equal to the number of elements in the vector.
This causes undefined behaviour, so different compilers may react in different ways.
To get reliable behaviour in this case, one way is to use .at(n) instead of [] and make sure you catch exceptions. Another way is to check your index values before applying them, so that you never access out of bounds in the first place.
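For example (a minimal sketch with illustrative names, not code from the question):

#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <vector>

double element_checked(const std::vector<double>& v, std::size_t n)
{
    try {
        return v.at(n);                 // throws std::out_of_range instead of undefined behaviour
    } catch (const std::out_of_range& e) {
        std::cerr << "bad index " << n << ": " << e.what() << '\n';
        return 0.0;                     // or rethrow / handle as appropriate
    }
}

double element_tested(const std::vector<double>& v, std::size_t n)
{
    if (n < v.size())                   // validate the index before using operator[]
        return v[n];
    return 0.0;                         // index was out of bounds
}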
Related
I have a small example of an issue which came up during the transition from VS2013 to VS2015. In VS2015 the code example below causes a floating-point invalid operation.
#include <cmath>
#include <float.h>

int main()
{
    unsigned int enableBits = _EM_OVERFLOW | _EM_ZERODIVIDE | _EM_INVALID;
    _clearfp();
    _controlfp_s(0, ~enableBits, enableBits);

    int count = 100;
    float array[100];
    for (int i = 0; i < count; ++i)
    {
        array[i] = (float)pow((float)(count - 1 - i) / count, 4); // this causes the exception in VS2015
    }
    return 0;
}
This happens only in release mode, so it's probably caused by different optimization. Is there something wrong with this code, or is this a bug in VS2015?
It's hard to find issues like these across the whole code base, so I am looking for a systematic fix, not a workaround (e.g. using a different variable instead of i, which works).
I also checked the generated assembly code, and it seems that VS2013 uses the whole 128-bit register to perform four float divisions in one instruction. VS2015 seems to do only two float operations, and the rest of the register is zero (or some garbage), which probably introduces this exception.
The instruction that causes the exception was marked in the disassembly screenshots for VS2013 and VS2015 (images not reproduced here).
Any help will be appreciated.
Thanks.
This looks to be an interaction between your use of floating-point exceptions and some floating-point optimizations being enabled.
What the code is doing is two iterations at once (loop unrolling), but it uses divps, which does four divides at once (on the four floats in an XMM register). The upper two floats in the XMM register are not used and are zero. Since the results of the divides in those slots aren't used, it doesn't normally matter; however, because you set custom exception handling, this raises the invalid-operation exception you see, even though it is generating values that won't be used.
Your choices, as I see them, are to set /fp:strict, which will disable the optimisation and make this work (but will obviously make the code slower), or to remove the _controlfp_s call.
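A variation on the second choice (my own sketch, not part of the original answer): rather than dropping the _controlfp_s call entirely, unmask the exceptions only around the code you actually want trapped, and restore the default masked state before the auto-vectorised loop runs:

#include <float.h>

void checked_region()
{
    unsigned int enableBits = _EM_OVERFLOW | _EM_ZERODIVIDE | _EM_INVALID;
    _clearfp();
    _controlfp_s(0, ~enableBits, enableBits);   // unmask: these exceptions now trap

    // ... code whose floating-point errors you want to trap ...

    _clearfp();
    _controlfp_s(0, _MCW_EM, _MCW_EM);          // re-mask everything (the default state)
}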
I am implementing a genetic algorithm to numerically estimate some coefficients in a system of ODEs based on experimental data. I am learning Fortran as I go while implementing the algorithms. My setup is Intel Fortran 2015 running on Windows 7/Visual Studio 2013, on an old i7 processor.
I have the following piece among a multitude of lines of code:
DO i = 1, N_CROMOSOMES
    IF (population(9,i) < 0.0_DOUBLE) population(9, i) = square_error(population(1:8, i))
END DO
Where I just defined DOUBLE to be:
INTEGER, PARAMETER :: DOUBLE = 16
N_CROMOSOMES is an INTEGER argument to the function that defines the size of the array population, which in turn is a (9 x N_CROMOSOMES) array of type REAL(KIND=DOUBLE). For each column of this array, the first 8 elements represent the 8 coefficients that I am estimating, and the ninth element is the error associated with those particular 8 guesses for the coefficients. square_error is the function that determines it.
At this point of the program, I have marked columns that were just created or altered as having an error of -1. Hence the "IF (population(9,i) < 0.0_DOUBLE)": I am checking the columns of the array whose error is -1 in order to compute their error.
The thing is, I just rewrote most of my code, and spent the past few days correcting mysterious bugs. Before this, the code worked just fine with a FORALL instead of DO. Now it gives a "stack overflow" error when I use FORALL, but works with DO, although it takes a lot more time to do its job.
Does anyone know the cause of this, and also how to solve it? It is clear to me that my code can benefit greatly from parallelization, but I am not so sure how to do it.
Thanks for your time.
If I have a number a, would it be slower to add 1 to it b times rather than simply adding a + b?
a += b;
or
for (int i = 0; i < b; i++) {
    a += 1;
}
I realize that the second example seems kind of silly, but I have a situation where coding would actually be easier that way, and I am wondering if that would impact performance.
EDIT: Thank you for all your answers. It looks like some posters would like to know what situation I have. I am trying to write a function to shift an inputted character a certain number of characters over (i.e. a cipher) if it is a letter. So I want to say that one char += the number of shifts, but I also need to account for the jumps between the lowercase and uppercase characters in the ASCII table, and also for wrapping from z back to A. So, while it is doable in another way, I thought it would be easiest to keep adding one until I get to the end of a block of letter characters, then jump to the next one and keep going.
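(For illustration only, here is a minimal sketch of the single-addition approach, based purely on the description above, with hypothetical names; it treats the 52 letters as one cycle A..Z followed by a..z, as the wrap from z back to A suggests, and assumes a non-negative shift:)

#include <iostream>

// Hypothetical helper: shift a letter by n positions in one step, jumping over
// the gap between 'Z' and 'a' and wrapping from 'z' back to 'A'.
char shift_letter(char c, int n)
{
    int pos;
    if (c >= 'A' && c <= 'Z')      pos = c - 'A';          // positions 0..25
    else if (c >= 'a' && c <= 'z') pos = 26 + (c - 'a');   // positions 26..51
    else return c;                                         // not a letter: unchanged

    pos = (pos + n) % 52;                                  // one addition, one modulo
    return pos < 26 ? static_cast<char>('A' + pos)
                    : static_cast<char>('a' + (pos - 26));
}

int main()
{
    std::cout << shift_letter('Z', 1) << '\n';   // prints a
    std::cout << shift_letter('z', 1) << '\n';   // prints A
}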
If your loop is really that simple, I don't see any reason why a compiler couldn't optimize it. I have no idea whether any actually would, though. If your compiler doesn't, the single addition will be much faster than the loop.
The C++ language does not describe how long either of those operations takes. Compilers are free to turn your first statement into the second, and that is a legal way to compile it.
In practice, many compilers would treat those two subexpressions as the same expression, assuming everything is of type int. The second, however, would be fragile in that seemingly innocuous changes would cause massive performance degradation. Small changes in type that 'should not matter', extra statements nearby, etc.
It would be extremely rare for the first to be slower than the second, but if the type of a was such that += b was a much slower operation than calling += 1 a bunch of times, it could be. For example:

struct A {
    std::vector<int> v;
    void operator+=( int x ) {
        // optimize for the common case: += 1 grows the buffer geometrically
        if (x == 1 && v.size() == v.capacity()) v.reserve( v.size()*2 );
        // otherwise grow the buffer one element at a time, reallocating every time
        for (int i = 0; i < x; ++i) {
            v.reserve( v.size()+1 );
            v.resize( v.size()+1 );
        }
    }
};
then A a; int b = 100000; a+=b; would take much longer than the loop construct.
But I had to work at it.
The overhead (CPU instructions) of having a variable incremented in a loop is likely to be insignificant compared to the total number of instructions in that loop (unless the only thing you are doing in the loop is incrementing). Loop variables are likely to remain in the low levels of the CPU cache (if not in CPU registers) and are very fast to increment, as they don't need to be read from RAM over the FSB. Anyway, if in doubt just make a quick profile and you'll know whether it makes sense to sacrifice code readability for speed.
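As a rough illustration of such a quick profile (entirely a sketch; the numbers depend heavily on the compiler, optimisation level and hardware, and an optimiser may still transform either variant):

#include <chrono>
#include <cstdio>

int main()
{
    const int b = 500000;
    volatile long long a1 = 0, a2 = 0;        // volatile keeps the work from being elided

    auto t0 = std::chrono::steady_clock::now();
    a1 += b;                                  // single addition
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < b; ++i) a2 += 1;      // b separate increments
    auto t2 = std::chrono::steady_clock::now();

    std::printf("single add: %lld ns, loop: %lld ns (a1=%lld, a2=%lld)\n",
        (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count(),
        (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count(),
        (long long)a1, (long long)a2);
}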
Yes, absolutely slower. The second example is beyond silly. I highly doubt you have a situation where it would make sense to do it that way.
Let's say 'b' is 500,000... most computers can add that in a single operation, so why do 500,000 operations (not including the loop overhead)?
If the processor has an increment instruction, the compiler will usually translate the "add one" operation into an increment instruction.
Some processors may have optimized increment instructions to help speed up things like loops. Other processors can combine an increment operation with a load or store instruction.
There is a possibility that a small loop containing only an increment instruction could be replaced by a multiply and add. The compiler is allowed to do so, if and only if the functionality is the same.
This kind of optimization generally produces a negligible difference. However, for large data sets and performance-critical applications, it may be necessary, and the time gained can be significant.
Edit 1:
For adding values other than 1, the compiler would emit processor instructions to use the best addition operations.
The add operation is optimized in hardware as a different animal than incrementing. Arithmetic Logic Units (ALU) have been around for a long time. The basic addition operation is very optimized and a lot faster than incrementing in a loop.
My code is as follows:
double a, b;        // These variables are inputs to the function
double *inArr;      // This is also an input to the function, whose size is numElements
double *arr = new double[numElements];  // numElements is ~ 10^6
double sum = 0.0;
for (unsigned int i = 0; i < numElements; ++i)
{
    double k = a*inArr[i] + b;  // This doesn't take any time
    double el = arr[i];         // This doesn't take any time
    el *= k;                    // This doesn't take any time
    sum += el;                  // This takes a long time!!!
}
This code goes over the elements of an array, each time calculating a value k, and for each element it adds k times that element to sum. I separated the code into so many steps so that when my profiler tells me which line takes a long time, I will know exactly which calculation is the culprit. My profiler tells me that adding el to sum is what's slowing down my program (it might seem a little strange that a simple addition would be slow, but I call this function hundreds of times, and each time it performs millions of calculations). My only theory is that because sum is in a different scope, operations using it take longer. So I edited the code to be:
double a, b;        // These variables are inputs to the function
double *inArr;      // This is also an input to the function, whose size is numElements
double *arr = new double[numElements];  // numElements is ~ 10^6
double sum = 0.0;
for (unsigned int i = 0; i < numElements; ++i)
{
    double k = a*inArr[i] + b;  // This doesn't take any time
    double el = arr[i];         // This doesn't take any time
    el *= k;                    // This doesn't take any time
    double temp = sum + el;     // This doesn't take any time
    sum = el;                   // This takes a long time!!!
}
And now the sum operation takes very little time, even though it accesses the sum variable; the assignment is what takes a long time now. Is my theory correct that the reason this happens is that it takes longer to assign to variables that aren't in the current scope? If so, why is that true? Is there any way to make this assignment fast? I know I can optimize this using parallelization; I want to know if I can do any better serially. I am using VS 2012 running in release mode, with the VS performance analyzer as the profiler.
Edit:
Once I removed the optimization, it turned out that the access to inArr is what takes the most time.
Is my theory correct that the reason this happens is that it takes longer to assign to variables that aren't in the current scope?
No.
Your profiler is lying to you, and pinpointing the wrong source for the delay. Short of parallelisation, this code cannot be optimised meaningfully: all of the operations involved are quite elementary.
There are limits to what a profiler can do. If you've compiled with optimization, the compiler has probably rearranged a fair bit of code, so the profiler can't necessarily tell which line number is associated with any one particular instruction. Both the compiler and the hardware will also allow for a good deal of overlapping; in many cases, the hardware will just go on to the next instruction, even if the preceding one hasn't finished, leaving a number of operations in the pipeline (and the compiler will arrange the code so that the hardware can do this most effectively). Thus, for example, the sub-expression inArr[i] involves a memory access, which is probably significantly slower than anything else. But the execution doesn't wait for it; it only stalls when the result is actually needed. (If the compiler is really clever, it may notice that arr[i] accesses uninitialized memory, which is undefined behavior, so it can skip the access and give you any old random value.)

In your case, the compiler is probably only doing full optimization within the loop, so the execution only stalls for the pipelined operations to finish when you write to a variable outside the loop. The profiler thus attributes all of the time to this write.

(I've simplified greatly: for more details, I'd have to know more about the actual processor, and look at the generated code with and without optimization.)
I have a program that behaves weirdly and probably has undefined behaviour. Sometimes, the return address of a function seems to be changed, and I don't know what's causing it.
The return address is always changed to the same address, an assertion inside a function the control shouldn't be able to reach. I've been able to stop the program with a debugger to see that when it's supposed to execute a return statement, it jumps straight to the line with the assertion instead.
This code approximates how my function works.
int foo(Vector t)
{
    double sum = 0;
    for (unsigned int i = 0; i < t.size(); ++i) {
        sum += t[i];
    }
    double limit = bar(); // bar returns a value between 0 and 1
    double a = 0;
    for (double i = 0; i < 10; i++) {
        a += f(i)/sum; // f(1)/sum + ... + f(10)/sum = 1.0f
        if (a > 3) return a;
    }
    // shouldn't get here
    assert(false); // ... then this line is executed
}
This is what I've tried so far:
Switched all std::vector [] operators to .at to prevent accidentally writing out of bounds
Made sure all return-by-value values are const.
Switched on -Wall, -Werror and -pedantic-errors in gcc
Ran the program with valgrind
I get a couple of "Invalid read of size 8" errors, but they seem to originate from Qt, so I'm not sure what to make of them. Could this be the problem?
The error happens only occasionally when I have run the program for a while and give it certain input values, and more often in a release build than in a debug build.
EDIT:
So I managed to reproduce the problem in a console application (no Qt loaded), and then managed to simulate the events that caused the problem.
Like some of you suggested, it turns out I misjudged what was actually causing it to reach the assertion, probably due to my lack of experience with Qt's debugger. The actual problem was a floating-point error in the double I used as a loop condition.
I was implementing softmax, but exp(x) got rounded to zero with particular inputs.
Now that I have solved the problem, let me rephrase it: is there a method for checking problems like rounding errors automatically, e.g. breaking on 0/0?
The short answer is:
The most portable way of determining if a floating-point exceptional condition has occurred is to use the floating-point exception facilities provided by C in fenv.h.
although, unfortunately, this is far from being perfect.
I suggest you read both
https://www.securecoding.cert.org/confluence/display/seccode/FLP04-C.+Check+floating-point+inputs+for+exceptional+values
and
https://www.securecoding.cert.org/confluence/display/seccode/FLP03-C.+Detect+and+handle+floating-point+errors
which concisely address the exact question you are posing:
Is there a method for checking problems like rounding errors automatically.
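A minimal sketch of what the fenv.h facilities look like in practice (my own illustration of the approach the links describe; the names are hypothetical and not from the question):

#include <cfenv>
#include <cstdio>

// Illustrative only: run a computation, then test the floating-point status
// flags to see whether anything exceptional happened along the way.
double checked_ratio(double num, double den)
{
    std::feclearexcept(FE_ALL_EXCEPT);        // clear any stale flags first

    double r = num / den;                     // 0/0 raises FE_INVALID,
                                              // x/0 raises FE_DIVBYZERO

    if (std::fetestexcept(FE_INVALID))
        std::fprintf(stderr, "invalid operation (e.g. 0/0)\n");
    if (std::fetestexcept(FE_DIVBYZERO))
        std::fprintf(stderr, "division by zero\n");
    if (std::fetestexcept(FE_UNDERFLOW))
        std::fprintf(stderr, "result underflowed\n");

    return r;
}

Note that some compilers need #pragma STDC FENV_ACCESS ON (or a switch such as /fp:strict) for reading the flags to be reliable under optimization.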