Why is bounds checking changing the behavior of my program? (Fortran)

I have a thermal hydraulics code written in Fortran that I work on. For my debug version, I use the -check bounds option in ifort 11.1 at compile time. I have caught array bounds errors this way in the past. Recently, though, I saw that the solution was quickly blowing up for a given case. The peculiar thing was that it converged nicely for the release version of the code. Sure enough, removing the -check bounds flag from my debug makefile cleared up the problem.
The strange thing is that the debug version worked fine for many other test cases I had used before, and it never reported going outside the bounds of any array in my code. This behavior seems very strange to me, and I have no idea whether there is some kind of bug in my code or what. Does anybody have any ideas what could be causing this sort of behavior?
As requested, the flags I use for release and debug are:
Release: -c -r8 -traceback -extend-source -override-limits -zero -unroll -O3
Debug: -c -r8 -traceback -extend-source -override-limits -zero -g -O0
Of course, as my original question indicates, I toggle the -check bounds flag on and off for the debug case.

I would suspect your numerical algorithm here more than the Fortran code. Have you ensured that all of the convergence and stability criteria have been met?
What it sounds like is that round-off error is causing the solution to fail to converge. If you are on the edges of safe convergence, compiler optimizations can definitely tip things one way or another.
I use gfortran more than ifort, so I don't know all the specifics of the -unroll option, but unrolling loops can change some rounding even though the calculations seem like they should remain the same. Also, a debug build will definitely change the exact order of memory and register accesses. If a number is held in the processor in some internal representation and is then written to memory and read back again, its value can change. This can be alleviated to some extent by careful selection of kind. By its nature, this will be processor specific rather than portable.
In theory, full compliance with IEEE 754 would make floating point operations reproducible, but this is not always the case. If debug is actually causing these problems as opposed to some other bug in your code, then other mysterious things related to the inner workings of the processor could also cause it to blow up.
I would add write statements at various key points in the code to output your data matrices (or whatever data structures you are using). Be sure to use binary output. Open with form='unformatted' and access='direct'.

Related

Can valgrind/callgrind work on a release executable C++ program?

I understand that Valgrind can run Memcheck to check for memory leaks, and in that case the compiled C++ executable program must contain debug information. But if I want to use valgrind/callgrind for profiling, must the executable contain debug information? I ran a small test, and it seems that valgrind/callgrind can work on release executables without debug information. Can anyone confirm this?
From the official Valgrind documentation, the following information can be found:
2.2. Getting started
First off, consider whether it might be beneficial to recompile your application and supporting libraries with debugging info enabled (the -g option).
Without debugging info, the best Valgrind tools will be able to do is guess which function a particular piece of code belongs to, which makes both error messages and profiling output nearly useless. With -g, you'll get messages which point directly to the relevant source code lines.
Another option you might like to consider, if you are working with C++, is -fno-inline. That makes it easier to see the function-call chain, which can help reduce confusion when navigating around large C++ apps. For example, debugging OpenOffice.org with Memcheck is a bit easier when using this option. You don't have to do this, but doing so helps Valgrind produce more accurate and less confusing error reports. Chances are you're set up like this already, if you intended to debug your program with GNU GDB or some other debugger.
Hence the recommended step is to recompile your program with the -g option to get the maximum information out of Valgrind.
According to the valgrind manual:
http://valgrind.org/docs/manual/manual-core.html
If you are planning to use Memcheck: On rare occasions, compiler optimisations (at -O2 and above, and sometimes -O1) have been observed to generate code which fools Memcheck into wrongly reporting uninitialised value errors, or missing uninitialised value errors. We have looked in detail into fixing this, and unfortunately the result is that doing so would give a further significant slowdown in what is already a slow tool. So the best solution is to turn off optimisation altogether. Since this often makes things unmanageably slow, a reasonable compromise is to use -O. This gets you the majority of the benefits of higher optimisation levels whilst keeping relatively small the chances of false positives or false negatives from Memcheck. Also, you should compile your code with -Wall because it can identify some or all of the problems that Valgrind can miss at the higher optimisation levels. (Using -Wall is also a good idea in general.) All other tools (as far as we know) are unaffected by optimisation level, and for profiling tools like Cachegrind it is better to compile your program at its normal optimisation level.

Compiler optimization makes program crash

I'm writing a program in C++/Qt which contains a graph file parser. I use g++ to compile the project.
While developing, I am constantly comparing the performance of my low level parser layer between different compiler flags regarding optimization and debug information, plus Qt's debug flag (turning on/off qDebug() and Q_ASSERT()).
Now I'm facing a problem where the only correctly functioning build is the one without any optimization. All other versions, even with -O1, seem to behave differently. They crash due to failed assertions which are satisfied when compiled without any -O... flag. The code doesn't produce any compiler warnings, even with -Wall.
I am very sure that there is a bug in my program which seems to be harmful only when optimization is enabled. The problem is: I can't find it even when debugging the program. The parser seems to read wrong data from the file. When I run some simple test cases, they run perfectly. When I run a bigger test case (a route calculation on a graph read directly from a file), there is an incorrect read from the file which I can't explain.
Where should I start tracking down this undefined behavior? Which optimization techniques could be involved in this difference in behavior? (I could enable all flags one after the other, but I don't know many compiler flags besides -O..., and I know there are a lot of them, so this would take a very long time.) As soon as I know what type of bug it is, I am sure I will find it sooner or later.
You can help me a lot if you can tell me which compiler optimization methods are possible candidates for such problems.
There are a few classes of bugs that commonly arise in optimized builds, that often don't arise in debug builds.
Uninitialized variables. The compiler can catch some but not all. Look at all your constructors, look at global variables, etc. Particularly look for uninitialized pointers. In a debug build, memory is typically filled with a known pattern or zeroed, but in a release build it holds whatever happens to be there.
Use of temporaries that have gone out of scope. For example, returning a reference to a local variable from a function. These often work in debug builds because the stack is padded out more, so the temporaries tend to survive on the stack a little longer.
Array overruns when writing to temporaries. For example, if you create an array as a temporary in a function and then write one element beyond the end. Again, the stack has extra space in debug builds (for debugging information), so your overrun won't hit program data.
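To make those three classes concrete, here is a small illustrative sketch (the function names are made up, not from any particular project) of the kind of bugs that tend to "work" in a debug build and fail once the optimizer tightens up the stack and registers:

    #include <string>

    // 1) Uninitialized variable: 'count' holds whatever was on the stack.
    //    A debug build may hand you a harmless value; a release build won't.
    int sumFirst(const int* data) {
        int count;                     // BUG: never initialized
        int sum = 0;
        for (int i = 0; i < count; ++i) sum += data[i];
        return sum;
    }

    // 2) Reference to a local that has gone out of scope: the padded debug
    //    stack often keeps the value alive long enough to appear correct.
    const std::string& makeLabel() {
        std::string label = "temporary";
        return label;                  // BUG: dangling reference
    }

    // 3) Array overrun on a stack temporary: in a release build the extra
    //    element lands on whatever the optimizer placed next to 'buf'.
    int lastOfThree() {
        int buf[3] = {1, 2, 3};
        buf[3] = 4;                    // BUG: writes one past the end
        return buf[2];
    }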
There are optimizations you can disable from the optimized build to help make debugging the optimized version easier.
-g -O1 -fno-inline -fno-loop-optimize -fno-if-conversion -fno-if-conversion2 \
-fno-delayed-branch
This should make stepping through your code in the debugger a little easier to follow.
Another suggestion is that if the assertions you have do not give you enough information about what is causing the problem, you should consider adding more assertions. If you are afraid of performance issues, or assertion clutter, you can wrap them in a macro. This allows you to distinguish the debugging assertions from the ones you originally added, so they can be disabled or removed from your code later.
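For example, a minimal sketch of such a wrapper macro (PARSER_ASSERT and the ENABLE_PARSER_ASSERTS switch are hypothetical names, not part of Qt or of the asker's project):

    #include <cassert>
    #include <cstdio>

    // Extra debugging assertions live behind their own macro so they can be
    // told apart from the original assert/Q_ASSERT calls and disabled later.
    #ifdef ENABLE_PARSER_ASSERTS
    #define PARSER_ASSERT(cond, msg)                                        \
        do {                                                                \
            if (!(cond)) {                                                  \
                std::fprintf(stderr, "PARSER_ASSERT failed: %s (%s:%d)\n",  \
                             (msg), __FILE__, __LINE__);                    \
                assert(cond);                                               \
            }                                                               \
        } while (0)
    #else
    #define PARSER_ASSERT(cond, msg) ((void)0)
    #endif

    // Usage:
    //   PARSER_ASSERT(node != nullptr, "graph node missing after parse");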
1) Use valgrind on the broken version. (For that matter, try valgrind on the working version, maybe you'll get lucky.)
2) Build the system with "-O1 -g" and step through your program with gdb. At the crash, what variable has an incorrect value? Re-run your program and note when that variable is written to (or when it isn't and should have been.)

gcc optimizations cause app to fail

I'm having a real strange problem using GCC for ARM with the optimizations turned on.
Compiling my C++ application without the optimizations produces an executable that outputs the expected results at runtime. As soon as I turn on the optimizations (that is, -O1) my application fails to produce the expected results. I tried for a couple of days to spot the problem, but I'm clueless. I eliminated any uninitialized variables from my code and corrected the spots where strict aliasing could cause problems, but I still do not get the proper results. I'm using GCC 4.2.0 for ARM (the processor is an ARM926EJ-S) and running the app on a MontaVista Linux distribution.
Below are the flags I'm using:
-O1 -fno-unroll-loops -fno-merge-constants -fno-omit-frame-pointer -fno-toplevel-reorder \
-fno-defer-pop -fno-function-cse -Wuninitialized -Wstrict-aliasing=3 -Wstrict-overflow=3 \
-fsigned-char -march=armv5te -mtune=arm926ej-s -ffast-math
As soon as I strip out the -O1 flag and recompile/relink the application, I get the proper output results. As you can see from the flags, I tried to disable any optimization I thought might cause problems, but still no luck.
Does anyone have any pointers on how I could further tackle this problem?
Thanks
Generally speaking, if you say "optimization breaks my program", it is 99.9% certain that your program is what's broken. Enabling optimizations only uncovers the faults already present in your code.
You should also go easy on the optimization options. Only in very specific circumstances will you need anything else beyond the standard options -O0, -O2, -O3 and perhaps -Os. If you feel you do need more specific settings than that, heed the mantra of optimizations:
Measure, optimize, measure.
Never go by "gut feeling" here. Prove that a certain non-standard optimization option does significantly benefit your application, and understand why (i.e., understand exactly what that option does, and why it affects your code).
This is not a good place to navigate blindfolded.
And seeing how you use the most defensive option (-O1), then disable half a dozen optimizations, and then add -ffast-math leads me to assume you're currently doing just that.
Well, perhaps one-eyed.
But the bottom line is: If enabling optimization breaks your code, it's most likely your code's fault.
EDIT: I just found this in the GCC manual:
-ffast-math: This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
This does say, basically, that your -O1 -ffast-math could indeed break correct code. However, even if taking away -ffast-math removes your current problem, you should at least have an idea why. Otherwise you might merely exchange your problem now for a problem at a more inconvenient moment later (like when your product breaks at your client's site). Is it really -ffast-math that was the problem, or do you have broken math code that -ffast-math merely uncovered?
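As one concrete (purely illustrative) example of how this can happen: -ffast-math implies -ffinite-math-only, which lets the compiler assume NaNs and infinities never occur, so an explicit NaN check may be folded away. Whether that actually happens depends on the GCC version and target, but the pattern is this:

    #include <cmath>
    #include <iostream>

    // Under -ffast-math the compiler may assume 'x' can never be NaN and fold
    // the isnan() check to false, so the fallback path silently disappears.
    double safeDivide(double num, double den) {
        double x = num / den;
        if (std::isnan(x)) {
            return 0.0;                // never taken in an over-optimized build
        }
        return x;
    }

    int main() {
        std::cout << safeDivide(0.0, 0.0) << '\n';  // 0 at -O0, possibly nan with -ffast-math
    }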
-ffast-math should be avoided if possible. Just use -O1 for now and drop all the other optimisation switches. If you still see problems then it's time to start debugging.
Without seeing your code, it's hard to get more specific than "you probably have a bug".
There are two scenarios where enabling optimizations changes the semantics of the program:
there is a bug in the compiler, or
there is a bug in your code.
The latter is probably the most likely. Specifically, you probably rely on Undefined Behavior somewhere in your program. You rely on something that just so happen to be true when you compile using this compiler on this computer with these compiler flags, but which isn't guaranteed by the language. And so, when you enable optimizations, GCC is under no obligation to preserve that behavior.
Show us your code. Or step through it in the debugger until you get to the point where things go wrong.
I can't be any more specific. It might be a dangling pointer, uninitialized variables, breaking the aliasing rules, or even just doing one of the many things that yield undefined results (like i = i++).
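Two toy illustrations (not taken from your code) of how such code can legitimately change behavior between optimization levels:

    #include <cstring>
    #include <iostream>

    int main() {
        // Unsequenced modification: before C++17 the result of this assignment
        // is undefined, and different optimization levels may print different values.
        int i = 0;
        i = i++;                                     // BUG: undefined result pre-C++17
        std::cout << i << '\n';

        // Strict aliasing: reading a float's bytes through an int* lets the
        // optimizer assume the two objects never alias.
        float f = 1.0f;
        // int bits = *reinterpret_cast<int*>(&f);   // BUG: breaks the aliasing rules
        int bits;
        std::memcpy(&bits, &f, sizeof bits);         // well-defined alternative
        std::cout << std::hex << bits << '\n';
    }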
Try to make a minimal test case. Rewrite the program, removing things that don't affect the error. It's likely that you'll discover the bug yourself in the process, but if you don't, you should have a one-screen example program you can post.
Incidentally, if, as others have speculated, it is -ffast-math that causes your trouble (i.e. compiling with just -O1 works fine), then it is likely you have some math in there you should rewrite anyhow. It's a bit of an over-simplification, but -ffast-math permits the compiler to rearrange computations as if they were abstract mathematical numbers, even though doing so on real hardware may produce slightly different results, since floating point numbers aren't exact. Relying on that kind of floating point detail is likely to be unintentional.
If you want to understand the bug, a minimal test-case is critical in any case.

Is there a (Linux) g++ equivalent to the /fp:precise and /fp:fast flags used in Visual Studio?

Background:
Many years ago, I inherited a codebase that was using the Visual Studio (VC++) flag '/fp:fast' to produce faster code in a particular calculation-heavy library. Unfortunately, '/fp:fast' produced results that were slightly different from those of the same library under a different compiler (Borland C++). As we needed to produce exactly the same results, I switched to '/fp:precise', which worked fine, and everything has been peachy ever since. However, now I'm compiling the same library with g++ on Ubuntu Linux 10.04 and I'm seeing similar behavior, and I wonder if it might have a similar root cause. The numerical results from my g++ build are slightly different from the numerical results from my VC++ build. This brings me to my question:
Question:
Does g++ have parameters equivalent or similar to the '/fp:fast' and '/fp:precise' options in VC++? (And what are they? I want to activate the '/fp:precise' equivalent.)
More Verbose Information:
I compile using 'make', which calls g++. As far as I can tell (the makefiles are a little cryptic and weren't written by me), the only parameters added to the g++ call are the "normal" ones (include folders and the files to compile) and -fPIC (I'm not sure what this switch does; I don't see it on the 'man' page).
The only relevant parameters in 'man g++' seem to be for turning optimization options ON (e.g. -funsafe-math-optimizations). However, I don't think I'm turning anything ON; I just want to turn the relevant optimization OFF.
I've tried Release and Debug builds, VC++ gives the same results for release and debug, and g++ gives the same results for release and debug, but I can't get the g++ version to give the same results as the VC++ version.
From the GCC manual:
-ffloat-store
Do not store floating point variables in registers, and inhibit other options that might change whether a floating point value is taken from a register or memory.
This option prevents undesirable excess precision on machines such as the 68000 where the floating registers (of the 68881) keep more precision than a double is supposed to have. Similarly for the x86 architecture. For most programs, the excess precision does only good, but a few programs rely on the precise definition of IEEE floating point. Use -ffloat-store for such programs, after modifying them to store all pertinent intermediate computations into variables.
To expand a bit, most of these discrepancies come from the use of the x86 80-bit floating point registers for calculations (vs. the 64 bits used to store double values). If intermediate results are kept in the registers without being written back to memory, you effectively get extra precision in your calculations (the extended format carries 11 more mantissa bits than a double), making them more precise but possibly divergent from results generated with a write/read of intermediate values to memory (or from calculations on architectures that only have 64-bit FP registers).
These flags (both in GCC and MSVC) generally force each intermediate result to be rounded to a 64-bit double, thereby making calculations insensitive to the vagaries of code generation, optimization and platform differences. This consistency generally comes with a slight runtime cost in addition to the cost in terms of accuracy/precision.
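If changing the build flags isn't an option, the same effect can be approximated in the source. A rough sketch (the calculation itself is hypothetical) of what -ffloat-store or /fp:precise do automatically, forcing each intermediate through a named double so it is rounded to 64 bits:

    // Hypothetical calculation; the point is that each intermediate is forced
    // out of the (possibly 80-bit) register and rounded to a 64-bit double.
    double dotThenScale(double a, double b, double c, double d, double scale) {
        volatile double prod1 = a * b;       // rounded to double here
        volatile double prod2 = c * d;       // rounded to double here
        volatile double sum   = prod1 + prod2;
        return sum * scale;
    }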
Excess register precision is an issue only on FPU registers, which compilers (with the right enabling switches) tend to avoid anyway. When floating point computations are carried out in SSE registers, the register precision equals the memory one.
In my experience, most of the /fp:fast impact (and potential discrepancy) comes from the compiler taking the liberty of performing algebraic transforms. This can be as simple as changing the order of summands:

(a + b) + c --> a + (b + c)

It can also distribute multiplications like a*(b+c) at will, and it can get to some rather complex transforms, all intended to reuse previous calculations.
In infinite precision such transforms are benign, of course, but in finite precision they actually change the result. As a toy example, try the summand-order example with a = b = 2^(-23), c = 1. MS's Eric Fleegal describes it in much more detail.
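Here is a runnable variant of that toy example (the constants are adjusted to 2^-53, half an ulp of 1.0, so the effect shows up in plain double precision):

    #include <cmath>
    #include <cstdio>

    // Compile without -ffast-math so the compiler keeps the written evaluation order.
    int main() {
        double a = std::ldexp(1.0, -53);  // 2^-53, half an ulp of 1.0 in double precision
        double b = a;
        double c = 1.0;

        double left  = (a + b) + c;       // a + b = 2^-52, so this is exactly 1 + 2^-52
        double right = a + (b + c);       // b + c rounds to 1.0, and so does the final sum

        std::printf("%.17g\n%.17g\n", left, right);  // prints two different values
    }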
In this respect, the gcc switch nearest to /fp:precise is -fno-unsafe-math-optimizations. I think it's on by default; perhaps you can try setting it explicitly and see if it makes a difference. Similarly, you can try explicitly turning off all of the -ffast-math optimizations: -fno-finite-math-only, -fmath-errno, -ftrapping-math, -frounding-math and -fsignaling-nans (the last two options are not enabled by default!)
I don't think there's an exact equivalent. You might try -mfpmath=sse instead of the default -mfpmath=387 to see if that helps.
This is definitely not related to optimization flags, assuming by "Debug" you mean "with optimizations off." If g++ gives the same results in debug as in release, that means it's not an optimization-related issue.
Debug builds should always store each intermediate result in memory, thereby guaranteeing the same results as /fp:precise does for MSVC.
This likely means there is (a) a compiler bug in one of the compilers, or more likely (b) a math library bug. I would drill into individual functions in your calculation and narrow down where the discrepancy lies. You'll likely find a workaround at that point, and if you do find a bug, I'm sure the relevant team would love to hear about it.
-mpc32 or -mpc64?
But you may need to recompile C and math libraries with the switch to see the difference... This may apply to options others suggested as well.

What are efficient ways to debug an optimized C/C++ program?

Many times I work with optimized code (sometimes even involving vectorized loops) which contains bugs and such. How would one debug such code? I'm looking for any kind of tools or techniques. I use the following (possibly outdated) tools, so I'm looking to upgrade.
I use the following:
Since with ddd you cannot see the code, I use gdb plus the disassemble command to look at the generated code; I can't really step through the program this way.
ndisasm
Thanks
It is always harder to debug optimised programs, but there are always ways. Some additional tips:
Make a debug build, and see if you get the same bug in a debug build. No point debugging an optimised version if you don't have to.
Use valgrind if on a platform that supports it. The errors you see may be harder to understand, but catching the problem early often simplifies debugging.
printf debugging is primitive, but sometimes it is the simplest way if you have a complex issue that only shows up in optimised builds.
If you suspect a timing issue (especially in a multithreaded program), roll your own version of assert which aborts or prints if the condition is violated, and use it in a few select places to rule out possible problems (a sketch of such an assert follows after this list).
See if you can reproduce the problem with -O2 or -O3 enabled but without -fomit-frame-pointer, since that option makes code very hard to debug. That might give you enough information to find the cause of your problem.
Isolate parts of your code, build a test-suite, and see if you can identify any testcases which fail. It is much easier to debug one function than the whole program.
Try turning off optimisations one by one with the -fno-X options. This might help you find common problems like strict aliasing problems.
Turn on more compiler warnings. Some things, like strict aliasing problems, can generate compiler warnings if they create a difference in behaviour between different optimisation levels.
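Regarding the roll-your-own assert mentioned a few points above, a minimal sketch (the macro name is made up) that stays active even in optimized builds and simply reports and aborts:

    #include <cstdio>
    #include <cstdlib>

    // Unlike the standard assert(), this check is not compiled out with NDEBUG,
    // so it still fires in the optimized builds where the bug actually appears.
    #define CHECK_ALWAYS(cond)                                                 \
        do {                                                                   \
            if (!(cond)) {                                                     \
                std::fprintf(stderr, "CHECK_ALWAYS failed: %s at %s:%d\n",     \
                             #cond, __FILE__, __LINE__);                       \
                std::abort();                                                  \
            }                                                                  \
        } while (0)

    // Example, used sparingly in a few select places:
    //   CHECK_ALWAYS(queue_size <= MAX_QUEUE_SIZE);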
When debugging release builds you can put in __asm nop instructions as placeholders for breakpoints (int 3). This is nice because you can guarantee breakpoint locations without messing up compiler optimizations or writing printf/cout statements.
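The exact spelling is compiler-specific; a hedged sketch of such a hard-coded breakpoint for MSVC and for GCC/Clang on x86 might look like this:

    // A breakpoint that survives optimization. Pick the form your compiler supports.
    #if defined(_MSC_VER)
      #define BREAK_HERE() __debugbreak()          // MSVC intrinsic, emits int 3
    #elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
      #define BREAK_HERE() __asm__ volatile("int3")
    #else
      #define BREAK_HERE() __builtin_trap()        // stops the program, but is not resumable
    #endif

    void suspectFunction() {
        // ... heavily optimized code ...
        BREAK_HERE();   // the debugger stops exactly here, even at -O3
    }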
It's always easier to debug a non-optimized version, of course. Failing that, disassembly of the code can be helpful. Other techniques I've used include partially de-optimizing the code by forcing intermediate results to be printed or logged, or changing a critical variable to "volatile" so I can at least look at that value in the debugger.
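A small sketch of the volatile trick (the function is illustrative): marking the intermediate volatile forces it to live in memory at every step, so an optimized build can no longer keep it entirely in a register and the debugger can watch it.

    double accumulate(const double* samples, int n) {
        // Partially de-optimizes this loop on purpose: every update of 'running'
        // goes through memory, where a debugger watchpoint can see it.
        volatile double running = 0.0;
        for (int i = 0; i < n; ++i) {
            running = running + samples[i];
        }
        return running;
    }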
Chances are what you call optimized code is scrambled to shave cycles (which makes debugging hard) but is not really very optimized. Here is an example of what I mean.
I would turn off the compiler optimization, debug and tune it yourself, and then turn compiler optimization back on if the code has hotspots that are actually in code the compiler sees (not in outside libraries). (I define a hotspot as a part of code where the PC is often found. That automatically exempts loops containing function calls because they steal away the PC.)