I'm having a really strange problem using GCC for ARM with optimizations turned on.
Compiling my C++ application without the optimizations produces an executable that
at runtime outputs the expected results. As soon as I turn on the
optimizations - that is -O1 - my application fails to produce the expected results.
I have tried for a couple of days to spot the problem, but I'm clueless.
I eliminated all uninitialized variables from my code and corrected the spots where
strict aliasing could cause problems, but I still do not get the proper results.
I'm using GCC 4.2.0 for ARM (the processor is an ARM926ej-s) and running the app
on a Montavista Linux distribution.
Below are the flags I'm using:
-O1 -fno-unroll-loops -fno-merge-constants -fno-omit-frame-pointer -fno-toplevel-reorder \
-fno-defer-pop -fno-function-cse -Wuninitialized -Wstrict-aliasing=3 -Wstrict-overflow=3 \
-fsigned-char -march=armv5te -mtune=arm926ej-s -ffast-math
As soon as I strip the -O1 flag and recompile/relink the application, I get the proper output. As you can see from the flags, I tried to disable every optimization I thought might cause problems, but still no luck.
Does anyone have any pointers on how I could further tackle this problem?
Thanks
Generally speaking, if you say "optimization breaks my program", it is 99.9% likely that your program is what's broken. Enabling optimizations only uncovers the faults in your code.
You should also go easy on the optimization options. Only in very specific circumstances will you need anything else beyond the standard options -O0, -O2, -O3 and perhaps -Os. If you feel you do need more specific settings than that, heed the mantra of optimizations:
Measure, optimize, measure.
Never go by "gut feeling" here. Prove that a certain non-standard optimization option does significantly benefit your application, and understand why (i.e., understand exactly what that option does, and why it affects your code).
This is not a good place to navigate blindfolded.
And seeing how you use the most defensive option (-O1), then disable half a dozen optimizations, and then add -ffast-math, leads me to assume you're currently doing just that.
Well, perhaps one-eyed.
But the bottom line is: If enabling optimization breaks your code, it's most likely your code's fault.
EDIT: I just found this in the GCC manual:
-ffast-math: This option should never be turned on by any -O option
since it can result in incorrect output for programs which depend on
an exact implementation of IEEE or ISO rules/specifications for math
functions.
This does say, basically, that your -O1 -ffast-math could indeed break correct code. However, even if taking away -ffast-math removes your current problem, you should at least have an idea why. Otherwise you might merely exchange your problem now with a problem at a more inconvenient moment later (like, when your product breaks at your client's location). Is it really -ffast-math that was the problem, or do you have broken math code that is uncovered by -ffast-math?
-ffast-math should be avoided if possible. Just use -O1 for now and drop all the other optimisation switches. If you still see problems then it's time to start debugging.
Without seeing your code, it's hard to get more specific than "you probably have a bug".
There are two scenarios where enabling optimizations changes the semantics of the program:
there is a bug in the compiler, or
there is a bug in your code.
The latter is by far the more likely. Specifically, you probably rely on undefined behavior somewhere in your program: something that just happens to be true when you compile with this compiler, on this computer, with these compiler flags, but which isn't guaranteed by the language. So when you enable optimizations, GCC is under no obligation to preserve that behavior.
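A classic example, given here as a hypothetical illustration rather than anything taken from the question's code: signed integer overflow is undefined, so at -O1 and above GCC may fold a check like x + 1 > x to true and delete it. The function names are made up; the portable version compares against the limits before doing any arithmetic.

```cpp
#include <cassert>
#include <climits>

// Broken: relies on signed overflow wrapping around, which is undefined
// behavior. An optimizer may assume x + 1 > x always holds and drop the test.
bool increment_would_overflow_ub(int x) {
    return !(x + 1 > x);
}

// Well-defined: check against INT_MAX / INT_MIN before adding.
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}
```

The first function may appear to work in an unoptimized build and silently stop working once the optimizer exploits the undefined behavior; the second gives the same answer at every optimization level.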
Show us your code. Or step through it in the debugger until you get to the point where things go wrong.
I can't be any more specific. It might be a dangling pointer, uninitialized variables, breaking the aliasing rules, or even just doing one of the many things that yield undefined results (like i = i++ in C, or in C++ before C++17).
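To illustrate the aliasing item, a minimal sketch (function names are hypothetical): reading a float's bits through a uint32_t pointer breaks the strict aliasing rules, while copying the bytes with std::memcpy is well-defined, and compilers typically reduce that copy to a single register move.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

// Undefined behavior: accesses a float object through a uint32_t lvalue,
// which the strict aliasing rules forbid.
std::uint32_t float_bits_aliasing(float f) {
    return *reinterpret_cast<std::uint32_t*>(&f);
}

// Well-defined: copy the object representation into an integer.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

In C++20, std::bit_cast from <bit> does the same thing as the memcpy version in one call.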
Try to make a minimal test case. Rewrite the program, removing things that don't affect the error. It's likely that you'll discover the bug yourself in the process, but if you don't, you should have a one-screen example program you can post.
Incidentally, if, as others have speculated, it is -ffast-math that causes your trouble (i.e. compiling with just -O1 works fine), then you likely have some math in there that you should rewrite anyhow. It's a bit of an over-simplification, but -ffast-math essentially permits the compiler to rearrange computations as if they operated on abstract mathematical numbers, even though doing so on real hardware may produce slightly different results, since floating-point numbers aren't exact. Relying on that kind of floating-point detail is likely to be unintentional.
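A small sketch of why reordering matters (the values are chosen for illustration, not taken from the question): floating-point addition is not associative, so the two groupings below give different results in IEEE double precision. Under -ffast-math the compiler may pick either grouping. This assumes ordinary IEEE 754 double arithmetic without x87 excess precision, e.g. a typical x86-64 or ARM build without -ffast-math.

```cpp
#include <cassert>

// (a + b) + c, evaluated in the order written.
double sum_left(double a, double b, double c) { return (a + b) + c; }

// a + (b + c): mathematically identical, numerically not.
double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 1e16, b = -1e16 and c = 1.0, sum_left returns 1.0 but sum_right returns 0.0: -1e16 + 1.0 is not representable as a double and rounds back to -1e16. Code whose results depend on such details is exactly what -ffast-math can "break".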
If you want to understand the bug, a minimal test-case is critical in any case.
Currently using VSCode, g++, C++20, Ubuntu 20.04 LTS.
What compiler flags can I use for release builds and debug builds separately? Do I turn off every optimization flag for debug builds? Or does it not really matter? I would appreciate any advice, recommendations, or feedback as I couldn't find much on my own.
Do I turn off every optimization flag for debug builds?
Yes, I would say that is the best way to go, and it really does matter! Depending on your code, your understanding of the compiler/debugger and the level of optimisation chosen, the experience of debugging will vary from mildly annoying to frustrating and almost useless. This answer gives a synopsis of the different levels for gcc, and this question has several answers going into more detail about optimisations.
As a summary, the compiler is in general allowed to modify your code in any way it sees fit, as long as it still behaves as if all your statements were executed exactly as written. In practice, -O1 already enables dozens of techniques and -O2 and -O3 will probably leave almost nothing untouched, which makes it harder to pinpoint issues because:
Stepping through code may visit statements in a different order or skip them entirely, also hindering the use of breakpoints;
Function calls may disappear because they were inlined, and no longer be callable from the debugging prompt;
Local variables tend to have shorter lifetimes than in your source code, so you can't always query their values.
I personally build with CMake and primarily use two of its build types:
Debug (-g): No optimisations, compiles runtime assert statements;
RelWithDebInfo (-O2 -g -DNDEBUG): Fast code without these assertions that is harder to debug, but suitable for performance analysis once your program is working correctly.
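To make the assert point concrete, a minimal sketch (the mean function is a made-up example): the check below runs in a Debug build and is compiled out entirely when NDEBUG is defined, as it is in CMake's RelWithDebInfo and Release configurations.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Returns the mean of a non-empty vector. The precondition is only
// enforced in builds where NDEBUG is not defined.
double mean(const std::vector<double>& v) {
    assert(!v.empty() && "mean() called on an empty vector");
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}
```

This is also why a program can behave differently between the two build types: any side effect placed inside an assert disappears along with it when NDEBUG is defined.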
I found this thread, and I had the same question, but it isn't really answered there: GCC standard optimizations behavior
I'm trying to figure out exactly which flag is causing an incredible boost in performance at -O1. I first found out which flags are enabled using g++ -O1 -Q --help=optimizers, then passed all of the enabled ones to g++ explicitly. But the results were different (the binaries were even of different sizes).
How do I handpick optimizations for g++ or is this not possible?
Not all optimizations have individual flags, so no combination of them will generate the same code as -O1 or any of the other general optimization options (-Os, -O2, etc.). Also, I imagine that many of the specific optimization options are ignored when you use -O0 (the default), because they require passes that are skipped if optimization hasn't been enabled in general.
To try to narrow down your performance increase you can try using -O1 and then selectively disabling optimizations. For example:
g++ -O1 -fno-peephole -fno-tree-cselim -fno-var-tracking ...
You still might not have better luck this way though. It might be multiple optimizations in combination are producing your performance increase. It could also be the result of optimizations not covered by any specific flag.
I also doubt that better cache locality resulted in your "incredible boost in performance". If it did, it was likely a coincidence, especially at -O1. Big performance increases usually come about because GCC was able to eliminate a chunk of your code, either because it didn't actually have any net effect, because it always computed the same value, or because it invoked undefined behaviour.
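A hypothetical benchmark kernel showing the kind of elimination meant here: if the result of sum_to is never used, an optimizing compiler may delete the call altogether, and even when the result is used, a simple loop like this may be rewritten as a closed-form formula. Either way, the "speedup" comes from not running the loop.

```cpp
#include <cassert>
#include <cstdint>

// Sums 0 + 1 + ... + (n - 1). An optimizer may drop calls whose result is
// unused, or replace the loop with the equivalent of n * (n - 1) / 2.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}
```

When timing code like this, print or otherwise consume the returned value so the work cannot be discarded, and check the generated assembly before crediting any particular flag.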
I have a thermal hydraulics code written in Fortran that I work on. For my debug version, I use the -check bounds option in ifort 11.1 during compile time. I have caught array bounds errors in the past in this way. Recently, though, I was seeing that the solution was quickly blowing up for a given case. The peculiar thing was that it was converging nicely for the release version of the code. Sure enough, removing the -check bounds flag from my debug makefile cleared up the problem.
The strange thing is that the debug version was working fine for many other test cases I used before, and it wasn't reporting any out-of-bounds array accesses in my code. This behavior seems very strange to me, and I have no idea whether there is some kind of bug in my code or something else. Does anybody have any ideas what could be causing this sort of behavior?
As requested, the flags I use for release and debug are:
Release: -c -r8 -traceback -extend-source -override-limits -zero -unroll -O3
Debug: -c -r8 -traceback -extend-source -override-limits -zero -g -O0
Of course, as my original question indicates, I toggle the -check bounds flag on and off for the debug case.
I would suspect your numerical algorithm here more than the Fortran code. Have you ensured that all of the convergence and stability criteria have been met?
What it sounds like is that round-off error is causing the solution to fail to converge. If you are on the edges of safe convergence, compiler optimizations can definitely tip things one way or another.
I use gfortran more than ifort, so I don't know all the specifics of the -unroll option, but unrolling loops can change some rounding even though the calculations seem like they should remain the same. Also, a debug build will definitely change the exact order of memory and register accesses. If a number is held in the processor in some internal representation, then written to memory and read back, its value can change. This can be alleviated to some extent by careful selection of kind. By its nature, this will be processor-specific rather than portable.
In theory, full compliance with IEEE 754 would make floating point operations reproducible, but this is not always the case. If debug is actually causing these problems as opposed to some other bug in your code, then other mysterious things related to the inner workings of the processor could also cause it to blow up.
I would add write statements at various key points in the code to output your data matrices (or whatever data structures you are using). Be sure to use binary output. Open with form='unformatted' and access='direct'.
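The same idea expressed in C++, as a hypothetical pair of helpers plus a comparison function: dump an array as raw binary from both the debug and the release build, then locate the first element where the two dumps differ. Binary output matters because formatted output rounds the values and can hide exactly the last-bit differences you are looking for. The function and file names are placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

// Writes the raw bytes of the array; no text formatting, so no rounding.
bool dump_binary(const std::string& path, const std::vector<double>& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(double)));
    out.close();
    return !out.fail();
}

// Reads back a file written by dump_binary (empty vector on failure).
std::vector<double> load_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    std::vector<double> data;
    const std::streamoff bytes = in ? static_cast<std::streamoff>(in.tellg()) : -1;
    if (bytes <= 0) return data;
    data.resize(static_cast<std::size_t>(bytes) / sizeof(double));
    in.seekg(0);
    in.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(double)));
    return data;
}

// Index of the first element whose bits differ; if none differ, the length
// of the shorter array. Bitwise comparison treats identical NaNs as equal
// and distinguishes 0.0 from -0.0.
std::size_t first_difference(const std::vector<double>& a,
                             const std::vector<double>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (std::memcmp(&a[i], &b[i], sizeof(double)) != 0) return i;
    }
    return n;
}
```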
I know too much optimization doesn't make much sense for debug code.
But what about using -march=native to make better use of the instruction set?
EDIT:
Let's reformulate this. I know enabling optimizations and debug mode at the same time might have disadvantages like:
GCC allows you to use -g with -O. The shortcuts taken by optimized
code may occasionally produce surprising results: some variables you
declared may not exist at all; flow of control may briefly move where
you did not expect it; some statements may not be executed because
they compute constant results or their values were already at hand;
some statements may execute in different places because they were
moved out of loops.
So my question is, does -march=native have similar side effects or is it sensible to use it in debug code as well?
The problem with optimization is that aggressive passes which alter control flow can confuse debuggers. -march=native may enable additional optimizations (cmov, for example) if those passes have been enabled with an -O option, but it will not in itself confuse the debugger.
I often work with optimized code (sometimes even involving vectorized loops) that contains bugs. How would one debug such code? I'm looking for any kind of tools or techniques. I currently use the following (possibly outdated) tools, so I'm looking to upgrade:
ddd: since it can't show me the code, I use gdb plus its disassemble command to look at the generated code; I can't really step through the program this way.
ndisasm
Thanks
It is always harder to debug optimised programs, but there are always ways. Some additional tips:
Make a debug build, and see if you get the same bug in a debug build. No point debugging an optimised version if you don't have to.
Use valgrind if on a platform that supports it. The errors you see may be harder to understand, but catching the problem early often simplifies debugging.
printf debugging is primitive, but sometimes it is the simplest way if you have a complex issue that only shows up in optimised builds.
If you suspect a timing issue (especially in a multithreaded program), roll your own version of assert which aborts or prints if the condition is violated, and use it in a few select places, to rule out possible problems.
See if you can reproduce the problem with -O2 or -O3 enabled but without -fomit-frame-pointer, since that option makes code very hard to debug. That might give you enough information to find the cause of your problem.
Isolate parts of your code, build a test-suite, and see if you can identify any testcases which fail. It is much easier to debug one function than the whole program.
Try turning off optimisations one by one with the -fno-X options. This might help you find common problems like strict aliasing problems.
Turn on more compiler warnings. Some things, like strict aliasing problems, can generate compiler warnings if they create a difference in behaviour between different optimisation levels.
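A minimal sketch of the home-grown assert suggested in the list above (the names ALWAYS_CHECK and check_failed are made up): unlike the standard assert, it is not disabled by NDEBUG, so it stays active in optimized builds, and it reports the file and line before aborting.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Reports the failed condition with its location, then aborts. Not affected
// by NDEBUG, so it also fires in optimized release builds.
[[noreturn]] inline void check_failed(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
    std::abort();
}

#define ALWAYS_CHECK(cond) \
    ((cond) ? static_cast<void>(0) : check_failed(#cond, __FILE__, __LINE__))
```

If aborting is too disruptive, for example for a timing issue that only shows up over a long run, check_failed can log and return instead; it then has to lose the [[noreturn]] attribute.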
When debugging release builds, you can put in __asm nop; statements as placeholders for breakpoints (int 3). This is nice because you can guarantee breakpoint locations without messing up compiler optimizations or writing printf/cout statements.
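With GCC or Clang, the counterpart of MSVC's __asm nop is an asm volatile statement. A sketch with a hypothetical function: the volatile qualifier stops the optimizer from deleting the instruction, although GCC may still move it relative to surrounding code, so the anchor is reliable rather than cycle-exact.

```cpp
#include <cassert>

int scale_and_offset(int x) {
    int y = x * 3;
    // Breakpoint anchor: set a breakpoint on this line in the debugger.
    // 'volatile' keeps the optimizer from removing the instruction.
    asm volatile("nop");
    return y + 7;
}
```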
It's always easier to debug a non-optimized version, of course. Failing that, disassembling the code can be helpful. Other techniques I've used include partially de-optimizing the code by forcing intermediate results to be printed or logged, or changing a critical variable to "volatile" so I can at least look at its value in the debugger.
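A sketch of the volatile trick (the function is hypothetical): marking an intermediate result volatile forces the compiler to store it to memory and read it back, so a debugger can usually show its value even in an optimized build compiled with -g. It also blocks some optimizations around that variable, so remove it once you have found what you were looking for.

```cpp
#include <cassert>

double weighted_sum(double a, double b, double w) {
    // 'volatile' forces a real store and load, keeping 'partial'
    // inspectable in the debugger despite optimization.
    volatile double partial = a * w;
    return partial + b * (1.0 - w);
}
```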
Chances are what you call optimized code is scrambled to shave cycles (which makes debugging hard) but is not really very optimized. Here is an example of what I mean.
I would turn off the compiler optimization, debug and tune it yourself, and then turn compiler optimization back on if the code has hotspots that are actually in code the compiler sees (not in outside libraries). (I define a hotspot as a part of code where the PC is often found. That automatically exempts loops containing function calls because they steal away the PC.)