Which optimization compiler switches make debugging very hard? - c++

I was involved in a debugging situation where I had no PDBs at all (unfortunately this still happens). In the particular case I was researching a stack corruption and I tried to do a manual stack walk. However, there was a strong mismatch between the ESP and the EBP registers, so I think the code was compiled with the /Oy optimization (frame pointer omission) turned on. In this case, I had to give up.
My question now is: of the Visual Studio 2015 C++ compiler switches for optimization, which ones will make debugging hard? A short explanation of why it becomes hard would be great.
To limit the scope of the question, answers should consider x86 (32 bit) only.
The available options are documented on MSDN; they are:
O1 - creates small code
Os - favors small code
O2 - creates fast code
Ot - favors fast code
Ob - controls inline expansion
Oi - generate intrinsic functions
The following ones do not need to be considered:
Od - disables optimization. This will obviously cause the least trouble
Og - is deprecated
Ox - is just a combination of others. This will obviously cause the sum of troubles of the individual switches
Oy - omit frame pointers. I already know about this one. It makes stack walking really difficult; it becomes mostly guesswork (see the sketch below).
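For context, here is a conceptual sketch (not real stack-walker code; the struct and function names are made up) of what frame-pointer-based walking relies on. With /Oy there is no saved-EBP chain to follow, so reconstructing the call stack from raw memory is mostly guesswork.

struct Frame {
    Frame* savedEbp;      // pushed EBP of the caller's frame
    void*  returnAddress; // sits just above the saved EBP on x86
};

// Walking the chain: each frame points at its caller's frame, and the return
// address next to it tells you where execution resumes in the caller.
void walkStack(Frame* ebp)
{
    while (ebp != nullptr) {
        // inspect ebp->returnAddress here ...
        ebp = ebp->savedEbp;
    }
}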

Wow, there are a lot of different types of code optimisation, way more than I have detailed knowledge of, but I will try to describe the optimisations that significantly affect the debugging experience. Knowing what an optimisation does can help you work out which compiler switches will enable it.
Instruction reordering to keep the CPU from idling: this generally produces more code, and the debugger will appear to jump around the source rather than execute it linearly.
Reducing code to a compile-time constant: smaller and faster code, but the reduced code will simply be skipped over when debugging.
Omitting frame pointers: produces smaller code but makes it much harder to walk the stack.
Efficient register usage: this will cause variables to be unreadable or show wrong values before they go out of scope, because the compiler has decided a variable can safely be retired early and its register freed up for use elsewhere, instead of needlessly pushing its value to the stack. This produces smaller and faster code (see the sketch below).
Inlining: produces fatter but faster code, and the inlined functions will not appear in the call stack.
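As a minimal sketch of the constant-folding and register-reuse points above (the function names are made up), this is the kind of code where an optimized build skips the folded lines entirely and reports the loop counter as "optimized out":

#include <cstdio>

int area()
{
    // The whole computation is a compile-time constant; an optimizing build
    // typically folds it down to "return 200;" and the debugger skips these lines.
    int width  = 10;
    int height = 20;
    return width * height;
}

int sum(const int* data, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += data[i];
    // After the loop, 'i' is dead; the compiler may have kept it only in a
    // register and already reused that register, so the debugger shows it as
    // "optimized out" or with a stale value.
    return total;
}

int main()
{
    int values[] = { 1, 2, 3 };
    std::printf("%d %d\n", area(), sum(values, 3));
    return 0;
}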
Blanket optimisation flags are somewhat frowned upon these days; Profile Guided Optimisation is by far the preferable way to optimise a release build. If you want to debug release-ish code, you should use the /Zo flag, which produces better PDBs that can give you more information about inlined functions and variables held in registers.
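For illustration (the file name is made up), an MSVC command line combining optimization with richer debug information might look like this, where /Zi generates the PDB and /Zo records the extra data about optimized code:

cl /O2 /Zi /Zo parser.cpp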
GCC & clang have a per optimisation set of flags the -O flags are just amalgamations of those flags and looking at how GCC distributes the optimisations and details about what each optimisation is will help further understanding about all the different optimisations compilers, in general, do.
EDIT: Also if you want to turn on individual compiler flags and see what it does to the compiled code for GCC and clang god bolt is really useful https://gcc.godbolt.org

As far as I know, debugging problems are mostly caused by information being omitted from the compiled binary.
Some of the causes are:
frame pointers omission: /Oy, as you already found out
inlining: /Ob2 is in effect when /O1, /O2 or /Ox is used (see the sketch below)
code reduction: hard to know how this maps exactly to /O VC++ compiler options
better register usage
(to some degree) intrinsic functions: /Oi
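To make the inlining point concrete, here is a small hedged sketch (hypothetical names): with /Ob2 a tiny helper like this typically disappears from the call stack, and the MSVC-specific __declspec(noinline) is one way to keep a real frame for it while investigating:

#include <cstdio>

// A prime inlining candidate under /O1 or /O2 (which imply /Ob2); a crash
// inside it will usually show only the caller's frame.
static int square(int x) { return x * x; }

// Forces a real call and a real stack frame, at a small performance cost.
__declspec(noinline) static int squareKept(int x) { return x * x; }

int main()
{
    std::printf("%d %d\n", square(6), squareKept(7));
    return 0;
}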
My personal opinion is that a binary without debugging information (.PDB file) is usually "hard enough" to be debugged, at least not worth my time :-)

Related

ASM: Optimization of Pass-by-value vs Pass-by-reference in leading-edge compilers?

(This question is about the assembly generated by compiler optimizations, not about the use of pointers in code.)
I'm trying to determine if I can take advantage of optimizations built into Clang or GCC in regards to whether the variable is passed by value or reference.
For some background I'm working on a "my own language" to C++ transliterator that does some level of preprocessing to the code. It occurred to me that it should be possible to determine at compile time whether a variable is more efficiently passed around by reference or by value. It would mean writing different versions of functions automatically, etc. but that is all doable.
I suspected that the compilers do this already to some extent. The question is: to what extent?
I've noticed by playing with Godbolt's Compiler Explorer that on higher optimization levels the resulting ASM is nothing like the code. I'm beginning to realize that many best practices and optimizations that are typically done in code no longer matter because the compilers are so damn efficient. Does this expand to pass-by-reference vs. pass-by-value?
And the big question: What if I just made everything pass-by-value with no thought for memory-efficiency or copy speed, and just let the compiler decide... would it correctly optimize?
If so I could design my language so the coder only needs to consider whether a write is done to the parent variable or results in a copy, and the whole world of references and pointers can be delegated to the compiler. It's a completely different way of thinking.
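For context, this is roughly the shape of the experiment (the struct and function names here are made up, not the actual Godbolt code): a small struct and a large struct both passed by value, to see whether the compiler effectively turns one of them into a pass-by-reference.

struct Small { int a, b; };        // small enough to travel in registers on many ABIs

struct Large { int data[64]; };    // far too big for registers; usually copied through memory

int useSmall(Small s) { return s.a + s.b; }

int useLarge(Large l) { return l.data[0] + l.data[63]; }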
My initial testing on Compiler Explorer with Clang (widberg) shows that a difference in the ASM begins at optimization level -O2 (the ASM is the same before that). You can see the code and ASM here:
Small Struct on Godbolt
Larger Struct on Godbolt
I've been trying to work it out, but I don't know enough assembly to figure out if I've found what I'm looking for.
Is it passing by value in the small struct version and by reference in the larger version?

Can a large number of warnings increase compilation time?

I have heard from someone that big projects with a large number of warnings in the code build significantly slower than ones with a small number of warnings (assuming, of course, that the compiler is set to a high warning level).
Is there any reasonable explanation, or maybe anyone can share their experience about this topic?
With the GCC compilers (e.g. gcc for C or g++ for C++), warnings do take a small amount of CPU time. Use e.g. gcc -ftime-report if you want a detailed report of compiler timing. Warning diagnostics do depend upon the optimization level.
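For example (the file name is made up), this prints a per-pass timing breakdown, so you can see how much of the total time goes to diagnostics versus the optimization passes:

gcc -O2 -Wall -ftime-report -c generated_big.c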
But optimizations (especially at high level, like -O2 or more) take much more time than warnings. Empirically, optimized compilation time is proportional to the compilation unit size and to the square of the size (e.g. in number of Gimple instructions, or in lines of C code) of the biggest function. So if you have huge functions (e.g. some function of ten thousand lines in some generated C code) you may want to split them into smaller pieces.
In the early days of MELT (a GCC plugin and GCC experimental branch - GPLv3+ licensed - implementing a DSL to extend GCC, which I developed and am still working on), it generated huge initialization functions in C (today this is less the case; the initialization is split into many C++ functions - see e.g. gcc/melt/generated/warmelt-base.cc from the MELT branch of GCC as an example). At that time, I plotted the -O2 compilation time against the length of that initialization function. You could also experiment with the manydl.c code. Again, the quadratic dependence on the biggest function's length is an empirical observation, but it might be explained by register allocation issues. Also, J. Pitrat observed that huge generated C functions - produced by his interesting CAIA system - exhaust the compiler.
Also, warnings are output, and sometimes, the IDE or the terminal reading the compiler output may be slowed down if you have a lot of warnings.
Of course, as commented several times, compiler warnings are your friends (so always compile with e.g. gcc -Wall). So please improve your code to get no warnings at all. (In particular, initialize most of your local variables - I usually initialize all of them; since the compiler could optimize by removing some initializations if it can be proven that they are useless).
BTW, you could customize GCC with e.g. MELT to add your own customized warnings (e.g. to check some coding rules conformance).
Also, in C++ with weird templates, you could write a few dozens of lines which take many hours to be compiled (or even crash the compiler because of lack of memory, see this question).
NB. In 2019, GCC MELT is dead, its domain gcc-melt.org disappeared but the web pages are archived here.
It depends a lot on what the warnings actually are.
For an example, if there are lots of "variable is unused" warnings and "condition in 'if' is always true/false" warnings, then that may mean there's a lot of unnecessary code that the compiler has to parse and then remove during optimisation.
For other warnings there may be other consequences. For example, consider a "variable is self initialising" warning caused by something like int i = i;. I'd imagine this could add a whole pile of complications/overhead (where the compiler attempts to determine if the variable is "live" or can be optimised out).
This will likely depend extensively on the compiler, and how it is implemented.
That being said, there are two sure sources of slow-down:
Printing the warnings themselves is a non-trivial task, it requires extensive formatting, potentially accessing back the file, plus all those notes (macro expansion, template instantiation), and finally pushing that to an I/O device.
Emitting said warnings, with all those macro expansion and template instantiation data, might be non-trivial too. Furthermore, if collected first and only emitted at the end of the compilation process (instead of being streamed as they are produced), then the growing memory consumption will also slow you down (requiring more pages to be provided by the OS, ...)
In general, in terms of engineering, I do not expect compiler writers to worry much about the cost of emitting diagnostics; as long as it is a reasonable cost, there seems to be little incentive in optimizing a couple milli-seconds when a human intervention is going to be required anyway.
Who cares if your build takes 10% longer due to the compiler printing tons of warnings? The problem is the code you're being warned about, not the extra time it takes. Besides, 10% is probably a huge overestimate for the overhead of printing even very large numbers of warnings.

Compiler optimization makes program crash

I'm writing a program in C++/Qt which contains a graph file parser. I use g++ to compile the project.
While developing, I am constantly comparing the performance of my low level parser layer between different compiler flags regarding optimization and debug information, plus Qt's debug flag (turning on/off qDebug() and Q_ASSERT()).
Now I'm facing a problem where the only correctly functioning build is the one without any optimization. All other versions, even with -O1, seem to work in another way. They crash due to unsatisfied assertions, which are satisfied when compiled without a -O... flag. The code doesn't produce any compiler warning, even with -Wall.
I am very sure that there is a bug in my program, which seems to be only harmful with optimization being enabled. The problem is: I can't find it even when debugging the program. The parser seems to read wrong data from the file. When I run some simple test cases, they run perfectly. When I run a bigger test case (a route calculation on a graph read directly from a file), there is an incorrect read in the file which I can't explain.
Where should I start tracking down the cause of this undefined behavior? Which optimization techniques could be involved in this difference in behavior? (I could enable individual flags one after the other, but I don't know many compiler flags other than -O..., and I know there are a lot of them, so this would take a very long time.) As soon as I know what type of bug it is, I am sure I will find it sooner or later.
You can help me a lot if you can tell me which compiler optimization methods are possible candidates for such problems.
There are a few classes of bugs that commonly arise in optimized builds, that often don't arise in debug builds.
Uninitialized variables. The compiler can catch some but not all. Look at all your constructors, look at global variables, etc. Particularly look for uninitialized pointers. A debug build often fills memory with a known pattern (or zeroes it), but a release build doesn't. (See the sketch below.)
Use of temporaries that have gone out of scope. For example when you return a reference to a local temporary in a function. These often work in debug builds because the stack is padded out more. The temporaries tend to survive on the stack a little longer.
Array overruns when writing to temporaries. For example, if you create an array as a local in a function and then write one element beyond its end. Again, the stack has extra space in debug builds (for debugging information), so your overrun won't hit program data.
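A minimal sketch of these three classes (the names are made up); each tends to appear to work in a debug build and to misbehave once the optimizer changes initialization and stack layout:

#include <cstring>

struct Parser {
    int* buffer;                 // (1) never initialized; debug fill patterns can hide it
    Parser() {}
};

int* makeTable()
{
    int table[4];                // (2) local temporary
    return table;                // dangling pointer once the stack frame is reused
}

void copyName(const char* name)
{
    char tmp[8];
    std::strcpy(tmp, name);      // (3) overruns tmp when name is longer than 7 chars;
                                 // extra debug stack padding can mask the damage
}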
There are optimizations you can disable from the optimized build to help make debugging the optimized version easier.
-g -O1 -fno-inline -fno-loop-optimize -fno-if-conversion -fno-if-conversion2 \
-fno-delayed-branch
This should make stepping through your code in the debugger a little easier to follow.
Another suggestion is that if the assertions you have do not give you enough information about what is causing the problem, you should consider adding more assertions. If you are afraid of performance issues, or assertion clutter, you can wrap them in a macro. This allows you to distinguish the debugging assertions from the ones you originally added, so they can be disabled or removed from your code later.
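One possible shape for such a macro (entirely illustrative; the macro and the controlling define are made up):

#include <cstdio>
#include <cstdlib>

// Kept separate from the standard assert() so these extra checks can be
// toggled or stripped independently of the original assertions.
#ifdef ENABLE_PARSER_CHECKS
#  define PARSER_ASSERT(cond)                                          \
       do {                                                            \
           if (!(cond)) {                                              \
               std::fprintf(stderr, "check failed: %s (%s:%d)\n",      \
                            #cond, __FILE__, __LINE__);                \
               std::abort();                                           \
           }                                                           \
       } while (0)
#else
#  define PARSER_ASSERT(cond) ((void)0)
#endif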
1) Use valgrind on the broken version. (For that matter, try valgrind on the working version, maybe you'll get lucky.)
2) Build the system with "-O1 -g" and step through your program with gdb. At the crash, what variable has an incorrect value? Re-run your program and note when that variable is written to (or when it isn't and should have been.)

What is the scope of optimizations in a compiler?

When compiler optimizes code, what is the scope of the optimizations? Can an optimization
a) span more than a set of nested braces?
b) span more than a function?
c) span more than a file?
We are chasing an obscure bug that seems to derive from optimization. The code crashes in release mode but not in debug mode. Of course we are aware this could be heap corruption or some other memory problem, but we have been chasing it for a while now. One avenue we are considering is to selectively compile our files in debug mode until the problem goes away. In other words:
a) Start with all file compiled in release mode
b) Compile 1/2 the files in debug mode
if crash still seen, take half the release compiled files and compile in debug mode
if crash not seen, take half the files compiled in debug mode and compile in release mode
repeat until we narrow in on suspect files
this is a binary search to narrow in on problem files
We are aware that if this is a memory issue, simply doing this mixed compilation may make the bug go away, but we are curious whether we can narrow in on the problem files.
The outstanding question though is what is the scope of optimizations - can they span more than one file?
An optimization can do literally anything as long as it doesn't change the semantics of the behaviour defined by the language. That means the answers to your first questions (a), (b), and (c) are all yes. In practice, most compilers aren't that ambitious, but there are certainly some examples. Clang and LLVM have flags for link time optimization that allow optimizations to span pretty much the entire program. MSVC has a similar /GL flag that allows whole-program optimization.
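For reference, the whole-program options mentioned look roughly like this (the file names are made up):

g++ -O2 -flto a.cpp b.cpp -o app       (GCC/Clang link-time optimization)
cl /GL a.cpp b.cpp /link /LTCG         (MSVC whole-program optimization)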
Often the causes of these sorts of failures are uninitialized variables. Static analysis tools can be very helpful in finding problems like the one you're describing. I'm not sure your binary searching via optimizations is going to help you track down too much, though it is possible. Your bug could be the result of pretty complicated interactions between modules.
Good luck!
You can approximately identify the problem files from the call traces of the crash in release mode. Then try rebuilding just those without optimizations - this is much simpler than a binary search.
To learn about compiler optimization, one easy resource is Wikipedia. It briefly explains some important and widely implemented optimizations found in almost all modern compilers.
Maybe, you would like to read the wiki entry first, especially these sections:
Types of optimizations
Specific techniques
I think you are asking the wrong question.
It is unlikely (unless you are doing something special) that the optimizer is your problem.
The problem is most likely an uninitialized variable in your code.
When you compile in debug mode, most compilers will initialize all memory to a specific pattern, so that during debugging you can see what has happened to the memory (was it allocated, was it de-allocated, did it come from the stack, was the stack popped back, etc.).
Thus in debug mode any uninitialized memory has a specific pattern that can be recognized by the debugger. The exact pattern will depend on the compiler. But this can make sloppy code work, because uninitialized pointers happen to be NULL, integer counters start at 0, etc.
When you compile in release mode, all this extra initialization is turned off. If you did not explicitly initialize the memory, it is in a random state. This is why debug and release versions of an application behave differently.
The easy way to check for this is to check your compiler warnings.
Make sure that all variables are initialized before use (uninitialized use will show up in the warnings).
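For instance (illustrative only), GCC and Clang will typically flag this kind of code once optimization enables the necessary data-flow analysis:

int scale(int factor)
{
    int base;                 // never assigned when factor <= 0
    if (factor > 0)
        base = 10;
    return base * factor;     // g++ -O2 -Wall typically warns: 'base' may be used uninitialized
}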

What are efficient ways to debug an optimized C/C++ program?

Many times I work with optimized code (sometimes even involving vectorized loops) which contains bugs and such. How would one debug such code? I'm looking for any kind of tools or techniques. I use the following (possibly outdated) tools, so I'm looking to upgrade.
I use the following:
Since with ddd you cannot see the code, I use gdb plus its disassemble command and look at the produced code; I can't really step through the program this way.
ndisasm
Thanks
It is always harder to debug optimised programs, but there are always ways. Some additional tips:
Make a debug build, and see if you get the same bug in a debug build. No point debugging an optimised version if you don't have to.
Use valgrind if on a platform that supports it. The errors you see may be harder to understand, but catching the problem early often simplifies debugging.
printf debugging is primitive, but sometimes it is the simplest way if you have a complex issue that only shows up in optimised builds.
If you suspect a timing issue (especially in a multithreaded program), roll your own version of assert which aborts or prints if the condition is violated, and use it in a few select places, to rule out possible problems.
See if you can reproduce the problem with -O2 or -O3 enabled but without -fomit-frame-pointer, since omitting frame pointers makes code very hard to debug. That might give you enough information to find the cause of your problem.
Isolate parts of your code, build a test-suite, and see if you can identify any testcases which fail. It is much easier to debug one function than the whole program.
Try turning off optimisations one by one with the -fno-X options. This might help you find common problems like strict aliasing problems.
Turn on more compiler warnings. Some things, like strict aliasing problems, can generate compiler warnings if they create a difference in behaviour between different optimisation levels (the example below shows one such combination).
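As an illustration of the last two tips (the file name is made up), one round of such an experiment might look like this, disabling strict-aliasing optimisations while also asking the compiler to warn about likely aliasing violations:

g++ -O2 -fno-strict-aliasing -Wall -Wextra -Wstrict-aliasing -c parser.cpp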
When debugging release builds you can put in __asm nop instructions as placeholders for breakpoints (int 3). This is nice as you can guarantee breakpoint locations without messing up compiler optimizations or writing printf/cout statements.
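A tiny illustration (MSVC-specific, x86 inline assembly; the function is made up):

void parseChunk(const char* data)
{
    (void)data;
    // ... optimized release code ...
    __asm { nop }        // stable location to set a breakpoint on in the release build
    // __debugbreak();   // alternative intrinsic: emits an actual int 3 and halts in the debugger
}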
It's always easier to debug a non-optimized version, of course. Failing that, disassembly of the code can be helpful. Other techniques I've used include partially de-optimizing the code by forcing intermediate results to be printed or logged, or changing a critical variable to "volatile" so I can at least look at its value in the debugger.
Chances are what you call optimized code is scrambled to shave cycles (which makes debugging hard) but is not really very optimized. Here is an example of what I mean.
I would turn off the compiler optimization, debug and tune it yourself, and then turn compiler optimization back on if the code has hotspots that are actually in code the compiler sees (not in outside libraries). (I define a hotspot as a part of code where the PC is often found. That automatically exempts loops containing function calls because they steal away the PC.)