How is gdb stack trace readability of release code influenced on x64? - c++

I am working on a project where the request "we want more information in release build stack traces" came up.
With "stack trace" I mean basically the output of t a a bt in gdb, which I assume is equivalent to the output of gstack for a running process. Whether this is actually true is one of my questions.
My main problem is that the availability of stack traces is rather erratic (sometimes you have them, sometimes you don't), and the documentation could be more detailed (e.g. the gdb documentation states that "-fomit-frame-pointer makes debugging impossible on some machines." without any clear statement about x86_64).
Also, when examining a running program with gstack, I get quite perfect stack traces. I am unsure, though, whether this is exactly what I would get from a core dump with gdb (which would mean that in all cases where I get less information, the stack really has been corrupted).
Currently, the code is compiled with -O2. I have seen one stack trace lately where our own program code's stack frames did not have any function parameter values, but the first (inner) frames, where our code had already called into a third party library, did provide these values. Here, I am not sure whether this is a sign that the third party library was built with better gcc debugging options, or whether this information is simply lost at some point while walking down the stack trace.
I guess my questions are:
Which compiler options influence the stack trace quality on x86_64?
Are stack traces from these origins identical:
output of gstack of a running program
attached gdb to a running program, executed t a a bt
called gcore on a running program, opening core with gdb, then t a a bt
program aborted and core file written by system, opened with gdb
Is there some in-depth documentation on which parameters affect stack trace quality on x86_64?
All considerations are made under the assumption that the program binary exists for the core dump, and that the source code is not available.
With "quality of a stack trace" I mean 3 criteria:
called function names are available, not just "??"
the source code's file name and line number are available
function call parameters are available.

Which compiler options influence the stack trace quality on x86_64?
-fomit-frame-pointer is the default on x86_64 and does not make stack traces unusable.
GDB relies on unwind descriptors; you could strip these with either strip or -fno-unwind-tables (doing so is ill-advised).
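As an illustration only (this build line is an assumption on my part, not part of the original answer, and myprog/myprog.cpp are placeholder names), a release build that keeps all three pieces of information the question lists would typically add -g and keep the unwind tables:
g++ -O2 -g -fasynchronous-unwind-tables -fno-omit-frame-pointer -o myprog myprog.cpp
-g is what provides file names, line numbers and (where not optimized away) parameter values; the unwind tables are what GDB uses to walk the stack, and -fno-omit-frame-pointer is optional on x86_64 since the unwind descriptors are normally sufficient.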
Are stack traces from these origins identical:
output of gstack of a running program
Last I looked, gstack was a trivial shell script that invoked gdb, so yes.
attached gdb to a running program, executed "t a a bt"
Yes.
called gcore on a running program, opening core with gdb, then "t a a bt"
Yes, provided the core is opened with GDB on the same system where gcore was run.
program aborted and core file written by system, opened with gdb
Same as above.
If you are trying to open core on a different system from the one where it was produced, and the binary uses dynamic libraries, you need to set sysroot appropriately. See this question and answer.
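A rough sketch of what that looks like at the GDB prompt (the paths and the program name are placeholders):
(gdb) set sysroot /path/to/copy/of/target/root
(gdb) file ./myprog
(gdb) core-file ./core
This makes GDB resolve the shared libraries against the copied root instead of the local system's libraries.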
Note that there are a few reasons stack may look corrupt or unavailable in GDB:
-fno-unwind-tables or the stripping mentioned above
code compiled from assembly, and lacking proper .cfi directives
third party libraries that were built with a very old compiler and have incorrect unwind descriptors (anything before gcc-4.4 was pretty bad).
and finally, stack corruption.
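For the unwind-table cases, one quick (if crude) sanity check is to see whether a binary or shared library still carries unwind descriptors at all, for example:
readelf --debug-dump=frames /path/to/library.so | head
An essentially empty dump for code that should have .eh_frame entries points at stripped or missing unwind tables.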

Related

GDB back trace from address

I am experiencing an issue with GDB bt. I am in the interrupt context during debugging and can therefore see only the current stack, so the back trace will only show a few calls which I am not interested in. However, in the embedded software we are writing, each time a panic happens we preserve information in a global structure, which points to the stack before the crash.
My question is: can I ask GDB to do the back trace from my known address (with the assumption that no remapping is happening in the hardware)?
I am using gdb 7.0 with Olimex; I am debugging a custom ARM-based chip.
Best Regards
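For what it's worth (this is only a sketch of the general idea, and every address here is invented): GDB does let you overwrite the registers its unwinder starts from, so with the saved values from the panic structure something along these lines could be tried:
(gdb) set $sp = 0x2000fe40
(gdb) set $lr = 0x08001a31
(gdb) set $pc = 0x08004f00
(gdb) bt
On ARM the saved link register (and, depending on the ABI, the frame pointer) is needed in addition to $sp and $pc for the unwind to get past the first frame.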

Locating segmentation fault for multithread program running on cluster

It's quite straightforward to use gdb to locate a segmentation fault while running a simple program in interactive mode. But consider that we have a multithreaded program, written with pthreads, submitted to a cluster node (by the qsub command), so we don't have interactive operation.
How can we locate the segmentation fault? I am looking for a general approach, a program, or a test tool. I cannot provide a reproducible example, as the program is really big and crashes on the cluster in some unknown situations.
I need to find the problem in this hard situation, because the program runs correctly on the local machine with any number of threads.
The "normal" approach is to have the environment produce a core file and get hold of those. If this isn't an option, you might want to try installing a signal handler for SIGSEGV which obtains, at least, a stack trace dumped somewhere. Of course, this immediately leads to the question "how to get a stack trace" but this is answered elsewhere.
The easiest approach is probably to get hold of a core file. Assuming you have a similar machine where the core file can be read, you can use gdb program corefile to debug the program program which produced the core file corefile: you should be able to look at the different threads, their data (to some extent), etc. If you don't have a suitable machine, it may be necessary to cross-compile gdb to match the hardware of the machine where the program was run.
I'm a bit confused about the statement that the core files are empty: You can set the limits for core files using ulimit on the shell. If the size for cores is set to zero it shouldn't produce any core file. Producing an empty one seems odd. However, if you cannot change the limits on your program you are probably down to installing a signal handler and dumping out a stack trace from the offending thread.
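For completeness, the usual way to lift that limit in the job's shell before the program starts is:
ulimit -c unlimited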
Thinking of it, you may be able to put the program to sleep in the signal handler and attach to it using a debugger, assuming you can run a debugger on the corresponding machine. You would determine the process ID (using, e.g., ps -elf | grep program) and then attach to it using
gdb program pid
I'm not sure how to put a program to sleep from within the program, though (possibly by having the SIGSEGV handler raise SIGSTOP for itself...).
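A minimal sketch of both ideas combined, assuming Linux/glibc (backtrace() and backtrace_symbols_fd() come from <execinfo.h>; treat this as illustrative rather than hardened production code):
    #include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc)
    #include <csignal>
    #include <unistd.h>

    static void segv_handler(int sig)
    {
        void* frames[64];
        int count = backtrace(frames, 64);                    // capture raw return addresses
        backtrace_symbols_fd(frames, count, STDERR_FILENO);   // write one line per frame to stderr
        // raise(SIGSTOP);   // optionally stop here so a debugger can attach to the live process
        _exit(128 + sig);    // never return into the faulting instruction
    }

    int main()
    {
        std::signal(SIGSEGV, segv_handler);
        // ... real program ...
    }
Strictly speaking backtrace() is not guaranteed to be async-signal-safe, but as a crash handler of last resort it is commonly used this way; redirecting stderr to a per-job file then gives you the dumped stack trace mentioned above.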
That said, I assume you tried running your program on your local machine...? Some problems are more fundamental and don't require a distributed system with many threads running on each node to show up. This is, obviously, not a replacement for the approach above.

Creating a Crash Log

When I want to find a segfault or any other error which leads to a crash of a program, I would always inspect the core dump with gdb. This is quite painful when such an application runs on a computer without gdb installed.
So a few days ago I used a program (JDownloader) which wrote a crash log file, and this file contained a stack trace. I thought this would be a great enhancement to my application. But I haven't found any information on how to write a file which contains the stack trace just before the crash.
Is it even possible? How would I do this on Linux/Windows?
I'm using C/C++.
I believe JDownloader is written in Java. I think the language allows you to retrieve a full plain-text stack trace at any point. Standard C++ cannot do this portably, because the compiled executable usually doesn't keep any information about the source code used to generate it.
The Windows API does allow you to catch fatal exceptions and create a dump of the process (or parts of the process, if you don't want to deal with a huge file). This dump can then be inspected with WinDbg, Visual Studio, or your debugger of choice.
The downside to this is that you must have the exact source code that was used to build the dumped executable, as well as the symbol database (PDB file) that was generated during the build. On top of that, some code can be optimized in ways that makes it impossible for the debugger to give you an accurate stack trace, even with the symbol data.
See MiniDumpWriteDump for details. If you're going to take this route, the best practice is to not generate the dump in the crashing process, but spawn a child process to take a dump of the parent.
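A bare-bones sketch of the in-process variant (in-process only for brevity; as said above, a helper process is the better practice; this assumes linking against dbghelp.lib, and "crash.dmp" is an arbitrary file name):
    #include <windows.h>
    #include <dbghelp.h>

    static LONG WINAPI WriteCrashDump(EXCEPTION_POINTERS* info)
    {
        HANDLE file = CreateFileA("crash.dmp", GENERIC_WRITE, 0, NULL,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (file != INVALID_HANDLE_VALUE) {
            MINIDUMP_EXCEPTION_INFORMATION exc;
            exc.ThreadId = GetCurrentThreadId();
            exc.ExceptionPointers = info;
            exc.ClientPointers = FALSE;
            MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                              MiniDumpNormal, &exc, NULL, NULL);   // small dump; pick a richer MINIDUMP_TYPE if needed
            CloseHandle(file);
        }
        return EXCEPTION_EXECUTE_HANDLER;   // let the process terminate afterwards
    }

    int main()
    {
        SetUnhandledExceptionFilter(WriteCrashDump);
        // ... real program ...
    }
The resulting crash.dmp can then be opened in WinDbg or Visual Studio together with the matching PDB, as described above.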
There are also C and C++ libraries that can 'manually' record the call stack and give you a textual representation of it at run time, but I haven't encountered any of these that I would suggest.

Recover from crash with a core dump

A C++ program crashed on FreeBSD 6.2 and the OS was kind enough to create a core dump. Is it possible to amputate some stack frames, reset the instruction pointer, and restart the process in gdb, and how?
Is it possible to amputate some stack frames, reset the instruction pointer and restart the process in gdb?
I assume you mean: change the process state, and set it to start executing again (as if it never crashed in the first place).
No. For one thing, how do you propose GDB (if it magically had this capability) would handle your file descriptors (which the kernel automatically closed when your process died)?
Yes, gdb can debug core dumps just as well as running programs. Assuming that a.out is the name of your program's executable and that a.core is the name of your core file, invoke gdb like so:
gdb a.out a.core
And then you can debug like normal, except you cannot continue execution in any way (even if you could, the program would just crash again). You can examine the stack trace, registers, memory, etc.
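For example, a typical first look at a core might be (frame 2 is just an arbitrary frame number):
(gdb) bt
(gdb) frame 2
(gdb) info locals
(gdb) info registers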
Possible duplicate of this: Best practices for recovering from a segmentation fault
Summary: It is possible but not recommended. The way to do it is to use setjmp() and longjmp() from a signal handler. (Please look at the complete source code example in the duplicate post.)
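A very rough sketch of that technique (for illustration only; as the answer says it is not recommended, and resuming after a genuine SIGSEGV is not well-defined behaviour):
    #include <csetjmp>
    #include <csignal>
    #include <cstdio>

    static std::jmp_buf recovery_point;

    static void on_segv(int)
    {
        std::longjmp(recovery_point, 1);   // jump back past the faulting code
    }

    int main()
    {
        std::signal(SIGSEGV, on_segv);
        if (setjmp(recovery_point) == 0) {
            int* volatile p = nullptr;
            *p = 42;                        // faults; the handler longjmps back here
        } else {
            std::puts("recovered from the segmentation fault");
        }
    }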

GDB (DDD), debugging questions

Some things in GDB (actually using DDD gui) confuse me, when debugging my own C++ codes:
1) Why is there no backtrace available after a HEAP ERROR crash?
2) Why does gdb sometimes stop AFTER the breakpoint rather than AT the breakpoint?
3) Sometimes stepping through commented lines causes execution of some instructions (gdb busy)??
Any explanations greatly appreciated,
Petr
1) I'm not sure about the heap error specifically, but for example if you ran out of memory it might not be able to process the backtrace properly. Also, if heap corruption caused a pointer to blow up part of your application's stack, that would make the backtrace unavailable.
2) If you have optimization enabled, it's quite possible for this to happen. The compiler can reorder statements, and the underlying assembly upon which the breakpoint was placed may correspond to the later line of code. Don't use optimization when trying to debug such things.
3) This could be caused either by the source code not having been rebuilt before execution (so the binary is different from the actual source), or possibly again by the optimization settings.
A few possible explanations:
1) Why is there no backtrace available after a HEAP ERROR crash?
If the program is generating a core dump file you can run GDB as follows: "gdb program -c corefile" and get a backtrace.
2) Why does gdb sometimes stop AFTER the breakpoint rather than AT the breakpoint?
Breakpoints are generally placed on statements, so watch out for that. The problem here could also be caused by a mismatch between the binary and the code you're using.
3) Sometimes stepping through commented lines causes execution of some instructions (gdb busy)??
Again, see #2.
2) Why does gdb sometimes stop AFTER the breakpoint rather than AT the breakpoint?
Do you have optimization enabled during your compilation? If so, the compiler may be doing non-trivial rearrangements of your code... This could conceivably address your number 3 as well.
With g++ use -O0 or no -O at all to turn optimization off.
I'm unclear on what your number 1 is asking.
Regarding the breakpoint and comment/instruction behavior: are you compiling with optimization enabled (e.g., -O3)? GDB can handle that, but the behavior you are seeing sometimes occurs when debugging optimized code, especially code compiled with aggressive optimizations.
Heap checks are probably done after main returns; try set backtrace past-main in GDB (see the example below). If it's crashing (the process is gone), you need to load the core file into the debugger (gdb prog core).
Optimized code; see dmckee's answer.
Same as 2.
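As an example of the first point, the past-main setting is toggled at the GDB prompt:
(gdb) set backtrace past-main on
(gdb) bt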