Some things in GDB (actually using the DDD GUI) confuse me when debugging my own C++ code:
1) Why is there no backtrace available after a HEAP ERROR crash?
2) Why does gdb sometimes stop AFTER the breakpoint rather than AT the breakpoint?
3) Why does stepping through commented lines sometimes execute some instructions (gdb shows busy)?
Any explanations greatly appreciated,
Petr
1) I'm not sure about the heap error specifically, but if you ran out of memory, for example, GDB might not be able to produce the backtrace properly. Also, if heap corruption caused a stray pointer to overwrite part of your application's stack, that would make the backtrace unavailable.
2) If you have optimization enabled, it's quite possible for this to happen. The compiler can reorder statements, and the underlying assembly on which the breakpoint was placed may correspond to a later line of code. Don't use optimization when trying to debug such things.
3) This could be caused either by the source not having been rebuilt before execution (so the binary differs from the source you are stepping through), or possibly, again, by optimization settings.
A few possible explanations:
1) Why is there no backtrace available after a HEAP ERROR crash?
If the program is generating a core dump file you can run GDB as follows: "gdb program -c corefile" and get a backtrace.
2) Why does gdb sometimes stop AFTER the breakpoint rather than AT the breakpoint?
Breakpoints are generally placed on statements, so watch out for that. The problem here could also be caused by a mismatch between the binary and the code you're using.
3) Sometimes stepping through commented lines causes execution of some instructions (gdb busy)??
Again, see #2.
2) Why does gdb sometimes stop AFTER the breakpoint rather than AT the breakpoint?
Do you have optimization enabled during your compilation? If so the compiler may be doing non-trivial rearrangements of your code... This could conceivably address your number 3 as well.
With g++ use -O0 or no -O at all to turn optimization off.
I'm unclear on what your number 1 is asking.
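To spell out the -O0 suggestion, a typical debug build looks something like this (file and program names here are just placeholders):

g++ -g -O0 -o myprog myprog.cpp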
Regarding the breakpoint and comment/instruction behavior: are you compiling with optimization enabled (e.g., -O3)? GDB can handle that, but the behavior you are seeing typically occurs when debugging optimized code, especially code compiled with aggressive optimizations.
Heap checks are probably done after main returns; try set backtrace past-main in GDB (a snippet follows this list). If it's crashing, the process is gone, so you need to load the core file into the debugger (gdb prog core).
Optimized code; see dmckee's answer.
Same as 2.
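For point 1, the setting mentioned above is toggled like this (a minimal sketch of the gdb session):

(gdb) set backtrace past-main on
(gdb) bt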
I am working on a project, where the request "we want more information in release build stack traces" came up.
With "stack trace" I mean basically the output of t a a bt in gdb, which I suppose to be equivalent to the output of gstack for a running process. If this is true would be one of my questions.
My main problem is that the availability of stack traces is rather erratic (sometimes you have them, sometimes you don't), and the documentation could be more detailed (e.g. the gdb documentation states that "-fomit-frame-pointer makes debugging impossible on some machines." without any clear information about x86_64).
Also, when examining a running program with gstack, I get quite perfect stack traces. I am unsure, though, whether this is exactly what I would get from a core dump with gdb (which would mean that in all cases where I get less information, the stack really has been corrupted).
Currently, the code is compiled with -O2. I saw one stack trace lately where our own program code's stack frames did not have any function parameter values, but the first (inner) frames, where our code had already called into a third-party library, did provide these values. Here I am not sure whether this is a sign that the third-party library was built with better gcc debugging options, or whether this information is simply lost at some point while walking down the stack trace.
I guess my questions are:
Which compiler options influence the stack trace quality on x86_64?
are stack traces from these origins identical:
output of gstack of a running program
attached gdb to a running program, executed t a a bt
called gcore on a running program, opening core with gdb, then t a a bt
program aborted and core file written by system, opened with gdb
Is there some in-depth documentation on which parameters affect stack trace quality on x86_64?
All considerations are made under the assumption that the program binary exists for the core dump, and that the source code is not available.
With "quality of a stack trace" i mean 3 criteria:
called function names are available, not just "??"
the source code's file name and line number are available
function call parameters are available.
Which compiler options influence the stack trace quality on x86_64?
The -fomit-frame-pointer is the default on x86_64, and does not cause stack traces to be unusable.
GDB relies on unwind descriptors, and you could strip these with either strip or -fno-unwind-tables (this is ill-advised).
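A quick sanity check (the section and flag names below are the usual ELF/GCC ones; this is just a sketch) is to confirm that the shipped binary still carries its unwind sections:

readelf -S ./myprog | grep -i eh_frame

If they are missing, rebuilding with -fasynchronous-unwind-tables (usually the default on x86_64) restores them.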
are stack traces from these origins identical:
output of gstack of a running program
Last I looked, gstack was a trivial shell script that invoked gdb, so yes.
attached gdb to a running program, executed "t a a bt"
Yes.
called gcore on a running program, opening core with gdb, then "t a a bt"
Yes, provided the core is opened with GDB on the same system where gcore was run.
program aborted and core file written by system, opened with gdb
Same as above.
If you are trying to open core on a different system from the one where it was produced, and the binary uses dynamic libraries, you need to set sysroot appropriately. See this question and answer.
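Putting the gcore and sysroot remarks together, a sketch of such a session (program name, PID, and paths are placeholders) could be:

gcore 1234                                   # writes core.1234 for the running process
gdb ./myprog core.1234                       # on the same system this just works

# on a different system, point GDB at a copy of the original libraries first:
(gdb) set sysroot /path/to/copy/of/target/root
(gdb) core-file core.1234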
Note that there are a few reasons stack may look corrupt or unavailable in GDB:
-fno-unwind-tables or stripping mentioned above
code compiled from assembly, and lacking proper .cfi directives
third party libraries that were built with a very old compiler and have incorrect unwind descriptors (anything before gcc-4.4 was pretty bad).
and finally, stack corruption.
I run valgrind and it comes up with a particular error I am interested in (the others are false positives). When running gdb I want to get straight to that error; otherwise it would take ages due to the high number of other errors. It's an invalid free error I am interested in. Can I suppress other types of error, or can I perhaps specify line numbers or addresses where I am happy to stop the program?
Or am I stuck having to do it the hard way?
I'm using valgrind 3.9.0 and GDB 7.4-2012.04 on Linux Mint 13.
You can instruct valgrind to skip the next 1000 errors like this:
(gdb) monitor v.set vgdb-error 1000
Also set a breakpoint at the start of the code you are interested in testing for memory errors. When the breakpoint is reached, set vgdb-error back to 0 before continuing, and gdb will once again stop at each error.
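A sketch of that workflow, assuming the program is started under valgrind's gdbserver (program and function names are placeholders):

valgrind --vgdb=yes --vgdb-error=0 ./myprog   # terminal 1: wait for gdb before running
gdb ./myprog                                  # terminal 2
(gdb) target remote | vgdb
(gdb) monitor v.set vgdb-error 1000           # don't stop for the noise
(gdb) break interesting_function
(gdb) continue
# when the breakpoint is hit:
(gdb) monitor v.set vgdb-error 0              # now stop on every error again
(gdb) continue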
Apparently you can also give valgrind a list of errors to suppress, or valgrind can be used to generate the list automatically. See http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress
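For reference, a suppression entry looks roughly like this (the names here are made up; the easiest route is to let valgrind write the entries for you with --gen-suppressions=all and paste them into a file passed via --suppressions=my.supp):

{
   ignore_noise_from_third_party
   Memcheck:Cond
   fun:SomeNoisyFunction
   obj:/usr/lib/libthirdparty.so*
}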
Note the poster is talking about a situation where he is debugging a program using valgrind's gdbserver and gdb. This is a powerful technique for adding memory error checking to your gdb session. See http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver
I have some code that I'm currently porting from OS X to Linux (a console tool).
Somewhere in this code, I get a segmentation fault. The problem is that if I run the program without GDB, I clearly see the segmentation fault and the program is killed. But when I'm running it under GDB, it just halts and GDB never returns to the prompt. So I'm not really able to examine what's going on.
The code is C++, compiled with the -g option in g++.
Btw. Pretty new to GDB, so excuse me if this is obvious.
Any ideas?
Thanks in advance.
Trenskow
GDB will suspend your program when the segfault signal (SIGSEGV) is received.
Type where to see the stack trace and start inspecting what's going on from there.
Also consider enabling core dumps; that way you can load the core dump in GDB and investigate what is going on.
You can then load the core dump like this:
> gdb your_program the_core_dump
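On many Linux systems core dumps are disabled by default; a typical way to enable them before reproducing the crash (shell-dependent, and the core file's name and location depend on the system's core_pattern) is:

> ulimit -c unlimited
> ./your_program
> gdb your_program core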
The behaviour you describe is not typical - I suspect the stack may have been trashed.
Try sending various signals directly via the 'kill' command.
It might be worth running a test program with an abort() in it under gdb, so that you can learn what the expected behaviour is.
I've seen this before when my stack was too large. Try moving large stack variables onto the heap (or making them globals), recompile, and see if you still get the error.
In my C++ program I'm using a library which will "send?" a SIGTRAP on certain operations when I'm debugging it (using gdb as the debugger). I can then choose whether I wish to continue or stop the program. If I choose to continue, the program works as expected, but setting custom breakpoints after a SIGTRAP has been caught causes the debugger/program to crash.
So here are my questions:
What causes such a SIGTRAP? Is it a leftover line of code that can be removed, or is it caused by the debugger when it "finds something it doesn't like"?
Is a SIGTRAP, generally speaking, a bad thing, and if so, why does the program run flawlessly when I compile a Release and not a Debug version?
What does a SIGTRAP indicate?
This is a more general approach to a question I posted yesterday Boost Filesystem: recursive_directory_iterator constructor causes SIGTRAPS and debug problems.
I think my question was far too specific, and I don't want you to solve my problem but to help me (and hopefully others) understand the background.
Thanks a lot.
With processors that support instruction breakpoints or data watchpoints, the debugger will ask the CPU to watch for instruction accesses to a specific address, or data reads/writes to a specific address, and then run full-speed.
When the processor detects the event, it will trap into the kernel, and the kernel will send SIGTRAP to the process being debugged. Normally, SIGTRAP would kill the process, but because it is being debugged, the debugger will be notified of the signal and handle it, mostly by letting you inspect the state of the process before continuing execution.
With processors that don't support breakpoints or watchpoints, the entire debugging environment is probably done through code interpretation and memory emulation, which is immensely slower. (I imagine clever tricks could be done by setting pagetable flags to forbid reading or writing, whichever needs to be trapped, and letting the kernel fix up the pagetables, signaling the debugger, and then restricting the page flags again. This could probably support near-arbitrary number of watchpoints and breakpoints, and run only marginally slower for cases when the watchpoint or breakpoint aren't frequently accessed.)
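For concreteness, the gdb commands that request this kind of hardware assistance look like the following (variable and function names are placeholders; gdb silently falls back to slower software watchpoints when the hardware cannot help):

(gdb) watch my_var          # stop when my_var is written
(gdb) rwatch my_var         # stop when my_var is read
(gdb) awatch my_var         # stop on any access
(gdb) hbreak some_function  # hardware-assisted breakpoint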
The question I placed into the comment field looks apropos here, only because Windows isn't actually sending a SIGTRAP, but rather signaling a breakpoint in its own native way. I assume when you're debugging programs, that debug versions of system libraries are used, and ensure that memory accesses appear to make sense. You might have a bug in your program that is papered-over at runtime, but may in fact be causing further problems elsewhere.
I haven't done development on Windows, but perhaps you could get further details by looking through your Windows Event Log?
While working in Eclipse with the MinGW/gcc compiler, I realized it was reacting very badly to vectors in my code, resulting in an unclear SIGTRAP signal and sometimes even abnormal debugger behavior (i.e. jumping somewhere up in the code and continuing execution in reverse order!).
I copied the files from my project into Visual Studio and resolved the issues, then copied the changes back to Eclipse and voila, it worked like a charm. The causes were things like vector initialization mix-ups between the reserve() and resize() functions, or accessing elements outside the bounds of the vector.
Hope this will help someone else.
I received a SIGTRAP from my debugger and found that the cause was a missing return value.
string getName() { printf("Name!");};
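Falling off the end of a non-void function is undefined behavior, and -Wall (or -Wreturn-type) will warn about it. A guessed-at fix is simply to return something:

string getName() { printf("Name!"); return "Name"; }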
Is there a way for my code to be instrumented to insert a breakpoint or a watch on a memory location that will be honored by gdb? (And presumably have no effect when gdb is not attached.)
I know how to do such things as gdb commands within a gdb session, but for certain types of debugging it would be really handy to do it "programmatically", if you know what I mean. For example, the bug only happens under a particular circumstance, not in any of the first 11,024 times the crashing routine is called, or the first 43,028,503 times that memory location is modified, so setting a simple breakpoint on the routine or a watchpoint on the variable is not helpful -- it's all false positives.
I'm concerned mostly about Linux, but curious about if similar solutions exist for OS X (or Windows, though obviously not with gdb).
For breakpoints, on x86 you can break at any location with
asm("int3");
Unfortunately, I don't know how to detect if you're running inside gdb (doing that outside a debugger will kill your program with a SIGTRAP signal)
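A minimal sketch of such a helper (the function name is made up; as noted above, hitting it without a debugger attached will still terminate the program with SIGTRAP unless you install a handler for that signal):

#include <csignal>

inline void debug_break()
{
#if defined(__i386__) || defined(__x86_64__)
    asm volatile("int3");   // x86 breakpoint instruction, caught by gdb
#else
    std::raise(SIGTRAP);    // roughly equivalent on other architectures
#endif
}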
GDB supports a scripting language that can help in situations like this. For example, you can trigger a bit of custom script on a breakpoint that may decide to "continue" because some condition hasn't been met.
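For the "only after N hits" scenario in the question, gdb's built-in ignore counts and breakpoint conditions may already be enough; a sketch (routine and variable names, and the condition, are placeholders, while the counts just echo the question):

(gdb) break crashing_routine
(gdb) ignore 1 11024                 # skip the first 11,024 hits of breakpoint 1
(gdb) watch that_memory_location
(gdb) ignore 2 43028503              # likewise for the watchpoint
(gdb) condition 1 some_flag == 1     # or stop only when a condition holds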
Not directly related to your question, but it may be helpful. Have you looked at the backtrace and backtrace_symbols calls in execinfo.h?
http://linux.die.net/man/3/backtrace
This can help you log a backtrace whenever your condition is met. It isn't gdb, so you can't break and step through your program, but it may be useful as a quick diagnostic.
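A minimal sketch of such a logging helper (link with -rdynamic so the symbol names show up; the function name is made up):

#include <execinfo.h>
#include <cstdio>
#include <cstdlib>

void log_backtrace()
{
    void *frames[64];
    int n = backtrace(frames, 64);                  // capture up to 64 return addresses
    char **symbols = backtrace_symbols(frames, n);  // turn them into printable strings
    for (int i = 0; i < n; ++i)
        fprintf(stderr, "%s\n", symbols[i]);
    free(symbols);                                  // backtrace_symbols allocates with malloc
}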
The commonly used approach is to use a dummy function with a non-obvious name. Then, you can augment your .gdbinit or use whatever other technique to always break on that symbol name.
Trivial dummy function:
void my_dummy_breakpoint_loc(void) {}
Code under test (can be an assert-like macro):
if (rare_condition)
my_dummy_breakpoint_loc();
gdb session (obvious, eh?):
b my_dummy_breakpoint_loc
It is important to make sure that "my_dummy_breakpoint_loc" is not optimized away by the compiler for this technique to work.
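One way to do that (GCC/Clang-specific, and just a sketch) is to forbid inlining and give the body a token side effect:

__attribute__((noinline)) void my_dummy_breakpoint_loc(void)
{
    asm volatile("");   // a no-op the optimizer must assume has side effects
}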
In the fanciest of cases, the actual assembler instruction that calls my_dummy_breakpoint_loc can be replaced by nops and enabled on a site-by-site basis by a bit of runtime code self-modification. This technique is used by Linux kernel instrumentation, to name one example.