When debugging a C++ program with gdb, the program can emit a SIGSEGV, and it is possible to handle the signal and ask gdb not to stop (nostop).
How does gdb handle this kind of scenario?
I have searched the gdb source code but couldn't find a starting point.
You cannot automatically ignore SIGSEGV, and I wouldn't recommend doing that anyway. Although you can make gdb ignore the signal and not pass it to the program, the kernel will attempt to re-run the offending instruction as soon as execution resumes, which results in an infinite loop. See this answer for more information.
One way to work around it is to skip the instruction or change register values so that it does not segfault. The link shows an example of setting a register. You can also use the jump command to skip over an instruction.
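For example, once gdb has stopped on the faulting instruction, you can list it together with the following instruction and jump past it. The address below is hypothetical; use the one that x/2i prints for your program:

(gdb) x/2i $pc
(gdb) jump *0x400565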
it is possible to handle the signal and ask gdb not to stop (nostop).
It's unclear whether you want GDB to handle the signal, or the program itself.
If the latter, the gdb command handle SIGSEGV nostop noprint pass will do exactly that.
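In a session that looks like this (the table is gdb's confirmation of the new disposition):

(gdb) handle SIGSEGV nostop noprint pass
Signal        Stop      Print   Pass to program Description
SIGSEGV       No        No      Yes             Segmentation fault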
This is actually something the OS does. On Windows, if a program has a debugger attached and an exception is thrown, Windows will ask the debugger if it wants to handle it. If/when it declines, it passes it to the program. If the program doesn't handle it, Windows passes it to the debugger again.
Related
I'm wondering if there is a way to catch every segmentation fault/core dump and print its call stack. Catching all signals looks like a doable approach, but I'm not sure how it works; in my experience it did not always give the outcome I wanted, and sometimes it just failed to catch the core dump. Maybe I did something wrong.
The reason I ask is that I usually debug a very complicated system, where many segmentation faults are hard to reproduce and it is not practical to step through it with gdb line by line. So if I could catch every segmentation fault and print a call stack or other information, that would be a great help in debugging.
I'm wondering if there is a way to catch every segmentation fault ...
Certainly. This can be achieved by registering a signal handler using std::signal.
... and print its call stack?
This is quite a lot trickier. There is no standard way to inspect the call stack in C++. Linux has several ways, but unfortunately the most conventional one, the glibc backtrace function, is not async-signal-safe, so there is no guarantee that it will work in a signal handler.
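For illustration only, here is a minimal sketch combining std::signal with the glibc backtrace functions. Per the caveat above it is not async-signal-safe, so treat it as a best-effort debugging aid, not a reliable mechanism:

#include <csignal>
#include <cstdlib>
#include <unistd.h>
#include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc)

static void segv_handler(int) {
    void* frames[64];
    int n = backtrace(frames, 64);                   // capture raw return addresses
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  // symbolize to stderr
    _exit(EXIT_FAILURE);  // don't return: the faulting instruction would
                          // be re-executed and fault again
}

int main() {
    std::signal(SIGSEGV, segv_handler);
    int* p = nullptr;
    *p = 42;              // deliberate segfault to exercise the handler
}

Compiling with -g -rdynamic lets backtrace_symbols_fd print function names rather than bare addresses.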
A simpler approach is not to do this in a signal handler at all, but instead to configure Linux to generate a core dump. You'll get much more information out of that.
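On many distributions core dumps are disabled by default. Enabling them for the current shell looks like this (where the core file ends up depends on /proc/sys/kernel/core_pattern; e.g. systemd-coredump may collect it):

ulimit -c unlimited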
... core dump, and print its callstack?
Certainly. You can use a debugger:
gdb /path/to/executable /path/to/corefile
(gdb) bt
and it is not practical to step through it with gdb line by line.
Then run it in gdb but not line by line. Simply run the program within gdb until it segfaults, then print the backtrace.
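Concretely (the program name is just a placeholder):

gdb ./a.out
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt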
So I am trying to use GDB. I compile my code with -g, then gdb ./a.out
GNU gdb starts, but after I type r to start, the program runs like it normally would if I just called ./a.out.
Do you know what could cause this? I don't know much about gdb and use it only lightly; I've been using it the same way for a while and never encountered this type of behavior.
Edit: It works when I set up breakpoints. But I am still confused as to why I was able to use it for months without setting any breakpoints before.
Do you know what could cause this?
This is intended behavior. The run command starts the execution of the inferior (being debugged) program.
That program may encounter a bug (e.g. a crash), in which case GDB will be notified and would stop the execution of the inferior, and let you look around.
The program may also encounter a breakpoint that you have inserted earlier, again allowing you to look around at the current state.
Or the program may run to completion (if it doesn't execute any code in which you have set breakpoints, or if you didn't set any, and if it doesn't have any bugs that manifest in a fatal signal). If that happens, you'll get a 'program exited normally' message.
I am still confused as to why I was able to use it for months without setting any breakpoints before.
Your program was probably crashing, and now doesn't.
By control transfer I mean: after the tracee executes a function and returns, which signal is generated so that GDB can wait*() on it and seize control again? It is not SIGTRAP, though many people claim that ...
after the tracee executes a function and returns, which signal is generated so that GDB can wait*() on it and seize control again?
The tracee is stopped, and control is transferred back to GDB, only when one of the "interesting" events happens.
The interesting events are:
A breakpoint fires,
The tracee encounters a signal (e.g. SIGSEGV or SIGFPE as a result of performing invalid memory access or invalid floating-point operation),
The tracee disappears altogether (such as getting SIGKILLed by an outside program),
[There might be other "interesting" events, but I can't think of anything else right now.]
Now, a technically correct answer to "what signal does GDB use ..." is: none at all. Control isn't transferred unless one of the above events happens.
Perhaps your question is: how does control get back to GDB after executing something like finish command (which steps out of the current function)?
The answer to that is: GDB sets a temporary breakpoint on the instruction immediately after the CALL instruction that got us into the current function.
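You can reproduce that by hand: select the caller's frame, set a one-shot breakpoint there (tbreak with no argument breaks at the next instruction to be executed in the selected frame, i.e. the return address), and continue:

(gdb) frame 1
(gdb) tbreak
(gdb) continue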
Finally, what causes the kernel to stop the tracee and make waitpid in GDB return upon execution of the breakpoint instruction?
On x86, GDB uses the INT3 (opcode 0xCC) instruction to set breakpoints (there is an alternate mechanism using debug registers, but it is limited to 4 simultaneous breakpoints, and usually reserved for hardware watchpoints instead). When the tracee executes INT3 instruction, SIGTRAP is indeed the signal that the kernel generates (i.e. other answers you've found are correct).
Without knowing what led you to believe it isn't SIGTRAP, it's hard to guess how you convinced yourself that it isn't.
Update:
I tried to manually send a SIGTRAP signal to the tracee, trying to cause a spurious wake-up of GDB, but failed.
Fail in what way?
What I expect you observed is that GDB stopped with Program received signal SIGTRAP .... That's because GDB knows where it has placed breakpoints.
When GDB receives a SIGTRAP and the tracee's instruction pointer matches one of its breakpoints, GDB "knows" that it's the breakpoint that has fired, and acts accordingly.
But when GDB receives SIGTRAP and the tracee IP doesn't match any of the breakpoints, then GDB treats it as any other signal: prints a message and waits for you to tell it what to do next.
"GDB sets a temporary breakpoint ... that means GDB has to modify tracee's code area, which may be read-only. So, how does GDB cope with that?
You are correct: GDB needs to modify the (typically non-writable) .text section to insert any breakpoint using the INT3 method. Fortunately, that is one of the "superpowers" granted to it by the kernel via ptrace(PTRACE_POKETEXT, ...).
P.S. It's a fun exercise to write a program that checksums the code bytes of one of its own functions. You can then perform the checksum before and after placing a breakpoint on the "to be checksummed" function, and observe that the checksum differs when a breakpoint is present.
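A minimal version of that exercise might look like this (the function and the 16-byte count are arbitrary, and casting a function pointer to a data pointer is not standard C++, though it works on Linux):

#include <cstdio>
#include <cstdint>
#include <cstddef>

// The function whose code bytes we checksum; noinline keeps it a distinct symbol.
__attribute__((noinline)) int target(int x) { return x * 2 + 1; }

int main() {
    const unsigned char* code = reinterpret_cast<const unsigned char*>(&target);
    uint32_t sum = 0;
    for (size_t i = 0; i < 16; ++i)   // 16 bytes covers this tiny function
        sum += code[i];
    printf("checksum of target's first 16 bytes: %u\n", sum);
    printf("target(5) = %d\n", target(5));
    return 0;
}

Run it plain and note the checksum; then run it under gdb with break target set, and the printed value changes, because one byte of target is 0xCC for as long as the breakpoint is inserted.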
P.P.S. If you are curious about what GDB is doing, setting maintenance debug inferior will provide a lot of clues.
I have a program that depends on an external shared library, but after a function inside the library gets executed I lose the ability to use breakpoints.
I can break and step like normal up until this function executes, but after it finishes gdb never breaks again. It won't even break on main if I use start for a second execution of the program. It's not an inlined-function problem, because I've broken on these functions before, and when I comment out this particular function everything starts to work again.
Has anyone ever encountered anything like this before? What can I do?
Using gdb 7.1 with gcc 3.2.3
Edit:
After some hints from users I figured out that the process is forking inside the library call. I'm not sure what it's doing (and I really don't care). Can I somehow compensate for this? I've been experimenting with set follow-fork-mode child, but I'm really confused about what happens once it forks, and I can't seem to figure out how to continue execution or do anything of use.
Edit:
Further investigation: as near as I can tell, gdb is losing all its symbol information somewhere. After the second run, all symbols resolve to their @plt addresses and not to the actual addresses they resolved to on the first run. It's as if the second loading of the process loses all the info it gained the first time around and refuses to reload it. I'm so confused!
Edit:
So I traced down the problem to the vfork of a popen call. Apparently gdb doesn't play nice with popen? As soon as I detach from the popen'd vforked process, I lose all my symbols. I've read some reports online about this as well. Is there any hope?
It's the combination of vfork(2) and exec(2) that's messing things up. Quoting from gdb manual (debugging forks):
On some systems, when a child process is spawned by vfork, you cannot debug the child or parent until an exec call completes.
...
By default, after an exec call executes, gdb discards the symbols of the previous executable image. You can change this behaviour with the set follow-exec-mode command.
Keep follow-fork-mode set to parent and set follow-exec-mode to same.
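In a gdb session:

(gdb) set follow-fork-mode parent
(gdb) set follow-exec-mode same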
Alternatively, if your installation supports multi-process debugging (it should, since your gdb version is 7.1), try using info inferiors to find your original process and switching to it with inferior <num>.
I want to be able to detect when a write to a memory address occurs, for example by setting a callback attached to an interrupt. Does anyone know how?
I'd like to be able to do this at runtime (possibly gdb has this feature, but my particular application causes gdb to crash).
If you want to intercept writes to a range of addresses, you can use mprotect() to mark the memory in question as non-writeable, and install a signal handler using sigaction() to catch the resulting SIGSEGV, do your logging or whatever and mark the page as writeable again.
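A rough sketch of that approach (error checks trimmed; the names such as watched_page are mine, not from any library):

#include <signal.h>
#include <cstdio>
#include <sys/mman.h>
#include <unistd.h>

static char* watched_page;   // page we want write notifications for
static long  page_size;

// SA_SIGINFO handler: info->si_addr is the faulting address.
static void on_segv(int, siginfo_t* info, void*) {
    fprintf(stderr, "write to %p\n", info->si_addr);  // not async-signal-safe,
                                                      // acceptable as a debug hack
    // Re-enable writes so the faulting store can be re-executed.
    mprotect(watched_page, page_size, PROT_READ | PROT_WRITE);
    // To keep watching you would single-step and re-protect here;
    // this sketch stops watching after the first hit.
}

int main() {
    page_size = sysconf(_SC_PAGESIZE);
    // mmap returns page-aligned memory, which mprotect requires.
    watched_page = static_cast<char*>(mmap(nullptr, page_size,
        PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));

    struct sigaction sa {};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_segv;
    sigaction(SIGSEGV, &sa, nullptr);

    mprotect(watched_page, page_size, PROT_READ);  // arm the "watchpoint"
    watched_page[0] = 1;   // faults; the handler logs it and unprotects
    printf("value after write: %d\n", watched_page[0]);
}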
What you need is access to the X86 debug registers: http://en.wikipedia.org/wiki/Debug_register
You'll need to set the breakpoint address in one of DR0 to DR3, and then set the condition (data write) in DR7. The interrupt will occur, and you can run your debug code to read DR6 and find out which breakpoint fired.
If GDB doesn't work, you might try a simpler/smaller debugger such as http://sourceforge.net/projects/minibug/ - if that isn't working, you can at least go through the code and understand how to use the debugging hardware on the processor yourself.
Also, there's a great IBM developer resource on mastering linux debugging techniques which should provide some additional options:
http://www.ibm.com/developerworks/linux/library/l-debug/
A reasonably good article on doing this on Windows is here (I know you're running on Linux, but others might come along to this question wanting to do it on Windows):
http://www.codeproject.com/KB/debug/hardwarebreakpoint.aspx
-Adam
GDB does have that feature: it is called hardware watchpoints, and it is very well supported on Linux/x86:
(gdb) watch *(int *)0x12345678
If your application crashes GDB, build current GDB from CVS Head.
If that GDB still fails, file a GDB bug.
Chances are we can fix GDB faster than you can hack around SIGSEGV handler (provided a good test case), and fixes to GDB help you with future problems as well.
mprotect does have a disadvantage: the memory you protect must be page-aligned, and protection applies to whole pages. I had my problematic memory on the stack and was not able to use mprotect().
As Adam said, what you want is to manipulate the debug registers. On Windows, I used this: http://www.morearty.com/code/breakpoint/ and it worked great. I also ported it to Mach-O (Mac OS X), and it worked great too. It was also easy, because Mach-O has thread_set_state(), which is equivalent to SetThreadContext().
The problem with Linux is that it doesn't have such equivalents. I found ptrace, but I thought, this can't be it, there must be something simpler. But there isn't. Yet. I think they are working on a hw_breakpoint API for both kernel and user space (see http://lwn.net/Articles/317153/).
But when I found this: http://blogs.oracle.com/nike/entry/memory_debugger_for_linux I gave it a try, and it wasn't that bad. The ptrace method works by having some "outside process" act as a "debugger": it attaches to your program, injects new values for the debug registers, and terminates, with your program continuing with a new hardware breakpoint set. The thing is, you can create this "outside process" yourself by using fork() (I had no success with a pthread) and doing these simple steps inline in your code.
The addwatchpoint code must be adapted to work with 64-bit Linux, but that's just changing USER_DR7 etc. to offsetof(struct user, u_debugreg[7]). Another thing is that after PTRACE_ATTACH you have to wait for the debuggee to actually stop; instead of retrying POKEUSER in a busy loop, the correct thing to do is a waitpid() on your pid.
The only catch with the ptrace method is that your program can have only one "debugger" attached at a time. So a ptrace attach will fail if your program is already running under gdb control. But just like the example code does, you can register a signal handler for SIGTRAP, run without gdb, and when you catch the signal, enter a busy loop waiting for gdb to attach. From there you can see who tried to write your memory.
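Here is a condensed sketch of that fork-based trick for 64-bit x86 Linux. set_hw_watchpoint is my own name, not from the linked code; error handling is minimal; as noted above it fails if gdb is already attached, and on kernels with Yama ptrace hardening (kernel.yama.ptrace_scope=1) the attach may additionally require a prctl(PR_SET_PTRACER, ...) call in the parent:

#include <csignal>
#include <cstddef>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

// Fork a short-lived "debugger" child that attaches to us and programs a
// hardware write-watchpoint on 'addr' through the parent's debug registers.
static int set_hw_watchpoint(void* addr) {
    pid_t parent = getpid();
    pid_t child = fork();
    if (child < 0) return -1;
    if (child > 0) {               // parent: just reap the helper
        int status;
        waitpid(child, &status, 0);
        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
    }
    // Child: attach, then wait for the parent to enter ptrace-stop
    // (the waitpid() the text above recommends instead of a busy loop).
    if (ptrace(PTRACE_ATTACH, parent, nullptr, nullptr)) _exit(1);
    waitpid(parent, nullptr, 0);
    // DR0 = address to watch.
    if (ptrace(PTRACE_POKEUSER, parent,
               offsetof(struct user, u_debugreg[0]), addr)) _exit(1);
    // DR7: L0 enable (bit 0), R/W0 = 01 "break on write" (bits 16-17),
    // LEN0 = 11 "4 bytes" (bits 18-19).
    unsigned long dr7 = 1UL | (1UL << 16) | (3UL << 18);
    if (ptrace(PTRACE_POKEUSER, parent,
               offsetof(struct user, u_debugreg[7]), dr7)) _exit(1);
    ptrace(PTRACE_DETACH, parent, nullptr, nullptr);
    _exit(0);
}

static volatile int watched;   // variable we want write notifications for

static void on_trap(int) {
    // write() is async-signal-safe, unlike printf().
    const char msg[] = "watched variable was written\n";
    write(STDERR_FILENO, msg, sizeof msg - 1);
}

int main() {
    std::signal(SIGTRAP, on_trap);   // catch the trap ourselves, as described above
    if (set_hw_watchpoint(const_cast<int*>(&watched)) != 0)
        return 1;
    watched = 42;                    // this store fires the hardware watchpoint
    return 0;
}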