My code on a cluster running PBS fails with a "floating point exception". That means, when I submit the code via qsub, it returns a fail status and aborts. I would like to attach GDB to the program and configure it in a way that when it hits an exception, prints the stack trace before abortion. However, I haven't found out how to do that.
Any thoughts on that?
Related
I have a double-free bug. I am able to reproduce it using a debug build with Address Sanitizer (AS) detects but when I run under GDB, AS kills the GDB session.
I found this Address Sanitizer page with instructions how to keep GDB:
https://github.com/google/sanitizers/wiki/AddressSanitizerAndDebugger
but when I do:
(gdb) break __asan::ReportGenericError
at the beginning of the session, the GDB state still disappears after the problem is detected:
(gdb) bt
No stack.
the GDB state still disappears after the problem is detected
There are several possible reasons for this:
Somehow you didn't set the breakpoint correctly
It's actually a child process that is dying
Somehow the thread in which the error is detected is not attached by GDB.
To eliminate 1, use catch syscall exit_group (and possibly also catch syscall exit) -- this way GDB is sure to stop before the process disappears.
For 2, AddressSanitizer message should indicate the thread id in which the error is detected, and that id should match one of the threads in GDB info thread output.
For 3, we'd need to understand more about how that thread was created.
My C++ application is giving me some strange output but it runs to completion, I'd like to examine the stack trace but since it isn't segfaulting it is harder to pinpoint where it is. I've tried setting break points on exit, _exit, and abort, but when I call the stack I get something like this
#0 0x00002aaaab1a7620 in exit () from /lib64/libc.so.6
#1 0x000000000041f19e in main ()
This is probably because my application has a perl front end wrapped with sig, is there another way to generate a stack upon completion?
That's what the stack trace should look like. main calls exit at the end and you print the stack trace inside exit. You cannot find where something happens with a stack trace. Once you find where something happens, you can get the stack trace there and find out how the execution got there.
Whatever output you're looking for happened before exit and that function has already returned by the time you get the stack trace. So, you need to put your break point before the output happens. Then you can go over the code line by line to find out which line does the output.
did you build it in debug?
add a breakpoint with "b function_name"
press c to continue
type bt to print the backtrace
I will try to be as specific as I can, but so far I have worded this problem so poorly that Google failed to return any useful results (hence my question here).
I am attaching gdb to a multi-threaded c++ server process. All I can say is that strange things have been happening while trying to do the usual set-breakpoint-break-investigate.
First, while waiting for the breakpoint to be hit (in 'Continuing' mode), I suddenly got back the (gdb) prompt with the message:
Continuing.
[Thread 0x54d5b940 (LWP 28503) exited]
[New Thread 0x54d5b940 (LWP 28726)]
Cannot get thread event message: debugger service failed
Second, also while waiting for the breakpoint to be hit, I'm suddenly told the program has received SIGSEGV and - back to the (gdb) prompt - backtrace tells me the segfault happened in pthread_cancel(). Note the process under investigation does not normally segfault.
I clearly lack enough information about how gdb works to even begin guessing what is happening. Am I doing anything wrong? The steps I take are the same each time:
gdb attach
break 'MyFunction()'
continue
Thoughts? Thanks.
I fought with similar gdb issues for a while. My case was having lots of threads spawned that executed few functions and then exited.
It appears if a thread exits too fast and there's lots of these happening sometimes gdb cannot keep up and when it fails, it fails with style as in crashes :) I think it tries to attach to a thread that is already done as per the error message.
I see this as an issue in gdb 6.5 to 7.6 and still happening. Did not try with older versions.
My advice is look for this use case or similar. Once I changed my design to have a thread serving a queue of requests gdb works flawlessly.
Design wise is healthier to have already created threads that digest actions than always spawning new threads.
Still same code debugs without a problem on Visual Studio so I do have to say that is a small disappointment to me with regards to gdb.
I use Eclipse and looking at the GDB traces (usually enabled by default) will give you a better hint of where GDB fails. One of the buttons on the console shows you the GDB trace.
I am trying to catch floating point exception (SIGFPE) in GDB, not pass it to the process and continue debugging onwards.
I have given gdb this:
handle SIGFPE stop nopass
When a SIGFPE occurs GDB stops at the correct place. The problem is I can't and don't know how can I continue debugging.
I have tried giving GDB
continue
or
signal 0
but it still hangs on the offending line and refuses to continue.
Is there a way to continue debugging after receiving a signal?
I am using GDB 7.5.1, which I have compiled myself and I have also tried with GDB 7.4, which comes with my 12.04 Ubuntu distribution. Both have the same behaviour.
The problem is that when you continue a program after a synchronous signal, it reexecutes the same instruction that caused the signal, which means you'll just get the signal again. If you tell it to ignore the signal (either directly or via gdb) it will go into a tight loop reexecuting that instruction repeatedly.
If you want to actually continue the program somewhere after the instruction that causes the signal, you need to manually set the $pc register to the next (or some other) instruction before issuing the continue command.
I'm trying to debug proftpd in order to understand better this exploit http://www.phrack.org/issues.html?issue=67&id=7. The vulnerable section is in mod_sql.c, I have tried to breakpoint the sql_prepare_where function (that is where the heap overflow is done) and then call the USER ... and PASS ... command but it is never triggered.
To find out why I have breakpoints all the hundreds line of mod_sql.c and then launch the program (with full debugging option), some breakpoints are triggered (sql_setuserinfo, set_sqlauthenticate, get_auth_entry...) but only at the very beginning of the launching process, then when the program goes in it main loop nothing else breakpoint related happens (while the log of proftpd mentions that the USER and PASS commands are dispatched to mod_sql.c)..
Would anyone know what I'm missing?
[ It's possible I am missing something essential of GDB, I am learning on the roll :) ]
Server programs often use a "separate program for each connection" method, where after successful accept, the parent forks a child to handle current connection, and goes back to accepting more connections.
I don't know for sure, but if proftpd used that model, it would explain exactly the symptoms you've described.
You can ask GDB to debug the child instead of the parent, by using (gdb) set follow-fork-mode child.