How does gdb attach to a multi-threaded process? - c++

I will try to be as specific as I can, but so far I have worded this problem so poorly that Google failed to return any useful results (hence my question here).
I am attaching gdb to a multi-threaded c++ server process. All I can say is that strange things have been happening while trying to do the usual set-breakpoint-break-investigate.
First, while waiting for the breakpoint to be hit (in 'Continuing' mode), I suddenly got back the (gdb) prompt with the message:
Continuing.
[Thread 0x54d5b940 (LWP 28503) exited]
[New Thread 0x54d5b940 (LWP 28726)]
Cannot get thread event message: debugger service failed
Second, also while waiting for the breakpoint to be hit, I'm suddenly told the program has received SIGSEGV and - back to the (gdb) prompt - backtrace tells me the segfault happened in pthread_cancel(). Note the process under investigation does not normally segfault.
I clearly lack enough information about how gdb works to even begin guessing what is happening. Am I doing anything wrong? The steps I take are the same each time:
gdb attach
break 'MyFunction()'
continue
Thoughts? Thanks.

I fought with similar gdb issues for a while. My case involved lots of spawned threads that executed a few functions and then exited.
It appears that if a thread exits too quickly, and a lot of that is happening, gdb sometimes cannot keep up, and when it fails, it fails with style, as in it crashes :) Judging by the error message, I think it tries to attach to a thread that is already gone.
I have seen this issue from gdb 6.5 through 7.6 and it is still happening; I did not try older versions.
My advice is to look for this use case or something similar. Once I changed my design to have a single thread serving a queue of requests, gdb worked flawlessly.
Design-wise it is healthier to have already-created threads that digest work items than to keep spawning new threads.
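A minimal sketch of that design, assuming C++11 and using names of my own choosing (it is not the original code): one long-lived worker thread drains a queue of tasks instead of a new thread being spawned per request, so gdb only has to track a small, stable set of threads.

#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

class Worker {
public:
    Worker() : done_(false), thread_([this] { run(); }) {}

    ~Worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    // Enqueue a request; the single worker thread will pick it up.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty())
                    return;                        // drained and shutting down
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                                // the thread stays alive between requests
        }
    }

    bool done_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::thread thread_;
};

int main() {
    Worker worker;
    for (int i = 0; i < 5; ++i)
        worker.post([i] { std::cout << "request " << i << " handled\n"; });
}   // the destructor drains the queue and joins the worker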
That said, the same code debugs without a problem in Visual Studio, so I have to admit that is a small disappointment to me with regard to gdb.
I use Eclipse and looking at the GDB traces (usually enabled by default) will give you a better hint of where GDB fails. One of the buttons on the console shows you the GDB trace.

Related

Attaching gdb interrupts and won't continue the process

I've got a big real-time project to deal with (multiple processes, IPC, multi-everything in short).
The process I'm working on is started as a service on Linux. I have root access.
Here is the problem:
I'm trying to attach to a running process, and I have also tried starting it through/with gdb, but the result is the same: the executable stops as soon as I "touch" it with gdb, or sometimes it throws:
Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f9fe869f700 (LWP 2638)]
of course from there nothing can be done.
Tried:
handle all nostop
attaching whether it is launched as a service (daemon) or as a regular process
starting it from gdb
thinking it might be a forking/multi-threading problem, I put a 10-second sleep at the very beginning, attached to it during the sleep, and hit "continue"
Guys, all I want is to debug, hit the breakpoints, etc.
Please help! Share ideas.
Edit - the actual commands:
1) gdb attach myProcId. Then, after reading symbols, I hit "c", which results in:
Program received signal SIGUSR1, User defined signal 1.
[Switching to Thread 0x7f9fe869f700 (LWP 2638)]
0x00007f9fec09bf73 in select () from /lib64/libc.so.6
2) If I make the first line of the code a 10-second sleep, attach to the process during it, and hit "c", the result: it runs, shows info threads and a backtrace of main, but never hits the breakpoint (the code definitely runs there - I get logs and different behaviour if I change the code there), meaning the process is stuck.
3) All other combinations, like gdb path/to/my/proc args list, then start, where the arg list played with the different related options gdb gives us.
Maybe worth mentioning: the process is driven by network packets and also by timers.
But for me the important thing is the current snapshot at the break; I don't care what happens to the system after the timers expire.
Since you mentioned that you are debugging a multiprocess program, I think the underlying problem you have is setting the breakpoint in the correct subprocess.
Try break fork and set follow-fork-mode child/parent. What you want to achieve is to have gdb attached to the process that is running the code you want to debug.
Refer to this link.
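For instance, something along these lines (a sketch only; MyFunction is a placeholder and the exact behaviour depends on your gdb version). follow-fork-mode child makes gdb stay with the child after a fork, and detach-on-fork off keeps both processes under gdb so you can check them with info inferiors:
(gdb) set follow-fork-mode child
(gdb) set detach-on-fork off
(gdb) break fork
(gdb) break MyFunction
(gdb) run
(gdb) info inferiors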
Another thought is to generate a crash, since you can compile the program. For example, add an int i = *(int*)NULL; and that will generate a core dump. You can then debug the core dump with gdb <program> <core dump>. You can refer to this page for how to configure core dumps.
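A sketch of that idea (the helper name is mine; std::abort() is just a tidier way to force the core than the null dereference, and core dumps must be enabled, e.g. with ulimit -c unlimited):

#include <cstdlib>

// Call this at the point where you want a post-mortem snapshot.
// The resulting core file can be inspected later with: gdb ./myProgram core
void dump_core_here()
{
    std::abort();   // raises SIGABRT, which produces a core dump
}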

How can I get GDB to stop tracing a detached process?

I'm debugging a C++ application which creates trees of forks. Using GDB defaults, the child processes will be detached on the fork and as a result I see only one inferior shown afterwards.
I tried to attach to one of the child processes and despite it not being listed as an inferior for the other GDB process, in the new GDB session I get an error that the process is already being traced (by the first GDB session).
Is this expected behavior? What steps can I take to debug the forked process in a separate GDB session? What steps can I take to debug the problem further?
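The defaults being described here can be inspected and changed from the gdb prompt (a sketch; the exact output wording varies by gdb version). detach-on-fork is "on" by default, which is why the children disappear; turning it off keeps them as additional inferiors, listed by info inferiors:
(gdb) show detach-on-fork
(gdb) set detach-on-fork off
(gdb) set follow-fork-mode parent
(gdb) info inferiors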

How to debug a rare deadlock?

I'm trying to debug a custom thread pool implementation that deadlocks only rarely. So I cannot simply use a debugger like gdb, because I would have to click "launch" in the debugger about 100 times before getting a deadlock.
Currently, I'm running the thread pool test in an infinite loop in a shell script, but that means I cannot see variables and so on. I'm trying to std::cout data, but that slows the threads down and reduces the chance of a deadlock, meaning I can wait about an hour with my infinite loop before getting messages. Then I don't get the error, and I need more messages, which means waiting another hour...
How can I efficiently debug the program so that it restarts over and over until it deadlocks? (Or maybe should I open another question with all the code for some help?)
Thank you in advance!
Bonus question: how can I check that everything is going fine with a std::condition_variable? You cannot really tell which threads are asleep or whether a race condition occurs on the wait condition.
There are 2 basic ways:
Automate running the program under the debugger. Using gdb program -ex 'run <args>' -ex 'quit' should run the program under the debugger and then quit. If the program is still alive in one form or another (a segfault, or you interrupted it manually) you will be asked for confirmation.
Attach the debugger after reproducing the deadlock. For example, gdb can be run as gdb <program> <pid> to attach to the running program - just wait for the deadlock and attach then. This is especially useful when the attached debugger changes the timing and you can no longer reproduce the bug.
This way you can just run it in a loop and wait for the result while you drink coffee. BTW - I find the second option easier.
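A sketch of the second option in shell (the test binary name is a placeholder): keep restarting the test, and once a run stops making progress, attach gdb to it from another terminal.

while ./threadpool_test; do
    echo "finished cleanly, restarting"
done
# once a run hangs, from another terminal:
#   gdb ./threadpool_test $(pgrep -n threadpool_test)
#   (gdb) thread apply all bt     # shows where every thread is stuck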
If this is some kind of homework - restarting again and again with more debug will be a reasonable approach.
If somebody pays money for every hour you wait, they might prefer to invest in software that supports replay-based debugging, that is, software that records everything a program does, every instruction, and allows you to replay it again and again, debugging back and forth. Instead of adding more debug output, you record a session during which a deadlock happens, and then start debugging just before the deadlock occurred. You can step back and forth as often as you want, until you finally find the culprit.
The software mentioned in the link actually supports Linux and multithreading.
Mozilla rr open source replay based debugging
https://github.com/mozilla/rr
Hans mentioned replay based debugging, but there is a specific open source implementation that is worth mentioning: Mozilla rr.
First you do a record run, and then you can replay the exact same run as many times as you want, and observe it in GDB, and it preserves everything, including input / output and thread ordering.
The official website mentions:
rr's original motivation was to make debugging of intermittent failures easier
Furthermore, rr enables GDB reverse debugging commands such as reverse-next to go to the previous line, which makes it much easier to find the root cause of the problem.
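The typical workflow is short (a sketch; the test binary name is a placeholder):

rr record ./threadpool_test      # run once, recording the failing execution
rr replay                        # replays exactly the same run under gdb
# inside the replay session:
#   (gdb) continue
#   (gdb) reverse-next            # step backwards through the recorded run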
Here is a minimal example of rr in action: How to go to the previous line in GDB?
You can run your test case under GDB in a loop using the command shown in https://stackoverflow.com/a/8657833/341065: gdb --eval-command=run --eval-command=quit --args ./a.out.
I have used this myself: (while gdb --eval-command=run --eval-command=quit --args ./thread_testU ; do echo . ; done).
Once it deadlocks and does not exit, you can just interrupt it by CTRL+C to enter into the debugger.
An easy, quick way to find deadlocks is to have some global variables that you modify where you want to debug, and then print them in a signal handler. You can use SIGINT (sent when you interrupt with Ctrl+C) or SIGTERM (sent when you kill the program):
#include <csignal>
#include <cstdlib>
#include <iostream>

int dbg;                        // global debug variable, updated by the threads

void dbg_sighandler(int)
{
    std::cout << dbg << std::endl;   // dump the debug state
    std::exit(EXIT_FAILURE);
}

int multithreaded_function()
{
    std::signal(SIGINT, dbg_sighandler);
    // ...
    dbg = someVar;              // record whatever state you want to inspect
    // ...
}
That way, you just see the state of all your debug variables when you interrupt the program with Ctrl+C.
In addition you can run it in a shell while loop:
$> while [ $? -eq 0 ]
do
./my_program
done
which will run your program forever until it fails ($? is the exit status of your program and you exit with EXIT_FAILURE in your signal handler).
It worked well for me, especially for finding out how many threads passed before and after which locks.
It is quite rustic, but you do not need any extra tool and it is fast to implement.

What is gdb/dbx doing when ddd is "waiting for it to get ready"?

I use ddd as a front-end for both gdb and dbx for C++ programs.
Quite often, without any apparent cause, I will try to next and it will hang with the message "Waiting for gdb to get ready" or "Waiting for dbx to get ready".
Does anybody know what it is that they're doing that takes forever and produces no apparent results? And can I stop it from happening?
Bear in mind that enough stuff has already been loaded that I have quite happily been stepping/nexting a minute earlier in the same process (and in the same function), so whatever they're doing doesn't seem to have been necessary for that. Also the fact that both ddd and dbx have the same pattern of behaviour (in many different executables and on different platforms) makes me think it's something in the data rather than a bug in either debugger.
GDB (and the same applies to DBX) communicates with DDD via the MI protocol, which is a standardized and unambiguous equivalent of the command-line interface.
Remark: the default on my system (Fedora 15) seems to be that they communicate directly using the CLI, but I only noticed the problem you describe with --interpreter=mi.
For instance, here are the respective outputs for getting the thread list:
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ffff7fd2700 (LWP 9191) "philosophers" 0x00000037dcc0b4c5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
1 Thread 0x7ffff7fd3720 (LWP 9182) "philosophers" 0x00000037dc8df461 in clone () from /lib64/libc.so.6
(gdb) -thread-info
^done,threads=[
{id="2",target-id="Thread 0x7ffff7fd2700 (LWP 9525)",name="philosophers",
frame={level="0",addr="0x0000000000400b31",
func="chopsticks_put",
args=[{name="i",value="0"}],
file="chopsticks.c",fullname="philosphers/chopsticks.c",line="70"},
state="stopped",core="2"},
{id="1",target-id="Thread 0x7ffff7fd3720 (LWP 9522)",name="philosophers",
frame={...},
state="stopped",core="1"
}],current-thread-id="3"
So what you will see in DDD is quite similar to what is available in the CLI; only the 'presentation layer' is different.
From my experience, most GDB commands are very fast, at least when they don't depend on the debuggee's execution (like a next over a sleep(5)). So there are two possibilities for your problem:
a bug in the communication: for instance a ^done tag is missed by DDD or forgotten by GDB, so DDD waits in vain for the termination of its request
DDD asks GDB for a lot of data, like the definition of structures, function locations or memory contents, etc. (for instance because of the elements you want to watch), so it will take some time for the information to be computed by GDB and transferred to DDD.
At the bottom of DDD you have the GDB console. Try typing some GDB commands in there. If GDB responds correctly (my case), it means that DDD is no longer synchronized with GDB. (DDD is getting old - its last release dates from 2009-02-11 - while MI is extensively used by Eclipse, so I think we know who is to be blamed...!)

Help with a cryptic error message with KGDB - Bogus trace status reply from target: E22

I'm using gdb to connect to a 2.6.31.13 linux kernel patched with KGDB over Ethernet, and when I try to detach the debugger I get this:
(gdb) quit
A debugging session is active.
Inferior 1 [Remote target] will be killed.
Quit anyway? (y or n) y
Bogus trace status reply from target: E22
after that the session is still open, I can keep going on and on with ctrl+d, and the debugger doesn't exit.
I've searched for that message in google and there are just 5 results (and none of them are useful :-/ ).
Any idea of what could it be and how to fix it?
If you cleared all breakpoints on the target and "C"ontinued from the latest breakpoint (assuming the target code didn't crash, etc.), I think you'll be safe: when running, kgdb won't be talking to your gdb anyway, since, if I recall correctly, it only handles the link when stopped (at a breakpoint or exception), waiting for commands.
A few Ctrl-C in a fast sequence if needed to get control back in gdb, then "q", and that's it.
That's at least my experience when debugging ko's...
I suspect gdb is saying this because it doesn't realize that it is talking to kgdb rather than to a remote gdb server. I don't imagine kgdb agreeing to kill a kernel thread just because the debugger exited, anyway!
Hmmm, afterthought:
You're talking about kgdb 'lite', the one now part of the kernel tree, are you? Because that's the only one I have experience with...
PS on June, 3:
I had never seen the exact message you mentioned until I moved to the 2.6.32 kernel (as part of migrating my dev and target machines to Lucid). And then, surprise, I ran into it too. Here, it seems to happen in hopeless situations (like a segfault, or kgdb seemingly running away after missing a breakpoint or single step). The only cure I have found so far is to pkill ddd (gdb) on the dev machine and reboot the target. But the good news is that kgdb in 2.6.32 seems to be quite a bit more stable than the one in Karmic (2.6.31).
Ctrl+Z should help you quit.