Why is GDB not able to see my local variable i ?
I am debugging a multi-process application running on an embedded linux device.
I am having a hard time attaching to a child process (note that I cannot start the child process by itself, it needs the parent to run it. )
Using set follow-fork-mode child did not work (gdb hangs while loading dynamic libraries).
My strategy is to have this in my code to stop the child
#ifdef DEBUG
int i = 0;
while (i == 0)
{
usleep(100000); // sleep for 0.1 seconds
}
#endif // DEBUG
To have time to attach to the child using gdbserver --attach :1234 <child-pid> and then, after setting a break point where I want (line 200 in main.cpp), to issue the following command in gdb : set var i = 1 to break out of the while loop.
This isn't working and I am wondering why. Here is the output from gdb:
(gdb) bt
#0 0x00007ff3feab7ff0 in nanosleep () from target:/lib/libc.so.6
#1 0x00007ff3feae1c14 in usleep () from target:/lib/libc.so.6
#2 0x000055cc4e56908a in main (argc=<optimized out>, argv=<optimized out>)
at platform/aas/intc/ucmclient/src/main.cpp:195
(gdb) break main.cpp:200
Breakpoint 1 at 0x55cc4e56908c: file platform/aas/intc/ucmclient/src/main.cpp, line 200.
(gdb) set var i=1
No symbol "i" in current context.
The backtrace seems to show I am in the while loop. Why then is the variable i not in the locals?
Related
My app is randomly (once a day) crashed and I have tried several ways to find out the reason but no luck.
With other core dump or segmentation fault cases, I can locate where does it happen by gdb, but for this case, gdb don't give me too much hint.
I need some advice for my continuous debugging, please help.
GDB output when my app crashed
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/greystone/myapp/myapp'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0x00007f5d3a435afb in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
[Current thread is 1 (Thread 0x7f5cea3d4700 (LWP 14353))]
(gdb) bt full
#0 0x00007f5d3a435afb in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#1 0x00007f5d3a435c6f in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f5d3a472742 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#3 0x00007f5d3a42cab3 in g_main_context_new () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#4 0x00007f5d3f4894c9 in QEventDispatcherGlibPrivate::QEventDispatcherGlibPrivate(_GMainContext*) () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#5 0x00007f5d3f4895a1 in QEventDispatcherGlib::QEventDispatcherGlib(QObject*) () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#6 0x00007f5d3f266870 in ?? () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#7 0x00007f5d3f267758 in ?? () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#8 0x00007f5d3efa76ba in start_thread (arg=0x7f5cea3d4700) at pthread_create.c:333
__res =
pd = 0x7f5cea3d4700
now =
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140037043603200, 4399946704104667801, 0, 140033278038543, 8388608, 140037073195984, -4344262468029171047, -4344357617020880231}, mask_was_saved = 0}},
priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call =
pagesize_m1 =
sp =
freesize =
__PRETTY_FUNCTION__ = "start_thread"
#9 0x00007f5d3e43c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
Solutions I have tried
Search topic related with SIGTRAP
People said it is in debug mode and there are somewhere in the code set break point. However, my app is compiled in release mode without break point.
Catch signal handler and ignore SIGTRAP
No success, I can only ignore SIGTRAP sent by "kill -5 pid". With the SIGTRAP occurs randomly in runtime, my app is still crashed
Fix memory leak in code
Initialize pointer with nullptr
Double check mysql C API race conditions
Double check delete array action and double check assign value for the index out of array boundaries
Check signals and slots
My app is built on Qt frameworks as a GUI application, there are many signals and slots I have checked but no ideas how are they related to SIGTRAP core dump.
Check exceptions for opencv
I use opencv for image processing tasks. I have checked for exception cases
Shared memory
Memory shared between main process and sub processes were carefully checked
Example code
A lot of code in my app, but because gdb don't give me exactly where does it happen, so I don't know which code I should share. If you need it for checking for suggestion, please tell me which part of the code you would like to check. My app have these following parts.
Mysql in C api mysql 5.7.29
User interface (alot) by Qt framework 5.9.2
Image processing with opencv 2.4.9
Process flow in multi threading by Qt framework 5.9.2
If there is any ideas, please share me some keywords then I could search about it and apply to my app. Thanks for your help.
for this case, gdb don't give me too much hint
GDB tells you exactly what happened, you just didn't understand it.
What's happening is that some code in libglib called g_logv(..., G_LOG_FLAG_FATAL, ...), which eventually calls _g_log_abort(), which executes int3 (debug breakpoint) instruction.
You should be able to (gdb) x/i 0x00007f5d3a435afb and see that instruction.
It looks like g_main_context_new() may have failed to allocate memory.
In any case, you should look in the application stderr logs for the reason libglib is terminating your program (effectively, libglib calls an equivalent of abort, because some precondition has failed).
I have a process which uses almost 16 threads. On of the thread id getting exited by itself after working for some time (please note process didn't crash). I am not sure how to look for reasons for thread exiting ? I tried using print statement but that doesn't seems to help. Tried capturing through gdb didn't help. If this is some sort of memory corruption then the process should crash and core file should have tell everything (most of the time) but process remains running just the thread exiting is making my job difficult..
Can you please suggest some way to debug this issue ?
-Arpit
I am not sure how to look for reasons for thread exiting
There are only two ways that I can think of for this to happen:
The thread function returns
Something the thread function calls executes syscall(SYS_exit, ...)
You can use (gdb) catch syscall exit and then where to find out where this is happening.
If where shows something similar to this:
(gdb) where
#0 0x00007ffff7bc3556 in __exit_thread () at ../sysdeps/unix/sysv/linux/exit-thread.h:36
#1 start_thread (arg=0x7ffff781c700) at pthread_create.c:478
#2 0x00007ffff7905a8f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
then you have case 1, and you need to step through your thread function to find out where it is returning (https://rr-project.org/ may help a lot).
If instead you see something like this:
(gdb) where
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x0000555555554746 in do_work () at t.c:9
#2 0x0000555555554762 in fn (p=0x0) at t.c:15
#3 0x00007ffff7bc3494 in start_thread (arg=0x7ffff781c700) at pthread_create.c:333
#4 0x00007ffff7905a8f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
then the culprit is whatever routine called syscall.
I have written a Qt5/C++ program which forks and runs in the background, and stops in response to a signal and shuts down normally. All sounds great, but when I "ps ax | grep myprog" I see a bunch of my programs still running; eg:
29244 ? Ss 149:47 /usr/local/myprog/myprog -q
30913 ? Ss 8:37 /usr/local/myprog/myprog -q
32484 ? Ss 0:11 /usr/local/myprog/myprog -q
If I run the program in the foreground then the process does NOT hang around on the process list - it dies off as expected. This only happens when in the background. Why?
Update: I found that my program is in futex_wait_queue_me state (queue_me and wait for wakeup, timeout, or signal). I do have 3 seperate threads - and that may be related. So I attached a debugger to one of the waiting processes and found this:
(gdb) bt
#0 0x000000372460b575 in pthread_cond_wait##GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f8990fb454b in QWaitCondition::wait(QMutex*, unsigned long) ()
from /opt/Qt/5.1.1/gcc_64/lib/libQt5Core.so.5
#2 0x00007f8990fb3b3e in QThread::wait(unsigned long) () from /opt/Qt/5.1.1/gcc_64/lib/libQt5Core.so.5
#3 0x00007f8990fb0402 in QThreadPoolPrivate::reset() () from /opt/Qt/5.1.1/gcc_64/lib/libQt5Core.so.5
#4 0x00007f8990fb0561 in QThreadPool::waitForDone(int) () from /opt/Qt/5.1.1/gcc_64/lib/libQt5Core.so.5
#5 0x00007f89911a4261 in QMetaObject::activate(QObject*, int, int, void**) ()
from /opt/Qt/5.1.1/gcc_64/lib/libQt5Core.so.5
#6 0x00007f89911a4d5f in QObject::destroyed(QObject*) () from /opt/Qt/5.1.1/gcc_64/lib/libQt5Core.so.5
#7 0x00007f89911aa3ee in QObject::~QObject() () from /opt/Qt/5.1.1/gcc_64/lib/libQt5Core.so.5
#8 0x0000000000409d8b in main (argc=1, argv=0x7fffba44c8f8) at ../../src/main.cpp:27
(gdb)
Update:
I commented out my 2 threads, so only the main thread runs now, and the problem is the same.
Is there a special way to cause a background process to exit? Why won't the main thread shutdown?
Update:
Solved - Qt does not like Fork. (See another StackExchane questoin). I had to move my fork to the highest level (before Qt does anything), and then Qt doesn't hang on exit.
http://man7.org/linux/man-pages/man1/ps.1.html#PROCESS_STATE%20CODES
PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:
D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent.
For BSD formats and when the stat keyword is used, additional characters may be displayed:
< high-priority (not nice to other users)
N low-priority (nice to other users)
L has pages locked into memory (for real-time and custom IO)
s is a session leader
l is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+ is in the foreground process group.
So your processes are all in "S: interruptible sleep." That is, they are all waiting for blocking syscalls.
You might have better hints on what your programs are waiting for from this command:
$ ps -o pid,stat,wchan `pidof zsh`
PID STAT WCHAN
4490 Ss rt_sigsuspend
4814 Ss rt_sigsuspend
4861 Ss rt_sigsuspend
4894 Ss+ n_tty_read
5744 Ss+ n_tty_read
...
"wchan (waiting channel)" shows a kernel function (=~ syscall) which is blocking.
See also
https://askubuntu.com/questions/19442/what-is-the-waiting-channel-of-a-process
I have a strange problem that I can't solve. Please help!
The program is a multithreaded c++ application that runs on ARM Linux machine. Recently I began testing it for the long runs and sometimes it crashes after 1-2 days like so:
*** glibc detected ** /root/client/my_program: free(): invalid pointer: 0x002a9408 ***
When I open core dump I see that the main thread it seems has a corrupt stack: all I can see is infinite abort() calls.
GNU gdb (GDB) 7.3
...
This GDB was configured as "--host=i686 --target=arm-linux".
[New LWP 706]
[New LWP 700]
[New LWP 702]
[New LWP 703]
[New LWP 704]
[New LWP 705]
Core was generated by `/root/client/my_program'.
Program terminated with signal 6, Aborted.
#0 0x001c44d4 in raise ()
(gdb) bt
#0 0x001c44d4 in raise ()
#1 0x001c47e0 in abort ()
#2 0x001c47e0 in abort ()
#3 0x001c47e0 in abort ()
#4 0x001c47e0 in abort ()
#5 0x001c47e0 in abort ()
#6 0x001c47e0 in abort ()
#7 0x001c47e0 in abort ()
#8 0x001c47e0 in abort ()
#9 0x001c47e0 in abort ()
#10 0x001c47e0 in abort ()
#11 0x001c47e0 in abort ()
And it goes on and on. I tried to get to the bottom of it by moving up the stack: frame 3000 or even more, but eventually core dump runs out of frames and I still can't see why this has happened.
When I examine the other threads everything seems normal there.
(gdb) info threads
Id Target Id Frame
6 LWP 705 0x00132f04 in nanosleep ()
5 LWP 704 0x001e7a70 in select ()
4 LWP 703 0x00132f04 in nanosleep ()
3 LWP 702 0x00132318 in sem_wait ()
2 LWP 700 0x00132f04 in nanosleep ()
* 1 LWP 706 0x001c44d4 in raise ()
(gdb) thread 5
[Switching to thread 5 (LWP 704)]
#0 0x001e7a70 in select ()
(gdb) bt
#0 0x001e7a70 in select ()
#1 0x00057ad4 in CSerialPort::read (this=0xbea7d98c, string_buffer=..., delimiter=..., timeout_ms=1000) at CSerialPort.cpp:202
#2 0x00070de4 in CScanner::readResponse (this=0xbea7d4cc, resp_recv=..., timeout=1000, delim=...) at PidScanner.cpp:657
#3 0x00071198 in CScanner::sendExpect (this=0xbea7d4cc, cmd=..., exp_str=..., rcv_str=..., timeout=1000) at PidScanner.cpp:604
#4 0x00071d48 in CScanner::pollPid (this=0xbea7d4cc, mode=1, pid=12, pid_str=...) at PidScanner.cpp:525
#5 0x00072ce0 in CScanner::poll1 (this=0xbea7d4cc)
#6 0x00074c78 in CScanner::Poll (this=0xbea7d4cc)
#7 0x00089edc in CThread5::Thread5Poll (this=0xbea7d360)
#8 0x0008c140 in CThread5::run (this=0xbea7d360)
#9 0x00088698 in CThread::threadFunc (p=0xbea7d360)
#10 0x0012e6a0 in start_thread ()
#11 0x001e90e8 in clone ()
#12 0x001e90e8 in clone ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(Classes and functions names are a bit wierd because I changed them -:)
So, thread #1 is where the stack is corrupt, backtrace of every other (2-6) shows
Backtrace stopped: previous frame identical to this frame (corrupt stack?).
It happends because threads 2-6 are created in the thread #1.
The thing is that I can't run the program in gdb because it runs on an embedded system. I can't use remote gdb server. The only option is examining core dumps that occur not very often.
Could you please suggest something that could move me forward with this? (Maybe something else I can extract from the core dump or maybe somehow to make some hooks in the code to catch abort() call).
UPDATE: Basile Starynkevitch suggested to use Valgrind, but turns out it's ported only for ARMv7. I have ARM 926 which is ARMv5, so this won't work for me. There are some efforts to compile valgrind for ARMv5 though: Valgrind cross compilation for ARMv5tel, valgrind on the ARM9
UPDATE 2: Couldn't make Electric Fence work with my program. The program uses C++ and pthreads. The version of Efence I got, 2.1.13 crashed in a arbitrary place after I start a thread and try to do something more or less complicated (for example to put a value into an STL vector). I saw people mentioning some patches for Efence on the web but didn't have time to try them. I tried this on my Linux PC, not on the ARM, and other tools like valgrind or Dmalloc don't report any problems with the code. So, everyone using version 2.1.13 of efence be prepared to have problems with pthreads (or maybe pthread + C++ + STL, don't know).
My guess for the "infinite' aborts is that either abort() causes a loop (e.g. abort -> signal handler -> abort -> ...) or that gdb can't correctly interpret the frames on the stack.
In either case I would suggest manually checking out the stack of the problematic thread. If abort causes a loop, you should see a pattern or at least the return address of abort repeating every so often. Perhaps you can then more easily find the root of the problem by manually skipping large parts of the (repeating) stack.
Otherwise, you should find that there is no repeating pattern and hopefully the return address of the failing function somewhere on the stack. In the worst case such addresses are overwritten due to a buffer overflow or such, but perhaps then you can still get lucky and recognise what it is overwritten with.
One possibility here is that something in that thread has very, very badly smashed the stack by vastly overwriting an on-stack data structure, destroying all the needed data on the stack in the process. That makes postmortem debugging very unpleasant.
If you can reproduce the problem at will, the right thing to do is to run the thread under gdb and watch what is going on precisely at the moment when the the stack gets nuked. This may, in turn, require some sort of careful search to determine where exactly the error is happening.
If you cannot reproduce the problem at will, the best I can suggest is very carefully looking for clues in the thread local storage for that thread to see if it hints at where the thread was executing before death hit.
I was debugging my app with gdb.
I used break main
So it can break when main is called.
Know if I use thread info it shows that thread count is 1.
How a thread is starting before main ?
I don't have any thread call in my call so from where thread is getting created. I am using these libs
sqlite , curl , pcre , c-client
Update
I have written a sample program to test that if all program start with single thread
#include<iostream>
int main(int argc,char *argv[]){
std:: cout<<"Will I have any thread";
return 0;
}
but when I debug it with gdb
(gdb) break main
Breakpoint 1 at 0x400783: file threadtest.cpp, line 3.
(gdb) run
Starting program: /home/vivek/Desktop/a.out
Breakpoint 1, main (argc=1, argv=0x7fffffffe728) at threadtest.cpp:3
3 std:: cout<<"Will I have any thread";
(gdb) info threads
* 1 process 21608 main (argc=1, argv=0x7fffffffe728) at threadtest.cpp:3
(gdb)
it doesn't show the same information. It show 1 process not 1 thread.
When I compile it with -lpthread it show 1 thread.
So program start with one thread when we use lpthread ?
or GDB behaves like that ?
All programs have at least 1 thread, the main thread. The program is started before main since the C++ runtime does some initializing before main() starts, like calling all global objects which have constructors.
The operating system creates a process space with one thread and calls the application loader to execute the application in that thread, which in turns performs some initial setup (gathering command line arguments into argc and argv, for example) and calls main.
For the sample App when I compile it with -lpthread it shows 1 thread is running.
So lpthread is playing key point here.
(gdb) break main
Breakpoint 1 at 0x400793: file threadtest.cpp, line 3.
(gdb) run
Starting program: /home/vivek/Desktop/a.out
[Thread debugging using libthread_db enabled]
Breakpoint 1, main (argc=1, argv=0x7fffffffe728) at threadtest.cpp:3
3 std:: cout<<"Will I have any thread";
(gdb) info threads
* 1 Thread 0x2aaaaaac8bb0 (LWP 21649) main (argc=1, argv=0x7fffffffe728)
at threadtest.cpp:3
(gdb)