I'm learning how to use GDB and I'm having trouble diagnosing a crash.
When I run bt full, I get the following result:
(gdb) bt full
#0 0x43675798 in memcpy ()
from /armv7a-vfp-neon-oe-linux-gnueabi/lib/libc.so.6
No symbol table info available.
#1 0x0001e8c4 in write_utf8_string ()
No symbol table info available.
#2 0x0001dd80 in connect ()
No symbol table info available.
#3 0x0000f940 in connect_to_broker ()
No symbol table info available.
#4 0x0000fe24 in network ()
No symbol table info available.
#5 0x0000c0a8 in configuration_client ()
No symbol table info available.
#6 0x0000f1b4 in connection_threaded ()
No symbol table info available.
#7 0x437761b0 in start_thread ()
from /lib/libpthread.so.0
No symbol table info available.
#8 0x436cddb0 in ?? ()
from armv7a-vfp-neon-oe-linux-gnueabi/lib/libc.so.6
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I think the problem happened in frame 0, the memcpy call. But since I use memcpy a lot, I'm not sure which call is causing the problem. So I went to frame 1 and tried to print all the arguments I passed to this function, but GDB did not find any of them:
(gdb) frame 1
#1 0x0001e8c4 in write_utf8_string ()
(gdb) print ptr
No symbol "ptr" in current context.
(gdb) print conn_config->MQTTVersion
No symbol "conn_config" in current context.
(gdb) print MQTTVersion
No symbol "MQTTVersion" in current context.
(gdb) print clientID
No symbol "clientID" in current context.
(gdb) print clientIDLen
No symbol "clientIDLen" in current context.
I'm not sure why this is happening, or whether the arguments were corrupted before this frame. I checked the arguments of every function listed in the backtrace, and GDB cannot find any of them.
Also, I'm not sure what the ?? means in frame 8.
Related
My app crashes randomly (about once a day), and I have tried several ways to find the cause, with no luck.
With other core dumps or segmentation faults, I can locate where the crash happens with gdb, but in this case gdb doesn't give me much of a hint.
I need some advice on how to continue debugging. Please help.
GDB output when my app crashed
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/greystone/myapp/myapp'.
Program terminated with signal SIGTRAP, Trace/breakpoint trap.
#0 0x00007f5d3a435afb in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
[Current thread is 1 (Thread 0x7f5cea3d4700 (LWP 14353))]
(gdb) bt full
#0 0x00007f5d3a435afb in g_logv () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#1 0x00007f5d3a435c6f in g_log () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#2 0x00007f5d3a472742 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#3 0x00007f5d3a42cab3 in g_main_context_new () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
No symbol table info available.
#4 0x00007f5d3f4894c9 in QEventDispatcherGlibPrivate::QEventDispatcherGlibPrivate(_GMainContext*) () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#5 0x00007f5d3f4895a1 in QEventDispatcherGlib::QEventDispatcherGlib(QObject*) () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#6 0x00007f5d3f266870 in ?? () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#7 0x00007f5d3f267758 in ?? () from /opt/Qt5.9.2/5.9.2/gcc_64/lib/libQt5Core.so.5
No symbol table info available.
#8 0x00007f5d3efa76ba in start_thread (arg=0x7f5cea3d4700) at pthread_create.c:333
__res =
pd = 0x7f5cea3d4700
now =
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140037043603200, 4399946704104667801, 0, 140033278038543, 8388608, 140037073195984, -4344262468029171047, -4344357617020880231}, mask_was_saved = 0}},
priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call =
pagesize_m1 =
sp =
freesize =
__PRETTY_FUNCTION__ = "start_thread"
#9 0x00007f5d3e43c41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
Solutions I have tried
Searched for topics related to SIGTRAP
People say it occurs in debug builds when a breakpoint is set somewhere in the code. However, my app is compiled in release mode without breakpoints.
Installed a signal handler to ignore SIGTRAP
No success. I can only ignore a SIGTRAP sent by "kill -5 pid". When the SIGTRAP occurs randomly at runtime, my app still crashes.
Fixed memory leaks in the code
Initialized pointers with nullptr
Double-checked the MySQL C API calls for race conditions
Double-checked array deletion and out-of-bounds index assignments
Checked signals and slots
My app is built on the Qt framework as a GUI application. I have checked its many signals and slots, but I have no idea how they could be related to a SIGTRAP core dump.
Checked exceptions from OpenCV
I use OpenCV for image processing tasks. I have checked the exception cases.
Shared memory
Memory shared between the main process and subprocesses was carefully checked.
Example code
There is a lot of code in my app, and because gdb doesn't tell me exactly where the crash happens, I don't know which code to share. If you need to see some of it, please tell me which part. My app has the following parts:
MySQL C API (MySQL 5.7.29)
User interface (a lot of it) with Qt framework 5.9.2
Image processing with OpenCV 2.4.9
Multithreaded process flow with Qt framework 5.9.2
If you have any ideas, please share some keywords so I can search for them and apply them to my app. Thanks for your help.
in this case gdb doesn't give me much of a hint
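GDB tells you exactly what happened; you just didn't understand it.
What's happening is that some code in libglib called g_logv(..., G_LOG_FLAG_FATAL, ...), which eventually calls _g_log_abort(), which executes the int3 (debug breakpoint) instruction. On x86, executing int3 is what delivers SIGTRAP to the process.
You should be able to run (gdb) x/i 0x00007f5d3a435afb and see that instruction (or look at the byte just before that address, since the saved program counter normally points just past the int3).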
It looks like g_main_context_new() may have failed to allocate memory.
In any case, you should look in the application stderr logs for the reason libglib is terminating your program (effectively, libglib calls an equivalent of abort, because some precondition has failed).
My project crashed after 5 days of testing. When I analyzed the dump file, it showed a bus error.
Here is the backtrace:
Program terminated with signal SIGBUS, Bus error.
#0 0x0000000000000531 in ?? ()
[Current thread is 1 (LWP 902)]
(gdb) bt
#0 0x0000000000000531 in ?? ()
#1 0x000000000041a294 in CUtilsTimer::forgetTimer() ()
#2 0x0000000000415160 in CEMPLinkMonitor::monitor_ethernet_link_status() ()
#3 0x0000000000413fc8 in CEMPTransport::recvEMPData(Emp_Packet*) ()
#4 0x000000000041313c in CEMPRxTransport::run() ()
#5 0x00000000004190a8 in CUtilsThread::runLoop(void*) ()
#6 0x0000007fac289fb8 in ?? () from /lib/libpthread.so.0
#7 0x0000007fa74bdc98 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
I tried to find the root cause but could not find any clue. Please help.
I got this weird crash and I have no idea how to debug the core dump since the call stack is missing symbols info for some reason, except for the last function:
#0 BIH::intersectRay<VMAP::MapRayCallback> (this=0x7f47b8339608, r=..., intersectCallback=..., maxDist=#0x7f493af8383c: 0, stopAtFirst=true, los=<optimized out>) at ../BIH.h:223
#1 0x000000307ff00000 in ?? ()
#2 0x7ff0000000000000 in ?? ()
#3 0x0000000000000030 in ?? ()
#4 0x000000307ff00000 in ?? ()
#5 0x7ff0000000000000 in ?? ()
#6 0x0000000000000030 in ?? ()
#7 0x000000307ff00000 in ?? ()
#8 0x7ff0000000000000 in ?? ()
#9 0x0000000000000030 in ?? ()
#10 0x000000307ff00000 in ?? ()
#11 0x7ff0000000000000 in ?? ()
#12 0x0000000000000030 in ?? ()
#13 0x000000307ff00000 in ?? ()
#14 0x7ff0000000000000 in ?? ()
#15 0x0000000000000030 in ?? ()
#16 0x000000307ff00000 in ?? ()
#17 0x7ff0000000000000 in ?? ()
#18 0x0000000000000030 in ?? ()
#19 0x000000307ff00000 in ?? ()
#20 0x7ff0000000000000 in ?? ()
#21 0x0000000000000030 in ?? ()
#22 0x000000307ff00000 in ?? ()
....
#749 0x7ff0000000000000 in ?? ()
#750 0x0000000000000030 in ?? ()
#751 0x000000307ff00000 in ?? ()
#752 0x7ff0000000000000 in ?? ()
#753 0x0000000000000030 in ?? ()
#754 0x000000307ff00000 in ?? ()
#755 0x7ff0000000000000 in ?? ()
#756 0x0000000000000030 in ?? ()
#757 0x000000307ff00000 in ?? ()
#758 0x7ff0000000000000 in ?? ()
#759 0x0000000000000030 in ?? ()
#760 0x000000307ff00000 in ?? ()
#761 0x7ff0000000000000 in ?? ()
#762 0x0000000000000030 in ?? ()
#763 0x000000307ff00000 in ?? ()
#764 0x03010102464c457f in ?? ()
#765 0x0000000000000000 in ?? ()
(gdb) info frame 0
Stack frame at 0x7f493af83830:
rip = 0x930f0b in BIH::intersectRay<VMAP::MapRayCallback> (../BIH.h:223); saved rip = 0x307ff00000
called by frame at 0x7f493af83838
source language c++.
Arglist at 0x7f493af83438, args: this=0x7f47b8339608, r=..., intersectCallback=..., maxDist=#0x7f493af8383c: 0, stopAtFirst=true, los=<optimized out>
Locals at 0x7f493af83438, Previous frame's sp is 0x7f493af83830
Saved registers:
rbx at 0x7f493af837f8, rbp at 0x7f493af83800, r12 at 0x7f493af83808, r13 at 0x7f493af83810, r14 at 0x7f493af83818, r15 at 0x7f493af83820, rip at 0x7f493af83828
#1 0x000000307ff00000 in ?? ()
No symbol table info available.
(gdb) info frame 1
Stack frame at 0x7f493af83838:
rip = 0x307ff00000; saved rip = 0x7ff0000000000000
called by frame at 0x7f493af83840, caller of frame at 0x7f493af83830
Arglist at 0x7f493af83828, args:
Locals at 0x7f493af83828, Previous frame's sp is 0x7f493af83838
Saved registers:
rip at 0x7f493af83830
#2 0x7ff0000000000000 in ?? ()
No symbol table info available.
(gdb) info frame 2
Stack frame at 0x7f493af83840:
rip = 0x7ff0000000000000; saved rip = 0x30
called by frame at 0x7f493af83848, caller of frame at 0x7f493af83838
Arglist at 0x7f493af83830, args:
Locals at 0x7f493af83830, Previous frame's sp is 0x7f493af83840
Saved registers:
rip at 0x7f493af83838
#3 0x0000000000000030 in ?? ()
No symbol table info available.
(gdb) info frame 3
Stack frame at 0x7f493af83848:
rip = 0x30; saved rip = 0x307ff00000
called by frame at 0x7f493af83850, caller of frame at 0x7f493af83840
Arglist at 0x7f493af83838, args:
Locals at 0x7f493af83838, Previous frame's sp is 0x7f493af83848
Saved registers:
rip at 0x7f493af83840
#4 0x000000307ff00000 in ?? ()
No symbol table info available.
(gdb) info frame 4
Stack frame at 0x7f493af83850:
rip = 0x307ff00000; saved rip = 0x7ff0000000000000
called by frame at 0x7f493af83858, caller of frame at 0x7f493af83848
Arglist at 0x7f493af83840, args:
Locals at 0x7f493af83840, Previous frame's sp is 0x7f493af83850
Saved registers:
rip at 0x7f493af83848
The code is compiled with -g -fvar-tracking -O2 -march=native.
I have had various dumps from other crashes where the symbol tables worked and gave relevant call stacks and information, but for some reason this specific crash is cryptic.
One thing I noticed is that the same addresses repeat over and over. Could it be an infinite loop or recursion of some sort that is corrupting or overflowing the stack?
If so, is there any way to get the topmost functions in the call stack (e.g. any way to go above frame #765, or to get the functions called before the overflow was triggered)?
I cannot set $sp or jump to any address, because I cannot debug and step through the live program; I can only analyse the core dump.
I cannot replicate this crash; it happens in production from time to time. Also valgrind is out of the question.
Are there any g++ compiler options or gdb flags that could help me with this?
Any pointers on how to debug such an issue are appreciated (if it is possible at all).
I have no idea how to debug the core dump since the call stack is missing symbols info for some reason
Part 1:
The most common reason for this kind of meaningless call stack is a mismatch between the binary that produced the core dump and the binary you use to analyze the core.
If you used --build-id at link time, or if your GCC is configured to use that linker flag by default, then you can verify that the binary matches (or doesn't match) the core using this procedure:
readelf -n /path/to/binary
This should produce output similar to:
$ readelf -n /bin/sleep
Displaying notes found at file offset 0x00000254 with length 0x00000020:
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.24
Displaying notes found at file offset 0x00000274 with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: c266a51e4b85b16ca17bff8328f3abeafb577b29
The build-id string c266a51e4b85b16ca17bff8328f3abeafb577b29 is the output you care about. Assuming your binary has it, install the elfutils package, then use
eu-unstrip -n --core /path/to/core
to see which binaries were used at the time the core dump was produced.
The output should look like this:
$ eu-unstrip -n --core /tmp/core
0x400000+0x208000 c266a51e4b85b16ca17bff8328f3abeafb577b29#0x400284 - - [exe]
0x7ffca5721000+0x1000 9c7cbcf6c957d8fc8e55b45a3c7a1556b38a3097#0x7ffca5721340 . - linux-vdso.so.1
0x7f491ad5a000+0x2241c8 d0f537904076d73f29e4a37341f8a449e2ef6cd0#0x7f491ad5a1d8 /lib64/ld-linux-x86-64.so.2 /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.19.so ld-linux-x86-64.so.2
0x7f491a995000+0x3c42c0 cf699a15caae64f50311fc4655b86dc39a479789#0x7f491a995280 /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.19.so libc.so.6
Above you can see that this core dump was in fact produced by /bin/sleep.
If the executable build-id in the core does not match your binary, you'll need to locate the binary whose build-id matches your core before you can extract a correct crash stack trace in GDB.
Part 2:
If the binary does match the core, then it is very likely that the stack is simply corrupt (due to e.g. stack buffer overflow).
valgrind is out of the question.
Valgrind is exceptionally weak at detecting stack corruption anyway.
The current state of the art in debugging these kinds of problems is Address Sanitizer, which is significantly faster than Valgrind and may be fast enough to run in production.
If the sanitized binary is not fast enough for production use, you may be able to set it up so that it processes some subset of inputs in "shadow mode" (the binary runs, but its output is discarded). Any effort you put into such a setup will likely uncover tens of new bugs and save you significant future debugging effort.
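As a rough illustration of the kind of bug Part 2 describes, here is a hypothetical sketch; none of these function names come from the crashing program. sum_unchecked copies a caller-supplied length into a 16-byte stack buffer, so with len greater than 16 it would write past the buffer and could overwrite saved registers and the return address, which is one way to end up with a backtrace full of ?? frames. sum_checked clamps the length first. Assuming a build along the lines of g++ -g -fsanitize=address -fno-omit-frame-pointer, Address Sanitizer should report the bad write in sum_unchecked as a stack-buffer-overflow at the moment it happens, instead of leaving a corrupt stack to be discovered later.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical example, not taken from the program in the question.
// Unsafe: trusts len. If len > sizeof buf, memcpy writes past the end of
// buf and can clobber the saved return address of this frame.
unsigned sum_unchecked(const unsigned char* data, std::size_t len) {
    unsigned char buf[16];
    std::memcpy(buf, data, len);  // overflows the stack buffer when len > 16
    unsigned sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum += buf[i];
    }
    return sum;
}

// Safer variant: never copies or reads more than the buffer can hold.
unsigned sum_checked(const unsigned char* data, std::size_t len) {
    unsigned char buf[16];
    const std::size_t n = std::min(len, sizeof buf);
    std::memcpy(buf, data, n);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

Both functions behave identically for inputs of 16 bytes or fewer, which is why this class of bug can survive testing and only show up occasionally in production.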
I am new to Unix and gdb. I have a generated core dump file. I am using gdb to debug it, but I find no meaningful information.
The output I get is:
(gdb) thread apply all bt full
Thread 7 (LWP 12190):
#0 0x00007fa2eae29896 in ?? ()
No symbol table info available.
#1 0x000000000000019a in ?? ()
No symbol table info available.
#2 0x00007fa2e9906ce0 in ?? ()
No symbol table info available.
There are seven threads, and I get the same output for all of them. I don't know how to proceed. Please help me, or explain what this means.
This means that no symbol table is loaded for the core dump. Chances are you invoked gdb directly on the core dump rather than like this:
gdb <executable> <coredump>
My program crashes in a string assignment. I cannot pin down the exact cause. Multiple threads execute the same code.
This is my code:
char* cTemp = new char[5];
memset(cTemp,'\0', 5);
snprintf(cTemp , 5 , "%04x" , iParameter);
string sVar1 = cTemp;
delete[] cTemp;
if(sVar1 == "0")
sVar1 = "0000";
pSharedLib->setVar1(sVar1);
The set function (in the shared library):
bool A::setVar1(CString& temp)
{
m_sVar1= temp;
return true;
}
The crash backtrace shows the error as:
#0 0x48194444 in raise () from /lib/libc.so.6
#0 0x48194444 in raise () from /lib/libc.so.6
No symbol table info available.
#1 0x48199694 in abort () from /lib/libc.so.6
No symbol table info available.
#2 0x481d4ecc in ?? () from /lib/libc.so.6
No symbol table info available.
#3 0x481e14d4 in ?? () from /lib/libc.so.6
No symbol table info available.
#4 0x481e32b0 in free () from /lib/libc.so.6
No symbol table info available.
#5 0x480df8b8 in operator delete(void*) () from /usr/lib/libstdc++.so.6
No symbol table info available.
#6 0x480b136c in std::string::_Rep::_M_destroy(std::allocator<char> const&)
() from /usr/lib/libstdc++.so.6
No symbol table info available.
#7 0x480b35f4 in std::string::assign(std::string const&) ()
from /usr/lib/libstdc++.so.6
No symbol table info available.
I don't see any synchronization objects protecting the assignment to m_sVar1. You mentioned that setVar1 could be called from multiple threads simultaneously; the thread-safety guarantees of the standard library do not make concurrent assignment to the same string object safe.
I suspect the key to this problem is
Multiple threads execute the same code.
If there's a single string m_sVar1, and multiple threads are assigning to it simultaneously, then the chances are rather good that a race condition will lead to corruption. You need to properly protect that variable with a critical section.
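A minimal sketch of that protection, assuming std::string and a C++11 std::mutex are available. The question uses CString, whose definition is not shown, and the m_mutex member and getVar1 accessor below are hypothetical additions for illustration:

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for class A from the question.
class A {
public:
    // Writers take the lock, so two threads can never be inside the
    // assignment (and the free() it may trigger) at the same time.
    bool setVar1(const std::string& temp) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_sVar1 = temp;
        return true;
    }

    // Readers take the same lock and return a copy, so the caller never
    // holds a reference into a string another thread may reassign.
    std::string getVar1() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_sVar1;
    }

private:
    mutable std::mutex m_mutex;  // guards m_sVar1
    std::string m_sVar1;
};
```

Every access to m_sVar1 has to go through the same lock; guarding only the setter would still leave a race with any unguarded reader.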