Strange stack of a thread - c++

My application crashes when it shuts down. GDB shows the following stack (the app is built with -g -O0):
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007f254ea99700 in ?? ()
#2 0x0000000000000000 in ?? ()
A short investigation showed that the crash happens while stopping a thread that is started the same way as many others in the app:
// mListener is std::thread and member of class UA
std::thread thr(&UA::run, this);
mListener = std::move(thr);
Then I attached gdb to the app before stopping it and saw a difference between the stack of the thread that causes the crash and the stacks of the other threads.
All the other threads look like this:
...
#8 0x000000000043a70a in std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (UI::Keyboard::*)()> (UI::Keyboard*)> >::_M_run() (this=0xa88fd0)
at /usr/include/c++/4.9/thread:115
#9 0x00007fb6055c3970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x00007fb6083ff0a4 in start_thread (arg=0x7fb604042700) at pthread_create.c:309
#11 0x00007fb604d3304d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
But the 'wrong' thread always looks different:
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x000000000043317d in Semaphore::wait (this=0x7fb5fc0009e8) at /home/vadius/workspace/iPhone/core/src/Core/env/Semaphore.h:28
#2 0x0000000000432564 in SIP::UA::run (this=0x7fb5fc000980) at /home/vadius/workspace/iPhone/core/src/SIP/UA.cpp:132
#3 0x0000000000000000 in ?? ()
I assume that when the thread returns from its worker method (SIP::UA::run), it jumps to code at address nullptr.
My questions are:
1. Am I right that the stack of the 'bad' thread is wrong?
2. What can cause such behavior, and how can I avoid it?
Debian jessie x64 /
GCC 4.9 /
Compile flags: set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -DDEBUG -g -O0")

The giveaway is the "address". 0x432564 is "C%d" in ASCII (0x43 is 'C', 0x25 is '%', 0x64 is 'd'). The bytes in a normal address are usually not all ASCII. This is a stack buffer overflow.
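
To check an address for this pattern by hand, it helps to print its bytes as characters. The following is a minimal sketch; address_as_text is a hypothetical helper name, not something provided by GDB or by the question. Bytes are shown in the order they appear in the hex notation (on little-endian x86-64 they sit in memory in the reverse order). Printable bytes are only a hint, since short addresses can be printable by coincidence.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: render the bytes of an address as text, most
// significant byte first. Leading zero bytes are skipped and any
// non-printable byte is shown as '.'.
std::string address_as_text(std::uint64_t addr) {
    std::string out;
    for (int shift = 56; shift >= 0; shift -= 8) {
        unsigned char byte = static_cast<unsigned char>((addr >> shift) & 0xFF);
        if (out.empty() && byte == 0) {
            continue;  // skip leading zero bytes
        }
        out += std::isprint(byte) ? static_cast<char>(byte) : '.';
    }
    return out;
}
```

With this sketch, address_as_text(0x432564) yields "C%d", while the frame address 0x43a70a from the healthy threads yields "C.." because 0xa7 and 0x0a are not printable.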

Related

C++ Multi threaded Program got crashed with BUS error

My project crashed after 5 days of testing. When I analyze the dump file, it shows a bus error.
Here is the backtrace I got:
Program terminated with signal SIGBUS, Bus error.
#0 0x0000000000000531 in ?? ()
[Current thread is 1 (LWP 902)]
(gdb) bt
#0 0x0000000000000531 in ?? ()
#1 0x000000000041a294 in CUtilsTimer::forgetTimer() ()
#2 0x0000000000415160 in CEMPLinkMonitor::monitor_ethernet_link_status() ()
#3 0x0000000000413fc8 in CEMPTransport::recvEMPData(Emp_Packet*) ()
#4 0x000000000041313c in CEMPRxTransport::run() ()
#5 0x00000000004190a8 in CUtilsThread::runLoop(void*) ()
#6 0x0000007fac289fb8 in ?? () from /lib/libpthread.so.0
#7 0x0000007fa74bdc98 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
I tried to find the root cause but could not get any clue. Please help.

How to determine reason of pthread_raise(sig=6) in core file with gdb

My app crashes sometimes and I can't find the cause. The app is multithreaded (QThread) and uses several QUdpSockets. I think it happens because of simultaneous access to a socket, but I don't know when or where.
Here are the results of bt on the core file:
#0 0x414596e1 in ?? ()
#1 0x412d731b in pthread_kill (thread=1649, signo=6) at signals.c:69
#2 0x412d76a0 in __pthread_raise (sig=6) at signals.c:200
#3 0x41459395 in ?? ()
#4 0x00000006 in ?? ()
#5 0x41546ff4 in ?? ()
#6 0xbd5fd8bc in ?? ()
#7 0x4145a87d in ?? ()
#8 0x00000006 in ?? ()
#9 0x00000020 in ?? ()
#10 0x00000000 in ?? ()
What is sig=6 and when is it emitted?
How can I determine the reason for this behavior?
How do I know which -dev libraries are missing (the ?? positions in the stack)?
Signal number 6 on Linux is SIGABRT. The fact that it's being raised with pthread_raise() seems to indicate that the application has directly called abort() or hit a failed assert().
It's likely that the missing parts of your backtrace are in the Qt libraries, so try installing the debugging symbols for all of those.
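
To see the signal number for yourself, here is a minimal sketch that installs a handler, raises SIGABRT, and records what the handler received. The names last_signal, record_signal and raise_and_record are made up for illustration. It uses std::raise rather than abort() so the process survives; abort() raises the same signal but does not return.

```cpp
#include <csignal>

// Written by the handler; sig_atomic_t is the type that is safe to
// assign from inside a signal handler.
volatile std::sig_atomic_t last_signal = 0;

// Hypothetical handler: just remember which signal arrived.
void record_signal(int sig) {
    last_signal = sig;
}

// Install the handler, raise the signal in this thread, restore the
// previous handler, and return the number the handler saw.
int raise_and_record(int sig) {
    last_signal = 0;
    auto previous = std::signal(sig, record_signal);
    std::raise(sig);
    std::signal(sig, previous);
    return last_signal;
}
```

On Linux, raise_and_record(SIGABRT) returns 6, which matches the sig=6 in the backtrace.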

QT5.2.0; Debian Wheezy: QSocketNotifier::type() segfault

I have a multithreaded app that uses QThreadPool. It crashes after a random amount of time (sometimes minutes, sometimes hours...) with a segfault. I recompiled with debugging symbols and ran through GDB. Here's the backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7231170 in QSocketNotifier::type() const ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
(gdb) where
#0 0x00007ffff7231170 in QSocketNotifier::type() const ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#1 0x00007ffff724b732 in ?? () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#2 0x00007ffff51d713b in g_main_context_check ()
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff51d75c2 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x00007ffff51d7744 in g_main_context_iteration ()
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff724c023 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#6 0x00007ffff71fa2cb in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#7 0x00007ffff71fe33e in QCoreApplication::exec() ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#8 0x0000000000409bb9 in main (argc=1, argv=<optimized out>) at main.cpp:166
That's the complete backtrace. It references/mentions basically no code within the app itself; it all appears to be Qt library code causing the fault. Not sure what source from the app itself to include in this post since GDB does not reference anything within the app itself. Any ideas?

Make gdb show thread names on 'apply all' operations

I'm debugging an app with many threads, so I've named them using prctl. This works great with gdb's info threads command, but it would be nice if thread apply all operations showed the names as well. Is there any way to coerce gdb into doing this?
(gdb) info threads
Id Target Id Frame
...
3 Thread 0x7ffff6ffe700 (LWP 30048) "poll_uart_threa" 0x00007ffff78eb823 in select ()
at ../sysdeps/unix/syscall-template.S:82
2 Thread 0x7ffff77ff700 (LWP 30047) "signal hander" do_sigwait (set=<optimized out>,
sig=0x7ffff77feed8)
at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:65
* 1 Thread 0x7ffff7fcc700 (LWP 30046) "simulator" __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:132
Pointer, PID {well, thread ID, but LWP threads == processes, ish}, and name
(gdb) thread apply all bt
...
Thread 3 (Thread 0x7ffff6ffe700 (LWP 30048)):
#0 0x00007ffff78eb823 in select () at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000403bb3 in poll_uart_thread (unused=0x0) at uart.c:96
#2 0x00007ffff7bc4e9a in start_thread (arg=0x7ffff6ffe700) at pthread_create.c:308
#3 0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#4 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7ffff77ff700 (LWP 30047)):
<call stack>
#2 0x0000000000417a89 in sig_thread (arg=0x7fffffffbb60) at simulator.c:879
#3 0x00007ffff7bc4e9a in start_thread (arg=0x7ffff77ff700) at pthread_create.c:308
#4 0x00007ffff78f24bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7ffff7fcc700 (LWP 30046)):
<call stack>
#9 0x00000000004182e3 in simulator (flash_file=0x7fffffffe0e4 "../programs/blink.bin")
at simulator.c:1005
#10 0x0000000000401f14 in main (argc=3, argv=0x7fffffffdd48) at cli.c:167
While I can find the name by hunting the call stack, it'd be nice / convenient / etc if it would print in the summary line, which here only has PID and pointer.
There's no easy way, you have to patch GDB. It's a simple patch, you can find it here.
"it'd be nice / convenient / etc if it would print in the summary line, which here only has PID and pointer."
Please file an enhancement request in GDB bugzilla.
If you are using GDB with embedded python, you might be able to script "thread apply" to do what you want, but it really ought to do the right thing already.
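
For reference, naming a thread with prctl as described in the question can be sketched as below. The helper name set_and_read_thread_name is hypothetical, and "poll_uart_thread" as the requested name is an assumption based on the function name in the backtrace. Linux limits the name to 16 bytes including the terminating null and silently truncates longer strings, which would explain why info threads prints "poll_uart_threa".

```cpp
#include <string>
#include <sys/prctl.h>

// Hypothetical helper: set the calling thread's name, read back what the
// kernel actually stored, then restore the previous name.
std::string set_and_read_thread_name(const char* requested) {
    char original[16] = {0};
    char stored[16] = {0};

    // Remember the current name so the sketch leaves no side effects.
    prctl(PR_GET_NAME, reinterpret_cast<unsigned long>(original), 0UL, 0UL, 0UL);

    // Names longer than 15 characters are silently truncated by the kernel.
    prctl(PR_SET_NAME, reinterpret_cast<unsigned long>(requested), 0UL, 0UL, 0UL);
    prctl(PR_GET_NAME, reinterpret_cast<unsigned long>(stored), 0UL, 0UL, 0UL);

    prctl(PR_SET_NAME, reinterpret_cast<unsigned long>(original), 0UL, 0UL, 0UL);
    return stored;
}
```

Calling set_and_read_thread_name("poll_uart_thread") returns "poll_uart_threa", so picking names of at most 15 characters keeps them readable in gdb.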

Core dump in libc exit call

I am seeing a core dump on Solaris during the exit procedure of my program. How can I debug and fix this kind of core dump?
(gdb) where
#0 0xff2cc0c0 in kill () from /usr/lib/libc.so.1
#1 0x0004dac0 in run_before_killed_handler (sig=11) at NdmpServer.cpp:1186
#2 signal handler called
#3 0xfee0ad50 in ?? ()
#4 0x00060a6c in proc_cleanup ()
#5 0xff2421ac in _exithandle () from /usr/lib/libc.so.1
#6 0xff2305d8 in exit () from /usr/lib/libc.so.1
#7 0x0003431c in _start ()
Your program apparently uses atexit(3C) to register an exit handler. The problem is occurring in that handler.
Without knowing the finer details of Solaris memory layouts, 0xfee0ad50 seems to be on the OS side. What OS call are you trying (and failing) to make in proc_cleanup?
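
As an illustration of the mechanism, here is a minimal sketch of an exit handler registered with std::atexit. Everything in it is hypothetical: proc_cleanup_sketch stands in for the real proc_cleanup, and the temporary file stands in for whatever resource the real handler releases. The guard shows one defensive pattern (tolerate a resource that is missing or already released); whether that is what goes wrong in the program above is not known from the backtrace.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical resource released at exit, plus a counter for inspection.
static std::FILE* g_log = nullptr;
static int g_cleanup_runs = 0;

// Stand-in for proc_cleanup: safe to call more than once, and safe to
// call when the resource was never opened.
void proc_cleanup_sketch() {
    ++g_cleanup_runs;
    if (g_log != nullptr) {
        std::fclose(g_log);
        g_log = nullptr;  // prevents a double close on a later call
    }
}

// Acquire the resource and register the handler.
// Returns 0 if std::atexit accepted the registration.
int install_cleanup() {
    g_log = std::tmpfile();  // may be nullptr if no temp file is available
    return std::atexit(proc_cleanup_sketch);
}
```

After install_cleanup() the handler runs automatically when exit() is called or main returns, which is the path through _exithandle in the backtrace. Calling it manually beforehand is harmless because of the null check.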