gdb - get exactly unexpected error from core file - c++

What exactly do I need to do to get the unexpected error code, or something similar, from the core file with GDB or some other tool, so I can understand why my daemon died in operator new?
(gdb) bt
#0 0x48775bd7 in thr_kill () from /lib/libc.so.7
#1 0x48726f46 in pthread_kill () from /lib/libthr.so.3
#2 0x487245da in raise () from /lib/libthr.so.3
#3 0x4880abba in abort () from /lib/libc.so.7
#4 0x4866e65f in __gnu_cxx::__verbose_terminate_handler ()
from /usr/lib/libstdc++.so.6
#5 0x486729aa in std::set_unexpected () from /usr/lib/libstdc++.so.6
#6 0x486729f2 in std::terminate () from /usr/lib/libstdc++.so.6
#7 0x486728ea in __cxa_throw () from /usr/lib/libstdc++.so.6
#8 0x486c77ac in operator new () from /usr/lib/libstdc++.so.6
#9 0x0806ad4c in XXX::process_in (this=0x4b110d40,
map_settings_to_save=#0x7f7fcc98, str_answer=#0x7f7fcf84)
at Click.cpp:2940

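Go to line 2940 of Click.cpp; you should find a new expression there. Note that frame #7 (__cxa_throw) is called directly from operator new, so the exception came from the allocation itself (most likely std::bad_alloc) rather than from the constructor, and std::terminate ran most likely because nothing caught it.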
From the looks of it, the heap in your application is trashed.
If the program used to work, and you changed something before this, potentially something that could have damaged the heap, check that carefully.
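Frames #4 to #7 show libstdc++'s verbose terminate handler running after a throw from operator new. That handler writes the exception type to stderr, which a daemon has often closed or redirected, so the message never reaches you. A minimal sketch of one way to keep that information, assuming you can add a custom terminate handler to the daemon: the function names and the log path below are hypothetical, and under real memory exhaustion the std::string used here could itself fail to allocate.

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <new>
#include <string>

// Hypothetical helper: names the exception that is currently being handled.
std::string describe_current_exception() {
    std::exception_ptr ep = std::current_exception();
    if (!ep) {
        return "no active exception";
    }
    try {
        std::rethrow_exception(ep);
    } catch (const std::bad_alloc& e) {
        return std::string("std::bad_alloc: ") + e.what();
    } catch (const std::exception& e) {
        return std::string("std::exception: ") + e.what();
    } catch (...) {
        return "unknown exception type";
    }
}

// Hypothetical terminate handler: appends the reason to a log file, then
// aborts so that a core file is still produced. The path is an assumption.
void log_reason_and_abort() {
    std::string reason = describe_current_exception();
    if (std::FILE* f = std::fopen("/tmp/daemon_terminate.log", "a")) {
        std::fprintf(f, "terminate: %s\n", reason.c_str());
        std::fclose(f);
    }
    std::abort();
}

// Call once, early in the daemon's startup.
void install_terminate_logger() {
    std::set_terminate(log_reason_and_abort);
}
```

With this installed, the next crash should leave a line such as "terminate: std::bad_alloc: ..." next to the core file, which tells you whether the allocation failed or something else was thrown.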

Related

localtime() function is invoking ___lll_lock_wait_private() which is making the thread to go deadlock

I have seen many questions in which ___lll_lock_wait_private() ends in a deadlock, but their call stacks are different: their code was calling malloc(), free(), or fork().
In my case, I have a Log class that is trying to print a log message, and that logging call is what deadlocks my thread.
See the call stack below:
#0 0x000000fff4c47b9c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x000000fff4bf0364 in __tz_convert () from /lib64/libc.so.6
#2 0x000000fff4bee2c0 in localtime () from /lib64/libc.so.6
#3 0x000000fff5167188 in getTimeStr() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#4 0x000000fff516756c in LogClass::logBegin() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#5 0x000000fff5318c90 in DaemonCtrlServer::strtDaemon(daemonInfo&) () from /usr/sbin/sajet/sharedobj/libDaemonCtlServer.so
#6 0x000000fff531abc0 in DaemonCtrlServer::restrtDiedDaemon(daemonInfo&) () from /usr/sbin/sajet/sharedobj/libDaemonCtlServer.so
#7 0x000000fff531ae64 in DaemonCtrlServer::handleChildDeath() () from /usr/sbin/sajet/sharedobj/libDaemonCtlServer.so
#8 0x00000000100080ac in sj_initd::DaemonDeathHandler() ()
#9 0x000000001000b8f8 in sj_initd::SignalHandler(int) ()
#10 0x00000000100080e8 in sj_initd_SigHandler::sj_initdSigHandler(int) ()
#11 <signal handler called>
#12 0x000000fff4bedc00 in __offtime () from /lib64/libc.so.6
#13 0x000000fff4bf02a8 in __tz_convert () from /lib64/libc.so.6
#14 0x000000fff4bee2c0 in localtime () from /lib64/libc.so.6
#15 0x000000fff5167188 in getTimeStr() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#16 0x000000fff516756c in LogClass::logBegin() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#17 0x000000fff52e868c in ConnectionOS::ProcessReadEvent() () from /usr/sbin/sajet/sharedobj/libconnV2.so
#18 0x000000fff52ef354 in ConnectionOSManager::ProcessConns(fd_set*, fd_set*) () from /usr/sbin/sajet/sharedobj/libconnV2.so
#19 0x000000fff52f0a0c in SocketsManager::ProcessFds(bool) () from /usr/sbin/sajet/sharedobj/libconnV2.so
#20 0x000000fff52c51b4 in EventReactorBase::IO (this=0x19a65e80) at EventReactorBase.cpp:361
#21 0x000000fff52c457c in EventReactorBase::React (this=0x19a65e80) at EventReactorBase.cpp:419
#22 0x000000fff52c10cc in Task::Run (this=0x19a65e30) at Task.cpp:222
#23 0x000000fff52c1218 in startTask (t=0x19a65e30) at Task.cpp:152
#24 0x000000001000a9c4 in TaskManager::Start() ()
#25 0x0000000010007538 in main ()
sj_init is a daemon that monitors the liveness of other daemons in the system. When a daemon dies (which closes its connection with sj_init), sj_init tries to restart it. strtDaemon() then tries to print a log message, which calls getTimeStr(), which in turn ends up in ___lll_lock_wait_private().
Edit
Since localtime() is not thread-safe, I tried localtime_r(), but it also led to a deadlock, even though its description says it is a thread-safe function. What am I doing wrong here?
The program blocks inside a signal handler, in __tz_convert() called from localtime(). That signal handler itself interrupted an earlier __tz_convert() call from localtime() on the same thread:
#0 0x000000fff4c47b9c in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x000000fff4bf0364 in __tz_convert () from /lib64/libc.so.6
#2 0x000000fff4bee2c0 in localtime () from /lib64/libc.so.6
...
#11 <signal handler called>
...
#13 0x000000fff4bf02a8 in __tz_convert () from /lib64/libc.so.6
#14 0x000000fff4bee2c0 in localtime () from /lib64/libc.so.6
...
#25 0x0000000010007538 in main ()
localtime() is evidently not reentrant here: the interrupted call still holds the lock that the call inside the signal handler is now waiting for.
A signal handler may only call a very specific set of async-signal-safe functions. Scroll down here: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03
Neither localtime() nor localtime_r() is in this set; being thread-safe is not the same as being safe to call from a signal handler.
A signal handler should be short and simple.
You could set up a thread that does the formatting and logging, fed by the signal handler (for example via a pipe) with all the information it needs. The functions needed to do so are in the set linked above.
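The pipe idea can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: all names are hypothetical, and it assumes the handler only needs to pass the signal number along. The handler calls nothing but write(), which is in the async-signal-safe set, while the timestamp formatting with localtime_r() happens later in normal context.

```cpp
#include <cerrno>
#include <ctime>
#include <string>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical pipe shared between the handler and the logging side.
int g_signal_pipe[2] = {-1, -1};

// Signal handler: only write() is called here, and errno is preserved.
void forward_signal(int signo) {
    int saved_errno = errno;
    unsigned char byte = static_cast<unsigned char>(signo);
    ssize_t ignored = write(g_signal_pipe[1], &byte, 1);
    (void)ignored;
    errno = saved_errno;
}

// Creates the pipe and installs the handler for one signal.
bool setup_signal_pipe(int signo) {
    if (pipe(g_signal_pipe) != 0) return false;
    // Non-blocking write end, so the handler can never block on a full pipe.
    int flags = fcntl(g_signal_pipe[1], F_GETFL);
    if (flags == -1) return false;
    if (fcntl(g_signal_pipe[1], F_SETFL, flags | O_NONBLOCK) == -1) return false;
    struct sigaction sa{};
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Runs in a normal thread (or the main loop): blocks until the handler has
// forwarded a signal, then formats a log line. localtime_r() is fine here
// because this code does not run inside a signal handler.
std::string read_and_format_event() {
    unsigned char byte = 0;
    ssize_t n;
    do {
        n = read(g_signal_pipe[0], &byte, 1);
    } while (n == -1 && errno == EINTR);
    if (n != 1) return std::string();
    std::time_t now = std::time(nullptr);
    std::tm tm_buf{};
    localtime_r(&now, &tm_buf);
    char stamp[32];
    std::strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", &tm_buf);
    return std::string(stamp) + " signal " + std::to_string(static_cast<int>(byte));
}
```

In a daemon like the one in the question, the read end could also be added to the existing select()/fd_set loop instead of being read by a dedicated thread.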
#4 0x000000fff516756c in LogClass::logBegin() () from /usr/sbin/sajet/sharedobj/liblibLite.so
#16 0x000000fff516756c in LogClass::logBegin() () from /usr/sbin/sajet/sharedobj/liblibLite.so
Notice that LogClass::logBegin appears in your call stack twice?
You have two core problems.
You are calling localtime from a signal handler. That is not allowed.
You are looking at the wrong thread's stack. This thread got stuck in a deadlock caused by other threads and then was interrupted, getting stuck in the deadlock again. It's a victim of the deadlock (twice!). The perpetrator is another thread.
If you're going to log from a signal handler, you need very tight control over all the code that runs in the signal handler, and you need to pass the "heavy lifting" off to another thread that can safely call non-reentrant functions.
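One simple way to keep the handler under tight control is to let it set a flag and nothing else, and to do the restart and logging from the regular event loop. A minimal sketch under that assumption follows; the names are hypothetical, and a real SIGCHLD handler would typically be paired with a waitpid() loop in the main code, because several signals can collapse into one flag.

```cpp
#include <signal.h>

// Set by the handler, cleared by the main loop. sig_atomic_t is the type
// the C and C++ standards allow a signal handler to write.
volatile sig_atomic_t g_event_pending = 0;

// Signal handler: no logging, no locks, no allocation.
void mark_event(int) {
    g_event_pending = 1;
}

// Installs the flag-setting handler for the given signal.
bool install_flag_handler(int signo) {
    struct sigaction sa{};
    sa.sa_handler = mark_event;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Called from the main loop, outside any signal handler. Returns true once
// per batch of signals; the caller then does the heavy work (restarting the
// daemon, calling localtime_r(), writing the log).
bool take_pending_event() {
    if (g_event_pending == 0) return false;
    g_event_pending = 0;
    return true;
}
```

The main loop would call take_pending_event() on each iteration and run the restart and logging code only when it returns true.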

Core dump in zmq library in a multi-threaded application with optimized binary

This core dump in the zmq library happened in the field (not reproducible yet) with an optimized binary.
#0 0x00007f44a00801f7 in raise () from /lib64/libc.so.6
#1 0x00007f44a00818e8 in abort () from /lib64/libc.so.6
#2 0x00007f44a1f74759 in zmq::zmq_abort(char const*) () from /lib64/libzmq.so.5
#3 0x00007f44a1fa410d in zmq::tcp_write(int, void const*, unsigned long) () from /lib64/libzmq.so.5
#4 0x00007f44a1f9f417 in zmq::stream_engine_t::out_event() () from /lib64/libzmq.so.5
#5 0x00007f44a1f7437a in zmq::epoll_t::loop() () from /lib64/libzmq.so.5
#6 0x00007f44a1fa83a6 in thread_routine () from /lib64/libzmq.so.5
#7 0x00007f44a1b2ce25 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f44a014334d in clone () from /lib64/libc.so.6
While I am analyzing my application code, hoping to find some misuse of zmq (probably the same zmq socket being used by two different threads, or some other memory corruption), I would like to know what else I can get from this core dump.
For a start, I can see a total of 102 threads running at dump time. Many of them are in epoll_wait:
#0 0x00007f44a0143923 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f44a1f74309 in zmq::epoll_t::loop() () from /lib64/libzmq.so.5
#2 0x00007f44a1fa83a6 in thread_routine () from /lib64/libzmq.so.5
#3 0x00007f44a1b2ce25 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f44a014334d in clone () from /lib64/libc.so.6
The other threads, which point to application code, do not look suspicious yet.
The errno printed is 14 = EFAULT (Bad address).
Can I get anything from the disassembly? I have not debugged much disassembly in the past, but if it can give me any clue in this situation, I am willing to dive in.
Any (other) advice or pointers would also be highly appreciated.
Thanks.
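For context on the errno value: EFAULT from a write-style system call means the kernel could not read from the buffer address it was given. The sketch below reproduces that on a plain pipe. It is not zmq code, the function name is hypothetical, and it only assumes that zmq::tcp_write ultimately hands a buffer pointer to a similar system call.

```cpp
#include <cerrno>
#include <cstdint>
#include <unistd.h>

// Returns the errno set by write() when it is given an unmapped buffer
// address, or -1 if the pipe could not be created.
int errno_for_bad_buffer() {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    // Address 1 lies in the unmapped first page. volatile keeps the compiler
    // from reasoning about the bogus pointer at compile time.
    volatile std::uintptr_t bogus_address = 1;
    const void* bogus = reinterpret_cast<const void*>(bogus_address);
    errno = 0;
    ssize_t n = write(fds[1], bogus, 16);
    int err = (n == -1) ? errno : 0;
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

If the same thing happened in the crashed process, the buffer passed down to the send call was already invalid at that point (freed, unmapped, or overwritten), which would be consistent with the memory corruption suspected above.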

How to determine reason of pthread_raise(sig=6) in core file with gdb

My app crashes sometimes and I can't find the cause. My app is multithreaded (QThread) and uses several QUdpSockets. I think it happens due to simultaneous access to a socket, but I don't know when and where.
Here are the results of bt on the core file:
#0 0x414596e1 in ?? ()
#1 0x412d731b in pthread_kill (thread=1649, signo=6) at signals.c:69
#2 0x412d76a0 in __pthread_raise (sig=6) at signals.c:200
#3 0x41459395 in ?? ()
#4 0x00000006 in ?? ()
#5 0x41546ff4 in ?? ()
#6 0xbd5fd8bc in ?? ()
#7 0x4145a87d in ?? ()
#8 0x00000006 in ?? ()
#9 0x00000020 in ?? ()
#10 0x00000000 in ?? ()
What is sig=6 and when is it emitted?
How can I determine the reason for this behavior?
How do I know which -dev libraries are missing (the ?? positions in the stack)?
Signal number 6 on Linux is SIGABRT. The fact that it's being raised with __pthread_raise() seems to indicate that the application has directly called abort() or hit a failed assert().
It's likely that the missing parts of your backtrace are in the Qt libraries, so try installing the debugging symbols for all of those.
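The link between abort() and signal 6 can be checked with a small experiment. The sketch below (hypothetical function name, POSIX only) forks a child that calls abort() and reports which signal terminated it.

```cpp
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that calls abort() and returns the number of the signal
// that killed it, 0 if it was not killed by a signal, or -1 on error.
int signal_from_abort_in_child() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        std::abort();  // child: raises SIGABRT against itself
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On Linux the returned value is SIGABRT, whose number is 6, matching the signo=6 seen in the backtrace.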

OpenSSL crashes while calling SSL_new() Library function

I am working with the OpenSSL library. When I run the project, it crashes at this line of the source code:
m_pSslFd = SSL_new(m_pCtx);
The declaration and initialization are correct. Execution works fine the first time this library function is called, but it crashes the second time it is called.
Here is the gdb backtrace for this crash:
(gdb) bt
#0 0x0000003dee876285 in malloc_consolidate () from /lib64/libc.so.6
#1 0x0000003dee879415 in _int_malloc () from /lib64/libc.so.6
#2 0x0000003dee87a9a1 in malloc () from /lib64/libc.so.6
#3 0x00000032c1c6abee in CRYPTO_malloc () from /usr/lib64/libcrypto.so.10
#4 0x00000032c202986a in ssl3_new () from /usr/lib64/libssl.so.10
#5 0x00000032c203bfae in dtls1_new () from /usr/lib64/libssl.so.10
#6 0x00000032c204534c in SSL_new () from /usr/lib64/libssl.so.10
#7 0x00007ffff7882bf7 in DTLSCore::DoDTLSClientNegotiation (this=0x858940, iFd=#0x7fff635fd3bc, speer=...)at src/afg/DTLSCore.cpp:236
Any suggestions would be helpful. Thank you.

Qt 5.2.0; Debian Wheezy: QSocketNotifier::type() segfault

I have a multithreaded app that uses QThreadPool. It crashes after a random amount of time (sometimes minutes, sometimes hours...) with a segfault. I recompiled with debugging symbols and ran through GDB. Here's the backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7231170 in QSocketNotifier::type() const ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
(gdb) where
#0 0x00007ffff7231170 in QSocketNotifier::type() const ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#1 0x00007ffff724b732 in ?? () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#2 0x00007ffff51d713b in g_main_context_check ()
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff51d75c2 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4 0x00007ffff51d7744 in g_main_context_iteration ()
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007ffff724c023 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#6 0x00007ffff71fa2cb in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#7 0x00007ffff71fe33e in QCoreApplication::exec() ()
from /opt/Qt/5.2.0/gcc_64/lib/libQt5Core.so.5
#8 0x0000000000409bb9 in main (argc=1, argv=<optimized out>) at main.cpp:166
That's the complete backtrace. It references/mentions basically no code within the app itself; it all appears to be Qt library code causing the fault. Not sure what source from the app itself to include in this post since GDB does not reference anything within the app itself. Any ideas?