Crash due to SIGABRT on Linux C++ PowerPC - c++

My program crashes in a string assignment. I cannot pin down the exact cause. Multiple threads execute the same code.
This is my code:
char* cTemp = new char[5];
memset(cTemp,'\0', 5);
snprintf(cTemp , 5 , "%04x" , iParameter);
string sVar1 = cTemp;
delete[] cTemp;
if(sVar1 == "0")
sVar1 = "0000";
pSharedLib->setVar1(sVar1);
The set function (in the shared library):
bool A::setVar1(CString& temp)
{
m_sVar1= temp;
return true;
}
The backtrace from the crash shows:
#0 0x48194444 in raise () from /lib/libc.so.6
No symbol table info available.
#1 0x48199694 in abort () from /lib/libc.so.6
No symbol table info available.
#2 0x481d4ecc in ?? () from /lib/libc.so.6
No symbol table info available.
#3 0x481e14d4 in ?? () from /lib/libc.so.6
No symbol table info available.
#4 0x481e32b0 in free () from /lib/libc.so.6
No symbol table info available.
#5 0x480df8b8 in operator delete(void*) () from /usr/lib/libstdc++.so.6
No symbol table info available.
#6 0x480b136c in std::string::_Rep::_M_destroy(std::allocator<char> const&)
() from /usr/lib/libstdc++.so.6
No symbol table info available.
#7 0x480b35f4 in std::string::assign(std::string const&) ()
from /usr/lib/libstdc++.so.6
No symbol table info available.

I don't see any synchronization object protecting the assignment to m_sVar1. You mentioned that setVar1 can be called from multiple threads simultaneously. The threading guarantees of the standard library do not make assignment to the same string object from multiple threads safe.

I suspect the key to this problem is
Multiple threads execute the same code.
If there's a single string m_sVar1, and multiple threads are assigning to it simultaneously, then the chances are rather good that a race condition will lead to corruption. You need to properly protect that variable with a critical section.
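As a minimal sketch of that protection, assuming a C++11 toolchain with std::mutex is available (the original platform may need pthread_mutex_t instead) and assuming CString behaves like std::string, the setter could hold a lock for the duration of the assignment. The class below is a hypothetical stand-in for A, and getVar1 and stressSetVar1 are names invented for illustration:

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the asker's class A; only the locking pattern matters.
class A {
public:
    // Hold the lock for the whole assignment so two writers never overlap.
    bool setVar1(const std::string& temp) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_sVar1 = temp;
        return true;
    }

    // Return a copy made under the lock so readers never see a half-written string.
    std::string getVar1() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_sVar1;
    }

private:
    mutable std::mutex m_mutex;
    std::string m_sVar1;
};

// Assign to one shared object from several threads and return the final value.
std::string stressSetVar1(int threadCount, int iterations) {
    A shared;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&shared, iterations] {
            for (int i = 0; i < iterations; ++i) {
                shared.setVar1("0000");
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared.getVar1();
}
```

Every reader of m_sVar1 has to take the same lock; protecting only the writer still leaves a race.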

Related

Core dump in zmq library in a multi-threaded application with optimized binary

This core dump in the zmq library happened in the field (not reproducible yet) with an optimized binary.
#0 0x00007f44a00801f7 in raise () from /lib64/libc.so.6
#1 0x00007f44a00818e8 in abort () from /lib64/libc.so.6
#2 0x00007f44a1f74759 in zmq::zmq_abort(char const*) () from /lib64/libzmq.so.5
#3 0x00007f44a1fa410d in zmq::tcp_write(int, void const*, unsigned long) () from /lib64/libzmq.so.5
#4 0x00007f44a1f9f417 in zmq::stream_engine_t::out_event() () from /lib64/libzmq.so.5
#5 0x00007f44a1f7437a in zmq::epoll_t::loop() () from /lib64/libzmq.so.5
#6 0x00007f44a1fa83a6 in thread_routine () from /lib64/libzmq.so.5
#7 0x00007f44a1b2ce25 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f44a014334d in clone () from /lib64/libc.so.6
While I am analyzing my application code and hoping to find some misuse of zmq (probably the same zmq socket used by two different threads, or some other memory corruption), I would like to know what else I can get from this core dump.
For a start, I can see a total of 102 threads running at the time of the dump. Many of them are in epoll_wait.
#0 0x00007f44a0143923 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f44a1f74309 in zmq::epoll_t::loop() () from /lib64/libzmq.so.5
#2 0x00007f44a1fa83a6 in thread_routine () from /lib64/libzmq.so.5
#3 0x00007f44a1b2ce25 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f44a014334d in clone () from /lib64/libc.so.6
The other threads pointing to application code do not look suspicious yet.
The errno printed is 14 = EFAULT (Bad address).
Can I get anything from the disassembly? I have not debugged much disassembly in the past, but if it can give me any clue in this situation, I am willing to dig in.
Any other advice or pointers will also be highly appreciated.
Thanks.

How to determine reason of pthread_raise(sig=6) in core file with gdb

My app crashes sometimes and I can't find the cause. My app is multithreaded (QThread) and uses several QUdpSockets. I think it happens due to simultaneous access to a socket, but I don't know when and where.
Here are the results of bt from the core file:
#0 0x414596e1 in ?? ()
#1 0x412d731b in pthread_kill (thread=1649, signo=6) at signals.c:69
#2 0x412d76a0 in __pthread_raise (sig=6) at signals.c:200
#3 0x41459395 in ?? ()
#4 0x00000006 in ?? ()
#5 0x41546ff4 in ?? ()
#6 0xbd5fd8bc in ?? ()
#7 0x4145a87d in ?? ()
#8 0x00000006 in ?? ()
#9 0x00000020 in ?? ()
#10 0x00000000 in ?? ()
What is sig=6 and when is it emitted?
How can I determine the reason for this behavior?
How do I know which -dev libraries are missing (the ?? positions in the stack)?
Signal number 6 on Linux is SIGABRT. The fact that it is being raised with pthread_raise() seems to indicate that the application has called abort() directly or has failed an assert().
It's likely that the missing parts of your backtrace are in the Qt libraries, so try installing the debugging symbols for all of those.
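To confirm which signal number is involved, a small sketch like the following can help. It installs a handler for SIGABRT and raises the signal with raise(), which returns after the handler has run, whereas abort() would still terminate the process. The names recordSignal and raiseAndCaptureAbort are invented for this illustration:

```cpp
#include <cassert>
#include <csignal>

// Written by the handler; volatile sig_atomic_t is the type that is safe to set there.
static volatile std::sig_atomic_t g_lastSignal = 0;

extern "C" void recordSignal(int sig) {
    g_lastSignal = sig;
}

// Installs a handler for SIGABRT, raises the signal, and returns the number seen.
int raiseAndCaptureAbort() {
    g_lastSignal = 0;
    std::signal(SIGABRT, recordSignal);
    std::raise(SIGABRT);            // the handler runs before raise() returns
    std::signal(SIGABRT, SIG_DFL);  // restore the default action
    return g_lastSignal;
}
```

On Linux the returned value is 6, matching the sig=6 in the backtrace.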

SIGSEGV on program exit with boost::log

Some time ago we split our big project, which consisted almost entirely of static libraries, into many projects with dynamic libraries.
Since then we have started seeing problems on shutdown.
Sometimes the process would not terminate. With gdb I found that a segfault occurs during object destruction, but the process is blocked in futex_wait.
I've since improved the code: global objects are now created inside functions instead of as global static data. That reduced the problem: it doesn't happen in my development environment anymore.
However, in the test environment (rarely) and in the production environment (often), processes still get stuck on shutdown, so we need to restart the container manually or have some kind of health check.
We are trying to reproduce this situation in a standalone Docker container running under Kubernetes, where the process runs under circusd, and we see the following:
#0 malloc_consolidate (av=0xf47fc400 <main_arena>) at malloc.c:4151
#1 0xf46ff1ab in _int_free (av=0xf47fc400 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4057
#2 0xf48c6e68 in operator delete(void*) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#3 0xf52d173d in std::_Deque_base<boost::log::v2_mt_posix::record_view, std::allocator<boost::log::v2_mt_posix::record_view> >::~_Deque_base() () from /usr/local/lib/liblog.so.0
#4 0xf52d18b3 in std::deque<boost::log::v2_mt_posix::record_view, std::allocator<boost::log::v2_mt_posix::record_view> >::~deque() () from /usr/local/lib/liblog.so.0
#5 0xf52d1940 in boost::log::v2_mt_posix::sinks::bounded_fifo_queue<4000u, boost::log::v2_mt_posix::sinks::drop_on_overflow>::~bounded_fifo_queue() () from /usr/local/lib/liblog.so.0
#6 0xf52d462e in boost::log::v2_mt_posix::sinks::asynchronous_sink<cout_sink, boost::log::v2_mt_posix::sinks::bounded_fifo_queue<4000u, boost::log::v2_mt_posix::sinks::drop_on_overflow>
>::~asynchronous_sink() () from /usr/local/lib/liblog.so.0
#7 0xf52d47f4 in asynchronous_sink<cout_sink>::~asynchronous_sink() () from /usr/local/lib/liblog.so.0
#8 0xf52c199a in boost::detail::sp_counted_impl_pd<asynchronous_sink<cout_sink>*, boost::detail::sp_ms_deleter<asynchronous_sink<cout_sink> >
>::dispose() () from /usr/local/lib/liblog.so.0
#9 0xf51f3e7b in boost::log::v2_mt_posix::core::~core() () from /usr/lib/libboost_log.so.1.58.0
#10 0xf51f6529 in boost::detail::sp_counted_impl_p<boost::log::v2_mt_posix::core>::dispose() () from /usr/lib/libboost_log.so.1.58.0
#11 0xf51f6160 in boost::shared_ptr<boost::log::v2_mt_posix::core>::~shared_ptr() () from /usr/lib/libboost_log.so.1.58.0
#12 0xf46bcfb3 in __cxa_finalize (d=0xf526fa88) at cxa_finalize.c:56
#13 0xf51eaab3 in ?? () from /usr/lib/libboost_log.so.1.58.0
#14 0xf7769e2c in _dl_fini () at dl-fini.c:252
#15 0xf46bcc21 in __run_exit_handlers (status=status@entry=0, listp=0xf47fc3a4 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#16 0xf46bcc7d in __GI_exit (status=0) at exit.c:104
#17 0xf46a572b in __libc_start_main (main=0x8060dc0, argc=5, argv=0xffdd1514, init=0x8088090, fini=0x8088100, rtld_fini=0xf7769c50 <_dl_fini>, stack_end=0xffdd150c) at libc-start.c:321
#18 0x080630cc in ?? ()
I have no idea how to progress from here. What is happening? Why do we get the segfault during boost::log::core destruction in this environment?
Does anyone have advice, perhaps based on experience, on how I can find the cause?
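The change described in this question, creating global objects inside functions instead of as global static data, is usually written as a function-local static. The sketch below shows that pattern together with a variant that never runs the destructor, which some projects use to avoid destruction-order problems at exit. Registry, registry() and leakyRegistry() are hypothetical names, and whether either variant helps with the Boost.Log crash above is an open assumption:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type standing in for whatever global object the project uses.
struct Registry {
    std::vector<std::string> names;
};

// Construct on first use: the object is created the first time the function is
// called (thread-safe since C++11) rather than at library load time.
Registry& registry() {
    static Registry instance;
    return instance;
}

// Variant that never runs the destructor, so no code can touch a destroyed
// object during exit. The memory is intentionally left to the OS to reclaim.
Registry& leakyRegistry() {
    static Registry* instance = new Registry();
    return *instance;
}
```

The first variant still destroys the object during exit handling, in reverse order of construction, so other code running at that point must not use it afterwards.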

OpenSSL crashes while calling SSL_new() Library function

I am working with the OpenSSL library. When I run the project, it crashes at this line of the source code:
m_pSslFd = SSL_new(m_pCtx);
The declaration and initialization are correct. Execution works fine the first time this library function is called, but it crashes the second time it is called.
Here is the gdb backtrace for the crash:
(gdb) bt
#0 0x0000003dee876285 in malloc_consolidate () from /lib64/libc.so.6
#1 0x0000003dee879415 in _int_malloc () from /lib64/libc.so.6
#2 0x0000003dee87a9a1 in malloc () from /lib64/libc.so.6
#3 0x00000032c1c6abee in CRYPTO_malloc () from /usr/lib64/libcrypto.so.10
#4 0x00000032c202986a in ssl3_new () from /usr/lib64/libssl.so.10
#5 0x00000032c203bfae in dtls1_new () from /usr/lib64/libssl.so.10
#6 0x00000032c204534c in SSL_new () from /usr/lib64/libssl.so.10
#7 0x00007ffff7882bf7 in DTLSCore::DoDTLSClientNegotiation (this=0x858940, iFd=@0x7fff635fd3bc, speer=...) at src/afg/DTLSCore.cpp:236
Any suggestion will be helpful for me. Thank You.

gdb - get exactly unexpected error from core file

What exactly do I need to do to get the unexpected error code, or something similar to it, from the core file with GDB or some other tool, so that I can understand why my daemon died in operator new?
(gdb) bt
#0 0x48775bd7 in thr_kill () from /lib/libc.so.7
#1 0x48726f46 in pthread_kill () from /lib/libthr.so.3
#2 0x487245da in raise () from /lib/libthr.so.3
#3 0x4880abba in abort () from /lib/libc.so.7
#4 0x4866e65f in __gnu_cxx::__verbose_terminate_handler ()
from /usr/lib/libstdc++.so.6
#5 0x486729aa in std::set_unexpected () from /usr/lib/libstdc++.so.6
#6 0x486729f2 in std::terminate () from /usr/lib/libstdc++.so.6
#7 0x486728ea in __cxa_throw () from /usr/lib/libstdc++.so.6
#8 0x486c77ac in operator new () from /usr/lib/libstdc++.so.6
#9 0x0806ad4c in XXX::process_in (this=0x4b110d40,
map_settings_to_save=@0x7f7fcc98, str_answer=@0x7f7fcf84)
at Click.cpp:2940
Go to line 2940 of Click.cpp; you should find that someone is instantiating a new object. There was some error in the constructor.
From the looks of it, the heap in your application is trashed.
If the program used to work, and you changed something before this, potentially something that could have damaged the heap, check that carefully.
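In the backtrace, operator new calls __cxa_throw, which ends in std::terminate. That is the path taken when an exception is thrown and no handler is found for it. The thrown exception is most likely std::bad_alloc, though the core file alone does not confirm that. The sketch below simulates a failing allocation with a hypothetical type AlwaysFails and shows the exception being caught at the call site instead of terminating the process:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical type whose allocation always fails, to simulate memory exhaustion.
struct AlwaysFails {
    static void* operator new(std::size_t) { throw std::bad_alloc(); }
    static void operator delete(void* p) noexcept { ::operator delete(p); }
    int value = 0;
};

// Attempts the allocation and reports the failure instead of letting it escape.
std::string tryAllocate() {
    try {
        AlwaysFails* p = new AlwaysFails();
        delete p;
        return "allocated";
    } catch (const std::bad_alloc&) {
        return "caught std::bad_alloc";
    }
}
```

Catching the exception only shows where the allocation failed. If the heap is corrupted, as suggested above, the underlying bug still has to be found, for example with a memory checker.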