I have a backtrace with something I haven't seen before. See frame 2 in these threads:
Thread 31 (process 8752):
#0 0x00faa410 in __kernel_vsyscall ()
#1 0x00b0b139 in sigprocmask () from /lib/libc.so.6
#2 0x00b0c7a2 in abort () from /lib/libc.so.6
#3 0x00752aa0 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#4 0x00750505 in ?? () from /usr/lib/libstdc++.so.6
#5 0x00750542 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0x00750c65 in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#7 0x00299c63 in ApplicationFunction()
Thread 1 (process 8749):
#0 0x00faa410 in __kernel_vsyscall ()
#1 0x00b0ad80 in raise () from /lib/libc.so.6
#2 0x00b0c691 in abort () from /lib/libc.so.6
#3 0x00b4324b in __libc_message () from /lib/libc.so.6
#4 0x00b495b6 in malloc_consolidate () from /lib/libc.so.6
#5 0x00b4b3bd in _int_malloc () from /lib/libc.so.6
#6 0x00b4d3ab in malloc () from /lib/libc.so.6
#7 0x08147f03 in AnotherApplicationFunction ()
When opening it with gdb and getting backtrace it gives me thread 1. Later I saw the weird state that thread 31 is in. This thread is from the library that we had problems with so I'd believe the crash is caused by it.
So what does it mean? Two threads simultaneously doing something illegal? Or it's one of them, causing somehow abort() in the other one?
The OS is Linux Red Hat Enterprise 5.3, it's a multiprocessor server.
It is hard to be sure, but my first suspicion upon seeing these stack traces would be a memory corruption (possibly a buffer overrun on the heap). If that's the case, then the corruption is probably the root cause of both threads ending up in abort.
Can you valgrind your app?
Looks like it could be heap corruption, detected by malloc in thread 1, causing or caused by the error in thread 31.
Some broken piece of code overwriting a.o. the vtable in thread 31 could easily cause this.
It's possible that the reason thread 31 aborted is because it trashed the application heap in some way. Then when the main thread tried to allocate memory the heap data structure was in a bad state, causing the allocation to fail and abort the application again.
Related
I have used 2 threads, but they are getting stuck with following stack trace:
Thread 2:
(gdb) bt
#0 0x00007f9e1d7625bc in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x00007f9e1d6deb35 in _L_lock_17166 () from /lib64/libc.so.6
#2 0x00007f9e1d6dbb73 in malloc () from /lib64/libc.so.6
#3 0x00007f9e1d6c4bad in __fopen_internal () from /lib64/libc.so.6
#4 0x00007f9e1dda2210 in std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) () from /lib64/libstdc++.so.6
#5 0x00007f9e1dddd5ba in std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) () from /lib64/libstdc++.so.6
#6 0x00000000005e1244 in fatalSignalHandler(int, siginfo*, void*) ()
#7 <signal handler called>
#8 0x00007f9e1d6d6839 in malloc_consolidate () from /lib64/libc.so.6
#9 0x00007f9e1d6d759e in _int_free () from /lib64/libc.so.6
_int_free is getting called as a result of default destructor.
Thread 1:
(gdb) bt
#0 0x00007f9e2a4ed54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9e2a4e8e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2 0x00007f9e2a4e8d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
Via Threads getting stuck with few threads at point "in __lll_lock_wait" I get to know that __lll_lock_wait() is called if we are not able to get a lock on the mutex, since something else (In this case I guess the Thread 2) is still locking it.
But Thread 2 is also stuck with given stack trace, and since they are not with debug symbols, I can't check who is the owner of the mutex. So my questions are:
What is the use of / cause of __lll_lock_wait_private ()
Is there any hint what and where could the issue be? Without availability of debug symbols.
Several times I have seen hang in case of malloc_consolidate() on linux.. Is this a well known and yet to be solved issue?
Frames 6 and 7 of thread 2 suggest a custom signal handler was installed. Frame 5 suggests it is trying to do something like write to a file (std::ofstream?).
That is not allowed. Very little is allowed in signal handlers, and definitely not iostreams.
Suppose you are in a function like malloc_consolidate which may have to touch the global arena, and take a lock to do it, and a signal comes along. If you allocate memory in the signal handler, you also need the same lock, which is already being held. Thread 2 is deadlocking itself.
I'm writing c++11 multi-threaded application.
The main thread is reading from database and puts records in std::queue, threads are taking records from queue and process them.
Application is synchronised using std::mutex, std::condition_variable (defined as class members), methods are using std::unique_lock(class member mutex)
After some time (usually few minutes) - my application crashes with
terminate called after throwing an instance of 'std::system_error'
what(): Operation not permitted
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff5464700 (LWP 10242)]
0x00007ffff60b3515 in raise () from /lib64/libc.so.6
backtrace from gdb shows:
#0 0x00007ffff60b3515 in raise () from /lib64/libc.so.6
#1 0x00007ffff60b498b in abort () from /lib64/libc.so.6
#2 0x00007ffff699f765 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6
#3 0x00007ffff699d906 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6
#4 0x00007ffff699d933 in std::terminate() () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6
#5 0x00007ffff69f0a75 in ?? () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.8.3/libstdc++.so.6
#6 0x00007ffff76741a7 in start_thread () from /lib64/libpthread.so.0
#7 0x00007ffff616a1fd in clone () from /lib64/libc.so.6
How can I get more information about this exception?
I am compiling and linking with -pthread option
G++ 4.8.3 at Gentoo Linux machine
-g option is enabled in both compiler and linker
I tried disabling optimization
I'm having problems with C++ code loaded via dlopen() by a C++ CGI server. After a while, the program crashes unexpectedly, but consistently at memory management function call (such as free(), calloc(), etc.) and produces core dump similar to this:
#0 0x0000000806b252dc in kill () from /lib/libc.so.6
#1 0x0000000804a1861e in raise () from /lib/libpthread.so.2
#2 0x0000000806b2416d in abort () from /lib/libc.so.6
#3 0x0000000806abdb45 in _UTF8_init () from /lib/libc.so.6
#4 0x0000000806abdfcc in _UTF8_init () from /lib/libc.so.6
#5 0x0000000806abeb1d in _UTF8_init () from /lib/libc.so.6
... the rest of the stack
Has anyone seen something like this before?
What is _UTF8_init() and why would memory management functions call it?
That smells like a corrupted heap, likely due to a buffer overrun somewhere in your code. Try running your program with Valgrind and look for any errors or warnings it emits.
C++ program is throwing this:
terminate called after throwing an instance of 'St9bad_alloc' what(): std::bad_alloc
Which it appears to be thrown from new, but the stack trace doesn't show any calls to new:
#0 0x0000003174a330c5 in raise () from /lib64/libc.so.6
#1 0x0000003174a34a76 in abort () from /lib64/libc.so.6
#2 0x00007f93b1b7b0b4 in __gnu_cxx::__verbose_terminate_handler ()
at ../../../../gcc-4.3.4/libstdc++-v3/libsupc++/vterminate.cc:98
#3 0x00007f93b1b794f6 in __cxxabiv1::__terminate (handler=0x522b)
at ../../../../gcc-4.3.4/libstdc++-v3/libsupc++/eh_terminate.cc:43
#4 0x00007f93b1b79523 in std::terminate ()
at ../../../../gcc-4.3.4/libstdc++-v3/libsupc++/eh_terminate.cc:53
#5 0x00007f93b1b79536 in __cxxabiv1::__unexpected (handler=0x522b)
at ../../../../gcc-4.3.4/libstdc++-v3/libsupc++/eh_terminate.cc:59
#6 0x00007f93b1b78ec8 in __cxxabiv1::__cxa_call_unexpected (exc_obj_in=0x7f93b1dae770)
at ../../../../gcc-4.3.4/libstdc++-v3/libsupc++/eh_personality.cc:750
#7 0x00007f93b2c356e0 in network::HttpLoader::doLoad (this=0x7f938801ef20) at loaders/HttpLoader.cxx:1071
#8 0x00007f93b2c70971 in network::Loader::load (this=0x522b) at Loader.cxx:899
#9 0x00007f93b2c74a15 in network::Loader::load2 (this=0x522b) at Loader.cxx:925
#10 0x00007f93b2c7b13a in network::LoaderThread::run() ()
#11 0x00007f93b1e60be4 in threads::Thread_startWorker (thr=0x7f938801e460) at Threads.cxx:479
#12 0x00007f93b1e60ead in threads::ThreadPool::run (this=0x1140478, thr=0x7f938801eeb0) at Threads.cxx:727
#13 0x00007f93b1e608e8 in threads::__Thread_startWorker (param=<value optimized out>) at Threads.cxx:520
#14 0x0000003175206ccb in start_thread () from /lib64/libpthread.so.0
#15 0x0000003174ae0c2d in clone () from /lib64/libc.so.6
Added debugging statements at the beginning of doLoad(), but it never gets to that point.
Stumped!
Any thoughts?
The new call may not be in the stack because it has already unwound at the point your application is terminating. I'd try to set a breakpoint at the moment the exception is thrown (e.g., using catch throw under gdb) -- at that point you'll see the cause of the exception in the stack.
Maby the new call was inlined due to optimizations. Try to disable optimizations, at least in loaders/HttpLoader.cxx.
I have a program that brings up and tears down multiple threads throughout its life. Everything works great for awhile, but eventually, I get the following core dump stack trace.
#0 0x009887a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x007617a5 in raise () from /lib/tls/libc.so.6
#2 0x00763209 in abort () from /lib/tls/libc.so.6
#3 0x003ec1bb in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#4 0x003e9ed1 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#5 0x003e9f06 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0x003ea04f in __cxa_throw () from /usr/lib/libstdc++.so.6
#7 0x00d5562b in boost::thread::start_thread () from /h/Program/bin/../lib/libboost_thread-gcc34-mt-1_39.so.1.39.0
At first, I was leaking threads, and figured the core was due to hitting some maximum limit of number of current threads, but now it seems that this problems occurs even when I don't. For reference, in the core above there were 13 active threads executing.
I did some searching to try and figure out why start_thread would core, but I didn't come across anything. Anyone have any ideas?
start_thread is throwing an uncaught exception, see which exceptions can start_thread throw and place a catch around it to see what is the problem.
What are the values carried by thread_resource_error? It looks like you can call native_error() to find out.
Since this is a wrapper around pthreads there are only a couple of possibilities - EAGAIN, EINVAL and EPERM. It looks as if boost has exceptions it would likely throw for EINVAL and EPERM - i.e. unsupported_thread_option() and thread_permission_error().
That pretty much leaves EAGAIN so I would double check that you really aren't exceeding the system limits on the number of threads. You are sure you are joining them, or if detached, they are really gone?