We are facing a problem in C++ code. When an exception is thrown, the
process is terminated, even though my code has proper exception
handling.
The core stack is below.
======================
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00489915 in raise () from /lib/tls/libc.so.6
#2 0x0048b379 in abort () from /lib/tls/libc.so.6
#3 0xf6a1fbdb in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#4 0xf6a1d8f1 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#5 0xf6a1d926 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0xf6a1da6f in __cxa_throw () from /usr/lib/libstdc++.so.6
Some forums suggest that this can occur during stack unwinding, or when an exception is raised while another exception is being handled.
Can you please suggest how to fix the problem?
Terminate can be called if:
An exception is thrown while another exception is propagating
An exception is thrown that violates an exception specification
An exception is thrown and never caught
But your call stack also contains a call to unexpected: __cxa_call_unexpected ()
To me this is an indication that an exception specification has been violated.
This calls unexpected() which by default calls terminate().
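A minimal sketch of how a function with a no-throw promise can keep that promise, assuming the failing call is somewhere below it. The names lower_layer and guarded_entry are hypothetical, and the sketch uses noexcept rather than the older throw() specification (since C++17 the two mean the same thing). With noexcept a violation calls std::terminate() directly instead of going through unexpected().

```cpp
#include <stdexcept>
#include <string>

// Hypothetical lower-layer call that fails by throwing.
void lower_layer() {
    throw std::runtime_error("lower layer failed");
}

// The function promises not to throw, so every exception is caught
// inside it and reported through the return value instead.
bool guarded_entry(std::string& error) noexcept {
    try {
        lower_layer();
        return true;
    } catch (const std::exception& e) {
        error = e.what();  // note: this assignment could itself throw std::bad_alloc
    } catch (...) {
        error = "unknown exception";
    }
    return false;
}
```

Calling guarded_entry(err) returns false and leaves the message in err, rather than letting the exception reach the function boundary.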
Are you throwing an exception in any of your object destructors? A call to terminate generally happens in *nix systems when a new exception is thrown before the current exception is caught.
When an exception gets thrown the objects in your stack frame get destroyed as the stack unwinds. If any of these objects were to throw an exception during destruction, it confuses the exception handling mechanism as it has now two exceptions to catch, the original exception and the new one which gets thrown during the destruction of the object on the stack. So it issues a terminate and aborts.
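The situation described above can be avoided by never letting an exception leave a destructor. The following is only a sketch under that assumption; Connection, close and run_and_report are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical resource whose cleanup step can fail.
class Connection {
public:
    ~Connection() {
        try {
            close();
        } catch (const std::exception&) {
            // Swallow (or log) the error: letting it escape while another
            // exception is propagating would call std::terminate().
        }
    }

private:
    void close() { throw std::runtime_error("close failed"); }
};

// Throws while a Connection is alive, so its destructor runs during unwinding.
std::string run_and_report() {
    std::string seen = "no exception";
    try {
        Connection c;
        throw std::logic_error("original error");
    } catch (const std::exception& e) {
        seen = e.what();
    }
    return seen;
}
```

Because the destructor handles its own failure, the original exception still reaches the catch block and the process is not terminated.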
This can also happen if memory cannot be allocated for a dependent exception object that would be thrown. (I imagine this is rare, but it is a possible reason the app would terminate abnormally.)
Related
I have a small to medium size application which combines Fortran and C++. The main program is written in Fortran, but one module is in C++. This module returns pointers to class objects which are stored on the Fortran side. During the creation of one of these pointers the system throws the following error:
malloc(): memory corruption
Thread 1 "bc_test" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff4a60801 in __GI_abort () at abort.c:79
#2 0x00007ffff4aa9897 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7ffff4bd6b9a "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff4ab090a in malloc_printerr (
str=str@entry=0x7ffff4bd4e0e "malloc(): memory corruption") at malloc.c:5350
#4 0x00007ffff4ab4994 in _int_malloc (av=av@entry=0x7ffff4e0bc40 <main_arena>,
bytes=bytes@entry=44) at malloc.c:3738
#5 0x00007ffff4ab72ed in __GI___libc_malloc (bytes=44) at malloc.c:3065
#6 0x00007ffff50bc298 in operator new(unsigned long) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x0000555555578967 in My_Class::My_Class(this=0x7fffffffd4e0, n=11)
at /home/.../my_class.cpp:20
Using gdb I have found that the error is thrown during a call to new. More specifically during a call to new within the constructor of an object being created via new (a basic new call works as expected). The line throwing the error is the following:
int* test = new int[n];
In this case n is an integer with n=11.
I don't think that the problem is due to a lack of memory as I have only allocated 2 small class instances and a few basic variables at this point. I also believe this would throw a different error if this were the problem.
Unfortunately I haven't managed to create a MWE. I've now run out of ideas of how to fix this problem. What can cause this error? How can it be debugged beyond finding the line throwing the error?
Other Stack Overflow results concerning "malloc(): memory corruption" errors attribute them to accessing unallocated memory. However, that does not seem to be the case here, as it is the allocation call itself that is throwing the error.
Memory corruption errors do not always manifest themselves in the place where the error was committed. As a result the gdb backtrace is often useless for finding the error. Instead a memory analysis/debugging tool such as Valgrind should be used.
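Besides running the program under Valgrind (for example valgrind ./bc_test) or building with AddressSanitizer (-fsanitize=address in GCC and Clang), one way to make an out-of-bounds write fail at the faulty line instead of at a later allocation is to use bounds-checked access. The sketch below assumes the buffer can be held in a std::vector; checked_write is a hypothetical helper, not part of the original code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if the write was in bounds, false if it would have overrun.
bool checked_write(std::vector<int>& buffer, std::size_t index, int value) {
    try {
        buffer.at(index) = value;  // at() throws std::out_of_range on a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With a vector of 11 elements, index 10 is accepted and index 11 is rejected on the spot, whereas a raw array write at index 11 could silently damage the allocator's bookkeeping and only show up in a later call to new.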
My program recently crashed with the following stack:
Program terminated with signal 7, Bus error.
#0 0x00007f0f323beb55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f0f323beb55 in raise () from /lib64/libc.so.6
#1 0x00007f0f35f8042e in skgesigOSCrash () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#2 0x00007f0f36222ca9 in kpeDbgSignalHandler () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#3 0x00007f0f35f8063e in skgesig_sigactionHandler () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#4 <signal handler called>
What should I check in my code to avoid this? Or is this something Oracle should fix?
The main reasons you could get a bus error revolve around inaccessible memory. This could have many causes:
Accessing through a deleted pointer.
Accessing through an uninitialized pointer.
Accessing through a NULL pointer.
Accessing an address that is not yours, for example because of an overflow error.
Try adding the following to the $ORACLE_HOME/network/admin/*.ora file:
DIAG_ADR_ENABLED=OFF
DIAG_SIGHANDLER_ENABLED=FALSE
DIAG_DDE_ENABLED=FALSE
This sounds like an Oracle issue.
And also Oracle's libraries seem to be compiled by Intel compilers.
I have an exit handler thread waiting on a condition for the worker thread to do its work. The signalling is done from the worker thread's destructor.
Below is the code of the exit handler thread.
void Class::TaskExitHandler::run() throw()
{
    while( ! isInterrupted() ) {
        _book->_eot_cond.wait(); // Waiting on this condition
        {
            // Hold the exit-list lock while the exited tasks are joined and deleted
            CLASS_NAMESPACE::Guard<CLASS_NAMESPACE::FastLock> eguard(_book->_exitlist_lock);
            list<TaskGroupExecutor*>::const_iterator itr = _book->_exited_tasks.begin();
            for( ; itr != _book->_exited_tasks.end(); itr++ ) {
                (*itr)->join();
                TRACER(TRC_DEBUG)<< "Deleting exited task:" << (*itr)->getLoc() << ":"
                    << (*itr)->getTestID() << ":" << (*itr)->getReportName() << endl;
                delete (*itr); // The task's destructor runs here, inside a throw() function
            }
            _book->_exited_tasks.clear();
        }
        _book->executeAny();
    }
}
Now, what has been observed is that when the worker thread catches any exception (raised from a lower layer), this thread continues and immediately dumps core with exit code 134, which corresponds to SIGABRT.
The stacktrace is as follows-
#0 0x0000005555f49b4c in raise () from /lib64/libc.so.6
#1 0x0000005555f4b568 in abort () from /lib64/libc.so.6
#2 0x0000005555d848b4 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib64/libstdc++.so.6
#3 0x0000005555d82210 in ?? () from /usr/lib64/libstdc++.so.6
#4 0x0000005555d82258 in std::terminate () from /usr/lib64/libstdc++.so.6
#5 0x0000005555d82278 in ?? () from /usr/lib64/libstdc++.so.6
#6 0x0000005555d81b18 in __cxa_call_unexpected () from /usr/lib64/libstdc++.so.6
#7 0x0000000120047898 in Class::TaskExitHandler::run ()
#8 0x000000012001cd38 in commutil::ThreadBase::thread_proxy ()
#9 0x0000005555c6e438 in start_thread () from /lib64/libpthread.so.0
#10 0x0000005555feed6c in __thread_start () from /lib64/libc.so.6
Backtrace stopped: frame did not save the PC
So it seems that this run() function, which declares with the "throw()" specification that it will not throw any exceptions, raises an exception (from frame 4). According to various references about __cxa_call_unexpected(), the stack trace shows the typical behaviour of aborting when an exception is raised in a function with a "throw()" specification.
Am I right with the analysis of the problem?
To test, I added a try catch in this method, and printed the exception message. Now the process didn't core. The exception message was same as the one caught by worker thread.
My question is, how does this thread get access to the exception caught by the other? Do they share some data structure related to exception handling?
Please shed some light on this. It is quite puzzling.
Note: As per the stack trace, call_unexpected is raised immediately after run() is called. That strengthens my suspicion that the exception stack or data is somehow shared, but I didn't find any references to this behaviour.
I shall answer my own question.
What happened in this case was that a destructor was being invoked in the TaskExitHandler thread. This destructor performed the same operation that had caused the exception in the main thread.
As the TaskExitHandler thread was designed (or rather expected) not to throw, there were no try-catch blocks, and hence the process aborted when the exception was raised.
As the destructor call was implicit, it never appeared in the stack trace, which made it very difficult to find. Each object had to be tracked down to find this exception leakage.
Thanks everyone for the active participation :) this was my first question to get some active responses..
I'll take a stab - hopefully this will give you enough to continue your research.
I suspect the thread running TaskExitHandler is the parent thread for all of the worker threads. TEH would have a hard(er) time joining up with the children otherwise.
The child / worker threads are not handling the exceptions thrown to them. However, an exception must be handled somewhere or the entire process will get shut down. The parent thread (aka TEH) is the last stop in the process's stack / chain for handling exceptions. Your sample code shows that TEH's exception handling is to simply throw / not handle the exception. So it cores out.
It's not necessarily a data structure that's being shared, but rather the process / thread IDs and memory space. The child threads do share global memory / heap space with the parent and each other, hence the need for semaphores and / or mutexes for locking purposes.
Good encapsulation dictates that the worker threads should be smart enough to handle any / all exceptions they might see. That way, the individual worker thread can be killed off instead of bringing down the parent thread and the rest of the process tree. Otherwise, you can continue catching the exception(s) in TEH, but it's really unlikely that thread has (or should have) the knowledge of what to do with the exception.
Add a comment if the above isn't clear, I'm happy to explain further.
I did a little research and confirmed that exceptions are generated against heap memory, not stack memory. All the threads of your process share the same heap*, so it makes more sense (at least to me) why the parent thread would see the exception when the child thread doesn't catch it. *FWIW, if you fork your process instead of starting a new thread, you'll get a new heap as well. However, forking is an expensive operation against the memory since you're copying all the heap contents over to the new process as well.
This SO thread discusses setting up a thread to catch all exceptions, which will probably be of interest:
catching exceptions from another thread
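As a rough illustration of what catching an exception from another thread can look like in standard C++: an exception propagates only within the thread that threw it, so it has to be handed over explicitly, for example with std::exception_ptr. This is a sketch, not the code from the linked thread; run_worker_and_report is a hypothetical name and the worker body is a stand-in.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Runs a worker thread that fails, then reports the failure in the caller.
std::string run_worker_and_report() {
    std::exception_ptr captured;  // written by the worker, read after join()

    std::thread worker([&captured] {
        try {
            throw std::runtime_error("worker failed");
        } catch (...) {
            // Store the exception instead of letting it leave the thread
            // function, which would call std::terminate().
            captured = std::current_exception();
        }
    });
    worker.join();  // join() synchronizes, so reading captured is safe now

    std::string result = "no error";
    if (captured) {
        try {
            std::rethrow_exception(captured);  // rethrown in the calling thread
        } catch (const std::exception& e) {
            result = e.what();
        }
    }
    return result;
}
```

On some toolchains this needs to be built with -pthread. The point of the design is that the exception only becomes visible to the second thread because it was stored and rethrown deliberately.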
My C++ program exits with a std::logic_error and I'd like to track down the source line that caused it. How can I do that?
TBH, I'm using gdb, using g++ -g in order to add debug info. All I can get are these messages:
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
Catchpoint 1 (exception thrown), 0x0045ffa0 in __cxa_throw ()
(gdb) bt
#0 0x0045ffa0 in __cxa_throw ()
#1 0x004601e8 in std::__throw_logic_error(char const*) ()
#2 0x00502238 in typeinfo for std::__timepunct<wchar_t> ()
#3 0x004685f8 in std::runtime_error::what() const ()
#4 0x03210da8 in ?? ()
#5 0x002efbcc in ?? ()
#6 0x00468734 in std::domain_error::~domain_error() ()
#7 0x00000000 in ?? ()
(gdb)
You use a debugger.
Using debugger tools is a very important skill to learn with compiled languages like C and C++.
The exception objects don't carry any source information with them. However, they hopefully contain a useful message accessible through the what() member. Other than that, you'd either have to use a debugger that can break when exceptions are thrown, or set a breakpoint in the constructor of std::logic_error. As long as exceptions are exceptional this works OK. It doesn't work too well with code that throws exceptions in non-exceptional cases.
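Since the exception itself carries no source location, code you control can put one into the message at the throw site. The sketch below assumes you own the throwing code; THROW_LOGIC_ERROR, make_name and describe_failure are hypothetical names. It does not help with exceptions thrown from inside the standard library, where a gdb catchpoint (catch throw) remains the way to find the caller.

```cpp
#include <stdexcept>
#include <string>

// Prefixes the message with the file and line of the throw site.
#define THROW_LOGIC_ERROR(msg)                                        \
    throw std::logic_error(std::string(__FILE__) + ":" +              \
                           std::to_string(__LINE__) + ": " + (msg))

// Rejects a null pointer explicitly instead of passing it to std::string.
std::string make_name(const char* raw) {
    if (raw == nullptr) {
        THROW_LOGIC_ERROR("null name");
    }
    return std::string(raw);
}

// Returns "ok" on success, or the what() text of the exception.
std::string describe_failure(const char* raw) {
    try {
        make_name(raw);
        return "ok";
    } catch (const std::logic_error& e) {
        return e.what();
    }
}
```

The what() text then names the file and line that rejected the null pointer, which is the information the plain std::logic_error in the question lacks.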
So my understanding of both pthread_exit and pthread_cancel is that they both cause an exception-like thing called a "forced unwind" to be thrown out of the relevant stack frame in the target thread. This can be caught in order to do thread-specific clean-up, but must be re-thrown or else we get an implicit abort() at the end of the catch block that didn't re-throw.
In the case of pthread_cancel, that happens either immediately on receipt of the associated signal, or the next entry into a cancellation point, or when the signal is next unblocked, depending on the thread's cancellation state and type.
In the case of pthread_exit, the calling thread immediately undergoes a forced unwind.
Fine. This "exception" is a normal part of the process of killing a thread. So why, even when I re-throw it, is it causing std::terminate() to be called, aborting my whole application?
Note that I'm catching and re-throwing the exception a couple times.
Note also that I'm calling pthread_exit out of my SIGTERM signal handler. This works fine in my toy test code, compiled with g++ 4.3.2, which has a thread run signal(SIGTERM, handler_that_calls_pthread_exit) and then sit in a tight while loop until it gets the TERM signal. But it doesn't work in the real application.
Relevant stack frames:
(gdb) where
#0 0x0000003425c30265 in raise () from /lib64/libc.so.6
#1 0x0000003425c31d10 in abort () from /lib64/libc.so.6
#2 0x00000000012b7740 in sv_bsd_terminate () at exception_handlers.cpp:38
#3 0x00002aef65983aa6 in __cxxabiv1::__terminate (handler=0x518)
at /view/ken_gcc_4.3/vobs/Compiler/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:43
#4 0x00002aef65983ad3 in std::terminate ()
at /view/ken_gcc_4.3/vobs/Compiler/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:53
#5 0x00002aef65983a5a in __cxxabiv1::__gxx_personality_v0 (
version=<value optimized out>, actions=<value optimized out>,
exception_class=<value optimized out>, ue_header=0x645bcd80,
context=0x645bb940)
at /view/ken_gcc_4.3/vobs/Compiler/gcc/libstdc++-v3/libsupc++/eh_personality.cc:657
#6 0x00002aef6524d68c in _Unwind_ForcedUnwind_Phase2 (exc=0x645bcd80,
context=0x645bb940)
at /view/ken_gcc_4.3/vobs/Compiler/gcc/libgcc/../gcc/unwind.inc:180
#7 0x00002aef6524d723 in _Unwind_ForcedUnwind (exc=0x645bcd80,
stop=<value optimized out>, stop_argument=0x645bc1a0)
at /view/ken_gcc_4.3/vobs/Compiler/gcc/libgcc/../gcc/unwind.inc:212
#8 0x000000342640cf80 in __pthread_unwind () from /lib64/libpthread.so.0
#9 0x00000034264077a5 in pthread_exit () from /lib64/libpthread.so.0
#10 0x0000000000f0d959 in threadHandleTerm (sig=<value optimized out>)
at osiThreadLauncherLinux.cpp:46
#11 <signal handler called>
Thanks!
Eric
Note also that I'm calling pthread_exit out of my SIGTERM signal handler.
This is your problem. To quote from the POSIX specs (http://pubs.opengroup.org/onlinepubs/009695399/functions/signal.html):
If the signal occurs other than as the result of calling abort(), raise(), kill(), pthread_kill(), or sigqueue(), the behavior is undefined if the signal handler refers to any object with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t, or if the signal handler calls any function in the standard library other than one of the functions listed in Signal Concepts.
The list of permitted functions is given at http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html#tag_02_04_03, and does not include pthread_exit(). Therefore your program is exhibiting undefined behaviour.
I can think of three choices:
Set a flag in the signal handler which is checked by the thread periodically, rather than trying to exit directly from the signal handler.
Use sigwait() to explicitly wait for the signal on an independent thread. This thread can then explicitly call pthread_cancel() on the thread you wish to exit.
Mask the signal, and call sigpending() periodically on the thread that is to be exited, and exit if the signal is pending.
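The first of these choices could look roughly like the sketch below. It assumes the thread's work can be split into short iterations; on_term, g_stop_requested and run_until_signalled are hypothetical names. The handler only assigns to a volatile sig_atomic_t object, which is what the quoted specification permits.

```cpp
#include <csignal>

namespace {
// Written by the signal handler, polled by the working loop.
volatile std::sig_atomic_t g_stop_requested = 0;
}

// Handler: sets the flag and does nothing else.
extern "C" void on_term(int) {
    g_stop_requested = 1;
}

// Runs up to max_iterations units of work, stopping early once the flag
// is set. Returns the number of iterations that were actually performed.
int run_until_signalled(int max_iterations) {
    int done = 0;
    while (!g_stop_requested && done < max_iterations) {
        ++done;  // placeholder for one unit of real work
    }
    return done;
}
```

After installing the handler with std::signal(SIGTERM, on_term), the thread leaves its loop at the next check and can then return from its start routine normally, so no forced unwind is started from inside a signal handler.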