GDB: printing a library warning's origin call stack - C++

I am getting lots of warnings from OpenSceneGraph, and they all look like this:
Warning:: Picked up error in TriangleIntersect
(-117448 -2.12751e+06 -519242, -120167 -2.17679e+06 -383117, -234607 -1.85755e+06 -431865)
(-nan, -nan, -nan)
Unfortunately I cannot trace their origin.
I tried launching my program, interrupting it with Ctrl+C,
and printing the backtrace with bt. It only gives me a basic trace of the application's event loop, like:
#0 0x00007fffe8116bf9 in __GI___poll (fds=0x5555584549f0, nfds=3, timeout=461) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007fffe45395c9 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007fffe45396dc in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff2a2897f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#4 0x00007ffff29cd9fa in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5 0x00007ffff29d6aa4 in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
The warnings occur when entering:
void osgViewer::ViewerBase::frame()
I tried stepping into it as well, but it returns immediately, prints the warnings and the program flow continues. I suppose it triggers some actions, maybe in other threads.
So here is my question: is there a way to get at the origin/backtrace of those warning messages with GDB?
I am using OpenSceneGraph 3.4

The error is coming from this line:
https://github.com/jklimke/osg/blob/master/src/osgUtil/LineSegmentIntersector.cpp#L174
That code and error message aren't even in the current trunk/head OSG source, so your problem might be resolved simply by updating to the latest version of OSG.
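If you still want to catch the warning at its source with GDB, one option is to install a custom notify handler that traps when the message of interest comes through, so the debugger stops in the thread that actually emits it. Below is a minimal sketch; it assumes OSG's osg::NotifyHandler / osg::setNotifyHandler API (present in 3.4), and the class name and the substring check are mine:

#include <osg/Notify>
#include <csignal>
#include <cstdio>
#include <cstring>

// Forwards every message to stderr, but raises SIGTRAP when the
// TriangleIntersect warning comes through, so GDB stops right where
// the warning is generated and `bt` shows its real origin.
class TrapOnWarning : public osg::NotifyHandler
{
public:
    virtual void notify(osg::NotifySeverity severity, const char* message)
    {
        std::fputs(message, stderr);
        if (std::strstr(message, "TriangleIntersect"))
            std::raise(SIGTRAP);
    }
};

// Early in main(), before the viewer starts:
//     osg::setNotifyHandler(new TrapOnWarning);

Run the program under gdb as usual; when the SIGTRAP fires, bt in the stopped thread points at the code that produced the warning (in OSG 3.4 that should be the LineSegmentIntersector line linked above).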


Is it safe to add or remove log sinks while still logging from other threads with boost log?

With boost::log is it safe to add or remove log sinks while still logging from other threads? Is there some manual locking that I need to do to make these operations thread-safe?
What I am trying to do is start a new text file that contains the subset of entries written to an existing log file between two points in my program.
For example, suppose the following entries are going to a log file "main_debug.log":
line 1
line 2
line 3
line 4
Then, if I add a new sink after line 1 and remove it after line 3, I would expect "new_debug.log" to contain the following entries:
line 2
line 3
What I have seems to work most of the time, but I am occasionally seeing segmentation faults inside boost::log. Here is an example I managed to catch with gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 8760]
boost::intrusive::list_impl<boost::intrusive::derivation_value_traits<boost::log::v2_mt_posix::attribute_value_set::node, boost::log::v2_mt_posix::attribute_value_set::implementation::node_traits, (boost::intrusive::link_mode_type)0>, unsigned int, true, void>::clear_and_dispose<boost::log::v2_mt_posix::attribute_value_set::implementation::disposer> (this=0x5e64aff4, disposer=...) at /boost-1.60.0/boost/intrusive/list.hpp:738
738 /boost-1.60.0/boost/intrusive/list.hpp: No such file or directory.
(gdb) bt
#0 boost::intrusive::list_impl<boost::intrusive::derivation_value_traits<boost::log::v2_mt_posix::attribute_value_set::node, boost::log::v2_mt_posix::attribute_value_set::implementation::node_traits, (boost::intrusive::link_mode_type)0>, unsigned int, true, void>::clear_and_dispose<boost::log::v2_mt_posix::attribute_value_set::implementation::disposer> (this=0x5e64aff4, disposer=...) at /boost-1.60.0/boost/intrusive/list.hpp:738
#1 boost::log::v2_mt_posix::attribute_value_set::implementation::~implementation (this=0x5e64afe8, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:150
#2 boost::log::v2_mt_posix::attribute_value_set::implementation::destroy (p=0x5e64afe8) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:239
#3 boost::log::v2_mt_posix::attribute_value_set::~attribute_value_set (this=0x5e64b3e4, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:519
#4 0x76e3bbac in boost::log::v2_mt_posix::record_view::public_data::~public_data (this=0x5e64b3e0, __in_chrg=<optimized out>) at /boost-1.60.0/boost/log/core/record_view.hpp:86
#5 boost::log::v2_mt_posix::record_view::private_data::~private_data (this=0x5e64b3e0, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/core.cpp:79
#6 boost::log::v2_mt_posix::record_view::private_data::destroy (this=0x5e64b3e0) at /boost-1.60.0/libs/log/src/core.cpp:131
#7 boost::log::v2_mt_posix::record_view::public_data::destroy (p=0x5e64b3e0) at /boost-1.60.0/libs/log/src/core.cpp:184
#8 0x0020b030 in boost::log::v2_mt_posix::sinks::asynchronous_sink<boost::log::v2_mt_posix::sinks::text_file_backend, boost::log::v2_mt_posix::sinks::unbounded_fifo_queue>::run() ()
#9 0x76d4be6c in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at /boost-1.60.0/libs/thread/src/pthread/thread.cpp:167
#10 0x76c22f00 in ?? () from /lib/libpthread.so.0
To add a new sink I am doing the following:
const auto pDebugBackend = boost::make_shared<boost::log::sinks::text_file_backend>(
    boost::log::keywords::file_name = "debug.log",
    boost::log::keywords::channel = "InfoConsole" );
const auto pNewDebugSink = boost::make_shared<boost::log::sinks::asynchronous_sink<boost::log::sinks::text_file_backend>>( pDebugBackend );
// Other code to set the filter and formatter for the sink.
boost::log::core::get()->add_sink( pNewDebugSink );
And to remove the sink some time later I do the following, which follows the order described in https://www.boost.org/doc/libs/1_60_0/libs/log/doc/html/log/detailed/sink_frontends.html#log.detailed.sink_frontends.async:
boost::log::core::get()->remove_sink( pNewDebugSink );
pNewDebugSink->stop();
pNewDebugSink->flush();
pNewDebugSink.reset();
I am using boost-1.60.0, and it is built with threading support enabled.
With boost::log is it safe to add or remove log sinks while still logging from other threads?
Yes, although adding and removing sinks are two distinct operations; if you are replacing one sink with another, you may miss some log records in the window after the old sink has been removed and before the new one is added.
Regarding the crashes, they seem to happen while the dedicated logging thread is still running (i.e. before the stop method completes), so it is possible that removing the sink is not related. This may be a bug in Boost.Log or another library it uses, but your Boost version is rather old. Try updating, and if the crash still reproduces, report a bug against Boost.Log with a code sample that reproduces it.
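For the "capture a window of entries into a second file" use case, it may also help to package the add/remove sequence from the question into a small scope guard, so the documented teardown order for asynchronous sinks is never skipped. A minimal sketch using the same Boost.Log calls shown above (the class name is mine, and the filter/formatter setup is omitted):

#include <boost/log/core.hpp>
#include <boost/log/keywords/file_name.hpp>
#include <boost/log/sinks/async_frontend.hpp>
#include <boost/log/sinks/text_file_backend.hpp>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>
#include <string>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;

// Captures all records logged during its lifetime into a separate file,
// tearing the sink down in the documented order for asynchronous sinks.
class scoped_log_capture
{
    typedef sinks::asynchronous_sink<sinks::text_file_backend> sink_t;
    boost::shared_ptr<sink_t> m_sink;

public:
    explicit scoped_log_capture(const std::string& file_name)
    {
        const auto backend = boost::make_shared<sinks::text_file_backend>(
            logging::keywords::file_name = file_name);
        m_sink = boost::make_shared<sink_t>(backend);
        // set_filter / set_formatter on m_sink would go here.
        logging::core::get()->add_sink(m_sink);
    }

    ~scoped_log_capture()
    {
        logging::core::get()->remove_sink(m_sink); // stop feeding the sink
        m_sink->stop();                            // shut down the dedicated thread
        m_sink->flush();                           // write out queued records
        m_sink.reset();
    }
};

// Usage: records emitted while the guard is alive also land in new_debug.log.
//     {
//         scoped_log_capture capture("new_debug.log");
//         // ... log "line 2" and "line 3" through the usual loggers ...
//     }

This only guarantees that the remove/stop/flush ordering is always followed; whether it avoids the crash depends on the Boost.Log issue discussed above.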

Reading OpenEXRs sequentially from a Pipe

I am trying to read a stream of EXRs from one pipe, process them and write the results into a different pipe. In this case they are named pipes, but they could just as well be stdin and stdout.
My problem occurs when the pipe runs dry. OpenEXR doesn't like being asked to read when there is nothing there and crashes with the following stack trace.
(gdb) run in.exr out.exr
Starting program: /Users/jon/Library/Developer/Xcode/DerivedData/compressor-abhdftqzleulxsfkpidvcazfowwo/Build/Products/Debug/compressor in.exr out.exr
Reading symbols for shared libraries +++++++++......................................................................................................... done
Reading symbols for shared libraries ............ done
Reading symbols for shared libraries . done
Reading symbols for shared libraries . done
terminate called throwing an exception
Program received signal SIGABRT, Aborted.
0x00007fff90957ce2 in __pthread_kill ()
(gdb) backtrace
#0 0x00007fff90957ce2 in __pthread_kill ()
#1 0x00007fff866f27d2 in pthread_kill ()
#2 0x00007fff866e3a7a in abort ()
#3 0x00007fff8643c7bc in abort_message ()
#4 0x00007fff86439fcf in default_terminate ()
#5 0x00007fff844d61cd in _objc_terminate ()
#6 0x00007fff8643a001 in safe_handler_caller ()
#7 0x00007fff86439fed in unexpected_defaults_to_terminate ()
#8 0x00007fff8643a040 in __cxxabiv1::__unexpected ()
#9 0x00007fff8643aefe in __cxa_call_unexpected ()
#10 0x0000000100008cfb in exr::ReadEXR (pixelBuffer=#0x7fff5fbfee00, is=#0x7fff5fbfeef8) at /Users/jon/Development/compressor/compressor/exr.cpp:47
#11 0x0000000100001c39 in main (argc=4, argv=0x7fff5fbffaa8) at /Users/jon/Development/compressor/compressor/main.cpp:79
I would really like OpenEXR to block the thread until more data becomes available, but a way of checking manually whether more data is available would also do, as long as it is reasonably robust.
Thanks.
The solution to this problem is indeed to extend Imf::IStream and implement it so that it blocks when the input pipe runs dry.
A few pipe-specific considerations apply, such as pipes not being seekable and not knowing their own position, but these can be worked around.
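A minimal sketch of such a stream, assuming the Imf::IStream interface from the OpenEXR 1.x/2.x headers (read/tellg/seekg with Imf::Int64; newer releases use uint64_t). The blocking behaviour comes from looping on read(2), and the "position" is just a byte counter kept by the class; names are mine:

#include <ImfIO.h>
#include <Iex.h>
#include <unistd.h>
#include <cerrno>
#include <algorithm>

// Blocking stream over a pipe file descriptor.
class PipeIStream : public Imf::IStream
{
public:
    PipeIStream(int fd, const char name[]) : Imf::IStream(name), _fd(fd), _pos(0) {}

    virtual bool read(char c[], int n)
    {
        int done = 0;
        while (done < n)
        {
            ssize_t r = ::read(_fd, c + done, n - done);
            if (r > 0)
                done += static_cast<int>(r);
            else if (r == 0)
                // The write end was closed; to wait for a new writer on a
                // named pipe you could reopen/retry here instead of throwing.
                throw Iex::InputExc("Unexpected end of pipe.");
            else if (errno != EINTR)
                // A non-blocking descriptor would also need to poll() and
                // retry on EAGAIN here.
                throw Iex::IoExc("Error reading from pipe.");
        }
        _pos += n;
        return true; // more data may follow
    }

    virtual Imf::Int64 tellg() { return _pos; }

    virtual void seekg(Imf::Int64 pos)
    {
        // Pipes are not seekable; only forward seeks can be emulated by
        // reading and discarding bytes.
        if (pos < _pos)
            throw Iex::IoExc("Cannot seek backwards in a pipe.");
        char buf[4096];
        while (_pos < pos)
            read(buf, static_cast<int>(std::min<Imf::Int64>(sizeof buf, pos - _pos)));
    }

private:
    int        _fd;
    Imf::Int64 _pos;
};

An Imf::RgbaInputFile (or InputFile) constructed from such a stream will then simply wait inside read() until the producer writes more data, instead of throwing when the pipe momentarily runs dry.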

Infinite abort() in the backtrace of a C++ program core dump

I have a strange problem that I can't solve. Please help!
The program is a multithreaded C++ application that runs on an ARM Linux machine. Recently I began testing it with long runs, and sometimes it crashes after 1-2 days like so:
*** glibc detected ** /root/client/my_program: free(): invalid pointer: 0x002a9408 ***
When I open the core dump I see that the main thread seems to have a corrupt stack: all I can see is an endless chain of abort() calls.
GNU gdb (GDB) 7.3
...
This GDB was configured as "--host=i686 --target=arm-linux".
[New LWP 706]
[New LWP 700]
[New LWP 702]
[New LWP 703]
[New LWP 704]
[New LWP 705]
Core was generated by `/root/client/my_program'.
Program terminated with signal 6, Aborted.
#0 0x001c44d4 in raise ()
(gdb) bt
#0 0x001c44d4 in raise ()
#1 0x001c47e0 in abort ()
#2 0x001c47e0 in abort ()
#3 0x001c47e0 in abort ()
#4 0x001c47e0 in abort ()
#5 0x001c47e0 in abort ()
#6 0x001c47e0 in abort ()
#7 0x001c47e0 in abort ()
#8 0x001c47e0 in abort ()
#9 0x001c47e0 in abort ()
#10 0x001c47e0 in abort ()
#11 0x001c47e0 in abort ()
And it goes on and on. I tried to get to the bottom of it by moving up the stack with frame 3000 or even higher, but eventually the core dump runs out of frames and I still can't see why this happened.
When I examine the other threads everything seems normal there.
(gdb) info threads
Id Target Id Frame
6 LWP 705 0x00132f04 in nanosleep ()
5 LWP 704 0x001e7a70 in select ()
4 LWP 703 0x00132f04 in nanosleep ()
3 LWP 702 0x00132318 in sem_wait ()
2 LWP 700 0x00132f04 in nanosleep ()
* 1 LWP 706 0x001c44d4 in raise ()
(gdb) thread 5
[Switching to thread 5 (LWP 704)]
#0 0x001e7a70 in select ()
(gdb) bt
#0 0x001e7a70 in select ()
#1 0x00057ad4 in CSerialPort::read (this=0xbea7d98c, string_buffer=..., delimiter=..., timeout_ms=1000) at CSerialPort.cpp:202
#2 0x00070de4 in CScanner::readResponse (this=0xbea7d4cc, resp_recv=..., timeout=1000, delim=...) at PidScanner.cpp:657
#3 0x00071198 in CScanner::sendExpect (this=0xbea7d4cc, cmd=..., exp_str=..., rcv_str=..., timeout=1000) at PidScanner.cpp:604
#4 0x00071d48 in CScanner::pollPid (this=0xbea7d4cc, mode=1, pid=12, pid_str=...) at PidScanner.cpp:525
#5 0x00072ce0 in CScanner::poll1 (this=0xbea7d4cc)
#6 0x00074c78 in CScanner::Poll (this=0xbea7d4cc)
#7 0x00089edc in CThread5::Thread5Poll (this=0xbea7d360)
#8 0x0008c140 in CThread5::run (this=0xbea7d360)
#9 0x00088698 in CThread::threadFunc (p=0xbea7d360)
#10 0x0012e6a0 in start_thread ()
#11 0x001e90e8 in clone ()
#12 0x001e90e8 in clone ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(Class and function names are a bit weird because I changed them :-)
So thread #1 is where the stack is corrupt, while the backtrace of every other thread (2-6) ends with
Backtrace stopped: previous frame identical to this frame (corrupt stack?).
That happens because threads 2-6 are created in thread #1.
The thing is that I can't run the program under gdb because it runs on an embedded system, and I can't use a remote gdb server. The only option is examining the core dumps, which do not occur very often.
Could you please suggest something that could move me forward with this? (Maybe something else I can extract from the core dump, or some way to add hooks to the code to catch the abort() call.)
UPDATE: Basile Starynkevitch suggested using Valgrind, but it turns out it is only ported to ARMv7. I have an ARM926, which is ARMv5, so this won't work for me. There are some efforts to compile Valgrind for ARMv5 though: Valgrind cross compilation for ARMv5tel, valgrind on the ARM9
UPDATE 2: I couldn't make Electric Fence work with my program. The program uses C++ and pthreads. The version of efence I got, 2.1.13, crashed in an arbitrary place after I started a thread and tried to do something more or less complicated (for example, putting a value into an STL vector). I saw people mentioning some patches for efence on the web but didn't have time to try them. I tried this on my Linux PC, not on the ARM, and other tools like Valgrind or Dmalloc don't report any problems with the code. So anyone using efence 2.1.13 should be prepared for problems with pthreads (or maybe pthreads + C++ + STL, I don't know).
My guess for the "infinite" aborts is that either abort() causes a loop (e.g. abort -> signal handler -> abort -> ...) or that gdb can't correctly interpret the frames on the stack.
In either case I would suggest manually checking out the stack of the problematic thread. If abort causes a loop, you should see a pattern or at least the return address of abort repeating every so often. Perhaps you can then more easily find the root of the problem by manually skipping large parts of the (repeating) stack.
Otherwise, you should find that there is no repeating pattern, and hopefully the return address of the failing function is somewhere on the stack. In the worst case such addresses have been overwritten by a buffer overflow or similar, but you may still get lucky and recognise what they were overwritten with.
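If you want a hook in the program itself (as the question asks), one low-tech option is to install a SIGABRT handler that dumps a raw backtrace before the process dies. A minimal sketch, assuming glibc's <execinfo.h> is available in your ARM toolchain (you may need to compile with -funwind-tables to get usable frames):

#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

static void abort_handler(int sig)
{
    void* frames[64];
    int n = backtrace(frames, 64);
    // backtrace_symbols_fd() writes straight to the fd and does not call
    // malloc, which may already be corrupted when free() aborts.
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    // Restore the default action and re-raise so a core dump is still produced.
    signal(sig, SIG_DFL);
    raise(sig);
}

int main()
{
    signal(SIGABRT, abort_handler);
    // ... rest of the program ...
}

The addresses printed this way come from the live stack at the moment of the abort, so they are often readable even when gdb can no longer unwind the core dump.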
One possibility here is that something in that thread has very, very badly smashed the stack by vastly overwriting an on-stack data structure, destroying all the needed data on the stack in the process. That makes postmortem debugging very unpleasant.
If you can reproduce the problem at will, the right thing to do is to run the thread under gdb and watch what is going on precisely at the moment when the stack gets nuked. This may, in turn, require some careful searching to determine where exactly the error is happening.
If you cannot reproduce the problem at will, the best I can suggest is very carefully looking for clues in the thread local storage for that thread to see if it hints at where the thread was executing before death hit.

Stopping a Thrift server(TSimpleServer)

I have a simple use case for a Thrift server (TSimpleServer) in which I spawn a couple of threads (besides the main thread). One of the newly spawned threads enters the Thrift event loop (i.e. server.serve()). Upon receiving a signal in the main thread I invoke server.stop(), which causes the error posted below.
At first I thought it was an uncaught exception; however, wrapping both the invocations of server.serve() and server.stop() in try/catch blocks didn't help isolate the problem. Any thoughts or suggestions on what I should be doing? Most Thrift tutorials/guides/examples talk about starting a server but don't seem to mention the stop scenario; any pointers, best practices or suggestions in this regard would be great. Thanks.
Also, I am using thrift-0.7.0.
Error details:
Thrift: Fri Nov 18 21:22:47 2011 TServerTransport died on accept: TTransportException: Interrupted
*** glibc detected *** ./build/mc_daemon: munmap_chunk(): invalid pointer: 0x0000000000695f18 ***
Segmentation fault (core dumped)
Also here's the stack-trace:
#0 0x00007fb751c92f08 in ?? () from /lib/libc.so.6
#1 0x00007fb7524bb0eb in apache::thrift::server::TSimpleServer::serve (this=0x1e5bca0) at src/server/TSimpleServer.cpp:140
#2 0x000000000046ce15 in a::b::server_thread::operator() (this=0x695f18) at /path/to/server_thread.cpp:80
#3 0x000000000046c1a9 in boost::detail::thread_data<boost::reference_wrapper<ads::data_load::server_thread> >::run (this=0x1e5bd80) at /usr/include/boost/thread/detail/thread.hpp:81
#4 0x00007fb7526f2b70 in thread_proxy () from /usr/lib/libboost_thread.so.1.40.0
#5 0x00007fb7516fd9ca in start_thread () from /lib/libpthread.so.0
#6 0x00007fb7519fa70d in clone () from /lib/libc.so.6
#7 0x0000000000000000 in ?? ()
Edit 1: I have added pseudo-code for the main thread, the thrift server thread and the background thread.
Edit 2: I seem to have resolved the original issue, as noted in my answer below. However, this solution leads to two rather undesirable/questionable design choices: (i) I had to introduce a Thrift endpoint to provide a mechanism for stopping the server; (ii) the handler class for the Thrift service (which is usually required to instantiate a server object) now needs a way to signal the server to stop, introducing a circular dependency of sorts.
Any suggestions on these design issues/choices would be greatly appreciated.
My problem seems to have stemmed from my code/design, in which signal-handler code in the main thread invoked stop() on a server that had been started in a separate 'server thread'. Changing this behaviour (as noted in the pastebin code snippets) resolved the issue.
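For reference, a generic sketch of the coordination pattern that avoids touching the server from asynchronous signal-handler context: block the signals before spawning any threads, let the main thread wait for them synchronously with sigwait(), and only then call stop() from ordinary code. The Server class below is a stand-in for TSimpleServer (no generated Thrift code), so this shows the shape of the pattern rather than a drop-in solution:

#include <pthread.h>
#include <signal.h>
#include <boost/bind.hpp>
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>

// Stand-in for a Thrift server: serve() blocks until stop() is called,
// and stop() may be called from another thread.
struct Server
{
    Server() : stopped(false) {}

    void serve()
    {
        boost::unique_lock<boost::mutex> lock(m);
        while (!stopped)
            cv.wait(lock);
    }

    void stop()
    {
        boost::lock_guard<boost::mutex> lock(m);
        stopped = true;
        cv.notify_all();
    }

private:
    boost::mutex m;
    boost::condition_variable cv;
    bool stopped;
};

int main()
{
    // Block SIGINT/SIGTERM in this thread; threads created afterwards
    // inherit the mask, so the signals are never delivered asynchronously.
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, 0);

    Server server;
    boost::thread server_thread(boost::bind(&Server::serve, &server));

    // Wait for a signal synchronously, then stop the server from normal
    // (non-signal-handler) context.
    int sig = 0;
    sigwait(&set, &sig);
    server.stop();

    server_thread.join();
    return 0;
}

With a real TSimpleServer the same structure applies, except that serve()/stop() come from the Thrift frontend; whether stop() alone unblocks a TSimpleServer cleanly depends on the Thrift version, which is why the endpoint-based approach from the edit above is sometimes used instead.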

Core dump in libc exit call

I am seeing a core dump on Solaris during my program's exit procedure. How can I debug and fix this kind of core dump?
(gdb) where
#0 0xff2cc0c0 in kill () from /usr/lib/libc.so.1
#1 0x0004dac0 in run_before_killed_handler (sig=11) at NdmpServer.cpp:1186
#2 signal handler called
#3 0xfee0ad50 in ?? ()
#4 0x00060a6c in proc_cleanup ()
#5 0xff2421ac in _exithandle () from /usr/lib/libc.so.1
#6 0xff2305d8 in exit () from /usr/lib/libc.so.1
#7 0x0003431c in _start ()
Your program apparently uses atexit(3C) to register an exit handler. The problem is occurring in that handler.
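For context, this is the mechanism involved: exit() runs every handler registered with atexit() before the process terminates, so a stale pointer that is only touched from such a handler will not fault until shutdown. A minimal illustration (proc_cleanup here is a stand-in sketch, not your actual function):

#include <cstdlib>
#include <cstdio>

static char* g_buffer = 0;

// Stand-in for proc_cleanup(): runs only when exit() is called.
static void proc_cleanup()
{
    std::printf("cleaning up\n");
    std::free(g_buffer); // faults or aborts here if g_buffer was already
    g_buffer = 0;        // freed (or corrupted) elsewhere in the program
}

int main()
{
    std::atexit(proc_cleanup);                     // handlers run inside exit()
    g_buffer = static_cast<char*>(std::malloc(64));

    // ... program logic; if something frees g_buffer without clearing it,
    // the crash is deferred until the atexit handler runs ...

    return 0;                                      // equivalent to exit(0)
}

In the core dump, frame #4 (proc_cleanup) is therefore the place to inspect: check every pointer it dereferences and every function it calls (frame #3 is an address that no longer maps to valid code) for use of resources that were already freed or torn down earlier.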
Without knowing the finer details of Solaris memory layouts, 0xfee0ad50 seems to be on the OS side. What OS call are you trying (and failing) to make in proc_cleanup?