We have a very strange problem: the program hangs inside boost::asio, which is called from our logging library built on boost::log.
This happens only when we link one particular library in (that library works just fine in all our other projects). The program works if we create a boost::asio::ip::tcp::socket object in the module's initialization function, before log initialization, but that is of course not a solution. We also tried adding a plain array of the same size or larger instead, but only a socket object helps.
GDB shows the following:
#0 __pthread_mutex_unlock_usercnt (mutex=0xf779e504 <_rtld_global+1220>,
decr=1) at pthread_mutex_unlock.c:57
#1 0xf777db5e in tls_get_addr_tail (ti=0xf681388c, dtv=0x8bc4410,
the_map=0x8b31c48, the_map@entry=0x0) at dl-tls.c:730
#2 0xf778eed9 in ___tls_get_addr (ti=<optimized out>) at dl-tls.c:778
#3 0xf658ba8f in boost::asio::detail::keyword_tss_ptr<boost::asio::detail::call_stack<boost::asio::detail::task_io_service, boost::asio::detail::task_io_service_thread_info>::context>::operator boost::asio::detail::call_stack<boost::asio::detail::task_io_service, boost::asio::detail::task_io_service_thread_info>::context*() const () from /usr/local/lib/libcommon.so.0
#4 0xf6580de5 in boost::asio::detail::call_stack<boost::asio::detail::task_io_service, boost::asio::detail::task_io_service_thread_info>::top() ()
from /usr/local/lib/libcommon.so.0
#5 0xf657259e in boost::asio::asio_handler_allocate(unsigned int, ...) ()
from /usr/local/lib/libcommon.so.0
#6 0xf3491aa0 in void* boost_asio_handler_alloc_helpers::allocate<boost::function<void (boost::system::error_code const&)> >(unsigned int, boost::function<void (boost::system::error_code const&)>&) () from /usr/local/lib/liblog.so.0
#7 0xf348f43e in void boost::asio::detail::reactive_socket_service<boost::asio::ip::udp>::async_connect<boost::function<void (boost::system::error_code const&)> >(boost::asio::detail::reactive_socket_service<boost::asio::ip::udp>::implementation_type&, boost::asio::ip::basic_endpoint<boost::asio::ip::udp> const&, boost::function<void (boost::system::error_code const&)>&) ()
#8 0xf348b22a in boost::asio::async_result<boost::asio::handler_type<boost::function<void (boost::system::error_code const&)>, void (boost::system::error_code)>::type>::type boost::asio::datagram_socket_service<boost::asio::ip::udp>::async_connect<boost::function<void (boost::system::error_code const&)> >(boost::asio::detail::reactive_socket_service<boost::asio::ip::udp>::implementation_type&, boost::asio::ip::basic_endpoint<boost::asio::ip::udp> const&, boost::function<void (boost::system::error_code const&)>&&) () from /usr/local/lib/liblog.so.0
#9 0xf3487eab in boost::asio::async_result<boost::asio::handler_type<boost::function<void (boost::system::error_code const&)>, void (boost::system::error_code)>::type>::type boost::asio::basic_socket<boost::asio::ip::udp, boost::asio::datagram_socket_service<boost::asio::ip::udp> >::async_connect<boost::function<void (boost::system::error_code const&)> >(boost::asio::ip::basic_endpoint<boost::asio::ip::udp> const&, boost::function<void (boost::system::error_code const&)>&&)
() from /usr/local/lib/liblog.so.0
#10 0xf347c50f in syslog_udp_device::syslog_connect() ()
from /usr/local/lib/liblog.so.0
or this:
#0 0xf775de5d in __GI___pthread_mutex_lock (
mutex=0xf779e504 <_rtld_global+1220>) at ../nptl/pthread_mutex_lock.c:114
#1 0xf777db37 in tls_get_addr_tail (ti=0xf681388c, dtv=0x8bc4410,
the_map=0x8b31c48, the_map@entry=0x0) at dl-tls.c:722
#2 0xf778eed9 in ___tls_get_addr (ti=<optimized out>) at dl-tls.c:778
The remaining frames are the same.
We cannot dig deeper because it does not reproduce on a local machine, only on the Kubernetes cluster. Can you point me to what might cause this behaviour?
22 Sep, 20:23 UTC:
valgrind's helgrind reports something, but those are possible data races that probably have nothing to do with the problem. Other tools just hang and report nothing, even after killing the process with SIGTERM. Today we found that another process also hangs at the same step after adding the same linkage, yet we have at least 3-4 apps using the same libraries that work, even after rebuilds. It looks like an ODR violation somewhere. We tried linking the failing application with the same link order as a working app: no difference, it still hangs.
Well, that was really hard, but we found the problem: glibc compatibility. The build machine was jessie with libc-23 and the target machine was jessie with libc-19 (at least I think that was the problem; it may have been other system libraries). We debugged this very hard. We tried compiling all our libraries with the same options (the build machine built forked libraries with -O2 and our own libraries with -O0), which did not help.
But when we moved from jessie to Debian stretch on both the build and the runtime machines (both have libc-24), everything started to work. It took a long time because we have many libraries, but it solved the problem. I hope you never get stuck with a problem like this.
I am working on one of the PowerCenter 10x transformations and workflows and hit this error. I am unable to view the session logs and the entire system gets stuck; every time, I have to force-restart my laptop. My laptop has a fairly good configuration, with 32 GB RAM and a 1 TB SSD. I even tried recycling the Integration Service, but that also got stuck and became unresponsive. Any help is much appreciated.
(Thread 0x53deb940 (LWP 28161)):
0x000000385a87aefe in memcpy () from /lib64/libc.so.6
0x00002ba5bfb20def in zstrbuf::expand() () from /opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb20e5d in zstrbuf::overflow(int) () from /opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb1ee2a in zstreambuf::xsputn(unsigned short const*, int) () from
/opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb1e817 in zostream::write(unsigned short const*, int) () from
/opt/infa/pc/v901/server/bin/libpmuti.so
0x00000000005d9bdc in sendEMail(PmUString const&, PmUString const&, PmUString const&,
PMTValOrderedVector const&, SVarParamManager const*, eEmailType, unsigned int, int&) ()
0x0000000000567f8d in SSessionTask::sendPostSessionEmailForDTM(SSessionInfo*) ()
0x0000000000568a96 in SSessionTask::finishImpl() ()
0x0000000000595665 in STask::finish() ()
0x0000000000565f42 in SSessionTask::handlePrepareLBGroupNotification(STaskLBJobRequest*, ILBResult
const*, ILBRequestBase::EILBEvent, PmUString const&) ()
0x0000000000566c85 in SSessionTask::handleLBNotification(STaskLBGroup*, STaskLBJobRequest*,
ILBResult*&, ILBRequestBase::EILBEvent, PmUString const&) ()
0x0000000000582fc0 in SWorkflow::handleLBNotification(STask*, STaskLBGroup*, STaskLBJobRequest*,
ILBResult*&, ILBRequestBase::EILBEvent, PmUString const&) ()
0x00000000004facb2 in SHandleLBNotificationJob::execute()
Tracing level in Informatica defines the amount of data you wish to write in the session log when you execute the workflow. Tracing level is a very important aspect in Informatica as it helps in analyzing the error.
Terse: When you set the tracing level as terse, Informatica stores error information and information of rejected records. Terse tracing level occupies less space as compared to normal.
Default tracing level is normal. You can change the tracing level to terse to enhance the performance. Tracing level can be defined at an individual transformation level, or you can override the tracing level by defining it at the session level.
Please try to change the tracing level and run the Workflow once again. I hope this resolves your system issue.
I am getting lots of warnings from OpenSceneGraph and all look like this:
Warning:: Picked up error in TriangleIntersect
(-117448 -2.12751e+06 -519242, -120167 -2.17679e+06 -383117, -234607 -1.85755e+06 -431865)
(-nan, -nan, -nan)
And unfortunately I cannot trace the origin of them.
I tried launching my program, interrupting it with CTRL + C and printing the backtrace with bt. That gives me only a basic trace of the application's event loop, like:
#0 0x00007fffe8116bf9 in __GI___poll (fds=0x5555584549f0, nfds=3, timeout=461) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007fffe45395c9 in () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007fffe45396dc in g_main_context_iteration () at /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007ffff2a2897f in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#4 0x00007ffff29cd9fa in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#5 0x00007ffff29d6aa4 in QCoreApplication::exec() () at /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
The warnings occur when entering:
void osgViewer::ViewerBase::frame()
I also tried stepping into it, but it returns immediately, prints the warnings and continues the program flow. I suppose it triggers some actions, maybe in other threads.
So here is my question: is there a way to trace the origin of those warning messages with GDB?
I am using OpenSceneGraph 3.4
The error is coming from this line:
https://github.com/jklimke/osg/blob/master/src/osgUtil/LineSegmentIntersector.cpp#L174
That code and error message aren't even in the current trunk/head OSG source, so your problem might be resolved simply by updating to the latest version of OSG.
I'm new to custom hardware design and I'm about to scale up my custom hardware, which functions well on a few boards. I need some help deciding how to proceed with prototypes and scaling up, given the state of the prototypes.
This hardware is based on the i.MX6Q processor and MT41K256M16TW-107 IT:P memory. It is most similar to the nitrogen6_max development board.
I'm having trouble with my hardware that is really difficult to figure out, as some boards work really well and some do not (of 7 production units, 4 boards function really well, and one board gets segmentation faults and kernel panics while running a Linux application). When I run memory calibration on the bad boards, the results look the same as for the good boards.
The segmentation fault points to a memory issue. I took a core dump and a backtrace using GDB on Linux:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 gcoHARDWARE_QuerySamplerBase (Hardware=0x22193dc, Hardware@entry=0x0,
VertexCount=0x7ef95370, VertexCount@entry=0x7ef95368, VertexBase=0x40000,
FragmentCount=FragmentCount@entry=0x2217814, FragmentBase=0x0) at
gc_hal_user_hardware_query.c:6020
6020 gc_hal_user_hardware_query.c: No such file or directory.
[Current thread is 1 (Thread 0x76feb010 (LWP 697))]
(gdb) bt
#0 gcoHARDWARE_QuerySamplerBase (Hardware=0x22193dc, Hardware@entry=0x0,
VertexCount=0x7ef95370, VertexCount@entry=0x7ef95368, VertexBase=0x40000,
FragmentCount=FragmentCount@entry=0x2217814, FragmentBase=0x0) at
gc_hal_user_hardware_query.c:6020
#1 0x765d20e8 in gcoHAL_QuerySamplerBase (Hal=<optimized out>,
VertexCount=VertexCount@entry=0x7ef95368, VertexBase=<optimized out>,
FragmentCount=FragmentCount@entry=0x2217814,
FragmentBase=0x0) at gc_hal_user_query.c:692
#2 0x681e31ec in gcChipRecompileEvaluateKeyStates (chipCtx=0x0,
gc=0x7ef95380) at src/chip/gc_chip_state.c:2115
#3 gcChipValidateRecompileState (gc=0x7ef95380, gc@entry=0x21bd96c,
chipCtx=0x0, chipCtx@entry=0x2217814) at src/chip/gc_chip_state.c:2634
#4 0x681c6da8 in __glChipDrawValidateState (gc=0x21bd96c) at
src/chip/gc_chip_draw.c:5217
#5 0x68195688 in __glDrawValidateState (gc=0x21bd96c) at
src/glcore/gc_es_draw.c:585
#6 __glDrawPrimitive (gc=0x21bd96c, mode=<optimized out>) at
src/glcore/gc_es_draw.c:943
#7 0x68171048 in glDrawArrays (mode=4, first=6, count=6) at
src/glcore/gc_es_api.c:399
#8 0x76c9ac72 in CEGUI::OpenGL3GeometryBuffer::draw() const () from
/usr/lib/libCEGUIOpenGLRenderer-0.so.2
#9 0x76dd1aee in CEGUI::RenderQueue::draw() const () from
/usr/lib/libCEGUIBase-0.so.2
#10 0x76e317d8 in CEGUI::RenderingSurface::draw(CEGUI::RenderQueue const&,
CEGUI::RenderQueueEventArgs&) () from /usr/lib/libCEGUIBase-0.so.2
#11 0x76e31838 in CEGUI::RenderingSurface::drawContent() () from
/usr/lib/libCEGUIBase-0.so.2
#12 0x76e36d30 in CEGUI::GUIContext::drawContent() () from
/usr/lib/libCEGUIBase-0.so.2
#13 0x76e31710 in CEGUI::RenderingSurface::draw() () from
/usr/lib/libCEGUIBase-0.so.2
#14 0x001bf79c in tengri::gui::cegui::System::Impl::draw (this=0x2374f08) at
codebase/src/gui/cegui/system.cpp:107
#15 tengri::gui::cegui::System::draw (this=this@entry=0x2374e74) at
codebase/src/gui/cegui/system.cpp:212
#16 0x000b151e in falcon::osd::view::MainWindowBase::Impl::preNativeUpdate
(this=0x2374e10) at codebase/src/osd/view/MainWindow.cpp:51
#17 falcon::osd::view::MainWindowBase::preNativeUpdate
(this=this@entry=0x209fe30) at codebase/src/osd/view/MainWindow.cpp:91
#18 0x000c4686 in falcon::osd::view::FBMainWindow::update (this=0x209fe00)
at
codebase/include/falcon/osd/view/FBMainWindow.h:56
#19 falcon::osd::view::App::Impl::execute (this=0x209fdb0) at
codebase/src/osd/view/app_view_osd_falcon.cpp:139
#20 falcon::osd::view::App::execute (this=<optimized out>) at
codebase/src/osd/view/app_view_osd_falcon.cpp:176
#21 0x000475f6 in falcon::osd::App::execute (this=this@entry=0x7ef95c84) at
codebase/src/osd/app_osd_falcon.cpp:75
#22 0x00047598 in main () at codebase/src/main.cpp:5
(gdb) Quit
Here I have attached NXP tool calibration results for 2 good boards and 1 bad board (the one getting segmentation faults). Click on the following links.
Board 1
Board 2
Board 3
I ran an overnight stress test using stressapptest. It reported no faults and the test passed.
Of the 3 boards above, Board 1 and Board 2 work really well, and Board 3 gets kernel panics while running the same application. Can you help me find any clue in the results from these 3 boards?
I did a production run of 50 units 6 months ago and only 30 worked properly. But that was with Alliance memory AS4C256M16D3A-12BCN. So could this be a design issue? If it is an issue with the DDR layout or the whole design, why do some boards work really well?
Could this be an issue on the manufacturing side? Then how could it happen within the same production run, where some boards work and some do not?
Does stressapptest stress power as well? Do you know of any Linux app that can stress power too?
I don't have much experience with mass production, but I would like to move forward after learning from and correcting these issues. I would be grateful for a prompt reply.
I have a single threaded program that crashes consistently at certain points right after free() is called when running in non-debug mode.
In debug mode, however, the debugger breaks on the line that calls free() even though no breakpoints are set. When I try to step to the next line, the debugger breaks again on the same line. Stepping once more resumes execution as normal. No crash, no segfault, nothing.
EDIT-1: Contrary to what I wrote above, crashes in non-debug mode
turn out to be inconsistent, which makes me think I am somehow
writing somewhere that I shouldn't. (Breaks in debug mode are
still consistent, though.)
The call stack at the breaks shows some Windows library functions (I think) called after the function that contains the free() statement. I have no idea how to interpret them and, consequently, no idea how to go about debugging this situation.
I have provided the call stacks at the breaks below. Can someone point me in a direction for tackling the problem? What might be causing the breaks in the debugger?
The program runs on Windows Vista, is compiled with gcc 4.9.2, and the debugger is gdb. Assume a double release is not the case. (I use ::operator new and ::operator delete overloads that catch that. The situation is the same without these overloads as well.)
Note that the crash (or the involuntary break in the debugger) is consistent. It happens every time, at the same execution point.
Here is the call stack at the initial break:
(Note that free_wrapper() is the function that contains the free() call that causes the crash/breaks.)
#0 0x770186ff ntdll!DbgBreakPoint() (C:\Windows\system32\ntdll.dll:??)
#1 0x77082edb ntdll!RtlpNtMakeTemporaryKey() (C:\Windows\system32\ntdll.dll:??)
#2 0x7706b953 ntdll!RtlImageRvaToVa() (C:\Windows\system32\ntdll.dll:??)
#3 0x77052c4f ntdll!RtlQueryRegistryValues() (C:\Windows\system32\ntdll.dll:??)
#4 0x77083f3b ntdll!RtlpNtMakeTemporaryKey() (C:\Windows\system32\ntdll.dll:??)
#5 0x7704bcfd ntdll!EtwSendNotification() (C:\Windows\system32\ntdll.dll:??)
#6 0x770374d5 ntdll!RtlEnumerateGenericTableWithoutSplaying() (C:\Windows\system32\ntdll.dll:??)
#7 0x75829dc6 KERNEL32!HeapFree() (C:\Windows\system32\kernel32.dll:??)
#8 0x75a99c03 msvcrt!free() (C:\Windows\system32\msvcrt.dll:??)
#9 0x350000 ?? () (??:??)
--> #10 0x534020 free_wrapper(pv=0x352af0) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\Unrelated\MemMgmt.cpp:282)
#11 0x407f74 operator delete(pv=0x352af0) (C:\dm\bin\codes\CodeBlocks\ProjTemp\main.cpp:1002)
#12 0x629a74 __gnu_cxx::new_allocator<char>::deallocate(this=0x22f718, __p=0x352af0 "\nÿÿÿÿÿÿº\r%") (C:/Program Files/CodeBlocks/MinGW/lib/gcc/mingw32/4.9.2/include/c++/ext/new_allocator.h:110)
#13 0x6c2257 std::allocator_traits<std::allocator<char> >::deallocate(__a=..., __p=0x352af0 "\nÿÿÿÿÿÿº\r%", __n=50) (C:/Program Files/CodeBlocks/MinGW/lib/gcc/mingw32/4.9.2/include/c++/bits/alloc_traits.h:383)
#14 0x611940 basic_CDataUnit<std::allocator<char> >::~basic_CDataUnit(this=0x22f714, __vtt_parm=0x781df4 <VTT for basic_CDataUnit_TDB<std::allocator<char> >+4>, __in_chrg=<optimized out>) (include/DataUnit/CDataUnit.h:112)
#15 0x61dfa1 basic_CDataUnit_TDB<std::allocator<char> >::~basic_CDataUnit_TDB(this=0x22f714, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) (include/DataUnit/CDataUnit_TDB.h:125)
#16 0x503898 CTblSegHandle::UpdateChainedRowData(this=0x353cf8, new_row_data=..., old_row_fetch_res=..., vColTypes=..., block_hnd=...) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\SegHandles\CTblSegHandle.cpp:912)
#17 0x502fcc CTblSegHandle::UpdateRowData(this=0x353cf8, new_row_data=..., old_row_fetch_res=..., vColTypes=..., block_hnd=...) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\SegHandles\CTblSegHandle.cpp:764)
#18 0x443272 UpdateRow(row_addr=..., new_data_unit=..., vColTypes=..., block_hnd=..., seg_hnd=...) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\DbUtilities.cpp:910)
#19 0x443470 UpdateRow(row_addr=..., vColValues=..., vColTypes=...) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\DbUtilities.cpp:935)
#20 0x4023e3 test_RowChaining() (C:\dm\bin\codes\CodeBlocks\ProjTemp\main.cpp:234)
#21 0x4081c6 main() (C:\dm\bin\codes\CodeBlocks\ProjTemp\main.cpp:1034)
And here is the call stack when I step to the next line and debugger breaks one last time before resuming normal execution:
#0 0x770186ff ntdll!DbgBreakPoint() (C:\Windows\system32\ntdll.dll:??)
#1 0x77082edb ntdll!RtlpNtMakeTemporaryKey() (C:\Windows\system32\ntdll.dll:??)
#2 0x77052c7f ntdll!RtlQueryRegistryValues() (C:\Windows\system32\ntdll.dll:??)
#3 0x77083f3b ntdll!RtlpNtMakeTemporaryKey() (C:\Windows\system32\ntdll.dll:??)
#4 0x7704bcfd ntdll!EtwSendNotification() (C:\Windows\system32\ntdll.dll:??)
#5 0x770374d5 ntdll!RtlEnumerateGenericTableWithoutSplaying() (C:\Windows\system32\ntdll.dll:??)
#6 0x75829dc6 KERNEL32!HeapFree() (C:\Windows\system32\kernel32.dll:??)
#7 0x75a99c03 msvcrt!free() (C:\Windows\system32\msvcrt.dll:??)
#8 0x350000 ?? () (??:??)
--> #9 0x534020 free_wrapper(pv=0x352af0) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\Unrelated\MemMgmt.cpp:282)
#10 0x407f74 operator delete(pv=0x352af0) (C:\dm\bin\codes\CodeBlocks\ProjTemp\main.cpp:1002)
#11 0x629a74 __gnu_cxx::new_allocator<char>::deallocate(this=0x22f718, __p=0x352af0 "\nÿÿÿÿÿÿº\r%") (C:/Program Files/CodeBlocks/MinGW/lib/gcc/mingw32/4.9.2/include/c++/ext/new_allocator.h:110)
#12 0x6c2257 std::allocator_traits<std::allocator<char> >::deallocate(__a=..., __p=0x352af0 "\nÿÿÿÿÿÿº\r%", __n=50) (C:/Program Files/CodeBlocks/MinGW/lib/gcc/mingw32/4.9.2/include/c++/bits/alloc_traits.h:383)
#13 0x611940 basic_CDataUnit<std::allocator<char> >::~basic_CDataUnit(this=0x22f714, __vtt_parm=0x781df4 <VTT for basic_CDataUnit_TDB<std::allocator<char> >+4>, __in_chrg=<optimized out>) (include/DataUnit/CDataUnit.h:112)
#14 0x61dfa1 basic_CDataUnit_TDB<std::allocator<char> >::~basic_CDataUnit_TDB(this=0x22f714, __in_chrg=<optimized out>, __vtt_parm=<optimized out>) (include/DataUnit/CDataUnit_TDB.h:125)
#15 0x503898 CTblSegHandle::UpdateChainedRowData(this=0x353cf8, new_row_data=..., old_row_fetch_res=..., vColTypes=..., block_hnd=...) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\SegHandles\CTblSegHandle.cpp:912)
#16 0x502fcc CTblSegHandle::UpdateRowData(this=0x353cf8, new_row_data=..., old_row_fetch_res=..., vColTypes=..., block_hnd=...) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\SegHandles\CTblSegHandle.cpp:764)
#17 0x443272 UpdateRow(row_addr=..., new_data_unit=..., vColTypes=..., block_hnd=..., seg_hnd=...) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\DbUtilities.cpp:910)
#18 0x443470 UpdateRow(row_addr=..., vColValues=..., vColTypes=...) (C:\dm\bin\codes\CodeBlocks\ProjTemp\src\DbUtilities.cpp:935)
#19 0x4023e3 test_RowChaining() (C:\dm\bin\codes\CodeBlocks\ProjTemp\main.cpp:234)
#20 0x4081c6 main() (C:\dm\bin\codes\CodeBlocks\ProjTemp\main.cpp:1034)
When I see a call stack that looks like yours, the most common cause is heap corruption. A double free or an attempt to free a pointer that was never allocated can produce similar call stacks. Since you characterize the crash as inconsistent, heap corruption is the more likely candidate: double frees and frees of unallocated pointers tend to crash consistently in the same place. To hunt down issues like this I usually:
1. Install Debugging Tools for Windows.
2. Open a command prompt with elevated privileges.
3. Change to the directory where Debugging Tools for Windows is installed.
4. Enable full page heap by running gflags.exe -p /enable applicationName.exe /full
5. Launch the application with a debugger attached and recreate the issue.
6. Disable full page heap for the application by running gflags.exe -p /disable applicationName.exe
Running the application with full page heap places an inaccessible page at the end of each allocation, so the program stops immediately if it accesses memory beyond the allocation (see the page GFlags and PageHeap). If a buffer overflow is causing the heap corruption, this setting should cause the debugger to break when the overflow occurs.
Make sure to disable page heap when you are done debugging. Running under full page heap can greatly increase memory pressure on an application by making every heap allocation consume an entire page.
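Full page heap is an operating-system feature and needs no code of its own. As a rough software analogue of the same idea, the sketch below puts a few guard bytes after each allocation and checks them when the block is released. This is a different, weaker technique: it only notices an overrun at check time, not at the moment of the bad write. The names guarded_alloc, guard_intact and guarded_free are hypothetical helpers invented for this illustration, not part of Windows or any runtime library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helpers for illustration only.
constexpr std::size_t kHeader = alignof(std::max_align_t); // keeps user data aligned
constexpr std::size_t kGuardSize = 8;                      // guard bytes after the block
constexpr unsigned char kGuardByte = 0xFD;                 // arbitrary fill pattern
static_assert(kHeader >= sizeof(std::size_t), "header must hold the size");

// Layout of one block: [header holding size][user bytes][guard bytes]
void* guarded_alloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kHeader + size + kGuardSize));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &size, sizeof size);                     // remember the user size
    std::memset(raw + kHeader + size, kGuardByte, kGuardSize); // plant the guard
    return raw + kHeader;
}

// Returns true while the guard bytes behind the user area are untouched.
bool guard_intact(const void* user) {
    const unsigned char* p = static_cast<const unsigned char*>(user);
    std::size_t size = 0;
    std::memcpy(&size, p - kHeader, sizeof size);
    for (std::size_t i = 0; i < kGuardSize; ++i) {
        if (p[size + i] != kGuardByte) return false;
    }
    return true;
}

// Checks the guard, frees the block, and reports whether it was intact.
bool guarded_free(void* user) {
    if (user == nullptr) return true;
    const bool ok = guard_intact(user);
    std::free(static_cast<unsigned char*>(user) - kHeader);
    return ok;
}
```

A write one byte past a 16-byte block obtained from guarded_alloc(16) lands in the guard region, so guard_intact() and guarded_free() return false for that block. Page heap reports the same mistake earlier, at the faulting instruction, which is why it is the better tool when it is available.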
You can use valgrind to check whether your code performs any invalid read/write or any invalid free.
valgrind -v --leak-check=full --show-reachable=yes --log-file=log_valgrind ./Process
log_valgrind will contain any invalid reads/writes that were detected.
Some time ago we split our big project, which used almost exclusively static libraries, into many projects with dynamic libraries.
Since then we have started seeing problems on shutdown.
Sometimes, the process would not terminate. With gdb I found, that on object destruction a segfault occurs, but the process is blocked in futex_wait.
I've since improved the code so that global objects are now created inside functions instead of as global static data. That reduced the problem: it doesn't happen in my development environment anymore.
However, in the test environment (rarely) and in production (often), processes still get stuck on shutdown, so we need to restart the container manually or have some kind of health check.
We are trying to reproduce this situation in a standalone Docker container running under Kubernetes, where the process runs under circusd, and we see the following:
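Moving a global object into a function is commonly written as a function-local static. A minimal sketch, assuming a hypothetical Logger class and logger() accessor (neither name comes from the actual project):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for whatever object used to be a namespace-scope global.
class Logger {
public:
    Logger() { ++constructed(); }
    void write(const std::string& msg) { last_ = msg; }
    const std::string& last() const { return last_; }
    // Counts constructions so the lazy behaviour can be observed.
    static int& constructed() {
        static int count = 0;
        return count;
    }
private:
    std::string last_;
};

// Before: `Logger g_logger;` at namespace scope, constructed at load time in an
// order that is unspecified across translation units and shared libraries.
// After: the object is constructed the first time logger() is called.
Logger& logger() {
    static Logger instance; // initialised once; thread-safe since C++11
    return instance;
}
```

Every call to logger() returns the same instance, and the object is guaranteed to exist before its first use regardless of which library touches it first. Function-local statics are still destroyed during exit(), in reverse order of construction, so this pattern addresses initialization order and does not by itself rule out problems at shutdown.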
#0 malloc_consolidate (av=0xf47fc400 <main_arena>) at malloc.c:4151
#1 0xf46ff1ab in _int_free (av=0xf47fc400 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:4057
#2 0xf48c6e68 in operator delete(void*) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#3 0xf52d173d in std::_Deque_base<boost::log::v2_mt_posix::record_view, std::allocator<boost::log::v2_mt_posix::record_view> >::~_Deque_base() () from /usr/local/lib/liblog.so.0
#4 0xf52d18b3 in std::deque<boost::log::v2_mt_posix::record_view, std::allocator<boost::log::v2_mt_posix::record_view> >::~deque() () from /usr/local/lib/liblog.so.0
#5 0xf52d1940 in boost::log::v2_mt_posix::sinks::bounded_fifo_queue<4000u, boost::log::v2_mt_posix::sinks::drop_on_overflow>::~bounded_fifo_queue() () from /usr/local/lib/liblog.so.0
#6 0xf52d462e in boost::log::v2_mt_posix::sinks::asynchronous_sink<cout_sink, boost::log::v2_mt_posix::sinks::bounded_fifo_queue<4000u, boost::log::v2_mt_posix::sinks::drop_on_overflow>
>::~asynchronous_sink() () from /usr/local/lib/liblog.so.0
#7 0xf52d47f4 in asynchronous_sink<cout_sink>::~asynchronous_sink() () from /usr/local/lib/liblog.so.0
#8 0xf52c199a in boost::detail::sp_counted_impl_pd<asynchronous_sink<cout_sink>*, boost::detail::sp_ms_deleter<asynchronous_sink<cout_sink> >
>::dispose() () from /usr/local/lib/liblog.so.0
#9 0xf51f3e7b in boost::log::v2_mt_posix::core::~core() () from /usr/lib/libboost_log.so.1.58.0
#10 0xf51f6529 in boost::detail::sp_counted_impl_p<boost::log::v2_mt_posix::core>::dispose() () from /usr/lib/libboost_log.so.1.58.0
#11 0xf51f6160 in boost::shared_ptr<boost::log::v2_mt_posix::core>::~shared_ptr() () from /usr/lib/libboost_log.so.1.58.0
#12 0xf46bcfb3 in __cxa_finalize (d=0xf526fa88) at cxa_finalize.c:56
#13 0xf51eaab3 in ?? () from /usr/lib/libboost_log.so.1.58.0
#14 0xf7769e2c in _dl_fini () at dl-fini.c:252
#15 0xf46bcc21 in __run_exit_handlers (status=status@entry=0, listp=0xf47fc3a4 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:82
#16 0xf46bcc7d in __GI_exit (status=0) at exit.c:104
#17 0xf46a572b in __libc_start_main (main=0x8060dc0, argc=5, argv=0xffdd1514, init=0x8088090, fini=0x8088100, rtld_fini=0xf7769c50 <_dl_fini>, stack_end=0xffdd150c) at libc-start.c:321
#18 0x080630cc in ?? ()
I have no idea how to progress from here. What is happening? Why do we get the segfault in boost::log::core destruction in this environment?
Does anyone have advice, perhaps based on experience, on how I can track this down?