PowerCenter 10.x "Error: Session cannot be aborted or restarted" - Informatica

I am working on one of the PowerCenter 10.x transformations and workflows and ran into this error. I am unable to view the session logs, the entire system becomes stuck, and every time I have to force-restart my laptop. My laptop has a fairly good configuration, with 32 GB RAM and a 1 TB SSD. I even tried to recycle the Integration Service, but it too was stuck and unresponsive. Any help is much appreciated.
(Thread 0x53deb940 (LWP 28161)):
0x000000385a87aefe in memcpy () from /lib64/libc.so.6
0x00002ba5bfb20def in zstrbuf::expand() () from /opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb20e5d in zstrbuf::overflow(int) () from /opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb1ee2a in zstreambuf::xsputn(unsigned short const*, int) () from
/opt/infa/pc/v901/server/bin/libpmuti.so
0x00002ba5bfb1e817 in zostream::write(unsigned short const*, int) () from
/opt/infa/pc/v901/server/bin/libpmuti.so
0x00000000005d9bdc in sendEMail(PmUString const&, PmUString const&, PmUString const&,
PMTValOrderedVector const&, SVarParamManager const*, eEmailType, unsigned int, int&) ()
0x0000000000567f8d in SSessionTask::sendPostSessionEmailForDTM(SSessionInfo*) ()
0x0000000000568a96 in SSessionTask::finishImpl() ()
0x0000000000595665 in STask::finish() ()
0x0000000000565f42 in SSessionTask::handlePrepareLBGroupNotification(STaskLBJobRequest*, ILBResult
const*, ILBRequestBase::EILBEvent, PmUString const&) ()
0x0000000000566c85 in SSessionTask::handleLBNotification(STaskLBGroup*, STaskLBJobRequest*,
ILBResult*&, ILBRequestBase::EILBEvent, PmUString const&) ()
0x0000000000582fc0 in SWorkflow::handleLBNotification(STask*, STaskLBGroup*, STaskLBJobRequest*,
ILBResult*&, ILBRequestBase::EILBEvent, PmUString const&) ()
0x00000000004facb2 in SHandleLBNotificationJob::execute()

The tracing level in Informatica defines how much data is written to the session log when you execute the workflow. The tracing level is an important setting in Informatica because it helps in analyzing errors.
Terse: When you set the tracing level to terse, Informatica logs only error information and information about rejected records. The terse tracing level occupies less space than normal.
The default tracing level is normal. You can change the tracing level to terse to improve performance. The tracing level can be defined on an individual transformation, or you can override it by defining it at the session level.
Please try changing the tracing level and running the workflow once again. I hope this resolves your issue.

Related

Program hangs on tls_get_addr call

We have a very strange problem: the program starts to hang on boost::asio usage, called from our logging library built on boost::log.
This happens only if we link our library in (the library works fine in all of our other projects). The program starts to work if we create a boost::asio::ip::tcp::socket object in the module's initialization function, before log initialization, but of course that is not a solution. We also tried adding just an array of the same size or larger, but no, only the socket object works.
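For reference, the workaround looks roughly like this (a minimal sketch with illustrative names, not the real module code):
#include <boost/asio.hpp>

void module_init()   // hypothetical init hook, called before log initialization
{
    // Construct a boost::asio socket once during module initialization,
    // presumably forcing asio's internal (thread-local) state to be set up
    // before boost::log starts.
    static boost::asio::io_service warmup_service;                     // kept alive for the process lifetime
    static boost::asio::ip::tcp::socket warmup_socket(warmup_service);
    (void)warmup_socket;

    // ... initialize the boost::log based logging library here ...
}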
GDB shows the following:
#0 __pthread_mutex_unlock_usercnt (mutex=0xf779e504 <_rtld_global+1220>,
decr=1) at pthread_mutex_unlock.c:57
#1 0xf777db5e in tls_get_addr_tail (ti=0xf681388c, dtv=0x8bc4410,
the_map=0x8b31c48, the_map@entry=0x0) at dl-tls.c:730
#2 0xf778eed9 in ___tls_get_addr (ti=<optimized out>) at dl-tls.c:778
#3 0xf658ba8f in boost::asio::detail::keyword_tss_ptr<boost::asio::detail::call_stack<boost::asio::detail::task_io_service, boost::asio::detail::task_io_service_thread_info>::context>::operator boost::asio::detail::call_stack<boost::asio::detail::task_io_service, boost::asio::detail::task_io_service_thread_info>::context*() const () from /usr/local/lib/libcommon.so.0
#4 0xf6580de5 in boost::asio::detail::call_stack<boost::asio::detail::task_io_service, boost::asio::detail::task_io_service_thread_info>::top() ()
from /usr/local/lib/libcommon.so.0
#5 0xf657259e in boost::asio::asio_handler_allocate(unsigned int, ...) ()
from /usr/local/lib/libcommon.so.0
#6 0xf3491aa0 in void* boost_asio_handler_alloc_helpers::allocate<boost::function<void (boost::system::error_code const&)> >(unsigned int, boost::function<void (boost::system::error_code const&)>&) () from /usr/local/lib/liblog.so.0
#7 0xf348f43e in void boost::asio::detail::reactive_socket_service<boost::asio::ip::udp>::async_connect<boost::function<void (boost::system::error_code const&)> >(boost::asio::detail::reactive_socket_service<boost::asio::ip::udp>::implementation_type&, boost::asio::ip::basic_endpoint<boost::asio::ip::udp> const&, boost::function<void (boost::system::error_code const&)>&) ()
#8 0xf348b22a in boost::asio::async_result<boost::asio::handler_type<boost::function<void (boost::system::error_code const&)>, void (boost::system::error_code)>::type>::type boost::asio::datagram_socket_service<boost::asio::ip::udp>::async_connect<boost::function<void (boost::system::error_code const&)> >(boost::asio::detail::reactive_socket_service<boost::asio::ip::udp>::implementation_type&, boost::asio::ip::basic_endpoint<boost::asio::ip::udp> const&, boost::function<void (boost::system::error_code const&)>&&) () from /usr/local/lib/liblog.so.0
#9 0xf3487eab in boost::asio::async_result<boost::asio::handler_type<boost::function<void (boost::system::error_code const&)>, void (boost::system::error_code)>::type>::type boost::asio::basic_socket<boost::asio::ip::udp, boost::asio::datagram_socket_service<boost::asio::ip::udp> >::async_connect<boost::function<void (boost::system::error_code const&)> >(boost::asio::ip::basic_endpoint<boost::asio::ip::udp> const&, boost::function<void (boost::system::error_code const&)>&&)
() from /usr/local/lib/liblog.so.0
#10 0xf347c50f in syslog_udp_device::syslog_connect() ()
from /usr/local/lib/liblog.so.0
or this:
#0 0xf775de5d in __GI___pthread_mutex_lock (
mutex=0xf779e504 <_rtld_global+1220>) at ../nptl/pthread_mutex_lock.c:114
#1 0xf777db37 in tls_get_addr_tail (ti=0xf681388c, dtv=0x8bc4410,
the_map=0x8b31c48, the_map@entry=0x0) at dl-tls.c:722
#2 0xf778eed9 in ___tls_get_addr (ti=<optimized out>) at dl-tls.c:778
The other traces are the same.
There is no way to dig deeper, because it does not reproduce on a local machine, only on the kube cluster. Maybe you can point me to what could cause this behaviour?
Update (22 Sep, 20:23 UTC):
Valgrind's helgrind shows some possible data races, but they probably have no relation to the problem. Other tools just hang and report nothing, even after killing the process with -TERM. I determined today that another process (after adding the same linkage) also hangs at the same step, yet we have at least 3-4 apps with the same libs that work, even after rebuilds. It looks like an ODR violation somewhere. I tried linking the failing application with the same link order as a working app - no difference, it still hangs.
Well, that was really hard, but we found the problem. It was a glibc compatibility issue. The build machine was jessie with libc 2.23 and the target machine was jessie with libc 2.19 (at least I think that was the problem; it may have been other system libraries). We debugged this very hard and tried compiling all our libraries with the same options (the build machine builds forked third-party libraries with -O2 and our own libraries with -O0), which did not help.
But when we moved from jessie to Debian stretch on both the build and the running machines, everything started to work fine (both have libc 2.24). It was hard and long because we have many libraries, but that solved the problem. I hope you never get stuck with such a problem.

Is it safe to add or remove log sinks while still logging from other threads with boost log?

With boost::log is it safe to add or remove log sinks while still logging from other threads? Is there some manual locking that I need to do to make these operations thread-safe?
What I am trying to do is to start a new text file that contains a subset of entries of an existing log file between two points in my program.
For example if I have the following entries going to a log file "main_debug.log"
line 1
line 2
line 3
line 4
If I then add a new sink after line 1 and remove it after line 3, I would expect "new_debug.log" to contain the following entries:
line 2
line 3
What I have seems to work most of the time, but I am occasionally seeing segmentation faults within boost::log. Here is an example I managed to catch with gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 8760]
boost::intrusive::list_impl<boost::intrusive::derivation_value_traits<boost::log::v2_mt_posix::attribute_value_set::node, boost::log::v2_mt_posix::attribute_value_set::implementation::node_traits, (boost::intrusive::link_mode_type)0>, unsigned int, true, void>::clear_and_dispose<boost::log::v2_mt_posix::attribute_value_set::implementation::disposer> (this=0x5e64aff4, disposer=...) at /boost-1.60.0/boost/intrusive/list.hpp:738
738 /boost-1.60.0/boost/intrusive/list.hpp: No such file or directory.
(gdb) bt
#0 boost::intrusive::list_impl<boost::intrusive::derivation_value_traits<boost::log::v2_mt_posix::attribute_value_set::node, boost::log::v2_mt_posix::attribute_value_set::implementation::node_traits, (boost::intrusive::link_mode_type)0>, unsigned int, true, void>::clear_and_dispose<boost::log::v2_mt_posix::attribute_value_set::implementation::disposer> (this=0x5e64aff4, disposer=...) at /boost-1.60.0/boost/intrusive/list.hpp:738
#1 boost::log::v2_mt_posix::attribute_value_set::implementation::~implementation (this=0x5e64afe8, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:150
#2 boost::log::v2_mt_posix::attribute_value_set::implementation::destroy (p=0x5e64afe8) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:239
#3 boost::log::v2_mt_posix::attribute_value_set::~attribute_value_set (this=0x5e64b3e4, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/attribute_value_set.cpp:519
#4 0x76e3bbac in boost::log::v2_mt_posix::record_view::public_data::~public_data (this=0x5e64b3e0, __in_chrg=<optimized out>) at /boost-1.60.0/boost/log/core/record_view.hpp:86
#5 boost::log::v2_mt_posix::record_view::private_data::~private_data (this=0x5e64b3e0, __in_chrg=<optimized out>) at /boost-1.60.0/libs/log/src/core.cpp:79
#6 boost::log::v2_mt_posix::record_view::private_data::destroy (this=0x5e64b3e0) at /boost-1.60.0/libs/log/src/core.cpp:131
#7 boost::log::v2_mt_posix::record_view::public_data::destroy (p=0x5e64b3e0) at /boost-1.60.0/libs/log/src/core.cpp:184
#8 0x0020b030 in boost::log::v2_mt_posix::sinks::asynchronous_sink<boost::log::v2_mt_posix::sinks::text_file_backend, boost::log::v2_mt_posix::sinks::unbounded_fifo_queue>::run() ()
#9 0x76d4be6c in boost::(anonymous namespace)::thread_proxy (param=<optimized out>) at /boost-1.60.0/libs/thread/src/pthread/thread.cpp:167
#10 0x76c22f00 in ?? () from /lib/libpthread.so.0
To add a new sink I am doing the following:
const auto pDebugBackend = boost::make_shared<boost::log::sinks::text_file_backend>(
boost::log::keywords::file_name = "debug.log",
boost::log::keywords::channel = "InfoConsole" );
const auto pNewDebugSink = boost::make_shared<boost::log::sinks::asynchronous_sink<boost::log::sinks::text_file_backend>>( pDebugBackend );
// Other code to set the filter and formatter for the sink.
boost::log::core::get()->add_sink( pNewDebugSink );
And to remove the sink some time later I have the following, which follows the order described in https://www.boost.org/doc/libs/1_60_0/libs/log/doc/html/log/detailed/sink_frontends.html#log.detailed.sink_frontends.async:
boost::log::core::get()->remove_sink( pNewDebugSink );
pNewDebugSink->stop();
pNewDebugSink->flush();
pNewDebugSink.reset();
I am using boost-1.60.0, and it is built with threading support enabled.
With boost::log is it safe to add or remove log sinks while still logging from other threads?
Yes, although adding and removing sinks are two distinct operations, and you may miss some log records while the old sink is removed and the new one is not added yet.
Regarding the crashes you're seeing, they seem to happen while the dedicated logging thread is still running (i.e. before the stop method completes), so it is possible that removing the sink is not related. This may be a bug in Boost.Log or some other library it uses, but your Boost version is rather old. Try updating, and if the crash still reproduces, report a bug against Boost.Log with a reproducer code sample.
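For completeness, here is a minimal self-contained sketch of the lifecycle discussed above, using the same asynchronous_sink/text_file_backend types as in the question (the file name and log statement are illustrative):
#include <boost/log/core.hpp>
#include <boost/log/sinks/async_frontend.hpp>
#include <boost/log/sinks/text_file_backend.hpp>
#include <boost/log/trivial.hpp>
#include <boost/make_shared.hpp>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;
namespace keywords = boost::log::keywords;

int main()
{
    typedef sinks::asynchronous_sink<sinks::text_file_backend> async_file_sink;

    // Add a temporary sink; other threads may keep logging concurrently.
    const auto pBackend = boost::make_shared<sinks::text_file_backend>(
        keywords::file_name = "new_debug.log" );
    const auto pSink = boost::make_shared<async_file_sink>( pBackend );
    logging::core::get()->add_sink( pSink );

    BOOST_LOG_TRIVIAL( info ) << "captured while the temporary sink is registered";

    // Tear the sink down in the order described for asynchronous frontends.
    logging::core::get()->remove_sink( pSink );  // stop feeding new records to it
    pSink->stop();                               // shut down the dedicated feeding thread
    pSink->flush();                              // write out whatever is still queued
    return 0;
}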

Unexpected behavior of custom hardware design based on i.MX6 & MT41K256M16TW-107 IT:P

I'm new to custom hardware design and I'm about to scale up a custom board that is working well in small quantities. I need some help making decisions about the prototypes and about scaling up given their current state.
The hardware is based on the i.MX6Q processor and MT41K256M16TW-107 IT:P memory, and is most similar to the Nitrogen6_MAX development board.
I'm having trouble that is really difficult to figure out, because some boards work really well and some do not (out of 7 production units, 4 boards function really well, and one board gets segmentation faults and kernel panics while running a Linux application). When I run memory calibration on the bad boards, the results look much the same as for the good boards.
The segmentation fault points to some memory issue; I captured a core dump and backtraced it with GDB on Linux:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 gcoHARDWARE_QuerySamplerBase (Hardware=0x22193dc, Hardware@entry=0x0,
VertexCount=0x7ef95370, VertexCount@entry=0x7ef95368, VertexBase=0x40000,
FragmentCount=FragmentCount@entry=0x2217814, FragmentBase=0x0) at
gc_hal_user_hardware_query.c:6020
6020 gc_hal_user_hardware_query.c: No such file or directory.
[Current thread is 1 (Thread 0x76feb010 (LWP 697))]
(gdb) bt
#0 gcoHARDWARE_QuerySamplerBase (Hardware=0x22193dc, Hardware@entry=0x0,
VertexCount=0x7ef95370, VertexCount@entry=0x7ef95368, VertexBase=0x40000,
FragmentCount=FragmentCount@entry=0x2217814, FragmentBase=0x0) at
gc_hal_user_hardware_query.c:6020
#1 0x765d20e8 in gcoHAL_QuerySamplerBase (Hal=<optimized out>,
VertexCount=VertexCount@entry=0x7ef95368, VertexBase=<optimized out>,
FragmentCount=FragmentCount@entry=0x2217814,
FragmentBase=0x0) at gc_hal_user_query.c:692
#2 0x681e31ec in gcChipRecompileEvaluateKeyStates (chipCtx=0x0,
gc=0x7ef95380) at src/chip/gc_chip_state.c:2115
#3 gcChipValidateRecompileState (gc=0x7ef95380, gc@entry=0x21bd96c,
chipCtx=0x0, chipCtx@entry=0x2217814) at src/chip/gc_chip_state.c:2634
#4 0x681c6da8 in __glChipDrawValidateState (gc=0x21bd96c) at
src/chip/gc_chip_draw.c:5217
#5 0x68195688 in __glDrawValidateState (gc=0x21bd96c) at
src/glcore/gc_es_draw.c:585
#6 __glDrawPrimitive (gc=0x21bd96c, mode=<optimized out>) at
src/glcore/gc_es_draw.c:943
#7 0x68171048 in glDrawArrays (mode=4, first=6, count=6) at
src/glcore/gc_es_api.c:399
#8 0x76c9ac72 in CEGUI::OpenGL3GeometryBuffer::draw() const () from
/usr/lib/libCEGUIOpenGLRenderer-0.so.2
#9 0x76dd1aee in CEGUI::RenderQueue::draw() const () from
/usr/lib/libCEGUIBase-0.so.2
#10 0x76e317d8 in CEGUI::RenderingSurface::draw(CEGUI::RenderQueue const&,
CEGUI::RenderQueueEventArgs&) () from /usr/lib/libCEGUIBase-0.so.2
#11 0x76e31838 in CEGUI::RenderingSurface::drawContent() () from
/usr/lib/libCEGUIBase-0.so.2
#12 0x76e36d30 in CEGUI::GUIContext::drawContent() () from
/usr/lib/libCEGUIBase-0.so.2
#13 0x76e31710 in CEGUI::RenderingSurface::draw() () from
/usr/lib/libCEGUIBase-0.so.2
#14 0x001bf79c in tengri::gui::cegui::System::Impl::draw (this=0x2374f08) at
codebase/src/gui/cegui/system.cpp:107
#15 tengri::gui::cegui::System::draw (this=this@entry=0x2374e74) at
codebase/src/gui/cegui/system.cpp:212
#16 0x000b151e in falcon::osd::view::MainWindowBase::Impl::preNativeUpdate
(this=0x2374e10) at codebase/src/osd/view/MainWindow.cpp:51
#17 falcon::osd::view::MainWindowBase::preNativeUpdate
(this=this@entry=0x209fe30) at codebase/src/osd/view/MainWindow.cpp:91
#18 0x000c4686 in falcon::osd::view::FBMainWindow::update (this=0x209fe00)
at
codebase/include/falcon/osd/view/FBMainWindow.h:56
#19 falcon::osd::view::App::Impl::execute (this=0x209fdb0) at
codebase/src/osd/view/app_view_osd_falcon.cpp:139
#20 falcon::osd::view::App::execute (this=<optimized out>) at
codebase/src/osd/view/app_view_osd_falcon.cpp:176
#21 0x000475f6 in falcon::osd::App::execute (this=this@entry=0x7ef95c84) at
codebase/src/osd/app_osd_falcon.cpp:75
#22 0x00047598 in main () at codebase/src/main.cpp:5
(gdb) Quit
I have attached the NXP calibration tool results for 2 good boards and 1 bad board (the one getting segmentation faults). Click on the following links.
Board 1
Board 2
Board 3
I did a stress test using stressapptest, running overnight, but I didn't get any faults and the test passed.
Of the above 3 boards, Board 1 and Board 2 work really well and Board 3 gets kernel panics while running the same application. Can you help me find any clue in the results from these 3 boards?
I did a production run of 50 units 6 months ago and only 30 worked properly, but that was with Alliance Memory AS4C256M16D3A-12BCN. So could this be a design issue? If it is an issue with the DDR layout or the overall design, why do some boards work really well?
Could this be a manufacturing issue? Then how could it happen within the same production run, given that some boards work and some do not?
Does stressapptest stress the power supply as well? Do you know of any Linux application that can also stress power?
I don't have much experience with mass production, but I would like to move forward after learning from and correcting these issues. I would be thankful if you could kindly reply soon.

Virtual storage increases for a continuously running application

Before I ask my question let me explain my environment:
I have a C/C++ application that runs continuously (Infinite loop) inside an embedded Linux device.
The application records some data from the system and stores them in text files on an SD-card (1 file per day).
Recording occurs on a specific trigger detected from the system (every 5 minutes, for example), and each trigger appends a new line to the text file.
Typical datatypes used within the application are: (o/i)stream, char arrays, char*, the c_str() function, structs and struct*, static string arrays, #define, enums, FILE*, vector<>, and the usual ones (int, string, etc.). Some of these are passed as arguments to functions.
The application is cross-compiled with a custom GCC toolchain from a Buildroot and BusyBox package for the device's CPU, an Atmel AT91RM9200QU.
The application executes some system commands using popen, and their output is read through the resulting FILE*.
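(For reference, a typical popen read pattern looks like the following minimal sketch - not the application's actual code. One classic cause of slowly growing VSZ is a pclose that is skipped on some path, since each popen forks a child and allocates a pipe plus a FILE buffer that only pclose releases.)
#include <cstdio>
#include <string>

std::string run_command(const char* cmd)
{
    std::string output;

    FILE* fp = popen(cmd, "r");
    if (fp == NULL)
        return output;                 // popen failed, nothing to close

    char buf[256];
    while (fgets(buf, sizeof(buf), fp) != NULL)
        output += buf;                 // accumulate the command's stdout

    pclose(fp);                        // must run on every path that opened fp
    return output;
}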
The application has now been running for three days and I noticed an increase of 32 KB per day in the virtual storage (VSZ from the top command). The device restarted by mistake; I launched the application again and the VSZ value started from the usual fresh-start value (about 2532 KB).
I developed another application that monitors the VSZ value of the main application; it is scheduled with crontab to run every hour. I noticed that at some point during the day the 32 KB increase happens as 4 KB per hour.
So the main question is: what could be the reason for the VSZ increase? My concern is that it will eventually reach a limit and cause the system to crash, because the device has approx. 27 MB of RAM.
Update: Besides the VSZ value, the RSS also increases. I ran the application under valgrind --leak-check=full, aborted it after the first recording, and the following message appeared many, many times:
==28211== 28 bytes in 1 blocks are possibly lost in loss record 15 of 52
==28211== at 0x4C29670: operator new(unsigned long) (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==28211== by 0x4EF33D8: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x4EF4B00: char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x4EF4F17: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib64/libstdc++.so.6.0.19)
==28211== by 0x403842: __static_initialization_and_destruction_0 (gatewayfunctions.h:28)
*==28211== by 0x403842: _GLOBAL__sub_I__Z18szBuildUDPTelegramSsii (gatewayfunctions.cpp:396)
==28211== by 0x41AE7C: __libc_csu_init (elf-init.c:88)
==28211== by 0x5676A94: (below main) (in /lib64/libc-2.19.so)
The same message appears many times, except that the line marked with * shows a different file name each time. The other thing I notice is that line 28 of gatewayfunctions.h is a static string array declaration; this array is used in only two files. Any suggestions?

Crashing C++ application on production machine due to segmentation fault

We are facing a C++ application crash due to a segmentation fault on Red Hat Linux. We are using embedded Python in C++.
Please find my limitations below:
I don't have access to the production machine where the application crashes. The client sends us core dump files when the application crashes.
The problem is not reproducible on our test machine, which has exactly the same configuration as the production machine.
Sometimes the application crashes after 1 hour, 4 hours… 1 day or 1 week. We haven't found a time frame or any specific pattern in which the application crashes.
The application is complex and embedded Python code is used from a lot of places within it. We have done extensive code reviews but couldn't find the problem that way.
As per the stack trace in the core dump, it is crashing around a multiplication operation. We reviewed the code for such an operation but couldn't find any place where it is performed. It might be that such operations are called through Python scripts executed via embedded Python, which we don't control and can't review.
We can't use any profiling tool such as Valgrind in the production environment.
We are using gdb on our local machine to analyze the core dumps. We can't run gdb on the production machine.
Please find below the efforts we have put in.
We have analyzed the logs and continuously fired the requests coming towards our application at our test environment to reproduce the problem.
We do not get a crash point in the logs; every time we get different logs. I think this is because memory is smashed somewhere else and the application crashes some time later.
We have checked the load on our application at every point and it never exceeded our application's limit.
The application's memory utilization is also normal.
We have profiled our application with Valgrind on our test machine and removed the Valgrind errors, but the application still crashes.
I would appreciate any help that guides us further towards solving the problem.
Below are the version details:
Red Hat Linux Server 5.6 (Tikanga)
Python 2.6.2, GCC 4.1
Following is the stack trace I get (on my machine) from the core dump files they shared. FYI, we don't have access to the production machine to run gdb on the core dump files there.
0 0x00000033c6678630 in ?? ()
1 0x00002b59d0e9501e in PyString_FromFormatV (format=0x2b59d0f2ab00 "can't multiply sequence by non-int of type '%.200s'", vargs=0x46421f20) at Objects/stringobject.c:291
2 0x00002b59d0ef1620 in PyErr_Format (exception=0x2b59d1170bc0, format=<value optimized out>) at Python/errors.c:548
3 0x00002b59d0e4bf1c in PyNumber_Multiply (v=0x2aaaac080600, w=0x2b59d116a550) at Objects/abstract.c:1192
4 0x00002b59d0ede326 in PyEval_EvalFrameEx (f=0x732b670, throwflag=<value optimized out>) at Python/ceval.c:1119
5 0x00002b59d0ee2493 in call_function (f=0x7269330, throwflag=<value optimized out>) at Python/ceval.c:3794
6 PyEval_EvalFrameEx (f=0x7269330, throwflag=<value optimized out>) at Python/ceval.c:2389
7 0x00002b59d0ee2493 in call_function (f=0x70983f0, throwflag=<value optimized out>) at Python/ceval.c:3794
8 PyEval_EvalFrameEx (f=0x70983f0, throwflag=<value optimized out>) at Python/ceval.c:2389
9 0x00002b59d0ee2493 in call_function (f=0x6f1b500, throwflag=<value optimized out>) at Python/ceval.c:3794
10 PyEval_EvalFrameEx (f=0x6f1b500, throwflag=<value optimized out>) at Python/ceval.c:2389
11 0x00002b59d0ee2493 in call_function (f=0x2aaab09d52e0, throwflag=<value optimized out>) at Python/ceval.c:3794
12 PyEval_EvalFrameEx (f=0x2aaab09d52e0, throwflag=<value optimized out>) at Python/ceval.c:2389
13 0x00002b59d0ee2d9f in ?? () at Python/ceval.c:2968 from /usr/local/lib/libpython2.6.so.1.0
14 0x0000000000000007 in ?? ()
15 0x00002b59d0e83042 in lookdict_string (mp=<value optimized out>, key=0x46424dc0, hash=40722104) at Objects/dictobject.c:412
16 0x00002aaab09d5458 in ?? ()
17 0x00002aaab09d5458 in ?? ()
18 0x00002aaab02a91f0 in ?? ()
19 0x00002aaab0b2c3a0 in ?? ()
20 0x0000000000000004 in ?? ()
21 0x00000000026d5eb8 in ?? ()
22 0x00002aaab0b2c3a0 in ?? ()
23 0x00002aaab071e080 in ?? ()
24 0x0000000046422bf0 in ?? ()
25 0x0000000046424dc0 in ?? ()
26 0x00000000026d5eb8 in ?? ()
27 0x00002aaab0987710 in ?? ()
28 0x00002b59d0ee2de2 in PyEval_EvalFrame (f=0x0) at Python/ceval.c:538
29 0x0000000000000000 in ?? ()
You are almost certainly doing something bad with pointers in your C++ code, which can be very tough to debug.
Do not assume that the stack trace is relevant. It might be relevant, but pointer misuse can often lead to crashes some time later.
Build with full warnings on. The compiler can point out some non-obvious pointer misuse, such as returning a reference to a local.
Investigate your arrays. Try replacing arrays with std::vector (C++03) or std::array (C++11) so you can iterate using begin() and end() and you can index using at().
Investigate your pointers. Replace them with std::unique_ptr (C++11) or boost::scoped_ptr wherever you can (there should be no overhead in release builds). Replace the rest with shared_ptr or weak_ptr. Any that can't be replaced are probably the source of problematic logic.
Because of the very problems you're seeing, modern C++ allows almost all raw pointer usage to be removed entirely. Try it; a brief sketch of the vector and smart-pointer suggestions follows.
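Here is a minimal sketch of those two suggestions (assuming a C++11 compiler; with older compilers use boost::scoped_ptr as noted above - the types and names are illustrative):
#include <memory>
#include <stdexcept>
#include <vector>

// Bounds-checked access: at() throws instead of silently corrupting memory.
double sum_first(const std::vector<double>& samples, std::size_t count)
{
    double total = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        total += samples.at(i);   // throws std::out_of_range on a bad index
    return total;
}

int main()
{
    std::vector<double> samples(10, 1.5);

    // Single-owner pointer: released automatically, no delete to forget.
    std::unique_ptr<std::vector<double> > owned(new std::vector<double>(samples));

    try {
        sum_first(*owned, 20);    // deliberately out of range
    } catch (const std::out_of_range&) {
        // Caught here instead of smashing memory and crashing much later.
    }
    return 0;
}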
First things first, compile both your binary and libpython with debug symbols and push it out. The stack trace will be much easier to follow.
The relevant argument to g++ is -g.
Suggestions:
As already suggested, provide a complete debug build
Provide a memory test tool and a CPU torture test
Load the debug symbols of the Python library when analyzing the core dump
The stack trace shows something concerning eval(), so I guess you do dynamic code generation and evaluation/execution. If so, the actual error might be in that code or in the arguments passed to it. Assertions at every interface to that code, together with code dumps, may help.
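For instance, defensive checks at the C++/embedded-Python boundary could look like the following sketch (the module and function names are illustrative, not taken from your application):
#include <Python.h>
#include <cassert>
#include <cstdio>

// Call module.func_name(input) and return the result as a double,
// validating every handle instead of trusting the script blindly.
static double call_py_double(PyObject* module, const char* func_name, double input)
{
    assert(module != NULL);                       // catch bad handles as early as possible

    PyObject* func = PyObject_GetAttrString(module, func_name);
    if (func == NULL || !PyCallable_Check(func)) {
        Py_XDECREF(func);
        PyErr_Clear();
        return 0.0;
    }

    PyObject* result = PyObject_CallFunction(func, (char*)"d", input);
    Py_DECREF(func);
    if (result == NULL) {                         // the script raised; report it, don't dereference
        PyErr_Print();
        return 0.0;
    }

    double value = PyFloat_AsDouble(result);
    Py_DECREF(result);
    return value;
}

int main()
{
    Py_Initialize();
    PyObject* math_module = PyImport_ImportModule("math");
    if (math_module != NULL) {
        printf("sqrt(2) = %f\n", call_py_double(math_module, "sqrt", 2.0));
        Py_DECREF(math_module);
    }
    Py_Finalize();
    return 0;
}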