I have written a program which runs:
1. Every 20 seconds
2. When specific request to run that piece of code comes.
3. Every 1 min.
1 and 3 in the list above are two different instances and can overlap.
Program
Signature of the function giving trouble.
bool ProcessInfoHandler::getCPUInfo (rsc::ProcInfo &procInfo, bool isThreadCall)
I am getting the below mentioned crash after around 3 days of running the program.
#2 0x000000000041fdb8 in sn_sig_handler (signum=6, siginfo=0x451a7d80, undocumented= <value optimized out>) at common/main/sn_proc_main.cpp:109
#3 <signal handler called>
#4 0x00000031d9630265 in raise () from /lib64/libc.so.6
#5 0x00000031d9631d10 in abort () from /lib64/libc.so.6
#6 0x00000031d966a84b in __libc_message () from /lib64/libc.so.6
#7 0x00000031d967230f in _int_free () from /lib64/libc.so.6
#8 0x00000031d967276b in free () from /lib64/libc.so.6
#9 0x00000000004367a5 in deallocate (this=0x66cff0, __position=..., __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:94
#10 _M_deallocate (this=0x66cff0, __position=..., __x=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:133
#11 std::vector<cpu_instance_data_t, std::allocator<cpu_instance_data_t> >::_M_insert_aux (this=0x66cff0, __position=..., __x=<value optimized out>)
at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/vector.tcc:299
#12 0x0000000000431f8e in ProcessInfoHandler::getCPUInfo (this=<value optimized out>, procInfo=..., isThreadCall=false)
at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:610
#13 0x00000000004333be in ProcessInfoHandler::getProcessInformation (this=0xc16f9c0, procInfoSeq=..., isThreadCall=false) at processinfohandler.cc:255
My Questions
Unlike frame 13, where the file name and line number are given, frame 12 does not show that information. Does this mean that there is a problem with passing the vector by reference?
Any pointers as to how I should proceed to debug this particular stack trace?
The code of getCPUInfo cannot be shared for proprietary reasons. Please suggest a workaround if there is one.
The call to std::vector<>::_M_insert_aux() indicates that the vector is being modified in getCPUInfo. If this code can be called concurrently (on multiple threads), which is implied by your "list of 3", then you need something such as a mutex to synchronize the threads.
std::vector is not thread safe.
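A minimal sketch of that suggestion, assuming a C++11 compiler (the gcc 4.1.2 shown in the backtrace predates std::mutex, so a pthread mutex would have to play the same role there). CpuSample and GuardedCpuSamples are hypothetical names and are not taken from the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the cpu_instance_data_t seen in the backtrace.
struct CpuSample {
    int cpu_id;
    double usage;
};

// Hypothetical wrapper that serializes every access to the shared vector.
class GuardedCpuSamples {
public:
    void add(const CpuSample& sample) {
        assert(sample.cpu_id >= 0);  // precondition assumed for this sketch
        std::lock_guard<std::mutex> lock(mutex_);
        // push_back may reallocate and free the old buffer, so it must not
        // run at the same time as any other access to samples_.
        samples_.push_back(sample);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return samples_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so const readers can lock too
    std::vector<CpuSample> samples_;
};
```

Each caller (the 20-second timer, the 1-minute timer and the request handler) would go through add() and size() rather than touching the vector directly, so a reallocation in one thread can never overlap with an access from another.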
I am about to write a (very) large code for a scientific project, where a large number of allocatable arrays will be used. Is there an intrinsic Fortran function, or maybe a compiler flag, which I can use to check that all allocatable variables have been correctly deallocated? I am using gfortran.
gcc and therefore also gfortran can use the AddressSanitizer library to detect memory leaks. This can be enabled using the -fsanitize=address option.
The output will then be similar to what valgrind produces:
==26339==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 144 byte(s) in 1 object(s) allocated from:
#0 0x7f46fad68510 in malloc (/usr/lib64/libasan.so.4+0xdc510)
#1 0x407754 in __sectiontest_MOD_constructor /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe.f90:27
#2 0x403939 in __sectiontest_MOD_demo /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe.f90:95
#3 0x408564 in MAIN__ /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe_prog.f90:5
#4 0x4085a4 in main /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe_prog.f90:2
#5 0x7f46f9d8ef89 in __libc_start_main (/lib64/libc.so.6+0x20f89)
Direct leak of 96 byte(s) in 1 object(s) allocated from:
#0 0x7f46fad68510 in malloc (/usr/lib64/libasan.so.4+0xdc510)
#1 0x407754 in __sectiontest_MOD_constructor /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe.f90:27
#2 0x403d72 in __sectiontest_MOD_demo /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe.f90:95
#3 0x408564 in MAIN__ /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe_prog.f90:5
#4 0x4085a4 in main /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe_prog.f90:2
#5 0x7f46f9d8ef89 in __libc_start_main (/lib64/libc.so.6+0x20f89)
Indirect leak of 144 byte(s) in 1 object(s) allocated from:
#0 0x7f46fad68510 in malloc (/usr/lib64/libasan.so.4+0xdc510)
#1 0x405fec in __sectiontest_MOD_section_assign /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe.f90:50
#2 0x408237 in __sectiontest_MOD_constructor /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe.f90:30
#3 0x403d72 in __sectiontest_MOD_demo /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe.f90:95
#4 0x408564 in MAIN__ /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe_prog.f90:5
#5 0x4085a4 in main /users/tiziano/work/tests/fortran/cp2k_input_parser/recursive_mwe_alternative/recursive_mwe_prog.f90:2
#6 0x7f46f9d8ef89 in __libc_start_main (/lib64/libc.so.6+0x20f89)
SUMMARY: AddressSanitizer: 384 byte(s) leaked in 3 allocation(s).
As stated before in the comments, memory leaks should not occur with allocatables. On the other hand, experience has shown that compiler bugs can nevertheless cause memory leaks.
I am having a problem debugging my code and am a bit confused by the gdb output. I have attached the gdb output below. The last 2 lines, line #13 and #14 are my code, but everything else is from the C++ library. What is confusing to me is that from about line #7 upward, it appears to be calling delete. This is initialization code and there are no deletes nor frees being called in the code flow. But something is causing delete to be called somewhere in the C++ library.
This is on a Debian box with g++ 4.7.2.
Does anybody have a clue that could help me along?
EDIT: Thank you all for your help. I do think there is something else going on here. Since the intent of my code is to construct a string using several append() calls, I added a call to reserve() in the ctor for that string so it would be large enough to handle a few append() calls without having to get more space. This has apparently helped, because it is now harder for me to force the crash. But I do agree that the cause is probably elsewhere in my code. Again, thanks for all your help.
Program received signal SIGABRT, Aborted.
0xb7fe1424 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7fe1424 in __kernel_vsyscall ()
#1 0xb7a9a941 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xb7a9dd72 in *__GI_abort () at abort.c:92
#3 0xb7ad6e15 in __libc_message (do_abort=2, fmt=0xb7baee70 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:189
#4 0xb7ae0f01 in malloc_printerr (action=<optimized out>, str=0x6 <Address 0x6 out of bounds>, ptr=0xb71117f0) at malloc.c:6283
#5 0xb7ae2768 in _int_free (av=<optimized out>, p=<optimized out>) at malloc.c:4795
#6 0xb7ae581d in *__GI___libc_free (mem=0xb71117f0) at malloc.c:3738
#7 0xb7f244bf in operator delete(void*) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#8 0xb7f8b48b in std::string::_Rep::_M_destroy(std::allocator<char> const&) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#9 0xb7f8b4d0 in ?? () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#10 0xb7f8c7a0 in std::string::reserve(unsigned int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#11 0xb7f8caaa in std::string::append(char const*, unsigned int) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#12 0xb7f8cb76 in std::string::append(char const*) () from /usr/lib/i386-linux-gnu/libstdc++.so.6
#13 0x0804fa38 in MethodRequest::MethodRequest (this=0x80977a0) at cLogProxy.cpp:26
#14 0x0804fac0 in DebugMethodRequest::DebugMethodRequest (this=0x80977a0,
thanks,
-Andres
You are calling std::string::append, which ultimately results in delete getting called. If we go through the steps involved in std::string::append, it might make more sense why delete gets called.
Say you have:
std::string s("abc");
s.append("def");
When you create s, memory has to be allocated to hold "abc". At the end of s.append("def");, there has to be enough memory associated with s to hold "abcdef". Steps to get there:
1. Get the length of s => 3.
2. Get the length of the input string "def" => 3.
3. Add them to figure out the length of the new string => 6.
4. Allocate memory to hold the new string.
5. Copy "abc" to the newly allocated memory.
6. Append "def" to the newly allocated memory.
7. Associate the newly allocated memory with s.
8. Delete the old memory associated with s. (This is where delete comes into the picture.)
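The steps above can be sketched with a hand-rolled string. TinyString is a hypothetical, simplified class written only for illustration; it is not how libstdc++ implements std::string, which reallocates only when the existing capacity is too small:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified string that reallocates on every append.
class TinyString {
public:
    explicit TinyString(const char* text)
        : len_(std::strlen(text)), data_(new char[len_ + 1]) {
        std::memcpy(data_, text, len_ + 1);  // copy including the terminator
    }
    ~TinyString() { delete[] data_; }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    void append(const char* extra) {
        assert(extra != nullptr);
        const std::size_t old_len = len_;                    // length of s
        const std::size_t extra_len = std::strlen(extra);    // length of input
        const std::size_t new_len = old_len + extra_len;     // combined length
        char* fresh = new char[new_len + 1];                 // allocate new memory
        std::memcpy(fresh, data_, old_len);                  // copy old contents
        std::memcpy(fresh + old_len, extra, extra_len + 1);  // append input and terminator
        char* old = data_;
        data_ = fresh;                                       // associate new memory with s
        len_ = new_len;
        delete[] old;                                        // free the old memory
    }

    const char* c_str() const { return data_; }
    std::size_t size() const { return len_; }

private:
    std::size_t len_;  // declared first so it is initialized before data_
    char* data_;
};
```

The final delete[] is the counterpart of the operator delete frame in the backtrace. If the heap bookkeeping around the old buffer has already been damaged by unrelated code, this is the point at which glibc notices and aborts.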
Something is doing string computations that result in deletes internally. It seems likely that something else is trashing memory.
Currently we are seeing our processes take too long in the mmap call.
Once the process reaches roughly 2.8 GB, the mmap call takes up to 100
seconds and the process is killed by its built-in heartbeat mechanism.
We would like to know whether anyone has seen this issue or knows why mmap
would take more than 100 seconds when asked for memory. In all cases the stack
trace looks the same, but the memory is allocated in different parts of the code.
Host and compiler info:
Host memory: 70 GB
OS: Red Hat 6.3
Compiler: gcc 4.4.6
Process memory limit (32 bit): 4 GB
No swap configured
And when this happens the host still has 50GB of memory left.
Stack Trace:
#0 0x55575430 in __kernel_vsyscall ()
#1 0x560f9dd8 in mmap () from /lib/libc.so.6
#2 0x5608f2db in _int_malloc () from /lib/libc.so.6
#3 0x5608fb7e in malloc () from /lib/libc.so.6
#4 0x55fb509a in operator new(unsigned int) () from /usr/lib/libstdc++.so.6
#5 0x55f91ed6 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned int, unsigned int, std::allocator<char> const&) ()
from /usr/lib/libstdc++.so.6
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b8bc26 in std::basic_filebuf<char, std::char_traits<char> >::_M_terminate_output() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) where
#0 0x00007ffff7b8bc26 in std::basic_filebuf<char, std::char_traits<char> >::_M_terminate_output() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1 0x00007ffff7b8c6a2 in std::basic_filebuf<char, std::char_traits<char>>::close() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2 0x00007ffff7b8cb2a in std::basic_ofstream<char, std::char_traits<char> >::~basic_ofstream() ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x0000000000403e02 in main (argc=2, argv=0x7fffffffe1c8)
at main.cpp:630
I am facing this error after program execution and after "return 0;" has been executed.
I have used vectors from the STL. This error is thrown only when the input file is very large (I have around 10000 nodes in the graph).
Also, I am not able to write output to a file. Currently I have commented that part out.
Please help me with this issue.
I am using Ubuntu 12.10 64 bit.
Errors after returning from main can be caused by (at least):
dodgy atexit handlers; or
memory corruption of some description.
Of those two, it's more likely to be the latter so you should run your code under a dynamic memory-use analysis tool, like valgrind. Your description of large vectors causing the problem also seems to support this contention.
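One common source of such corruption with vectors is an index that runs past the end. As a hedged illustration (the question does not show its code, so checked_store and its parameters are hypothetical), a bounds-checked access with at() turns a silent out-of-range write into an error that can be detected:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes `value` at `index` only if the index is valid.
// Returns false instead of silently writing outside the vector's storage.
bool checked_store(std::vector<int>& nodes, std::size_t index, int value) {
    try {
        nodes.at(index) = value;  // at() throws std::out_of_range on a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With operator[] the same bad index would be undefined behaviour and could overwrite unrelated heap data, which may only show up much later, for example when a destructor runs after main returns.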
First I populate a structure that is quite big and has interrelations, and then I serialize it to a binary archive. The size of that structure depends on the data I feed to the program. I see the program taking ~2GB of memory to build the structure, which is expected and acceptable.
Then I start serializing the object, and I see the program eating RAM while serializing. RAM usage grows until it reaches nearly 100%, while swap usage is still 0 bytes.
Then the application crashes with a bad_alloc exception from new.
Why would the serialization process take so much RAM and time? And why would it crash while allocating memory when swap is empty? The backtrace is too long to be pasted in full.
#0 0xb7fe1424 in __kernel_vsyscall ()
#1 0xb7c6e941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xb7c71e42 in abort () at abort.c:92
#3 0xb7e92055 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#4 0xb7e8ff35 in ?? () from /usr/lib/libstdc++.so.6
#5 0xb7e8ff72 in std::terminate() () from /usr/lib/libstdc++.so.6
#6 0xb7e900e1 in __cxa_throw () from /usr/lib/libstdc++.so.6
#7 0xb7e90677 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6
#8 0xb7f00a9f in boost::archive::detail::basic_oarchive_impl::save_pointer(boost::archive::detail::basic_oarchive&, void const*, boost::archive::detail::basic_pointer_oserializer const*) () from /usr/lib/libboost_serialization.so.1.42.0
#9 0xb7effb42 in boost::archive::detail::basic_oarchive::save_pointer(void const*, boost::archive::detail::basic_pointer_oserializer const*) () from /usr/lib/libboost_serialization.so.1.42.0
#10 0x082d052c in void boost::archive::detail::save_pointer_type<boost::archive::binary_oarchive>::non_polymorphic::save<gcl::NestedConnection<gcl::Section, gcl::NestedConnection<gcl::Paragraph, gcl::NestedConnection<gcl::Line, void> > > >(boost::archive::binary_oarchive&, gcl::NestedConnection<gcl::Section, gcl::NestedConnection<gcl::Paragraph, gcl::NestedConnection<gcl::Line, void> > >&) ()
#11 0x082d0472 in void boost::archive::detail::save_pointer_type<boost::archive::binary_oarchive>::save<gcl::NestedConnection<gcl::Section, gcl::NestedConnection<gcl::Paragraph, gcl::NestedConnection<gcl::Line, void> > > >(boost::archive::binary_oarchive&, gcl::NestedConnection<gcl::Section, gcl::NestedConnection<gcl::Paragraph, gcl::NestedConnection<gcl::Line, void> > > const&) ()
.......
#172 0x082a91d8 in boost::archive::detail::interface_oarchive<boost::archive::binary_oarchive>::operator<< <gcl::Collation const> (this=0xbfffe500, t=...) at /usr/include/boost/archive/detail/interface_oarchive.hpp:64
#173 0x082a6298 in boost::archive::detail::interface_oarchive<boost::archive::binary_oarchive>::operator&<gcl::Collation> (this=0xbfffe500, t=...) at /usr/include/boost/archive/detail/interface_oarchive.hpp:72
#174 0x0829bd63 in main (argc=4, argv=0xbffff3f4) at /home/neel/projects/app/main.cpp:93
The program works properly when smaller data is fed to it.
Using Linux 64bit with a 32bit PAE kernel, and Boost 1.42.
The program was working without a crash a few revisions ago. I recently added some more bytes to the structures. Maybe it was not reaching the end of RAM then, and now it is.
But why would new crash when there is enough swap? Why would the serialization process take so much RAM?
Question: why would it crash while allocating memory when swap is empty ?
The allocated object is too big to fit anywhere in the virtual address space:
The allocated object is humongous
virtual address space is too fragmented
virtual address space is all allocated
If your application is compiled as a 32-bit binary, the process virtual address space is limited to 4 GB.
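The backtrace shows std::terminate being reached from __cxa_throw, which is what happens when the std::bad_alloc thrown by operator new is not caught. A minimal sketch of probing and handling such a failure follows; can_allocate is a hypothetical helper, not part of Boost or of the original program:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: reports whether a block of `bytes` can be allocated.
bool can_allocate(std::size_t bytes) {
    if (bytes == 0) {
        return true;  // nothing to allocate
    }
    try {
        void* block = ::operator new(bytes);  // throws std::bad_alloc on failure
        // Touch the first byte so the allocation cannot be optimized away.
        static_cast<volatile char*>(block)[0] = 0;
        ::operator delete(block);
        return true;
    } catch (const std::bad_alloc&) {
        return false;  // no contiguous range of that size in the address space
    }
}
```

A request that cannot fit in the process address space fails in this way regardless of how much RAM or swap the host still has free.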
Question: why would serialization process take so much RAM ?
I have not found any evidence why.
I realized that the serialization process was taking extra memory for its own housekeeping, and that was hitting the 3GB barrier. To stop the serialization process from taking extra memory, I disabled object tracking with BOOST_CLASS_TRACKING, and that fixed the extra memory overhead.