std::vector reserve method fails to allocate enough memory - c++

I have a buffer class in my C++ application as follows:
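#include <cstddef>
#include <vector>

typedef unsigned char uint8; // assumption: matches std::vector<unsigned char> in the stack trace

class Buffer
{
public:
    Buffer(std::size_t res): _rpos(0), _wpos(0)
    {
        _storage.reserve(res); // sets capacity only; size() stays 0
    }
protected:
    std::size_t _rpos, _wpos;
    std::vector<uint8> _storage;
};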
Sometimes the constructor fails because it's unable to allocate the required memory. For example, once, calling the constructor with res = 37 caused a segfault with the following stack trace, which I got from its core dump:
#0 0x00007f916a176ed5 in raise () from /lib/libc.so.6
No symbol table info available.
#1 0x00007f916a1783f3 in abort () from /lib/libc.so.6
No symbol table info available.
#2 0x00007f916a1b33a8 in ?? () from /lib/libc.so.6
No symbol table info available.
#3 0x00007f916a1b8948 in ?? () from /lib/libc.so.6
No symbol table info available.
#4 0x00007f916a1bb17c in ?? () from /lib/libc.so.6
No symbol table info available.
#5 0x00007f916a1bca78 in malloc () from /lib/libc.so.6
No symbol table info available.
#6 0x00007f916ac0c16d in operator new (sz=37)
at ../../.././libstdc++-v3/libsupc++/new_op.cc:52
p = <value optimized out>
#7 0x00000000004e3d11 in std::vector<unsigned char, std::allocator<unsigned char> >::reserve (this=0x7f911bc49cc0, __n=31077)
at /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../include/c++/4.4.2/ext/new_allocator.h:89
__old_size = 0
__tmp = <value optimized out>
I've compiled this application with GCC 4.4.2 as a 64-bit application, and I'm running it on Debian 5 x64.
Any help is much appreciated.
Thanks

Because the crash happens inside malloc, most likely some other code has trashed the heap (i.e. written to memory it does not own, memory the heap manager uses for its own bookkeeping).
I recommend using Valgrind to find what code is trashing the heap.

If you can't use Valgrind to find out where your memory is corrupted because of the heavy load it implies, you can still test with lighter solutions.
For a server application where Valgrind was not applicable (because the platform was Solaris 8), I had pretty good results with mpatrol ( http://mpatrol.sf.net ) and especially dmalloc ( http://dmalloc.com ).
To some extent, you can use them without recompiling (just relinking for dmalloc, library preloading for mpatrol). They replace the memory primitives to perform extra checks on memory use (bad arguments to those primitives, off-by-one reads, heap corruption, ...). Some of those checks are triggered exactly when the problem occurs, while others are triggered a bit later than the actual bad code. By tuning which checks are enabled and, where applicable, how often they run, you can run at almost full speed while still performing basic checks.
I recommend recompiling with dmalloc to get the so-called 'FUNC_CHECK'; for me, this added a lot of accuracy in spotting bugs at a negligible performance cost.

Related

All threads in wait in core dump file, but someone triggered SIG_ABRT

I support an application that has been written in C++ over many years, and lately it has started to crash, producing core dumps that we don't know how to analyze.
It runs on an appliance running Ubuntu 14.04.5.
When loading the core file in GDB it says that:
Program terminated with signal SIGABRT, Aborted
I can inspect 230 threads, but they are all in wait() at the exact same address.
There is a thread with ID 1 that in theory could be the culprit, but that thread is also in wait().
So I have two questions basically.
How does the id index of the threads work?
Is the thread with GDB ID 1 the last active thread, or is that an arbitrary index, so the failure could be in any of the other threads?
How can all threads be in wait() when a SIGABRT is triggered?
Shouldn't the instruction pointer be at the failing instruction when the OS stepped in and halted the process? Or is this some sort of deadlock protection?
Any help much appreciated.
Backtrace of thread 1:
#0 0xf771dcd9 in ?? ()
#1 0xf74ad4ca in _int_free (av=0x38663364, p=<optimized out>,have_lock=-186161432) at malloc.c:3989
#2 0xf76b41ab in std::string::_Rep::_M_destroy(std::allocator<char> const&) () from /usr/lib32/libstdc++.so.6
#3 0xf764f82f in operator delete(void*) () from /usr/lib32/libstdc++.so.6
#4 0xf764f82f in operator delete(void*) () from /usr/lib32/libstdc++.so.6
#5 0x5685e8b4 in SlimStringMapper::~SlimStringMapper() ()
#6 0x567d6bc3 in destroy ()
#7 0x566a40b4 in HttpProxy::getLogonCredentials(HttpClient*, HttpServerTransaction*, std::string const&, std::string const&, std::string&, std::string&) ()
#8 0x566a5d04 in HttpProxy::add_authorization_header(HttpClient*, HttpServerTransaction*, Hosts::Host*) ()
#9 0x566af97c in HttpProxy::onClientRequest(HttpClient*, HttpServerTransaction*) ()
#10 0x566d597e in callOnClientRequest(HttpClient*, HttpServerTransaction*, FastHttpRequest*) ()
#11 0x566d169f in GateKeeper::onClientRequest(HttpClient*, HttpServerTransaction*) ()
#12 0x566a2291 in HttpClientThread::run() ()
#13 0x5682e37c in wa_run_thread ()
#14 0xf76f6f72 in start_thread (arg=0xec65ab40) at pthread_create.c:312
#15 0xf75282ae in query_module () at ../sysdeps/unix/syscall-template.S:82
#16 0xec65ab40 in ?? ()
Another thread that should be in wait:
#0 0xf771dcd9 in ?? ()
#1 0x5682e37c in wa_run_thread ()
#2 0xf76f6f72 in start_thread (arg=0xf33bdb40) at pthread_create.c:312
#3 0xf75282ae in query_module () at ../sysdeps/unix/syscall-template.S:82
#4 0xf33bdb40 in ?? ()
Best regards
Jon
How can all threads be in wait() when a SIGABRT is triggered?
Is wait the POSIX function, or something from the run-time environment? Are you looking at a higher-level backtrace?
Anyway, there is an easy explanation for why this can happen: SIGABRT was sent to the process rather than generated synchronously by one of its threads. Perhaps a coworker sent the signal to create the core dump after observing the deadlock, to collect evidence for future analysis?
How does the id index of the threads work? Is thread with GDB ID 1 the last active thread?
When the program is running under GDB, GDB numbers threads as it discovers them, so thread 1 is always the main thread.
But when loading a core dump, GDB discovers threads in the order in which the kernel saved them. The kernels that I have seen always save the thread that caused program termination first, so loading a core into GDB usually takes you straight to the crash point without the need to switch threads.
How can all threads be in wait() when a SIGABRT is triggered?
One possibility is that you are not analyzing the core correctly. In particular, you need exact copies of the shared libraries that were in use when the core was produced, and that's unlikely to be the case when the application runs on an "appliance" and you are analyzing the core on your development machine. See this answer.
I just saw your question. First of all, my answer does not address your direct question; it suggests ways to handle this kind of situation. Multithreading performance depends heavily on the hardware and operating system of the machine, especially memory and processors. More threads require more memory and more processor time slices. I doubt your machine has more than 100 processors, so 230 threads cannot all run concurrently at full performance. To avoid this situation, the following steps may help:
Control thread creation, and limit the number of threads running concurrently.
Increase the memory available to your application (check compiler options, or OS settings, that control how much memory the application gets at run time).
Set the grid size and stack size of each thread properly (the right values depend on what your threads do, which is a bit complicated; please read the relevant documentation).
Handle synchronized blocks properly to avoid deadlocks.
Where necessary, use conditional locks, etc.
You said that most of your threads are in a wait state, which means they are waiting for a lock to be released so they can take their turn; that in turn means one thread has already acquired the lock and is either still busy processing or possibly deadlocked.

What are the possible reasons for POSIX SIGBUS?

My program recently crashed with the following stack:
Program terminated with signal 7, Bus error.
#0 0x00007f0f323beb55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f0f323beb55 in raise () from /lib64/libc.so.6
#1 0x00007f0f35f8042e in skgesigOSCrash () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#2 0x00007f0f36222ca9 in kpeDbgSignalHandler () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#3 0x00007f0f35f8063e in skgesig_sigactionHandler () from /usr/lib/oracle/11.2/client64/lib/libclntsh.so.11.1
#4 <signal handler called>
What should I check in my code to avoid this? Or is this something Oracle should fix?
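The main reasons you could get a bus error revolve around inaccessible memory. Common causes include:
Accessing memory through a deleted pointer.
Accessing memory through an uninitialized pointer.
Accessing memory through a NULL pointer.
Accessing an address that is not yours, for example as a result of an overflow error.
(On Linux, the pointer errors above more often show up as SIGSEGV; SIGBUS typically comes from touching a page of a memory-mapped file that lies beyond the end of the file, or from a misaligned access on architectures that enforce alignment.)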
Try adding the following to the relevant *.ora file in $ORACLE_HOME/network/admin (typically sqlnet.ora):
DIAG_ADR_ENABLED=OFF
DIAG_SIGHANDLER_ENABLED=FALSE
DIAG_DDE_ENABLED=FALSE
This sounds like an Oracle issue.
Also, Oracle's libraries appear to have been compiled with Intel compilers.

Pimpl + QSharedPointer - Destructor = Disaster

Yesterday I ran into a problem that cost me 24 hours of frustration. It boiled down to unexpected crashes occurring at random. To complicate things, the debugging reports showed no consistent pattern either. To complicate them even more, every debugging trace led into either random Qt sources or native DLLs, which each time suggested that the issue was not on my side.
Here are a few examples of these lovely reports:
Program received signal SIGSEGV, Segmentation fault.
0x0000000077864324 in ntdll!RtlAppendStringToString () from C:\Windows\system32\ntdll.dll
(gdb) bt
#0 0x0000000077864324 in ntdll!RtlAppendStringToString () from C:\Windows\system32\ntdll.dll
#1 0x000000002efc0230 in ?? ()
#2 0x0000000002070005 in ?? ()
#3 0x000000002efc0000 in ?? ()
#4 0x000000007787969f in ntdll!RtlIsValidHandle () from C:\Windows\system32\ntdll.dll
#5 0x0000000000000000 in ?? ()
warning: HEAP: Free Heap block 307e5950 modified at 307e59c0 after it was freed
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000778bf0b2 in ntdll!ExpInterlockedPopEntrySListFault16 () from C:\Windows\system32\ntdll.dll
(gdb) bt
#0 0x00000000778bf0b2 in ntdll!ExpInterlockedPopEntrySListFault16 () from C:\Windows\system32\ntdll.dll
#1 0x000000007786fd34 in ntdll!RtlIsValidHandle () from C:\Windows\system32\ntdll.dll
#2 0x0000000077910d20 in ntdll!RtlGetLastNtStatus () from C:\Windows\system32\ntdll.dll
#3 0x00000000307e5950 in ?? ()
#4 0x00000000307e59c0 in ?? ()
#5 0x00000000ffffffff in ?? ()
#6 0x0000000000220f10 in ?? ()
#7 0x0000000077712d60 in WaitForMultipleObjectsEx () from C:\Windows\system32\kernel32.dll
#8 0x0000000000000000 in ?? ()
Program received signal SIGSEGV, Segmentation fault.
0x0000000000a9678a in QBasicAtomicInt::ref (this=0x8) at ../../include/QtCore/../../../qt-src/src/corelib/arch/qatomic_x86_64.h:121
121 : "memory");
(gdb) bt
#0 0x0000000000a9678a in QBasicAtomicInt::ref (this=0x8) at ../../include/QtCore/../../../qt-src/src/corelib/arch/qatomic_x86_64.h:121
#1 0x00000000009df08e in QVariant::QVariant (this=0x21e4d0, p=...) at d:/Distributions/qt-src/src/corelib/kernel/qvariant.cpp:1426
#2 0x0000000000b4dde9 in QList<QVariant>::value (this=0x323bd480, i=1) at ../../include/QtCore/../../../qt-src/src/corelib/tools/qlist.h:666
#3 0x00000000009ccff7 in QObject::property (this=0x3067e900,
name=0xa9d042a <QCDEStyle::drawPrimitive(QStyle::PrimitiveElement, QStyleOption const*, QPainter*, QWidget const*) const::pts5+650> "_q_stylerect")
at d:/Distributions/qt-src/src/corelib/kernel/qobject.cpp:3742
#4 0x0000000000000000 in ?? ()
As you can see, this stuff is pretty nasty and gives no useful information. But there was one thing I hadn't paid attention to: a weird warning during compilation, which is also easy to miss:
In file included from d:/Libraries/x64/MinGW-w64/4.7.2/Qt/4.8.4/include/QtCore/qsharedpointer.h:50:0,
from d:/Libraries/x64/MinGW-w64/4.7.2/Qt/4.8.4/include/QtCore/QSharedPointer:1,
from ../../../../source/libraries/Project/sources/Method.hpp:4,
from ../../../../source/libraries/Project/sources/Slot.hpp:4,
from ../../../../source/libraries/Project/sources/Slot.cpp:1:
d:/Libraries/x64/MinGW-w64/4.7.2/Qt/4.8.4/include/QtCore/qsharedpointer_impl.h: In instantiation of 'static void QtSharedPointer::ExternalRefCount<T>::deref(QtSharedPointer::ExternalRefCount<T>::Data*, T*) [with T = Project::Method::Private; QtSharedPointer::ExternalRefCount<T>::Data = QtSharedPointer::ExternalRefCountData]':
d:/Libraries/x64/MinGW-w64/4.7.2/Qt/4.8.4/include/QtCore/qsharedpointer_impl.h:336:11: required from 'void QtSharedPointer::ExternalRefCount<T>::deref() [with T = Project::Method::Private]'
d:/Libraries/x64/MinGW-w64/4.7.2/Qt/4.8.4/include/QtCore/qsharedpointer_impl.h:401:38: required from 'QtSharedPointer::ExternalRefCount<T>::~ExternalRefCount() [with T = Project::Method::Private]'
d:/Libraries/x64/MinGW-w64/4.7.2/Qt/4.8.4/include/QtCore/qsharedpointer_impl.h:466:7: required from here
d:/Libraries/x64/MinGW-w64/4.7.2/Qt/4.8.4/include/QtCore/qsharedpointer_impl.h:342:21: warning: possible problem detected in invocation of delete operator: [enabled by default]
d:/Libraries/x64/MinGW-w64/4.7.2/Qt/4.8.4/include/QtCore/qsharedpointer_impl.h:337:28: warning: 'value' has incomplete type [enabled by default]
Admittedly, I turned to this warning only as a last resort: by then, in my desperate hunt for the bug, the code was literally riddled with logging.
After reading it carefully, I recalled that if one uses std::unique_ptr or boost::scoped_ptr for Pimpl, one must provide a destructor (defined where the private class is complete), otherwise the code won't even compile. However, I also remembered that std::shared_ptr does not care about the destructor and works fine without it. That was another reason why I didn't pay attention to this strange warning. Long story short, when I added a destructor, the random crashes stopped. It looks like Qt's QSharedPointer has some design flaws compared to std::shared_ptr. I think it would be better if the Qt developers turned this warning into an error, because debugging marathons like this are simply not worth one's time, effort and nerves.
My questions are:
What's wrong with QSharedPointer? Why is the destructor so vital?
Why did the crashes happen when there was no destructor? These objects (which use Pimpl + QSharedPointer) are created on the stack, and no other objects have access to them after their death. However, the crashes happened at some random time after their death.
Has anyone run into issues like this before? Please share your experience.
Are there other pitfalls like this in Qt that I must know about to stay safe in the future?
Hopefully, these questions and my post in general will help others avoid the hell I've been through for the past 24 hours.
The issue has been worked around in Qt 5, see https://codereview.qt-project.org/#change,26974
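The compiler calling the wrong destructor or assuming a different memory layout probably led to some kind of memory corruption. (Formally, deleting an object through a pointer to an incomplete class type is undefined behavior if the complete class has a non-trivial destructor.) I'd say a compiler should give an error for this issue, not a warning.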
You'll run into a similar issue with std::unique_ptr, which can also produce broken destructors when used with an incomplete type. The fix is pretty trivial, of course: I declare a destructor for the class in the header, then define it in the implementation file as
MyClass::~MyClass() = default;
The reason this is an issue for std::unique_ptr but not for std::shared_ptr is that the deleter is part of the type of the former (std::unique_ptr<T, Deleter>), whereas the latter stores it, type-erased, alongside the reference count when the pointer is created.

double free or corruption (!prev) while calling __do_global_dtors_aux

I'm getting this error message after my app has finished all its work correctly:
/lib64/libc.so.6[0x3f1ee70d7f]
/lib64/libc.so.6(cfree+0x4b)[0x3f1ee711db]
/home/user/workspace/NewProject/build/bin/TestApp(_ZN9__gnu_cxx13new_allocatorIN5boost10shared_ptrINS1_5uuids4uuidEEEE10deallocateEPS5_m+0x20)[0x49c174]
/home/user/workspace/NewProject/build/bin/TestApp(_ZNSt12_Vector_baseIN5boost10shared_ptrINS0_5uuids4uuidEEESaIS4_EE13_M_deallocateEPS4_m+0x32)[0x495b84]
/home/user/workspace/NewProject/build/bin/TestApp(_ZNSt12_Vector_baseIN5boost10shared_ptrINS0_5uuids4uuidEEESaIS4_EED2Ev+0x47)[0x49598b]
/home/user/workspace/NewProject/build/bin/TestApp(_ZNSt6vectorIN5boost10shared_ptrINS0_5uuids4uuidEEESaIS4_EED1Ev+0x65)[0x48bf27]
/lib64/libc.so.6(__cxa_finalize+0x8e)[0x3f1ee337fe]
/home/user/workspace/NewProject/build/components/lib_path/libhelper-d.so[0x2aaaab052b36]
If I run the program in gdb I can get the following backtrace, but it is all I get:
#0 0x0000003f1ee30285 in raise () from /lib64/libc.so.6
#1 0x0000003f1ee31d30 in abort () from /lib64/libc.so.6
#2 0x0000003f1ee692bb in __libc_message () from /lib64/libc.so.6
#3 0x0000003f1ee70d7f in _int_free () from /lib64/libc.so.6
#4 0x0000003f1ee711db in free () from /lib64/libc.so.6
#5 0x000000000049c174 in __gnu_cxx::new_allocator<boost::shared_ptr<boost::uuids::uuid> >::deallocate (this=0x2aaaab2cea50, __p=0x1cfd8d0)
at /opt/local/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.5/../../../../include/c++/4.4.5/ext/new_allocator.h:95
#6 0x0000000000495b84 in std::_Vector_base<boost::shared_ptr<boost::uuids::uuid>, std::allocator<boost::shared_ptr<boost::uuids::uuid> > >::_M_deallocate (
this=0x2aaaab2cea50, __p=0x1cfd8d0, __n=8) at /opt/local/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.5/../../../../include/c++/4.4.5/bits/stl_vector.h:146
#7 0x000000000049598b in std::_Vector_base<boost::shared_ptr<boost::uuids::uuid>, std::allocator<boost::shared_ptr<boost::uuids::uuid> > >::~_Vector_base (
this=0x2aaaab2cea50, __in_chrg=<value optimized out>)
at /opt/local/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.5/../../../../include/c++/4.4.5/bits/stl_vector.h:132
#8 0x000000000048bf27 in std::vector<boost::shared_ptr<boost::uuids::uuid>, std::allocator<boost::shared_ptr<boost::uuids::uuid> > >::~vector (this=0x2aaaab2cea50,
__in_chrg=<value optimized out>) at /opt/local/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.5/../../../../include/c++/4.4.5/bits/stl_vector.h:313
#9 0x0000003f1ee337fe in __cxa_finalize () from /lib64/libc.so.6
#10 0x00002aaaab052b36 in __do_global_dtors_aux ()
from /home/user/workspace/NewProject/build/components/lib_path/libhelper-d.so
#11 0x0000000000000000 in ?? ()
I really have no idea of how to proceed from here.
UPDATE: I forgot to mention that the only global variable of the type that appears in the error has already been cleared (m_uuids.size() == 0) by the time the error appears.
I had this same problem using glog. In my case, it was this scenario:
I had a shared library, call it 'common.so', that linked glog.
My main executable, call it 'app', also linked glog, and linked in common.so.
The problem was that glog was linked statically into both the .so and the executable. When I changed both the library and the executable to link glog's .so instead of its .a, the problem went away.
Not sure this is your problem, but it could be. Generally speaking, corruption when freeing memory often means that you corrupted the memory pool (for example by deleting the same pointer twice). I believe that, with the .a linked into both, the same global object (an std::string in my case) was being cleaned up twice.
Update:
After much investigation, this is very likely the problem. The executable and the .so each contain a global variable of type std::string (part of glog). These globals must be constructed when each object (the executable or the .so) is loaded by the dynamic linker/loader, and a destructor for each is registered to run at exit (GCC uses __cxa_atexit for this). However, when the exit-time functions run, both global references point to the same std::string. That means the std::string destructor is called twice on the same object, and free is called twice on the same memory location. Global std::string objects (or globals of any class type with a constructor) are a bad idea. If you choose a .so-based architecture (a good idea), you have to be careful with all third-party libraries and how they handle globals. You avoid most of the danger by linking to the .so for all third-party libraries.
Where the error appears is probably a little misleading. My best guess is that you have a vector of shared pointers, and as it's being destroyed, at least one of those shared pointers is trying to delete the object it points to, only to find that it has already been deleted.
Are you mixing raw pointers with shared pointers anywhere? If so, you might find a perfectly innocuous-looking delete somewhere that is pulling the rug out from under your shared_ptr.

C++ server crashes with abort() in _UTF8_init() on free()

I'm having problems with C++ code loaded via dlopen() by a C++ CGI server. After a while, the program crashes unexpectedly, but consistently, inside a memory-management function (such as free(), calloc(), etc.) and produces a core dump similar to this:
#0 0x0000000806b252dc in kill () from /lib/libc.so.6
#1 0x0000000804a1861e in raise () from /lib/libpthread.so.2
#2 0x0000000806b2416d in abort () from /lib/libc.so.6
#3 0x0000000806abdb45 in _UTF8_init () from /lib/libc.so.6
#4 0x0000000806abdfcc in _UTF8_init () from /lib/libc.so.6
#5 0x0000000806abeb1d in _UTF8_init () from /lib/libc.so.6
... the rest of the stack
Has anyone seen something like this before?
What is _UTF8_init() and why would memory management functions call it?
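That smells like a corrupted heap, likely due to a buffer overrun somewhere in your code. The _UTF8_init frames are most likely misleading: the malloc internals in a libc without debug symbols have no names of their own, so the debugger labels them with the nearest exported symbol that precedes them. Try running your program with Valgrind and look for any errors or warnings it emits.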