Custom memory manager with thread local storage - c++

Our program has a custom memory manager, and all of our malloc/free calls go through it. During program startup, however, getpwuid() is called, and on some customers' machines with nss_ldap activated this ends up calling malloc from libc rather than from our memory manager, which leads to an error in our memory manager. The stack reported by gdb is:
Breakpoint 2, 0x0000003df8cc6eb0 in brk () from /lib64/libc.so.6
0 0x0000003df8cc6eb0 in brk () from /lib64/libc.so.6
1 0x0000003df8cc6f72 in sbrk () from /lib64/libc.so.6
2 0x0000003df8c73d29 in __default_morecore () from /lib64/libc.so.6
3 0x0000003df8c70090 in _int_malloc () from /lib64/libc.so.6
4 0x0000003df8c70c9d in malloc () from /lib64/libc.so.6
5 0x0000003df880fc65 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
6 0x00002aaaae302a7c in _nss_ldap_inc_depth () from /lib64/libnss_ldap.so.2
7 0x00002aaaae2f91a4 in _nss_ldap_enter () from /lib64/libnss_ldap.so.2
8 0x00002aaaae2f942c in _nss_ldap_getbyname () from /lib64/libnss_ldap.so.2
9 0x00002aaaae2f9aa9 in _nss_ldap_getpwuid_r () from /lib64/libnss_ldap.so.2
10 0x0000003df8c947c5 in getpwuid_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
11 0x0000003df8c9412f in getpwuid () from /lib64/libc.so.6
12 0x0000000001414be3 in lc_username ()
I've traced the code of _nss_ldap_inc_depth(). It seems __tls_get_addr() gets called because thread-local storage is used. I've tried turning the memory manager into a shared library, but __tls_get_addr() still calls malloc from libc. How can I make it call our memory manager instead of libc's?

You can use LD_PRELOAD to load your library before any other library (including glibc), so that its malloc is bound in preference to theirs. Something like:
$ LD_PRELOAD=/path/to/library/libmymalloc.so /bin/myprog
There's a tutorial here that shows how it works; it even has an example of an interposed malloc.

You can change your memory manager to use mmap instead of brk.
There can be only one user of brk in a process. So if you haven't replaced all calls to malloc and related functions (calloc, strdup and more), you must not use brk.
mmap, however, has no such problems. Your memory manager can use mmap, and malloc can still work in parallel.
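As a rough illustration of the mmap approach, here is a minimal sketch of a bump-pointer arena that obtains its memory from one anonymous mmap region and never touches brk. The class name MmapArena, the fixed 16-byte alignment, and the single-region design are assumptions made for this example, not the asker's memory manager; a real manager would also need freeing, growth, and thread safety.

```cpp
#include <sys/mman.h>   // mmap, munmap (MAP_ANONYMOUS is available on Linux)
#include <cassert>
#include <cstddef>

// Hypothetical bump-pointer arena backed by one anonymous mmap region.
// It never calls brk/sbrk, so libc's malloc can keep using them.
class MmapArena {
public:
    explicit MmapArena(std::size_t capacity)
        : base_(nullptr), capacity_(capacity), used_(0) {
        void* p = mmap(nullptr, capacity_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED) base_ = static_cast<char*>(p);
    }
    ~MmapArena() {
        if (base_) munmap(base_, capacity_);
    }
    MmapArena(const MmapArena&) = delete;
    MmapArena& operator=(const MmapArena&) = delete;

    bool ok() const { return base_ != nullptr; }

    // Returns 16-byte aligned memory, or nullptr when the arena is exhausted.
    void* allocate(std::size_t size) {
        const std::size_t align = 16;
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (!base_ || size > capacity_ || start > capacity_ - size)
            return nullptr;
        used_ = start + size;
        return base_ + start;
    }

    std::size_t used() const { return used_; }

private:
    char* base_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Because the region comes from mmap, an allocation made by libc's malloc elsewhere in the process (for example inside __tls_get_addr) does not disturb the arena's bookkeeping.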

Related

double free or corruption (!prev) while calling __do_global_dtors_aux

I'm getting this error message after my app has done everything right
/lib64/libc.so.6[0x3f1ee70d7f]
/lib64/libc.so.6(cfree+0x4b)[0x3f1ee711db]
/home/user/workspace/NewProject/build/bin/TestApp(_ZN9__gnu_cxx13new_allocatorIN5boost10shared_ptrINS1_5uuids4uuidEEEE10deallocateEPS5_m+0x20)[0x49c174]
/home/user/workspace/NewProject/build/bin/TestApp(_ZNSt12_Vector_baseIN5boost10shared_ptrINS0_5uuids4uuidEEESaIS4_EE13_M_deallocateEPS4_m+0x32)[0x495b84]
/home/user/workspace/NewProject/build/bin/TestApp(_ZNSt12_Vector_baseIN5boost10shared_ptrINS0_5uuids4uuidEEESaIS4_EED2Ev+0x47)[0x49598b]
/home/user/workspace/NewProject/build/bin/TestApp(_ZNSt6vectorIN5boost10shared_ptrINS0_5uuids4uuidEEESaIS4_EED1Ev+0x65)[0x48bf27]
/lib64/libc.so.6(__cxa_finalize+0x8e)[0x3f1ee337fe]
/home/user/workspace/NewProject/build/components/lib_path/libhelper-d.so[0x2aaaab052b36]
If I run the program in gdb I can get the following backtrace, but it is all I get:
#0 0x0000003f1ee30285 in raise () from /lib64/libc.so.6
#1 0x0000003f1ee31d30 in abort () from /lib64/libc.so.6
#2 0x0000003f1ee692bb in __libc_message () from /lib64/libc.so.6
#3 0x0000003f1ee70d7f in _int_free () from /lib64/libc.so.6
#4 0x0000003f1ee711db in free () from /lib64/libc.so.6
#5 0x000000000049c174 in __gnu_cxx::new_allocator<boost::shared_ptr<boost::uuids::uuid> >::deallocate (this=0x2aaaab2cea50, __p=0x1cfd8d0)
at /opt/local/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.5/../../../../include/c++/4.4.5/ext/new_allocator.h:95
#6 0x0000000000495b84 in std::_Vector_base<boost::shared_ptr<boost::uuids::uuid>, std::allocator<boost::shared_ptr<boost::uuids::uuid> > >::_M_deallocate (
this=0x2aaaab2cea50, __p=0x1cfd8d0, __n=8) at /opt/local/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.5/../../../../include/c++/4.4.5/bits/stl_vector.h:146
#7 0x000000000049598b in std::_Vector_base<boost::shared_ptr<boost::uuids::uuid>, std::allocator<boost::shared_ptr<boost::uuids::uuid> > >::~_Vector_base (
this=0x2aaaab2cea50, __in_chrg=<value optimized out>)
at /opt/local/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.5/../../../../include/c++/4.4.5/bits/stl_vector.h:132
#8 0x000000000048bf27 in std::vector<boost::shared_ptr<boost::uuids::uuid>, std::allocator<boost::shared_ptr<boost::uuids::uuid> > >::~vector (this=0x2aaaab2cea50,
__in_chrg=<value optimized out>) at /opt/local/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.4.5/../../../../include/c++/4.4.5/bits/stl_vector.h:313
#9 0x0000003f1ee337fe in __cxa_finalize () from /lib64/libc.so.6
#10 0x00002aaaab052b36 in __do_global_dtors_aux ()
from /home/user/workspace/NewProject/build/components/lib_path/libhelper-d.so
#11 0x0000000000000000 in ?? ()
I really have no idea of how to proceed from here.
UPDATE: I forgot to mention that the only global variable of the type that appears in the error has already been cleared (m_uuids.size() == 0) by the time the error appears.
I had this same problem using glog. In my case, it was this scenario:
I had a shared library, call it 'common.so', that linked glog.
My main executable, call it 'app', also linked glog, and linked in common.so.
The problem was that glog was linked statically into both the .so and the executable. When I changed both common.so and app to link glog's .so instead of the .a, the problem went away.
Not sure this is your problem, but it could be. Generally speaking, corruption reported when freeing memory often means that you corrupted the memory pool (for example by deleting the same pointer twice). I believe that with the .a linked into both, the same global object (a std::string in my case) was being cleaned up twice.
Update:
After much investigation, this is very likely the problem. What happens is that the executable and the .so each have a global variable of std::string type (part of glog). These std::string globals must be constructed when the object (exe, .so) is loaded by the dynamic linker/loader. A destructor for each is also registered for cleanup via atexit. However, when the time comes for the atexit functions to be called, both global references point to the same std::string. That means the std::string destructor is called twice on the same object, so free is called twice on the same memory location. Global std::string objects (or globals of any class with a constructor) are a bad idea. If you choose a .so-based architecture (a good idea), you have to be careful with all third-party libraries and how they handle globals. You stay out of most danger by linking against the .so for all third-party libraries.
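For globals in your own code (this does not change what happens inside a third-party library such as glog), one common way to avoid an exit-time destructor altogether is to construct the object on first use and never destroy it. The sketch below assumes a hypothetical accessor log_prefix() standing in for a namespace-scope std::string; it is an alternative to the linking fix described above, not a replacement for it.

```cpp
#include <cassert>
#include <string>

// Hypothetical accessor replacing a namespace-scope `std::string g_prefix`.
// The string is created on the first call and intentionally never deleted,
// so no destructor is registered to run at process exit.
inline std::string& log_prefix() {
    static std::string* instance = new std::string("app");
    return *instance;
}
```

Callers use log_prefix() wherever they would have used the global. The cost is one small allocation that lives until the process ends.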
Where the error is appearing is probably a little misleading. My best guess would be that you've got a vector of shared pointers and as it's being destroyed, one (at least) of those shared pointers is trying to delete the object that it's pointing to, only to find that it has already been deleted.
Are you mixing raw pointers with shared pointers anywhere? If so, you might find a perfectly innocuous-looking delete somewhere that is pulling the rug out from under your shared_ptr.
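To make the raw-pointer hazard concrete, here is a small sketch. It uses std::shared_ptr (C++11) in place of boost::shared_ptr so that it has no Boost dependency, and the Tracked type and function name are made up for the example. The safe pattern runs; the unsafe patterns appear only as comments because executing them is undefined behavior.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical payload type that counts how many times it is destroyed.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Safe pattern: every owner is a copy of the same shared_ptr, so all owners
// share one control block and the object is deleted exactly once.
inline int destroy_count_with_shared_owners() {
    Tracked::destroyed = 0;
    {
        std::shared_ptr<Tracked> first = std::make_shared<Tracked>();
        std::vector<std::shared_ptr<Tracked> > owners;
        owners.push_back(first);   // use_count becomes 2
        owners.push_back(first);   // use_count becomes 3
        assert(first.use_count() == 3);
    }  // all owners go out of scope here
    return Tracked::destroyed;
}

// Unsafe patterns, left as comments because both are undefined behavior:
//   Tracked* raw = new Tracked;
//   std::shared_ptr<Tracked> a(raw);
//   std::shared_ptr<Tracked> b(raw);  // second control block: double delete
//   delete raw;                       // manual delete of an owned object
```

If a vector of shared pointers crashes during destruction, searching the code for places where a shared_ptr is built from a pointer that something else also owns is a reasonable first step.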

C++ server crashes with abort() in _UTF8_init() on free()

I'm having problems with C++ code loaded via dlopen() by a C++ CGI server. After a while, the program crashes unexpectedly, but consistently in a memory management function call (such as free(), calloc(), etc.), and produces a core dump similar to this:
#0 0x0000000806b252dc in kill () from /lib/libc.so.6
#1 0x0000000804a1861e in raise () from /lib/libpthread.so.2
#2 0x0000000806b2416d in abort () from /lib/libc.so.6
#3 0x0000000806abdb45 in _UTF8_init () from /lib/libc.so.6
#4 0x0000000806abdfcc in _UTF8_init () from /lib/libc.so.6
#5 0x0000000806abeb1d in _UTF8_init () from /lib/libc.so.6
... the rest of the stack
Has anyone seen something like this before?
What is _UTF8_init() and why would memory management functions call it?
That smells like a corrupted heap, likely due to a buffer overrun somewhere in your code. Try running your program with Valgrind and look for any errors or warnings it emits.

Simultaneous abort() in two threads

I have a backtrace with something I haven't seen before. See frame 2 in these threads:
Thread 31 (process 8752):
#0 0x00faa410 in __kernel_vsyscall ()
#1 0x00b0b139 in sigprocmask () from /lib/libc.so.6
#2 0x00b0c7a2 in abort () from /lib/libc.so.6
#3 0x00752aa0 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#4 0x00750505 in ?? () from /usr/lib/libstdc++.so.6
#5 0x00750542 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0x00750c65 in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#7 0x00299c63 in ApplicationFunction()
Thread 1 (process 8749):
#0 0x00faa410 in __kernel_vsyscall ()
#1 0x00b0ad80 in raise () from /lib/libc.so.6
#2 0x00b0c691 in abort () from /lib/libc.so.6
#3 0x00b4324b in __libc_message () from /lib/libc.so.6
#4 0x00b495b6 in malloc_consolidate () from /lib/libc.so.6
#5 0x00b4b3bd in _int_malloc () from /lib/libc.so.6
#6 0x00b4d3ab in malloc () from /lib/libc.so.6
#7 0x08147f03 in AnotherApplicationFunction ()
When I open the core with gdb and ask for a backtrace, it gives me thread 1. Later I saw the weird state that thread 31 is in. This thread belongs to the library that we have had problems with, so I believe the crash is caused by it.
So what does it mean? Two threads simultaneously doing something illegal? Or is it one of them somehow causing abort() in the other one?
The OS is Linux Red Hat Enterprise 5.3, it's a multiprocessor server.
It is hard to be sure, but my first suspicion upon seeing these stack traces would be a memory corruption (possibly a buffer overrun on the heap). If that's the case, then the corruption is probably the root cause of both threads ending up in abort.
Can you valgrind your app?
Looks like it could be heap corruption, detected by malloc in thread 1, causing or caused by the error in thread 31.
Some broken piece of code overwriting, among other things, the vtable in thread 31 could easily cause this.
It's possible that the reason thread 31 aborted is because it trashed the application heap in some way. Then when the main thread tried to allocate memory the heap data structure was in a bad state, causing the allocation to fail and abort the application again.

Boost threads coring on startup

I have a program that brings up and tears down multiple threads throughout its life. Everything works great for awhile, but eventually, I get the following core dump stack trace.
#0 0x009887a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x007617a5 in raise () from /lib/tls/libc.so.6
#2 0x00763209 in abort () from /lib/tls/libc.so.6
#3 0x003ec1bb in __gnu_cxx::__verbose_terminate_handler () from /usr/lib/libstdc++.so.6
#4 0x003e9ed1 in __cxa_call_unexpected () from /usr/lib/libstdc++.so.6
#5 0x003e9f06 in std::terminate () from /usr/lib/libstdc++.so.6
#6 0x003ea04f in __cxa_throw () from /usr/lib/libstdc++.so.6
#7 0x00d5562b in boost::thread::start_thread () from /h/Program/bin/../lib/libboost_thread-gcc34-mt-1_39.so.1.39.0
At first I was leaking threads, and figured the core was due to hitting some maximum limit on the number of concurrent threads, but now it seems that this problem occurs even when I don't leak them. For reference, in the core above there were 13 active threads executing.
I did some searching to try and figure out why start_thread would core, but I didn't come across anything. Anyone have any ideas?
start_thread is throwing an uncaught exception. Check which exceptions start_thread can throw and place a catch around the call to see what the problem is.
What are the values carried by thread_resource_error? It looks like you can call native_error() to find out.
Since this is a wrapper around pthreads there are only a couple of possibilities - EAGAIN, EINVAL and EPERM. It looks as if boost has exceptions it would likely throw for EINVAL and EPERM - i.e. unsupported_thread_option() and thread_permission_error().
That pretty much leaves EAGAIN so I would double check that you really aren't exceeding the system limits on the number of threads. You are sure you are joining them, or if detached, they are really gone?
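The idea of catching the launch failure and inspecting its error code can be sketched as follows. This example uses std::thread, which reports a failed launch by throwing std::system_error, as a stand-in for boost::thread and its thread_resource_error; StartResult and start_and_join are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <system_error>
#include <thread>

// Hypothetical result type: whether the thread ran, plus the OS error if not.
struct StartResult {
    bool started;
    int error;  // errno-style value such as EAGAIN; 0 on success
};

// Starts `work` on a new thread and joins it, reporting a launch failure
// instead of letting the exception propagate to std::terminate.
template <typename Fn>
StartResult start_and_join(Fn work) {
    try {
        std::thread t(work);
        t.join();
        return StartResult{true, 0};
    } catch (const std::system_error& e) {
        return StartResult{false, e.code().value()};
    }
}
```

Logging the error value at the point of failure, together with the number of live threads, would show whether the process really is running into a thread limit.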

std::vector reserve method fails to allocate enough memory

I have a buffer class in my C++ application as follows:
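#include <cstddef>
#include <vector>

typedef unsigned char uint8;  // assumed project typedef; the backtrace shows unsigned char

class Buffer
{
public:
    // Reserves capacity for res bytes; size() stays 0 until data is written.
    Buffer(size_t res): _rpos(0), _wpos(0)
    {
        _storage.reserve(res);
    }
protected:
    size_t _rpos, _wpos;  // read and write positions
    std::vector<uint8> _storage;
};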
Sometimes the constructor fails because it's unable to allocate the required memory. For example, once, calling the constructor with res = 37 caused a segfault with the following stack trace, which I got from the core dump:
#0 0x00007f916a176ed5 in raise () from /lib/libc.so.6
No symbol table info available.
#1 0x00007f916a1783f3 in abort () from /lib/libc.so.6
No symbol table info available.
#2 0x00007f916a1b33a8 in ?? () from /lib/libc.so.6
No symbol table info available.
#3 0x00007f916a1b8948 in ?? () from /lib/libc.so.6
No symbol table info available.
#4 0x00007f916a1bb17c in ?? () from /lib/libc.so.6
No symbol table info available.
#5 0x00007f916a1bca78 in malloc () from /lib/libc.so.6
No symbol table info available.
#6 0x00007f916ac0c16d in operator new (sz=37)
at ../../.././libstdc++-v3/libsupc++/new_op.cc:52
p = <value optimized out>
#7 0x00000000004e3d11 in std::vector<unsigned char, std::allocator<unsigned char> >::reserve (this=0x7f911bc49cc0, __n=31077)
at /usr/local/lib/gcc/x86_64-unknown-linux-gnu/4.4.2/../../../../include/c++/4.4.2/ext/new_allocator.h:89
__old_size = 0
__tmp = <value optimized out>
I've compiled this application using GCC 4.4.2 as a 64-bit application and I'm running it on Debian 5 x64.
Any help is much appreciated.
Thanks
Because the segfault is in malloc, most likely some other code has trashed the heap (i.e. written to parts of memory that it does not own and that are being used by the heap manager).
I recommend using Valgrind to find what code is trashing the heap.
If you can't use Valgrind to find out where your memory is corrupted because of the heavy overhead it imposes, you can still test with lighter solutions.
For a server application where Valgrind was not applicable (because the platform was Solaris 8), I had pretty good results with mpatrol ( http://mpatrol.sf.net ) and especially dmalloc ( http://dmalloc.com ).
To some extent, you can use them without recompiling (just relinking for dmalloc, library preloading for mpatrol). They replace the memory primitives to perform extra checks on memory use (bad arguments to those primitives, off-by-one reads, heap corruption, ...). Some of those checks trigger exactly when the problem occurs, while others trigger a bit later than the actual bad code. By tuning which checks are enabled and, where applicable, the check frequency, you can run at almost full speed while performing basic checks.
I recommend recompiling with dmalloc to get the so-called 'FUNC_CHECK'; for me, this added a lot of accuracy in bug spotting at a negligible performance cost.
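To give a feel for the kind of check such debugging allocators perform, here is a minimal sketch of a guarded allocation that places known canary bytes after the user region and verifies them later. This is not how dmalloc or mpatrol are implemented; the function names, the 16-byte header, and the canary value are all assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical layout of one block: [size header][user bytes][canary bytes].
const std::size_t kHeader = 16;          // keeps user memory 16-byte aligned
const std::size_t kCanaryLen = 8;
const unsigned char kCanaryByte = 0xA5;

inline void* guarded_alloc(std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(kHeader + size + kCanaryLen));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof size);                 // remember the size
    std::memset(raw + kHeader + size, kCanaryByte, kCanaryLen);
    return raw + kHeader;
}

// Returns true when the bytes just past the user region are still intact.
inline bool guard_intact(const void* user) {
    const unsigned char* p = static_cast<const unsigned char*>(user);
    std::size_t size;
    std::memcpy(&size, p - kHeader, sizeof size);
    for (std::size_t i = 0; i < kCanaryLen; ++i)
        if (p[size + i] != kCanaryByte) return false;
    return true;
}

inline void guarded_free(void* user) {
    if (user) std::free(static_cast<unsigned char*>(user) - kHeader);
}
```

A check like guard_intact only fires when it is called, which is why this style of detection can report an overrun some time after the offending write rather than at the exact instruction.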