Override global new/delete and malloc/free with tcmalloc library - c++

I want to override new/delete and malloc/free. I have tcmalloc library linked in my application. My aim is to add stats.
From new I am calling malloc. Below is an example it's global.
void* my_malloc(size_t size, const char *file, int line, const char *func)
{
void *p = malloc(size);
....
....
....
return p;
}
#define malloc(X) my_malloc(X, __FILE__, __LINE__, __FUNCTION__)
void *
operator new(size_t size)
{
auto new_addr = malloc(size);
....
...
return new_addr;
}
New/delete override is working fine.
My question is what happen to other file where I have use malloc directly for example
first.cpp
malloc(sizeof(..))
second.cpp
malloc(sizeof(..))
How this malloc call get's interpret as my macro is not in header file.

tcmalloc provides new/delete hooks that can be used to implement any kind of tracking/accounting of memory usage. See e.g. AddNewHook in https://github.com/gperftools/gperftools/blob/master/src/gperftools/malloc_hook.h

Related

Dynamic vector allocation in external RAM

I am currently working on a large project of my own on an STM32F7 cortex-m7 microcontroller in C++ using GCC. I need to store a wide array in an external SDRAM (16 MB) containing vectors of notes structures (12 bytes each). I have already a working FMC and a custom ram region created
/* Specify the memory areas */
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 512K
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 1024K
SDRAM (xrw) : ORIGIN = 0xC0000000, LENGTH = 16M
}
/* Define output sections */
SECTIONS
{
/* Section créée pour l'allocation dans la SDRAM externe*/
.fmc :
{
. = ALIGN(8);
*(.fmc)
*(.fmc*)
. = ALIGN(8);
} >SDRAM
my array is declared like this :
std::vector<SequencerNoteEvent> NotesVectorArray[kMaxPpqn] __attribute__((section(".fmc")));
So far it's OK. I created an array of vectors in my external RAM. How can i proceed in order to make my vectors element creation
NotesVectorArray[position].push_back(note);
happening in the same external RAM dynamically ? I am currently only able to declare static data using the __attribute__(section)
I read lot of things about C++ allocators, memory pools, but i don't get where the allocation take place in the vector code and how i should replace it... I "just" need to have the same allocation system than the usual but in another part of my memory for this precise type.
It seems possible to have multiple heaps. Where is located the connection between scatter file and the effective memory allocation ?
Thanks in advance,
Ben
I was facing the same issue and got a working solution by redefining operator new. The benefit of this over using a custom allocator is that all dynamic allocations will automatically be in SDRAM, so you're free to use std::function or any STL container that requires the heap, without having to pass a custom allocator argument every time.
I did the following:
Remove the standard library by adding -nostdlib to the compilation flags. I also added this to the linker flags. Also remove and --specs=... from your compiler and linker flags. Bonus: you'll save ~60-80k of code space!
Write your own replacement for operator new, operator delete. I created a new file called new.cpp added the following: (taken from https://arobenko.gitbooks.io/bare_metal_cpp/content/compiler_output/dyn_mem.html)
//new.cpp
#include <cstdlib>
#include <new>
void *operator new(size_t size) noexcept { return malloc(size); }
void operator delete(void *p) noexcept { free(p); }
void *operator new[](size_t size) noexcept { return operator new(size); }
void operator delete[](void *p) noexcept { operator delete(p); }
void *operator new(size_t size, std::nothrow_t) noexcept { return operator new(size); }
void operator delete(void *p, std::nothrow_t) noexcept { operator delete(p); }
void *operator new[](size_t size, std::nothrow_t) noexcept { return operator new(size); }
void operator delete[](void *p, std::nothrow_t) noexcept { operator delete(p); }
Define a variable in the linker script that corresponds to the lowest SDRAM address (start of the heap). This isn't strictly necessary, as you could use a constant (0xC0000000) in your code in step 4, but it makes things easier to keep the SDRAM address in just one place. Just add one line: PROVIDE( _fmc_start = . );
//linker script
[snip]
/* Section créée pour l'allocation dans la SDRAM externe*/
.fmc :
{
. = ALIGN(8);
PROVIDE( _fmc_start = . );
*(.fmc)
*(.fmc*)
. = ALIGN(8);
} >SDRAM
The new implementation of new will directly call malloc(), which will call _sbrk() when it needs a block of memory in the heap.
So you have to define _sbrk() to keep track of the heap pointer, and simply start the pointer at the lowest SDRAM address that you just defined in the linker script.
Here's what I did:
//sbrk.cpp, or append this to new.cpp
#include <cstdlib>
extern "C" size_t _sbrk(int incr)
{
extern char _fmc_start; // Defined by the linker
static char *heap_end;
char *prev_heap_end;
if (heap_end == 0)
heap_end = &_fmc_start;
prev_heap_end = heap_end;
//Optionally can check for out-of-memory error here
heap_end += incr;
return (size_t)prev_heap_end;
}
At this point, if you try to compile you will probably get many linker errors. The exact errors will depend on what parts of the standard library your project uses besides just new and delete. In my case, I was using std::function and the compiler complained that it didn't have __throw_bad_function_call(). You probably will also see an error for things like _init() and _fini() and __errno. The basic strategy is to create your own empty function stub or variable declaration for each of these. It's worth doing a quick search for each of these to see what purpose they serve, you might find out your project or a library you're including is using some features you weren't aware of! When you create a stub, you'll need to match the function signature correctly, so that will require searching the internet as well. Many of the stubs are for handling errors/exceptions, so you can decide how to handle it in the function body (or ignore errors).
For example, here's some info on the (obsolete, but required) _init() and _fini() functions: https://tldp.org/HOWTO/Program-Library-HOWTO/miscellaneous.html
Here's some info on __dso_handle: https://lists.debian.org/debian-gcc/2003/07/msg00057.html
__cxa_pure_virtual(): What is the purpose of __cxa_pure_virtual?
Here's what my project required to work:
//syscalls.cpp
extern "C" {
void _init(void) {}
void _fini(void) {}
int __errno;
void *__dso_handle = (void *)&__dso_handle;
}
namespace std
{
//These are error handlers. They could alternatively restart, or log an error
void __throw_bad_function_call() { while (1); }
void __cxa_pure_virtual() { while (1); }
} // namespace std
After all that, it will finally compile and if you use a debugger you'll see that the address of your vectors start at 0xC0000000!

fixing the recursive call to malloc with LD_PRELOAD

I am using LD_PRELOAD to log malloc calls from an application and map out the virtual address space however malloc is used internally by fopen/printf. Is there a way I can fix this issue?
I know about glibc's hooks but I want to avoid changing the source code of the application.
My issue was caused by the fact that malloc is used internally by glibc so when I use LD_PRELOAD to override malloc any attempt to log caused malloc to be called resulting in a recursive call to malloc itself
Solution:
call original malloc whenever the TLS needs memory allocation
providing code:
static __thread int no_hook;
static void *(*real_malloc)(size_t) = NULL;
static void __attribute__((constructor))init(void) {
real_malloc = (void * (*)(size_t))dlsym(RTLD_NEXT, "malloc");
}
void * malloc(size_t len) {
void* ret;
void* caller;
if (no_hook) {
return (*real_malloc)(len);
}
no_hook = 1;
caller = (void*)(long) __builtin_return_address(0);
printf("malloc call %zu from %lu\n", len, (long)caller);
ret = (*real_malloc)(len);
// fprintf(logfp, ") -> %pn", ret);
no_hook = 0;
return ret;
}

Track memory allocation per function

So I know I can track memory allocation with methods of overloading new globally like so: http://www.almostinfinite.com/memtrack.html
However, I was wondering if there was a good way to do this per function so I can get a report of how much is allocated per function. Right now I can get file and lines and what the typeid is as in the link I provided but I would like to find which function is allocating the most.
What about doing something like: http://ideone.com/Wqjkrw
#include <iostream>
#include <cstring>
class MemTracker
{
private:
static char func_name[100];
static size_t current_size;
public:
MemTracker(const char* FuncName) {strcpy(&func_name[0], FuncName);}
static void inc(size_t amount) {current_size += amount;}
static void print() {std::cout<<func_name<<" allocated: "<<current_size<<" bytes.\n";}
static void reset() {current_size = 0; memset(&func_name[0], 0, sizeof(func_name)/sizeof(char));}
};
char MemTracker::func_name[100] = {0};
size_t MemTracker::current_size = 0;
void* operator new(size_t size)
{
MemTracker::inc(size);
return malloc(size);
}
void operator delete(void* ptr)
{
free(ptr);
}
void FuncOne()
{
MemTracker(__func__);
int* i = new int[100];
delete[] i;
i = new int[200];
delete[] i;
MemTracker::print();
MemTracker::reset();
}
void FuncTwo()
{
MemTracker(__func__);
char* c = new char[1024];
delete[] c;
c = new char[2048];
delete[] c;
MemTracker::print();
MemTracker::reset();
}
int main()
{
FuncOne();
FuncTwo();
FuncTwo();
FuncTwo();
return 0;
}
Prints:
FuncOne allocated: 1200 bytes.
FuncTwo allocated: 3072 bytes.
FuncTwo allocated: 3072 bytes.
FuncTwo allocated: 3072 bytes.
What platform are you using? There might be platform specific solutions without changing the functions in your code base.
If you are using Microsoft Visual Studio, you can use compiler switches /Gh and /GH to let the compiler call functions _penter and _pexit that you can define. In those functions, you can query how much memory the program is using. There should be enough information in there to figure out how much memory is allocated in each function.
Example code for checking memory usage is provided in this MSDN article.

Global new operator overloading

I have read about new and delete overloading for memory tracking in How_To_Find_Memory_Leaks
I defined these global operators:
inline void* __cdecl operator new( unsigned int size, const char *file, int line ) {
void* ptr = malloc( size );
AddTrack((DWORD)ptr, size, file, line);
return ptr;
}
inline void* __cdecl operator new( unsigned int size, void* ptr, const char *file, int line ) {
return ptr;
}
It works well with new and new[] operators, but i have a problem with placement new ( second one ). My define look like:
#define new new( __FILE__, __LINE__)
#define new(x) new( x, __FILE__, __LINE__)
They work separately. But when i try to use them both there are errors appear. As I understand they substitute each other. I know I can have macro with variable number of arguments like this:
#define new( ... ) new( __VA_ARGS__, __FILE__, __LINE__)
But I need the same macro with and without arguments at all, so both new-s in these lines substitute right:
g_brushes = new Brush[ num_brushes ];
...
new( &g_brushes[i] )Brush(sides);
If you decide to walk the dark path of overriding the global new, you have to make sure you consider all of the following scenarios:
new Foo; // 1
new Foo[10]; // 2
new (std::nothrow) Foo; // 3
new (p) Foo; // 4 - placement, not overridable
(Foo*) ::operator new(sizeof(Foo)); // 5 - direct invocation of the operator
And it should be possible to be able to handle all the above except for the last. Shudder.
The necessary knol and the sleight of hand is that your macro should end with new. When your macro ends with new, you can delegate the different ways it can be invoked to new itself.
Here's one, non-thread safe way to proceed. Define a type to capture the context of the invocation, so we will be able to retrieve this context later in the operator itself,
struct new_context {
new_context(const char* file, const int line)
: file_(file), line_(line) { scope_ = this; }
~new_context() { scope_ = 0; }
static new_context const& scope() { assert(scope_); return *scope_; }
operator bool() const { return false; }
const char* file_;
const int line_;
private:
static new_context* scope_;
};
Next, define your override to create a new_context temporary just before the invocation,
#define new new_context(__FILE__, __LINE__) ? 0 : new
using the ternary operator conditional assignment to evaluate an expression before delegating to the operator new, notice that it is where the operator bool that we have defined above comes handy.
Then inside your new overrides (I'll use standard C++98 here instead of the MSC C++), all you have to do is to retrieve the context:
void* operator new (std::size_t size) throw (std::bad_alloc) {
std::cout
<< "new"
<< "," << new_context::scope().file_
<< ":" << new_context::scope().line_
<< std::endl;
return 0;
}
This approach will deal with all the cases 1-4 from above, and what's important and can be easily overlooked is that in case 4 where your overloads are not invoked, as you cannot replace placement new (§18.4.​1.3), you still know that placement new took place because new_context will be created and destroyed.
To summarise, you don’t need to modify what follows after the new operator, so all possible syntaxes remain valid. And the other point is that the new_context object temporary is going to be kept alive until the end of the expression the operator participates in, so you are safe to obtain it from a global singleton.
See a gcc example live. Adapting to MSC is left to the reader.

Problems with LD_PRELOAD and calloc() interposition for certain executables

Relating to a previous question of mine
I've successfully interposed malloc, but calloc seems to be more problematic.
That is with certain hosts, calloc gets stuck in an infinite loop with a possible internal calloc call inside dlsym. However, a basic test host does not exhibit this behaviour, but my system's "ls" command does.
Here's my code:
// build with: g++ -O2 -Wall -fPIC -ldl -o libnano.so -shared Main.cc
#include <stdio.h>
#include <dlfcn.h>
bool gNanoUp = false;// global
// Function types
typedef void* (*MallocFn)(size_t size);
typedef void* (*CallocFn)(size_t elements, size_t size);
struct MemoryFunctions {
MallocFn mMalloc;
CallocFn mCalloc;
};
MemoryFunctions orgMemFuncs;
// Save original methods.
void __attribute__((constructor)) __nano_init(void) {
fprintf(stderr, "NANO: init()\n");
// Get address of original functions
orgMemFuncs.mMalloc = (MallocFn)dlsym(RTLD_NEXT, "malloc");
orgMemFuncs.mCalloc = (CallocFn)dlsym(RTLD_NEXT, "calloc");
fprintf(stderr, "NANO: malloc() found #%p\n", orgMemFuncs.mMalloc);
fprintf(stderr, "NANO: calloc() found #%p\n", orgMemFuncs.mCalloc);
gNanoUp = true;
}
// replacement functions
extern "C" {
void *malloc(size_t size) {
if (!gNanoUp) __nano_init();
return orgMemFuncs.mMalloc(size);
}
void* calloc(size_t elements, size_t size) {
if (!gNanoUp) __nano_init();
return orgMemFuncs.mCalloc(elements, size);
}
}
Now, When I do the following, I get an infinite loop followed by a seg fault, eg:
% setenv LD_PRELOAD "./libnano.so"
% ls
...
NANO: init()
NANO: init()
NANO: init()
Segmentation fault (core dumped)
However if I comment out the calloc interposer, it almost seems to work:
% setenv LD_PRELOAD "./libnano.so"
% ls
NANO: init()
NANO: malloc() found #0x3b36274dc0
NANO: calloc() found #0x3b362749e0
NANO: init()
NANO: malloc() found #0x3b36274dc0
NANO: calloc() found #0x3b362749e0
<directory contents>
...
So somethings up with "ls" that means init() gets called twice.
EDIT
Note that the following host program works correctly - init() is only called once, and calloc is successfully interposed, as you can see from the output.
// build with: g++ test.cc -o test
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]) {
void* p = malloc(123);
printf("HOST p=%p\n", p);
free(p);
char* c = new char;
printf("HOST c=%p\n", c);
delete c;
void* ca = calloc(10,10);
printf("HOST ca=%p\n", ca);
free(ca);
}
% setenv LD_PRELOAD "./libnano.so"
% ./test
NANO: init()
NANO: malloc() found #0x3b36274dc0
NANO: calloc() found #0x3b362749e0
HOST p=0x601010
HOST c=0x601010
HOST ca=0x601030
I know I am a bit late (6 years). But I wanted to override calloc() today and faced a problem because dlsym() internally uses calloc(). I solved it using a simple technique and thought of sharing it here:
static unsigned char buffer[8192];
void *calloc(size_t nmemb, size_t size)
{
if (calloc_ptr == NULL) // obtained from dlsym
return buffer;
init(); // uses dlsym() to find address of the real calloc()
return calloc_ptr(len);
}
void free(void *in)
{
if (in == buffer)
return;
free_ptr(in);
}
buffer satisfies the need of dlsym() till the real calloc() has been located and my calloc_ptr function pointer initialized.
With regard to __nano_init() being called twice: You've declared the function as a constructor, so it's called when the library is loaded, and it's called a second time explicitly when your malloc() and calloc() implementations are first called. Pick one.
With regard to the calloc() interposer crashing your application: Some of the functions you're using, including dlsym() and fprintf(), may themselves be attempting to allocate memory, calling your interposer functions. Consider the consequences, and act accordingly.
Using dlsym based hooking can result in crashes, as dlsym calls back into the memory allocator. Instead use malloc hooks, as I suggested in your prior question; these can be installed without actually invoking dlsym at all.
You can get away with a preliminary poor calloc that simply returns NULL. This actually works on Linux, YMMV.
static void* poor_calloc(size_t nmemb, size_t size)
{
// So dlsym uses calloc internally, which will lead to infinite recursion, since our calloc calls dlsym.
// Apparently dlsym can cope with slightly wrong calloc, see for further explanation:
// http://blog.bigpixel.ro/2010/09/interposing-calloc-on-linux
return NULL; // This is a poor implementation of calloc!
}
You can also use sbrk to allocate the memory for "poor calloc".