Determine allocation call site over time (line of code) [duplicate] - c++

I am looking for a way to track memory allocations in a C++ program. I am not interested in memory leaks, which seem to be what most tools are trying to find, but rather in creating a memory usage profile for the application. Ideal output would be either a big list of function names plus the maximum number of allocated bytes over time or, better yet, a graphical representation of the heap over time: horizontal axis is time, vertical axis heap space. Every function would get its own color and draw lines according to allocated heap bytes. Bonus points for identifying allocated object types as well.
The idea is to find memory bottlenecks and to visualize which functions/threads consume the most memory and should be targeted for further optimization.
I have briefly looked at Purify, BoundsChecker and AQTime, but they don't seem to be what I'm after. Valgrind looks suitable; however, I'm on Windows. Memtrack looks promising, but requires significant changes to the source code.
My Google skills must have failed me, because this doesn't seem like such an uncommon request. All the information needed to create a tool like that should be readily available from the program's debug symbols plus runtime API calls - no?

Use Valgrind and its tool Massif. Part of its example output:
99.48% (20,000B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->49.74% (10,000B) 0x804841A: main (example.c:20)
|
->39.79% (8,000B) 0x80483C2: g (example.c:5)
| ->19.90% (4,000B) 0x80483E2: f (example.c:11)
| | ->19.90% (4,000B) 0x8048431: main (example.c:23)
| |
| ->19.90% (4,000B) 0x8048436: main (example.c:25)
|
->09.95% (2,000B) 0x80483DA: f (example.c:10)
->09.95% (2,000B) 0x8048431: main (example.c:23)
So you will get detailed information:
WHO allocated the memory (the functions g(), f(), and main() in the above example); you also get the complete backtrace leading to the allocating function,
to WHICH data structure the memory went (no data structures in the above example),
WHEN it happened,
what PERCENTAGE of all allocated memory it is (g: 39.79%, f: 9.95%, main: 49.74%).
Here is the Massif manual.
You can track heap allocation as well as stack allocation (turned off by default).
PS. I just read that you're on Windows. I will leave the answer though, because it gives a picture of what you can get from a possible tool.

Microsoft has well-documented memory-tracking functions. However, for some reason they are not well known in the developer community. These are the CRT debug functions, and a good starting point is the CRT Debug Heap functions.
Check the following links for more details:
Heap state reporting functions
Tracking heap allocation requests. This is probably the functionality you are looking for.

For a generic C++ memory tracker you will need to overload the following:
global operator new
global operator new []
global operator delete
global operator delete []
any class allocators
any in-place allocators
The tricky bit is getting useful information: the overloaded operators only have size information for allocations and memory pointers for deletions. One answer is to use macros. I know, nasty. An example - place this in a header that is included from all source files:
#undef new
void *operator new (size_t size, const char *file, int line, const char *function);
// other operators
#define new new (__FILE__, __LINE__, __FUNCTION__)
and create a source file with:
void *operator new (size_t size, const char *file, int line, const char *function)
{
// add tracking code here...
return malloc (size);
}
The above only works if you don't have any operator new defined at class scope. If you do have some at class scope, do:
#define NEW new (__FILE__, __LINE__, __FUNCTION__)
and replace 'new type' with 'NEW type', but that potentially requires changing a lot of code.
As it's a macro, removing the memory tracker is quite straightforward; the header becomes:
#if defined ENABLED_MEMORY_TRACKER
#undef new
void *operator new (size_t size, const char *file, int line, const char *function);
// other operators
#define NEW new (__FILE__, __LINE__, __FUNCTION__)
#else
#define NEW new
#endif
and the implementation file:
#if defined ENABLED_MEMORY_TRACKER
void *operator new (size_t size, const char *file, int line, const char *function)
{
// add tracking code here...
return malloc (size);
}
#endif
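One caveat worth adding to this approach (a hedged sketch; the printf logging stands in for real tracking code): each placement operator new should be paired with a matching placement operator delete, because that is what the compiler calls to release the memory if a constructor throws inside a tracked `new` expression. Without it, exactly those allocations leak:

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Tracked placement new, as in the answer above.
void *operator new(std::size_t size, const char *file, int line,
                   const char *function)
{
    std::printf("%zu bytes at %s:%d (%s)\n", size, file, line, function);
    return std::malloc(size);
}

// Matching placement delete: invoked automatically by the compiler when a
// constructor throws inside 'new (__FILE__, __LINE__, __FUNCTION__) T(...)'.
void operator delete(void *ptr, const char *, int, const char *) noexcept
{
    std::free(ptr);
}
```

The placement delete is never called by ordinary `delete p;` statements; it exists solely for the constructor-throws path.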

Update to the answer of @Skizz:
Since C++20, we can use std::source_location instead of macros like __FILE__ and __LINE__.
(As this is a major simplification, I believe it deserves a separate answer.)

On Xcode, you can use Instruments to track allocations, VM usage, and several other parameters. It is most popular among iOS developers, but worth a try.

On Mac OS X, you can use the code profiling tool Shark to do this, IIRC.

"A graphical representation of the heap over time" - something close to what you are looking for is implemented in the Intel(R) Single Event API; details can be found in this article (it's rather long to reproduce here).
It shows you a timeline of per-block-size allocations and allows you to add additional markup to your code to understand the whole picture better.

The Visual Studio IDE has built-in heap profiling support (since 2015), which is probably the easiest to start with. It has graphical views of heap usage over time and can track allocations by function/method.
Measure memory usage in Visual Studio
The CRT also has debug and profile support, which is more detailed and lower-level. You could track the data and plot the results using some other tool:
CRT Debug Heap Details
In particular, look at _CrtMemCheckpoint and related functions.

Related

Overloading base types with a custom allocator, and its alternatives

So, this is a bit of an open question. But let's say that I have a large application which globally overrides the various new and delete operators so that they use home-brewed jemalloc-style arenas and custom alignments.
All fine and good, but I have been running into segfault issues because other C++-based DLLs and their dependencies also use the overloaded allocators when they shouldn't (namely LLVM), bringing the little custom allocator to its knees (lack of memory and other stresses).
Testing workarounds, I have wrapped (and moved) those global operators into a class, and I made all base classes inherit from it. Well, that works for classes, but not for base types. That's the problem.
Given that C++ doesn't allow useful things like having separate allocators per namespace, or limiting the new operator per executable module, what is the best way of emulating this for base data types, where I can't directly subclass an int?
The obvious way is wrapping them in a custom template, but the problem is performance. Do I have to emulate all the array and indexing operations under a second layer just so that I can malloc from a different place without having to change the rest of the functional code? Is there a better way?
P.S.: I have also been thinking about using special global new/delete operators with extra parameters, while leaving the standard ones alone. Thus ensuring that I am (well, my executable module is) the only one calling those global functions. It should be a simple search-and-replace.
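To make the wrapper-template route from the question concrete, here is a hedged sketch (Boxed, arena_alloc, and arena_free are hypothetical names, with the arena hooks stubbed out via malloc/free): only heap allocations of the wrapper go through the custom allocator, while value access forwards to the underlying type via conversion operators:

```cpp
#include <cstdlib>
#include <new>

// Hypothetical arena hooks, stubbed with malloc/free for this sketch.
void *arena_alloc(std::size_t n) { return std::malloc(n); }
void arena_free(void *p) { std::free(p); }

// Thin wrapper: class-scope operator new/delete route heap allocations of
// Boxed<T> through the arena; conversions keep value access cheap.
template <typename T>
struct Boxed {
    T value;
    Boxed(T v = T()) : value(v) {}
    operator T &() { return value; }
    operator const T &() const { return value; }

    static void *operator new(std::size_t n) { return arena_alloc(n); }
    static void operator delete(void *p) noexcept { arena_free(p); }
};
```

The conversion operators cover plain reads and writes, but as the question notes, array indexing and pointer arithmetic would still need explicit forwarding in a second layer.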
Well, a quick update. What I did in the end to 'solve' this conundrum was to manually detect whether the code that called the overridden global allocators comes from the main executable module, and to conditionally redirect all the external new / delete calls to their corresponding malloc / free while still using the custom arena allocator for our own internal code.
How? After doing some R&D I found that this could be done by using the _ReturnAddress() intrinsic on MSVC and __builtin_extract_return_addr(__builtin_return_address(0)) on GCC/Clang, and I can say that it seems to work fine so far in production software.
Now, when some C++ code from our address space wants some memory we can see where it comes from.
But how do we find out whether that address is part of some other module in our process space or of our own? We need to find both the base and end addresses of the main program, cache them at startup as globals, and check that the return address is within bounds.
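The bounds check itself is trivial; a sketch (base_addr and base_end are the globals described above, left for a platform-specific initializer to fill in):

```cpp
#include <cstdint>

// Filled in once at startup by a platform-specific initializer.
static std::uintptr_t base_addr = 0;
static std::uintptr_t base_end = 0;

// True if the return address lies inside the main executable module.
inline bool is_internal_call(const void *return_address)
{
    const std::uintptr_t addr =
        reinterpret_cast<std::uintptr_t>(return_address);
    return addr >= base_addr && addr < base_end;
}
```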
All for extremely little overhead. The second problem is that retrieving the base address is different on every platform. After some research I found that things were more straightforward than expected:
On Windows/Win32 we can simply do this:
#include <windows.h>
#include <psapi.h>
inline void __initialize_base_address()
{
MODULEINFO minfo;
GetModuleInformation(GetCurrentProcess(), GetModuleHandle(NULL), &minfo, sizeof(minfo));
base_addr = (uintptr_t) minfo.lpBaseOfDll;
base_end = (uintptr_t) minfo.lpBaseOfDll + minfo.SizeOfImage;
}
On Linux there are a thousand ways of doing this, including linker globals and some dodgy (verbose and unreliable) ways of walking the process module table. Looking at the linker map output, I noticed that the _init and _fini functions always seem to bracket the rest of the .text section symbols. Sometimes it's hard to get to the simplest solution that works everywhere:
#include <link.h>
inline void __initialize_base_address()
{
void *handle = dlopen(0, RTLD_NOW);
base_addr = (uintptr_t) dlsym(handle, "_init");
base_end = (uintptr_t) dlsym(handle, "_fini");
dlclose(handle);
}
On macOS, things are even less documented, and I had to cobble together my own solution using the Darwin kernel open-source code and some obscure low-level tools as reference. Keep in mind that _NSGetMachExecuteHeader() is just a wrapper for the internal _mh_execute_header linker global. If you need to do any parsing of the Mach-O format and its structures, then getsect.h is the way to go:
#include <mach-o/getsect.h>
#include <mach-o/ldsyms.h>
#include <crt_externs.h>
inline void __initialize_base_address()
{
size_t size;
void *ptr = getsectiondata(&_mh_execute_header, SEG_TEXT, SECT_TEXT, &size);
base_addr = (uintptr_t) _NSGetMachExecuteHeader();
base_end = (uintptr_t) ptr + size;
}
Another thing to keep in mind is that this some-other-cpp-module-is-using-our-internal-allocator-that-globally-overrides-new-causing-weird-bugs issue seems to be a problem on Linux and maybe macOS. I didn't have this issue on Windows, probably because no conflicting DLLs were loaded in the process (they were mostly C API-based), or maybe because the platform uses a different C++ runtime for each module.
The main issue I had was caused by Mesa3D, which uses LLVM (pure C++ through and through) for many of its GLSL shader compilers and liked to gobble up big chunks of my small custom-tailored memory arena uninvited.
Rewriting a legacy program that is structurally dependent on these allocators was out of the question due to its sheer size and complexity, so this turned out to be the best way of making things work as expected.
It's only a few lines of optional, sneaky, extra per-platform code.

How to remediate Microsoft typeinfo.name() memory leaks?

Microsoft has a decades-old bug when using its leak-check gear in debug builds. The leak is reported for the allocation made by the runtime library when using C++ type information, like typeinfo.name(). See also Memory leaks reported by debug CRT inside typeinfo.name() on Microsoft Connect.
We've been getting error reports and user-list discussions because of the leaks for about the same amount of time. The Microsoft bug could also mask real leaks from user programs. The latter point is especially worrisome to me, because we may not be tending to real problems due to the masking.
I'd like to try to squash the leaks due to the use of typeid(T) and typeinfo.name(). My question is: how can we work around Microsoft's bug? Is there a workaround available?
Along the lines of my suggestion in the question's comments:
For if (valueType == typeid(int)) you can use std::type_index (available since C++11).
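A small sketch of that std::type_index comparison (the is_int helper is just for illustration):

```cpp
#include <typeindex>
#include <typeinfo>

// std::type_index is a copyable, comparable, hashable wrapper around
// std::type_info; comparing indices never calls name(), so the leaky
// CRT allocation path is not touched at all.
bool is_int(const std::type_index &valueType)
{
    return valueType == std::type_index(typeid(int));
}
```

It also works as a key in std::map or std::unordered_map, which typeid's type_info reference cannot do directly.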
For type_info.name() leaking memory: since totally eliminating the leak doesn't seem possible, the next best thing is to reduce the number of leaks (to only one per interrogated type) and, secondarily, to tag them for reporting purposes. Since the allocations happen inside templated classes, one can hope that the leak report will use the class names (or at least the source file where the allocation happened); you can subsequently use this information to filter them out of the 'all leaked memory' reports.
So instead of using typeid(<typename>), you use something like:
File typeid_name_workaround.hpp:
#include <typeinfo>
template <typename T> struct get_type_name {
static const char* name() {
static const char* ret = typeid(T).name();
return ret;
}
};
In another .cpp/.hpp file:
#include "typeid_name_workaround.hpp"
struct dummy {
};
int main() {
// instead of typeid(dummy).name() you use
get_type_name<dummy>::name();
}

Get number of blocks allocated on the heap to detect memory leaks

Is there a function available that can get the number of blocks of memory that are currently allocated on the heap? It can be Windows/Visual Studio specific.
I'd like to use that to check if a function leaks memory, without using a dedicated profiler. I'm thinking about something like this:
int before = AllocatedBlocksCount();
foo();
if (AllocatedBlocksCount() > before)
printf("Memory leak!!!");
There are several ways to do it (specific to the CRT that comes with Microsoft Visual Studio).
One way would be to use the _CrtMemCheckpoint() function before and after the call you are interested in, and then compare the difference with _CrtMemDifference().
_CrtMemState s1, s2, s3;
_CrtMemCheckpoint (&s1);
foo(); // Memory allocations take place here
_CrtMemCheckpoint (&s2);
if (_CrtMemDifference(&s3, &s1, &s2)) // Returns true if there's a difference
_CrtMemDumpStatistics (&s3);
You can also enumerate all the allocated blocks using _CrtDoForAllClientObjects(), and a couple of other methods using the debug routines of the Visual C++ CRT.
Notes:
All these are in the <crtdbg.h> header.
They obviously work only on Windows and when compiling with VC.
You need to set up CRT debugging and a few flags and other things.
These are rather tricky features; make sure to read the relevant parts of the MSDN carefully.
These only work in debug mode (i.e. linking with the debug CRT and the _DEBUG macro defined.)
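Outside of the MSVC debug CRT, a rough portable approximation of the asker's hypothetical AllocatedBlocksCount() can be built by counting in replaced global operators. This is a sketch of the counting idea, not the CRT mechanism, and it only sees allocations that go through operator new:

```cpp
#include <atomic>
#include <cstdlib>
#include <new>

static std::atomic<long> g_blocks{0};

// Replaced global operators that count live blocks. The unsized delete is
// the fallback for the sized and array forms, so replacing it alone is
// sufficient for this simple counter.
void *operator new(std::size_t size)
{
    void *p = std::malloc(size ? size : 1);
    if (!p)
        throw std::bad_alloc();
    ++g_blocks;
    return p;
}

void operator delete(void *p) noexcept
{
    if (p) {
        --g_blocks;
        std::free(p);
    }
}

long AllocatedBlocksCount() { return g_blocks.load(); }
```

Unlike the CRT checkpoints, this gives only a count, not per-block details; but it supports exactly the before/after comparison shown in the question.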

Using 'new' to allocate memory dynamically in C++?

I am working on some C++ code and am having some problems with the function described below. I haven't used much C++ before, at least not for a long time, so I'm trying to learn as I go along to a large extent. The Win32 API doesn't help much with the confusion factor either...
The function is successfully called twice, but fails when it is called at a later stage in the application.
PTSTR getDomainFromDN(PTSTR dnPtstr) {
size_t nDn=wcslen(dnPtstr);
size_t *pnNumCharConverted = new size_t;
wchar_t *szTemp = new wchar_t[10]; // for debugging purposes
_itow_s((int)nDn,szTemp,10,10); // for debugging purposes
AddToMessageLog(EVENTLOG_ERROR_TYPE,szTemp); // for debugging purposes (displays an integer value before failing)
AddToMessageLog(EVENTLOG_ERROR_TYPE,TEXT("Marker A")); // for debugging purposes
char *dn = new char[nDn];
// !!!!!!!!!!!! all goes wrong here, doesn't get to next line, nDn does have a value when it fails (61)
AddToMessageLog(EVENTLOG_ERROR_TYPE,TEXT("Marker B")); // for debugging purposes
wcstombs_s(pnNumCharConverted,dn,nDn+1,dnPtstr,nDn+1);
// ...more code here...
delete[] dn;
delete pnNumCharConverted;
return result;
}
At first I thought it was a memory allocation problem or something, as it fails on the line char *dn = new char[nDn];, the last marker showing as 'Marker A'. I used delete[] on the pointer further down, to no avail. I know that nDn has a value, because I print it to a message log using _itow_s for debugging. I also know that dnPtstr is a PTSTR.
I tried using malloc with free() in the old C style as well, but this doesn't improve things.
I tried sanitizing your code a bit. One of the big tricks in C++ is to avoid explicit memory management when it can be avoided. Use vectors instead of raw arrays, and strings instead of char pointers.
And don't unnecessarily allocate objects dynamically. Put them on the stack, where they're automatically freed.
And, as in every other language, initialize your variables.
PTSTR getDomainFromDN(PTSTR dnPtstr) {
std::wstring someUnknownString = dnPtstr;
size_t numCharConverted = 0;
std::wostringstream sstr; // for debugging purposes
sstr << someUnknownString.size();
AddToMessageLog(EVENTLOG_ERROR_TYPE,sstr.str().c_str()); // for debugging purposes (displays the length before failing)
AddToMessageLog(EVENTLOG_ERROR_TYPE,TEXT("Marker A")); // for debugging purposes
std::vector<char> dn(someUnknownString.size() + 1);
AddToMessageLog(EVENTLOG_ERROR_TYPE,TEXT("Marker B")); // for debugging purposes
wcstombs_s(&numCharConverted, &dn[0], dn.size(), someUnknownString.c_str(), _TRUNCATE);
// ...more code here...
return result;
}
This might not have solved your problem, but it has eliminated a large number of potential errors.
Given that I can't reproduce your problem from the code you've supplied, this is really the best I can do.
Now, if you could come up with sane names instead of dnPtstr and dn, it might actually be nearly readable. ;)
I think your problem is this line:
wcstombs_s(pnNumCharConverted,dn,nDn+1,dnPtstr,nDn+1);
because you are telling wcstombs_s to copy up to nDn+1 characters into dn, which is only nDn characters long.
Try changing the line to:
wcstombs_s(pnNumCharConverted,dn,nDn,dnPtstr,nDn);
or perhaps better yet:
wcstombs_s(pnNumCharConverted,dn,nDn,dnPtstr,_TRUNCATE);
I'm not sure how you are debugging this or how AddToMessageLog is implemented, but if you are just inspecting the log to trace the code and AddToMessageLog is buffering your logging, then perhaps the error occurs before that buffer is flushed.
If you are sure that char *dn = new char[nDn]; is failing, try set_new_handler -> http://msdn.microsoft.com/en-us/library/5fath9te(VS.80).aspx
On a side note, few things:
The very first operation, size_t nDn = wcslen(dnPtstr);, is not 100% correct. You are calling wcslen on dnPtstr, assuming dnPtstr to be Unicode. However, this is not necessarily the case, since it could be PWSTR or PSTR depending on whether UNICODE is defined. So use _tcslen(). It's worth giving some time to understanding the Unicode/non-Unicode distinction, since it will help you a lot in Windows C++ development.
Why are you using so many 'new's if you are using these variables only in this function (I am assuming so)? Prefer the stack over the heap for local variables unless you have a definite requirement.

calling code stored in the heap from vc++

Imagine I am doing something like this:
void *p = malloc (1000);
*((char*)p) = some_opcode;
*((char*)p+1) = another_opcode; // for the sake of the example: the opcodes are ok
....
etc...
How can I define a function pointer to call p as if it were a function? (I'm using VC++ 2008 Express.)
Thanks
There wasn't enough space in a comment, so: Joe_Muc is correct. You should not stuff code into memory obtained by malloc or new. You will run into problems if you change the page properties of pages that Windows allocates.
This isn't a problem, because using VirtualAlloc() and the related Win32 APIs is very easy: call VirtualAlloc() with flProtect set to PAGE_EXECUTE_READWRITE.
Note that you should probably do three allocations: one guard page, the pages you need for your code, then another guard page. This will give you a little protection from bad code.
Also wrap calls to your generated code with structured exception handling.
Next, the Windows x86 ABI (calling conventions) is not well documented (I know, I've looked). There is some info here, here, and here. The best way to see how things work is to look at code generated by the compiler. This is easy to do with the /FA switches (there are four of them).
You can find the 64-bit calling conventions here.
Also, you can still obtain Microsoft's Macro Assembler, MASM, here. I recommend writing your machine code in MASM and looking at its output, then having your machine code generator do similar things.
Intel's and AMD's processor manuals are good references - get them if you don't have them.
Actually, malloc probably won't cut it. On Windows you probably need to call something like VirtualAlloc (http://msdn.microsoft.com/en-us/library/aa366887(VS.85).aspx) in order to get an executable page of memory.
Starting small:
#include <windows.h>

int main(void)
{
char* p = (char*)VirtualAlloc(NULL, 4096, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
p[0] = (char)0xC3; // ret
typedef void (*functype)();
functype func = (functype)p;
(*func)();
return 0;
}
The next step for playing nice with your code is to preserve the EBP register. This is left as an exercise. :-)
After writing this, I ran it with malloc and it also worked. That may be because I'm running an admin account on Windows 2000 Server; other versions of Windows may actually need the VirtualAlloc call. Who knows.
If you have the right opcodes in place, calling can be as simple as casting to a function pointer and calling it.
typedef void (*voidFunc)();
char *p = (char*)malloc (1000);
p[0] = some_opcode;
p[1] = another_opcode; // for the sake of the example: the opcodes are ok
// ... p[n] = return opcode ...
((voidFunc)p)();
Note though that unless you mark the page as executable, your processor may not let you execute code generated on the heap.
I'm also currently looking into executing generated code, and while the answers here didn't give me precisely what I needed, you guys sent me on the right track.
If you need to mark a page as executable on POSIX systems (Linux, BSD etc.), check out the mmap(2) function.
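A hedged end-to-end sketch of that POSIX route, assuming an x86-64 machine (the six bytes of machine code below encode 'mov eax, 42; ret'; run_generated is an illustrative name):

```cpp
#include <cstring>
#include <sys/mman.h>

// Copies a tiny x86-64 function onto an anonymous page, makes the page
// executable, calls it, and returns its result (42 on success, -1 on error).
int run_generated()
{
    // x86-64 machine code: mov eax, 42 ; ret
    const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};

    // W^X discipline: map the page writable first, never writable and
    // executable at the same time.
    void *mem = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    std::memcpy(mem, code, sizeof code);

    // Flip the page to executable, then call through a function pointer.
    if (mprotect(mem, 4096, PROT_READ | PROT_EXEC) != 0)
        return -1;
    int result = reinterpret_cast<int (*)()>(mem)();
    munmap(mem, 4096);
    return result;
}
```

On systems with strict W^X or hardened runtimes, the mprotect step (rather than a single RWX mapping) is what keeps this working.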