Memory snapshot of a huge C/C++ project (Windows/Unix)

I'm trying to take a snapshot of the memory used by a large application that runs on both Unix and Windows. My ultimate aim is a chart breaking down memory usage by area of code.
The program is split into about 30 projects, most of which are either static libraries or DLLs/shared libraries. Some are written in C, some in C++, and others in a mixture of the two. In total, the code across all projects is about 600,000 lines.
For the heap I could try to replace every malloc/free and new/delete across all projects and track allocations that way, but that is quite daunting in an application this size.
Also, that wouldn't pick up the static global data littered around the projects.
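Something like this global override is what I have in mind (just a sketch; a real version would also have to replace new[]/delete[] and the nothrow/sized/aligned overloads, and record call sites, not just a total):

    #include <atomic>
    #include <cstddef>
    #include <cstdlib>
    #include <new>

    // Running total of live bytes allocated through operator new.
    static std::atomic<std::size_t> g_live_bytes{0};

    // Max-aligned header in front of each block to remember its size.
    static constexpr std::size_t kHeader = alignof(std::max_align_t);

    void* operator new(std::size_t size) {
        void* base = std::malloc(size + kHeader);
        if (!base) throw std::bad_alloc();
        *static_cast<std::size_t*>(base) = size;  // stash the size
        g_live_bytes.fetch_add(size, std::memory_order_relaxed);
        return static_cast<char*>(base) + kHeader;
    }

    void operator delete(void* p) noexcept {
        if (!p) return;
        char* base = static_cast<char*>(p) - kHeader;
        g_live_bytes.fetch_sub(*reinterpret_cast<std::size_t*>(base),
                               std::memory_order_relaxed);
        std::free(base);
    }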
Thanks for any help.

You could give valgrind a try.
Here is a quote about one of the tools:
Massif
Massif is a heap profiler. It performs detailed heap profiling by taking regular snapshots of a program's heap. It produces a graph showing heap usage over time, including information about which parts of the program are responsible for the most memory allocations. The graph is supplemented by a text or HTML file that includes more information for determining where the most memory is being allocated. Massif runs programs about 20x slower than normal.
It is only supported on Linux, but if you can do the analysis on Linux and apply the results to the Windows version, this might help.
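Typical usage is valgrind --tool=massif ./yourapp (yourapp being a placeholder), which writes a massif.out.<pid> file that you can then render with ms_print massif.out.<pid>.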

If you are working with ELF binaries, you could check the object files (*.o) before linking with an ELF analyzer and see how big the static memory sections are, and what size the .bss (non-initialized static data) will occupy once loaded.
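For a quick look without a dedicated analyzer, binutils' size -A foo.o prints every section of an object file (.text, .data, .bss, ...) with its size in bytes; summing those across a library's object files gives a rough breakdown of the static footprint each library contributes.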

Related

Troubleshoot C++ program memory usage issue

I am writing a C++ program and find it consumes too much memory. I would like to know which part of the program consumes the most memory; ideally, I would like to know what percentage of memory is consumed by which kinds of C++ objects at a particular moment.
In Java, I know of tools like Eclipse Memory Analyzer (https://www.eclipse.org/mat/) that can take a heap dump and show/visualize such memory usage, and I wonder if the same can be done for a C++ program. For example, I would expect a tool/approach that tells me a particular vector<shared_ptr<MyObject>> is holding 30% of the memory.
Note:
I develop the program mainly on macOS (compiling with Apple Clang), so it would be best if the approach works on macOS. But I deploy to Linux as well (compiling with GCC), so approaches/tools on Linux are okay too.
I tried using Apple's Instruments for this purpose, but so far I can only use it to find memory allocation issues. I have no idea how to figure out the program's memory consumption at a particular moment (ideally tied back to the C++ objects in the program, so that I can take action to reduce it accordingly).
I haven't found an easy way to visualize/summarize each part of my program's memory yet. So far, the best tool/approach I have found is Apple's Instruments (if you are on macOS).
In Instruments, use the Allocations profiling template. When using it, choose File ==> Recording Options and check the Discard events for freed memory option.
You will then be able to see the un-freed memory (i.e. the data still alive) during allocation recording. If your program's debug symbols are loaded, you can see which functions produced those allocations.
Although this doesn't address all the issues, it does help to identify part of the problem.

Find out where memory is consumed

I have this relatively large numerical application that may run for a few days and eventually spit out some numbers. The whole thing is written in C++, makes use of a bunch of third-party libraries, and is compiled with GCC 4.6. The code uses shared pointers throughout.
Unfortunately, the memory consumption of the code increases over time until all of the (shared) machine's memory is used up, and then it crashes. Algorithmically, the code shouldn't accumulate memory over time, so there must be a bug somewhere.
I did run a small example through valgrind's leak checker, which reports that all is fine. My guess is that shared pointers are unintentionally being kept around somewhere, preventing unneeded data from being freed as the run progresses (but this is just a guess).
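To illustrate the kind of thing I mean (hypothetical code), a stray shared_ptr copy in some long-lived container would keep data alive without the leak checker ever complaining, since everything remains reachable:

    #include <memory>
    #include <vector>

    struct Sample { double values[100000]; };  // ~800 KB payload

    // Long-lived bookkeeping container somewhere in the program.
    std::vector<std::shared_ptr<Sample>> g_history;

    std::shared_ptr<Sample> produce() {
        auto s = std::make_shared<Sample>();
        g_history.push_back(s);  // extra owner nobody remembers: the Sample
                                 // can never be freed while g_history lives
        return s;
    }

Everything here is still reachable, so the leak checker stays quiet, yet memory grows with every call; storing std::weak_ptr in such bookkeeping structures would avoid the retention.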
At the end of the day, I'm running out of ideas for how to debug such a thing.
Any ideas?
Since you already have the valgrind tool suite available, I would advise you to run the massif tool.
Massif tracks the origin of memory allocations, and its report indicates how many bytes each allocation site/function created. This will help you understand where the memory blow-up comes from.
GNU libstdc++ defaults to caching STL-related memory allocations, apparently for microbenchmark speed reasons. In practice, the effect tends to be quite negative for both speed and memory footprint when using allocators such as tcmalloc and jemalloc. tcmalloc disables this behavior by setting GLIBCPP_FORCE_NEW=1 and GLIBCXX_FORCE_NEW=1 in the environment (for libstdc++ versions 3.3 and 3.4, respectively), but I know of no other allocator that does so. It is therefore generally a good idea to set the appropriate environment variable when launching your application.
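In practice that means launching your program as, for example, GLIBCXX_FORCE_NEW=1 ./yourapp, or combined with a preloaded allocator: GLIBCXX_FORCE_NEW=1 LD_PRELOAD=/usr/lib/libtcmalloc.so ./yourapp (the library path here is an example and varies by distribution).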
Even if you have no leaks, you could face memory fragmentation.
If you are on Linux, I suggest trying the jemalloc allocator. It runs great on Linux and on many architectures; I have used it successfully even on zLinux (IBM zSeries mainframes). It's really easy to use: you don't even need to rebuild your application or any libraries. Just build jemalloc and start your application with LD_PRELOAD set, like this: LD_PRELOAD=/usr/lib/libjemalloc.so <app>
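As a bonus for fragmentation hunting, jemalloc can dump its internal statistics when the program exits: MALLOC_CONF=stats_print:true LD_PRELOAD=/usr/lib/libjemalloc.so <app>. The report shows, among other things, how much memory is resident versus actually allocated, which is a good hint of fragmentation.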

Memory footprint profiling

Suppose I have a program written in C/C++ and I'd like to find out how much memory is used for data (heap, stack) and how much for code (libraries, executable files, etc.).
I once measured the dynamic memory usage with valgrind, but I don't think it has a feature to break the footprint down into data and code.
Platform: Mac (possibly Linux)
Your development environment should expose some linker options. Generally you can instruct the linker to create a link map. The information you are looking for is likely to be in the link map, or can be calculated from it.
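With the GNU toolchain you can request one with gcc -Wl,-Map,app.map; Apple's linker uses a lowercase flag, -Wl,-map,app.map. The resulting map lists every symbol together with the section it landed in and its size, which lets you separate code (.text) from static data (.data/.bss).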

How to profile memory usage?

I am aware of Valgrind, but it just detects memory management issues. What I am searching for is a tool that gives me an overview of which parts of my program consume how much memory. A graphical representation, e.g. a tree map (as KCachegrind provides for Callgrind), would be nice.
I am working on a Linux machine, so Windows tools will not help me very much.
Use massif, which is part of the Valgrind tools. massif-visualizer can help you graph the data or you can just use the ms_print command.
Try the heap profiler delivered with gperftools, by Google. I've always built it from source, but it's available as a precompiled package in several Linux distros.
It's as simple to use as linking a dynamic library into your executables and running the program. It collects information about every dynamic memory allocation (as far as I've seen) and saves a memory dump to disk every time one of the following happens:
HEAP_PROFILE_ALLOCATION_INTERVAL bytes have been allocated by the program (default: 1 GB)
the high-water memory usage mark increases by HEAP_PROFILE_INUSE_INTERVAL bytes (default: 100 MB)
HEAP_PROFILE_TIME_INTERVAL seconds have elapsed (default: inactive)
You explicitly call HeapProfilerDump() from your code
The last one, in my experience, is the most useful, because you can control exactly when to take a snapshot of heap usage and then compare two snapshots to see what's wrong.
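For illustration, a minimal sketch of that workflow ("myapp" is a placeholder; assumes you link against the profiler, e.g. with -ltcmalloc, and build with debug symbols):

    #include <gperftools/heap-profiler.h>

    int main() {
        HeapProfilerStart("myapp");        // snapshots land in myapp.0001.heap, ...
        // ... set-up phase ...
        HeapProfilerDump("after setup");   // argument is a free-form reason string
        // ... main workload ...
        HeapProfilerDump("after workload");
        HeapProfilerStop();
    }

Two snapshots can then be diffed with pprof, e.g. pprof --base=myapp.0001.heap ./myapp myapp.0002.heap, which shows only the allocations that appeared between the two dumps.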
Finally, there are several possible output formats, such as textual or graphical (in the form of a directed graph).
Using this tool I've been able to spot incorrect memory usages that I couldn't find using Massif.
A "newer" option is HeapTrack. Unlike massif, it instruments malloc/free, recording every call and dumping a log.
The GUI is nice (but requires Qt5, IIRC), and the timing results (you may want to track time as well) are less biased than valgrind's (since execution is not emulated).
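Typical usage is heaptrack ./yourapp to record a compressed trace file, then heaptrack_gui <file> (or heaptrack_print <file> for a console summary) to analyze it.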
You can also use valgrind's callgrind tool.

How can I categorize the memory usage of a NON-.NET application/DLL?

I have a 32-bit Visual Studio 8.0 C++ Windows DLL (non-.NET) that appears to be taking up more memory than I would expect. I want to determine exactly where the memory is going, not just a single figure for total memory used (I'm not interested in Task Manager's or Resource Monitor's memory usage values). Back in the 16-bit days, HeapWalker was very helpful; you could even select a BITMAP handle and view its graphic contents. I'm trying to remember how to read a .MAP file and add up the various sections, but there is very little documentation and I'm not sure how accurate this technique is. Does anybody have any advice?
If you need to find the size of the various sections of the DLL, you can use dumpbin.exe. It is a command-line tool for inspecting DLLs and executables. Be sure to run vcvars32.bat before trying to run it.
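For example, dumpbin /HEADERS mydll.dll prints each section header, including its size, and dumpbin /SECTION:.data mydll.dll drills into a single section (mydll.dll being a placeholder).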
To look at the actual memory consumption of your DLL, I would suggest starting with umdh.exe. It ships as part of WinDbg from Microsoft. As long as you build your binaries with a PDB, it will be able to resolve symbols in your application. You can then take a few snapshots of the memory to look for leaks. You can also do a complete dump of all allocations to see where memory is being allocated and how much.
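Roughly, the workflow is: enable user-mode stack traces for the process with gflags /i myapp.exe +ust, take a snapshot with umdh -p:<pid> -f:before.log, take another one later into after.log, and compare them with umdh before.log after.log > delta.txt (the file names here are placeholders).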