How to iterate all malloc chunks (glibc) - c++

I'm trying to iterate over all the malloc_chunks in all arenas (debugging from a core file, for memory leak and memory corruption investigation).
As I understand it, each arena has a top_chunk that points to the top chunk inside that arena. Each chunk holds prev_size and size fields, per the code in glibc/malloc/malloc.c:
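(Abridged from malloc.c; exact field names vary across glibc versions, e.g. newer releases spell the first field mchunk_prev_size:)

struct malloc_chunk {
  INTERNAL_SIZE_T      prev_size;   /* Size of previous chunk (if free). */
  INTERNAL_SIZE_T      size;        /* Size in bytes, including overhead. */
  struct malloc_chunk* fd;          /* double links -- used only if free. */
  struct malloc_chunk* bk;
  /* Only used for large blocks: pointer to next larger size. */
  struct malloc_chunk* fd_nextsize;
  struct malloc_chunk* bk_nextsize;
};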
Starting from those fields I can find the previous contiguous chunk and then loop over all the chunks in one arena. (That lets me gather statistics on chunk sizes and counts, much like WinDBG's !heap -stat -h, and by cross-checking prev_size against size I can also detect corrupted chunks.)
In the arena (malloc_state) there is a member variable next which points to the next arena, so I can loop over every arena's chunks.
But I ran into a problem: when the previous chunk is allocated (in use), the prev_size field is invalid, so how do I get to the previous malloc_chunk? Or is this approach simply wrong?
Question Background:
The bug is a memory leak reported on several online data nodes (our project is a distributed storage cluster).
What we did and the results:
We used Valgrind to reproduce the bug in a test cluster, but unfortunately it reported nothing.
I tried to investigate the heap further and analyze the heap chunks the way I used to in WinDBG (which has very useful heap commands for digging into memory leaks and corruption), but I was blocked by the question asked above.
We used Valgrind's Massif tool to analyze the allocations (which I find very detailed and interesting; it shows which allocation sites consume how much memory). Massif gave several clues; we followed them through the code and finally found the leak (a map had grown huge through improper usage, but it was erased in its holder class's destructor, which is why Valgrind did not report it).
I'll dig further into the gdb-heap source code to learn more about the glibc malloc structures.

The free open source program https://github.com/vmware/chap does what you want here for glibc malloc. Just grab a core (either because the process crashed, or by taking a live core with gcore or with the generate-core-file command from within gdb). Then open the core by running:
chap yourCoreFileName
Once you are at the chap prompt, if you want to iterate through all the chunks, both free and in use, you can run any of the following, depending on the verbosity you want. Keep in mind that an "allocation" in chap does not include the chunk header; it starts at the address returned by malloc.
Try any of the following:
count allocations
summarize allocations
describe allocations
show allocations
If you only care about allocations that are currently in use try any of the following:
count used
summarize used
describe used
show used
If you only care about allocations that are leaked try any of the following:
count leaked
summarize leaked
describe leaked
show leaked
More details are available in the documentation linked from the GitHub URL mentioned above.
In terms of corruption, chap does some checking at startup and reports many kinds of corruption, although the output may be a bit cryptic at times.

First, before digging into the implementation details of malloc, your time may be better spent with a tool like Valgrind, or even a run with the MALLOC_CHECK_ environment variable set, letting glibc's internal heap consistency checking do the work for you.
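For example (the program name is a placeholder):

MALLOC_CHECK_=3 ./yourprogram    # 3 = print a diagnostic and abort on heap corruption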
But, since you asked....
glibc's malloc.c has some helpful comments about looking at the previous chunk.
Some particularly interesting ones are:
/* Note that we cannot even look at prev unless it is not inuse */
And:
If prev_inuse is set for any given chunk, then you CANNOT determine the size of the previous chunk, and might even get a memory addressing fault when trying to do so.
This is just a limitation of the malloc implementation: when the previous chunk is in use, the footer that would otherwise store its size is reused for the user data of that allocation instead.
While it doesn't solve your problem, you can check whether the previous chunk is in use by following what the prev_inuse macro does:
#define PREV_INUSE 0x1
#define prev_inuse(p) ((p)->size & PREV_INUSE)
It checks the low-order bit of the current chunk's size field. (Chunk sizes are always multiples of at least 8, so the low-order bits are free to serve as status flags.) That would help you stop your iteration before heading off into no-man's land.
Unfortunately, you'd still be terminating your loop early, before visiting every chunk.
If you really want to iterate over all chunks, iterate forward instead: a chunk's own size field is always valid, so you can start at the first chunk of the heap and follow next_chunk until you reach malloc_state::top.
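A minimal sketch of that forward walk (my own illustration, not glibc code; it assumes the classic two-word chunk header, and heap_start/arena_top are hypothetical addresses you would recover from the core with gdb):

#include <cstddef>
#include <cstdio>

struct chunk_hdr {                   // abridged view of glibc's malloc_chunk header
    std::size_t prev_size;           // valid only if the previous chunk is free
    std::size_t size;                // chunk size plus low-order flag bits
};

const std::size_t FLAG_BITS = 0x7;   // PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA

void walk_heap(char* heap_start, char* arena_top) {
    for (char* p = heap_start; p < arena_top; ) {
        chunk_hdr* c = reinterpret_cast<chunk_hdr*>(p);
        std::size_t sz = c->size & ~FLAG_BITS;
        if (sz == 0) break;          // corrupt chunk; stop rather than loop forever
        chunk_hdr* next = reinterpret_cast<chunk_hdr*>(p + sz);
        // The NEXT chunk's PREV_INUSE bit says whether THIS chunk is in use.
        bool in_use = (next->size & 0x1) != 0;
        std::printf("%p size=%zu %s\n", static_cast<void*>(p), sz,
                    in_use ? "in use" : "free");
        p += sz;
    }
}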

Try the pmap -XX <PID> command to break down the process's memory usage from several different angles.

Related

find memory leaks when Valgrind shows nothing

I am looking for memory leaks in a C++ program on Linux with a heavy legacy background (multi-threaded, using libstdc++ containers). This program is a proxy server, an intermediary for requests from clients to servers.
Valgrind detected a few leaks that are now fixed, and it shows nothing more.
But the RSS of the process (resident memory, as shown in /proc/<pid>/stat) still grows on a given repeated stimulus (around 9 bytes per iteration). The growth is not linear and comes in big steps, probably because the libstdc++ containers optimize their memory usage and RSS is measured in pages of 4096 bytes.
As Valgrind finds nothing, I suspect either recursive calls that grow the stack, or some unused and forgotten containers (e.g. std::list, std::map, std::string) that keep growing.
The only methods I see for my search are:
Reading the code;
Reducing the scope by deactivating parts of the code.
But these are laborious and time consuming.
How could I improve my search? Are there tools for finding growing stacks or tables?
Any other idea about the cause of the leak (except dangling pointers, uncontrolled recursion, growing tables)?
Use https://github.com/vmware/chap:
To do this, gather a live core of your process (after running an hour's worth of iterations), then start chap with the path to that core as its only argument. From the chap prompt, try the following:
count used
count free
count leaked
count writable
Assuming the number reported for used is significantly larger than the number reported for free, the next thing to check is the number for leaked. If that number is non-zero, you actually have leaks in the sense of memory that is no longer referenced. Follow USERGUIDE.md for strategies for analyzing that.
If the number for leaked is 0 or insignificant, but the number for used is large, you likely have some container growth. Use summarize used as the next step.

Memory leak that doesn't crash when OOM, or show up in massif/valgrind

I have an internal C++ application that grows indefinitely--so much so that we've had to implement logic that kills it once its RSS reaches a certain peak size (2.0G), just to maintain some semblance of order. However, this has exposed some strange behavior.
First, I ran the application through Valgrind with memcheck and fixed some random memory leaks here and there. However, those leaks measured in the tens of megabytes. That makes sense, as there may be no actual leaking--it could just be poor memory management on the application side.
Next, I used Valgrind with Massif to check where the memory is going, and this is where it gets strange: the peak snapshot is 161M--nowhere near the 1.9G+ peaks we see in the RSS field. The largest consumer is where I'd expect--std::string--but that is not abnormal.
Finally, and most puzzling: before we were aware of this memory leak, I was testing this service on AWS and, just for fun, set the number of workers to a high value on a CC2.8XL machine: 44 workers. That's 60.5G of RAM, and no swap. Fast forward a month: I look at the host and, lo and behold, it's maxed out on RAM--BUT! The processes are still running fine, stuck at varying stages of memory usage, almost evenly distributed from 800M to 1.9G. Every once in a while dmesg prints a Xen error about being unable to allocate memory, but other than that the processes never die and continue to actively process (i.e., they're not "stuck").
Is there something I'm missing here? It's basically working, but for the life of me, I can't figure out why. What would be a good recommendation on what to look for next? Are there any tools that might help me figure it out?
Note that Valgrind's memcheck only discovers memory that you "abandon". while(1) vec.push_back(n++); will fill all available memory but never report a leak. By the sounds of things, you are collecting strings somewhere that take up a lot of space. I have also worked on code that uses a lot of memory without really leaking it (it was all in various places that Valgrind was satisfied were not leaks!). Sometimes you can track it down simply by adding some markers to the memory allocations to indicate WHERE you are allocating memory.
std:: containers typically take an Allocator argument. If you implement several different pools of memory, you may find where you are allocating memory; see the sketch below.
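A minimal sketch of that idea (hypothetical, not from the original answer): a counting allocator that keeps one live-byte counter per element type, so periodic logging shows which containers are growing.

#include <atomic>
#include <cstddef>
#include <cstdio>
#include <vector>

template <typename T>
struct CountingAllocator {
    using value_type = T;
    static std::atomic<std::size_t> live_bytes;   // bytes currently held for T

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        live_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};
template <typename T>
std::atomic<std::size_t> CountingAllocator<T>::live_bytes{0};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

int main() {
    std::vector<int, CountingAllocator<int>> v(1000);
    // Log this counter periodically; whichever type's counter keeps rising
    // points at the container that is growing.
    std::printf("live bytes for int: %zu\n", CountingAllocator<int>::live_bytes.load());
}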
I have also seen cases where a process's memory becomes fragmented, leaving lots of little free spaces in the heap--this can happen if, for example, you create a lot of strings by repeatedly growing them.
If fragmentation is the issue, running Valgrind's Massif with the --pages-as-heap=yes option may confirm it.
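For example (the program name is a placeholder; --pages-as-heap=yes makes Massif measure whole pages rather than just heap blocks, so fragmentation and allocator overhead become visible):

valgrind --tool=massif --pages-as-heap=yes ./yourprogram
ms_print massif.out.<pid>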

Heap size keep increasing until application crash (C++)

I would like to ask about the following problem.
I have a program whose memory usage keeps increasing in the long run, until all resources are exhausted and of course it crashes (it takes several days to reach the critical size).
What I've done so far: using Valgrind, I found all the memory leaks and fixed them, but I still have a small leak that keeps the heap growing, so I turned to Valgrind's Massif tool.
The problem is that Massif cannot run for very long; it causes the application to crash after several hours.
I tried to find the memory leak in a one-hour run, but the minimum threshold cannot be lowered below 1% of memory, and after one hour I can see the memory increase, though it is still small compared to the rest of the application.
So I can see which part takes more memory, but I cannot see which allocations within it are responsible.
An example from the Valgrind output file:
->03.11% (4,377,152B) in 28 places, all below massif's threshold (01.00%)
Any thoughts?
Use gperftools (Google Performance Tools).
You can link the library into your program, or even LD_PRELOAD it, and it will profile your heap use, generating snapshots. It costs very little performance, and when you see that the heap is already too big you can stop the program and get a graph of where the memory is spent.
EDIT:
There is a heap-profiler tutorial in the gperftools documentation.
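A typical invocation looks like this (the library path and program name are placeholders; adjust to wherever tcmalloc lives on your system):

LD_PRELOAD=/usr/lib/libtcmalloc.so HEAPPROFILE=/tmp/myprog.hprof ./myprog
pprof --text ./myprog /tmp/myprog.hprof.0001.heap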
Have you run Valgrind with --leak-check=full? What are you using that could consume memory? Have you matched every new with a delete?
Maybe it crashed because you are allocating a huge block of memory at once (that happened to me before), and Valgrind sometimes can't see it.
It is "strange" anyway; tell us the answer if you find it!

How does a memory leak improve performance

I'm building a large RTree (spatial index) full of nodes. It needs to be able to handle many queries AND updates. Objects are continuously being created and destroyed. The basic test I'm running is to see the performance of the tree as the number of objects in it increases. I insert from 100 to 20000 uniformly sized, randomly located objects in increments of 100. Searching and updating are irrelevant to the issue I am currently facing.
Now, when there is NO memory leak, the "insert into tree" performance is all over the place: anywhere from 10.5 seconds with ~15000 objects to 1.5 seconds with ~18000. There is no pattern whatsoever.
When I deliberately add a leak--as simple as a bare new int; statement on a line of its own, not assigned to anything--the performance instantly falls onto a nice gentle curve, sloping from roughly 0 seconds for 100 objects to 1.5 seconds for the full 20k.
Very, very lost at this point. If you want source code I can include it, but it's huuugggeeee, and literally the only line that makes a difference is new int;
Thanks in advance!
-nick
I'm not sure how you came up with this new int test, but it's not a very good way to fix things :) Run your code under a profiler and find out where the real delays are, then concentrate on fixing the hot spots.
g++ has profiling built in - just compile with -pg. For example:
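(File and program names below are placeholders:)

g++ -pg -O2 main.cpp -o rtree_test    # instrument for gprof
./rtree_test                          # writes gmon.out on exit
gprof rtree_test gmon.out > profile.txt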
Without more information it's impossible to be sure.
However, I wonder if this has to do with heap fragmentation. By creating and freeing many blocks of memory you are likely creating a whole load of small memory fragments linked together. The memory manager needs to keep track of them all so it can hand them out again if needed.
Some memory managers, when you free a block, try to "merge" it with the surrounding blocks of memory, and on a highly fragmented heap this can be very slow as the allocator searches for the surrounding blocks. Not only that, but if you have limited physical memory, following the chain of memory blocks can "touch" many physical pages, causing a whole load of extremely slow page faults whose cost is very variable depending on exactly how much physical memory the OS decides to give that process.
By leaving some memory un-freed you change this access pattern, which might make a large difference in speed. You might, for example, be forcing the runtime library to allocate a new block of memory each time rather than track down a suitably sized existing block to reuse.
I have no evidence this is the case in your program, but I do know that heap fragmentation is often the cause of slow programs when a lot of allocating and freeing is performed.
A possible explanation (a theory):
The compiler did not remove the empty new int.
The new int sits in one of the inner loops, or somewhere in your recursive traversal, where it gets executed a very large number of times.
The overall RSS of the process increases, and with it the total memory used by the process.
Page faults occur because of this.
Because of the page faults, the process becomes I/O bound instead of CPU bound.
End result: you see a drop in throughput. It would help if you could mention the compiler being used and the options you build the code with.
I am taking a stab in the dark here, but the problem could be the way the heap gets fragmented. You said that you are creating and destroying large numbers of objects; I will assume the objects are all of different sizes.
When you allocate memory on the heap, a cell of the needed size is broken off from the heap. When the memory is freed, the cell is added to a freelist. On a new allocation, the allocator walks the freelist until a cell that is big enough is found. With large numbers of allocations the freelist can get rather long, and walking it can take a non-trivial amount of time.
Now, an int is rather small, so when you do your new int it may well eat up all the small heap cells on the freelist and thus dramatically speed up larger allocations.
Chances are, however, that you are allocating and freeing similar-sized objects. If you use your own freelists, you will save yourself many of those walks and may dramatically improve performance. This is exactly what the STL allocators do to improve performance; a minimal sketch follows.
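A minimal sketch of a per-size freelist (my own illustration, not from the original answer): freed nodes go onto a singly linked list and are recycled in O(1), with no freelist walk and no coalescing.

#include <cstddef>
#include <new>

template <typename T>
class FreeList {
    union Node {
        Node* next;                           // link while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // payload while in use
    };
    Node* head_ = nullptr;
public:
    void* allocate() {
        if (head_) {                          // reuse a recycled slot: O(1)
            Node* n = head_;
            head_ = n->next;
            return n;
        }
        return ::operator new(sizeof(Node));  // fall back to the heap
    }
    void deallocate(void* p) {                // O(1): push onto the list
        Node* n = static_cast<Node*>(p);
        n->next = head_;
        head_ = n;
    }
};

// Usage (hypothetical): construct with placement new into allocate()'d
// storage, invoke the destructor manually, then call deallocate().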
Solution: do not run from inside Visual Studio; run the actual .exe file. I figured this out because that's what the profilers were doing, and the numbers magically dropped. I checked memory usage, and the version run this way (which gave me EXCEPTIONAL times) was not blowing up to excessively huge sizes.
Solution to why the hell Visual Studio does ridiculous crap like this: No clue.

Any useful suggestions to figure out where memory is being free'd in a Win32 process?

An application I am working with is exhibiting the following behaviour:
During a particular high-memory operation, the memory usage of the process under Task Manager (the Mem Usage stat) reaches a peak of approximately 2.5GB. (Note: a registry key has been set to allow this, as there is usually a 2GB maximum per process under 32-bit Windows.)
After the operation is complete, the process size slowly starts decreasing at a rate of 1MB per second.
I am trying to figure out the easiest way to quickly determine who is freeing this memory, and where it is being free'd.
I am having trouble attaching a memory profiler to my code, and I don't particularly want to override the new/delete operators to track the allocations/deallocations (IOW, I want to do this without re-compiling my code).
Can anyone offer any useful suggestions of how I could do this via the Visual Studio debugger?
Update
I should also mention that it's a multi-threaded application, so pausing the application and analysing the call stack through the debugger is not the most desirable option. I considered freezing different threads one at a time to see if the memory stops reducing, but I'm fairly certain this will cause the application to crash.
Ahh! You're looking at the wrong counter!
Mem Usage doesn't tell you that memory is being freed, only that the working set is being purged! This could mean some other application needs memory, or the VMM decided to mark some of your process's pages as Standby for some other process to use quickly. It does not mean that VirtualFree, HeapFree, or any other free function is being called.
Look at the commit size (VM Size, Private Bytes, etc).
But if you still want to know when memory is being decommitted or freed or what-have-you, then break on some of the free calls, e.g. (for Visual C++):
{,,kernel32.dll}HeapFree
or
{,,msvcr80.dll}free
etc.
Or just a regular function breakpoint on the above. Just make sure it resolves the address.
cdb/WinDbg let you do it via
bp kernel32!HeapFree
bp msvcrt!free
etc.
Names may vary depending on which CRT version you use and how you link against it (via /MT or /MD and their variants).
You might find this article useful:
http://www.gamasutra.com/view/feature/1430/monitoring_your_pcs_memory_usage_.php?print=1
Basically, what I had in mind was hooking the low-level allocation functions.
A couple different ideas:
The C runtime has a set of memory-debugging functions; you'd need to recompile, though. You could take a snapshot at computation completion and another one later, then use _CrtMemDifference to see what changed, as sketched below.
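A sketch of that flow (requires a debug build with _DEBUG defined; the operation callback is a placeholder for your high-memory phase):

#include <crtdbg.h>

void report_heap_delta(void (*operation)()) {
    _CrtMemState before, after, diff;
    _CrtMemCheckpoint(&before);          // snapshot the CRT debug heap
    operation();                         // the phase that should free memory
    _CrtMemCheckpoint(&after);
    if (_CrtMemDifference(&diff, &before, &after))
        _CrtMemDumpStatistics(&diff);    // report what changed
}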
Or you can attach to the process in your debugger and have it dump core before and after the memory is freed. Using NTSD, you can see which heaps are around and the sizes of things. (You'll need a lot of disk space and a fair amount of patience.) There's a setting (I think you enable it through gflags, but I don't remember) that saves a piece of the call stack as part of the dump; using that, you can figure out what kind of object is being deallocated. Unfortunately, it only stores 4 or 5 stack frames, so you'll likely have to do something cleverer as the next step to figure out where it's being freed: either look at the code ("oh yeah, there's only one place where that can happen"), put breakpoints on those destructors, or add tracing to the allocations and deallocations.
If your memory manager wipes freed data to a known value (usually something like 0xfeeefeee), you can set a data breakpoint on a particular instance of something you're interested in. When it gets freed, the breakpoint will trigger as the memory is wiped.
I recommend checking out the UMDH tool that comes with Debugging Tools for Windows (you can find usage notes and samples in the debugging tools help). You can snapshot the running process's heap allocations with stack traces and compare the snapshots.
You could try Memory Validator to monitor the allocations and deallocations. It has a couple of features that will help you identify where data is being deallocated:
Hotspots view. This can show you a tree of all allocations and deallocations, or just the allocations, or just the deallocations. It presents the data as a percentage of memory activity (based on the amount of memory (de)allocated at a given location).
Analysis view. You can perform queries asking for data in a given address range, and restrict these queries to any of the alloc, realloc, or dealloc behaviours.
Objects view. You can view allocations by type and see the maximum number of objects of each type (plus lots of other stats). Right-click on a type to get a context menu and choose Show All Deallocations; this shows the deallocation locations for that type on the Analysis tab.
I think the Hotspots view may give you the insight you need.