Can't track down source of huge memory use - C++

I've been trying to track down a memory problem for a couple of days - my program is using around 3GB of memory, when it should be using around 200MB-300MB. Valgrind is actually reporting that it is using ~300MB at its peak, and is not reporting any memory leaks.
The program reads an input file, and stores every unique word in that file. It is multi-threaded, and I've been running it using 4 threads. My major sources of data are:
Constant-size array of wchar_t (4MB total)
Map between words and a list of associated values. This grows with the size of input. If there are 1,000,000 unique words in the input file, there will be 1,000,000 entries in the tree.
I am doing a huge number of allocations and deallocations (using new and delete) -- at least two per unique word. Is it possible that memory I free is not being reused for some reason, causing the program to keep acquiring more and more memory? It consistently grabs more as it continues to run.
In general, any ideas about where I should go from here?
Edit 1 (based on advice from Graham):
One path I'll try is minimizing allocation. I'll work with a single string per thread (which may grow occasionally if a word is longer than its current capacity); if I remember my code correctly, this will eliminate a huge number of new/delete calls. If all goes well I'll be left with: a one-time allocation of the input buffer, a one-time allocation of the string per thread (with some reallocs), and two allocations per map entry (one for the key, one for the value).
Thanks!

It's likely to be heap fragmentation. Because you are allocating and releasing small blocks in such huge quantities, it's probable that there are loads of small free chunks which are too small to be reused by subsequent allocations. Since these chunks are effectively wasted, the process has to keep grabbing more and more memory from the system to honour new allocations.
You may be able to mitigate the effect by first reserving a sufficiently large default capacity in each string with string::reserve(), and then clearing strings to empty when you're finished with them (rather than deleting). Then, keep a list of empty strings to be reused instead of allocating new ones all the time.
EDIT: the above suggestion assumes the objects being allocated are std::strings. If they're not, then you can probably still apply the general technique of keeping old empty objects around for reuse.
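For instance, a minimal sketch of such a reuse pool for std::string (the class name and the 256-byte reserve are made up for illustration):

#include <string>
#include <vector>

// Minimal sketch of a per-thread pool of reusable std::string objects.
// The name StringPool and the default capacity are illustrative.
class StringPool {
public:
    std::string acquire() {
        if (free_.empty()) {
            std::string s;
            s.reserve(256);        // reserve once so later growth is rare
            return s;
        }
        std::string s = std::move(free_.back());
        free_.pop_back();
        return s;
    }
    void release(std::string s) {
        s.clear();                 // drop the contents, keep the capacity
        free_.push_back(std::move(s));
    }
private:
    std::vector<std::string> free_;
};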

Memory your program frees should be returned to the heap where it can be allocated again.
However, that does not mean it is freed back to the operating system. Often, the app will continue to "own" memory that has been allocated and freed.
Is this a Windows app? How are you allocating and freeing the memory? And how are you determining how much memory the app is using?

You should try wrapping the resource allocations in a class if you can. Call new in the constructor and delete in the destructor, and take advantage of scope so memory management happens automatically.
http://en.wikipedia.org/wiki/RAII
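For instance, a minimal sketch of the idea (the Buffer class is just for illustration):

#include <cstddef>

// Minimal RAII sketch: the resource is acquired in the constructor and
// released in the destructor, so it goes away automatically with scope.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new wchar_t[n]) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;             // forbid copies to avoid double delete
    Buffer& operator=(const Buffer&) = delete;
    wchar_t* get() { return data_; }
private:
    wchar_t* data_;
};

void process()
{
    Buffer buf(1024);   // memory acquired here
    // ... use buf.get() ...
}                       // memory released here, even if an exception is thrown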

Related

Linux: Why do the values in smaps increase continuously?

I sum up the values of the mappings of the current process, repeat this over a period of time, save the results in a file and then plot them. What I find a little odd is that the values for the different fields of smaps seem to increase more or less linearly. I also allocated some memory using new in C++ and then freed it, but there was no recognizable difference. I was expecting some up-and-down movement in the plot of the fields; unfortunately, there was none.
Is this behaviour normal, or did I perhaps do something wrong? I am pretty sure my parser works, because I checked it against pmap: my parser and pmap return the same result for the same process.
Allocating memory from the OS is pretty expensive, therefore large blocks of heap are allocated at a time. new tries to find an empty place in the pre-allocated heap, and only when there is none does it allocate another block from the OS. Deallocation from this pre-allocated heap is also done only in large blocks. (You can check the "mallopt" manual page for how to tune this behaviour via the environment. Note that all allocations are done in pages, each usually 4 KiB large.)
This goes for small memory allocations. Large allocations (by default, 128 KiB or more, again tuneable with mallopt) are done with anonymous mmap and are deallocated when freed.
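For instance (the threshold values here are arbitrary, and the defaults vary between glibc versions):

#include <malloc.h>

int main()
{
    // Use mmap for any allocation of 1 MiB or more (the default is usually
    // 128 KiB); the value is only an example.
    mallopt(M_MMAP_THRESHOLD, 1024 * 1024);

    // Give freed memory at the top of the heap back to the OS once more than
    // 512 KiB of it is unused; again an arbitrary example value.
    mallopt(M_TRIM_THRESHOLD, 512 * 1024);

    // ... rest of the program ...
    return 0;
}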

Is there any benefit to use multiple heaps for memory management purposes?

I am a student in a system software faculty. I'm currently developing a memory manager for Windows. Here's my simple implementation of malloc() and free():
#include <windows.h>

HANDLE heap = HeapCreate(0, 0, 0);

void* hmalloc(size_t size)
{
    return HeapAlloc(heap, 0, size);
}

void hfree(void* memory)
{
    HeapFree(heap, 0, memory);
}

int main()
{
    int* ptr1 = (int*)hmalloc(100 * sizeof(int));
    int* ptr2 = (int*)hmalloc(100 * sizeof(int));
    int* ptr3 = (int*)hmalloc(100 * sizeof(int));

    hfree(ptr2);
    hfree(ptr3);
    hfree(ptr1);

    return 0;
}
It works fine, but I can't understand: is there a reason to use multiple heaps? I can allocate memory on the heap and get the address of an allocated memory chunk, but here I use ONE heap. Is there a reason to use multiple heaps? Maybe for multi-threaded/multi-process applications? Please explain.
The main reason for using multiple heaps/custom allocators is better memory control. Usually, after lots of news and deletes the memory gets fragmented and the application loses performance (it will also consume more memory). Using the memory in a more controlled environment can reduce heap fragmentation.
Another use is preventing memory leaks: you can simply free the entire heap you allocated, without having to free every object allocated from it one by one.
Another use is for tightly allocated objects: if you have, for example, a list, you can allocate all the nodes in a smaller dedicated heap, and the application gains performance because there are fewer cache misses when iterating the nodes.
Edit: memory management is a hard topic, however, and in some cases it is not done right. Andrei Alexandrescu gave a talk at one point in which he said that for some applications, replacing the custom allocator with the default one increased performance.
This is a good link that elaborates on why you may need multiple heaps:
https://caligari.dartmouth.edu/doc/ibmcxx/en_US/doc/libref/concepts/cumemmng.htm
"Why Use Multiple Heaps?
Using a single runtime heap is fine for most programs. However, using multiple
heaps can be more efficient and can help you improve your program's performance
and reduce wasted memory for a number of reasons:
1- When you allocate from a single heap, you may end up with memory blocks on
different pages of memory. For example, you might have a linked list that
allocates memory each time you add a node to the list. If you allocate memory for
other data in between adding nodes, the memory blocks for the nodes could end up
on many different pages. To access the data in the list, the system may have to
swap many pages, which can significantly slow your program.
With multiple heaps, you can specify which heap you allocate from. For example,
you might create a heap specifically for the linked list. The list's memory blocks
and the data they contain would remain close together on fewer pages, reducing the
amount of swapping required.
2- In multithread applications, only one thread can access the heap at a time to
ensure memory is safely allocated and freed. For example, say thread 1 is
allocating memory, and thread 2 has a call to free. Thread 2 must wait until
thread 1 has finished its allocation before it can access the heap. Again, this
can slow down performance, especially if your program does a lot of memory
operations.
If you create a separate heap for each thread, you can allocate from them
concurrently, eliminating both the waiting period and the overhead required to
serialize access to the heap.
3- With a single heap, you must explicitly free each block that you allocate. If you
have a linked list that allocates memory for each node, you have to traverse the
entire list and free each block individually, which can take some time.
If you create a separate heap for that linked list, you can destroy it with a
single call and free all the memory at once.
4- When you have only one heap, all components share it (including the IBM C and
C++ Compilers runtime library, vendor libraries, and your own code). If one
component corrupts the heap, another component might fail. You may have trouble
discovering the cause of the problem and where the heap was damaged.
With multiple heaps, you can create a separate heap for each component, so if
one damages the heap (for example, by using a freed pointer), the others can
continue unaffected. You also know where to look to correct the problem."
Another reason would be a scenario in which you need to execute code internally, e.g. running generated simulation code. By creating your own heap you can give that heap execution rights, which by default is turned off for security reasons (on Windows).
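If I remember correctly, on Windows that is done with the HEAP_CREATE_ENABLE_EXECUTE flag, roughly:

#include <windows.h>

// Sketch: a heap whose allocations are executable (use with care, since
// it weakens DEP protection).
HANDLE execHeap = HeapCreate(HEAP_CREATE_ENABLE_EXECUTE, 0, 0);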
You have some good thoughts, and this would work for C, but in C++ you have destructors, and it is VERY important that they run.
You can think of all types as having constructors/destructors, just ones that logically "do nothing".
This is about allocators. See "the buddy algorithm", which uses powers of two to align and reuse blocks.
If I allocate 4 bytes somewhere, my allocator might allocate a 4 KB section just for 4-byte allocations. That way I can fit 1024 4-byte things in the block; if I need more, I add another block, and so forth.
Ask it for 4 KB and it won't allocate that from the 4-byte block; it might keep a separate one for larger requests.
This means you can keep big things together. If I allocate 17 bytes, then 13 bytes, then 1 byte, and the 13-byte block gets freed, I can only fit something of <= 13 bytes into that hole.
Hence the buddy system and powers of 2, which are easy to handle with left shifts: if I want a 2.5 KB block, I allocate the smallest power of 2 that will fit it (4 KB in this case), so that the slot can be reused afterwards for any item of <= 4 KB.
This is not garbage collection; it is just about keeping things compact and neat. Using your own allocator can avoid calls to the OS (depending on your compiler's default implementation of new and delete, they might already do this) and make new/delete very quick.
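A tiny sketch of the size rounding involved (a simplified illustration, not a complete buddy allocator):

#include <cstddef>

// Round a request up to the next power of two, as a buddy-style allocator
// does when picking a size class.
std::size_t next_pow2(std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
        p <<= 1;                  // left shift: 1, 2, 4, 8, ...
    return p;
}

// next_pow2(2560) == 4096, so a 2.5 KB request lands in the 4 KB size class.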
Heap compaction is very different: you need a list of every pointer that points into your heap, or some way to traverse the entire memory graph (as Java does), so that when you move things around and "compact" them you can update everything that pointed to the moved objects.
The only time I ever used more than one heap was when I wrote a program that would build a complicated data structure. It would have been non-trivial to free the data structure by walking through it and freeing the individual nodes, but luckily for me the program only needed the data structure temporarily (while it performed a particular operation), so I used a separate heap for the data structure so that when I no longer needed it, I could free it with one call to HeapDestroy.
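The pattern looks roughly like this (a sketch of that use of the Windows heap API, not my original code; the Node type is illustrative):

#include <windows.h>

struct Node { Node* next; int value; };   // illustrative node type

int main()
{
    // Build the complicated temporary structure in its own heap.
    HANDLE tempHeap = HeapCreate(0, 0, 0);

    Node* head = (Node*)HeapAlloc(tempHeap, 0, sizeof(Node));
    head->next = nullptr;
    head->value = 42;
    // ... allocate and link many more nodes from tempHeap ...

    // When the structure is no longer needed, one call frees every
    // allocation made from tempHeap, with no need to walk the nodes.
    HeapDestroy(tempHeap);
    return 0;
}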

Windows heap manager and heap segments

I found the following sentence in a book :
Whenever the heap manager runs out of committed space in the heap segment, it
explicitly commits more memory and divides the newly committed space into blocks
as more and more allocations are requested
Does this mean that when a block is allocated in the segment, the virtual memory used by the user data and the metadata isn't considered committed any more?
This is from the Advanced Windows Debugging book, I take it. I'm not sure exactly what you mean, as the question gets a bit vague towards the end, but what the passage basically means is as follows:
When you allocate heap space, the contents of the heap are not necessarily pre-determined, so you can use that allocated space as you see fit. For example, say I allocate 1 megabyte of heap memory and then decide to populate that space with only 512 KB of data: I have committed half of my allocated heap, leaving a further 512 KB free. That memory will still show as being used to the OS, because I explicitly set the heap allocation to 1024 KB, but the next time I use that same space I could use more or less than the 512 KB I used last time, up to the amount I allocated. The amount you use at a given point is the commit; the amount you have set aside is the allocation.
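Underneath, the distinction the heap manager relies on is between reserving and committing virtual memory. A rough sketch with VirtualAlloc, just to illustrate the two steps (not how the heap manager itself is written):

#include <windows.h>

int main()
{
    // Reserve 1 MB of address space; no physical memory or pagefile
    // space backs it yet.
    void* base = VirtualAlloc(nullptr, 1024 * 1024, MEM_RESERVE, PAGE_NOACCESS);

    // Commit the first 64 KB of the reservation so it can actually be used.
    VirtualAlloc(base, 64 * 1024, MEM_COMMIT, PAGE_READWRITE);

    // ... the heap manager commits more of its reserved segment in the
    // same way as allocation requests keep coming in ...

    VirtualFree(base, 0, MEM_RELEASE);   // release the whole reservation
    return 0;
}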
This is all much much simplified, and I would recommend reading such sources as:
stack-memory-vs-heap-memory from here
the-stack-and-the-heap from learn CPP
Memory_Stack_vs_Heap from CBootCamp
As good sources to get you started on memory and its usage in C++.
If there is anything specific or more detail you can think of (your question is a bit unclear) then let me know and I will get back to you as soon as possible.
No. Allocated blocks are part of committed memory.

C++: Does this look like memory fragmentation?

SUMMARY:
I have an application which consumes way more memory than it should (roughly 250% of the expected amount), but I can't seem to find any memory leaks. Calling the same function (which does a lot of allocations) repeatedly keeps increasing memory usage up to some point, after which it stops changing and stays there.
PROGRAM DETAILS:
The application uses a quadtree data structure to store 'Points'. It is possible to specify the maximum number of points to be stored in memory (cache size). The 'Points' are stored in 'PointBuckets' (arrays of points linked to the leaf nodes of the quadtree) which, if the maximum total number of points in the quadtree is reached, are serialized and saved to temporary files, to be retrieved when needed. This all seems to work fine.
Now when a file is loaded a new Quadtree is created and the old one is deleted if it exists, then points are read from the file and inserted into the quadtree one by one. A lot of memory allocations take place as buckets are being created and deleted during node splitting etc.
SYMPTOMS:
If I load a file that is expected to use 300MB of memory once, I get the expected amount of memory consumed. All good. If I keep loading the same file over and over again, the memory usage keeps growing (I'm looking at the RES column in top, on Linux) until it reaches about 700MB. That could indicate a memory leak. However, if I keep loading the file after that, memory consumption just stays at 700MB.
Another thing: when I use Valgrind Massif and look at the memory usage, it always stays within the expected limit. For example, if I specify the cache size to be 1.5 GB and run my program on its own, it will eventually consume 4GB of memory. If I run it under Massif, it stays below 2GB the whole time, and in the produced graphs I can see that it in fact never allocated more than the expected 1.5GB. My naive assumption is that this happens because Massif uses a custom memory pool which somehow prevents fragmentation.
So what do you think is going on here? What kind of solution should I look for to solve this issue, if it is memory fragmentation?
I'd attribute it more to simple allocator and OS caching behaviour: they retain memory you allocated instead of freeing it, so that it can be handed back to you more promptly the next time you request it. However, 250% does sound like a lot for this kind of effect; you could be looking at fragmentation problems.
Try swapping your allocator for a fragmentation-free allocator like object pool or memory arena.
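A minimal sketch of the arena idea, just to show the shape of it (a real implementation would handle alignment, growth, and so on):

#include <cstddef>
#include <vector>

// Tiny bump-pointer arena: allocations are carved out of one big block and
// are all released together when the arena is destroyed, so individual
// allocations cannot fragment the heap.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), used_(0) {}

    void* allocate(std::size_t n) {
        if (used_ + n > buffer_.size())
            return nullptr;               // out of space; a real arena would grow
        void* p = buffer_.data() + used_;
        used_ += n;
        return p;
    }
    // No per-object free: everything is released when the Arena goes away.

private:
    std::vector<char> buffer_;
    std::size_t used_;
};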

Find huge blocks of allocated memory

I have a program (daemon) written in C/C++. It runs flawlessly, but after some period of time (it can be 5 days, a week, 2 weeks) it starts to allocate a lot of memory. I can't work out which parts of the code do not free allocated memory. At startup, memory usage is about 20-30 megabytes. Then, after some period, or maybe some event, it grows slowly at about 1MB per hour, and if it is not terminated it can crash because no memory is available.
I've tried using Valgrind and shut the daemon down in the usual way once it had already allocated about 500MB of memory. The shutdown process was really long, but when it finished Valgrind said no memory leaks were found, except for the mysql_init/mysql_close procedures (about 504 bytes definitely lost). Google says not to worry about this MySQL leak and gives some reasons why memory diagnostic tools like Valgrind think it is a leak.
I don't really know which parts of the code allocate memory but free it only on program shutdown. Help me track this down.
Valgrind only detects pointers that aren't deleted, more or less. Keeping them around when you don't need them is a different problem.
Firstly, all objects and memory are freed at shutdown. If there's a leak, Valgrind will detect it as memory not referenced by any object, etc. In the end, any leaks are reclaimed by the operating system anyway.
If you're catching all exceptions with catch (...) and not doing anything with them, well, don't do that. It's a common cause.
Secondly, a log of the destructors that are called during shutdown might be helpful. Perhaps at the end of main(), set a global flag; any destructor called while that flag is set can log that its object still existed. See if there are lots of objects that shouldn't be there.
A bit easier: you can use a global counter; each constructor increments it by 1 and each destructor decrements it by 1. If you find that the number of live objects isn't staying relatively constant, you can investigate which ones are causing the problem using similar techniques.
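A rough sketch of that counting idea (the class name and counter are made up for illustration):

#include <atomic>
#include <cstdio>

std::atomic<long> g_liveWidgets{0};   // count of live Widget objects

struct Widget {
    Widget()  { ++g_liveWidgets; }
    ~Widget() { --g_liveWidgets; }
    // ... real members ...
};

// Call this periodically, or at shutdown, to see whether objects accumulate.
void report_live_objects()
{
    std::printf("live Widget objects: %ld\n", g_liveWidgets.load());
}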
Thirdly, use Boost and its scoped smart pointers to help, but do not rely on smart pointers as the holy grail.
There is a possible underlying issue that I have come across. For long-running programs, memory fragmentation can lead to large memory usage. You may delete a 1MB object, then try to create a 2MB object; the creation will go into new space because that 1MB "free chunk" is not big enough. Then, when you make a 512KB object, it may go into the 1MB object's old space, using only half of the available space but ensuring that your next 1MB object also has to be allocated in new space.
Unfortunately this problem can become bad, because small objects end up allocated in persistent places. There may be, say, 50-byte objects 300KB apart in memory, perhaps 100 of them, yet no 512KB object can be allocated in the gaps between them, so the allocator grabs an additional 512KB for each new object, effectively wasting 90% of the actual "free" space even though your program already owns more than enough.
This problem is hard to pin down as the definite cause, but if you examine your program's flow, look for small allocations. Remember that std::list/vector/etc. can all cause this; if you're making a daemon that does lots of memory operations and has to run for weeks, it's a good idea to pre-allocate memory using reserve(), as shown below. Memory pools are even better.
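For example, reserving capacity up front keeps a std::vector from repeatedly reallocating as it grows:

#include <vector>

void fill()
{
    std::vector<int> values;
    values.reserve(1000000);     // one allocation up front
    for (int i = 0; i < 1000000; ++i)
        values.push_back(i);     // no reallocation happens during this loop
}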
Depending on how much time you want to put in, you can also make (or find) a custom memory allocator that reports on the objects still alive when it shuts down.
Try the Valgrind Massif tool. From the Massif manual:
Also, there are certain space leaks that aren't detected by
traditional leak-checkers, such as Memcheck's. That's because the
memory isn't ever actually lost -- a pointer remains to it -- but it's
not in use. Programs that have leaks like this can unnecessarily
increase the amount of memory they are using over time. Massif can
help identify these leaks.
Massif should show you what's happening with memory: where it is allocated and what isn't freed until shutdown.
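Running it looks roughly like this (the binary name is a placeholder):

valgrind --tool=massif ./mydaemon
ms_print massif.out.<pid>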
Since you are sure there's no memory leak, your program might be allocating memory and storing data without leaking.
For example, let's say your program uses a linked list...
struct list {
    DATA_ARRAY arr;        // some data (DATA_ARRAY stands in for any payload type)
    struct list *next;
};

while (true)               // infinite loop
{
    // Add new nodes to the list
    // Store some data in the node
}
There's no leak here, but the loop adds new nodes forever and stores data in them, and everything is perfectly valid. Memory usage just keeps increasing. Since you are running for 2-5 days, something like this is certainly possible.
You may have to inspect the code and free memory that is no longer needed.