Windows heap manager and heap segments

Windows heap manager and heap segments - c++

I found the following sentence in a book :
Whenever the heap manager runs out of committed space in the heap segment, it
explicitly commits more memory and divides the newly committed space into blocks
as more and more allocations are requested
Does this mean when a block is allocated in the segment the virtual memory used by the user and the metadata isn't considered committed anymore ?

This is from the advanced windows debugging book I take it, not sure what you mean as you get kind of vague towards the end, however what it basically means is as follows:
When you allocate heap space the contents of the heap are not necessarily pre-determined, so you can use that allocated space as you see fit: for example, I allocate 1 megabyte of heap memory, and I then decide to populate that space with only 512k or data, that would mean I have committed half of my allocated heap, leaving a further 512k free. That memory will still show as being utilised to the OS because I have explicitly set the heap allocation to 1024k, however next time I use that same space I could use more or less than the 512k I utilised last time, up to the amount I have allocated for use. The amount you use at a given point is the commit, the amount you have set aside is the allocation.
This is all much much simplified, and I would recommend reading such sources as:
stack-memory-vs-heap-memory from here
the-stack-and-the-heap from learn CPP
Memory_Stack_vs_Heap from CBootCamp
As good sources to get you started on memory and its usage in C++.
If there is anything specific or more detail you can think of (your question is a bit unclear) then let me know and I will get back to you as soon as possible.

No. Allocated blocks are part of committed memory.

Related

delete[] does not free memory of array returned by function [duplicate]

I am observing the following behavior in my test program:
I am doing malloc() for 1 MB and then free() it after sleep(10). I am doing this five times. I am observing memory consumption in top while the program is running.
Once free()-d, I am expecting the program's virtual memory (VIRT) consumption to be down by 1 MB. But actually it isn't. It stays stable. What is the explanation for this behavior? Does malloc() do some reserve while allocating memory?

Once free()-d, I am expecting program's virtual memory (VIRT) consumption to be down by 1MB.
Well, this is not guaranteed by the C standard. It only says, once you free() the memory, you should not be accessing that any more.
Whether the memory block is actually returned to the available memory pool or kept aside for future allocations is decided by the memory manager.

The C standard doesn't force on the implementer of malloc and free to return the memory to the OS directly. So different C library implementations will behave differently. Some of them might give it back directly and some might not. In fact, the same implementation will also behave differently depending on the allocation sizes and patterns.
This behavior, of course, is for good reasons:
It is not always possible. OS-level memory allocations usually are done in pages (4KB, 4MB, or ... sizes at once). And if a small part of the page is still being used after freeing another part then the page cannot be given back to the operating system until that part is also freed.
Efficiency. It is very likely that an application will ask for memory again. So why give it back to the OS and ask for it again soon after. (of course, there is probably a limit on the size of the memory kept.)
In most cases, you are not accountable for the memory you free if the implementation decided to keep it (assuming it is a good implementation). Sooner or later it will be reallocated or returned to the OS. Hence, optimizing for memory usage should be based on the amount you have malloc-ed and you haven't free-d. The case where you have to worry about this, is when your allocation patterns/sizes start causing memory fragmentation which is a very big topic on its own.
If you are, however, on an embedded system and the amount of memory available is limited and you need more control over when/how memory is allocated and freed then you need to ask for memory pages from the OS directly and manage it manually.
Edit: I did not explain why you are not accountable for memory you free.
The reason is, on a modern OS, allocated memory is virtual. Meaning if you allocate 512MB on 32-bit system or 10TB of 64-bit system, as long as you don't read or write to that memory, it will not reserve any physical space for it. Actually, it will only reserve physical memory for the pages you touch from that big block and not the entire block. And after "a while of not using that memory", its contents will be copied to disk and the underlying physical memory will be used for something else.

This is very dependent on the actual malloc implementation in use.
Under Linux, there is a threshold (MMAP_THRESHOLD) to decide where the memory for a given malloc() request comes from.
If the requested amount is below or equal to MMAP_THRESHOLD, the request is satisfied by either taking it from the so-called "free list", if any memory blocks have already been free()d. Otherwise, the "break line" of the program (i. e. the end of the data segment) is increased and the memory made available to the program by this process is used for the request.
On free(), the freed memory block is added to the free list. If there is enough free memory at the very end of the data segment, the break line (mentionned above) is moved again to shrink the data segment, returning the excess memory to the OS.
If the requested amount exceeds MMAP_THRESHOLD, a separate memory block is requested by the OS and returned again during free().
See also https://linux.die.net/man/3/malloc for details.

How C++ heap managers track the size of allocated objects?

I want to estimate memory consumption in C++ when I allocate objects in the heap. I can start my calculation with sizeof(object) and round it up to the nearest multiple of the heap block (typically 8 bytes). But if the whole allocated block goes to the allocated object, how can the heap manager tell the size of the object when I ask it to delete the pointer?
If the heap manager tracks the size of each object, does it mean that I should add ~4 bytes per allocated object in my calculation to the total memory consumption for heap manager internal expenses? Or does it store this information in much more compact form? What are extra costs (memory-wise) of heap memory allocation?
I understand that my question is very implementation specific, but I appreciate any hints about the heap metadata storage of major implementations such as gcc (or maybe it is about libc).

Heap allocators are not free. There is the cost per block of allocation (both in size and perhaps search if using a find best algorithm), the cost of joining blocks upon free, and any lost size per block when the size requested is smaller than the block size returned. There is also a cost in terms of memory fragmentation. Consider placing a small 1 byte allocation in the middle of your heap. At this point, you can no longer return a contiguous block larger than 1/2 the heap - half your heap is fragmented. Good allocators fight all the above and try to maximize all benefits.
Consider the following allocation system (used in many real world applications over a decade on numerous handheld game devices.)
Create a primary heap where each allocation has a prev ptr, next ptr, size, and possibly other info. Round it to 16 bytes per entry. Store this info before or after the actual memory pointer being returned - your choice as there are advantages to each. Yes, you are allocating requested size + 16 bytes here.
Now just keep a pointer to a free list and possibly a used list.
Allocation is done by finding a block on the free list big enough for our use and splitting it into the size requested and the remainder (first fit), or by searching the whole list for as exact a match as possible (best fit). Simple enough.
Freeing is moving the current item back into the free list, joining areas adjacent to each other if possible. You can see how this can go to O(n).
For smaller allocations, get a single allocation (either from newly created heap, or from global memory) that will be your unit allocation zone. Split this area up into "block size" chunk addresses and push these addresses onto a free stack. Allocation is popping an address from this list. Freeing is just pushing the allocation back to the list - both O(1).
Then inside your malloc/new/etc, check size if it is within the unit size, allocate from the unit allocator, otherwise use the O(n) allocator. My studies have shown that you can get 90-95% of allocations to fit within the unit allocator's block size without too much issue.
Also, you can allocate slabs of memory for memory pools and leave them allocated while using them over and over. A few larger allocations are much cheaper to manage (Unix systems use this alot...)
Advantages:
All unit allocations have no external fragmentation.
Small allocations run constant time.
Larger allocation can be tailored to the exact size of the data requested.
Disadvantages:
You pay an internal fragmentation cost when you want memory smaller than the block requsted.
Many, many allocations and frees outside your small block size will eventually fragment your memory. This can lead to all sorts of unpleasantness. This can be managed with OS help or with care on alloc/free order.
Once you get 1000s of large allocations going, there is a CPU price. Slabs / multiple heaps can be used to fix this as well.
There are many, many schemes out there, but this one is simple and has been used in commercial apps for ages though, so I wanted to start here.

Why is memory not reusable after allocating/deallocating a number of small objects?

While investigating a memory link in one of our projects, I've run into a strange issue. Somehow, the memory allocated for objects (vector of shared_ptr to object, see below) is not fully reclaimed when the parent container goes out of scope and can't be used except for small objects.
The minimal example: when the program starts, I can allocate a single continuous block of 1.5Gb without problem. After I use the memory somewhat (by creating and destructing an number of small objects), I can no longer do big block allocation.
Test program:
#include <iostream>
#include <memory>
#include <vector>
using namespace std;
class BigClass
{
private:
double a[10000];
};
void TestMemory() {
cout<< "Performing TestMemory"<<endl;
vector<shared_ptr<BigClass>> list;
for (int i = 0; i<10000; i++) {
shared_ptr<BigClass> p(new BigClass());
list.push_back(p);
};
};
void TestBigBlock() {
cout<< "Performing TestBigBlock"<<endl;
char* bigBlock = new char [1024*1024*1536];
delete[] bigBlock;
}
int main() {
TestBigBlock();
TestMemory();
TestBigBlock();
}
Problem also repeats if using plain pointers with new/delete or malloc/free in cycle, instead of shared_ptr.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of number of times I call it). As a consequence, there's no free VM regrion big enough to hold 1.5 GB. But I'm not sure why - since I'm definitely freeing the memory I used. Is it some "performance optimization" CRT does to minimize OS calls?
Environment is Windows 7 x64 + VS2012 + 32-bit app without LAA

Sorry for posting yet another answer since I am unable to comment; I believe many of the others are quite close to the answer really :-)
Anyway, the culprit is most likely address space fragmentation. I gather you are using Visual C++ on Windows.
The C / C++ runtime memory allocator (invoked by malloc or new) uses the Windows heap to allocate memory. The Windows heap manager has an optimization in which it will hold on to blocks under a certain size limit, in order to be able to reuse them if the application requests a block of similar size later. For larger blocks (I can't remember the exact value, but I guess it's around a megabyte) it will use VirtualAlloc outright.
Other long-running 32-bit applications with a pattern of many small allocations have this problem too; the one that made me aware of the issue is MATLAB - I was using the 'cell array' feature to basically allocate millions of 300-400 byte blocks, causing exactly this issue of address space fragmentation even after freeing them.
A workaround is to use the Windows heap functions (HeapCreate() etc.) to create a private heap, allocate your memory through that (passing a custom C++ allocator to your container classes as needed), and then destroy that heap when you want the memory back - This also has the happy side-effect of being very fast vs delete()ing a zillion blocks in a loop..
Re. "what is remaining in memory" to cause the issue in the first place: Nothing is remaining 'in memory' per se, it's more a case of the freed blocks being marked as free but not coalesced. The heap manager has a table/map of the address space, and it won't allow you to allocate anything which would force it to consolidate the free space into one contiguous block (presumably a performance heuristic).

There is absolutely no memory leak in your C++ program. The real culprit is memory fragmentation.
Just to be sure(regarding memory leak point), I ran this program on Valgrind, and it did not give any memory leak information in the report.
//Valgrind Report
mantosh#mantosh4u:~/practice$ valgrind ./basic
==3227== HEAP SUMMARY:
==3227== in use at exit: 0 bytes in 0 blocks
==3227== total heap usage: 20,017 allocs, 20,017 frees, 4,021,989,744 bytes allocated
==3227==
==3227== All heap blocks were freed -- no leaks are possible
Please find my response to your query/doubt asked in original question.
The culprit seems to be that after TestMemory(), the application's
virtual memory stays at 827125760 (regardless of number of times I
call it).
Yes, real culprit is hidden fragmentation done during the TestMemory() function.Just to understand the fragmentation, I have taken the snippet from wikipedia
"
when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.
For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block."
The above explains paragraph explains very nicely about memory fragmentation.Some allocation patterns(such as frequent allocation and deal location) would lead to memory fragmentation,but its end impact(.i.e. memory allocation 1.5GBgets failed) would greatly vary on different system as different OS/heap manager has different strategy and implementation.
As an example, your program ran perfectly fine on my machine(Linux) however you have encountered the memory allocation failure.
Regarding your observation on VM size remains constant: VM size seen in task manager is not directly proportional to our memory allocation calls. It mainly depends on the how much bytes is in committed state. When you allocate some dynamic memory(using new/malloc) and you do not write/initialize anything in those memory regions, it would not go committed state and hence VM size would not get impacted due to this. VM size depends on many other factors and bit complicated so we should not rely completely on this while understanding about dynamic memory allocation of our program.
As a consequence, there's no free VM regrion big enough to hold 1.5
GB.
Yes, due to fragmentation, there is no contiguous 1.5GB memory. It should be noted that total remaining(free) memory would be more than 1.5GB but not in fragmented state. Hence there is not big contiguous memory.
But I'm not sure why - since I'm definitely freeing the memory I used.
Is it some "performance optimization" CRT does to minimize OS calls?
I have explained about why it may happen even though you have freed all your memory. Now in order to fulfil user program request, OS will call to its virtual memory manager and try to allocate the memory which would be used by heap memory manager. But grabbing the additional memory does depend on many other complex factor which is not very easy to understand.
Possible Resolution of Memory Fragmentation
We should try to reuse the memory allocation rather than frequent memory allocation/free. There could be some patterns(like a particular request size allocation in particular order) which may lead overall memory into fragmented state. There could be substantial design change in your program in order to improve memory fragmentation. This is complex topic and require internal understanding of memory manager to understand the complete root cause of such things.
However there are tools exists on Windows based system which I am not much aware. But I found one excellent SO post regarding the which tool(on windows) can be useful to understand and check the fragmentation status of your program by yourself.
https://stackoverflow.com/a/1684521/2724703

This is not memory leak. The memory U used was allocated by C\C++ Runtime. The Runtime apply a a bulk of memory from OS once and then each new you called will allocated from that bulk memory. when delete one object, the Runtime not return memory to OS immediately, it may hold that memory for performance.

There is nothing here which indicates a genuine "leak". The pattern of memory you describe is not unexpected. Here are a few points which might help to understand. What happens is highly OS dependent.
A program often has a single heap which can be extended or shrunk in length. It is however one contiguous memory area, so changing the size is just changing where the end of the heap is. This makes it very difficult to ever "return" memory to the OS, since even one little tiny object in that space will prevent its shrinking. On Linux you can lookup the function 'brk' (I know you're on Windows, but I presume it does something similar).
Large allocations are often done with a different strategy. Rather than putting them in the general purpose heap, an extra block of memory is created. When it is deleted this memory can actually be "returned" to the OS since its guaranteed nothing is using it.
Large blocks of unused memory don't tend to consume a lot of resources. If you generally aren't using the memory any more they might just get paged to disk. Don't presume that because some API function says you're using memory that you are actually consuming significant resources.
APIs don't always report what you think. Due to a variety of optimizations and strategies it may not actually be possible to determine how much memory is in use and/or available on a system at a particular moment. Unless you have intimate details of the OS you won't know for sure what those values mean.
The first two points can explain why a bunch of small blocks and one large block result in different memory patterns. The latter points indicate why this approach to detecting leaks is not useful. To detect genuine object-based "leaks" you generally need a dedicated profiling tool which tracks allocations.
For example, in the code provided:
TestBigBlock allocates and deletes array, assume this uses a special memory block, so memory is returned to OS
TestMemory extends the heap for all the small objects, and never returns any heap to the OS. Here the heap is entirely available from the applications point-of-view, but from the OS's point of view it is assigned to the application.
TestBigBlock now fails, since although it would use a special memory block, it shares the overall memory space with heap, and there just isn't enough left after 2 is complete.

C++: Does this look like memory fragmentation?

SUMMARY:
I have an application which consumes way more memory that it should (roughly about 250% of the expected amount) but I can't seem to find any memory leaks. Calling the same function (which does a lot of allocations) will keep increasing memory usage to some point and then it will not change and stay there.
PROGRAM DETAILS:
The application uses a quadtree data structure to store 'Points'. It is possible to specify the maximum number of points to be stored in memory (cache size). The 'Points' are stored in 'PointBuckets' (arrays of points linked to the leaf nodes of the quadtree) which, if the maximum total number of points in the quadtree is reached, are serialized and saved to temporary files, to be retrieved when needed. This all seems to work fine.
Now when a file is loaded a new Quadtree is created and the old one is deleted if it exists, then points are read from the file and inserted into the quadtree one by one. A lot of memory allocations take place as buckets are being created and deleted during node splitting etc.
SYMPTOMS:
If I load a file that is expected to use 300MB of memory once, I get the expected amount of memory consumed. All good. If I keep loading the same file over and over again the memory usage keeps growing (I'm looking at the RES column in top, Linux) till about 700MB. That could indicate a memory leak. However if I then keep loading the files still, memory consumption just stays at 700MB.
Another thing: When I use valgrind massif and look at the memory usage it always stays within expected limit. For example if I specify cache size to be 1.5 GB and run my program alone, it will eventually consume 4GB of memory. If I run it in massif, it will stay below 2GB for all the time and then in the produced graphs I'll be able to see that it in fact never allocated more then the expected 1.5GB. My naive assumption is that this happens because massif uses a custom memory pool which somehow prevents fragmentation.
So what do you think is going on here? What kind of solution should I look for to solve this issue, if it is memory fragmentation?

I'd put it more at simple allocator and OS caching behaviours. They retain memory you allocated instead of freeing it so that it can be returned to you in a more prompt fashion the next time you request it. However, 250% does sound like a lot for this kind of effect- you could be looking at fragmentation problems.
Try swapping your allocator for a fragmentation-free allocator like object pool or memory arena.

Can't track down source of huge memory use

I've been trying to track down a memory problem for a couple of days - my program is using around 3GB of memory, when it should be using around 200MB-300MB. Valgrind is actually reporting that it is using ~300MB at its peak, and is not reporting any memory leaks.
The program reads an input file, and stores every unique word in that file. It is multi-threaded, and I've been running it using 4 threads. My major sources of data are:
Constant-size array of wchar_t (4MB total)
Map between words and a list of associated values. This grows with the size of input. If there are 1,000,000 unique words in the input file, there will be 1,000,000 entries in the tree.
I am doing a huge number of allocations and deallocations (using new and delete) -- at least two per unique word. Is it possible that memory I free is not being reused for some reason, causing the program to keep acquiring more and more memory? It consistently grabs more as it continues to run.
In general, any ideas about where I should go from here?
Edit 1 (based on advice from Graham):
One path I'll try is minimizing allocation. I'll work with a single string per thread (which may grow occasionally if a word is longer than this string is), but if I remember my code correctly this will eliminate a huge number of new/delete calls. If all goes well I'll be left with: one-time allocation of input buffer, one-time allocation of string-per-thread (with some reallocs), two allocs per map entry (one for key, one for value).
Thanks!

It's likely to be heap fragmentation. Because you are allocating and releasing small blocks in such huge quantities, it's probable that there are loads of small free chunks which are too small to be reused by subsequent allocations. Since these chunks are effectively wasted, the process has to keep grabbing more and more memory from the system to honour new allocations.
You may be able to mitigate the effect by first reserving a sufficiently large default capacity in each string with string::reserve(), and then clearing strings to empty when you're finished with them (rather than deleting). Then, keep a list of empty strings to be reused instead of allocating new ones all the time.
EDIT: the above suggestion assumes the objects being allocated are std::strings. If they're not, then you can probably still apply the general technique of keeping old empty objects around for reuse.

Memory your program frees should be returned to the heap where it can be allocated again.
However, that does not mean it is freed back to the operating system. Often, the app will continue to "own" memory that has been allocated and freed.
Is this a Windows app? How are you allocating and freeing the memory? And how are you determining how much memory the app is using?

You should try wrapping the resource allocations into a class if you can. Call new in the constructor, and delete in the destructor. Try and take advantage of scope so memory management is done more automatically.
http://en.wikipedia.org/wiki/RAII

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js