Is there any benefit to using multiple heaps for memory management purposes? - c++

I am a student of a system software faculty. Now I'm developing a memory manager for Windows. Here's my simple implementation of malloc() and free():
#include <windows.h>

HANDLE heap = HeapCreate(0, 0, 0);

void* hmalloc(size_t size)
{
    return HeapAlloc(heap, 0, size);
}

void hfree(void* memory)
{
    HeapFree(heap, 0, memory);
}

int main()
{
    int* ptr1 = (int*)hmalloc(100 * sizeof(int));
    int* ptr2 = (int*)hmalloc(100 * sizeof(int));
    int* ptr3 = (int*)hmalloc(100 * sizeof(int));
    hfree(ptr2);
    hfree(ptr3);
    hfree(ptr1);
    return 0;
}
It works fine. But I can't understand: is there a reason to use multiple heaps? I can allocate memory in the heap and get the address of an allocated memory chunk, but here I use ONE heap. Is there a reason to use multiple heaps? Maybe for multi-threaded/multi-process applications? Please explain.

The main reason for using multiple heaps/custom allocators is better memory control. Usually, after lots of new/delete calls the memory gets fragmented and the application loses performance (it also consumes more memory). Using the memory in a more controlled environment can reduce heap fragmentation.
Another use is preventing memory leaks: you can simply free the entire heap you allocated, without having to free each object allocated there individually.
Another use is for tightly allocated objects: if you have, for example, a list, you can allocate all its nodes in a smaller dedicated heap, and the application gains performance because there are fewer cache misses when iterating the nodes.
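For instance, here is a minimal sketch of that idea using the same Win32 heap API as the question (the Node type and the counts are illustrative): nodes allocated back-to-back from a private heap tend to end up close together, and the whole list can be torn down with a single call.

#include <windows.h>

// Illustrative node type; trivially destructible, so skipping
// per-node destructor calls below is safe.
struct Node {
    int   value;
    Node* next;
};

int main()
{
    // A private heap used only for list nodes (error checks omitted).
    HANDLE nodeHeap = HeapCreate(0, 0, 0);

    Node* head = nullptr;
    for (int i = 0; i < 1000; ++i) {
        Node* n = static_cast<Node*>(HeapAlloc(nodeHeap, 0, sizeof(Node)));
        n->value = i;
        n->next  = head;
        head = n;
    }

    // ... iterate the list: the nodes sit densely in one heap,
    // so fewer pages and cache lines are touched ...

    HeapDestroy(nodeHeap);  // frees every node in one call, no traversal
    return 0;
}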
Edit: memory management is a hard topic, however, and in some cases it is not done right. Andrei Alexandrescu gave a talk at one point in which he said that for some application, replacing the custom allocator with the default one increased the application's performance.

This is a good link that elaborates on why you may need multiple heaps:
https://caligari.dartmouth.edu/doc/ibmcxx/en_US/doc/libref/concepts/cumemmng.htm
"Why Use Multiple Heaps?
Using a single runtime heap is fine for most programs. However, using multiple
heaps can be more efficient and can help you improve your program's performance
and reduce wasted memory for a number of reasons:
1- When you allocate from a single heap, you may end up with memory blocks on
different pages of memory. For example, you might have a linked list that
allocates memory each time you add a node to the list. If you allocate memory for
other data in between adding nodes, the memory blocks for the nodes could end up
on many different pages. To access the data in the list, the system may have to
swap many pages, which can significantly slow your program.
With multiple heaps, you can specify which heap you allocate from. For example,
you might create a heap specifically for the linked list. The list's memory blocks
and the data they contain would remain close together on fewer pages, reducing the
amount of swapping required.
2- In multithread applications, only one thread can access the heap at a time to
ensure memory is safely allocated and freed. For example, say thread 1 is
allocating memory, and thread 2 has a call to free. Thread 2 must wait until
thread 1 has finished its allocation before it can access the heap. Again, this
can slow down performance, especially if your program does a lot of memory
operations.
If you create a separate heap for each thread, you can allocate from them
concurrently, eliminating both the waiting period and the overhead required to
serialize access to the heap.
3- With a single heap, you must explicitly free each block that you allocate. If you
have a linked list that allocates memory for each node, you have to traverse the
entire list and free each block individually, which can take some time.
If you create a separate heap for that linked list, you can destroy it with a
single call and free all the memory at once.
4- When you have only one heap, all components share it (including the IBM C and
C++ Compilers runtime library, vendor libraries, and your own code). If one
component corrupts the heap, another component might fail. You may have trouble
discovering the cause of the problem and where the heap was damaged.
With multiple heaps, you can create a separate heap for each component, so if
one damages the heap (for example, by using a freed pointer), the others can
continue unaffected. You also know where to look to correct the problem."
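Point 2 maps directly onto the Win32 API used in the question: HeapCreate accepts a HEAP_NO_SERIALIZE flag that drops the heap's internal lock, which is only safe when a single thread ever touches that heap. A minimal sketch (the thread count and sizes are arbitrary):

#include <windows.h>

// Each worker gets its own heap, created with HEAP_NO_SERIALIZE since
// no other thread will ever touch it: no locking overhead at all.
DWORD WINAPI Worker(LPVOID)
{
    HANDLE myHeap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);

    for (int i = 0; i < 100000; ++i) {
        void* p = HeapAlloc(myHeap, 0, 64);
        // ... use p ...
        HeapFree(myHeap, 0, p);
    }

    HeapDestroy(myHeap);  // also releases anything still allocated
    return 0;
}

int main()
{
    HANDLE threads[4];
    for (int i = 0; i < 4; ++i)
        threads[i] = CreateThread(nullptr, 0, Worker, nullptr, 0, nullptr);
    WaitForMultipleObjects(4, threads, TRUE, INFINITE);
    for (HANDLE t : threads)
        CloseHandle(t);
    return 0;
}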

A reason would be the scenario where you need to execute code internally, e.g. running generated simulation code. By creating your own heap you can allow that heap to have execution rights, which by default are turned off for security reasons (on Windows).
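On Windows that flag is HEAP_CREATE_ENABLE_EXECUTE. A minimal sketch, assuming an x86/x64 target (the byte sequence below encodes mov eax, 42 followed by ret; the cast from data pointer to function pointer is conditionally supported, but works with MSVC on Windows):

#include <windows.h>
#include <cstring>
#include <iostream>

int main()
{
    // Memory from this heap is executable, which is normally forbidden.
    HANDLE execHeap = HeapCreate(HEAP_CREATE_ENABLE_EXECUTE, 0, 0);

    // x86/x64 machine code for: mov eax, 42 ; ret
    const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

    void* buf = HeapAlloc(execHeap, 0, sizeof(code));
    std::memcpy(buf, code, sizeof(code));

    auto fn = reinterpret_cast<int(*)()>(buf);
    std::cout << fn() << '\n';  // prints 42

    HeapDestroy(execHeap);
    return 0;
}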

You have some good thoughts, and this would work for C, but in C++ you have destructors, and it is VERY important that they run.
You can think of all types as having constructors/destructors, just that some logically "do nothing".
This is about allocators. See "the buddy algorithm", which uses powers of two to align and reuse blocks.
If I allocate 4 bytes somewhere, my allocator might allocate a 4 KB section just for 4-byte allocations. That way I can fit 1024 4-byte things in the block; if I need more, I add another block, and so forth.
Ask it for 4 KB and it won't allocate that in the 4-byte block; it might keep a separate one for larger requests.
This means you can keep big things together. If I allocate 17 bytes, then 13 bytes, then 1 byte, and the 13-byte block gets freed, I can only put something of <=13 bytes into that gap.
Hence the buddy system and powers of 2, which are easy to handle using left shifts: if I want a 2.5 KB block, I allocate the smallest power of 2 that'll fit (4 KB in this case), so I can reuse that slot afterwards for <=4 KB items.
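A tiny sketch of that rounding step, done with shifts as described:

#include <cassert>
#include <cstddef>

// Round a request size up to the next power of two using left shifts.
std::size_t next_pow2(std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

int main()
{
    assert(next_pow2(2560) == 4096);  // a 2.5 KB request lands in a 4 KB slot
    assert(next_pow2(17)   == 32);
    assert(next_pow2(4)    == 4);
}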
This is not for garbage collection; it is just about keeping things compact and neat. Using your own allocator can avoid calls to the OS (depending on the default implementation of new and delete, your compiler's runtime might already do this) and make new/delete very quick.
Heap compacting is very different: you need a list of every pointer that points into your heap, or some way to traverse the entire memory graph (as Java does), so that when you move things around and "compact" them you can update everything that pointed to each moved object.

The only time I ever used more than one heap was when I wrote a program that would build a complicated data structure. It would have been non-trivial to free the data structure by walking through it and freeing the individual nodes, but luckily for me the program only needed the data structure temporarily (while it performed a particular operation), so I used a separate heap for the data structure so that when I no longer needed it, I could free it with one call to HeapDestroy.

Related

Does dynamic allocation store data in random locations in the heap?

I know that local variables are stored on the stack, in order.
But when I dynamically allocate variables on the heap in C++, like this:
int * a = new int{1};
int * a2 = new int{2};
int * a3 = new int{3};
int * a4 = new int{4};
Question 1: are these variables stored in contiguous memory locations?
Question 2: if not, is it because dynamic allocation stores variables at random locations in the heap?
Question 3: does dynamic allocation therefore increase the chance of cache misses and have low spatial locality?
Part 1: Are separate allocations contiguous?
The answer is probably not. How dynamic allocation occurs is implementation-dependent. If you allocate memory as in the example above, two separate allocations might happen to be contiguous, but there is no guarantee of this (and it should never be relied upon).
Different implementations of C++ use different algorithms for deciding how memory is allocated.
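You can see what your own implementation does by printing the addresses; the output is implementation-defined and may differ from run to run:

#include <iostream>

int main()
{
    int* a  = new int{1};
    int* a2 = new int{2};
    int* a3 = new int{3};
    int* a4 = new int{4};

    // Gaps (if any) between these addresses come from allocator
    // headers, alignment, and block reuse.
    std::cout << a << '\n' << a2 << '\n' << a3 << '\n' << a4 << '\n';

    delete a;
    delete a2;
    delete a3;
    delete a4;
}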
Part 2: Is allocation random?
Somewhat, but not entirely. Memory doesn't get allocated in an intentionally random fashion. Often, memory allocators will try to allocate blocks of memory near each other in order to minimize page faults and cache misses, but it's not always possible to do so.
Allocation happens in two stages:
The allocator asks for a large chunk of memory from the OS
The allocator takes pieces of that large chunk and returns them whenever you call new, until you ask for more memory than it has left to give, in which case it asks the OS for another large chunk.
This second stage is where an implementation can try to give you memory that's near other recent allocations; however, it has little control over the first stage (and the OS usually just provides whatever memory is available, without any knowledge of other allocations by your program).
Part 3: avoiding cache misses
If cache misses are a bottleneck in your code,
Try to reduce the amount of indirection (by having arrays store objects by value, rather than by pointer);
Ensure that the memory you’re operating on is as contiguous as the design permits (so use a std::array or std::vector, instead of a linked list, and prefer a few big allocations to lots of small ones); and
Try to design the algorithm so that it has to jump around in memory as little as possible.
A good general principle is to just use a std::vector of objects unless you have a good reason to use something fancier. Because of its better cache locality, std::vector is faster at inserting and deleting elements than std::list, up to dozens or even hundreds of elements.
Finally, try to take advantage of the stack. Unless there's a good reason for something to be a pointer, just declare it as a variable that lives on the stack. When possible,
Prefer to use MyClass x{}; instead of MyClass* x = new MyClass{};, and
Prefer std::vector<MyClass> instead of std::vector<MyClass*>.
By extension, if you can use static polymorphism (i.e., templates), use that instead of dynamic polymorphism.
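To illustrate the by-value vs. by-pointer point (toy code; MyClass here is a made-up example type):

#include <vector>

struct MyClass { int data[4]; };

int sum_by_value(const std::vector<MyClass>& v)
{
    int total = 0;
    for (const MyClass& m : v)   // elements are contiguous: cache-friendly
        total += m.data[0];
    return total;
}

int sum_by_pointer(const std::vector<MyClass*>& v)
{
    int total = 0;
    for (const MyClass* m : v)   // each element is a jump elsewhere in
        total += m->data[0];     // the heap: likely cache misses
    return total;
}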
IMHO this is operating-system specific / specific to the C++ standard library implementation.
new ultimately uses lower-level virtual-memory allocation services, allocating several pages at once using system calls like mmap and munmap. The implementation of new can reuse previously freed memory space when relevant.
The implementation of new can also use different strategies for "large" and "small" allocations.
In the example you gave, the first new results in a system call for memory allocation (usually several pages); that chunk may be large enough that subsequent new calls result in contiguous allocations. But this depends on the implementation.
In short:
1. Not at all (there is padding due to alignment, heap housekeeping data, reuse of freed chunks, etc.).
2. Not at all (AFAIK, heap algorithms are deterministic, without any randomness).
3. Generally yes (e.g., memory pooling might help here).

The problems of basing an allocator on static memory

I've recently come up with the idea of pooling memory in a C++ program with a statically initialized array (i.e. "static byte Memory[Size]"). Allocating blocks from the pool would be similar to calling malloc up front. I then built this allocator and there seem to be no problems. Are there any issues with this allocator architecture other than the limited memory space?
Note: The way I reserve blocks of memory is by creating a linked list of blocks that store size, pointer and neighbors. Not that it is important to the question.
TI's sysbios actually provides an implementation for their microcontrollers that looks similar to what you describe. They call this implementation heapbuf.
The heapbuf is a single array that has been split up into n equally sized blocks. The heapbuf also holds a pointer to the first empty block, and every empty block holds a pointer to the next empty one.
This specific implementation can only allocate blocks of exactly that one size, but it has no fragmentation issues. If you have a lot of small objects of roughly the same size, it offers great performance: allocation has constant cost and only requires updating the pointer to the first empty block.
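A rough sketch of that scheme (illustrative names, not TI's actual code): the free list is threaded through the unused blocks themselves, so both allocation and freeing are O(1).

#include <cassert>
#include <cstddef>

// Fixed-size block pool over a single array, heapbuf-style.
template <std::size_t BlockSize, std::size_t BlockCount>
class BlockPool {
    static_assert(BlockSize >= sizeof(void*), "block must hold a pointer");
    alignas(std::max_align_t) unsigned char memory_[BlockSize * BlockCount];
    void* firstFree_;

public:
    BlockPool() : firstFree_(memory_)
    {
        // Thread the free list through the blocks themselves.
        for (std::size_t i = 0; i + 1 < BlockCount; ++i)
            *reinterpret_cast<void**>(memory_ + i * BlockSize) =
                memory_ + (i + 1) * BlockSize;
        *reinterpret_cast<void**>(memory_ + (BlockCount - 1) * BlockSize) = nullptr;
    }

    void* allocate()                       // O(1): pop the head
    {
        if (!firstFree_) return nullptr;   // pool exhausted
        void* p = firstFree_;
        firstFree_ = *static_cast<void**>(p);
        return p;
    }

    void deallocate(void* p)               // O(1): push onto the head
    {
        *static_cast<void**>(p) = firstFree_;
        firstFree_ = p;
    }
};

int main()
{
    BlockPool<64, 16> pool;
    void* a = pool.allocate();
    void* b = pool.allocate();
    pool.deallocate(a);
    assert(pool.allocate() == a);  // a freed block is reused immediately
    pool.deallocate(b);
}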
Your note, however, is important to the question. The issues with blocks of different sizes are as follows:
The first block might not fit, so you need to traverse the list until you find a large enough open space.
After you have allocated all your memory in 1 KB blocks and then decide to free them all, you have a linked list of many free 1 KB blocks; have you decided how and when to merge them?
The developers of FreeRTOS also ran into these issues and therefore provide multiple heap implementations that make different compromises (e.g. fast but no freeing at all, a bit slower but no merging of freed blocks, or slow but with merging of freed blocks). There is an overview of their implementations here: http://www.freertos.org/a00111.html

Why is memory not reusable after allocating/deallocating a number of small objects?

While investigating a memory leak in one of our projects, I ran into a strange issue. Somehow, the memory allocated for objects (a vector of shared_ptr to objects, see below) is not fully reclaimed when the parent container goes out of scope, and can't be reused except for small objects.
The minimal example: when the program starts, I can allocate a single continuous block of 1.5 GB without problem. After I use the memory somewhat (by creating and destroying a number of small objects), I can no longer do the big-block allocation.
Test program:
#include <iostream>
#include <memory>
#include <vector>

using namespace std;

class BigClass
{
private:
    double a[10000];
};

void TestMemory()
{
    cout << "Performing TestMemory" << endl;
    vector<shared_ptr<BigClass>> list;
    for (int i = 0; i < 10000; i++) {
        shared_ptr<BigClass> p(new BigClass());
        list.push_back(p);
    }
}

void TestBigBlock()
{
    cout << "Performing TestBigBlock" << endl;
    char* bigBlock = new char[1024 * 1024 * 1536];
    delete[] bigBlock;
}

int main()
{
    TestBigBlock();
    TestMemory();
    TestBigBlock();
}
The problem also repeats when using plain pointers with new/delete or malloc/free in a loop instead of shared_ptr.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 bytes (regardless of how many times I call it). As a consequence, there's no free VM region big enough to hold 1.5 GB. But I'm not sure why, since I'm definitely freeing the memory I used. Is it some "performance optimization" the CRT does to minimize OS calls?
The environment is Windows 7 x64 + VS2012 + a 32-bit app without LAA.
Sorry for posting yet another answer since I am unable to comment; I believe many of the others are quite close to the answer really :-)
Anyway, the culprit is most likely address space fragmentation. I gather you are using Visual C++ on Windows.
The C / C++ runtime memory allocator (invoked by malloc or new) uses the Windows heap to allocate memory. The Windows heap manager has an optimization in which it will hold on to blocks under a certain size limit, in order to be able to reuse them if the application requests a block of similar size later. For larger blocks (I can't remember the exact value, but I guess it's around a megabyte) it will use VirtualAlloc outright.
Other long-running 32-bit applications with a pattern of many small allocations have this problem too; the one that made me aware of the issue is MATLAB - I was using the 'cell array' feature to basically allocate millions of 300-400 byte blocks, causing exactly this issue of address space fragmentation even after freeing them.
A workaround is to use the Windows heap functions (HeapCreate() etc.) to create a private heap, allocate your memory through that (passing a custom C++ allocator to your container classes as needed), and then destroy that heap when you want the memory back. This also has the happy side effect of being very fast compared with deleting a zillion blocks in a loop.
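A minimal sketch of that workaround with a bare-bones C++11 allocator over a private heap (the names are illustrative and error handling is minimal):

#include <windows.h>
#include <new>
#include <vector>

// Minimal C++ allocator that draws from a private Win32 heap.
template <typename T>
struct PrivateHeapAllocator {
    using value_type = T;
    HANDLE heap;

    explicit PrivateHeapAllocator(HANDLE h) : heap(h) {}
    template <typename U>
    PrivateHeapAllocator(const PrivateHeapAllocator<U>& o) : heap(o.heap) {}

    T* allocate(std::size_t n)
    {
        void* p = HeapAlloc(heap, 0, n * sizeof(T));
        if (!p) throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) { HeapFree(heap, 0, p); }
};

template <typename T, typename U>
bool operator==(const PrivateHeapAllocator<T>& a, const PrivateHeapAllocator<U>& b)
{ return a.heap == b.heap; }
template <typename T, typename U>
bool operator!=(const PrivateHeapAllocator<T>& a, const PrivateHeapAllocator<U>& b)
{ return !(a == b); }

int main()
{
    HANDLE heap = HeapCreate(0, 0, 0);
    {
        PrivateHeapAllocator<int> alloc(heap);
        std::vector<int, PrivateHeapAllocator<int>> v(alloc);
        for (int i = 0; i < 1000; ++i)
            v.push_back(i);
    }                    // the vector is destroyed before its heap is
    HeapDestroy(heap);   // one call returns the whole region to the OS
    return 0;
}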
Re. "what is remaining in memory" to cause the issue in the first place: Nothing is remaining 'in memory' per se, it's more a case of the freed blocks being marked as free but not coalesced. The heap manager has a table/map of the address space, and it won't allow you to allocate anything which would force it to consolidate the free space into one contiguous block (presumably a performance heuristic).
There is absolutely no memory leak in your C++ program. The real culprit is memory fragmentation.
Just to be sure (regarding the memory-leak point), I ran this program under Valgrind, and it did not report any memory leaks.
//Valgrind Report
mantosh#mantosh4u:~/practice$ valgrind ./basic
==3227== HEAP SUMMARY:
==3227== in use at exit: 0 bytes in 0 blocks
==3227== total heap usage: 20,017 allocs, 20,017 frees, 4,021,989,744 bytes allocated
==3227==
==3227== All heap blocks were freed -- no leaks are possible
Please find my responses to the queries/doubts raised in your original question.
The culprit seems to be that after TestMemory(), the application's
virtual memory stays at 827125760 (regardless of how many times I
call it).
Yes, the real culprit is the hidden fragmentation caused during the TestMemory() function. Just to explain fragmentation, I have taken this snippet from Wikipedia:
"
when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.
For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block."
The paragraph above explains memory fragmentation very nicely. Some allocation patterns (such as frequent allocation and deallocation) lead to memory fragmentation, but its end impact (i.e. the 1.5 GB allocation failing) varies greatly between systems, as different OSes/heap managers use different strategies and implementations.
As an example, your program ran perfectly fine on my machine (Linux), whereas you encountered the memory-allocation failure.
Regarding your observation that the VM size remains constant: the VM size seen in the task manager is not directly proportional to our memory-allocation calls. It mainly depends on how many bytes are in the committed state. When you allocate dynamic memory (using new/malloc) and do not write to or initialize those regions, the pages may not become committed, and the VM size is not affected. VM size depends on many other factors and is a bit complicated, so we should not rely on it completely when reasoning about the dynamic memory allocation of our program.
As a consequence, there's no free VM region big enough to hold 1.5
GB.
Yes, due to fragmentation there is no contiguous 1.5 GB region. Note that the total remaining (free) memory may well exceed 1.5 GB, but it is fragmented, so no single contiguous block of that size exists.
But I'm not sure why, since I'm definitely freeing the memory I used.
Is it some "performance optimization" the CRT does to minimize OS calls?
I have explained above why this can happen even though you have freed all your memory. To fulfil a user program's request, the OS calls its virtual-memory manager and tries to allocate the memory that the heap manager will use, but whether additional usable address space can be obtained depends on many complex factors.
Possible Resolution of Memory Fragmentation
We should try to reuse allocations rather than allocating and freeing frequently. Certain patterns (such as particular request sizes allocated in a particular order) can drive memory into a fragmented state, and fixing that may require substantial design changes in your program. This is a complex topic, and understanding the complete root cause of such behaviour requires internal knowledge of the memory manager.
However, tools do exist on Windows-based systems, though I am not very familiar with them. I found an excellent SO post about which tool (on Windows) can be used to understand and check the fragmentation status of your program yourself:
https://stackoverflow.com/a/1684521/2724703
This is not a memory leak. The memory you used was allocated by the C/C++ runtime. The runtime acquires a large chunk of memory from the OS once, and each new you call is then served from that chunk. When you delete an object, the runtime does not return the memory to the OS immediately; it may hold on to it for performance.
There is nothing here which indicates a genuine "leak". The pattern of memory you describe is not unexpected. Here are a few points which might help to understand. What happens is highly OS dependent.
A program often has a single heap which can be extended or shrunk in length. It is, however, one contiguous memory area, so changing the size just means changing where the end of the heap is. This makes it very difficult to ever "return" memory to the OS, since even one tiny object in that space will prevent it from shrinking. On Linux you can look up the function 'brk' (I know you're on Windows, but I presume it does something similar).
Large allocations are often done with a different strategy. Rather than putting them in the general-purpose heap, an extra block of memory is created. When it is deleted, this memory can actually be "returned" to the OS, since it's guaranteed nothing else is using it.
Large blocks of unused memory don't tend to consume a lot of resources. If you generally aren't using the memory any more they might just get paged to disk. Don't presume that because some API function says you're using memory that you are actually consuming significant resources.
APIs don't always report what you think. Due to a variety of optimizations and strategies it may not actually be possible to determine how much memory is in use and/or available on a system at a particular moment. Unless you have intimate details of the OS you won't know for sure what those values mean.
The first two points can explain why a bunch of small blocks and one large block result in different memory patterns. The latter points indicate why this approach to detecting leaks is not useful. To detect genuine object-based "leaks" you generally need a dedicated profiling tool which tracks allocations.
For example, in the code provided:
1. TestBigBlock allocates and deletes its array; assume this uses a special memory block, so the memory is returned to the OS.
2. TestMemory extends the heap for all the small objects and never returns any of it to the OS. Here the heap is entirely available from the application's point of view, but from the OS's point of view it is all assigned to the application.
3. TestBigBlock now fails: although it would use a special memory block, it shares the overall address space with the heap, and there just isn't enough left after step 2 is complete.

C++ heap organisation - which data structure?

C++ has two main memory types: heap and stack. With the stack everything is clear to me, but concerning the heap a question remains:
How is heap memory organised? I read about the heap data structure, but that doesn't seem applicable here, because in C++ I can access any data on the heap, not just a minimum or maximum value.
So my question is: how is heap memory organized in C++? When we read "is allocated on the heap", what memory organization do we have in mind?
A common (though certainly not the only) implementation of the free store is a linked list of free blocks of memory (i.e., blocks not currently in use). The heap manager will typically allocate large blocks of memory (e.g., 1 megabyte at a time) from the operating system, then split these blocks up into pieces for use by your code when you use new and such. When you use delete, the block of memory you quit using will be added as a node on that linked list.
There are various strategies for how you use those free blocks. Three common ones are best fit, worst fit and first fit.
In best fit, you try to find a free block that's closest in size to the required allocation. If it's larger than required (typically after rounding the allocation size) you split it into two pieces: one to return for the allocation, and a new (smaller) free block to put back on the list to meet other allocation requests. Although this can seem like a good strategy, it's often problematic. The problem is that when you find the closest fit, the left-over block is often too small to be of much use. After a short time, you end up with a huge number of tiny blocks of free space, none of which is good for much of anything.
Worst fit combats that by instead finding the worst fit among the free blocks -- IOW, the largest block of free space. When it splits that block, what's left over will be as large as possible, maximizing the chance that it'll be useful for other allocations.
First fit just walks through the list of free blocks and (as you'd guess from the name) uses the first block that's large enough to meet the requirement.
Quite a few also start with a search for an exact fit, and use that in preference to splitting a block.
Quite a few also keep (for example) a number of separate linked lists for different allocation sizes to minimize searching for a block of the right size.
In quite a few cases, the manager also has some code to walk through the list of free blocks, to find any that are right next to each other. If it finds them, it will join the two small blocks into one larger block. Sometimes this is done right when you free/delete a block. More often, it's done lazily, to avoid joining and then re-splitting blocks when/if you're using a lot of blocks of the same size (which is fairly common).
Another possibility that's common when dealing with a large number of identically-sized items (especially small ones) is an array of blocks, with a bitset to specify which are free or in use. In this case, you typically keep track of an index into the bitset where the last free block was found. When a block is needed, just search forward from the last index until you find the next one for which the bitset says the block is free.
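A sketch of that bitset variant (illustrative names): the pool remembers where the last search ended and resumes from there.

#include <bitset>
#include <cstddef>

// Fixed-size block allocator: a bitset marks which blocks are in use,
// and lastFound_ remembers where the previous search left off.
template <std::size_t BlockSize, std::size_t BlockCount>
class BitmapPool {
    alignas(std::max_align_t) unsigned char memory_[BlockSize * BlockCount];
    std::bitset<BlockCount> inUse_;
    std::size_t lastFound_ = 0;

public:
    void* allocate()
    {
        for (std::size_t i = 0; i < BlockCount; ++i) {
            std::size_t idx = (lastFound_ + i) % BlockCount;  // wrap around
            if (!inUse_[idx]) {
                inUse_[idx] = true;
                lastFound_ = idx;
                return memory_ + idx * BlockSize;
            }
        }
        return nullptr;  // pool exhausted
    }

    void deallocate(void* p)
    {
        std::size_t idx =
            (static_cast<unsigned char*>(p) - memory_) / BlockSize;
        inUse_[idx] = false;
    }
};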
Heap has two main meanings referring to two different concepts:
A data structure for arranging data: a tree.
A pool of available memory. It manages memory by allocating/freeing available memory blocks.
An introduction about Heap in Memory Management:
The heap is the other dynamic memory area, allocated/freed by
malloc/free and their variants....
Here heap doesn't mean the heap data structure. That memory is simply termed heap memory: when we allocate memory dynamically, it is allocated from the heap.
When you read "allocated on the heap", that's usually an implementation-specific realization of C++'s "dynamic storage duration". It means you've used new to allocate memory, and you have a pointer to it that you must keep track of until you delete (or delete[]) it.
As for how it's organized, there's no one set way of doing so. At one point, if memory serves, the "heap" used to actually be a min-heap (organized by block size). That's implementation specific, though, and doesn't have to be that way. Any data structure will do, some better than others for given circumstances.

Can't track down source of huge memory use

I've been trying to track down a memory problem for a couple of days - my program is using around 3GB of memory, when it should be using around 200MB-300MB. Valgrind is actually reporting that it is using ~300MB at its peak, and is not reporting any memory leaks.
The program reads an input file, and stores every unique word in that file. It is multi-threaded, and I've been running it using 4 threads. My major sources of data are:
Constant-size array of wchar_t (4MB total)
Map between words and a list of associated values. This grows with the size of input. If there are 1,000,000 unique words in the input file, there will be 1,000,000 entries in the tree.
I am doing a huge number of allocations and deallocations (using new and delete) -- at least two per unique word. Is it possible that memory I free is not being reused for some reason, causing the program to keep acquiring more and more memory? It consistently grabs more as it continues to run.
In general, any ideas about where I should go from here?
Edit 1 (based on advice from Graham):
One path I'll try is minimizing allocation. I'll work with a single string per thread (which may grow occasionally if a word is longer than the string), but if I remember my code correctly this will eliminate a huge number of new/delete calls. If all goes well I'll be left with: a one-time allocation of the input buffer, a one-time allocation of the string per thread (with some reallocs), and two allocs per map entry (one for the key, one for the value).
Thanks!
It's likely to be heap fragmentation. Because you are allocating and releasing small blocks in such huge quantities, it's probable that there are loads of small free chunks which are too small to be reused by subsequent allocations. Since these chunks are effectively wasted, the process has to keep grabbing more and more memory from the system to honour new allocations.
You may be able to mitigate the effect by first reserving a sufficiently large default capacity in each string with string::reserve(), and then clearing strings to empty when you're finished with them (rather than deleting). Then, keep a list of empty strings to be reused instead of allocating new ones all the time.
EDIT: the above suggestion assumes the objects being allocated are std::strings. If they're not, then you can probably still apply the general technique of keeping old empty objects around for reuse.
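A sketch of that reuse pattern (illustrative names):

#include <string>
#include <vector>

// Hand out pre-reserved strings and take them back for reuse, instead
// of constructing and destroying strings in a hot loop.
class StringPool {
    std::vector<std::string> free_;

public:
    std::string acquire()
    {
        if (free_.empty()) {
            std::string s;
            s.reserve(256);   // avoids most later reallocations
            return s;
        }
        std::string s = std::move(free_.back());
        free_.pop_back();
        return s;
    }

    void release(std::string s)
    {
        s.clear();            // empties it; in practice keeps its capacity
        free_.push_back(std::move(s));
    }
};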
Memory your program frees should be returned to the heap where it can be allocated again.
However, that does not mean it is freed back to the operating system. Often, the app will continue to "own" memory that has been allocated and freed.
Is this a Windows app? How are you allocating and freeing the memory? And how are you determining how much memory the app is using?
You should try wrapping the resource allocations into a class if you can. Call new in the constructor, and delete in the destructor. Try and take advantage of scope so memory management is done more automatically.
http://en.wikipedia.org/wiki/RAII