Writing a memory manager and defragmenting memory - C++

The idea is to write a memory manager that allocates a big chunk of memory at a time, to minimize malloc and free calls. I've tried writing this myself two times, but both times I ran into the problem of defragmenting memory.
You could just check whether a block is empty every so often and delete it if it is. But suppose your blocks are 100 bytes each. First you allocate 20 bytes of memory; this creates a new 100-byte block because no blocks exist yet. Then you allocate 80 bytes, which fills the first block. Then you allocate another 20 bytes; this creates another new block because the first one is full. Then you free the second allocation (the 80 bytes). That leaves you with two blocks, in each of which only the first 20 bytes are used. You now have 100 bytes allocated that could be freed by moving the 20 bytes from the second block into the first block and deleting the second block.
These are the problems I ran into:
You can't move the memory around, because that means all pointers to that memory would have to be updated, and to do that you need to know their addresses, which you don't.
100 bytes is a very small block size. What if I want to store a very low-res (64x64) ARGB image in memory? That will use 16 KB of memory, and moving all of it around might be even slower than not writing a memory manager at all.
Is it even worth it writing a custom memory manager after all that?

Is it even worth it writing a custom memory manager after all that?
This is asking for an opinion, but I'll try to give a factual answer.
The memory allocators that come with most operating systems and language support libraries are generally very high quality and are designed to address the types of problems you encountered (fragmentation and performance) as well as others. They are about as good as general purpose memory allocators can be.
You can do (a little) better than the provided memory allocator if your application has a particular allocation pattern that can be exploited. That's rare, but you can generally take advantage of it by making something substantially simpler than a general purpose memory manager.
you can't move the memory around
True. Most modern systems don't even try to move memory around; they try to avoid fragmentation to begin with (typically by clustering similarly sized allocations).
Old systems (ones without virtual memory managers) sometimes used memory managers that had an extra layer of indirection. Instead of returning a pointer to the allocated memory, the allocator would return a "handle", which could be as simple as an index into a table maintained by the memory manager. When the user wanted to actually access the memory, they would "lock" it. The memory manager was free to move around memory that wasn't locked (e.g., to eliminate fragmentation) because the handles gave an extra level of indirection.
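A minimal sketch of that handle idea (all names here are invented for illustration; a real manager would also track free space and run an actual compaction pass):

#include <cstdlib>
#include <cstring>
#include <vector>

class HandleHeap {
public:
    using Handle = std::size_t;

    // Allocate returns an index into the table, not a raw pointer.
    Handle allocate(std::size_t bytes) {
        table_.push_back(Entry{std::malloc(bytes), bytes, false});
        return table_.size() - 1;
    }

    // Locking pins the block and hands out the real pointer;
    // the manager must not move the block while it is locked.
    void* lock(Handle h)   { table_[h].locked = true;  return table_[h].ptr; }
    void  unlock(Handle h) { table_[h].locked = false; }

    // A compaction pass may relocate any unlocked block. Only the table
    // entry changes, so outstanding handles remain valid.
    void relocate(Handle h) {
        Entry& e = table_[h];
        if (e.locked) return;
        void* fresh = std::malloc(e.size); // stand-in for "a better spot"
        std::memcpy(fresh, e.ptr, e.size);
        std::free(e.ptr);
        e.ptr = fresh;
    }

private:
    struct Entry { void* ptr; std::size_t size; bool locked; };
    std::vector<Entry> table_;
};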
what if i want to store a very low-res (64,64) ARGB image
Most memory managers provide a range of sizes so a large allocation wouldn't be split across n smaller blocks. Most will punt very large allocations to the system allocator, which, on a virtual memory operating system, can generally solve the problem unless the process address space is overly fragmented.
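As a sketch of the size-class idea (this is not any particular allocator's real policy, just an illustration): small requests are rounded up to a fixed class so equal-sized blocks cluster together, and very large requests go straight to the system:

#include <cstddef>
#include <cstdlib>

constexpr std::size_t kClasses[] = {16, 32, 64, 128, 256, 512, 1024, 4096};

std::size_t sizeClassFor(std::size_t bytes) {
    for (std::size_t c : kClasses)
        if (bytes <= c) return c;
    return 0; // 0 means "too big, punt to the system allocator"
}

void* allocate(std::size_t bytes) {
    std::size_t cls = sizeClassFor(bytes);
    if (cls == 0)
        return std::malloc(bytes); // large allocation: let the system handle it
    // A real allocator would pop a block from a per-class free list here;
    // malloc merely stands in to keep the sketch self-contained.
    return std::malloc(cls);
}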

Related

Does new / malloc cause memory shuffling in the environment of sufficient but fragmented memory?

This is just out of curiosity.
For example, let's say we have used up 2 of 8 bytes of memory:
[xx------]
If I call new / malloc requesting for 3 bytes, it should work just fine, perhaps like so:
[xx--xxx-]
What then happens if I call new / malloc requesting for another 3 bytes? In terms of available memory, there are still 3 free bytes, even though they are not contiguous. Will the program then "defragment" memory to make space for the new allocation? Sounds impossible as I would still be holding on to references to the existing allocs.
If so, then by extension to an extreme case: if your memory somehow ends up super fragmented (e.g. every other byte is allocated, like [x-x-x-x-x-x-x-x-]), does that mean I cannot allocate even 2 bytes despite having 50% of memory free?
I don't suppose the platform matters?
Sorry for the extended question, but would this also happen in other languages like Java/C#?
That's right, the memory can get so fragmented that no allocations are possible anymore. A good allocator will merge the released blocks when possible, to limit fragmentation.
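A toy illustration of that merging (coalescing), assuming the allocator tracks free ranges as offset/size pairs; the names are invented for illustration:

#include <cstddef>
#include <iterator>
#include <map>

// Toy free-range tracker: releasing [start, start + size) merges the range
// with adjacent free neighbours, so freed blocks grow back together instead
// of accumulating as fragments.
class FreeRanges {
public:
    void release(std::size_t start, std::size_t size) {
        auto next = ranges_.lower_bound(start);
        // Merge with the previous range if it ends exactly where we begin.
        if (next != ranges_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == start) {
                start = prev->first;
                size += prev->second;
                ranges_.erase(prev);
            }
        }
        // Merge with the following range if it begins exactly where we end.
        if (next != ranges_.end() && start + size == next->first) {
            size += next->second;
            ranges_.erase(next);
        }
        ranges_[start] = size;
    }

private:
    std::map<std::size_t, std::size_t> ranges_; // offset -> size, kept disjoint
};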
The MS .NET managed framework solves this problem by allocating the memory blocks indirectly, i.e. via a pointer to a pointer to a block. This way, a block can be moved to defragment memory, without changing the pointer to the pointer. (Some care is taken by the framework to avoid concurrency problems, because .NET has a garbage collector running asynchronously.)

Sequential memory allocation

I'm planning an application that allocates a lot of variables in memory. Unlike a "regular" application, I want this memory to be allocated in specific memory blocks of 4096 bytes, and my allocated vars must be placed in memory sequentially, one after another, in order to fill the whole allocated region.
For example, I allocate a region (4096 bytes) in memory, and this region is ready for further use. From then on, each time my application creates a new variable in memory (which a "regular" application would probably do with malloc), that variable is placed in the free space of my memory region.
This sequential memory allocation is similar to how an array allocation works. But in my case I need an array that can contain many types of data (string, byte, int, ...).
One possible way to achieve this is pointer arithmetic. I want to avoid that method, as it may introduce a lot of bugs into my application.
Maybe someone solved this problem before?
Thank you!
malloc() by no means guarantees that subsequently allocated blocks are at sequential memory addresses. Even worse, most implementations use a small number of bytes before and/or after the allocated block for 'housekeeping'. This means that, even if you're lucky enough to get sequential addresses, there will be small gaps between the blocks. So the actual allocated blocks are slightly bigger, to make space for those 'housekeeping' bytes.
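You can observe this yourself; a small demo (the exact distances are implementation-defined, so treat the output as illustrative only):

#include <cstdio>
#include <cstdlib>

int main() {
    // Two back-to-back 100-byte requests: compare the printed addresses;
    // the distance is typically more than 100 bytes because of per-block
    // housekeeping, and the blocks need not even be adjacent or ascending.
    char* a = static_cast<char*>(std::malloc(100));
    char* b = static_cast<char*>(std::malloc(100));
    std::printf("a = %p\nb = %p\n", static_cast<void*>(a), static_cast<void*>(b));
    std::free(a);
    std::free(b);
    return 0;
}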
As you suggest, you'll need to write some code yourself: a few wrapper functions around malloc(), realloc(), and so on. You can hide all the logic in these functions, and using them should not make your application code any more complex than using malloc(), provided they do what you want.
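A minimal sketch of what such a wrapper could look like, assuming a single 4096-byte region, a bump pointer, and no support for freeing individual variables (which is exactly why the questions below matter):

#include <cstddef>
#include <cstdlib>
#include <new>

// Bump ("arena") allocator sketch: variables are placed one after another
// in a single 4096-byte region. Individual variables cannot be freed; the
// whole region is released at once when the arena is destroyed.
class Arena {
public:
    Arena() : base_(static_cast<char*>(std::malloc(kSize))), used_(0) {}
    ~Arena() { std::free(base_); }

    void* alloc(std::size_t bytes,
                std::size_t align = alignof(std::max_align_t)) {
        // align must be a power of two for this mask trick to work.
        std::size_t p = (used_ + align - 1) & ~(align - 1);
        if (p + bytes > kSize) return nullptr; // region is full
        used_ = p + bytes;
        return base_ + p;
    }

private:
    static constexpr std::size_t kSize = 4096;
    char* base_;
    std::size_t used_;
};

// Usage: placement-new typed variables into the region, back to back.
//   Arena a;
//   int*    i = new (a.alloc(sizeof(int)))    int(42);
//   double* d = new (a.alloc(sizeof(double))) double(3.14);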
Important questions: Why do you need to have these blocks adjacent to each other? What about freeing blocks?

Why is memory not reusable after allocating/deallocating a number of small objects?

While investigating a memory leak in one of our projects, I've run into a strange issue. Somehow, the memory allocated for objects (a vector of shared_ptr to object, see below) is not fully reclaimed when the parent container goes out of scope, and can't be used except for small objects.
The minimal example: when the program starts, I can allocate a single continuous block of 1.5 GB without problem. After I use the memory somewhat (by creating and destroying a number of small objects), I can no longer do the big block allocation.
Test program:
#include <iostream>
#include <memory>
#include <vector>

using namespace std;

class BigClass
{
private:
    double a[10000];
};

void TestMemory() {
    cout << "Performing TestMemory" << endl;
    vector<shared_ptr<BigClass>> list;
    for (int i = 0; i < 10000; i++) {
        shared_ptr<BigClass> p(new BigClass());
        list.push_back(p);
    }
}

void TestBigBlock() {
    cout << "Performing TestBigBlock" << endl;
    char* bigBlock = new char[1024 * 1024 * 1536];
    delete[] bigBlock;
}

int main() {
    TestBigBlock();
    TestMemory();
    TestBigBlock();
}
The problem also repeats when using plain pointers with new/delete or malloc/free in a loop, instead of shared_ptr.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of the number of times I call it). As a consequence, there's no free VM region big enough to hold 1.5 GB. But I'm not sure why, since I'm definitely freeing the memory I used. Is it some "performance optimization" the CRT does to minimize OS calls?
Environment is Windows 7 x64 + VS2012 + 32-bit app without LAA
Sorry for posting yet another answer since I am unable to comment; I believe many of the others are quite close to the answer really :-)
Anyway, the culprit is most likely address space fragmentation. I gather you are using Visual C++ on Windows.
The C / C++ runtime memory allocator (invoked by malloc or new) uses the Windows heap to allocate memory. The Windows heap manager has an optimization in which it will hold on to blocks under a certain size limit, in order to be able to reuse them if the application requests a block of similar size later. For larger blocks (I can't remember the exact value, but I guess it's around a megabyte) it will use VirtualAlloc outright.
Other long-running 32-bit applications with a pattern of many small allocations have this problem too; the one that made me aware of the issue is MATLAB - I was using the 'cell array' feature to basically allocate millions of 300-400 byte blocks, causing exactly this issue of address space fragmentation even after freeing them.
A workaround is to use the Windows heap functions (HeapCreate() etc.) to create a private heap, allocate your memory through that (passing a custom C++ allocator to your container classes as needed), and then destroy that heap when you want the memory back. This also has the happy side effect of being very fast compared to deleting a zillion blocks in a loop.
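A sketch of that workaround (assuming the batch-of-small-allocations pattern described above; error handling omitted):

#include <windows.h>

void runBatchJob() {
    // Create a growable private heap just for this batch of allocations.
    HANDLE heap = HeapCreate(0, 0, 0);

    // ... allocate millions of small blocks from it ...
    void* p = HeapAlloc(heap, 0, 400);
    (void)p; // use the memory here

    // No need to HeapFree() each block individually: destroying the heap
    // returns the whole address range at once.
    HeapDestroy(heap);
}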
Re. "what is remaining in memory" to cause the issue in the first place: Nothing is remaining 'in memory' per se, it's more a case of the freed blocks being marked as free but not coalesced. The heap manager has a table/map of the address space, and it won't allow you to allocate anything which would force it to consolidate the free space into one contiguous block (presumably a performance heuristic).
There is absolutely no memory leak in your C++ program. The real culprit is memory fragmentation.
Just to be sure (regarding the memory leak point), I ran this program under Valgrind, and it did not report any memory leak:
//Valgrind Report
mantosh#mantosh4u:~/practice$ valgrind ./basic
==3227== HEAP SUMMARY:
==3227== in use at exit: 0 bytes in 0 blocks
==3227== total heap usage: 20,017 allocs, 20,017 frees, 4,021,989,744 bytes allocated
==3227==
==3227== All heap blocks were freed -- no leaks are possible
Please find my responses to the queries/doubts raised in the original question.
The culprit seems to be that after TestMemory(), the application's virtual memory stays at 827125760 (regardless of the number of times I call it).
Yes, the real culprit is the hidden fragmentation caused during the TestMemory() function. Just to explain fragmentation, I have taken this snippet from Wikipedia:
"
when free memory is separated into small blocks and is interspersed by allocated memory. It is a weakness of certain storage allocation algorithms, when they fail to order memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are too small individually to satisfy the demands of the application.
For example, consider a situation wherein a program allocates 3 continuous blocks of memory and then frees the middle block. The memory allocator can use this free block of memory for future allocations. However, it cannot use this block if the memory to be allocated is larger in size than this free block."
The above paragraph explains memory fragmentation very nicely. Some allocation patterns (such as frequent allocation and deallocation) can leave memory in a fragmented state, but the end impact (i.e., the 1.5 GB allocation failing) varies greatly between systems, as different OSes/heap managers use different strategies and implementations.
As an example, your program ran perfectly fine on my machine (Linux), whereas you encountered the memory allocation failure.
Regarding your observation that the VM size remains constant: the VM size seen in Task Manager is not directly proportional to our memory allocation calls. It mainly depends on how many bytes are in the committed state. When you allocate some dynamic memory (using new/malloc) and do not write to or initialize those memory regions, the memory does not enter the committed state, and hence the VM size is not affected by it. The VM size depends on many other factors and is a bit complicated, so we should not rely completely on it when reasoning about the dynamic memory allocation of our program.
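To illustrate the reserve/commit distinction on Windows (my addition, not part of the original answer; error handling omitted):

#include <windows.h>

int main() {
    // Reserve 1 GB of address space: the address range is taken, but no
    // physical backing is committed yet.
    void* p = VirtualAlloc(nullptr, 1 << 30, MEM_RESERVE, PAGE_NOACCESS);

    // Commit only the first 64 KB; only this part counts as committed memory.
    VirtualAlloc(p, 64 * 1024, MEM_COMMIT, PAGE_READWRITE);
    static_cast<char*>(p)[0] = 1; // touch a committed page

    VirtualFree(p, 0, MEM_RELEASE); // release the whole reservation
    return 0;
}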
As a consequence, there's no free VM region big enough to hold 1.5 GB.
Yes, due to fragmentation, there is no contiguous 1.5 GB of memory. It should be noted that the total remaining (free) memory may well be more than 1.5 GB, but it is in a fragmented state, hence there is no big contiguous region.
But I'm not sure why, since I'm definitely freeing the memory I used. Is it some "performance optimization" the CRT does to minimize OS calls?
I have explained above why this may happen even though you have freed all your memory. Now, in order to fulfil the user program's request, the OS calls into its virtual memory manager and tries to allocate the memory that the heap memory manager will use. Whether additional memory can be grabbed depends on many other complex factors which are not very easy to understand.
Possible Resolution of Memory Fragmentation
We should try to reuse memory allocations rather than allocating and freeing frequently. There can be patterns (like a particular sequence of request sizes) which leave the overall memory in a fragmented state. Substantial design changes to your program may be required in order to improve memory fragmentation. This is a complex topic and requires an internal understanding of the memory manager to find the complete root cause of such things.
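One common form of such reuse (a sketch of mine, not from the original answer): keep a container alive and recycle its storage instead of reallocating on every pass:

#include <vector>

// clear() keeps the vector's capacity, so after the first pass the inner
// loop causes no new heap traffic at all.
void processManyPasses() {
    std::vector<double> scratch;
    for (int pass = 0; pass < 1000; ++pass) {
        scratch.clear(); // size -> 0, capacity retained
        for (int i = 0; i < 10000; ++i)
            scratch.push_back(i * 0.5);
        // ... consume scratch ...
    }
}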
However, tools exist on Windows-based systems, although I am not very familiar with them. I found one excellent SO post about which tool (on Windows) can be used to understand and check the fragmentation status of your program yourself:
https://stackoverflow.com/a/1684521/2724703
This is not a memory leak. The memory you used was allocated by the C/C++ runtime. The runtime acquires a large block of memory from the OS once, and each new you call is then satisfied from that block. When you delete an object, the runtime does not return the memory to the OS immediately; it may hold on to it for performance.
There is nothing here which indicates a genuine "leak". The pattern of memory you describe is not unexpected. Here are a few points which might help to understand. What happens is highly OS dependent.
A program often has a single heap which can be extended or shrunk in length. It is however one contiguous memory area, so changing the size is just changing where the end of the heap is. This makes it very difficult to ever "return" memory to the OS, since even one little tiny object in that space will prevent its shrinking. On Linux you can look up the function 'brk' (I know you're on Windows, but I presume it does something similar).
Large allocations are often done with a different strategy. Rather than putting them in the general purpose heap, an extra block of memory is created. When it is deleted this memory can actually be "returned" to the OS, since it's guaranteed nothing is using it.
Large blocks of unused memory don't tend to consume a lot of resources. If you generally aren't using the memory any more they might just get paged to disk. Don't presume that because some API function says you're using memory that you are actually consuming significant resources.
APIs don't always report what you think. Due to a variety of optimizations and strategies it may not actually be possible to determine how much memory is in use and/or available on a system at a particular moment. Unless you have intimate details of the OS you won't know for sure what those values mean.
The first two points can explain why a bunch of small blocks and one large block result in different memory patterns. The latter points indicate why this approach to detecting leaks is not useful. To detect genuine object-based "leaks" you generally need a dedicated profiling tool which tracks allocations.
For example, in the code provided:
1. TestBigBlock allocates and deletes the array; assume this uses a special memory block, so the memory is returned to the OS.
2. TestMemory extends the heap for all the small objects and never returns any heap to the OS. Here the heap is entirely available from the application's point of view, but from the OS's point of view it is all assigned to the application.
3. TestBigBlock now fails: although it would use a special memory block, it shares the overall memory space with the heap, and there just isn't enough left after step 2 is complete.

Why does the growth direction of heap addresses become opposite when allocating larger space?

I was doing some experiments on the heap address growth, and something interesting happened.
(OS: CentOS)
But I don't understand why this happened. Thanks!
This is what I did first:
double *ptr[1000];
for (int i = 0; i < 1000; i++) {
    ptr[i] = new double[10000];
    cout << ptr[i] << endl;
}
The output is incremental (the last few lines shown):
....
....
0x2481be0
0x2495470
0x24a8d00
0x24bc590
0x24cfe20
0x24e36b0
0x24f6f40
0x250a7d0
0x251e060
Then I changed 10000 to 20000:
double *ptr[1000];
for (int i = 0; i < 1000; i++) {
    ptr[i] = new double[20000];
    cout << ptr[i] << endl;
}
The addresses became more like stack-space addresses (and decreasing):
....
....
0x7f69c4d8a010
0x7f69c4d62010
0x7f69c4d3a010
0x7f69c4d12010
0x7f69c4cea010
0x7f69c4cc2010
0x7f69c4c9a010
0x7f69c4c72010
0x7f69c4c4a010
0x7f69c4c22010
0x7f69c4bfa010
0x7f69c4bd2010
0x7f69c4baa010
0x7f69c4b82010
Different environments/implementations allocate memory using different strategies, so there is no one correct rule. However, a common pattern is to use different allocation strategies for small objects vs. large objects.
Often, a runtime will have multiple heaps for objects of different sizes, which are optimized for different usage patterns. For example, small objects tend to be allocated often and deleted quickly, while large objects tend to be created rarely and have a long life.
If you use a single heap for everything, then a few small objects will be quickly peppered throughout your memory space, leaving lots of medium sized blocks available but few or no large blocks needed for larger objects. This is referred to as memory fragmentation, and can cause your allocation to fail even if nominally your app has tons of memory available.
Another reason to use different heaps is to use a different usage tracking method for different object sizes. For example, an implementation might request a new memory block from the OS for large objects, and for small objects, use a few smaller OS memory blocks with sub-allocations handled by the C runtime heap manager. Memory usage tracking mechanisms that are very effective for large objects can be very expensive for smaller ones because the memory used for tracking usage becomes a significant fraction of the actual memory used by each object.
In your case, my guess is that the runtime is allocating small objects at the beginning of the memory space, bottom-up, and larger ones near the end, top-down, to avoid fragmentation.
You're not going to get a great answer here, because new can use any method it wants to allocate memory. My guess would be that the allocator here breaks the pool into small and large allocation pools, and the big allocation pool grows downward so they can meet in the middle (so as not to waste any space).
On UNIX, allocators use sbrk(2) and mmap(2) to get memory from the OS. The addresses returned by sbrk are well defined, but the addresses from mmap are "whatever is available". On Windows, allocators use VirtualAlloc() which is kind of like mmap.
Implementations are free to have hybrids of different allocation schemes. In C++, it's normal for there to be thousands - even millions - of relatively small objects, so it can make sense for the library's memory allocation routines to make sure they pack well and are very lightweight. Your allocations for 10000 doubles do that: they're 80016 bytes apart - 80000 for 10000 8-byte variables and just 16 bytes of padding. Note specifically that the size has no relationship to powers of two, whereas when allocating 20000 doubles they're decrementing by 163840 bytes each time... weirdly, exactly 10 * 2^14. That suggests to me that the former allocations are being satisfied from one heap designed to support efficient small-object allocation by C++'s new, while the latter has crossed a threshold and is probably being sent to malloc for memory coming from a distinct heap, with much more waste.
You were lucky in the sense that the sizes of 10000 doubles and 20000 doubles happen to lie on opposite sides of a critical threshold called MMAP_THRESHOLD.
MMAP_THRESHOLD is 128KB by default. So, 80KB (i.e., 10000 doubles) mem alloc requests are serviced over the heap, whereas 160KB (20000 doubles) mem alloc requests are serviced by anonymous memory mapping (through the mmap sys call). (Note that using memory mapping for large allocations may incur additional penalties due to its different underlying handling mechanism. You may want to tune MMAP_THRESHOLD for optimal performance of your apps.)
In Linux Man for malloc:
Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2). MMAP_THRESHOLD is 128 kB by default, but is adjustable using mallopt(3). Allocations performed using mmap(2) are unaffected by the RLIMIT_DATA resource limit (see getrlimit(2)).
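For example, with glibc you can raise the threshold via mallopt(3) so that the 20000-double requests are also served from the sbrk-grown heap (a sketch; whether this helps is workload-dependent):

#include <malloc.h>   // glibc-specific: mallopt, M_MMAP_THRESHOLD
#include <cstdlib>

int main() {
    // Raise the mmap threshold to 512 KiB: requests below that size are
    // now served from the heap instead of anonymous mmap.
    mallopt(M_MMAP_THRESHOLD, 512 * 1024);

    double* p = static_cast<double*>(std::malloc(20000 * sizeof(double)));
    std::free(p);
    return 0;
}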

How will increasing each memory allocation size by a fixed number of bytes affect heap fragmentation?

I have operator new() replaced in my C++ program so that it allocates a slightly bigger block to store extra data. So the program performs exactly the same set of allocations except that now it requests several bytes more memory in each allocation. Otherwise its behavior is completely the same and it processes exactly same data. The program allocates lots of blocks (millions, I suppose) of various sizes during its runtime.
How will increasing each allocation size by a fixed number of bytes (same for every allocation) affect heap fragmentation?
Unless your program uses some "edge" block sizes (say, close to a power of two), I don't see how the block size (or a small difference in block size compared to the program with standard allocation) would affect fragmentation. With millions of allocations, a good allocator fills up the space and manages it efficiently.
Thinking about it the other way around: imagine your program had originally used blocks of the sizes it now uses with the modified allocator. Would you worry about memory fragmentation in that case?
Heaps are normally implemented as linked lists of cells. On application startup there is only one large cell. Your first allocation breaks off a small piece at the beginning to create a new allocated heap cell. Subsequent allocations do the same. After a while some cells are freed, leaving free holes between allocated blocks.
After running for a while, when you request an allocation, the allocator walks the heap until it finds a free cell of the requested size or bigger. Rounding up to larger cell allocation sizes may require more memory up front, but it increases the likelihood of finding a suitable free cell, meaning that new memory does not have to be added at the end of the heap. This may improve performance.
However, bear in mind that heap operations are expensive and should therefore be minimized. You are most probably allocating and deallocating objects of the same type, and therefore of the same size. Look into using specialized free lists for your objects; this saves the heap operations and thus minimizes fragmentation. The STL has allocators for this very reason.
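A sketch of such a per-type free list (a toy version of mine: blocks are never returned to the OS, and thread safety is ignored):

#include <cstdlib>
#include <new>

// Freed nodes are threaded onto a singly linked list and handed back on the
// next allocation, so the general heap is touched only when the list is empty.
template <typename T>
class FreeList {
public:
    // Take a raw slot, reusing a previously freed one when available.
    void* take() {
        if (head_) {
            Node* n = head_;
            head_ = n->next;
            return n;
        }
        return std::malloc(sizeof(Node));
    }

    // Return a slot to the list instead of to the heap.
    void give(void* p) {
        Node* n = static_cast<Node*>(p);
        n->next = head_;
        head_ = n;
    }

private:
    union Node { Node* next; alignas(T) unsigned char storage[sizeof(T)]; };
    Node* head_ = nullptr;
};

// Usage (hypothetical Widget type):
//   FreeList<Widget> pool;
//   Widget* w = new (pool.take()) Widget();
//   w->~Widget();
//   pool.give(w);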
It depends on the implementation driving the memory allocator, for instance:
On Windows, it pulls memory from the process heap; under XP, this heap is not set to be the low-fragmentation implementation by default, which could really throw a spanner in the works.
Under a bin or slab based allocator, your few extra bytes might push it up to the next block size, wasting memory madly and causing horrible virtual memory thrashing.
Depending on your memory usage needs, you might be better served by using a custom allocator to replace ::new, something like Hoard or nedmalloc.
If your blocks (the memory you allocate and deallocate) are still small enough that the C library allocator handles them without fragmentation problems, then you should not face any memory fragmentation. For example, take a look at my own question about allocators: Small block allocator on Linux (or RedHat Linux) to avoid memory fragmentation.
In other words: you have implemented your own ::operator new(), and in it you call malloc() with a slightly bigger block size. malloc() lives in the C library, and it is responsible not only for allocating and deallocating, but also for avoiding memory fragmentation. If you do not frequently allocate and free blocks of sizes bigger than the allocator can handle efficiently, then you can expect that there will be no memory fragmentation.