Designing and coding a non-fragmenting static memory pool - C++

I have heard the term before and I would like to know how to design and code one.
Should I use the STL allocator if available?
How can it be done on devices with no OS?
What are the tradeoffs between using it and using the regular compiler implemented malloc/new?

I would suggest that you should know that you need a non-fragmenting memory allocator before you put much effort into writing your own. The one provided by the std library is usually sufficient.
If you need one, the general idea for reducing fragmentation is to grab large blocks of memory at once and allocate from that pool, rather than asking the OS for heap memory sporadically, at widely varying places within the heap, interspersed with many other objects of varying sizes. Since the author of a specialized memory allocator knows more about the sizes of the objects allocated from the pool and how those allocations occur, the allocator can use the memory more efficiently than a general-purpose allocator such as the one provided by the STL.
You can look at memory allocators such as Hoard which while reducing memory fragmentation, also can increase performance by providing thread specific heaps which reduce contention. This can help your application scale more linearly, especially on multi-core platforms.
More info on multi-threaded allocators can be found here.

Will try to describe what is essentially a memory pool - I'm just typing this off the top of my head, been a while since I've implemented one, if something is obviously stupid, it's just a suggestion! :)
1.
To reduce fragmentation, you need to create a memory pool that is specific to the type of object you are allocating in it. Essentially, you then restrict the size of each allocation to the size of the object you are interested in. You could implement a templated class which has a list of dynamically allocated blocks (the reason for the list being that you can grow the amount of space available). Each dynamically allocated block would essentially be an array of T.
You would then have a "free" list, which is a singly linked list, where the head points to the next available block. Allocation is then simply returning the head. You could overlay the linked list in the block itself, i.e. each "block" (which represents the aligned size of T), would essentially be a union of T and a node in the linked list, when allocated, it's T, when freed, a node in the list. !!There are obvious dangers!! Alternatively, you could allocate a separate (and protected block, which adds more overhead) to hold an array of addresses in the block.
Allocating is trivial: iterate through the list of blocks and allocate from the first available. Freeing is also trivial; the additional check you have to do is to find the block from which this was allocated and then update the head pointer. (Note: you'll need to use either placement new or override operator new/delete in T - there are ways around this, google is your friend.)
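A minimal sketch of the scheme above, assuming fixed-size slots and single-threaded use (class and member names are my own): each slot is a union of storage for T and a next pointer, so the free list lives inside the unused slots themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

template <typename T>
class Pool {
    union Slot {
        Slot* next;                               // valid while the slot is free
        alignas(T) unsigned char obj[sizeof(T)];  // storage for T while allocated
    };
    std::vector<Slot*> blocks_;  // list of blocks so the pool can grow
    Slot* free_head_ = nullptr;  // head of the intrusive free list
    std::size_t block_size_;

    void grow() {
        Slot* block = new Slot[block_size_];
        blocks_.push_back(block);
        // Thread every slot of the new block onto the free list.
        for (std::size_t i = 0; i < block_size_; ++i) {
            block[i].next = free_head_;
            free_head_ = &block[i];
        }
    }

public:
    explicit Pool(std::size_t block_size = 64) : block_size_(block_size) {}
    ~Pool() { for (Slot* b : blocks_) delete[] b; }  // caller must free all T first

    template <typename... Args>
    T* allocate(Args&&... args) {
        if (!free_head_) grow();          // out of slots: add another block
        Slot* s = free_head_;
        free_head_ = s->next;             // pop the head of the free list
        return new (s->obj) T(std::forward<Args>(args)...);  // placement new
    }

    void free(T* p) {
        p->~T();                          // run the destructor manually
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = free_head_;             // push the slot back on the free list
        free_head_ = s;
    }
};
```

The union trick is exactly the danger mentioned above: writing through a stale T* after freeing it silently corrupts the free list.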
The "static" I believe implies a singleton memory pool for all objects of type T. The downside is that for each T you have to have a separate memory pool. You could be smart, and have a single object that manages pools of different size (using an array of pointers to pool objects where the index is the size of the object for example).
The whole point of the previous paragraph is to outline exactly how complex this is, and like RC says above, be sure you need it before you do it - as it is likely to introduce more pain than may be necessary!
2.
If the STL allocator meets your needs, use it - it's designed by some very smart people who know what they are doing. However, it is built for the generic case, and if you know how your objects are allocated, you could make the above perform faster.
3.
You need to be able to allocate memory somehow (hardware support or some sort of HAL - whatever) - else I'm not sure how your program would work?
4.
The regular malloc/new does a lot more stuff under the covers (google is your friend, my answer is already an essay!) The simple allocator I describe above isn't re-entrant, of course you could wrap it with a mutex to provide a bit of cover, and even then, I would hazard that the simple allocator would perform orders of magnitude faster than normal malloc/free.
But if you're at this stage of optimization - presumably you've exhausted the possibility of optimizing your algorithms and data structure usage?

Related

What are the differences between Block, Stack and Scratch Allocators?

In his talk "Solving the Right Problems for Engine Developers", Mike Acton says that
the vast majority of the time, all you're going to need are these three types of allocator: there's the block allocator, the stack allocator and the scratch allocator
However, he doesn't go into detail about what the differences between these types of allocator are.
I would presume a 'stack allocator' is just a stack-based allocator, but all the other types I've heard of (including 'arena') just sound like fancy ways of doing the same thing, that is 'allocate a big block and chunk it up in a nice efficient way, then free it when you're done'
So, what are the differences between these allocators, what are the advantages of each, why do I only need these three 'the vast majority of the time'?
As was pointed out in the comments, the terminology used in the talk is not well established around the industry, so there is some doubt left as to what exact allocation strategies are being referred to here. Taking into account what is commonly mentioned in game programming literature, here is my educated guess what is behind the three mentioned allocators:
Block Allocator
Also known as a pool allocator. This is an allocator that only hands out fixed-size blocks of memory, regardless of how much memory the user actually requested.
Let's say you have a block allocator with a block size of 100 bytes. You want to allocate memory for a single 64 bit integer? It gives you a block of 100 bytes. You want to allocate memory for an array of 20 single precision floats? It gives you a block of 100 bytes. You want to allocate memory for an ASCII string with 101 characters? It gives you an error, as it can't fit your string into 100 bytes.
Block allocators have several advantages. They are relatively easy to implement and they don't suffer from external memory fragmentation. They also usually exhibit a very predictable runtime behavior, which is often essential for video games. They are well suited for problems where most allocations are of roughly the same size and obviously less well suited for when that is not the case.
Apart from the simplest version described here, where each allocator supports only a single block size, extensions exist that are more flexible, supporting multiple block sizes, without compromising too heavily on the aforementioned advantages.
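A byte-oriented sketch of the simplest single-block-size version (sizes and names here are arbitrary): every request up to the block size receives a whole block, and larger requests fail, as in the 100-byte example above.

```cpp
#include <cassert>
#include <cstddef>

template <std::size_t BlockSize, std::size_t BlockCount>
class BlockAllocator {
    // Round the block stride up so every block is aligned for a pointer store.
    static constexpr std::size_t Align = alignof(std::max_align_t);
    static constexpr std::size_t Stride = (BlockSize + Align - 1) / Align * Align;

    alignas(std::max_align_t) unsigned char buffer_[Stride * BlockCount];
    void* free_list_ = nullptr;  // intrusive free list through the free blocks

public:
    BlockAllocator() {
        // Initially every block is free: link them all together.
        for (std::size_t i = 0; i < BlockCount; ++i) {
            void* block = buffer_ + i * Stride;
            *static_cast<void**>(block) = free_list_;
            free_list_ = block;
        }
    }

    void* allocate(std::size_t bytes) {
        if (bytes > BlockSize || !free_list_) return nullptr;  // too big or exhausted
        void* block = free_list_;
        free_list_ = *static_cast<void**>(block);  // pop the head
        return block;
    }

    void deallocate(void* block) {
        *static_cast<void**>(block) = free_list_;  // push back onto the list
        free_list_ = block;
    }
};
```

Because every block is the same size, allocation and freeing are constant-time and external fragmentation is impossible, at the cost of internal waste inside each block.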
Stack Allocator
A stack allocator works like a stack: You can only deallocate in the inverse order of allocation. If you subsequently allocate objects A and then B, you cannot reclaim the memory for A without also giving up B.
Stack allocators are very easy to implement, as you only need to keep track of a single pointer that marks the separation between the used and unused regions of memory. Allocation moves that pointer into one direction and deallocation moves it the opposite way.
Stack allocators make optimally efficient use of memory and have fully predictable runtime behavior. They obviously work well only for problems where the required order of deallocations is easy to achieve. It is usually not trivial to enforce the correct deallocation order statically, so debugging them can be a pain if they are being used carelessly.
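A minimal sketch of such a stack allocator (interface names are my own; alignment handling omitted for brevity): a single offset separates used from unused memory, and a saved marker rolls back everything allocated after it.

```cpp
#include <cassert>
#include <cstddef>

class StackAllocator {
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t top_ = 0;  // everything below top_ is in use

public:
    StackAllocator(unsigned char* buffer, std::size_t capacity)
        : base_(buffer), capacity_(capacity) {}

    using Marker = std::size_t;           // a saved position in the stack

    Marker mark() const { return top_; }  // remember the current top

    void* allocate(std::size_t bytes) {
        if (top_ + bytes > capacity_) return nullptr;  // out of space
        void* p = base_ + top_;
        top_ += bytes;                    // bump the pointer upward
        return p;
    }

    // Deallocation only in reverse order: roll the top back to a marker,
    // freeing everything allocated after it in one step.
    void rewind(Marker m) { top_ = m; }
};
```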
Scratch Allocator
Also known as a monotonic allocator. A scratch allocator works similarly to a stack allocator. Allocation works exactly the same. Deallocation is a no-op. That is, once memory has been allocated, it cannot be reclaimed.
If you want to get the memory back, you have to destroy the entire scratch allocator, thereby releasing all of its memory at once.
The advantages of the scratch allocator are the same as with the stack allocator. They are well suited for problems where you can naturally identify points at which all allocated objects are no longer needed. Similar to the stack allocator, when used carelessly, they can lead to nasty runtime errors if an allocator is destroyed while there are still active objects alive.
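A minimal sketch, with names of my own choosing: allocation bumps a pointer, per-object deallocation does nothing, and reset (or destruction) reclaims everything at once. C++17's std::pmr::monotonic_buffer_resource is a standard-library take on the same idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

class ScratchAllocator {
    unsigned char* buffer_;
    std::size_t capacity_;
    std::size_t used_ = 0;

public:
    explicit ScratchAllocator(std::size_t capacity)
        : buffer_(static_cast<unsigned char*>(std::malloc(capacity))),
          capacity_(capacity) {}
    ~ScratchAllocator() { std::free(buffer_); }  // all memory goes away here

    void* allocate(std::size_t bytes) {
        if (used_ + bytes > capacity_) return nullptr;  // exhausted
        void* p = buffer_ + used_;
        used_ += bytes;                 // bump the pointer; never moved back
        return p;
    }

    void deallocate(void*) {}           // intentionally a no-op
    void reset() { used_ = 0; }         // reclaim everything at once
};
```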
Why do I only need those three?
Experience shows that in a lot of domains, fully dynamic memory management is not required. Allocations can be grouped either by common size (block allocator) or by common lifetime (scratch and stack allocators). If an engineer working in such a domain is willing to go through the trouble of classifying each allocation accordingly, they can probably make do with just these three allocation strategies for the majority of their dynamic memory needs, without introducing unreasonable additional development effort. As a reward for their efforts, they will benefit from the nice runtime properties of these algorithms, in particular very fast and predictable execution times, and predictable memory consumption.
If you are in a domain where it is harder to classify allocations along those terms; or if you can not or are unwilling to spend the additional engineering effort; or if you are dealing with a special use case that doesn't map well to those three allocators - you will probably still want to use a general purpose allocator, i.e. good old malloc.
The point that was being made in the talk is more that if you do need to worry about custom memory allocation - and especially in the domain of video games with its specific requirements and trade offs - those three types of allocators are very good answers to the specific problems that you may otherwise encounter when naïvely relying on the general purpose allocator alone.
I gave a long talk about allocators in C++ a while back where I explain all this in more detail if you still want to know more.
Allocator
An allocator in C++ defines a set of functions for allocating and deallocating memory dynamically. Containers such as vector, list, and deque use an allocator; the std::allocator class template is the default allocator for most containers.
Several common allocators are listed below:
std::allocator
std::pmr::polymorphic_allocator
pool allocator/block allocator
stack allocator
scratch allocator
Block Allocator
A block allocator is an allocator that manages a pool of memory and hands out memory blocks from that pool. Pool allocators are useful for applications where memory allocation and deallocation are performance-critical and where the size of the memory blocks being allocated is known ahead of time.
Stack Allocator
A stack allocator works just like the stack data structure. A stack is LIFO: last in, first out. Items are stored in memory one after another at the top of the stack, and when an item is removed, it is always taken from the top, never from any other position.
A stack allocator follows the same discipline. Suppose an object x is allocated and then another object y is needed: y is placed on top of x. Likewise, when memory needs to be freed, y must be freed first, before x can be.
Scratch Allocator
A scratch allocator is a type of memory allocator that can be used to manage temporary memory that is needed for the duration of a specific function or task.

Preallocate memory for dynamic data structure

I have a question/curiosity.
Let's say I want to implement a list, and for example I could basically use the cormen book approach. Where it is explained how to implement, insert, delete, key search etc.
However, nothing is said about memory use. For example, say I would like to insert an integer into a list of integers. I could first create a node (allocating memory there), store the integer in it, and then insert the node into the list. If I would like to delete an integer, once I know in which node it is stored, I have to free that memory.
I was now wondering if instead it would be more convenient to preallocate memory to store, say, 10 nodes and keep a pointer to a free node to be used. If the memory pool is full then I reallocate memory for 20 nodes, and if the pool is mostly empty I halve its size (and so on and so forth). The pool is of course more complicated to manage since I'd need, for example, to handle possible memory fragmentation etc.
Does what I'm saying make any sense? Or is it no sense? I've read in a book, for game programming, that memory preallocation could improve performance, but I was wondering how.
This is both a simple and a complex question. If you operate within standard problems, you don't really need to worry about memory allocation. For example, preallocating memory for 10 nodes won't be efficient at any scale, and your performance problems might be elsewhere. However, if your program constantly allocates and deallocates hundreds or thousands of small objects per second, it could lead to memory fragmentation, and you might need to write a custom allocator.
Almost no standard containers have methods to preallocate element storage; the main exception is std::vector::reserve. All of them, however, allow you to pass custom allocators to their constructors. There is also the placement new operator.
You could try to experiment with such things, they're fun to write, just don't use them in production if you absolutely don't have to.
I was now wondering if instead it would be more convenient to preallocate memory to store, say, 10 nodes and keeping a pointer to a free node to be used.
You basically are describing what a pool allocator usually does (I assume you are talking about nodes of constant size). So, the short answer to your question is: yes you would improve performance by using a pool allocator with a list container.
Memory allocators shipped with common compilers are quite good for general purpose allocation (i.e. for allocation of random size objects). However, when your need is to allocate objects of constant size, you should consider using a custom pool allocator. You can easily understand why a constant size objects allocator performs faster than the standard one.
You might write your own pool allocator, however it's not an easy task and you should better consider using an existing one, such as boost pool_allocator or fast_pool_allocator.
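If C++17 is available, the standard library also ships a ready-made pool in <memory_resource>; a small sketch of using it to back a list, so every fixed-size node allocation is served from the pool rather than from the general-purpose heap:

```cpp
#include <cassert>
#include <list>
#include <memory_resource>

// A std::pmr::list draws its node memory from the supplied resource;
// an unsynchronized_pool_resource serves those same-size requests from
// internal pools, much like a hand-written pool allocator would.
int sum_with_pool() {
    std::pmr::unsynchronized_pool_resource pool;
    std::pmr::list<int> nodes(&pool);   // every node comes from the pool
    for (int i = 1; i <= 100; ++i)
        nodes.push_back(i);
    int sum = 0;
    for (int v : nodes) sum += v;
    return sum;
    // the pool releases all node memory when it goes out of scope
}
```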

Is it possible to implement a memory pool that works with arrays instead of single objects?

I know it's easy to make a memory pool for single objects, however I need to make a memory pool for arrays. The memory pool I have currently has a vector of addresses to contiguous memory blocks and a stack that points to each object from these blocks, so when you allocate from the pool you just pop the stack and when you free, you just push an object's address back to it. However I also need an array equivalent. Something like this:
template<typename T>
class ArrayPool
{
public:
ArrayPool();
~ArrayPool();
T* AllocateArray(int x); //Returns a pointer to a T array that contains 'x' elements.
void FreeArray(T* arr, int x); //Returns the array to the free address list/stack/whatever.
};
Has such a thing been implemented? I imagine a big problem with having such a pool - if I make sure arrays returned by AllocateArray are contiguous in memory, I'm basically doing the same as not having a memory pool at all: just allocating arrays on the spot. With the normal object pool, every time I just allocate 1 object. With arrays I may allocate a different-sized array every time, so once an array is freed, it won't be compatible with a new one of a different size, unless I stitch arrays together with some linked-list-like structure, but then they won't be contiguous.
Currently your allocator takes advantage of the fact that all allocations are the same size. This simplifies and speeds up allocation and freeing, and means memory fragmentation is impossible.
If you have to allocate arrays of any size, then what you want is a general-purpose allocator, not a pool allocator. What to do next depends why you're using a pool allocator in the first place. I can think of two other features of a pool allocator that might be relevant, and there may be others:
all memory comes from a particular region specified when you create the pool
all memory can be freed at once without freeing each individual allocation, by resetting the pool.
If you don't need any special features of controlling allocation yourself then just use vector or global operator new or malloc to allocate your memory. If you do need special features then you'll probably want to take an allocator off the shelf rather than implementing your own. If you really want to get into the details of how a good memory allocator works then look at http://g.oswego.edu/dl/html/malloc.html and perhaps adapt it to your use.
But if you really need to hand-roll an allocator for limited purposes, then the basic idea is that instead of a list of free nodes from which you can always take the first, you need some data structure (your choice what) containing free blocks of different sizes, that allows you to quickly find a block that's big enough to satisfy the current request. In the case where it's much bigger you might choose to split the block, return part of it, and keep the rest as a new smaller free block. In the case where two free blocks are adjacent you might choose to merge them into a single larger free block.
One common strategy is to keep pool-like lists of blocks of certain sizes (for example 16, 32, 64...). If the request is small enough, satisfy it using one of these. If not, do something more complex. But as I say, if you want to see a lot of tricks working together then look at dlmalloc.
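The size-class idea in the previous paragraph can be sketched as a rounding function (the classes here are arbitrary); a real allocator would keep one free list per class, so a freed block of one class can satisfy any later request that rounds to the same class:

```cpp
#include <cassert>
#include <cstddef>

// Round each request up to the nearest size class: 16, 32, 64, ... bytes.
// Anything that rounds to the same class can reuse the same freed blocks.
inline std::size_t size_class(std::size_t bytes) {
    std::size_t c = 16;            // smallest class
    while (c < bytes) c *= 2;      // next power of two >= request
    return c;                      // the caller indexes a free list by class
}
```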
What you could do is have fixed sizes and only work with those. For example, 400 32-byte arrays, 200 of 128 bytes, 100 of 1024 bytes, 50 of 8096 bytes, or something like that. When something asks for an array of size N, you match it to the closest size with a free array.
How many you need to each size is probably up for a lot of tweaking.
That would allow you to re-use arrays much more freely than allowing custom sizes.
What exactly are you trying to win from this? Why isn't it enough just to treat each array as an object? Unless you are direly strapped for memory or the time to construct the array elements is really excessive and not to be wasted, this sounds like a classic case of premature optimization. And if the above are your problems, I'd explore other data structures (not arrays) first before plunging into this.
Your time (getting this working and its quirks ironed out will be a week or so, methinks) is way more valuable than a few pennies of computer time or memory saved.

Massive amount of object creation in C++

Is there any pattern for dealing with a lot of object instantiations (40k per second) on a mobile device? I need these objects separately and they cannot be combined. Reusing objects would probably be a solution. Any hints?
Yes. Keep old objects in a pool and re-use them, if you can.
You will save massive amounts of time due to the cost of memory allocation and deletion.
I think you could consider these design patterns:
Object Pool
Factory
Further info
I hope this help you too: Object Pooling for Generic C++ classes
If the objects are all the same size, try a simple cell allocator with an intrusive linked list of free nodes:
free:
    add node to head of list
allocate:
    if list is non-empty:
        remove the head of the list and return it
    else:
        allocate a large block of memory
        split it into cells of the required size
        add all but one of them to the free list
        return the other one
If allocation and freeing are all done in a single thread, then you don't need any synchronisation. If they're done in different threads, then possibly 40k context switches per second is a bigger worry than 40k allocations per second ;-)
You can make the cells be just "raw memory" (and either use placement new or overload operator new for your class), or else keep the objects initialized at all times, even when they're on the "free list", and assign whatever values you need to the members of "new" ones. Which you do depends how expensive initialization is, and probably is the technical difference between a cell allocator and an object pool.
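A fairly direct C++ translation of the pseudocode above (the chunking parameters are arbitrary; cells are raw memory, so the cell size must be at least sizeof(void*) and suitably aligned):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

class CellAllocator {
    void* free_list_ = nullptr;     // intrusive list threaded through free cells
    std::size_t cell_size_;
    std::size_t cells_per_block_;

public:
    CellAllocator(std::size_t cell_size, std::size_t cells_per_block)
        : cell_size_(cell_size), cells_per_block_(cells_per_block) {}

    void* allocate() {
        if (free_list_) {                       // list is non-empty:
            void* cell = free_list_;            // remove the head and return it
            free_list_ = *static_cast<void**>(cell);
            return cell;
        }
        // Allocate a large block of memory and split it into cells.
        char* block = static_cast<char*>(
            std::malloc(cell_size_ * cells_per_block_));
        for (std::size_t i = 1; i < cells_per_block_; ++i)  // all but one...
            free(block + i * cell_size_);                   // ...go on the free list
        return block;                                       // return the other one
    }

    void free(void* cell) {                     // add node to head of list
        *static_cast<void**>(cell) = free_list_;
        free_list_ = cell;
    }
};
```

This sketch never returns blocks to the OS; a real version would record each malloc'd block and release them all in a destructor.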
You might be able to use the flyweight pattern if your objects are redundant. This pattern shares memory amongst similar objects. The classical example is the data structure used for graphical representation of characters in a word processing program.
Wikipedia has a summary.
There is an implementation in boost.
Hard to say exactly how to improve your code without more information, but you probably want to check out the Boost Pool libraries. They all provide different ways of quickly allocating memory for different, specific use cases. Choose the one that fits your use case best.
If the objects are the same size, you can allocate a large chunk of memory and use placement new, that will help with the allocate cost as it will all be in contiguous memory:
Object *pool = static_cast<Object*>( malloc( sizeof(Object) * numberOfObjects ) );
for(int i=0; i<numberOfObjects; i++)
    new (&pool[i]) Object();
I've used similar patterns for programming stochastic reaction-diffusion systems (millions of object creations per second on a desktop computer) and for real-time image processing (again, hundreds of thousands or millions per second).
The basic idea is as follows:
Create an allocator that allocates large arrays of your desired object; require that this object have a "next" pointer (I usually create a template that wraps the object with a next pointer).
Every time you need an object, get one from this allocator (using the new-syntax that initializes from the block of memory you call).
Every time you're done, give it back to the allocator and place it on a stack.
The allocator gives you something off the stack if the stack is nonempty, or something from its array buffer otherwise. If you run out of buffer, you can either allocate another larger buffer and copy the existing used nodes, or have the allocator maintain a stack of fully-used allocation blocks.
When you are done with all the objects, delete the allocator. Side benefit: you don't need to be sure to free each individual object; they'll all go away. Side cost: you'd better be sure to allocate anything you want to preserve forever on the heap instead of in this temporary buffer (or have a permanent buffer you use).
I generally get performance about 10x better than raw malloc/new when using this approach.

Given an Array, is there an algorithm that can allocate memory out of it?

I'm doing some graphics programming and I'm using Vertex pools. I'd like to be able to allocate a range out of the pool and use this for drawing.
What's different between the solution I need and a C allocator is that I never call malloc. Instead I preallocate the array and then need an object that wraps that up and keeps track of the free space and allocates a range (a pair of begin/end pointers) from the allocation I pass in.
Much thanks.
In general: you're looking for a memory manager that uses a memory pool (see Wikipedia; like the boost::pool as answered by TokenMacGuy). They come in many flavours. Important considerations:
block size (fixed or variable; number of different block sizes; can block-size usage be predicted statistically?)
efficiency (some managers have 2^n block sizes, e.g. for use in network stacks, where they search for the best-fit block; very good performance and no fragmentation at the cost of wasting memory)
administration overhead (I presume that you'll have many, very small blocks, so the number of ints and pointers maintained by the memory manager is significant for efficiency)
In case of boost::pool, I think simple segregated storage is worth a look.
It will allow you to configure a memory pool with many different block sizes for which a best-match is searched for.
boost::pool does this for you very nicely!
Instead I preallocate the array and then need an object that wraps that up and keeps track of the free space and allocates a range (a pair of begin/end pointers) from the allocation I pass in.
That's basically what malloc() does internally (malloc() can increase the size of this "preallocated array" if it gets full, though). So yes, there is an algorithm for it. There are many, in fact, and Wikipedia gives a basic overview. Different strategies can work better in different situations. (E.g. if all the blocks are a similar size, or if there's some pattern to allocation and freeing)
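A minimal first-fit sketch of such a wrapper (the interface is my own invention): it never allocates backing memory itself, it only tracks the free [start, length) ranges of a caller-owned array, splitting a range on allocation. Coalescing adjacent free ranges on free is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

class RangeAllocator {
    std::map<std::size_t, std::size_t> free_;  // start -> length of each free range

public:
    explicit RangeAllocator(std::size_t capacity) { free_[0] = capacity; }

    // Returns the start index of a free range of n slots, or SIZE_MAX on failure.
    std::size_t allocate(std::size_t n) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= n) {                 // first range big enough
                std::size_t start = it->first;
                std::size_t len = it->second;
                free_.erase(it);
                if (len > n)                       // split: keep the remainder free
                    free_[start + n] = len - n;
                return start;
            }
        }
        return static_cast<std::size_t>(-1);       // no fit
    }

    void free_range(std::size_t start, std::size_t n) {
        free_[start] = n;  // (merging with adjacent free ranges omitted)
    }
};
```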
If you have many objects of the same size, look into obstacks.
You probably don't want to write the code yourself, it's not an easy task and bugs can be painful.