The problems of basing an allocator on static memory - C++

I've recently come up with the idea of pooling memory in a C++ program with a statically initialized array (i.e. "static byte Memory[Size]"). Allocating blocks of memory from the pool would work much like malloc, except that the whole pool is reserved up front. I then built this allocator, and there seem to be no problems. Are there any issues with this allocator architecture other than the limited memory space?
Note: The way I reserve blocks of memory is by creating a linked list of blocks that store size, pointer and neighbors. Not that it is important to the question.

TI's SYS/BIOS actually provides an implementation for their microcontrollers that looks similar to what you describe. They call this implementation HeapBuf.
A HeapBuf is a single array that has been split into n equally sized blocks. The HeapBuf also holds a pointer to the first empty block, and every empty block holds a pointer to the next empty block.
This specific implementation can only allocate blocks of exactly that one size, but it has no fragmentation issues. If you have a lot of small objects of roughly the same size, it should offer great performance, since allocation has constant cost and only requires updating the pointer to the first empty block.
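A minimal sketch of that scheme in C++ (the identifiers are illustrative, not TI's actual API): the free list is threaded through the empty blocks themselves, so allocating and freeing are each a single pointer update.

#include <cstddef>

// A pool of BlockCount fixed-size blocks; each empty block's first bytes
// hold the pointer to the next empty block.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
    static_assert(BlockSize >= sizeof(void*), "block too small for a link");
    alignas(std::max_align_t) unsigned char storage[BlockSize * BlockCount];
    void* freeHead = nullptr;  // first empty block

public:
    FixedPool() {
        for (std::size_t i = 0; i < BlockCount; ++i) {
            void* block = storage + i * BlockSize;
            *static_cast<void**>(block) = freeHead;  // link into free list
            freeHead = block;
        }
    }
    void* allocate() {                 // O(1): pop the first empty block
        if (!freeHead) return nullptr;
        void* block = freeHead;
        freeHead = *static_cast<void**>(block);
        return block;
    }
    void deallocate(void* block) {     // O(1): push the block back
        *static_cast<void**>(block) = freeHead;
        freeHead = block;
    }
};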
Your note, however, is important to the question, and the issues with blocks of different sizes are as follows:
The first block might not fit the requested size, so you need to traverse the list until you find a large enough open space.
After you have allocated all your memory in 1 kB blocks and then decided to free them all, you are left with a linked list of many free 1 kB blocks. Have you decided how and when to merge them?
The developers of FreeRTOS also ran into these issues and therefore provide multiple heap implementations that make different compromises (e.g. fast but no freeing at all, a bit slower but without merging of freed blocks, or slow but with merging of freed blocks). There is an overview of their implementations here: http://www.freertos.org/a00111.html

Related

Sequential memory allocation

I'm planning an application that allocates a lot of variables in memory. Unlike a "regular" application, I want this memory to be allocated in specific memory blocks of 4096 bytes. My allocated variables must be placed in memory sequentially, one after another, in order to fill the whole allocated memory.
For example, I allocate a region (4096 bytes) in memory, and this region is ready for further use. From then on, each time my application creates a new variable in memory (which in a "regular" application would probably be done with malloc), the variable is placed in the free space of my memory region.
This sequential memory allocation is similar to how an array allocation works. But in my case, I need an array that is able to contain many types of data (string, byte, int, ...).
One possible way to achieve this is pointer arithmetic. I want to avoid that method, as it may introduce a lot of bugs into my application.
Maybe someone solved this problem before?
Thank you!
malloc() by no means guarantees that subsequently allocated blocks are at sequential memory addresses. Even worse, most implementations use a small number of bytes before and/or after the allocated block for 'housekeeping', so even if you're lucky and the addresses are sequential, there will be small gaps between the blocks. The actual allocated blocks are slightly bigger to make space for those 'housekeeping' bytes.
As you suggest, you'll need to write some code yourself: a few functions wrapping malloc(), realloc(), and so on. You can hide all the logic in these functions, and application code using them should be no more complex than it would be if malloc() did what you wanted.
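As a rough illustration of what such wrapper functions could look like, here is a minimal bump-allocator sketch (the names are mine; it ignores alignment and supports no per-variable freeing, only releasing the whole region):

#include <cstddef>
#include <cstdlib>

// One 4096-byte region; variables are handed out one after another.
static const std::size_t RegionSize = 4096;
static unsigned char* region = nullptr;
static std::size_t used = 0;

void* my_alloc(std::size_t size) {              // hypothetical wrapper
    if (region == nullptr)
        region = static_cast<unsigned char*>(std::malloc(RegionSize));
    if (region == nullptr || used + size > RegionSize)
        return nullptr;                         // region exhausted
    void* p = region + used;                    // next sequential address
    used += size;                               // (a real version would round
    return p;                                   // 'used' up for alignment)
}

void my_free_all() {                            // frees everything at once
    std::free(region);
    region = nullptr;
    used = 0;
}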
Important questions: Why do you need to have these blocks adjacent to each other? What about freeing blocks?

Is it possible to implement a memory pool that works with arrays instead of single objects?

I know it's easy to make a memory pool for single objects; however, I need to make a memory pool for arrays. The memory pool I currently have has a vector of addresses of contiguous memory blocks and a stack that points to each object in these blocks, so when you allocate from the pool you just pop the stack, and when you free, you push the object's address back onto it. However, I also need an array equivalent. Something like this:
template<typename T>
class ArrayPool
{
public:
    ArrayPool();
    ~ArrayPool();
    T* AllocateArray(int x);       // Returns a pointer to a T array that contains 'x' elements.
    void FreeArray(T* arr, int x); // Returns the array to the free address list/stack/whatever.
};
Has such a thing been implemented? I imagine a big problem with such a pool: if I make sure the arrays returned by AllocateArray are contiguous in memory, I'm basically doing the same as not having a memory pool at all, just allocating arrays on the spot. With the normal object pool I allocate one object at a time; with arrays I may allocate a different-sized array every time, so once an array is freed it won't be compatible with a new one of a different size, unless I stitch arrays together with some linked-list-like structure, but then they won't be contiguous.
Currently your allocator takes advantage of the fact that all allocations are the same size. This simplifies and speeds up allocation and freeing, and means memory fragmentation is impossible.
If you have to allocate arrays of any size, then what you want is a general-purpose allocator, not a pool allocator. What to do next depends on why you're using a pool allocator in the first place. I can think of two other features of a pool allocator that might be relevant, and there may be others:
all memory comes from a particular region specified when you create the pool
all memory can be freed at once without freeing each individual allocation, by resetting the pool.
If you don't need any special control over allocation yourself, then just use vector, global operator new, or malloc to allocate your memory. If you do need special features, you'll probably want to take an allocator off the shelf rather than implement your own. If you really want to get into the details of how a good memory allocator works, look at http://g.oswego.edu/dl/html/malloc.html and perhaps adapt it to your use.
But if you really need to hand-roll an allocator for limited purposes, then the basic idea is that instead of a list of free nodes from which you can always take the first, you need some data structure (your choice which) containing free blocks of different sizes that allows you to quickly find a block big enough to satisfy the current request. When a block is much bigger than required, you might choose to split it, return part of it, and keep the rest as a new, smaller free block. When two free blocks are adjacent, you might choose to merge them into a single larger free block.
One common strategy is to keep pool-like lists of blocks of certain sizes (for example 16, 32, 64, ...). If the request is small enough, satisfy it from one of these; if not, do something more complex. But as I say, if you want to see a lot of tricks working together, look at dlmalloc.
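A sketch of that size-class idea (the sizes and names here are illustrative): round each request up to the nearest class and serve it from that class's pool, falling back to the general allocator for anything bigger.

#include <cstddef>

// Map a request to one of a few pool size classes; the returned index
// would select one of several fixed-size pools.
int size_class(std::size_t n) {
    const std::size_t sizes[] = {16, 32, 64, 128};
    for (int i = 0; i < 4; ++i)
        if (n <= sizes[i]) return i;   // smallest class that fits
    return -1;                         // too big: use the general allocator
}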
What you could do is use fixed sizes and only work with those. For example: 400 32-byte arrays, 200 of 128 bytes, 100 of 1024 bytes, 50 of 8096 bytes, or something like that. When something asks for an array of size N, you match it to the closest size that has a free array.
How many you need of each size is probably up for a lot of tweaking.
That would allow you to re-use arrays much more freely than allowing custom sizes.
What exactly are you trying to gain from this? Why isn't it enough just to treat each array as an object? Unless you are direly strapped for memory, or the time to construct the array elements is really excessive and not to be wasted, this sounds like a classic case of premature optimization. And if those are your problems, I'd explore other data structures (not arrays) first before plunging into this.
Your time (getting this working and its quirks ironed out will be a week or so, methinks) is way more valuable than a few pennies of computer time or memory saved.

Is there any benefit to use multiple heaps for memory management purposes?

I am a student of a system software faculty. Now I'm developing a memory manager for Windows. Here's my simple implementation of malloc() and free():
#include <windows.h>

HANDLE heap = HeapCreate(0, 0, 0); // private, growable heap

void* hmalloc(size_t size)
{
    return HeapAlloc(heap, 0, size);
}

void hfree(void* memory)
{
    HeapFree(heap, 0, memory);
}

int main()
{
    int* ptr1 = (int*)hmalloc(100 * sizeof(int));
    int* ptr2 = (int*)hmalloc(100 * sizeof(int));
    int* ptr3 = (int*)hmalloc(100 * sizeof(int));
    hfree(ptr2);
    hfree(ptr3);
    hfree(ptr1);
    return 0;
}
It works fine. But I can't understand: is there a reason to use multiple heaps? I can allocate memory in the heap and get the address of an allocated memory chunk, but here I use ONE heap. Is there a reason to use multiple heaps? Maybe for multi-threaded/multi-process applications? Please explain.
The main reason for using multiple heaps/custom allocators is better memory control. Usually, after lots of news and deletes, the memory gets fragmented and the application loses performance (it will also consume more memory). Using the memory in a more controlled environment can reduce heap fragmentation.
Another use is preventing memory leaks in the application: you can just free the entire heap you allocated, without having to bother freeing each object allocated from it.
Another use is for tightly allocated objects. If you have, for example, a list, you can allocate all its nodes in a smaller dedicated heap, and the application will gain performance because there will be fewer cache misses when iterating the nodes.
Edit: memory management is a hard topic, however, and in some cases it is not done right. Andrei Alexandrescu gave a talk at one point in which he said that for some applications, replacing the custom allocator with the default one increased the performance of the application.
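A minimal sketch of the "free the entire heap at once" idea, using the real Win32 calls HeapCreate, HeapAlloc, and HeapDestroy (the Node type is just for illustration):

#include <windows.h>

// Give one component (here, a linked list) its own heap, so its nodes
// stay close together and can all be released with a single call.
struct Node { int value; Node* next; };

int main()
{
    HANDLE listHeap = HeapCreate(0, 0, 0);  // private, growable heap
    Node* head = nullptr;
    for (int i = 0; i < 100; ++i) {
        Node* n = (Node*)HeapAlloc(listHeap, 0, sizeof(Node));
        if (n == nullptr) break;            // allocation failed
        n->value = i;
        n->next = head;
        head = n;
    }
    // No per-node HeapFree needed: drop the whole structure at once.
    HeapDestroy(listHeap);
    return 0;
}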
This is a good link that elaborates on why you may need multiple heaps:
https://caligari.dartmouth.edu/doc/ibmcxx/en_US/doc/libref/concepts/cumemmng.htm
"Why Use Multiple Heaps?
Using a single runtime heap is fine for most programs. However, using multiple
heaps can be more efficient and can help you improve your program's performance
and reduce wasted memory for a number of reasons:
1- When you allocate from a single heap, you may end up with memory blocks on
different pages of memory. For example, you might have a linked list that
allocates memory each time you add a node to the list. If you allocate memory for
other data in between adding nodes, the memory blocks for the nodes could end up
on many different pages. To access the data in the list, the system may have to
swap many pages, which can significantly slow your program.
With multiple heaps, you can specify which heap you allocate from. For example,
you might create a heap specifically for the linked list. The list's memory blocks
and the data they contain would remain close together on fewer pages, reducing the
amount of swapping required.
2- In multithread applications, only one thread can access the heap at a time to
ensure memory is safely allocated and freed. For example, say thread 1 is
allocating memory, and thread 2 has a call to free. Thread 2 must wait until
thread 1 has finished its allocation before it can access the heap. Again, this
can slow down performance, especially if your program does a lot of memory
operations.
If you create a separate heap for each thread, you can allocate from them
concurrently, eliminating both the waiting period and the overhead required to
serialize access to the heap.
3- With a single heap, you must explicitly free each block that you allocate. If you
have a linked list that allocates memory for each node, you have to traverse the
entire list and free each block individually, which can take some time.
If you create a separate heap for that linked list, you can destroy it with a
single call and free all the memory at once.
4- When you have only one heap, all components share it (including the IBM C and
C++ Compilers runtime library, vendor libraries, and your own code). If one
component corrupts the heap, another component might fail. You may have trouble
discovering the cause of the problem and where the heap was damaged.
With multiple heaps, you can create a separate heap for each component, so if
one damages the heap (for example, by using a freed pointer), the others can
continue unaffected. You also know where to look to correct the problem."
Another reason would be a scenario where you need to execute code internally, e.g. running simulation code. By creating your own heap, you can give that heap execution rights, which by default (on Windows) are turned off for security reasons.
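A minimal sketch of that (HEAP_CREATE_ENABLE_EXECUTE is the real Win32 flag; the variable names are illustrative):

#include <windows.h>

// A private heap whose allocations are executable, e.g. for generated code.
HANDLE execHeap = HeapCreate(HEAP_CREATE_ENABLE_EXECUTE, 0, 0);
void* codeBuffer = HeapAlloc(execHeap, 0, 4096); // memory you may execute from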
You have some good thoughts, and this would work for C, but in C++ you have destructors, and it is VERY important that they run.
You can think of all types as having constructors/destructors; it's just that some of them logically "do nothing".
This is about allocators. See the "buddy algorithm", which uses powers of two to align and re-use blocks.
If I allocate 4 bytes somewhere, my allocator might allocate a 4 kB section just for 4-byte allocations. That way I can fit 1024 4-byte things in the block; if I need more, I add another block, and so forth.
Ask it for 4 kB and it won't allocate that from the 4-byte block; it might have a separate block for larger requests.
This means you can keep big things together. If I allocate 17 bytes, then 13 bytes, then 1 byte, and the 13-byte allocation gets freed, I can only put something of <= 13 bytes into that hole.
Hence the buddy system and powers of 2, which are easy to handle using left shifts: if I want a 2.5 kB block, I allocate the smallest power of 2 that will fit (4 kB in this case), so that the slot can afterwards be reused for any item of <= 4 kB.
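A sketch of that rounding step (the function name is mine):

#include <cstddef>

// Round a request up to the next power of two with left shifts, as a
// buddy allocator would (e.g. 2560 bytes -> 4096).
std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;  // each shift doubles the candidate size
    return p;
}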
This is not for garbage collection; it's just about keeping things more compact and neat. Using your own allocator can avoid calls to the OS (depending on the default implementation of new and delete, your compiler's runtime might already do this) and make new/delete very quick.
Heap compacting is very different: you need a list of every pointer that points into your heap, or some way to traverse the entire memory graph (as Java has), so that when you move things around to "compact" the heap, you can update everything that pointed to a moved object with its new location.
The only time I ever used more than one heap was when I wrote a program that would build a complicated data structure. It would have been non-trivial to free the data structure by walking through it and freeing the individual nodes, but luckily for me the program only needed the data structure temporarily (while it performed a particular operation), so I used a separate heap for the data structure so that when I no longer needed it, I could free it with one call to HeapDestroy.

C++ heap organisation - which data structure?

C++ has two main memory types: heap and stack. With stack everything is clear to me, but concerning heap a question remains:
How is heap memory organised? I read about the heap data structure, but that seems not applicable to this heap, because in C++ I can access any data from the heap, not just a minimum or maximum value.
So my question is: how is heap memory organized in C++? When we read "is allocated on the heap" what memory organization do we have in mind?
A common (though certainly not the only) implementation of the free store is a linked list of free blocks of memory (i.e., blocks not currently in use). The heap manager will typically allocate large blocks of memory (e.g., 1 megabyte at a time) from the operating system, then split these blocks up into pieces for use by your code when you use new and such. When you use delete, the block of memory you quit using will be added as a node on that linked list.
There are various strategies for how you use those free blocks. Three common ones are best fit, worst fit and first fit.
In best fit, you try to find a free block that's closest in size to the required allocation. If it's larger than required (typically after rounding the allocation size) you split it into two pieces: one to return for the allocation, and a new (smaller) free block to put back on the list to meet other allocation requests. Although this can seem like a good strategy, it's often problematic. The problem is that when you find the closest fit, the left-over block is often too small to be of much use. After a short time, you end up with a huge number of tiny blocks of free space, none of which is good for much of anything.
Worst fit combats that by instead finding the worst fit among the free blocks -- IOW, the largest block of free space. When it splits that block, what's left over will be as large as possible, maximizing the chance that it'll be useful for other allocations.
First fit just walks through the list of free blocks, and (as you'd guess from the name) uses the first block that's large enough to meet the requirement.
Quite a few also start with a search for an exact fit, and use that in preference to splitting a block.
Quite a few also keep (for example) a number of separate linked lists for different allocation sizes to minimize searching for a block of the right size.
In quite a few cases, the manager also has some code to walk through the list of free blocks, to find any that are right next to each other. If it finds them, it will join the two small blocks into one larger block. Sometimes this is done right when you free/delete a block. More often, it's done lazily, to avoid joining and then re-splitting blocks when/if you're using a lot of blocks of the same size (which is fairly common).
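A sketch of first fit with block splitting, under simplifying assumptions (illustrative types, no alignment handling, coalescing omitted):

#include <cstddef>

// Each free block starts with a header holding its payload size and a
// link to the next free block.
struct FreeBlock {
    std::size_t size;   // usable bytes following this header
    FreeBlock* next;    // next free block in the list
};

void* first_fit(FreeBlock*& head, std::size_t want) {
    FreeBlock** link = &head;
    for (FreeBlock* b = head; b; link = &b->next, b = b->next) {
        if (b->size < want) continue;              // too small, keep walking
        if (b->size >= want + sizeof(FreeBlock)) { // big enough to split
            FreeBlock* rest = reinterpret_cast<FreeBlock*>(
                reinterpret_cast<char*>(b) + sizeof(FreeBlock) + want);
            rest->size = b->size - want - sizeof(FreeBlock);
            rest->next = b->next;
            b->size = want;                        // shrink the taken block
            *link = rest;                          // leftover stays free
        } else {
            *link = b->next;                       // use the whole block
        }
        return b + 1;                              // payload follows header
    }
    return nullptr;                                // nothing fits
}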
Another possibility that's common when dealing with a large number of identically-sized items (especially small ones) is an array of blocks, with a bitset to specify which are free or in use. In this case, you typically keep track of an index into the bitset where the last free block was found. When a block is needed, just search forward from the last index until you find the next one for which the bitset says the block is free.
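A minimal sketch of that bitset scheme (identifiers are mine):

#include <bitset>
#include <cstddef>

// An array of same-sized slots plus a bitset marking which are in use;
// allocation scans forward from where the last search left off.
template <typename T, std::size_t N>
class BitsetPool {
    alignas(T) unsigned char storage[N * sizeof(T)];
    std::bitset<N> inUse;   // false = free
    std::size_t last = 0;   // index where the previous search ended

public:
    T* allocate() {
        for (std::size_t i = 0; i < N; ++i) {
            std::size_t idx = (last + i) % N;  // wrap around the array
            if (!inUse[idx]) {
                inUse[idx] = true;
                last = idx;
                return reinterpret_cast<T*>(storage + idx * sizeof(T));
            }
        }
        return nullptr;                        // pool exhausted
    }
    void deallocate(T* p) {
        std::size_t idx =
            (reinterpret_cast<unsigned char*>(p) - storage) / sizeof(T);
        inUse[idx] = false;
    }
};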
Heap has two main meanings referring to two different concepts:
A data structure for arranging data: a tree.
A pool of available memory, which is managed by allocating and freeing blocks of it.
An introduction to the heap in memory management:
The heap is the other dynamic memory area, allocated/freed by malloc/free and their variants....
Here heap doesn't mean the heap data structure; that memory area is simply termed heap memory. Whenever we allocate memory dynamically, it is allocated in the heap.
When you read "allocated on the heap", that's usually an implementation-specific implementation of C++'s "dynamic storage duration". It means you've used new to allocate memory, and have a pointer to it that you now have to keep track of til you delete (or delete[]) it.
As for how it's organized, there's no one set way of doing so. At one point, if memory serves, the "heap" used to actually be a min-heap (organized by block size). That's implementation specific, though, and doesn't have to be that way. Any data structure will do, some better than others for given circumstances.

Designing and coding a non-fragmentizing static memory pool

I have heard the term before and I would like to know how to design and code one.
Should I use the STL allocator if available?
How can it be done on devices with no OS?
What are the tradeoffs between using it and using the regular compiler-implemented malloc/new?
I would suggest making sure you actually need a non-fragmenting memory allocator before you put much effort into writing your own. The one provided by the standard library is usually sufficient.
If you need one, the general idea for reducing fragmentation is to grab large blocks of memory at once and allocate from the pool, rather than asking the OS sporadically for heap memory at widely varying places within the heap, interspersed with many other objects of varying sizes. Since the author of the specialized memory allocator knows more about the sizes of the objects allocated from the pool and how those allocations occur, the allocator can use the memory more efficiently than a general-purpose allocator such as the one provided by the STL.
You can look at memory allocators such as Hoard which while reducing memory fragmentation, also can increase performance by providing thread specific heaps which reduce contention. This can help your application scale more linearly, especially on multi-core platforms.
More info on multi-threaded allocators can be found here.
I'll try to describe what is essentially a memory pool. I'm just typing this off the top of my head, and it's been a while since I've implemented one, so if something is obviously stupid, it's just a suggestion! :)
1.
To reduce fragmentation, you need to create a memory pool that is specific to the type of object you are allocating in it. Essentially, you then restrict the size of each allocation to the size of the object you are interested in. You could implement a templated class which has a list of dynamically allocated blocks (the reason for the list being that you can grow the amount of space available). Each dynamically allocated block would essentially be an array of T.
You would then have a "free" list, a singly linked list where the head points to the next available block. Allocation is then simply returning the head. You could overlay the linked list on the blocks themselves, i.e. each "block" (which represents the aligned size of T) would essentially be a union of T and a node in the linked list: when allocated it's a T, when freed it's a node in the list. !!There are obvious dangers!! Alternatively, you could allocate a separate (and protected) block, which adds more overhead, to hold an array of the addresses in the block.
Allocating is trivial: iterate through the list of blocks and allocate from the first available one. Freeing is also trivial; the additional check you have to do is to find the block from which the object was allocated and then update the head pointer. (Note: you'll need to use either placement new or override operator new/delete in T; there are ways around this, google is your friend.)
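A minimal sketch of that union-based scheme, assuming a single block with at least one slot (identifiers are mine):

#include <cstddef>
#include <new>

// Each slot is a union of a free-list link and raw storage for a T;
// placement new and explicit destructor calls manage the T lifetimes.
template <typename T>
class ObjectPool {
    union Slot { Slot* next; alignas(T) unsigned char obj[sizeof(T)]; };
    Slot* block;     // one block; a list of blocks would let the pool grow
    Slot* freeHead;  // head of the free list

public:
    explicit ObjectPool(std::size_t count)  // count must be >= 1
        : block(new Slot[count]), freeHead(block) {
        for (std::size_t i = 0; i + 1 < count; ++i)
            block[i].next = &block[i + 1];  // thread the free list
        block[count - 1].next = nullptr;
    }
    ~ObjectPool() { delete[] block; }

    T* allocate() {                         // pop the head of the free list
        if (!freeHead) return nullptr;
        Slot* s = freeHead;
        freeHead = s->next;
        return new (s->obj) T();            // placement-new a T in the slot
    }
    void free(T* p) {                       // run the destructor, recycle slot
        p->~T();
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = freeHead;
        freeHead = s;
    }
};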
The "static" I believe implies a singleton memory pool for all objects of type T. The downside is that for each T you have to have a separate memory pool. You could be smart, and have a single object that manages pools of different size (using an array of pointers to pool objects where the index is the size of the object for example).
The whole point of the previous paragraph is to outline exactly how complex this is, and like RC says above, be sure you need it before you do it - as it is likely to introduce more pain than may be necessary!
2.
If the STL allocator meets your needs, use it. It's designed by some very smart people who know what they are doing. However, it is for the generic case, and if you know how your objects are allocated, you could make the above perform faster.
3.
You need to be able to allocate memory somehow (hardware support or some sort of HAL, whatever); otherwise I'm not sure how your program would work.
4.
The regular malloc/new does a lot more stuff under the covers (google is your friend; my answer is already an essay!). The simple allocator I describe above isn't re-entrant; of course you could wrap it with a mutex to provide a bit of cover, and even then I would hazard that the simple allocator would perform orders of magnitude faster than normal malloc/free.
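A sketch of that mutex wrapping, assuming some single-threaded pool type like the ones sketched above:

#include <mutex>

// Serialize every entry point of a single-threaded pool with one mutex,
// trading some speed for thread safety. 'Pool' stands for any type
// providing allocate()/deallocate() like the earlier sketches.
template <typename Pool>
class LockedPool {
    Pool pool;
    std::mutex m;

public:
    void* allocate() {
        std::lock_guard<std::mutex> lock(m);  // one thread at a time
        return pool.allocate();
    }
    void deallocate(void* p) {
        std::lock_guard<std::mutex> lock(m);
        pool.deallocate(p);
    }
};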
But if you're at this stage of optimization - presumably you've exhausted the possibility of optimizing your algorithms and data structure usage?