Checking whether malloc'd memory is contiguous - C++

I am implementing a memory-pool-type class. One of its methods allocates B bytes of memory and returns a void pointer to it, while internally handling buffers and moving older memory around to ensure that all memory managed by the object during its lifetime is contiguous (similar to how a std::vector keeps a pre-allocated buffer and, once the buffer runs out, allocates a larger one and copies the contents from the old buffer to the new one so that all memory stays contiguous). My question is: how do I ensure, or check for, all the allocated memory being contiguous? Say I wish to jump from object to object manually, using
reinterpret_cast<desired_type*>(static_cast<char*>(buffer_pointer) + N)
This method will naturally fail if an object's location is offset by some amount that isn't just the sum of the sizes of the previous objects. I am new to writing custom memory pools, so I am wondering: how do I either ensure that the allocated memory is non-fragmented, or find the location of each new fragment so that I can still manually index through a block of malloc()-ed memory? Thank you.

If I understand your question, you're asking whether you can have multiple calls to malloc return contiguous memory.
The answer is no: memory will not be contiguous across multiple mallocs, as most memory managers put head/tail data around each allocated block, both for their own management and as protection markers around the edges to detect overruns. The details are heavily implementation-dependent.
For your own memory management, you need to allocate a big enough block with malloc and then split it up and manage the internals yourself.
You can look at this github proj as an example of the management required:
https://github.com/bcorriveau/memblock
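A minimal sketch of that approach (all names here are illustrative, not taken from the linked project): grab one block up front and hand out pieces with a bump pointer.

#include <cstddef>
#include <cstdlib>

// A minimal bump allocator over a single malloc'd block.
class Arena
{
public:
    explicit Arena(std::size_t capacity)
        : base(static_cast<char*>(std::malloc(capacity))), size(capacity), used(0) {}
    ~Arena() { std::free(base); }

    // Hands out the next 'bytes' bytes; returns nullptr once the block is exhausted.
    void* Allocate(std::size_t bytes)
    {
        if (base == nullptr || size - used < bytes)
            return nullptr;
        void* p = base + used;
        used += bytes;
        return p;
    }

private:
    char* base;
    std::size_t size;
    std::size_t used;
};

A real pool would also round used up to a suitable alignment before each allocation, and would need a strategy (free list, size classes, ...) if individual blocks can be returned.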

Related

Writing a memory manager and defragmenting memory

The idea is to write a memory manager that allocates a bunch of memory at a time, to minimize malloc and free calls. I've tried writing this myself twice, but both times I ran into the problem of defragmenting memory.
You could just check every so often whether a block is empty, and delete it if so. But say your blocks are 100 bytes each. First you allocate 20 bytes of memory; this creates a new 100-byte block, because no blocks exist yet. Then you allocate 80 bytes, which fills the first block. Then you allocate another 20 bytes, which creates a second block, because the first one is full. Now you free the second allocation (the 80 bytes), and you are left with two blocks of which only 20 bytes each are used. That means 100 bytes are allocated that could be freed by moving the 20 bytes from the second block into the first and deleting the second block.
These are the problems I ran into:
- You can't move the memory around, because all pointers into that memory would have to be updated, and for that you'd need to know their addresses, which you don't.
- 100 bytes is a very small block size. What if I want to store even a low-res 64x64 ARGB image in memory? That's 16 KB, and moving all of it around might be slower than not writing a memory manager at all.
Is it even worth it writing a custom memory manager after all that?
Is it even worth it writing a custom memory manager after all that?
This is asking for an opinion, but I'll try to give a factual answer.
The memory allocators that come with most operating systems and language support libraries are generally very high quality and are designed to address the types of problems you encountered (fragmentation and performance) as well as others. They are about as good as general purpose memory allocators can be.
You can do (a little) better than the provided memory allocator if your application has a particular allocation pattern that can be exploited. That's rare, but you can generally take advantage of it by making something substantially simpler than a general purpose memory manager.
you can't move the memory around
True. Most modern systems don't even try to move memory around; instead, they try to avoid fragmentation in the first place (typically by clustering similarly sized allocations).
Old systems (ones without virtual memory managers) sometimes used memory managers that had an extra layer of indirection. Instead of returning a pointer to the allocated memory, the allocator would return a "handle", which could be as simple as an index into a table maintained by the memory manager. When the user actually wanted to access the memory, they would "lock" it. The memory manager was free to move any memory that wasn't locked (e.g., to eliminate fragmentation), because the handles provided the extra level of indirection.
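A toy version of that handle scheme, with all names hypothetical, might look like this:

#include <cstddef>
#include <vector>

// The manager may relocate a block only while its lock count is zero;
// handles stay valid across relocation because only Entry::ptr changes.
class HandleAllocator
{
public:
    using Handle = std::size_t; // an index into the table, not a raw pointer

    void* Lock(Handle h)   // pin the block and obtain its current address
    {
        ++table[h].lockCount;
        return table[h].ptr;
    }

    void Unlock(Handle h)  // after this, the compactor may move the block again
    {
        --table[h].lockCount;
    }

private:
    struct Entry { void* ptr; int lockCount; };
    std::vector<Entry> table;
};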
what if I want to store a very low-res 64x64 ARGB image
Most memory managers provide a range of cell sizes, so a large allocation wouldn't be split across n smaller blocks. Most will punt very large allocations to the system allocator, which, on a virtual-memory operating system, can generally handle them unless the process address space is badly fragmented.

Sequential memory allocation

I'm planning an application that allocates a lot of variables in memory. Unlike a "regular" application, I want this memory to be allocated in specific memory blocks of 4096 bytes. My allocated variables must be placed in memory sequentially, one after another, in order to fill the whole allocated block.
For example, I allocate a region of 4096 bytes, and this region is ready for further use. From then on, each time my application creates a new variable in memory (which a "regular" application would probably do with malloc), this variable is placed in the free space of my memory region.
This sequential memory allocation is similar to how an array allocation works. But in my case, I need an array that is able to contain many types of data (string, byte, int, ...).
One possible way to achieve this is pointer arithmetic. I want to avoid this method, as it may introduce a lot of bugs into my application.
Maybe someone solved this problem before?
Thank you!
malloc() by no means guarantees that subsequently allocated blocks are at sequential memory addresses. Even worse, most implementations use a small number of bytes before and/or after each allocated block for "housekeeping". This means that even if you're lucky and the addresses are sequential, there will be small gaps between the blocks, because the actual allocated blocks are slightly bigger to make room for those housekeeping bytes.
As you suggest, you'll need to write some code yourself: a few functions wrapping malloc(), realloc(), and so on. You can hide all the logic in these functions, and your application code using them should be no more complex than it would be if malloc() did what you wanted.
Important questions: why do you need these blocks to be adjacent to each other? And what about freeing blocks?
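As a sketch of what such a wrapper could look like (the class and member names are hypothetical), here is a sequential allocator over one fixed 4096-byte region that also handles alignment, which matters as soon as you mix types like byte, int, and string data:

#include <cstddef>
#include <memory> // std::align

class Region
{
public:
    void* Allocate(std::size_t bytes, std::size_t alignment)
    {
        // std::align bumps 'cursor' to the next suitably aligned address
        // and shrinks 'remaining' accordingly, or returns nullptr if the
        // request no longer fits in the region.
        if (std::align(alignment, bytes, cursor, remaining) == nullptr)
            return nullptr; // region is full
        void* p = cursor;
        cursor = static_cast<char*>(cursor) + bytes;
        remaining -= bytes;
        return p;
    }

private:
    alignas(std::max_align_t) char buffer[4096];
    void* cursor = buffer;
    std::size_t remaining = sizeof(buffer);
};

A caller would then write, e.g., int* n = static_cast<int*>(region.Allocate(sizeof(int), alignof(int))); instead of calling malloc.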

Managing a Contiguous Chunk of Memory without Malloc/New or Free/Delete

How would one go about creating a custom MemoryManager to manage a given, contiguous chunk of memory without the aid of other memory managers (such as Malloc/New) in C++?
Here's some more context:
MemManager::MemManager(void* memory, unsigned char totalsize)
{
Memory = memory;
MemSize = totalsize;
}
I need to be able to allocate and free up blocks of this contiguous memory using a MemManager. The constructor is given the total size of the chunk in bytes.
An Allocate function should take in the amount of memory required in bytes and return a pointer to the start of that block of memory. If no memory is remaining, a NULL pointer is returned.
A Deallocate function should take in the pointer to the block of memory that must be freed and give it back to the MemManager for future use.
Note the following constraints:
-Aside from the chunk of memory given to it, the MemManager cannot use ANY dynamic memory
-As originally specified, the MemManager CANNOT use other memory managers to perform its functions, including new/malloc and delete/free
I have received this question on several job interviews already, but even hours of researching online did not help me and I have failed every time. I have found similar implementations, but they have all either used malloc/new or were general-purpose and requested memory from the OS, which I am not allowed to do.
Note that I am comfortable using malloc/new and free/delete and have little trouble working with them.
I have tried implementations that use node objects in a LinkedList fashion, where each node points to the block of memory allocated and states how many bytes were used. However, with those implementations I was always forced to create the new nodes on the stack and insert them into the list, and as soon as they went out of scope the entire program broke, since the addresses and memory sizes were lost.
If anyone has some sort of idea of how to implement something like this, I would greatly appreciate it. Thanks in advance!
EDIT: I forgot to directly specify this in my original post, but the objects allocated with this MemManager can be different sizes.
EDIT 2: I ended up using homogeneous memory chunks, which was actually very simple to implement thanks to the information provided in the answers below. The exact rules regarding the implementation itself were not specified, so I separated the memory into blocks of 8 bytes. If the user requested more than 8 bytes, I would be unable to provide it, but if the user requested fewer than 8 bytes (but more than 0) I would hand out extra memory. If the amount of memory passed in was not divisible by 8, there would be wasted memory at the end, which I suppose is much better than using more memory than you're given.
I have tried implementations that use node objects in a LinkedList fashion, where each node points to the block of memory allocated and states how many bytes were used. However, with those implementations I was always forced to create the new nodes on the stack and insert them into the list, and as soon as they went out of scope the entire program broke, since the addresses and memory sizes were lost.
You're on the right track. You can embed the LinkedList node in the block of memory you're given with reinterpret_cast<>. Since you're allowed to store variables in the memory manager as long as you don't dynamically allocate memory, you can track the head of the list with a member variable. You might need to pay special attention to object size (Are all objects the same size? Is the object size greater than the size of your linked list node?)
Assuming the answers to the previous questions to be true, you can then process the block of memory and split it off into smaller, object sized chunks using a helper linked list that tracks free nodes. Your free node struct will be something like
struct FreeListNode
{
FreeListNode* Next;
};
When allocating, all you do is remove the head node from the free list and return it. Deallocating is just inserting the freed block of memory into the free list. Splitting the block of memory up is just a loop:
// static_cast is needed because the constructor takes a void pointer and
// pointer arithmetic can't be performed on void*
char* memoryEnd = static_cast<char*>(memory) + totalSize;
for (char* blockStart = static_cast<char*>(memory); blockStart < memoryEnd; blockStart += objectSize)
{
    FreeListNode* freeNode = reinterpret_cast<FreeListNode*>(blockStart);
    // push this chunk onto the front of the free list
    freeNode->Next = freeListHead;
    freeListHead = freeNode;
}
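For the fixed-size case, Allocate and Deallocate then reduce to list operations. A sketch, assuming freeListHead is a member of the MemManager class from the question:

void* MemManager::Allocate()
{
    if (freeListHead == nullptr)
        return nullptr; // no free blocks left
    FreeListNode* node = freeListHead;
    freeListHead = node->Next; // pop the head of the free list
    return node;
}

void MemManager::Deallocate(void* block)
{
    FreeListNode* node = reinterpret_cast<FreeListNode*>(block);
    node->Next = freeListHead; // push the block back onto the free list
    freeListHead = node;
}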
As you mentioned that the Allocate function takes in the object size, the above will need to be modified to store metadata. You can do this by including the size of the free block in the free list node data. This removes the need to split up the initial block, but introduces complexity in Allocate() and Deallocate(). You'll also need to worry about memory fragmentation, because if no free block has enough memory to store the requested amount, there's nothing you can do other than fail the allocation. A couple of Allocate() algorithms might be:
1) First fit: return the first available block large enough to hold the request, updating the free block as necessary. This is O(n) in terms of searching the free list, though it may not need to search many free blocks; it can lead to fragmentation problems down the road.
2) Best fit: search the free list for the smallest free block that can hold the request. This is still O(n) in terms of searching the free list, because you have to look at every node to find the least wasteful one, but it can help delay fragmentation problems.
Either way, with variable sizes, you have to store metadata for allocations somewhere as well. If you can't dynamically allocate at all, the best place is before or after the user-requested block; you can add features to detect buffer overflows/underflows during Deallocate() if you add padding initialized to a known value and check it for changes. You can also add a compaction step as mentioned in another answer, if you want to handle that.
One final note: you'll have to be careful when adding metadata to the FreeListNode helper struct, as the smallest free block size allowed is sizeof(FreeListNode). This is because you are storing the metadata in the free memory block itself. The more metadata you find yourself needing to store for your internal purposes, the more wasteful your memory manager will be.
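One common layout for that per-allocation metadata is a small header placed immediately before the pointer handed to the user. A sketch, with illustrative names:

#include <cstddef>

struct BlockHeader
{
    std::size_t size; // how many bytes the user was given
};

// Inside Allocate(): stamp the header, return the bytes just past it.
void* AttachHeader(char* raw, std::size_t userSize)
{
    reinterpret_cast<BlockHeader*>(raw)->size = userSize;
    return raw + sizeof(BlockHeader);
}

// Inside Deallocate(): step back from the user pointer to recover the header.
BlockHeader* HeaderOf(void* userPtr)
{
    return reinterpret_cast<BlockHeader*>(
        static_cast<char*>(userPtr) - sizeof(BlockHeader));
}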
When you manage memory, you generally want to use the memory you manage to store any metadata you need. If you look at any of the implementations of malloc (ptmalloc, phkmalloc, tcmalloc, etc...), you'll see that this is how they're generally implemented (neglecting any static data of course). The algorithms and structures are very different, for different reasons, but I'll try to give a little insight into what goes into generic memory management.
Managing homogeneous chunks of memory is different than managing non-homogeneous chunks, and it can be a lot simpler. An example...
// 'count' chunks of 'size' bytes each; both are assumed to be
// compile-time constants. A set bit in 'map' marks a free chunk.
MemoryManager::MemoryManager() {
    this->mem = malloc(size * count);
    this->map.set(); // every chunk starts out free
}
Allocating is a matter of finding the next set bit in the std::bitset (which the compiler may optimize well), marking that chunk as allocated, and returning its address. Deallocation just requires calculating the chunk index from the pointer and marking it unallocated. A free list is another option (it's what's described in the other answer), but it's a little less memory-efficient and might not use the CPU cache as well.
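A sketch of the corresponding Allocate/Deallocate, using the map and mem members above (a set bit marks a free chunk):

void* MemoryManager::Allocate()
{
    for (std::size_t i = 0; i < count; i++) {
        if (map.test(i)) {                 // first free chunk wins
            map.reset(i);                  // mark it allocated
            return static_cast<char*>(mem) + i * size;
        }
    }
    return nullptr; // every chunk is in use
}

void MemoryManager::Deallocate(void* p)
{
    // recover the chunk index from the pointer, then mark it free again
    std::size_t index = (static_cast<char*>(p) - static_cast<char*>(mem)) / size;
    map.set(index);
}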
A free list can be the basis for managing non-homogeneous chunks of memory, though. With this, you need to store the size of each chunk in the chunk of memory itself, in addition to the next pointer. The size lets you split larger chunks into smaller ones. This generally leads to fragmentation, though, since merging chunks is non-trivial. This is why most implementations keep lists of same-sized chunks and try to map each request to the closest size.

Is there any benefit to use multiple heaps for memory management purposes?

I am a student of a system software faculty. Now I'm developing a memory manager for Windows. Here's my simple implementation of malloc() and free():
HANDLE heap = HeapCreate(0, 0, 0);
void* hmalloc(size_t size)
{
return HeapAlloc(heap, 0, size);
}
void hfree(void* memory)
{
HeapFree(heap, 0, memory);
}
int main()
{
int* ptr1 = (int*)hmalloc(100*sizeof(int));
int* ptr2 = (int*)hmalloc(100*sizeof(int));
int* ptr3 = (int*)hmalloc(100*sizeof(int));
hfree(ptr2);
hfree(ptr3);
hfree(ptr1);
return 0;
}
It works fine. But I can't understand whether there is a reason to use multiple heaps. I can allocate memory in the heap and get the address of an allocated memory chunk, but here I use ONE heap. Is there a reason to use multiple heaps? Maybe for multi-threaded/multi-process applications? Please explain.
The main reason for using multiple heaps/custom allocators is better memory control. Usually, after lots of new/delete calls, memory gets fragmented and the application loses performance (and also consumes more memory). Using memory in a more controlled environment can reduce heap fragmentation.
Another usage is preventing memory leaks in the application: you can just free the entire heap you allocated, without having to bother with freeing every object allocated from it.
Another usage is for tightly packed allocations: if you have, for example, a list, you can allocate all its nodes from a smaller dedicated heap, and the application gains performance because there are fewer cache misses when iterating the nodes.
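For example, on Windows a list could keep all of its nodes in a private heap and release them with one call. A sketch using the documented HeapCreate/HeapAlloc/HeapDestroy API (the list functions are made-up names):

#include <windows.h>

struct Node { Node* next; int value; };

HANDLE listHeap = HeapCreate(0, 0, 0); // dedicated, growable heap for the nodes

Node* Prepend(Node* head, int value)
{
    Node* n = (Node*)HeapAlloc(listHeap, 0, sizeof(Node));
    n->next = head;
    n->value = value;
    return n; // nodes stay packed together in their own heap
}

void DestroyList()
{
    HeapDestroy(listHeap); // frees every node at once, no traversal needed
    listHeap = NULL;
}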
Edit: memory management is a hard topic, however, and in some cases it is not done right. Andrei Alexandrescu gave a talk at one point in which he said that for some applications, replacing the custom allocator with the default one increased the application's performance.
This is a good link that elaborates on why you may need multiple heaps:
https://caligari.dartmouth.edu/doc/ibmcxx/en_US/doc/libref/concepts/cumemmng.htm
"Why Use Multiple Heaps?
Using a single runtime heap is fine for most programs. However, using multiple
heaps can be more efficient and can help you improve your program's performance
and reduce wasted memory for a number of reasons:
1- When you allocate from a single heap, you may end up with memory blocks on
different pages of memory. For example, you might have a linked list that
allocates memory each time you add a node to the list. If you allocate memory for
other data in between adding nodes, the memory blocks for the nodes could end up
on many different pages. To access the data in the list, the system may have to
swap many pages, which can significantly slow your program.
With multiple heaps, you can specify which heap you allocate from. For example,
you might create a heap specifically for the linked list. The list's memory blocks
and the data they contain would remain close together on fewer pages, reducing the
amount of swapping required.
2- In multithread applications, only one thread can access the heap at a time to
ensure memory is safely allocated and freed. For example, say thread 1 is
allocating memory, and thread 2 has a call to free. Thread 2 must wait until
thread 1 has finished its allocation before it can access the heap. Again, this
can slow down performance, especially if your program does a lot of memory
operations.
If you create a separate heap for each thread, you can allocate from them
concurrently, eliminating both the waiting period and the overhead required to
serialize access to the heap.
3- With a single heap, you must explicitly free each block that you allocate. If you
have a linked list that allocates memory for each node, you have to traverse the
entire list and free each block individually, which can take some time.
If you create a separate heap for that linked list, you can destroy it with a
single call and free all the memory at once.
4- When you have only one heap, all components share it (including the IBM C and
C++ Compilers runtime library, vendor libraries, and your own code). If one
component corrupts the heap, another component might fail. You may have trouble
discovering the cause of the problem and where the heap was damaged.
With multiple heaps, you can create a separate heap for each component, so if
one damages the heap (for example, by using a freed pointer), the others can
continue unaffected. You also know where to look to correct the problem."
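A sketch of the per-thread idea from point 2, using the documented HEAP_NO_SERIALIZE flag; this is safe only because each heap is touched exclusively by its owning thread:

#include <windows.h>

// Each thread gets its own heap; HEAP_NO_SERIALIZE skips the heap's
// internal lock, which is fine since no other thread ever uses it.
thread_local HANDLE threadHeap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);

void* tmalloc(size_t size)
{
    return HeapAlloc(threadHeap, 0, size);
}

void tfree(void* memory)
{
    HeapFree(threadHeap, 0, memory);
}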
A reason would be the scenario where you need to execute code internally, e.g. running simulation code. By creating your own heap, you can give that heap execution rights, which on Windows are turned off by default for security reasons.
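A sketch of that, using the documented HEAP_CREATE_ENABLE_EXECUTE flag:

#include <windows.h>

// Blocks from this heap are executable, so code generated at runtime
// can be copied in and then called through a function pointer.
HANDLE execHeap = HeapCreate(HEAP_CREATE_ENABLE_EXECUTE, 0, 0);
void* codeBuf = HeapAlloc(execHeap, 0, 4096);
// ...copy the generated machine code into codeBuf, then jump to it...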
You have some good thoughts, and this would work for C, but in C++ you have destructors, and it is VERY important that they run.
You can think of all types as having constructors/destructors; it's just that some logically "do nothing".
This is about allocators. See "the buddy algorithm", which uses powers of two to align and reuse storage.
If I allocate 4 bytes somewhere, my allocator might allocate a 4 KB section just for 4-byte allocations. That way I can fit 1024 4-byte things in the block; if I need more, I add another block, and so forth.
Ask it for 4 KB and it won't allocate that from the 4-byte block; it might keep a separate one for larger requests.
This means you can keep big things together. If I allocate 17 bytes, then 13 bytes, then 1 byte, and the 13-byte allocation gets freed, I can only fit something of <= 13 bytes into that hole.
Hence the buddy system and powers of 2, which are easy to handle using left shifts: if I want a 2.5 KB block, I allocate the smallest power of 2 it fits in (4 KB in this case), and that slot can be reused afterwards for any <= 4 KB item.
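The rounding itself is just a few left shifts. A quick sketch:

#include <cstddef>

// Smallest power of two that is >= n (for n >= 1).
std::size_t RoundUpPow2(std::size_t n)
{
    std::size_t p = 1;
    while (p < n)
        p <<= 1; // keep doubling until the request fits
    return p;
}

// RoundUpPow2(2560) == 4096: a 2.5 KB request occupies a 4 KB slot.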
This is not for garbage collection; it's just about keeping things compact and neat. Using your own allocator can avoid calls to the OS (depending on your compiler's default implementation of new and delete, they might already do this) and makes new/delete very quick.
Heap compacting is very different: you need a list of every pointer that points into your heap, or some way to traverse the entire memory graph (as Java does), so that when you move things around to "compact" the heap, you can update everything that pointed to the moved object.
The only time I ever used more than one heap was when I wrote a program that would build a complicated data structure. It would have been non-trivial to free the data structure by walking through it and freeing the individual nodes, but luckily for me the program only needed the data structure temporarily (while it performed a particular operation), so I used a separate heap for the data structure so that when I no longer needed it, I could free it with one call to HeapDestroy.

How will increasing each memory allocation size by a fixed number of bytes affect heap fragmentation?

I have replaced operator new() in my C++ program so that it allocates a slightly bigger block to store extra data. So the program performs exactly the same set of allocations, except that now it requests several more bytes of memory per allocation. Otherwise its behavior is completely the same, and it processes exactly the same data. The program allocates lots of blocks (millions, I suppose) of various sizes during its runtime.
How will increasing each allocation size by a fixed number of bytes (same for every allocation) affect heap fragmentation?
Unless your program uses some "edge" block sizes (say, close to a power of two), I don't see how the block size (or a small difference in block size compared to the program with standard allocation) would affect fragmentation. With millions of allocations, a good allocator fills up the space and manages it efficiently.
Thinking about it the other way around: imagine your program had originally used blocks of the same sizes as the modified-allocator version. Would you worry about memory fragmentation then?
Heaps are normally implemented as linked lists of cells. On application startup there is only one large cell. Your first allocation breaks off a small piece at the beginning to create a new allocated heap cell. Subsequent allocations do the same. After a while, some cells are freed, leaving free holes between the allocated blocks.
After running a while, when you request an allocation, the allocator walks the heap until it finds a free cell of the requested size or bigger. Rounding up to larger cell allocation sizes may require more memory up front, but it increases the likelihood of finding a suitable free cell, meaning that new memory does not have to be added at the end of the heap. This may improve performance.
However, bear in mind that heap operations are expensive and should therefore be minimized. You are most probably allocating and deallocating objects of the same type, and therefore of the same size, so look into using a specialized free list for your objects. This saves the heap operations and thus minimizes fragmentation. The STL has allocators for this very reason.
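A minimal specialized free list along those lines (a sketch, not the STL allocator interface): a freed object's own storage is reused to hold the list link, so the list costs no extra memory.

#include <new>

template <typename T>
class ObjectFreeList
{
    // A slot is either live T storage or, while free, a link in the list.
    union Slot { Slot* next; alignas(T) unsigned char storage[sizeof(T)]; };
    Slot* head = nullptr;

public:
    void* Take()       // reuse a freed slot if one exists, else get a fresh one
    {
        if (head == nullptr)
            return ::operator new(sizeof(Slot));
        Slot* s = head;
        head = s->next;
        return s;
    }

    void Give(void* p) // keep the slot around for the next Take()
    {
        Slot* s = static_cast<Slot*>(p);
        s->next = head;
        head = s;
    }
};

The caller placement-news a T into Take()'s result and runs the destructor by hand before calling Give().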
It depends on the implementation behind the memory allocator. For instance:
On Windows, it pulls memory from the process heap; under XP, this heap is not set to be the low-fragmentation implementation by default, which could really throw a spanner in the works.
Under a bin- or slab-based allocator, your few extra bytes might push a request up to the next block size, wasting memory madly and causing horrible virtual-memory thrashing.
Depending on your memory usage needs, you might be better served by replacing ::new with a custom allocator such as Hoard or nedmalloc.
If your blocks (allocated and deallocated memory) are still smaller than what a C library allocator handles without fragmentation problems, then you should not face any memory fragmentation. For example, take a look at my own question about allocators: Small block allocator on Linux (or RedHat Linux) to avoid memory fragmentation.
In other words: you have implemented your own ::operator new(), and in it you call malloc() with a slightly bigger block size. malloc() lives in the C library, and it is responsible not only for allocating and deallocating memory but also for avoiding memory fragmentation. If you do not frequently allocate and free blocks of sizes bigger than the allocator can handle efficiently, you can expect there to be no memory fragmentation.