Custom malloc implementation - c++

Recently I was asked to implement a very simple malloc with the following restrictions and initial conditions:
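#include <stdlib.h> // malloc, size_t

#define HEAP_SIZE 2048

void* privateHeap;           // the only memory mymalloc may hand out
void* mymalloc(size_t size); // to be implemented
void myfree(void* ptr);      // to be implemented

int main()
{
privateHeap = malloc(HEAP_SIZE + 256); // extra 256 bytes for heap metadata
void* ptr = mymalloc( size_t(750) );
myfree( ptr );
return 0;
}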
I need to implement mymalloc and myfree using exactly the space provided. 256 bytes map neatly onto 2048 bits, so I can keep a bit array recording whether each heap byte is allocated or free. But when myfree is called with ptr, I cannot tell how many bytes were allocated in the first place, and I cannot use any extra bits.
I don't think there is a way around this, but I have been told repeatedly that it can be done. Any suggestions?
EDIT 1:
There are no alignment restrictions; I assumed I am not going to align anything.
There was a demo program that performed a series of mallocs and frees to test this, and none of its memory blocks were small. But that doesn't guarantee anything.
EDIT 2:
The guidelines from the documentation:
Certain Guidelines on your code:
Manage the heap metadata in the private heap; do not create extra linked lists outside of the provided private heap;
Design mymalloc, myrealloc, myFree to work for all possible inputs.
myrealloc should behave like realloc in the C++ library:
void* myrealloc( void* reallocThis, size_t newSize ):
If newSize is bigger than the size of the chunk at reallocThis:
It should first try to allocate a chunk of size newSize in place, so that the new chunk's base pointer is also reallocThis;
If there is no free space available for in-place allocation, it should allocate a chunk of the requested size in a different region,
and then it should copy the contents from the previous chunk.
If the function fails to allocate the requested block of memory, a NULL pointer is returned, and the memory block pointed to by argument reallocThis is left unchanged.
If newSize is smaller, realloc should shrink the chunk and should always succeed.
If newSize is 0, it should work like free.
If reallocThis is NULL, it should work like malloc.
If reallocThis is a pointer that was already freed, then it should fail gracefully by returning NULL.
myFree should not crash when it is passed a pointer that has already been freed.

A common way malloc implementations keep track of the size of memory allocations, so that free knows how big they are, is to store the size in the bytes just before the pointer returned by malloc. Say you only need two bytes to store the length: when the caller of malloc requests n bytes of memory, you actually allocate n + 2 bytes. You then store the length in the first two bytes and return a pointer to the byte just past where you stored the size.
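A minimal sketch of this idea applied to the question's setup, assuming a 2048-byte heap, a 256-byte bitmap with one bit per heap byte, and a 2-byte size header (enough for any request up to 2046 bytes). The names g_private, g_heap and g_used are hypothetical, and this version does not detect double frees.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: 2048 heap bytes followed by a 256-byte bitmap (1 bit per heap byte).
constexpr std::size_t kHeapSize = 2048;
constexpr std::size_t kHeaderSize = sizeof(std::uint16_t);  // 2-byte length prefix
static unsigned char g_private[kHeapSize + kHeapSize / 8];
static unsigned char* const g_heap = g_private;
static unsigned char* const g_used = g_private + kHeapSize;  // bit set = byte in use

static bool isUsed(std::size_t i) { return (g_used[i / 8] >> (i % 8)) & 1u; }
static void setUsed(std::size_t i, bool v) {
    if (v) g_used[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    else   g_used[i / 8] &= static_cast<unsigned char>(~(1u << (i % 8)));
}

void* mymalloc(std::size_t n) {
    if (n == 0 || n > kHeapSize - kHeaderSize) return nullptr;
    const std::size_t total = n + kHeaderSize;  // payload plus the size header
    std::size_t run = 0;                        // length of the current run of free bytes
    for (std::size_t i = 0; i < kHeapSize; ++i) {
        run = isUsed(i) ? 0 : run + 1;
        if (run == total) {
            const std::size_t start = i + 1 - total;
            for (std::size_t j = start; j <= i; ++j) setUsed(j, true);
            const std::uint16_t len = static_cast<std::uint16_t>(n);
            std::memcpy(g_heap + start, &len, kHeaderSize);  // store the size in front
            return g_heap + start + kHeaderSize;             // caller sees only the payload
        }
    }
    return nullptr;  // no free run is long enough
}

void myfree(void* ptr) {
    if (ptr == nullptr) return;
    unsigned char* block = static_cast<unsigned char*>(ptr) - kHeaderSize;
    std::uint16_t len;
    std::memcpy(&len, block, kHeaderSize);  // read the size back from the header
    const std::size_t start = static_cast<std::size_t>(block - g_heap);
    for (std::size_t j = start; j < start + len + kHeaderSize; ++j) setUsed(j, false);
}
```

Two 750-byte requests use 1504 of the 2048 bytes, so a third one fails until one of them is freed.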
As for your algorithm in general, a simple and naive implementation is to keep track of unallocated memory with a linked list of free memory blocks kept in order of their location in memory. To allocate space, you search for a free block that's big enough and then modify the free list to exclude that allocation. To free a block, you add it back to the free list, coalescing adjacent free blocks.
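The free-list approach above might be sketched like this, assuming a 2048-byte static arena and a header that stores each block's total size; ffMalloc, ffFree and Block are illustrative names, not an existing API. The list is kept in address order so that ffFree can merge a freed block with the free blocks directly before and after it.

```cpp
#include <cstddef>
#include <functional>

// Header at the start of every block. Allocated blocks use only `size`; the
// user's data starts where `next` would be. Free blocks use both fields.
struct Block {
    std::size_t size;  // total block size in bytes, header included
    Block* next;       // next free block, in address order
};

constexpr std::size_t kArenaSize = 2048;
alignas(Block) static unsigned char g_arena[kArenaSize];
static Block* g_freeList = nullptr;
static bool g_initialised = false;

static unsigned char* bytes(Block* b) { return reinterpret_cast<unsigned char*>(b); }

void* ffMalloc(std::size_t n) {
    if (!g_initialised) {  // lazily turn the whole arena into one free block
        g_freeList = reinterpret_cast<Block*>(g_arena);
        g_freeList->size = kArenaSize;
        g_freeList->next = nullptr;
        g_initialised = true;
    }
    if (n > kArenaSize) return nullptr;
    // Size header plus payload, rounded up so every block stays Block-aligned,
    // and never smaller than a Block so it can rejoin the free list later.
    std::size_t total = (n + sizeof(std::size_t) + alignof(Block) - 1) / alignof(Block) * alignof(Block);
    if (total < sizeof(Block)) total = sizeof(Block);
    for (Block** link = &g_freeList; *link != nullptr; link = &(*link)->next) {
        Block* b = *link;
        if (b->size < total) continue;           // first fit: skip blocks that are too small
        if (b->size - total >= sizeof(Block)) {  // split; the tail stays on the free list
            Block* rest = reinterpret_cast<Block*>(bytes(b) + total);
            rest->size = b->size - total;
            rest->next = b->next;
            *link = rest;
            b->size = total;
        } else {
            *link = b->next;                     // hand out the whole block
        }
        return bytes(b) + sizeof(std::size_t);
    }
    return nullptr;  // no free block is large enough
}

void ffFree(void* p) {
    if (p == nullptr) return;
    Block* b = reinterpret_cast<Block*>(static_cast<unsigned char*>(p) - sizeof(std::size_t));
    Block* prev = nullptr;
    Block* cur = g_freeList;
    while (cur != nullptr && std::less<Block*>()(cur, b)) {  // keep the list sorted by address
        prev = cur;
        cur = cur->next;
    }
    b->next = cur;
    if (prev != nullptr) prev->next = b; else g_freeList = b;
    if (cur != nullptr && bytes(b) + b->size == bytes(cur)) {  // coalesce with the next block
        b->size += cur->size;
        b->next = cur->next;
    }
    if (prev != nullptr && bytes(prev) + prev->size == bytes(b)) {  // coalesce with the previous block
        prev->size += b->size;
        prev->next = b->next;
    }
}
```

If three 100-byte blocks are freed in the order first, last, middle, the final free merges everything back into one free block, so a single 2000-byte request succeeds again.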
This isn't a good malloc implementation by modern standards, but a lot of old memory allocators worked this way.

You seem to be thinking of the 256 bytes of meta-data as a bit-map to track free/in-use on a byte-by-byte basis.
I'd consider the following as only one possible alternative:
I'd start by treating the 2048-byte heap as 1024 "chunks" of 2 bytes each. This gives you 2 bits of information for each chunk. You can treat the first of those as signifying whether that chunk is in use, and the second as signifying whether the following chunk is part of the same logical block as the current one.
When your free function is called, you use the passed address to find the correct starting point in your bitmap. You then walk through the bits, marking each chunk as free, until you reach (and free) one whose second bit is 0, indicating the end of the current logical block (i.e., the next 2-byte chunk is not part of the current logical block).
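A sketch of the two-bits-per-chunk scheme under the question's constraints (2048-byte heap, 256 bytes of metadata). Bit 2*c marks chunk c as in use, and bit 2*c+1 says the next chunk belongs to the same block; chunkMalloc and chunkFree are hypothetical names. Ignoring a pointer whose first chunk is already free only catches double frees as long as that chunk has not been handed out again.

```cpp
#include <cstddef>

// Hypothetical layout: 1024 two-byte chunks plus a 256-byte map with two bits
// per chunk ("in use" and "next chunk belongs to the same block").
constexpr std::size_t kHeapSize = 2048;
constexpr std::size_t kChunk = 2;
constexpr std::size_t kChunks = kHeapSize / kChunk;  // 1024
static unsigned char g_heap[kHeapSize];
static unsigned char g_map[kChunks * 2 / 8];          // 256 bytes

static bool getBit(std::size_t b) { return (g_map[b / 8] >> (b % 8)) & 1u; }
static void putBit(std::size_t b, bool v) {
    if (v) g_map[b / 8] |= static_cast<unsigned char>(1u << (b % 8));
    else   g_map[b / 8] &= static_cast<unsigned char>(~(1u << (b % 8)));
}
static bool inUse(std::size_t c) { return getBit(2 * c); }
static bool continues(std::size_t c) { return getBit(2 * c + 1); }

void* chunkMalloc(std::size_t n) {
    if (n == 0) return nullptr;
    const std::size_t need = (n + kChunk - 1) / kChunk;  // chunks required
    std::size_t run = 0;                                 // current run of free chunks
    for (std::size_t c = 0; c < kChunks; ++c) {
        run = inUse(c) ? 0 : run + 1;
        if (run == need) {
            const std::size_t first = c + 1 - need;
            for (std::size_t k = first; k <= c; ++k) {
                putBit(2 * k, true);        // chunk is in use
                putBit(2 * k + 1, k != c);  // only the last chunk ends the block
            }
            return g_heap + first * kChunk;
        }
    }
    return nullptr;
}

void chunkFree(void* p) {
    if (p == nullptr) return;
    std::size_t c = static_cast<std::size_t>(static_cast<unsigned char*>(p) - g_heap) / kChunk;
    if (!inUse(c)) return;  // first chunk already free: ignore the call
    for (;;) {
        const bool more = continues(c);  // read before clearing
        putBit(2 * c, false);
        putBit(2 * c + 1, false);
        if (!more) break;
        ++c;
    }
}
```

A 750-byte request occupies 375 chunks, so the next allocation starts 750 bytes further into the heap.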
[Oops: just noticed that Ross Ridge already suggested nearly the same basic idea in a comment.]

Related

Managing a Contiguous Chunk of Memory without Malloc/New or Free/Delete

How would one go about creating a custom MemoryManager to manage a given, contiguous chunk of memory without the aid of other memory managers (such as Malloc/New) in C++?
Here's some more context:
MemManager::MemManager(void* memory, unsigned char totalsize)
{
Memory = memory;
MemSize = totalsize;
}
I need to be able to allocate and free up blocks of this contiguous memory using a MemManager. The constructor is given the total size of the chunk in bytes.
An Allocate function should take in the amount of memory required in bytes and return a pointer to the start of that block of memory. If no memory is remaining, a NULL pointer is returned.
A Deallocate function should take in the pointer to the block of memory that must be freed and give it back to the MemManager for future use.
Note the following constraints:
- Aside from the chunk of memory given to it, the MemManager cannot use ANY dynamic memory
- As originally specified, the MemManager CANNOT use other memory managers to perform its functions, including new/malloc and delete/free
I have been asked this question in several job interviews already, but even hours of researching online did not help me, and I have failed every time. I have found similar implementations, but they all either used malloc/new or were general-purpose and requested memory from the OS, which I am not allowed to do.
Note that I am comfortable using malloc/new and free/delete and have little trouble working with them.
I have tried implementations that use node objects in a LinkedList fashion, pointing to the allocated block of memory and stating how many bytes were used. However, with those implementations I was always forced to create new nodes on the stack and insert them into the list, and as soon as they went out of scope, the entire program broke because the addresses and memory sizes were lost.
If anyone has some sort of idea of how to implement something like this, I would greatly appreciate it. Thanks in advance!
EDIT: I forgot to directly specify this in my original post, but the objects allocated with this MemManager can be different sizes.
EDIT 2: I ended up using homogeneous memory chunks, which was actually very simple to implement thanks to the information provided in the answers below. The exact rules for the implementation itself were not specified, so I divided the memory into 8-byte blocks. If the user requested more than 8 bytes, I could not satisfy the request, but if the user requested fewer than 8 bytes (but more than 0), I would hand out extra memory. If the amount of memory passed in was not divisible by 8, the remainder at the end was wasted, which I suppose is much better than using more memory than you're given.
I have tried implementations that utilize node objects in a LinkedList fashion that point to the block of memory allocated and state how many bytes were used. However, with those implementations I was always forced to create new nodes onto the stack and insert them into the list, but as soon as they went out of scope the entire program broke since the addresses and memory sizes were lost.
You're on the right track. You can embed the LinkedList node in the block of memory you're given with reinterpret_cast<>. Since you're allowed to store variables in the memory manager as long as you don't dynamically allocate memory, you can track the head of the list with a member variable. You might need to pay special attention to object size (Are all objects the same size? Is the object size greater than the size of your linked list node?)
Assuming the answer to both questions is yes, you can then process the block of memory and split it into smaller, object-sized chunks, using a helper linked list that tracks free nodes. Your free-node struct will be something like
struct FreeListNode
{
FreeListNode* Next;
};
When allocating, all you do is remove the head node from the free list and return it. Deallocating is just inserting the freed block of memory into the free list. Splitting the block of memory up is just a loop:
// static_cast only needed if the constructor takes a void pointer; you can't perform pointer arithmetic on void*
char* memoryStart = static_cast<char*>(memory);
char* memoryEnd = memoryStart + totalSize;
// stop when the remaining space is too small for a whole object
for (char* blockStart = memoryStart; static_cast<size_t>(memoryEnd - blockStart) >= objectSize; blockStart += objectSize)
{
FreeListNode* freeNode = reinterpret_cast<FreeListNode*>(blockStart);
freeNode->Next = freeListHead;
freeListHead = freeNode;
}
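Put together, Allocate() and Deallocate() on top of that loop might look like the following sketch. It assumes all objects share one objectSize that is at least sizeof(FreeListNode) and a multiple of its alignment; FixedPool is an illustrative class name.

```cpp
#include <cstddef>

// Fixed-size pool whose free list lives inside the managed memory itself.
class FixedPool {
public:
    FixedPool(void* memory, std::size_t totalSize, std::size_t objectSize) {
        char* start = static_cast<char*>(memory);
        // Push every whole block onto the free list; a partial block at the end is skipped.
        for (std::size_t offset = 0; totalSize - offset >= objectSize; offset += objectSize) {
            auto* node = reinterpret_cast<FreeListNode*>(start + offset);
            node->Next = freeListHead;
            freeListHead = node;
        }
    }

    void* Allocate() {
        if (freeListHead == nullptr) return nullptr;  // pool exhausted
        FreeListNode* node = freeListHead;            // pop the head node
        freeListHead = node->Next;
        return node;
    }

    void Deallocate(void* p) {
        if (p == nullptr) return;
        auto* node = static_cast<FreeListNode*>(p);  // reuse the block as a list node
        node->Next = freeListHead;
        freeListHead = node;
    }

private:
    struct FreeListNode { FreeListNode* Next; };
    FreeListNode* freeListHead = nullptr;
};
```

Because Deallocate() pushes onto the head, the most recently freed block is the next one handed out, which tends to be cache-friendly.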
As you mentioned, the Allocate function takes in the object size, so the above will need to be modified to store metadata. You can do this by including the size of the free block in the free-list node data. This removes the need to split up the initial block, but introduces complexity in Allocate() and Deallocate(). You'll also need to worry about memory fragmentation, because if you don't have a free block with enough memory to store the requested amount, there's nothing you can do other than fail the allocation. A couple of Allocate() algorithms might be:
1) Just return the first available block large enough to hold the request, updating the free block as necessary. This is O(n) in terms of searching the free list, but might not need to search a lot of free blocks and could lead to fragmentation problems down the road.
2) Search the free list for the block that has the smallest amount free in order to hold the memory. This is still O(n) in terms of searching the free list because you have to look at every node to find the least wasteful one, but can help delay fragmentation problems.
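The two search strategies differ only in when they stop. This sketch assumes a hypothetical SizedFreeNode that stores the size of each free block next to the Next pointer, as suggested above; splitting the chosen block is left out.

```cpp
#include <cstddef>

// Hypothetical free-list node that also records the size of its free block.
struct SizedFreeNode {
    std::size_t Size;
    SizedFreeNode* Next;
};

// 1) First fit: stop at the first block that is large enough.
SizedFreeNode* FindFirstFit(SizedFreeNode* head, std::size_t request) {
    for (SizedFreeNode* n = head; n != nullptr; n = n->Next)
        if (n->Size >= request) return n;
    return nullptr;
}

// 2) Best fit: visit every block and keep the smallest one that still fits.
SizedFreeNode* FindBestFit(SizedFreeNode* head, std::size_t request) {
    SizedFreeNode* best = nullptr;
    for (SizedFreeNode* n = head; n != nullptr; n = n->Next)
        if (n->Size >= request && (best == nullptr || n->Size < best->Size)) best = n;
    return best;
}
```

With free blocks of 64, 24 and 48 bytes (in list order), a 20-byte request takes the 64-byte block under first fit but the 24-byte block under best fit.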
Either way, with variable sizes, you have to store metadata for allocations somewhere as well. If you can't dynamically allocate at all, the best place is before or after the user-requested block; you can add features to detect buffer overflows/underflows during Deallocate() if you add padding that is initialized to a known value and check the padding for changes. You can also add a compaction step, as mentioned in another answer, if you want to handle that.
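A sketch of the padding check, assuming a layout of [size][front guard][user bytes][back guard] with 8-byte guards filled with 0xFD; the names and the fill value are arbitrary choices for illustration. Allocate() would reserve GuardedSize(n) bytes and call WriteGuards, and Deallocate() would call GuardsIntact before releasing the block.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical layout of one guarded allocation:
// [userSize][front guard][user bytes][back guard]
constexpr std::size_t kGuardSize = 8;
constexpr unsigned char kGuardByte = 0xFD;  // arbitrary "known value"

// Raw bytes the allocator must reserve for a request of userSize bytes.
constexpr std::size_t GuardedSize(std::size_t userSize) {
    return sizeof(std::size_t) + 2 * kGuardSize + userSize;
}

// Fills in the metadata and both guards; returns the pointer given to the caller.
void* WriteGuards(void* raw, std::size_t userSize) {
    unsigned char* p = static_cast<unsigned char*>(raw);
    std::memcpy(p, &userSize, sizeof userSize);
    unsigned char* user = p + sizeof(std::size_t) + kGuardSize;
    std::memset(user - kGuardSize, kGuardByte, kGuardSize);
    std::memset(user + userSize, kGuardByte, kGuardSize);
    return user;
}

// Meant to be called from Deallocate(): false means a write went past either end.
bool GuardsIntact(const void* userPtr) {
    const unsigned char* user = static_cast<const unsigned char*>(userPtr);
    std::size_t userSize;
    std::memcpy(&userSize, user - kGuardSize - sizeof(std::size_t), sizeof userSize);
    const unsigned char* front = user - kGuardSize;
    const unsigned char* back = user + userSize;
    for (std::size_t i = 0; i < kGuardSize; ++i)
        if (front[i] != kGuardByte || back[i] != kGuardByte) return false;
    return true;
}
```

Writing one byte past the end of a 10-byte block changes the first back-guard byte, which GuardsIntact then reports.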
One final note: you'll have to be careful when adding metadata to the FreeListNode helper struct, as the smallest free block size allowed is sizeof(FreeListNode). This is because you are storing the metadata in the free memory block itself. The more metadata you find yourself needing to store for your internal purposes, the more wasteful your memory manager will be.
When you manage memory, you generally want to use the memory you manage to store any metadata you need. If you look at any of the implementations of malloc (ptmalloc, phkmalloc, tcmalloc, etc...), you'll see that this is how they're generally implemented (neglecting any static data of course). The algorithms and structures are very different, for different reasons, but I'll try to give a little insight into what goes into generic memory management.
Managing homogeneous chunks of memory is different than managing non-homogeneous chunks, and it can be a lot simpler. An example...
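MemoryManager::MemoryManager() {
this->map = std::bitset<count>(); // one bit per chunk; a set bit means "free"
this->mem = malloc(size * count); // count chunks of size bytes each
for (int i = 0; i < count; i++)
this->map.set(i); // every chunk starts out free
}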
Allocating is a matter of finding the next set bit in the std::bitset (the compiler might optimize this), marking the chunk as allocated and returning it. Deallocating just requires calculating the index and marking the chunk as unallocated. A free list is another way (as described in the other answer), but it's a little less memory-efficient and might not use the CPU cache as well.
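A self-contained version of the homogeneous-chunk idea, assuming the caller supplies ChunkSize * ChunkCount bytes instead of calling malloc, to match the question's constraints. The standard std::bitset has no portable find-first-set operation, so this sketch scans the bits linearly; BitsetPool is an illustrative name.

```cpp
#include <bitset>
#include <cstddef>

// Homogeneous chunks tracked by one bit each; a set bit means "free".
template <std::size_t ChunkSize, std::size_t ChunkCount>
class BitsetPool {
public:
    explicit BitsetPool(void* memory) : mem_(static_cast<unsigned char*>(memory)) {
        map_.set();  // every chunk starts out free
    }

    void* Allocate() {
        for (std::size_t i = 0; i < ChunkCount; ++i) {
            if (map_.test(i)) {  // first free chunk
                map_.reset(i);   // mark it as allocated
                return mem_ + i * ChunkSize;
            }
        }
        return nullptr;  // all chunks in use
    }

    void Deallocate(void* p) {
        if (p == nullptr) return;
        const std::size_t index =
            static_cast<std::size_t>(static_cast<unsigned char*>(p) - mem_) / ChunkSize;
        map_.set(index);  // chunk is free again
    }

private:
    unsigned char* mem_;
    std::bitset<ChunkCount> map_;
};
```

The bitmap costs one bit per chunk, which is the memory-efficiency advantage over a free list mentioned above.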
A free list can be the basis for managing non-homogeneous chunks of memory, though. With this, you need to store the size of each chunk, in addition to the next pointer, in the chunk of memory itself. The size lets you split larger chunks into smaller ones. This generally leads to fragmentation, though, since merging chunks is non-trivial. This is why most allocator data structures keep lists of same-sized chunks and try to map requests onto them as closely as possible.

Memory Demands: Heap vs Stack in C++

So I had a strange experience this evening.
I was working on a C++ program that needed to read a long list of simple data objects, approximately 400,000 entries, from a file and store them in main memory. The object itself is something like:
class Entry
{
public:
Entry(int x, int y, int type);
Entry(); ~Entry();
// some other basic functions
private:
int m_X, m_Y;
int m_Type;
};
Simple, right? Well, since I needed to read them from file, I had some loop like
Entry** globalEntries;
globalEntries = new Entry*[totalEntries];
entries = new Entry[totalEntries];// totalEntries read from file, about 400,000
for (int i=0;i<totalEntries;i++)
{
globalEntries[i] = new Entry(.......);
}
That addition increased the program's memory usage by about 25 to 35 megabytes when I tracked it in the Task Manager. A simple change to stack allocation:
Entry* globalEntries;
globalEntries = new Entry[totalEntries];
for (int i=0;i<totalEntries;i++)
{
globalEntries[i] = Entry(.......);
}
and suddenly it only required 3 megabytes. Why is that happening? I know pointer objects have a little extra overhead (4 bytes for the pointer address), but it shouldn't be enough to make THAT much of a difference. Could it be because the program is allocating memory inefficiently and ending up with chunks of unallocated memory in between allocated memory?
Your code is wrong, or I don't see how this worked. With new Entry [count] you create a new array of Entry (type is Entry*), yet you assign it to Entry**, so I presume you used new Entry*[count].
What you did next was to create another new Entry object on the heap and store it in the globalEntries array. So you need memory for 400,000 pointers + 400,000 elements. 400,000 pointers take about 3 MiB of memory on a 64-bit machine. Additionally, you have 400,000 single Entry allocations, each of which requires sizeof(Entry) plus potentially some more memory for the memory manager (it might have to store the size of the allocation, the associated pool, alignment/padding, etc.). This additional bookkeeping memory can quickly add up.
If you change your second example to:
Entry* globalEntries;
globalEntries = new Entry[count];
for (...) {
globalEntries [i] = Entry (...);
}
memory usage should be equal to the stack approach.
Of course, ideally you'll use a std::vector<Entry>.
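With std::vector the elements live in one contiguous heap block, so the only allocator overhead is for that single block. This sketch uses a simplified stand-in for Entry (public members, no default constructor needed) and a hypothetical LoadEntries function with placeholder values where the real code would parse the file; reserve() makes the whole buffer a single allocation up front.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the question's Entry class.
struct Entry {
    Entry(int x, int y, int type) : m_X(x), m_Y(y), m_Type(type) {}
    int m_X, m_Y;
    int m_Type;
};

// Hypothetical loader: one contiguous buffer, no per-element heap allocation.
std::vector<Entry> LoadEntries(std::size_t totalEntries) {
    std::vector<Entry> entries;
    entries.reserve(totalEntries);  // a single allocation for all elements
    for (std::size_t i = 0; i < totalEntries; ++i) {
        // Placeholder values; the real code would read them from the file.
        entries.emplace_back(static_cast<int>(i), static_cast<int>(i), 0);
    }
    return entries;
}
```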
First of all, without specifying exactly which column you were watching, the number in the Task Manager means nothing. On a modern operating system it's difficult even to define what you mean by "used memory": are we talking about private pages? The working set? Only the stuff that stays in RAM? Does reserved but not committed memory count? Who pays for memory shared between processes? Are memory-mapped files included?
If you are watching some meaningful metric, it's impossible to see only 3 MB of memory used: your object is at least 12 bytes (assuming 32-bit integers and no padding), so 400,000 elements need 4,800,000 bytes, about 4.58 MiB. Also, I'd be surprised if it worked with stack allocation: the default stack size in VC++ is 1 MB, so you should already have had a stack overflow.
Anyhow, it is reasonable to expect a different memory usage:
the stack is (mostly) allocated right from the beginning, so that's memory you nominally consume even without really using it for anything (actually virtual memory and automatic stack expansion makes this a bit more complicated, but it's "true enough");
the CRT heap is opaque to the Task Manager: all it sees is the memory given by the operating system to the process, not what the C heap actually has in use; the heap grows (requesting memory from the OS) more than strictly necessary to be ready for further memory requests, so what you see is how much memory it is ready to give away without further syscalls;
your "separate allocations" method has a significant overhead. The all-contiguous array you'd get with new Entry[size] costs size*sizeof(Entry) bytes, plus the heap bookkeeping data (typically a few integer-sized fields); the separated allocations method costs at least size*sizeof(Entry) (size of all the "bare elements") plus size*sizeof(Entry *) (size of the pointer array) plus size+1 multiplied by the cost of each allocation. If we assume a 32 bit architecture with a cost of 2 ints per allocation, you quickly see that this costs size*24+8 bytes of memory, instead of size*12+8 for the contiguous array in the heap;
the heap normally hands out blocks that aren't exactly the size you asked for, because it manages blocks of fixed sizes; so if you allocate single objects like that, you are probably also paying for some extra padding: supposing it uses 16-byte blocks, you are paying 4 extra bytes per element by allocating them separately; this moves our memory estimate to size*28+8, i.e. an overhead of 16 bytes for each 12-byte element.
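The arithmetic in the last two points can be checked at compile time. This sketch encodes the answer's own assumptions (32-bit pointers, a 12-byte Entry, 8 bytes of bookkeeping per allocation, 16-byte minimum blocks); real allocators will differ.

```cpp
#include <cstddef>

// The answer's assumptions, not measured values.
constexpr std::size_t kEntry = 12, kPointer = 4, kPerAlloc = 8, kPadding = 16 - 12;

constexpr std::size_t ContiguousBytes(std::size_t n) { return n * kEntry + kPerAlloc; }

constexpr std::size_t SeparateBytes(std::size_t n, bool padded) {
    return n * kEntry                  // the elements themselves
         + n * kPointer                // the pointer array
         + (n + 1) * kPerAlloc         // n element allocations plus the pointer array
         + (padded ? n * kPadding : 0);
}

static_assert(ContiguousBytes(400000) == 4800008, "size*12+8");
static_assert(SeparateBytes(400000, false) == 9600008, "size*24+8");
static_assert(SeparateBytes(400000, true) == 11200008, "size*28+8");
```

For 400,000 entries that is roughly 4.6 MiB for the contiguous array versus about 10.7 MiB for separate, padded allocations, before counting any heap growth slack.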

Address of start block storage

The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall return the address of the start of a block of storage whose length in bytes shall be at least as large as the requested size.
What does that constraint mean? Could you give an example that violates it?
It seems my question is unclear.
UPD:
Why "at least"? What is the point of allocating more than the requested size? Could you give a suitable example?
The allowance for allocating "more than required" is there to allow:
Good alignment of the next block of data.
Reduce restrictions on what platforms are able to run code compiled from C and C++.
Flexibility in the design of the memory allocation functionality.
An example of point one is:
char *p1 = new char[1];
int *p2 = new int[1];
If we allocate exactly 1 byte at address 0x1000 for the first allocation and follow that immediately with a second allocation of 4 bytes for an int, the int will start at address 0x1001. This is "valid" on some architectures but often leads to a slower load of the value; on other architectures it leads directly to a crash, because an int cannot be accessed at an address that isn't a multiple of 4. Since new doesn't know what the memory is eventually going to be used for, it's best to allocate it at "the highest alignment", which on most architectures means 8 or 16 bytes. (If the memory is used, for example, to store SSE data, it will need an alignment of 16 bytes.)
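Allocators typically handle this by rounding every block up to a fixed alignment. A sketch, assuming power-of-two alignments; AlignUp and NextOffset are hypothetical helpers, while std::max_align_t is the standard type whose alignment is the strictest fundamental alignment.

```cpp
#include <cstddef>

// Rounds n up to the next multiple of alignment (alignment must be a power of two).
constexpr std::size_t AlignUp(std::size_t n, std::size_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}

// Strictest fundamental alignment on this platform (commonly 8 or 16 bytes).
constexpr std::size_t kMaxAlign = alignof(std::max_align_t);

// A hypothetical bump allocator: where the next block starts after a block of
// `size` bytes placed at `offset`, so the next block is suitably aligned for any type.
constexpr std::size_t NextOffset(std::size_t offset, std::size_t size) {
    return AlignUp(offset + size, kMaxAlign);
}
```

With 16-byte alignment, the 1-byte block at 0x1000 pushes the next block to 0x1010 rather than 0x1001, wasting 15 bytes but keeping the int properly aligned.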
The second case would be where "pointers can only point to whole 32-bit words". There have been architectures like that in the past. In this case, even if we ignore the alignment problem above, the memory location specified by a generic pointer has two parts: one for the actual word address, and one for "which byte within that word". In the memory allocator, since typical allocations are much larger than a single byte, we decide to use only the "full word" pointer, so all allocations are by design rounded up to whole words.
The third case, for example, would be a "pre-sized block" allocator. Some real-time OSes, for example, have a fixed number of predefined sizes that they allocate, such as 16, 32, 64, 256, 1024, 16384, 65536, 1M and 16M bytes. Allocations are then rounded up to the nearest equal or larger size, so an allocation of 257 bytes would come from the 1024 size. The idea here is, first, to provide fast allocation by keeping track of the free blocks of each size, rather than the traditional model of searching a large number of blocks of arbitrary sizes for one that is big enough. Second, it helps against fragmentation (when lots of memory is "free" but the wrong size, so it can't be used: for example, if you run a loop that allocates 64-byte blocks until the system is out of memory, then free every other one and try to allocate a 128-byte block, there is not a single 128-byte block free, because ALL of the memory is carved up into little 64-byte sections).
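Mapping a request to its size class might look like this sketch, which uses the size list from this answer (taking 1M and 16M as 2^20 and 2^24 bytes); a real allocator would then pop a block from the free list belonging to the returned class. SizeClassFor is a hypothetical name.

```cpp
#include <array>
#include <cstddef>

// The size classes listed in the answer; a real RTOS would use its own table.
constexpr std::array<std::size_t, 9> kSizeClasses = {
    16, 32, 64, 256, 1024, 16384, 65536, 1u << 20, 16u << 20};

// Index of the smallest class that can hold the request, or -1 if none can.
constexpr int SizeClassFor(std::size_t request) {
    for (std::size_t i = 0; i < kSizeClasses.size(); ++i)
        if (request <= kSizeClasses[i]) return static_cast<int>(i);
    return -1;
}
```

A 257-byte request lands in the 1024-byte class, so 767 bytes of that block are internal waste traded for speed and less external fragmentation.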
It means that the allocation function shall return an address of a block of memory whose size is at least the size you requested.
Most allocation functions, however, actually obtain a memory block larger than the one you requested, and subsequent allocations return addresses inside this block until its end is reached.
The main reasons for this behavior are:
Minimize the number of new memory block allocations (each block can contain several allocations), which are expensive in terms of time complexity.
Specific alignment issues.
The two most common reasons why allocation returns a block larger than requested are:
Alignment
Bookkeeping
This may be because in modern operating systems it is much more efficient to allocate a whole memory page (or several) at once. The internal malloc machinery can allocate such a page and fill its beginning with service information: how many sub-blocks it is divided into, their sizes, and so on. Only after that does it return to you an address suitable for your needs. The next call to malloc may, for instance, return another portion of the same page. The block is not actually restricted to the size you requested: in fact, you can overflow this heap buffer, because there is no safety mechanism to prevent this sort of activity. Also consider the alignment issues that other responders mentioned above. There are many kinds of memory management strategies; you can look them up if you are interested (Best Fit, First Fit, Last Fit; correct me if I'm wrong).

Why is the heap after array allocation so large

I've got a very basic application that boils down to the following code:
char* gBigArray[200][200][200];
unsigned int Initialise(){
for(int ta=0;ta<200;ta++)
for(int tb=0;tb<200;tb++)
for(int tc=0;tc<200;tc++)
gBigArray[ta][tb][tc]=new char;
return sizeof(gBigArray);
}
The function returns the expected value of 32000000 bytes, which is approximately 30 MB, yet the Windows Task Manager (granted, it's not 100% accurate) reports a Memory (Private Working Set) value of around 157 MB. I've loaded the application into VMMap by Sysinternals and have the following values:
I'm unsure what Image means (listed under Type), but regardless of that, its value is around what I'm expecting. What is really throwing me off is the Heap value, which is where the apparent enormous size comes from.
What I don't understand is why this is. According to this answer, if I've understood it correctly, gBigArray would be placed in the data or bss segment; since each element is an uninitialised pointer, I'm guessing it would be placed in the bss segment. Why, then, would the heap value be so much larger than what is required?
It doesn't sound silly if you know how memory allocators work. They keep track of the allocated blocks so there's a field storing the size and also a pointer to the next block, perhaps even some padding. Some compilers place guarding space around the allocated area in debug builds so if you write beyond or before the allocated area the program can detect it at runtime when you try to free the allocated space.
You are allocating one char at a time, and there is typically a space overhead per allocation.
Allocate the memory in one big chunk (or at least in a few chunks).
Do not forget that char* gBigArray[200][200][200]; allocates space for 200*200*200 = 8000000 pointers, each the size of a word. That is 32 MB on a 32-bit system.
Add another 8000000 chars to that for another 8 MB. Since you are allocating them one by one, the allocator probably can't store them at one byte per item, so they'll probably also take at least a word each, resulting in another 32 MB (on a 32-bit system).
The rest is probably overhead, which is also significant because the allocator must record bookkeeping information, such as the size of each block, for every single allocation.
Owww! My embedded systems stuff would roll over and die if faced with that code. Each allocation has quite a bit of extra info associated with it and is either rounded to a fixed size or managed via a linked-list-type structure. On my system, that 1-char new would become a 64-byte allocation out of a small-object allocator, so that management takes O(1) time. But on other systems, this could easily fragment your memory horribly, make subsequent news and deletes run extremely slowly (O(n), where n is the number of things the allocator tracks), and in general bring doom upon an app over time, as each char would become at least a 32-byte allocation placed in all sorts of cubbyholes in memory, pushing your allocation heap out much further than you might expect.
Do a single large allocation and map your 3D array over it if you need to with a placement new or other pointer trickery.
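A sketch of the single-allocation approach, using std::vector<char> as the one heap block instead of placement new, and computing the row-major offset of (a, b, c) by hand; Grid3D is an illustrative name. For a 200x200x200 grid this is one 8,000,000-byte allocation with no per-element pointers or headers.

```cpp
#include <cstddef>
#include <vector>

// One contiguous allocation viewed as a 3-D array.
// Element (a, b, c) lives at offset (a * dimB + b) * dimC + c (row-major order).
class Grid3D {
public:
    Grid3D(std::size_t dimA, std::size_t dimB, std::size_t dimC)
        : dimB_(dimB), dimC_(dimC), data_(dimA * dimB * dimC) {}  // single zero-filled block

    char& operator()(std::size_t a, std::size_t b, std::size_t c) {
        return data_[(a * dimB_ + b) * dimC_ + c];
    }

    std::size_t bytes() const { return data_.size(); }

private:
    std::size_t dimB_, dimC_;
    std::vector<char> data_;
};
```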
Allocating 1 char at a time is probably more expensive. There is a metadata header per allocation, and 1 byte for a character is smaller than that header, so you might actually save space by doing one large allocation (if possible); that way you avoid the overhead of each individual allocation having its own metadata.
Perhaps this is an issue of memory stride? What size of gaps are between values?
30 MB is for the pointers. The rest is for the storage you allocated with new, which the pointers point to. The implementation is allowed to allocate more than one byte for various reasons, such as aligning on word boundaries or leaving some growing room in case you want it later. If you want 8 MB worth of characters, leave the * off your declaration of gBigArray.
Edited out of the above post into a community wiki post:
As the answers below say, the issue here is I am creating a new char 200^3 times, and although each char is only 1 byte, there is overhead for every object on the heap. It seems creating a char array for all chars knocks the memory down to a more believable level:
char* gBigArray[200][200][200];
char* gCharBlock=new char[200*200*200];
unsigned int Initialise(){
unsigned int mIndex=0;
for(int ta=0;ta<200;ta++)
for(int tb=0;tb<200;tb++)
for(int tc=0;tc<200;tc++)
gBigArray[ta][tb][tc]=&gCharBlock[mIndex++];
return sizeof(gBigArray);
}

delete & new in c++

This may be a very simple question, but please help me.
I want to know what exactly happens when I call new and delete, for example in the code below:
char * ptr=new char [10];
delete [] ptr;
The call to new returns a memory address. Does it allocate exactly 10 bytes on the heap? Where is the information about the size stored? When I call delete on the same pointer, I see in the debugger that a lot of bytes change before and after the 10 bytes.
Is there a header for each new that contains the number of bytes allocated by new?
Thanks a lot
Does it allocate exactly 10 bytes?
That's implementation-dependent. The guarantee is "at least 10 chars".
Where is the information about the size stored?
That's implementation-dependent.
Is there a header for each new that contains the number of bytes allocated by new?
That's implementation-dependent.
By "that's implementation-dependent" I mean it's not defined in the standard.
That's all up to the compiler and your runtime library. Only the effects new and delete have on your program are exactly defined; how exactly these are achieved is not specified.
In your case it seems like a little more memory than requested is allocated and it will probably store management information like the size of the current chunk of memory, information about adjacent areas of free space or information to help the debugger try to detect buffer overflows and similar problems.
It is completely implementation-dependent. In the general case, you have to store the number of elements elsewhere. The implementation must allocate enough space for at least the number of elements specified, but it can allocate more.
Is there a header for each new that contains the number of bytes allocated by new?
That's platform-dependent, but yes, on many platforms there is.
Precisely: according to the standard, new char[10] will allocate at least 10 bytes on the heap.
The internals of new and delete are implementation-dependent, so they will vary from compiler to compiler and from platform to platform. Additionally, you can find a variety of allocator algorithms (e.g. TCMalloc).
I'll give you an overview of how it could work internally, but don't take it as absolute truth. It's written solely for the purpose of this explanation.
In short, the new operator internally invokes malloc. malloc uses a really long linked list of available memory blocks, aka the free chain. When malloc is invoked, it searches this list for the first block that's big enough to hold the requested size. It then splits that block into two parts, one with the size you requested and the other with the rest, which is added back to the free chain. Finally, it returns the block of the requested size.
The inverse occurs in a free call, which is invoked by delete/delete[]: it puts the provided block back on the free chain.
There could be fancy tricks in the processes described above, like sorting the free chain, rounding the requested size up to the next power of two to reduce memory fragmentation, and so on.
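On many implementations you can observe the new-to-malloc step directly by replacing the global allocation functions, which the standard allows. In this sketch, g_requested is hypothetical instrumentation that counts bytes requested (including any allocations made by the standard library itself); the default operator new[] and operator delete[] forward to the replaced operator new and operator delete, so new char[10] and delete[] also pass through them.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Bytes requested through the replaceable global allocation functions.
std::size_t g_requested = 0;

void* operator new(std::size_t size) {
    g_requested += size;
    if (size == 0) size = 1;  // new must return a unique pointer even for 0 bytes
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

Comparing g_requested before and after new char[10] shows a request of at least 10 bytes; whatever header the allocator adds stays hidden inside malloc.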
char * ptr=new char [10];
You are creating an array of 10 characters on the heap and storing the address of the 0th element in a pointer. This is similar to doing a malloc in C.
delete [] ptr;
You are deleting (freeing) the heap memory that was allocated by the earlier statement. This is similar to doing a free in C.
It is implementation-dependent, but the metadata for a block of memory is usually stored in the area just before the returned memory address. The change you observed before the 10 bytes was likely metadata being updated for this block (likely the size of the block being written into the metadata), and the change after the 10 bytes was likely metadata being updated for the next block (still unallocated, likely the pointer to the next chunk on the free list).
It is not a good idea to mess with the heap as it is not portable. However, if you want to do such heap magic, I suggest you implement your own memory pools (just get a large chunk of memory from the heap and manage it yourself). A possible place to start would be to look at libmm.
While the specifics are implementation-dependent, one piece of information the implementation will need to store is the number of elements in the array. Or, if it does not store it directly, it will need to derive it accurately from the allocated block size.
The reason for this is that if an array of objects is allocated with new[], then when it is deleted with delete[], the destructor of each object in the array must be called, so delete[] needs to know how many objects to destroy. This is why it is necessary to match new with delete and new[] with delete[].
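The destructor calls can be observed with a type that counts them. In this sketch Tracked and CreateAndDestroy are illustrative names; note that delete[] is never told the element count. On common ABIs such as the Itanium C++ ABI, the count for types with non-trivial destructors is kept in a hidden "array cookie" stored in front of the elements.

```cpp
#include <cstddef>

// Counts destructor calls so we can see that delete[] destroys every element.
struct Tracked {
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Allocates n objects with new[] and releases them with the matching delete[].
void CreateAndDestroy(std::size_t n) {
    Tracked* arr = new Tracked[n];
    delete[] arr;  // runs ~Tracked() n times without being told n
}
```

Using plain delete here instead of delete[] would be undefined behavior, which is why the forms must match.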