I'm trying to implement the freelist algorithm to allocate memory. The two functions I'm trying to write can be described as shown below.
// allocates a block of memory of at least size words and returns the address of that memory or 0 if no memory could be allocated.
int64_t *mymalloc(int64_t size)
// deallocates the memory stored at addr. the address will either be one allocated by mymalloc or the value 0.
void myfree(int64_t *addr)
The implementations of these functions should only use memory returned by the function pool(), whose signature is described below. Thus it cannot use the functions new, delete, malloc, calloc, realloc, etc.
// pool is a function that returns the address of a beginning // of a block of RAM that may be used for dynamic memory
// allocation. The size of the pool in bytes is stored in the // first word, which can be assumed to be a multiple of 8.
// When pool() is called, the first word isn't always overwritten with its size.
// Each word is an int64_t *, and so is 8 bytes.
// Assume this function works.
int64_t *pool();
I think defining some global variables like freelst, which points to the start of the freelst, may be helpful. It can be defined as
int64_t *freelst = pool();
I know that when allocating memory, there are some steps to follow:
The free list pointer should be updated accordingly.
The number of allocated blocks should be incremented.
The amount of memory allocated should be subtracted from the first word of the freelist, so that the first word always stores the size of memory available.
One needs to check if the current block of memory has been previously freed.
When deallocating memory, one needs to ensure addresses are inserted into the freelist in increasing order so that neighbours can be determined. If neighbours (which differ by 8) are free, they need to be merged, and as many times as necessary until no free neighbours are encountered to reduce fragmentation. Also, the second word of the freelst should be a pointer to the next word of the free
Below is some code I've come up with for this problem. It's incomplete, but the basic ideas are there.
#include <iostream>
#include <cstdint>
#include "pool.h" // place where pool is defined
const int NODE_SIZE = 8;
int64_t *freelst = pool();
int64_t *start_of_pool = freelst; // just keep this fixed I guess
// assume that the pool function works.
int64_t *mymalloc(int64_t size) {
int64_t *currentBlock = freelst;
while (currentBlock) {
if (*currentBlock >= size) { // if the currentBlock is large enough, set it to this value (we're doing first fit).
break;
}
currentBlock = currentBlock + 1; // since incrementing involves moving to the address that's one word past the current one.
}
// assuming we've found a large enough block, we now have to allocate it
if (currentBlock == 0) {
return 0; // I think this should occur because not enough memory was found
}
int64_t *prev_val = freelst; // save the previous value of the freelist
freelst = freelst + 1 + *currentBlock; // assuming *currentBlock is the size of currentBlock.
*freelst -= NODE_SIZE + *currentBlock; // update the size of the freelst here (though likely this was done incorrectly)
return currentBlock + 1; // return address one word after currentBlock
// is this all if we're trying to implement a linked list using raw pointers?
// I don't think so, but I'm not sure what else to add.
}
void myfree(int64_t *p) {
if (p == 0) {
return; // of course if we're freeing a nullptr, we should return 0.
}
// assume the freelst is already in ascending order of course.
// sort the freelst in linear time by positioning the currentBlock into the right place.
// the basic idea is to use insertion sort.
// find where the address p is in the free list.
// I think another method would be to update the prevBlock as the currentBlock is being updated.
int64_t *currentBlock = freelst;
int64_t *prevBlock = freelst;
while (currentBlock != 0 && currentBlock + 1 <= p) { // comparing addresses
prevBlock = currentBlock; // so it's set to the previous block
currentBlock = (int64_t *)*(currentBlock + 1); // set it to the next address
// as a linked list, I'm thinking of doing something like:
// prevBlock = currentBlock;
// currentBlock = currentBlock->next;
}
// after exiting, either currentBlock = 0, in which case p is the largest address,
// or currentBlock + 1 > p, so it's smaller than the current address.
if (currentBlock == 0) { // then p is the largest address
if ((int64_t *)*(prevBlock + 1) != currentBlock) throw std::invalid_argument("A likely error occurred as prevBlock + 1 != currentBlock.");
*(prevBlock + 1) = (int64_t)p;
*(p + 1) = 0;
// p->next = 0
// prevBlock->next = p;
} else {
if (prevBlock == currentBlock) { // in this case currentBlock was the start of the freelst
int64_t *temp = (int64_t *)*(currentBlock + 1);
*(prevBlock + 1) = (int64_t)p; // cast so it passes type-checking
*(p + 1) = (int64_t)temp;
// here I'm trying to mimic what's done for a linked list:
// int64_t *temp = currentBlock->next;
// prevBlock->next = p;
// p->next = temp;
} else {
*(prevBlock + 1) = (int64_t)p;
*(p + 1) = (int64_t)currentBlock;
// here's what I think might be the equivalent for a linked list:
// prevBlock->next = p;
// p->next = currentBlock;
}
}
if (currentBlock != 0) { // if it not null
if (currentBlock + 1 + *currentBlock == (int64_t *)(currentBlock + 1)) { // check if currentBlock is adjacent to prevBlock
*currentBlock += *(int64_t *)*(currentBlock + 1) + NODE_SIZE;
}
// link current block to next next block
*(currentBlock + 1) = (int64_t)((int64_t *)*(currentBlock + 1) + 1);
}
// assuming sorting was done correctly, check if addresses are adjacent
if (prevBlock + 1 + *prevBlock == currentBlock) { // if you add one word plus the size of the previous block to get the
// currentBlock
if (currentBlock == 0) throw std::invalid_argument("A likely error occurred. currentBlock was 0 even though it should have been defined.");
*prevBlock += *currentBlock + NODE_SIZE; // add the sizes of both the currentBlock and previous block,
// assuming they aren't null of course.
// so currentBlock->next->size + NODE_SIZE;
// link previous block to next block
*(prevBlock + 1) = (int64_t)(currentBlock + 1);
}
}
Any help as to how to implement these functions/cases to consider that I've missed with code that deals with them would be appreciated. I can also clarify things if necessary.
I tried looking at this website for some help too, but I'm still having issues.
how to implement a memory allocator
At high level, there are essentially two ways to acquire memory for a custom allocator:
Allocate memory using an implementation defined way. The exact details depend on the target system, so first step is to find out what system you are targeting.
Or allocate memory using a standard way (standard allocator, new, malloc, static storage, ...)
Once you've acquired the memory, you need some data structure to keep track of memory that has been allocated through the allocator. You seem to have roughly described the "free list" structure, which is commonly used for this purpose.
Related
Suppose I have some global:
std::atomic_int next_free_block;
and a number of threads each with access to a
std::atomic_int child_offset;
that may be shared between threads. I would like to allocate free blocks to child offsets in a contiguous manner, that is, I want to perform the following operation atomically:
if (child_offset != 0) child_offset = next_free_block++;
Obviously the above implementation does not work as multiple threads may enter the body of the if statement and then try to assign different blocks to child_offset.
I have also considered the following:
int expected = child_offset;
do {
if (expected == 0) break;
int updated = next_free_block++;
} while (!child_offset.compare_exchange_weak(&expected, updated);
But this also doesn't work because if the CAS fails, the side effect of incrementing next_free_block remains even if nothing is assigned to child_offset. This leaves gaps in the allocation of free blocks.
I am aware that I could do this with a mutex (or some kind of spin lock) around each child_offset and potentially DCLP, but I would like to know if this is possible to implement efficiently with atomic operations.
The use case for this is as follows: I have a large tree that I'm building in parallel. The tree is an array of the following:
struct tree_page {
atomic<uint32_t> allocated;
uint32_t child_offset[8];
uint32_t nodes[1015];
};
The tree is built level by level: first the nodes at depth 0 are created, then at depth 1, etc. A separate thread is dispatched for each non-leaf node at the previous step. If no more space is left in a page, a new page is allocated from the global next_free_page which points to the first unused page in the array of struct tree_page and is assigned to an element of child_ptr. A bit field is then set in the node word that indicates which element of the child_ptr array should be used to find the node's children.
The code I am trying to write looks like this:
int expected = allocated.load(relaxed), updated;
do {
updated = expected + num_children;
if (updated > NODES_PER_PAGE) {
expected = -1; break;
}
} while (!allocated.compare_exchange_weak(&expected, updated));
if (expected != -1) {
// successfully allocated in the same page
} else {
for (int i = 0; i < 8; ++i) {
// this is the operation I would like to be atomic
if (child_offset[i] == 0)
child_offset[i] = next_free_block++;
int offset = try_allocating_at_page(pages[child_offset[i]]);
if (offset != -1) {
// successfully allocated at child_offset i
// ...
break;
}
}
}
As far as I understood from you description you array of child_offset is filled with 0 initially and then filled with some concrete values concurrently by different threads.
In this case you can atomically "tag" value first and if you are successful assign valid value. Something like this:
constexpr int INVALID_VALUE = -1;
for (int i = 0; i < 8; ++i) {
int expected = 0;
// this is the operation I would like to be atomic
if (child_offset[i].compare_exchange_weak(expected, INVALID_VALUE)) {
child_offset[i] = next_free_block++;
}
// Not sure if this is needed in your environment, but just in case
if (child_offset[i] == INVALID_VALUE) continue;
...
}
This doesn't guarantee that all values in child_offset array will be in ascending order. But if you need that why not fill it without multithreading involved?
I want to successfully allocate an Array in my Memory Manager. I am having a hard time getting the data setup successfully in my Heap. I don't know how to instantiate the elements of the array, and then set the pointer that is passed in to that Array. Any help would be greatly appreciated. =)
Basically to sum it up, I want to write my own new[#] function using my own Heap block instead of the normal heap. Don't even want to think about what would be required for a dynamic array. o.O
// Parameter 1: Pointer that you want to pointer to the Array.
// Parameter 2: Amount of Array Elements requested.
// Return: true if Allocation was successful, false if it failed.
template <typename T>
bool AllocateArray(T*& data, unsigned int count)
{
if((m_Heap.m_Pool == nullptr) || count <= 0)
return false;
unsigned int allocSize = sizeof(T)*count;
// If we have an array, pad an extra 16 bytes so that it will start the data on a 16 byte boundary and have room to store
// the number of items allocated within this pad space, and the size of the original data type so in a delete call we can move
// the pointer by the appropriate size and call a destructor(potentially a base class destructor) on each element in the array
allocSize += 16;
unsigned int* mem = (unsigned int*)(m_Heap.Allocate(allocSize));
if(!mem)
{
return false;
}
mem[2] = count;
mem[3] = sizeof(T);
T* iter = (T*)(&(mem[4]));
data = iter;
iter++;
for(unsigned int i = 0; i < count; ++i,++iter)
{
// I have tried a bunch of stuff, not sure what to do. :(
}
return true;
}
Heap Allocate function:
void* Heap::Allocate(unsigned int allocSize)
{
Header* HeadPtr = FindBlock(allocSize);
Footer* FootPtr = (Footer*)HeadPtr;
FootPtr = (Footer*)((char*)FootPtr + (HeadPtr->size + sizeof(Header)));
// Right Split Free Memory if there is enough to make another block.
if((HeadPtr->size - allocSize) >= MINBLOCKSIZE)
{
// Create the Header for the Allocated Block and Update it's Footer
Header* NewHead = (Header*)FootPtr;
NewHead = (Header*)((char*)NewHead - (allocSize + sizeof(Header)));
NewHead->size = allocSize;
NewHead->next = NewHead;
NewHead->prev = NewHead;
FootPtr->size = NewHead->size;
// Create the Footer for the remaining Free Block and update it's size
Footer* NewFoot = (Footer*)NewHead;
NewFoot = (Footer*)((char*)NewFoot - sizeof(Footer));
HeadPtr->size -= (allocSize + HEADANDFOOTSIZE);
NewFoot->size = HeadPtr->size;
// Turn new Header and Old Footer High Bits On
(NewHead->size |= (1 << 31));
(FootPtr->size |= (1 << 31));
// Return actual allocated memory's location
void* MemAddress = NewHead;
MemAddress = ((char*)MemAddress + sizeof(Header));
m_PoolSizeTotal = HeadPtr->size;
return MemAddress;
}
else
{
// Updating descriptors
HeadPtr->prev->next = HeadPtr->next;
HeadPtr->next->prev = HeadPtr->prev;
HeadPtr->next = NULL;
HeadPtr->prev = NULL;
// Turning Header and Footer High Bits On
(HeadPtr->size |= (1 << 31));
(FootPtr->size |= (1 << 31));
// Return actual allocated memory's location
void* MemAddress = HeadPtr;
MemAddress = ((char*)MemAddress + sizeof(Header));
m_PoolSizeTotal = HeadPtr->size;
return MemAddress;
}
}
Main.cpp
int* TestArray;
MemoryManager::GetInstance()->CreateHeap(1); // Allocates 1MB
MemoryManager::GetInstance()->AllocateArray(TestArray, 3);
MemoryManager::GetInstance()->DeallocateArray(TestArray);
MemoryManager::GetInstance()->DestroyHeap();
As far as these two specific points:
Instantiate the elements of the array
Set the pointer that is passed in to that Array.
For (1): there is no definitive notion of "initializing" the elements of the array in C++. There are at least two reasonable behaviors, this depends on the semantics you want. The first is to simply zero the array (see memset). The other would be to call the default constructor for each element of the array -- I would not recommend this option as the default (zero argument) constructor may not exist.
EDIT: Example initialization using inplace-new
for (i = 0; i < len; i++)
new (&arr[i]) T();
For (2): It is not exactly clear what you mean by "and then set the pointer that is passed in to that Array." You could "set" the memory returned as data = static_cast<T*>(&mem[4]), which you already do.
A few other words of cautioning (having written my own memory managers), be very careful about byte alignment (reinterpret_cast(mem) % 16); you'll want to ensure you are returning points that are word (or even 16 byte) aligned. Also, I would recommend using inttypes.h to explicitly use uint64_t to be explicit about sizing -- current it looks like this allocator will break for >4GB allocations.
EDIT:
Speaking from experiment -- writing a memory allocator is a very difficult thing to do, and it is even more painful to debug. As commenters have stated, a memory allocator is specific to the kernel -- so information about your platform would be very helpful.
I am using a ringbuffer to hold samples for a streaming audio application. I copied the ringbuffer implementation from Ken Greenebaum's Audio Anecdotes 2 book.
After running Intel's Vtune analyzer on my code, it tells me that most of the time is being spent in the functions getSamplesAvailable() and getSpaceAvailable().
Can anyone advise as to how I might optimise these functions?
RingBuffer::getSamplesAvailable(void)
{
int count = (mTail - mHead + mSize) % mSize;
return(count);
}
unsigned int RingBuffer::getSpaceAvailable(void)
{
int free = (mHead - mTail + mSize - 1)%mSize;
int underMark = mHighWaterMark - getSamplesAvailable();
int spaceAvailable = min(underMark, free);
return(spaceAvailable);
}
int RingBuffer::push(int value)
{
int status = 1;
if(getSpaceAvailable()) {
// next two operations do NOT have to be atomic!
// do NOT have to worry about collision with _tail
mBuffer[mTail] = value; // store value
mTail = ++mTail % mSize; // increment tail
} else {
status = 0;
}
return(status);
}
int RingBuffer::pop(int *value)
{
int status = 1;
if(getSamplesAvailable()) {
*value = mBuffer[mHead];
mHead = ++mHead % mSize; // increment head
} else {
status = 0;
}
return(status);
}
If you can make mSize a power of two, you can replace
(mTail - mHead + mSize) % mSize
by
(mTail - mHead) & (mSize-1)
and
(mHead - mTail + mSize - 1) % mSize
by
(mHead - mTail - 1) & (mSize - 1)
I think the problem is not their complexity, they are just basic integer arithmetic, but how many times they are called.
Is there any possibility of doing "batch" (inserting or retrieving various values at once) updates on the buffer? That way you could save some calculations.
Using a power of two as Henrik proposed is the first thing to do. There is also the possibility to change the way you code the mTail and mHead indexes. Instead of keeping them in the [0, mSize[ range, you can let them run freely as uint32_t.
When accessing an element you will need to do a modulo mSize which will slow down each access.
mBuffer[mTail % mSize] = value;
But it will simpify for instance the count of samples (even if your indexes wrap over the uint32_t max value):
int count = mTail - mHead;
It will also allow you to fully use the ring buffer, instead of loosing one element to differentiate the cases where the buffer is full or empty.
If speed is the most important thing for you and you can live with the fact that it is a) non portable (only Windows, although linux has the same basic functionality as well so that should work there as well) and b) only works in release builds (well has more to do with how VC++ allocates memory in debug mode - probably there's some compile flag for this?) you can use the following:
DWORD size = 64 * 1024; // HAS to be a multiple of 64k due to how win allocates memory
HANDLE mapped_memory = CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, size, NULL);
int *p1 = (int*)MapViewOfFile(mapped_memory, FILE_MAP_WRITE, 0, 0, size);
int *p2 = (int*)MapViewOfFile(mapped_memory, FILE_MAP_WRITE, 0, 0, size);
// p1 and p2 should be adjacent in memory, if not try again.. no idea if there's some
// better method under windows
Basically you now have two adjacent memory blocks in virtual memory that point to the same physical memory. Ie if you write through pdw1 you'll see the changes in pdw2 and vice-versa.
The advantage is that you can now more efficiently read and write to the buffer and also larger amounts than only one word at a time. You just have to decrement the pointers correctly - shouldn't be too hard to implement.
Edit: Now see that - there's even a POSIX implementation on wiki.
I am working on a query processor that reads in long lists of document id's from memory and looks for matching id's. When it finds one, it creates a DOC struct containing the docid (an int) and the document's rank (a double) and pushes it on to a priority queue. My problem is that when the word(s) searched for has a long list, when I try to push the DOC on to the queue, I get the following exception:
Unhandled exception at 0x7c812afb in QueryProcessor.exe: Microsoft C++ exception: std::bad_alloc at memory location 0x0012ee88..
When the word has a short list, it works fine. I tried pushing DOC's onto the queue in several places in my code, and they all work until a certain line; after that, I get the above error. I am completely at a loss as to what is wrong because the longest list read in is less than 1 MB and I free all memory that I allocate. Why should there suddenly be a bad_alloc exception when I try to push a DOC onto a queue that has a capacity to hold it (I used a vector with enough space reserved as the underlying data structure for the priority queue)?
I know that questions like this are almost impossible to answer without seeing all the code, but it's too long to post here. I'm putting as much as I can and am anxiously hoping that someone can give me an answer, because I am at my wits' end.
The NextGEQ function reads a list of compressed blocks of docids block by block. That is, if it sees that the lastdocid in the block (in a separate list) is larger than the docid passed in, it decompresses the block and searches until it finds the right one. Each list starts with metadata about the list with the lengths of each compressed chunk and the last docid in the chunk. data.iquery points to the beginning of the metadata; data.metapointer points to wherever in the metadata the function currently is; and data.blockpointer points to the beginning of the block of uncompressed docids, if there is one. If it sees that it was already decompressed, it just searches. Below, when I call the function the first time, it decompresses a block and finds the docid; the push onto the queue after that works. The second time, it doesn't even need to decompress; that is, no new memory is allocated, but after that time, pushing on to the queue gives a bad_alloc error.
Edit: I cleaned up my code some more so that it should compile. I also added in the OpenList() and NextGEQ functions, although the latter is long, because I think the problem is caused by a heap corruption somewhere in it. Thanks a lot!
struct DOC{
long int docid;
long double rank;
public:
DOC()
{
docid = 0;
rank = 0.0;
}
DOC(int num, double ranking)
{
docid = num;
rank = ranking;
}
bool operator>( const DOC & d ) const {
return rank > d.rank;
}
bool operator<( const DOC & d ) const {
return rank < d.rank;
}
};
struct listnode{
int* metapointer;
int* blockpointer;
int docposition;
int frequency;
int numberdocs;
int* iquery;
listnode* nextnode;
};
void QUERYMANAGER::SubmitQuery(char *query){
listnode* startlist;
vector<DOC> docvec;
docvec.reserve(20);
DOC doct;
//create a priority queue to use as a min-heap to store the documents and rankings;
priority_queue<DOC, vector<DOC>,std::greater<DOC>> q(docvec.begin(), docvec.end());
q.push(doct);
//do some processing here; startlist is a pointer to a listnode struct that starts the //linked list
//point the linked list start pointer to the node returned by the OpenList method
startlist = &OpenList(value);
listnode* minpointer;
q.push(doct);
//start by finding the first docid in the shortest list
int i = 0;
q.push(doct);
num = NextGEQ(0, *startlist);
q.push(doct);
while(num != -1)
{
q.push(doct);
//the is where the problem starts - every previous q.push(doct) works; the one after
//NextGEQ(num +1, *startlist) gives the bad_alloc error
num = NextGEQ(num + 1, *startlist);
//this is where the exception is thrown
q.push(doct);
}
}
//takes a word and returns a listnode struct with a pointer to the beginning of the list
//and metadata about the list
listnode QUERYMANAGER::OpenList(char* word)
{
long int numdocs;
//create a new node in the linked list and initialize its variables
listnode n;
n.iquery = cache -> GetiList(word, &numdocs);
n.docposition = 0;
n.frequency = 0;
n.numberdocs = numdocs;
//an int pointer to point to where in the metadata you are
n.metapointer = n.iquery;
n.nextnode = NULL;
//an int pointer to point to the uncompressed block of data, if there is one
n.blockpointer = NULL;
return n;
}
int QUERYMANAGER::NextGEQ(int value, listnode& data)
{
int lengthdocids;
int lengthfreqs;
int lengthpos;
int* temp;
int lastdocid;
lastdocid = *(data.metapointer + 2);
while(true)
{
//if it's not the first chunk in the list, the blockpointer will be pointing to the
//most recently opened block and docpos to the current position in the block
if( data.blockpointer && lastdocid >= value)
{
//if the last docid in the chunk is >= the docid we're looking for,
//go through the chunk to look for a match
//the last docid in the block is in lastdocid; keep going until you hit it
while(*(data.blockpointer + data.docposition) <= lastdocid)
{
//compare each docid with the docid passed in; if it's greater than or equal to it, return a pointer to the docid
if(*(data.blockpointer + data.docposition ) >= value)
{
//return the next greater than or equal docid
return *(data.blockpointer + data.docposition);
}
else
{
++data.docposition;
}
}
//read through the whole block; couldn't find matching docid; increment metapointer to the next block;
//free the block's memory
data.metapointer += 3;
lastdocid = *(data.metapointer + 3);
free(data.blockpointer);
data.blockpointer = NULL;
}
//reached the end of a block; check the metadata to find where the next block begins and ends and whether
//the last docid in the block is smaller or larger than the value being searched for
//first make sure that you haven't reached the end of the list
//if the last docid in the chunk is still smaller than the value passed in, move the metadata pointer
//to the beginning of the next chunk's metadata; read in the new metadata
while(true)
// while(*(metapointers[index]) != 0 )
{
if(lastdocid < value && *(data.metapointer) !=0)
{
data.metapointer += 3;
lastdocid = *(data.metapointer + 2);
}
else if(*(data.metapointer) == 0)
{
return -1;
}
else
//we must have hit a chunk whose lastdocid is >= value; read it in
{
//read in the metadata
//the length of the chunk of docid's is cumulative, so subtract the end of the last chunk
//from the end of this chunk to get the length
//find the end of the metadata
temp = data.metapointer;
while(*temp != 0)
{
temp += 3;
}
temp += 2;
//temp is now pointing to the beginning of the list of compressed data; use the location of metapointer
//to calculate where to start reading and how much to read
//if it's the first chunk in the list,the corresponding metapointer is pointing to the beginning of the query
//so the number of bytes of docid's is just the first integer in the metadata
if( data.metapointer == data.iquery)
{
lengthdocids = *data.metapointer;
}
else
{
//start reading from the offset of the end of the last chunk (saved in metapointers[index] - 3)
//plus 1 = the beginning of this chunk
lengthdocids = *(data.metapointer) - (*(data.metapointer - 3));
temp += (*(data.metapointer - 3)) / sizeof(int);
}
//allocate memory for an array of integers - the block of docid's uncompressed
int* docblock = (int*)malloc(lengthdocids * 5 );
//decompress docid's into the block of memory allocated
s9decompress((int*)temp, lengthdocids /4, (int*) docblock, true);
//set the blockpointer to point to the beginning of the block
//and docpositions[index] to 0
data.blockpointer = docblock;
data.docposition = 0;
break;
}
}
}
}
Thank you very much, bsg.
QUERYMANAGER::OpenList returns a listnode by value. In startlist = &OpenList(value); you then proceed to take the address of the temporary object that's returned. When the temporary goes away, you may be able to access the data for a time and then it's overwritten. Could you just declare a non-pointer listnode startlist on the stack and assign it the return value directly? Then remove the * in front of other uses and see if that fixes the problem.
Another thing you can try is replacing all pointers with smart pointers, specifically something like boost::shared_ptr<>, depending on how much code this really is and how much you're comfortable automating the task. Smart pointers aren't the answer to everything, but they're at least safer than raw pointers.
Assuming you have heap corruption and are not in fact exhausting memory, the commonest way a heap can get corrupted is by deleting (or freeing) the same pointer twice. You can quite easily find out if this is the issue by simply commenting out all your calls to delete (or free). This will cause your program to leak like a sieve, but if it doesn't actually crash you have probably identified the problem.
The other common cause cause of a corrupt heap is deleting (or freeing) a pointer that wasn't ever allocated on the heap. Differentiating between the two causes of corruption is not always easy, but your first priority should be to find out if corruption is actually the problem.
Note this approach won't work too well if the things you are deleting have destructors which if not called break the semantics of your program.
Thanks for all your help. You were right, Neil - I must have managed to corrupt my heap. I'm still not sure what was causing it, but when I changed the malloc(numdocids * 5) to malloc(256) it magically stopped crashing. I suppose I should have checked whether or not my mallocs were actually succeeding! Thanks again!
Bsg
Anyone thought about how to write a memory manager (in C++) that is completely branch free? I've written a pool, a stack, a queue, and a linked list (allocating from the pool), but I am wondering how plausible it is to write a branch free general memory manager.
This is all to help make a really reusable framework for doing solid concurrent, in-order CPU, and cache friendly development.
Edit: by branchless I mean without doing direct or indirect function calls, and without using ifs. I've been thinking that I can probably implement something that first changes the requested size to zero for false calls, but haven't really got much more than that.
I feel that it's not impossible, but the other aspect of this exercise is then profiling it on said "unfriendly" processors to see if it's worth trying as hard as this to avoid branching.
While I don't think this is a good idea, one solution would be to have pre-allocated buckets of various log2 sizes, stupid pseudocode:
class Allocator {
void* malloc(size_t size) {
int bucket = log2(size + sizeof(int));
int* pointer = reinterpret_cast<int*>(m_buckets[bucket].back());
m_buckets[bucket].pop_back();
*pointer = bucket; //Store which bucket this was allocated from
return pointer + 1; //Dont overwrite header
}
void free(void* pointer) {
int* temp = reinterpret_cast<int*>(pointer) - 1;
m_buckets[*temp].push_back(temp);
}
vector< vector<void*> > m_buckets;
};
(You would of course also replace the std::vector with a simple array + counter).
EDIT: In order to make this robust (i.e. handle the situation where the bucket is empty) you would have to add some form of branching.
EDIT2: Here's a small branchless log2 function:
//returns the smallest x such that value <= (1 << x)
int
log2(int value) {
union Foo {
int x;
float y;
} foo;
foo.y = value - 1;
return ((foo.x & (0xFF << 23)) >> 23) - 126; //Extract exponent (base 2) of floating point number
}
This gives the correct result for allocations < 33554432 bytes. If you need larger allocations you'll have to switch to doubles.
Here's a link to how floating point numbers are represented in memory.
The only way I know to create a truly branchless allocator is to reserve all the memory it will potentially use in advance. Otherwise there's always going to be some hidden code somewhere to see if we're exceeding some current capacity whether it's in a hidden push_back in a vector checking if the size exceeds capacity used to implement it or something of that sort.
Here is one such crude example of a fixed alloc which has a completely branchless malloc and free method.
class FixedAlloc
{
public:
FixedAlloc(int element_size, int num_reserve)
{
element_size = max(element_size, sizeof(Chunk));
mem = new char[num_reserve * element_size];
char* ptr = mem;
free_chunk = reinterpret_cast<Chunk*>(ptr);
free_chunk->next = 0;
Chunk* last_chunk = free_chunk;
for (int j=1; j < num_reserve; ++j)
{
ptr += element_size;
Chunk* chunk = reinterpret_cast<Chunk*>(ptr);
chunk->next = 0;
last_chunk->next = chunk;
last_chunk = chunk;
}
}
~FixedAlloc()
{
delete[] mem;
}
void* malloc()
{
assert(free_chunk && free_chunk->next && "Reserve memory exhausted!");
Chunk* chunk = free_chunk;
free_chunk = free_chunk->next;
return chunk->mem;
}
void free(void* mem)
{
Chunk* chunk = static_cast<Chunk*>(mem);
chunk->next = free_chunk;
free_chunk = chunk;
}
private:
union Chunk
{
Chunk* next;
char mem[1];
};
char* mem;
Chunk* free_chunk;
};
Since it's totally branchless, it simply segfaults if you try to allocate more memory than initially reserved. It also has undefined behavior for trying to free a null pointer. I also avoided dealing with alignment for the sake of a simpler example.