lock free arena allocator implementation - correct? - c++

For a simple pointer-increment allocator (do they have an official name?) I am looking for a lock-free algorithm. It seems trivial, but I'd like some feedback on whether my implementation is correct.
Non-thread-safe implementation:
byte * head; // current head of remaining buffer
byte * end;  // end of remaining buffer

void * Alloc(size_t size)
{
    if (end - head < size)
        return 0; // allocation failure
    void * result = head;
    head += size;
    return result; // return the old head (the original "return head;" was a typo)
}
My attempt at a thread-safe implementation:
void * Alloc(size_t size)
{
    byte * current;
    do
    {
        current = head;
        if (end - current < size)
            return 0; // allocation failure
    } while (CMPXCHG(&head, current + size, current) != current);
    return current;
}
where CMPXCHG is an interlocked compare-exchange with (destination, exchangeValue, comparand) arguments, returning the original value
Looks good to me - if another thread allocates between the get-current and cmpxchg, the loop attempts again. Any comments?

Your current code appears to work. It behaves the same as the code below, which is a simple pattern you can use for implementing any lock-free algorithm that operates on a single word of data without side effects:
do
{
    original = *data;               // capture
    result = DoOperation(original); // attempt operation
} while (CMPXCHG(data, result, original) != original);
EDIT: My original suggestion of an interlocked add won't quite work here, because you support attempting an allocation and failing if there is not enough space left. With InterlockedAdd you would already have modified the pointer, causing subsequent allocs to fail.
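For reference, here is a minimal sketch of the same loop written with C++11 std::atomic instead of a raw CMPXCHG intrinsic (assuming the same byte* head / byte* end setup as in the question; the relaxed ordering is my assumption, since the allocator itself publishes no other data):

#include <atomic>
#include <cstddef>

typedef unsigned char byte;

byte* end;               // end of remaining buffer (not written concurrently)
std::atomic<byte*> head; // current head of remaining buffer

void* Alloc(std::size_t size)
{
    byte* current = head.load(std::memory_order_relaxed);
    do
    {
        if (static_cast<std::size_t>(end - current) < size)
            return 0; // allocation failure
        // on CAS failure, compare_exchange_weak reloads 'current' with the
        // fresh head, so the loop revalidates the space check automatically
    } while (!head.compare_exchange_weak(current, current + size,
                                         std::memory_order_relaxed));
    return current;
}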

Related

What are the correct memory orders to use when inserting a node at the beginning of a lock free singly linked list?

I have a simple linked list. There is no danger of the ABA problem, I'm happy with the blocking category, and I don't care whether my list is FIFO, LIFO or randomized, as long as inserting succeeds without making other inserts fail.
The code for that looks something like this:
class Class {
    std::atomic<Node*> m_list;
    ...
};

void Class::add(Node* node)
{
    node->next = m_list.load(std::memory_order_acquire);
    while (!m_list.compare_exchange_weak(node->next, node,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire))
        ;
}
where I filled in the memory_orders more or less at random.
What are the right memory orders to use here?
I've seen people use std::memory_order_relaxed in all places; one person on SO used that too, but with std::memory_order_release for the success case of compare_exchange_weak. The genmc project uses memory_order_acquire / twice memory_order_acq_rel in a comparable situation, but I can't get genmc to work for a test case :(.
Using genmc, the excellent tool from Michalis Kokologiannakis, I was able to verify the required memory orders with the following test code. Unfortunately, genmc currently requires C code, but that doesn't matter for figuring out what the memory orders need to be, of course.
// Install https://github.com/MPI-SWS/genmc
//
// Then test with:
//
//   genmc -unroll 5 -- genmc_sll_test.c

// These header files are replaced by genmc (see /usr/local/include/genmc):
#include <pthread.h>
#include <stdlib.h>
#include <stddef.h>
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define PRODUCER_THREADS 3
#define CONSUMER_THREADS 2

struct Node
{
    struct Node* next;
};

struct Node* const deleted = (struct Node*)0xd31373d;

_Atomic(struct Node*) list;

void* producer_thread(void* node_)
{
    struct Node* node = (struct Node*)node_;

    // Insert node at beginning of the list.
    node->next = atomic_load_explicit(&list, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(&list, &node->next,
            node, memory_order_release, memory_order_relaxed))
        ;

    return NULL;
}

void* consumer_thread(void* param)
{
    // Replace the whole list with an empty list.
    struct Node* head = atomic_exchange_explicit(&list, NULL, memory_order_acquire);

    // Delete each node that was in the list.
    while (head)
    {
        struct Node* orphan = head;
        head = orphan->next;

        // Mark the node as deleted.
        assert(orphan->next != deleted);
        orphan->next = deleted;
    }

    return NULL;
}

pthread_t t[PRODUCER_THREADS + CONSUMER_THREADS];
struct Node n[PRODUCER_THREADS]; // Initially filled with zeroes -->
                                 // none of the Nodes is marked as deleted.

int main()
{
    // Start PRODUCER_THREADS threads that each append one node to the queue.
    for (int i = 0; i < PRODUCER_THREADS; ++i)
        if (pthread_create(&t[i], NULL, producer_thread, &n[i]))
            abort();

    // Start CONSUMER_THREADS threads that each delete all nodes that were added so far.
    for (int i = 0; i < CONSUMER_THREADS; ++i)
        if (pthread_create(&t[PRODUCER_THREADS + i], NULL, consumer_thread, NULL))
            abort();

    // Wait till all threads finished.
    for (int i = 0; i < PRODUCER_THREADS + CONSUMER_THREADS; ++i)
        if (pthread_join(t[i], NULL))
            abort();

    // Count the number of elements still in the list.
    struct Node* l = list;
    int count = 0;
    while (l)
    {
        ++count;
        l = l->next;
    }

    // Count the number of deleted elements.
    int del_count = 0;
    for (int i = 0; i < PRODUCER_THREADS; ++i)
        if (n[i].next == deleted)
            ++del_count;

    assert(count + del_count == PRODUCER_THREADS);
    //printf("count = %d; deleted = %d\n", count, del_count);

    return 0;
}
The output of which is
$ genmc -unroll 5 -- genmc_sll_test.c
Number of complete executions explored: 6384
Total wall-clock time: 1.26s
Replacing either the memory_order_release or the memory_order_acquire with memory_order_relaxed causes an assertion failure.
In fact, it can be checked that using exclusively memory_order_relaxed when just inserting nodes is sufficient to get them all cleanly into the list (although in a "random" order: nothing here is sequentially consistent, so the order in which they end up in the list is not necessarily the order in which the threads tried to add them, if such a correlation exists for other reasons).
However, the memory_order_release is required so that when head is read with memory_order_acquire we can be certain that all non-atomic next pointers are visible in the "consumer" thread.
Note there is no ABA problem here because the values used for head and next cannot be "reused" before they are deleted by the consumer_thread function, which is therefore the only place where these nodes may be deleted, implying that there can be only one consumer thread (this test code does NOT check for the ABA problem, so it also happens to pass with 2 CONSUMER_THREADS).
The actual code is a garbage collection mechanism where multiple "producer" threads add pointers to a singly linked list when those can be deleted, but where it is only safe to actually do so in one specific thread (so in that case there is only one "consumer" thread, which performs this garbage collection at a well-known place in a main loop).
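Translated back to the C++ add() from the question, the orders that the genmc run justifies would look like this (a sketch; the initial load and the failure order can be relaxed because the CAS revalidates node->next anyway):

void Class::add(Node* node)
{
    // The initial load may be relaxed: the CAS below revalidates it.
    node->next = m_list.load(std::memory_order_relaxed);
    // Release on success, so that a consumer that exchanges the list head
    // with memory_order_acquire sees all the non-atomic next pointers.
    while (!m_list.compare_exchange_weak(node->next, node,
                                         std::memory_order_release,
                                         std::memory_order_relaxed))
        ;
}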

What is the difference between user defined stack and built in stack in use of memory?

I want to use a user-defined stack for my program, which has a large number of recursive calls. Would it be useful to define a user-defined stack?
There are a few ways to do this.
Primarily, two:
(1) Use the CPU/processor stack. There are some variants, each with its own limitations.
(2) Or, recode your function(s) to use a "stack frame" struct that simulates a "stack". The actual function ceases to be recursive. This can be virtually limitless, up to whatever the heap will permit.
For (1) ...
(A) If your system permits, you can issue a syscall to extend the process's stack size. There may be limits on how far you can extend it, and you may collide with shared library addresses.
(B) You can malloc a large area. With some [somewhat] intricate inline asm trickery, you can swap this area for the stack [and back again] and call your function with this malloc area as the stack. Doable, but not for the faint of heart ...
(C) An easier way is to malloc a large area. Pass this area to pthread_attr_setstack. Then, run your recursive function as a thread using pthread_create. Note, you don't really care about multiple threads, it's just an easy way to avoid the "messy" asm trickery.
With (A), assuming the stack extend syscall permits, the limit could be all of available memory permitted for stack [up to some system-wide or RLIMIT_* parameter].
With (B) and (C), you have to "guess" and make the malloc large enough before you start. After it has been done, the size is fixed and can not be extended further.
Actually, that's not quite true. Using the asm trickery repeatedly [when needed], you could simulate a near infinite stack. But, IMO, the overhead of keeping track of these large malloc areas is high enough that I'd opt for (2) below.
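As an illustration of (C) above, here is a minimal sketch (the names and the error handling are mine; posix_memalign is used because pthread_attr_setstack may reject an unaligned base address):

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static void* thread_main(void*)
{
    // call the recursive function here; its frames land on the big stack
    return NULL;
}

int run_with_big_stack(size_t stack_size)      // e.g. 256 * 1024 * 1024
{
    void* stack;
    // page-align the base; pthread_attr_setstack may reject odd addresses
    if (posix_memalign(&stack, sysconf(_SC_PAGESIZE), stack_size))
        return -1;

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstack(&attr, stack, stack_size);

    pthread_t tid;
    if (pthread_create(&tid, &attr, thread_main, NULL))
        return -1;
    pthread_join(tid, NULL);                   // wait for the recursion to finish

    free(stack);
    return 0;
}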
For (2) ...
This can literally expand/contract as needed. One of the advantages is that you don't need to guess beforehand at how much memory you'll need. The [pseudo] stack can just keep growing as needed [until malloc returns NULL :-)].
Here is a sample recursive function [treat loosely as pseudo code]:
int
myfunc(int a, int b, int c, int d)
{
    int ret;

    // do some stuff ...

    if (must_recurse)
        ret = myfunc(a + 5, b + 7, c - 6, d + 8);
    else
        ret = 0;

    return ret;
}
Here is that function changed to use a struct as a stack frame [again, loose pseudo code]:
typedef struct stack_frame frame_t;
struct stack_frame {
    frame_t *prev;
    int a;
    int b;
    int c;
    int d;
};

frame_t *free_pool;

#define GROWCOUNT 1000

frame_t *
frame_push(frame_t *prev)
{
    frame_t *cur;

    // NOTE: we can maintain a free pool ...
    while (1) {
        cur = free_pool;
        if (cur != NULL) {
            free_pool = cur->prev;
            break;
        }

        // refill free pool from heap ...
        free_pool = calloc(GROWCOUNT, sizeof(frame_t));
        if (free_pool == NULL) {
            printf("frame_push: no memory\n");
            exit(1);
        }

        // chain the fresh frames together; the last one terminates the pool
        cur = free_pool;
        for (int count = GROWCOUNT - 1; count > 0; --count, ++cur)
            cur->prev = cur + 1;
        cur->prev = NULL;
    }

    if (prev != NULL) {
        *cur = *prev;
        cur->prev = prev;
        cur->a += 5;
        cur->b += 7;
        cur->c -= 6;
        cur->d += 8;
    }
    else
        memset(cur, 0, sizeof(frame_t));

    return cur;
}

frame_t *
frame_pop(frame_t *cur)
{
    frame_t *prev;

    prev = cur->prev;

    // return the popped frame to the free pool
    cur->prev = free_pool;
    free_pool = cur;

    return prev;
}

int
myfunc(void)
{
    int ret;
    frame_t *cur;

    cur = frame_push(NULL);
    // set initial conditions in cur ...

    while (1) {
        // do stuff ...

        if (must_recurse) {
            cur = frame_push(cur);
            must_recurse = 0;
            continue;
        }

        // pop stack
        cur = frame_pop(cur);
        if (cur == NULL)
            break;
    }

    return ret;
}
All functions, objects, variables and user-defined structures use memory that is controlled by the OS and the compiler. So a user-defined stack still lives inside the general memory space the OS gives your process, just like the region reserved for your process's real stack. As a result there is no big difference in where the memory comes from, but you can define an optimized, highly efficient structure that makes much better use of that general memory.

Segmentation fault when freeing a linked list node

LNode * deleteNext (LNode *L) {
    if (L == NULL) { return L; }
    LNode *deleted = L->next;
    L->next = L->next->next;
    //L->next->next = NULL;
    delete deleted;
    return L->next;
}
This is a function to delete the node after the pointed-to node; simple logic. The current code works fine, but if I uncomment the commented line, there is a segmentation fault, which seems weird to me. Thanks in advance.
It is a wrong implementation. What if L->next is NULL? And the commented line crashes whenever the new L->next is NULL (i.e. when you delete the last node), because L->next->next then dereferences a null pointer.
Here is one possible (correct) implementation:
LNode * deleteNext (LNode *L)
{
    if (L == NULL || L->next == NULL) return NULL;

    LNode *deleted = L->next; // L->next is NOT NULL
    L->next = L->next->next;
    //        ^^^^^^^^^^^^^ could be NULL though
    delete deleted;
    return L->next;           // L->next could be NULL here
}
Now it is up to you what you want to return from the function. You could return L instead of L->next, or you could return a std::pair<LNode*, bool> containing L and a boolean indicating whether the delete was done or not.
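For instance, a sketch of that std::pair variant:

#include <utility>

std::pair<LNode*, bool> deleteNext(LNode* L)
{
    if (L == NULL || L->next == NULL)
        return std::make_pair(L, false); // nothing to delete
    LNode* deleted = L->next;
    L->next = deleted->next;
    delete deleted;
    return std::make_pair(L, true);      // deletion succeeded
}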
It all depends how your list's head and tail are implemented. I will assume the last element of the list has its next link set to null (i.e. the list is not a ring closing on itself).
The call is conceptually wrong. You cannot handle a singly-linked list without keeping a reference to its head (first element), unless you use the first element as the head, which is ugly and inefficient.
Also you must decide what to do with the removed element. Deleting it and then returning a pointer to its still-warm corpse is at any rate not the best choice.
I will assume the caller might be interested in retrieving the element (in which case it's the caller that will have to delete it once done using it).
LNode * removeNext (LNode *L)
{
    if (L == NULL) panic ("Caller gave me a null pointer. What was he thinking?");
    // should panic if the caller passes anything but a valid element pointer,
    // be it NULL or 0x12345678

    LNode * removed = L->next;
    if (removed == NULL) return NULL; // L is the end of list: nothing to remove

    L->next = removed->next; // removed does exist, so its next field is valid
    // delete removed;       // use this for the void deleteNext() variant
    return removed;
}
This will be unable to empty the list completely. At least a single element will remain stuck in it (the pseudo-head, so to speak).
Also you will have to initialize the list with the said pseudo-head. Calling removeNext with the pseudo-head is safe; it is equivalent to using the list as a LIFO.
This implementation will not allow an easy use as a FIFO though, since there will be no easy way to maintain a fixed reference to the tail (last element) of the list.
The way I would do it is rather like so:
typedef struct _buffer {
    struct _buffer * next;
    unsigned long    data;
} tBuffer;

typedef struct {
    tBuffer * head;
} tLIST;

/* ---------------------------------------------------------------------
   Put a buffer into a list
   --------------------------------------------------------------------- */
static void list_put (tLIST * mbx, tBuffer * msg)
{
    msg->next = mbx->head;
    mbx->head = msg;
}

/* ---------------------------------------------------------------------
   Get a buffer from a list
   --------------------------------------------------------------------- */
static tBuffer * list_get (tLIST * mbx)
{
    tBuffer * res;

    /* get first message from the mailbox */
    res = mbx->head;
    if (res != NULL)
    {
        /* unlink the buffer */
        mbx->head = res->next;
    }

    return res;
}
(I wrote this back in the mid-90's. Genuine vintage ANSI C. Ah, those were the days...)
It boils down to this: if you're going to implement a singly-linked list, don't try to use it like a random-access data structure. It's inefficient at best, and more often than not a nest of bugs. A singly-linked list can be used as a FIFO or possibly a stack, and that's about it.
std:: templates offer you everything you could dream of in terms of storage structures, and they have been tested and optimized over the last 20 years or so. No man alive (except Donald Knuth maybe) could do better with a design from scratch.

How can I improve the performance of my ring buffer code?

I am using a ringbuffer to hold samples for a streaming audio application. I copied the ringbuffer implementation from Ken Greenebaum's Audio Anecdotes 2 book.
After running Intel's Vtune analyzer on my code, it tells me that most of the time is being spent in the functions getSamplesAvailable() and getSpaceAvailable().
Can anyone advise as to how I might optimise these functions?
unsigned int RingBuffer::getSamplesAvailable(void)
{
    int count = (mTail - mHead + mSize) % mSize;
    return(count);
}

unsigned int RingBuffer::getSpaceAvailable(void)
{
    int free = (mHead - mTail + mSize - 1) % mSize;
    int underMark = mHighWaterMark - getSamplesAvailable();
    int spaceAvailable = min(underMark, free);
    return(spaceAvailable);
}
int RingBuffer::push(int value)
{
    int status = 1;

    if(getSpaceAvailable()) {
        // next two operations do NOT have to be atomic!
        // do NOT have to worry about collision with _tail
        mBuffer[mTail] = value;      // store value
        mTail = (mTail + 1) % mSize; // increment tail (the original
                                     // "mTail = ++mTail % mSize" is undefined behavior)
    } else {
        status = 0;
    }

    return(status);
}

int RingBuffer::pop(int *value)
{
    int status = 1;

    if(getSamplesAvailable()) {
        *value = mBuffer[mHead];
        mHead = (mHead + 1) % mSize; // increment head
    } else {
        status = 0;
    }

    return(status);
}
If you can make mSize a power of two, you can replace
(mTail - mHead + mSize) % mSize
by
(mTail - mHead) & (mSize-1)
and
(mHead - mTail + mSize - 1) % mSize
by
(mHead - mTail - 1) & (mSize - 1)
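For example, getSamplesAvailable then reduces to a single mask (a sketch, assuming mSize is a power of two; with two's-complement wraparound the mask also handles mTail < mHead):

unsigned int RingBuffer::getSamplesAvailable(void)
{
    // the mask replaces the modulo; valid only when mSize is a power of two
    return (mTail - mHead) & (mSize - 1);
}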
I think the problem is not their complexity (they are just basic integer arithmetic) but how many times they are called.
Is there any possibility of doing "batch" updates (inserting or retrieving several values at once) on the buffer? That way you could save some calculations.
Using a power of two as Henrik proposed is the first thing to do. There is also the possibility of changing the way you encode the mTail and mHead indexes. Instead of keeping them in the [0, mSize) range, you can let them run freely as uint32_t.
When accessing an element you will need to do a modulo mSize which will slow down each access.
mBuffer[mTail % mSize] = value;
But it will simplify, for instance, the count of samples (even if your indexes wrap over the uint32_t max value):
int count = mTail - mHead;
It will also allow you to fully use the ring buffer, instead of losing one element to differentiate between the full and empty cases.
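A minimal sketch combining both ideas (power-of-two size plus free-running uint32_t indices; the class and names here are illustrative, not the original RingBuffer):

#include <stdint.h>

class FreeRunningRing {
    static const uint32_t kSize = 1024; // must be a power of two
    int      mBuffer[kSize];
    uint32_t mHead;                     // read index, runs freely
    uint32_t mTail;                     // write index, runs freely
public:
    FreeRunningRing() : mHead(0), mTail(0) {}
    // correct even when mTail has wrapped past UINT32_MAX and mHead hasn't
    uint32_t count() const { return mTail - mHead; }
    bool push(int value) {
        if (count() == kSize) return false;     // full: all kSize slots usable
        mBuffer[mTail++ & (kSize - 1)] = value; // mask instead of modulo
        return true;
    }
    bool pop(int* value) {
        if (count() == 0) return false;         // empty
        *value = mBuffer[mHead++ & (kSize - 1)];
        return true;
    }
};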
If speed is the most important thing for you, and you can live with the fact that it is (a) non-portable (Windows-only, although Linux has the same basic functionality, so it should work there as well) and (b) only works in release builds (this has more to do with how VC++ allocates memory in debug mode; probably there's some compile flag for it?), you can use the following:
DWORD size = 64 * 1024; // HAS to be a multiple of 64k due to how win allocates memory
HANDLE mapped_memory = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                                         PAGE_READWRITE, 0, size, NULL);
int *p1 = (int*)MapViewOfFile(mapped_memory, FILE_MAP_WRITE, 0, 0, size);
int *p2 = (int*)MapViewOfFile(mapped_memory, FILE_MAP_WRITE, 0, 0, size);
// p1 and p2 should be adjacent in memory; if not, try again.. no idea if
// there's some better method under windows
Basically you now have two adjacent memory blocks in virtual memory that point to the same physical memory, i.e. if you write through p1 you'll see the changes in p2 and vice versa.
The advantage is that you can now read from and write to the buffer more efficiently, and in larger amounts than only one word at a time. You just have to decrement (wrap) the pointers correctly; that shouldn't be too hard to implement.
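To illustrate the gain, a sketch of a wrap-free read through the first view (byte-granular index; p1 and size as in the code above):

#include <string.h>

// Bytes [size, 2*size) of the first view alias bytes [0, size), so a read
// that would normally straddle the wrap point becomes one memcpy.
void ring_read(char* dest, const char* p1, size_t size,
               size_t head, size_t n) // requires n <= size
{
    memcpy(dest, p1 + (head % size), n);
}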
Edit: I see now that there's even a POSIX implementation on the wiki.

c++ stl priority queue insert bad_alloc exception

I am working on a query processor that reads in long lists of document id's from memory and looks for matching id's. When it finds one, it creates a DOC struct containing the docid (an int) and the document's rank (a double) and pushes it on to a priority queue. My problem is that when the word(s) searched for has a long list, when I try to push the DOC on to the queue, I get the following exception:
Unhandled exception at 0x7c812afb in QueryProcessor.exe: Microsoft C++ exception: std::bad_alloc at memory location 0x0012ee88..
When the word has a short list, it works fine. I tried pushing DOC's onto the queue in several places in my code, and they all work until a certain line; after that, I get the above error. I am completely at a loss as to what is wrong because the longest list read in is less than 1 MB and I free all memory that I allocate. Why should there suddenly be a bad_alloc exception when I try to push a DOC onto a queue that has a capacity to hold it (I used a vector with enough space reserved as the underlying data structure for the priority queue)?
I know that questions like this are almost impossible to answer without seeing all the code, but it's too long to post here. I'm putting as much as I can and am anxiously hoping that someone can give me an answer, because I am at my wits' end.
The NextGEQ function reads a list of compressed blocks of docids block by block. That is, if it sees that the lastdocid in the block (in a separate list) is larger than the docid passed in, it decompresses the block and searches until it finds the right one. Each list starts with metadata about the list with the lengths of each compressed chunk and the last docid in the chunk. data.iquery points to the beginning of the metadata; data.metapointer points to wherever in the metadata the function currently is; and data.blockpointer points to the beginning of the block of uncompressed docids, if there is one. If it sees that it was already decompressed, it just searches. Below, when I call the function the first time, it decompresses a block and finds the docid; the push onto the queue after that works. The second time, it doesn't even need to decompress; that is, no new memory is allocated, but after that time, pushing on to the queue gives a bad_alloc error.
Edit: I cleaned up my code some more so that it should compile. I also added in the OpenList() and NextGEQ functions, although the latter is long, because I think the problem is caused by a heap corruption somewhere in it. Thanks a lot!
struct DOC{
    long int docid;
    long double rank;

public:
    DOC()
    {
        docid = 0;
        rank = 0.0;
    }

    DOC(int num, double ranking)
    {
        docid = num;
        rank = ranking;
    }

    bool operator>( const DOC & d ) const {
        return rank > d.rank;
    }

    bool operator<( const DOC & d ) const {
        return rank < d.rank;
    }
};

struct listnode{
    int* metapointer;
    int* blockpointer;
    int docposition;
    int frequency;
    int numberdocs;
    int* iquery;
    listnode* nextnode;
};
void QUERYMANAGER::SubmitQuery(char *query){
    listnode* startlist;
    vector<DOC> docvec;
    docvec.reserve(20);
    DOC doct;

    //create a priority queue to use as a min-heap to store the documents and rankings;
    priority_queue<DOC, vector<DOC>, std::greater<DOC> > q(docvec.begin(), docvec.end());
    q.push(doct);

    //do some processing here; startlist is a pointer to a listnode struct that starts the
    //linked list

    //point the linked list start pointer to the node returned by the OpenList method
    startlist = &OpenList(value);
    listnode* minpointer;
    q.push(doct);

    //start by finding the first docid in the shortest list
    int i = 0;
    q.push(doct);
    num = NextGEQ(0, *startlist);
    q.push(doct);

    while(num != -1)
    {
        q.push(doct);
        //this is where the problem starts - every previous q.push(doct) works; the one
        //after NextGEQ(num + 1, *startlist) gives the bad_alloc error
        num = NextGEQ(num + 1, *startlist);
        //this is where the exception is thrown
        q.push(doct);
    }
}
//takes a word and returns a listnode struct with a pointer to the beginning of the list
//and metadata about the list
listnode QUERYMANAGER::OpenList(char* word)
{
    long int numdocs;

    //create a new node in the linked list and initialize its variables
    listnode n;
    n.iquery = cache -> GetiList(word, &numdocs);
    n.docposition = 0;
    n.frequency = 0;
    n.numberdocs = numdocs;

    //an int pointer to point to where in the metadata you are
    n.metapointer = n.iquery;
    n.nextnode = NULL;

    //an int pointer to point to the uncompressed block of data, if there is one
    n.blockpointer = NULL;

    return n;
}
int QUERYMANAGER::NextGEQ(int value, listnode& data)
{
    int lengthdocids;
    int lengthfreqs;
    int lengthpos;
    int* temp;
    int lastdocid;

    lastdocid = *(data.metapointer + 2);

    while(true)
    {
        //if it's not the first chunk in the list, the blockpointer will be pointing to the
        //most recently opened block and docpos to the current position in the block
        if( data.blockpointer && lastdocid >= value)
        {
            //if the last docid in the chunk is >= the docid we're looking for,
            //go through the chunk to look for a match
            //the last docid in the block is in lastdocid; keep going until you hit it
            while(*(data.blockpointer + data.docposition) <= lastdocid)
            {
                //compare each docid with the docid passed in; if it's greater than or
                //equal to it, return a pointer to the docid
                if(*(data.blockpointer + data.docposition) >= value)
                {
                    //return the next greater than or equal docid
                    return *(data.blockpointer + data.docposition);
                }
                else
                {
                    ++data.docposition;
                }
            }

            //read through the whole block; couldn't find matching docid; increment
            //metapointer to the next block; free the block's memory
            data.metapointer += 3;
            lastdocid = *(data.metapointer + 3);
            free(data.blockpointer);
            data.blockpointer = NULL;
        }

        //reached the end of a block; check the metadata to find where the next block
        //begins and ends and whether the last docid in the block is smaller or larger
        //than the value being searched for
        //first make sure that you haven't reached the end of the list
        //if the last docid in the chunk is still smaller than the value passed in, move
        //the metadata pointer to the beginning of the next chunk's metadata; read in the
        //new metadata
        while(true)
        //  while(*(metapointers[index]) != 0 )
        {
            if(lastdocid < value && *(data.metapointer) != 0)
            {
                data.metapointer += 3;
                lastdocid = *(data.metapointer + 2);
            }
            else if(*(data.metapointer) == 0)
            {
                return -1;
            }
            else
            //we must have hit a chunk whose lastdocid is >= value; read it in
            {
                //read in the metadata
                //the length of the chunk of docid's is cumulative, so subtract the end of
                //the last chunk from the end of this chunk to get the length

                //find the end of the metadata
                temp = data.metapointer;
                while(*temp != 0)
                {
                    temp += 3;
                }
                temp += 2;

                //temp is now pointing to the beginning of the list of compressed data;
                //use the location of metapointer to calculate where to start reading and
                //how much to read

                //if it's the first chunk in the list, the corresponding metapointer is
                //pointing to the beginning of the query, so the number of bytes of
                //docid's is just the first integer in the metadata
                if( data.metapointer == data.iquery)
                {
                    lengthdocids = *data.metapointer;
                }
                else
                {
                    //start reading from the offset of the end of the last chunk
                    //(saved in metapointers[index] - 3) plus 1 = the beginning of
                    //this chunk
                    lengthdocids = *(data.metapointer) - (*(data.metapointer - 3));
                    temp += (*(data.metapointer - 3)) / sizeof(int);
                }

                //allocate memory for an array of integers - the block of docid's uncompressed
                int* docblock = (int*)malloc(lengthdocids * 5);

                //decompress docid's into the block of memory allocated
                s9decompress((int*)temp, lengthdocids / 4, (int*) docblock, true);

                //set the blockpointer to point to the beginning of the block
                //and docpositions[index] to 0
                data.blockpointer = docblock;
                data.docposition = 0;
                break;
            }
        }
    }
}
Thank you very much, bsg.
QUERYMANAGER::OpenList returns a listnode by value. In startlist = &OpenList(value); you then proceed to take the address of the temporary object that's returned. When the temporary goes away, you may be able to access the data for a time and then it's overwritten. Could you just declare a non-pointer listnode startlist on the stack and assign it the return value directly? Then remove the * in front of other uses and see if that fixes the problem.
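A sketch of that change inside SubmitQuery:

// take the listnode by value instead of pointing at a temporary
listnode startlist = OpenList(value); // copy; lives until SubmitQuery returns
num = NextGEQ(0, startlist);          // was: NextGEQ(0, *startlist)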
Another thing you can try is replacing all pointers with smart pointers, specifically something like boost::shared_ptr<>, depending on how much code this really is and how much you're comfortable automating the task. Smart pointers aren't the answer to everything, but they're at least safer than raw pointers.
Assuming you have heap corruption and are not in fact exhausting memory, the commonest way a heap can get corrupted is by deleting (or freeing) the same pointer twice. You can quite easily find out if this is the issue by simply commenting out all your calls to delete (or free). This will cause your program to leak like a sieve, but if it doesn't actually crash you have probably identified the problem.
The other common cause of a corrupt heap is deleting (or freeing) a pointer that wasn't ever allocated on the heap. Differentiating between the two causes of corruption is not always easy, but your first priority should be to find out whether corruption is actually the problem.
Note this approach won't work too well if the things you are deleting have destructors which if not called break the semantics of your program.
Thanks for all your help. You were right, Neil - I must have managed to corrupt my heap. I'm still not sure what was causing it, but when I changed the malloc(numdocids * 5) to malloc(256) it magically stopped crashing. I suppose I should have checked whether or not my mallocs were actually succeeding! Thanks again!
Bsg