c++ stl priority queue insert bad_alloc exception - c++

I am working on a query processor that reads in long lists of document id's from memory and looks for matching id's. When it finds one, it creates a DOC struct containing the docid (an int) and the document's rank (a double) and pushes it on to a priority queue. My problem is that when the word(s) searched for has a long list, when I try to push the DOC on to the queue, I get the following exception:
Unhandled exception at 0x7c812afb in QueryProcessor.exe: Microsoft C++ exception: std::bad_alloc at memory location 0x0012ee88..
When the word has a short list, it works fine. I tried pushing DOC's onto the queue in several places in my code, and they all work until a certain line; after that, I get the above error. I am completely at a loss as to what is wrong because the longest list read in is less than 1 MB and I free all memory that I allocate. Why should there suddenly be a bad_alloc exception when I try to push a DOC onto a queue that has a capacity to hold it (I used a vector with enough space reserved as the underlying data structure for the priority queue)?
I know that questions like this are almost impossible to answer without seeing all the code, but it's too long to post here. I'm putting as much as I can and am anxiously hoping that someone can give me an answer, because I am at my wits' end.
The NextGEQ function reads a list of compressed blocks of docids block by block. That is, if it sees that the lastdocid in the block (in a separate list) is larger than the docid passed in, it decompresses the block and searches until it finds the right one. Each list starts with metadata about the list with the lengths of each compressed chunk and the last docid in the chunk. data.iquery points to the beginning of the metadata; data.metapointer points to wherever in the metadata the function currently is; and data.blockpointer points to the beginning of the block of uncompressed docids, if there is one. If it sees that it was already decompressed, it just searches. Below, when I call the function the first time, it decompresses a block and finds the docid; the push onto the queue after that works. The second time, it doesn't even need to decompress; that is, no new memory is allocated, but after that time, pushing on to the queue gives a bad_alloc error.
Edit: I cleaned up my code some more so that it should compile. I also added in the OpenList() and NextGEQ functions, although the latter is long, because I think the problem is caused by a heap corruption somewhere in it. Thanks a lot!
struct DOC{
long int docid;
long double rank;
public:
DOC()
{
docid = 0;
rank = 0.0;
}
DOC(int num, double ranking)
{
docid = num;
rank = ranking;
}
bool operator>( const DOC & d ) const {
return rank > d.rank;
}
bool operator<( const DOC & d ) const {
return rank < d.rank;
}
};
struct listnode{
int* metapointer;
int* blockpointer;
int docposition;
int frequency;
int numberdocs;
int* iquery;
listnode* nextnode;
};
void QUERYMANAGER::SubmitQuery(char *query){
listnode* startlist;
vector<DOC> docvec;
docvec.reserve(20);
DOC doct;
//create a priority queue to use as a min-heap to store the documents and rankings;
priority_queue<DOC, vector<DOC>,std::greater<DOC>> q(docvec.begin(), docvec.end());
q.push(doct);
//do some processing here; startlist is a pointer to a listnode struct that starts the //linked list
//point the linked list start pointer to the node returned by the OpenList method
startlist = &OpenList(value);
listnode* minpointer;
q.push(doct);
//start by finding the first docid in the shortest list
int i = 0;
q.push(doct);
num = NextGEQ(0, *startlist);
q.push(doct);
while(num != -1)
{
q.push(doct);
//the is where the problem starts - every previous q.push(doct) works; the one after
//NextGEQ(num +1, *startlist) gives the bad_alloc error
num = NextGEQ(num + 1, *startlist);
//this is where the exception is thrown
q.push(doct);
}
}
//takes a word and returns a listnode struct with a pointer to the beginning of the list
//and metadata about the list
listnode QUERYMANAGER::OpenList(char* word)
{
long int numdocs;
//create a new node in the linked list and initialize its variables
listnode n;
n.iquery = cache -> GetiList(word, &numdocs);
n.docposition = 0;
n.frequency = 0;
n.numberdocs = numdocs;
//an int pointer to point to where in the metadata you are
n.metapointer = n.iquery;
n.nextnode = NULL;
//an int pointer to point to the uncompressed block of data, if there is one
n.blockpointer = NULL;
return n;
}
int QUERYMANAGER::NextGEQ(int value, listnode& data)
{
int lengthdocids;
int lengthfreqs;
int lengthpos;
int* temp;
int lastdocid;
lastdocid = *(data.metapointer + 2);
while(true)
{
//if it's not the first chunk in the list, the blockpointer will be pointing to the
//most recently opened block and docpos to the current position in the block
if( data.blockpointer && lastdocid >= value)
{
//if the last docid in the chunk is >= the docid we're looking for,
//go through the chunk to look for a match
//the last docid in the block is in lastdocid; keep going until you hit it
while(*(data.blockpointer + data.docposition) <= lastdocid)
{
//compare each docid with the docid passed in; if it's greater than or equal to it, return a pointer to the docid
if(*(data.blockpointer + data.docposition ) >= value)
{
//return the next greater than or equal docid
return *(data.blockpointer + data.docposition);
}
else
{
++data.docposition;
}
}
//read through the whole block; couldn't find matching docid; increment metapointer to the next block;
//free the block's memory
data.metapointer += 3;
lastdocid = *(data.metapointer + 3);
free(data.blockpointer);
data.blockpointer = NULL;
}
//reached the end of a block; check the metadata to find where the next block begins and ends and whether
//the last docid in the block is smaller or larger than the value being searched for
//first make sure that you haven't reached the end of the list
//if the last docid in the chunk is still smaller than the value passed in, move the metadata pointer
//to the beginning of the next chunk's metadata; read in the new metadata
while(true)
// while(*(metapointers[index]) != 0 )
{
if(lastdocid < value && *(data.metapointer) !=0)
{
data.metapointer += 3;
lastdocid = *(data.metapointer + 2);
}
else if(*(data.metapointer) == 0)
{
return -1;
}
else
//we must have hit a chunk whose lastdocid is >= value; read it in
{
//read in the metadata
//the length of the chunk of docid's is cumulative, so subtract the end of the last chunk
//from the end of this chunk to get the length
//find the end of the metadata
temp = data.metapointer;
while(*temp != 0)
{
temp += 3;
}
temp += 2;
//temp is now pointing to the beginning of the list of compressed data; use the location of metapointer
//to calculate where to start reading and how much to read
//if it's the first chunk in the list,the corresponding metapointer is pointing to the beginning of the query
//so the number of bytes of docid's is just the first integer in the metadata
if( data.metapointer == data.iquery)
{
lengthdocids = *data.metapointer;
}
else
{
//start reading from the offset of the end of the last chunk (saved in metapointers[index] - 3)
//plus 1 = the beginning of this chunk
lengthdocids = *(data.metapointer) - (*(data.metapointer - 3));
temp += (*(data.metapointer - 3)) / sizeof(int);
}
//allocate memory for an array of integers - the block of docid's uncompressed
int* docblock = (int*)malloc(lengthdocids * 5 );
//decompress docid's into the block of memory allocated
s9decompress((int*)temp, lengthdocids /4, (int*) docblock, true);
//set the blockpointer to point to the beginning of the block
//and docpositions[index] to 0
data.blockpointer = docblock;
data.docposition = 0;
break;
}
}
}
}
Thank you very much, bsg.

QUERYMANAGER::OpenList returns a listnode by value. In startlist = &OpenList(value); you then proceed to take the address of the temporary object that's returned. When the temporary goes away, you may be able to access the data for a time and then it's overwritten. Could you just declare a non-pointer listnode startlist on the stack and assign it the return value directly? Then remove the * in front of other uses and see if that fixes the problem.

Another thing you can try is replacing all pointers with smart pointers, specifically something like boost::shared_ptr<>, depending on how much code this really is and how much you're comfortable automating the task. Smart pointers aren't the answer to everything, but they're at least safer than raw pointers.

Assuming you have heap corruption and are not in fact exhausting memory, the commonest way a heap can get corrupted is by deleting (or freeing) the same pointer twice. You can quite easily find out if this is the issue by simply commenting out all your calls to delete (or free). This will cause your program to leak like a sieve, but if it doesn't actually crash you have probably identified the problem.
The other common cause cause of a corrupt heap is deleting (or freeing) a pointer that wasn't ever allocated on the heap. Differentiating between the two causes of corruption is not always easy, but your first priority should be to find out if corruption is actually the problem.
Note this approach won't work too well if the things you are deleting have destructors which if not called break the semantics of your program.

Thanks for all your help. You were right, Neil - I must have managed to corrupt my heap. I'm still not sure what was causing it, but when I changed the malloc(numdocids * 5) to malloc(256) it magically stopped crashing. I suppose I should have checked whether or not my mallocs were actually succeeding! Thanks again!
Bsg

Related

Dynamically changing the size of an array and reading in values. (w/o vectors)

Hello I am having the following difficulty,
I am trying to read in a table of doubles (1 entry per line) and store it in an array, while dynamically changing this array's size (for each line/entry). This is for a school assignment and it forbids the use of vectors(would be much easier...). The main idea that I had is to have a main array which stores the value, then store the previous values and the next one into a new array and do this iteratively. Currently, the problem that I am having is that only the last value of the table is being stored. I am aware, that somehow I need to be passing the data by refference to the global function and that the pointers that I am working with become null ater they exit the following iteration of the while. However, since the exact length of the data is unknown, this seems impossible since intializing an array in the main() is impossible (exact length not known). Any help would be appreciated.
Code posted below.
EDIT: after consideration of the two comments I made the following changes to the code, however I am not sure, whether they will behave appropriately. I added a new function called add_new_datapoint, that should globally change the values of the pointer/length and this is done by passing the values by refference. Called in the problematic else statement as add_new_datapoint(data_ptr, data_len, new_dp). Also, I am not sure that reallocating new memory to the pointer variable, will not result in a memory leak. In essence (after I reallocate data_ptr is the memory that was 'being pointed to' released or do I have to delete it and then re-inialise it in the . In such case, can I refference the pointer 'data_ptr' again in the next iteration of the loop?
I think it will be easier to simplify your posted code than trying to find all the places where you could have errors.
If you expect to see only double values in your file, you can simplify the code for reading data from the file to:
while ( data_file >> new_data_pt )
{
// Use new_data_pt
}
If you expect that there might be values other than doubles, then you can use:
while ( getline(data_file, line) )
{
std::istringstream str(line);
while ( str >> new_data_pt )
{
// Use new_data_pt
}
}
but then you have to understand the code will not read any more values from a line after it encounters an error. If your line contains
10.2 K 25.4
the code will read 10.2, encounter an error at K, and will not process 25.4.
The code to process new_data_pt is that it needs to be stored in a dynamically allocated array. I would suggest putting that in a function.
double* add_point(double* data_ptr, int data_len, double new_data_pt)
Call that function as:
data_ptr = add_point(data_ptr, data_len, new_data_pt);
Assuming the first while loop, the contents of main become:
int main()
{
std::fstream data_file{ "millikan2.dat" };
// It is possible that the file has nothing in it.
// In that case, data_len needs to be zero.
int data_len{ 0 };
// There is no need to allocate memory when there is nothing in the file.
// Allocate memory only when data_len is greater than zero.
double* data_ptr = nullptr;
double new_data_pt;
if (!data_file.good()) {
std::cerr << "Cannot open file";
return 1;
}
while ( data_file >> new_data_pt )
{
++data_len;
data_ptr = add_point(data_ptr, data_len, new_data_pt);
}
// No need of this.
// The file will be closed when the function returns.
// data_file.close();
}
add_point can be implemented as:
double* add_point(double* data_ptr, int data_len, double new_data_pt)
{
double* new_data_ptr = new double[data_len];
// This works even when data_ptr is nullptr.
// When data_ptr is null_ptr, (data_len - 1) is zero. Hence,
// the call to std::copy becomes a noop.
std::copy(data_ptr, data_ptr + (data_len - 1); new_data_ptr);
// Deallocate old memory.
if ( data_ptr != nullptr )
{
delete [] data_ptr;
}
new_data_ptr[data_len-1] = new_data_pt;
return new_data_ptr;
}
The code to track the number of bad points is a lot more complex. Unless you are required to do it, I would advise to ignore it.
You already got an excellent answer but I figured it may be helpful to point out a few mistakes in your code, so you can understand why it won't work.
In the second else scope you declare data_ptr again, even though it is visible from the outer scope. (delete[] doesn't delete the pointer itself, it just deallocates the memory the pointer points to.)
else {
double* data_temp { new double[data_len] };
std::copy(data_ptr, data_ptr + data_len - 2, data_temp);
*(data_temp + data_len - 1) = new_data_pt;
delete[] data_ptr;
double* data_ptr{ new double[data_len] }; // <- Right here
//for (int j{1}; j < data_len; j++) *(data_ptr + j) = *(data_temp + j);
std::cout << std::endl;
}
Instead you could just write data_ptr = new double[data_len]. However, that alone won't make this work.
All of your data disappears because on every iteration you create a new array, pointed to by data_temp and copy the data there, and on the next iteration you set data_temp to point to a new array again. This means that on every iteration you lose all data from previous iterations. This also causes a memory leak, since you allocate more memory every time you hit this line:
double* data_temp { new double[data_len] };
but you don't call delete[] data_temp afterwards.
I hope this helps to understand why it doesn't work.

Double Free or Corruption error when re-sizing Priority Queue

I've run into this error before, but the circumstances baffle me as I have run nearly this exact set of functions without having this issue.
Let me break it down:
The error is being caused by the resize() private member function of a custom priority queue I am working on. It is all centered around de-allocating the pointer to the old queue array. Before I explain any further, let me list the handful of relatively small functions I've isolated the problem to.
void unfairQ::enqueue(int val)
{
if (isFull())
resize();
numElements++;
ageCount++;
heapArr[numElements].data = val;
heapArr[numElements].age = 1;
heapArr[numElements].priority = heapArr[numElements].data;
heapifyUp(numElements);
if (ageCount == 100) {
heapSort();
ageCount = 0;
}
return;
}
bool unfairQ::isFull()
{
return (numElements == capacity);
}
void unfairQ::resize()
{
int newCap = (capacity * 1.5);
queueNode *tempHeap = new queueNode[newCap];
for (int i = 1; i <= numElements; i++) {
tempHeap[i].data = heapArr[i].data;
tempHeap[i].age = heapArr[i].age;
tempHeap[i].priority = heapArr[i].priority;
}
// delete [] heapArr;
capacity = newCap;
heapArr = tempHeap;
return;
}
The commented out line in the resize function is the one causing problems. If I do delete the pointer to the array I get the "double free" error, however if I remove that line I get a "free(): invalid next size (normal):" if I enqueue enough values to require a second resize().
Please let me know if you need any more information or if I need to clarify anything.
You seem to be using your array with indexes starting from 1, c++ uses indexes starting from 0. This can cause a buffer overflow.
For example:
If capacity is currently 5 (so heapArray can have 5 entries) andnumElementsis currently 4, yourisFullwill returnfalse(correctly), however yourenqueuecode then incrementsnumElements(from 4 to 5) and attempts to write toheapArray[5]` which is out of bounds and may overwrite some other memory.
Solution: start your indexes from 0, e.g. in the enqueue function, increment numElements after you write the data heapArray[numElements]
I found the problem, while I was referencing/incrementing/decrementing all the indices correctly and calling the appropriate functions at the appropriate times, I was operating under the notion that I was working with indices 1-size, but in the constructor (something I hadn't glanced at for a while) I'd initialized numElements as 0 which broke the whole gosh darned thing.
Fixed that and now everything is hunky dory!
Thanks for the help guys.

C++ inserting (and shifting) data into an array

I am trying to insert data into a leaf node (an array) of a B-Tree. Here is the code I have so far:
void LeafNode::insertCorrectPosLeaf(int num)
{
for (int pos=count; pos>=0; pos--) // goes through values in leaf node
{
if (num < values[pos-1]) // if inserting num < previous value in leaf node
{continue;} // conitnue searching for correct place
else // if inserting num >= previous value in leaf node
{
values[pos] = num; // inserts in position
break;
}
}
count++;
} // insertCorrectPos()
Before the line values[pos] = num, I think need to write some code that shifts the existing data instead of overwriting it. I am trying to use memmove but have a question about it. Its third parameter is the number of bytes to copy. If I am moving a single int on a 64 bit machine, does this mean I would put a "4" here? If I am going about this completely wrong any any help would be greatly appreciated. Thanks
The easiest way (and probably the most efficient) would be to use one of the standard libraries predefined structures to implement "values". I suggest either list or vector. This is because both list and vector has an insert function that does it for you. I suggest the vector class specifically is because it has the same kind of interface that an array has. However, if you want to optimize for speed of this action specifically, then I would suggest the list class because of the way it is implemented.
If you would rather to it the hard way, then here goes...
First, you need to make sure that you have the space to work in. You can either allocate dynamically:
int *values = new int[size];
or statically
int values[MAX_SIZE];
If you allocate statically, then you need to make sure that MAX_SIZE is some gigantic value that you will never ever exceed. Furthermore, you need to check the actual size of the array against the amount of allocated space every time you add an element.
if (size < MAX_SIZE-1)
{
// add an element
size++;
}
If you allocate dynamically, then you need to reallocate the whole array every time you add an element.
int *temp = new int[size+1];
for (int i = 0; i < size; i++)
temp[i] = values[i];
delete [] values;
values = temp;
temp = NULL;
// add the element
size++;
When you insert a new value, you need to shift every value over.
int temp = 0;
for (i = 0; i < size+1; i++)
{
if (values[i] > num || i == size)
{
temp = values[i];
values[i] = num;
num = temp;
}
}
Keep in mind that this is not at all optimized. A truly magical implementation would combine the two allocation strategies by dynamically allocating more space than you need, then growing the array by blocks when you run out of space. This is exactly what the vector implementation does.
The list implementation uses a linked list which has O(1) time for inserting a value because of it's structure. However, it is much less space inefficient and has O(n) time for accessing an element at location n.
Also, this code was written on the fly... be careful when using it. There might be a weird edge case that I am missing in the last code segment.
Cheers!
Ned

Pointer comparision issue

I'm having a problem with a pointer and can't get around it..
In a HashTable implementation, I have a list of ordered nodes in each bucket.The problem I have It's in the insert function, in the comparision to see if the next node is greater than the current node(in order to inserted in that position if it is) and keep the order.
You might find this hash implementation strange, but I need to be able to do tons of lookups(but sometimes also very few) and count the number of repetitions if It's already inserted (so I need fasts lookups, thus the Hash , I've thought about self-balanced trees as AVL or R-B trees, but I don't know them so I went with the solution I knew how to implement...are they faster for this type of problem?),but I also need to retrieve them by order when I've finished.
Before I had a simple list and I'd retrieve the array, then do a QuickSort, but I think I might be able to improve things by keeping the lists ordered.
What I have to map It's a 27 bit unsigned int(most exactly 3 9 bits numbers, but I convert them to a 27 bit number doing (Sr << 18 | Sg << 9 | Sb) making at the same time their value the hash_value. If you know a good function to map that 27 bit int to an 12-13-14 bit table let me know, I currently just do the typical mod prime solution.
This is my hash_node struct:
class hash_node {
public:
unsigned int hash_value;
int repetitions;
hash_node *next;
hash_node( unsigned int hash_val,
hash_node *nxt);
~hash_node();
};
And this is the source of the problem
void hash_table::insert(unsigned int hash_value) {
unsigned int p = hash_value % tableSize;
if (table[p]!=0) { //The bucket has some elements already
hash_node *pred; //node to keep the last valid position on the list
for (hash_node *aux=table[p]; aux!=0; aux=aux->next) {
pred = aux; //last valid position
if (aux->hash_value == hash_value ) {
//It's already inserted, so we increment it repetition counter
aux->repetitions++;
} else if (hash_value < (aux->next->hash_value) ) { //The problem
//If the next one is greater than the one to insert, we
//create a node in the middle of both.
aux->next = new hash_node(hash_value,aux->next);
colisions++;
numElem++;
}
}//We have arrive to the end od the list without luck, so we insert it after
//the last valid position
ant->next = new hash_node(hash_value,0);
colisions++;
numElem++;
}else { //bucket it's empty, insert it right away.
table[p] = new hash_node(hash_value, 0);
numElem++;
}
}
This is what gdb shows:
Program received signal SIGSEGV, Segmentation fault.
0x08050b4b in hash_table::insert (this=0x806a310, hash_value=3163181) at ht.cc:132
132 } else if (hash_value < (aux->next->hash_value) ) {
Which effectively indicates I'm comparing a memory adress with a value, right?
Hope It was clear. Thanks again!
aux->next->hash_value
There's no check whether "next" is NULL.
aux->next might be NULL at that point? I can't see where you have checked whether aux->next is NULL.

PushFront method for an array C++

I thought i'd post a little of my homework assignment. Im so lost in it. I just have to be really efficient. Without using any stls, boosts and the like. By this post, I was hoping that someone could help me figure it out.
bool stack::pushFront(const int nPushFront)
{
if ( count == maxSize ) // indicates a full array
{
return false;
}
else if ( count <= 0 )
{
count++;
items[top+1].n = nPushFront;
return true;
}
++count;
for ( int i = 0; i < count - 1; i++ )
{
intBackPtr = intFrontPtr;
intBackPtr++;
*intBackPtr = *intFrontPtr;
}
items[top+1].n = nPushFront;
return true;
}
I just cannot figure out for the life of me to do this correctly! I hope im doing this right, what with the pointers and all
int *intFrontPtr = &items[0].n;
int *intBackPtr = &items[capacity-1].n;
Im trying to think of this pushFront method like shifting an array to the right by 'n' units...I can only seem to do that in an array that is full. Can someone out their please help me?
Firstly, I'm not sure why you have the line else if ( count <= 0 ) - the count of items in your stack should never be below 0.
Usually, you would implement a stack not by pushing to the front, but pushing and popping from the back. So rather than moving everything along, as it looks like you're doing, just store a pointer to where the last element is, and insert just after that, and pop from there. When you push, just increment that pointer, and when you pop, decrement it (you don't even have to delete it). If that pointer is at the end of your array, you're full (so you don't even have to store a count value). And if it's at the start, then it's empty.
Edit
If you're after a queue, look into Circular Queues. That's typically how you'd implement one in an array. Alternatively, rather than using an array, try a Linked List - that lets it be arbitrarily big (the only limit is your computer's memory).
You don't need any pointers to shift an array. Just use simple for statement:
int *a; // Your array
int count; // Elements count in array
int length; // Length of array (maxSize)
bool pushFront(const int nPushFront)
{
if (count == length) return false;
for (int i = count - 1; i >= 0; --i)
Swap(a[i], a[i + 1]);
a[0] = nPushFront; ++count;
return true;
}
Without doing your homework for you let me see if I can give you some hints. Implementing a deque (double ended queue) is really quite easy if you can get your head around a few concepts.
Firstly, it is key to note that since we will be popping off the front and/or back in order to efficiently code an algorithm which uses contiguous storage we need to be able to pop front/back without shifting the entire array (what you currently do). A much better and in my mind simpler way is to track the front AND the back of the relevant data within your deque.
As a simple example of the above concept consider a static (cannot grow) deque of size 10:
class Deque
{
public:
Deque()
: front(0)
, count(0) {}
private:
size_t front;
size_t count;
enum {
MAXSIZE = 10
};
int data[MAXSIZE];
};
You can of course implement this and allow it to grow in size etc. But for simplicity I'm leaving all that out. Now to allow a user to add to the deque:
void Deque::push_back(int value)
{
if(count>=MAXSIZE)
throw std::runtime_error("Deque full!");
data[(front+count)%MAXSIZE] = value;
count++;
}
And to pop off the back:
int Deque::pop_back()
{
if(count==0)
throw std::runtime_error("Deque empty! Cannot pop!");
int value = data[(front+(--count))%MAXSIZE];
return value;
}
Now the key thing to observe in the above functions is how we are accessing the data within the array. By modding with MAXSIZE we ensure that we are not accessing out of bounds, and that we are hitting the right value. Also as the value of front changes (due to push_front, pop_front) the modulus operator ensures that wrap around is dealt with appropriately. I'll show you how to do push_front, you can figure out pop_front for yourself:
void Deque::push_front(int value)
{
if(count>=MAXSIZE)
throw std::runtime_error("Deque full!");
// Determine where front should now be.
if (front==0)
front = MAXSIZE-1;
else
--front;
data[front] = value;
++count;
}