bad memory alloc error - segment tree - c++

I am trying to create a segment tree. Here is my struct for a node of the tree:
struct Node{
int x1, x2; // x coordinates
int y1, y2; // y coordinates
Node * v1;
Node * v2;
Node * v3;
Node * v4;
bool oBo; //check if 1 by 1
bool O;
bool F;
int dimens;
Node(int myx1, int myx2, int myy1, int myy2){
this->x1 = myx1;
this->x2 = myx2;
this->y1 = myy1;
this->y2 = myy2;
this->dimens = abs(x2 - x1);
if (dimens == 1)
{
this->oBo = true;
}
else
this->oBo = false;
this->O = false;
this->F = false;
this->v1 = NULL;
this->v2 = NULL;
this->v3 = NULL;
this->v4 = NULL;
}
};
This is my constructor for the map:
MapTree::MapTree(int iSize)
{
this->size = iSize;
root = new Node(0, size, 0, size);
segment(root);
}
I use the segment function below to split the root into sub-segments; it is then called recursively on each sub-node, and so on. I get a bad memory allocation error on the second level of segmentation, i.e. when dimens = 2, and I have no idea why. I tried changing the values and the size, but Visual Studio gives no clear error other than a bad alloc at some memory location.
Here is the segment function:
void MapTree::segment(Node * node)
{
while (node->oBo != true)
{
int dimension = node->dimens;
node->v1 = new Node(0, dimension/2, 0 , dimension/2);
node->v2 = new Node(dimension/ 2, dimension, 0, dimension/ 2);
node->v3 = new Node(0, dimension / 2 , dimension / 2, dimension);
node->v4 = new Node(dimension / 2, dimension, dimension / 2, dimension);
segment(node->v1);
segment(node->v2);
segment(node->v3);
segment(node->v4);
}
}
Last but not least, the size given for the tree is always a power of 2, so the segments always end up being one by one.

Never mind, I figured out what was wrong; I don't think I worded my question correctly. After some debugging I found the error: the while loop in segment keeps running on the same node, because nothing inside the loop changes that node's oBo. Since the root's node->oBo never becomes true, the loop never ends and keeps allocating new children until memory runs out, hence the bad alloc.
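For anyone landing here with the same symptom, here is a minimal sketch of the fix, assuming the intent was a quadtree-style split. Replacing the while with a single if lets each node be subdivided exactly once, and the recursion ends at 1-by-1 cells. The trimmed Node, the child array and countLeaves are illustrative names, and building the children from each node's own coordinates is my assumption (the original builds every child from 0 and dimension).

```cpp
#include <cassert>
#include <cstddef>

// Trimmed-down node: only the fields needed to show the recursion.
struct Node {
    int x1, x2, y1, y2;
    Node* child[4] = {nullptr, nullptr, nullptr, nullptr};
    Node(int ax1, int ax2, int ay1, int ay2)
        : x1(ax1), x2(ax2), y1(ay1), y2(ay2) {}
    ~Node() { for (Node* c : child) delete c; }   // children are owned
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;
    int dimens() const { return x2 - x1; }
};

// Split a square node into four quadrants, once per node.
// An `if` (not a `while`) is what stops the endless allocation.
void segment(Node* node) {
    assert(node != nullptr);
    if (node->dimens() <= 1) return;              // 1-by-1 cell: stop
    const int mx = node->x1 + node->dimens() / 2;
    const int my = node->y1 + (node->y2 - node->y1) / 2;
    node->child[0] = new Node(node->x1, mx, node->y1, my);
    node->child[1] = new Node(mx, node->x2, node->y1, my);
    node->child[2] = new Node(node->x1, mx, my, node->y2);
    node->child[3] = new Node(mx, node->x2, my, node->y2);
    for (Node* c : node->child) segment(c);
}

// Count leaves, to check that a size-n tree ends in n*n unit cells.
std::size_t countLeaves(const Node* node) {
    if (node->child[0] == nullptr) return 1;
    std::size_t total = 0;
    for (const Node* c : node->child) total += countLeaves(c);
    return total;
}
```

With a power-of-two size n, the tree should end up with n*n leaves, which is a cheap sanity check that the recursion terminates where expected.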

Related

C++ malloc/realloc weird behavior

I was writing a dynamic array for my own use, which I wanted pre-set with zeros.
template <class T>
dynArr<T>::dynArr()
{
rawData = malloc(sizeof(T) * 20); //we allocate space for 20 elems
memset(this->rawData, 0, sizeof(T) * 20); //we zero it!
currentSize = 20;
dataPtr = static_cast<T*>(rawData); //we cast pointer to required datatype.
}
And this part works: iterating in a loop and dereferencing dataPtr works great. Zeros.
Yet reallocation behaves (in my opinion) at least a bit strangely. First, here is the reallocation code:
template <class T>
void dynArr<T>::insert(const int index, const T& data)
{
if (index < currentSize - 1)
{
dataPtr[index] = data; //we can just insert things, array is zero-d
}
else
{
//TODO we should increase size exponentially, not just to the element we want
const size_t lastSize = currentSize; //store current size (before realloc). this is count not bytes.
rawData = realloc(rawData, index + 1); //rawData points now to new location in the memory
dataPtr = (T*)rawData;
memset(dataPtr + lastSize - 1, 0, sizeof(T) * index - lastSize - 1); //we zero from ptr+last size to index
dataPtr[index] = data;
currentSize = index + 1;
}
}
Simple: we realloc the data up to index+1 and set the not-yet-zeroed memory to 0.
As a test, I first inserted 5 at position 5 of this array. The expected thing happened - 0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Yet, inserting something else, like insert(30,30) gives me strange behavior:
0, 0, 0, 0, 0, 5, 0, -50331648, 16645629, 0, 523809160, 57600, 50928864, 50922840, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30,
What the hell, am I not understanding something here? Shouldn't realloc take all the 20 previously set memory bytes into account? What sorcery is going on here?
Problem 1:
You are using the wrong size in the call to realloc. Change it to:
rawData = realloc(rawData, sizeof(T)*(index + 1));
If rawData is of type T*, prefer
rawData = realloc(rawData, sizeof(*rawData)*(index + 1));
Problem 2:
The last term of the following is not right.
memset(dataPtr + lastSize - 1, 0, sizeof(T) * index - lastSize - 1);
You need to use:
memset(dataPtr + lastSize - 1, 0, sizeof(T) * (index - lastSize - 1));
// ^^ ^^
// size * The number of objects
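To make Problems 1 and 2 concrete, here is a minimal sketch of a grow-and-zero helper, assuming T is trivially copyable (the only case where realloc and memset are safe). The name growZeroed is made up for illustration. Note that it zeroes starting at lastSize (not lastSize - 1) and covers index + 1 - lastSize elements, so that every new slot up to and including index is cleared and the last old element is left alone; treat that as my reading of the intent rather than part of the fix above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Grow a malloc'd buffer of trivially copyable T from oldCount to newCount
// elements and zero only the newly added tail. Returns nullptr on failure,
// in which case the old buffer is still valid and owned by the caller.
template <class T>
T* growZeroed(T* data, std::size_t oldCount, std::size_t newCount) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "realloc/memset are only safe for trivially copyable types");
    assert(newCount > 0 && newCount >= oldCount);
    // realloc takes a size in BYTES, so scale the element count by sizeof(T).
    void* raw = std::realloc(data, sizeof(T) * newCount);
    if (raw == nullptr) return nullptr;
    T* grown = static_cast<T*>(raw);
    // Elements [0, oldCount) are preserved; zero [oldCount, newCount).
    std::memset(grown + oldCount, 0, sizeof(T) * (newCount - oldCount));
    return grown;
}
```

In insert() this would be called as growZeroed(dataPtr, lastSize, index + 1), with the result checked before overwriting dataPtr.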
Problem 3:
Assigning to dataPtr using
dataPtr[index] = data;
is a problem when the memory was obtained using malloc or realloc. The malloc family of functions returns just raw memory; these functions don't initialize objects.
Assigning to uninitialized objects is a problem for all non-POD types.
Problem 4:
If T is a type with virtual member functions, using memset to zero out memory will most likely lead to problems.
Suggestion for fixing all the problems:
It will be much better to use new and delete since you are in C++ land.
template <class T>
dynArr<T>::dynArr()
{
currentSize = 20;
dataPtr = new T[currentSize];
// Not sure why you need rawData
}
template <class T>
void dynArr<T>::insert(const int index, const T& data)
{
if (index < currentSize - 1)
{
dataPtr[index] = data;
}
else
{
const size_t lastSize = currentSize;
T* newData = new T[index+1];
std::copy(dataPtr, dataPtr+lastSize, newData);
delete [] dataPtr;
dataPtr = newData;
dataPtr[index] = data;
currentSize = index + 1;
}
}
Please note that the suggested change will work only if T is default constructible.
This will also take care of the problems 3 and 4 outlined above.
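If hand-rolling the storage is not a requirement, std::vector already provides growth with value-initialized new elements, which removes the manual copy and delete as well. A minimal sketch, assuming T is default constructible and copy assignable; DynArr, at and size are illustrative names, not the original class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative wrapper: a growable array whose unused slots hold T().
template <class T>
class DynArr {
public:
    DynArr() : data_(20) {}                 // 20 value-initialized elements

    // Store `value` at `index`, growing the array if needed.
    void insert(std::size_t index, const T& value) {
        if (index >= data_.size()) {
            data_.resize(index + 1);        // new elements are T(), i.e. 0 for int
        }
        data_[index] = value;
    }

    const T& at(std::size_t index) const { return data_.at(index); }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;
};
```

After insert(5, 5) and insert(30, 30) on a DynArr<int>, every slot other than 5 and 30 reads back as 0, which is the behavior the question was after.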

How to replace an instance with another instance via pointer?

I'm doing online destructive clustering (clusters replace clustered objects) on a list of class instances (std::list).
Background
My list of current percepUnits is std::list<percepUnit> units; and on each iteration I get a new list of input percepUnits, std::list<percepUnit> scratch; that needs to be clustered with the units.
I want to maintain a fixed number of percepUnits (so units.size() is constant), so for each new scratch percepUnit I need to merge it with the nearest percepUnit in units. Following is a code snippet that builds a list (dists) of structures (percepUnitDist) that contain pointers to each pair of items in scratch and units percepDist.scratchUnit = &(*scratchUnit); and percepDist.unit = &(*unit); and their distance. Additionally, for each item in scratch I keep track of which item in units has the least distance minDists.
// For every scratch percepUnit:
for (scratchUnit = scratch.begin(); scratchUnit != scratch.end(); scratchUnit++) {
float minDist=2025.1172; // This is the max possible distance in unnormalized CIELuv, and much larger than the normalized dist.
// For every percepUnit:
for (unit = units.begin(); unit != units.end(); unit++) {
// compare pairs
float dist = featureDist(*scratchUnit, *unit, FGBG);
//cout << "distance: " << dist << endl;
// Put pairs in a structure that caches their distances
percepUnitDist percepDist;
percepDist.scratchUnit = &(*scratchUnit); // address of where scratchUnit points to.
percepDist.unit = &(*unit);
percepDist.dist = dist;
// Figure out the percepUnit that is closest to this scratchUnit.
if (dist < minDist)
minDist = dist;
dists.push_back(percepDist); // append dist struct
}
minDists.push_back(minDist); // append the min distance to the nearest percepUnit for this particular scratchUnit.
}
So now I just need to loop through the percepUnitDist items in dists and match the distances with the minimum distances to figure out which percepUnit in scratch should be merged with which percepUnit in units. The merging process mergePerceps() creates a new percepUnit which is a weighted average of the "parent" percepUnits in scratch and units.
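As an aside, matching pairs by comparing float distances for equality can be avoided by recording a pointer to the nearest unit while scanning. The sketch below shows that idea under simplifying assumptions: Unit is a stand-in for percepUnit with a single float feature, distance is a stand-in for featureDist(), and Match and nearestMatches are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <list>
#include <vector>

struct Unit { float feature; };   // stand-in for percepUnit

// Stand-in for featureDist(): absolute difference of one feature.
float distance(const Unit& a, const Unit& b) {
    return std::fabs(a.feature - b.feature);
}

struct Match {
    Unit* scratchUnit;   // element of `scratch`
    Unit* nearest;       // closest element of `units` (nullptr if units is empty)
    float dist;          // distance between the two
};

// For every scratch unit, remember which unit is closest.
std::vector<Match> nearestMatches(std::list<Unit>& scratch, std::list<Unit>& units) {
    std::vector<Match> matches;
    for (Unit& s : scratch) {
        Match m{&s, nullptr, std::numeric_limits<float>::max()};
        for (Unit& u : units) {
            const float d = distance(s, u);
            if (d < m.dist) { m.dist = d; m.nearest = &u; }
        }
        matches.push_back(m);
    }
    return matches;
}
```

Pointers into a std::list stay valid while other elements are inserted or erased, so each Match can be used later to merge into exactly one unit, with one entry per scratch unit instead of one per pair.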
Question
I want to replace the instance in the units list with the new percepUnit constructed by mergePerceps(), but I would like to do so in the context of looping through the percepUnitDists. This is my current code:
// Loop through dists and merge all the closest pairs.
// Loop through all dists
for (distIter = dists.begin(); distIter != dists.end(); distIter++) {
// Loop through all minDists for each scratchUnit.
for (minDistsIter = minDists.begin(); minDistsIter != minDists.end(); minDistsIter++) {
// if this is the closest cluster, and the closest cluster has not already been merged, and the scratch has not already been merged.
if (*minDistsIter == distIter->dist and not distIter->scratchUnit->remove) {
percepUnit newUnit;
mergePerceps(*(distIter->scratchUnit), *(distIter->unit), newUnit, FGBG);
*(distIter->unit) = newUnit; // replace the cluster with the new merged version.
distIter->scratchUnit->remove = true;
}
}
}
I thought that I could replace the instance in units via the percepUnitDist pointer with the new percepUnit instance using *(distIter->unit) = newUnit;, but that does not seem to be working as I'm seeing a memory leak, implying the instances in the units are not getting replaced.
How do I delete the percepUnit in the units list and replace it with a new percepUnit instance such that the new unit is located in the same location?
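For reference, a small sketch of what assignment through the pointer does in general: the list node stays where it is and only its contents are overwritten, with the old resources released by the members' own assignment operators. Unit and replaceInPlace are hypothetical stand-ins, and std::vector plays the role of the image data here; whether memory is actually freed for percepUnit depends on how its members (the cv::Mat ones in particular) handle assignment.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Stand-in for percepUnit; `pixels` plays the role of the image data.
struct Unit {
    std::vector<int> pixels;
    int id = 0;
};

// Overwrite the element that `slot` points at. The list node, and therefore
// the address held in `slot`, stays the same; only the contents change.
// The old `pixels` buffer is released by std::vector's assignment.
void replaceInPlace(Unit* slot, const Unit& merged) {
    assert(slot != nullptr);
    *slot = merged;
}
```

The list size and the element's address are unchanged after the call, so pointers cached in other structures keep referring to the replaced unit.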
EDIT1
Here is the percepUnit class. Note the cv::Mat members. Following is the mergePerceps() function and the mergeImages() function on which it depends:
// Function to construct an accumulation.
void clustering::mergeImages(Mat &scratch, Mat &unit, cv::Mat &merged, const string maskOrImage, const string FGBG, const float scratchWeight, const float unitWeight) {
int width, height, type=CV_8UC3;
Mat scratchImagePad, unitImagePad, scratchImage, unitImage;
// use the resolution and aspect of the largest of the pair.
if (unit.cols > scratch.cols)
width = unit.cols;
else
width = scratch.cols;
if (unit.rows > scratch.rows)
height = unit.rows;
else
height = scratch.rows;
if (maskOrImage == "mask")
type = CV_8UC1; // single channel mask
else if (maskOrImage == "image")
type = CV_8UC3; // three channel image
else
cout << "maskOrImage is not 'mask' or 'image'\n";
merged = Mat(height, width, type, Scalar::all(0));
scratchImagePad = Mat(height, width, type, Scalar::all(0));
unitImagePad = Mat(height, width, type, Scalar::all(0));
// weight images before summation.
// because these pass by reference, they mess up the images in memory!
scratch *= scratchWeight;
unit *= unitWeight;
// copy images into padded images.
scratch.copyTo(scratchImagePad(Rect((scratchImagePad.cols-scratch.cols)/2,
(scratchImagePad.rows-scratch.rows)/2,
scratch.cols,
scratch.rows)));
unit.copyTo(unitImagePad(Rect((unitImagePad.cols-unit.cols)/2,
(unitImagePad.rows-unit.rows)/2,
unit.cols,
unit.rows)));
merged = scratchImagePad+unitImagePad;
}
// Merge two perceps and return a new percept to replace them.
void clustering::mergePerceps(percepUnit scratch, percepUnit unit, percepUnit &mergedUnit, const string FGBG) {
Mat accumulation;
Mat accumulationMask;
Mat meanColour;
int x, y, w, h, area;
float l,u,v;
int numMerges=0;
std::vector<float> featuresVar; // Normalized, Sum, Variance.
//float featuresVarMin, featuresVarMax; // min and max variance across all features.
float scratchWeight, unitWeight;
if (FGBG == "FG") {
// foreground percepts don't get merged as much.
scratchWeight = 0.65;
unitWeight = 1-scratchWeight;
} else {
scratchWeight = 0.85;
unitWeight = 1-scratchWeight;
}
// Images TODO remove the meanColour if needbe.
mergeImages(scratch.image, unit.image, accumulation, "image", FGBG, scratchWeight, unitWeight);
mergeImages(scratch.mask, unit.mask, accumulationMask, "mask", FGBG, scratchWeight, unitWeight);
mergeImages(scratch.meanColour, unit.meanColour, meanColour, "image", "FG", scratchWeight, unitWeight); // merge images
// Position and size.
x = (scratch.x1*scratchWeight) + (unit.x1*unitWeight);
y = (scratch.y1*scratchWeight) + (unit.y1*unitWeight);
w = (scratch.w*scratchWeight) + (unit.w*unitWeight);
h = (scratch.h*scratchWeight) + (unit.h*unitWeight);
// area
area = (scratch.area*scratchWeight) + (unit.area*unitWeight);
// colour
l = (scratch.l*scratchWeight) + (unit.l*unitWeight);
u = (scratch.u*scratchWeight) + (unit.u*unitWeight);
v = (scratch.v*scratchWeight) + (unit.v*unitWeight);
// Number of merges
if (scratch.numMerges < 1 and unit.numMerges < 1) { // both units are patches
numMerges = 1;
} else if (scratch.numMerges < 1 and unit.numMerges >= 1) { // unit A is a patch, B a percept
numMerges = unit.numMerges + 1;
} else if (scratch.numMerges >= 1 and unit.numMerges < 1) { // unit A is a percept, B a patch.
numMerges = scratch.numMerges + 1;
cout << "merged scratch??" <<endl;
// TODO this may be an impossible case.
} else { // both units are percepts
numMerges = scratch.numMerges + unit.numMerges;
cout << "Merging two already merged Percepts" <<endl;
// TODO this may be an impossible case.
}
// Create unit.
mergedUnit = percepUnit(accumulation, accumulationMask, x, y, w, h, area); // time is the earliest value in times?
mergedUnit.l = l; // members not in the constructor.
mergedUnit.u = u;
mergedUnit.v = v;
mergedUnit.numMerges = numMerges;
mergedUnit.meanColour = meanColour;
mergedUnit.pActivated = unit.pActivated; // new clusters retain parent's history of activation.
mergedUnit.scratch = false;
mergedUnit.habituation = unit.habituation; // we inherit the habituation of the cluster we merged with.
}
EDIT2
Changing the copy and assignment operators had performance side effects and did not seem to resolve the problem. So I've added a custom function to do the replacement, which, just like the copy operator, makes copies of each member and makes sure those copies are deep. The problem is that I still end up with a leak.
So I've changed this line: *(distIter->unit) = newUnit;
to this: (*(distIter->unit)).clone(newUnit)
Where the clone method is as follows:
// Deep Copy of members
void percepUnit::clone(const percepUnit &source) {
// Deep copy of Mats
this->image = source.image.clone();
this->mask = source.mask.clone();
this->alphaImage = source.alphaImage.clone();
this->meanColour = source.meanColour.clone();
// shallow copies of everything else
this->alpha = source.alpha;
this->fadingIn = source.fadingIn;
this->fadingHold = source.fadingHold;
this->fadingOut = source.fadingOut;
this->l = source.l;
this->u = source.u;
this->v = source.v;
this->x1 = source.x1;
this->y1 = source.y1;
this->w = source.w;
this->h = source.h;
this->x2 = source.x2;
this->y2 = source.y2;
this->cx = source.cx;
this->cy = source.cy;
this->numMerges = source.numMerges;
this->id = source.id;
this->area = source.area;
this->features = source.features;
this->featuresNorm = source.featuresNorm;
this->remove = source.remove;
this->fgKnockout = source.fgKnockout;
this->colourCalculated = source.colourCalculated;
this->normalized = source.normalized;
this->activation = source.activation;
this->activated = source.activated;
this->pActivated = source.pActivated;
this->habituation = source.habituation;
this->scratch = source.scratch;
this->FGBG = source.FGBG;
}
And yet, I still see a memory increase. The increase does not happen if I comment out that single replacement line. So I'm still stuck.
EDIT3
I can prevent memory from increasing if I disable the cv::Mat cloning code in the function above:
// Deep Copy of members
void percepUnit::clone(const percepUnit &source) {
/* try releasing Mats first?
// No effect on memory increase, but the refCount is decremented.
this->image.release();
this->mask.release();
this->alphaImage.release();
this->meanColour.release();*/
/* Deep copy of Mats
this->image = source.image.clone();
this->mask = source.mask.clone();
this->alphaImage = source.alphaImage.clone();
this->meanColour = source.meanColour.clone();*/
// shallow copies of everything else
this->alpha = source.alpha;
this->fadingIn = source.fadingIn;
this->fadingHold = source.fadingHold;
this->fadingOut = source.fadingOut;
this->l = source.l;
this->u = source.u;
this->v = source.v;
this->x1 = source.x1;
this->y1 = source.y1;
this->w = source.w;
this->h = source.h;
this->x2 = source.x2;
this->y2 = source.y2;
this->cx = source.cx;
this->cy = source.cy;
this->numMerges = source.numMerges;
this->id = source.id;
this->area = source.area;
this->features = source.features;
this->featuresNorm = source.featuresNorm;
this->remove = source.remove;
this->fgKnockout = source.fgKnockout;
this->colourCalculated = source.colourCalculated;
this->normalized = source.normalized;
this->activation = source.activation;
this->activated = source.activated;
this->pActivated = source.pActivated;
this->habituation = source.habituation;
this->scratch = source.scratch;
this->FGBG = source.FGBG;
}
EDIT4
While I still can't explain this issue, I did notice another hint. I realized that this leak can also be stopped if I don't normalize those features I use to cluster via featureDist() (but continue to clone cv::Mats). The really odd thing is that I rewrote that code entirely and still the problem persists.
Here is the featureDist function:
float clustering::featureDist(percepUnit unitA, percepUnit unitB, const string FGBG) {
float distance=0;
if (FGBG == "BG") {
for (unsigned int i=0; i<unitA.featuresNorm.rows; i++) {
distance += pow(abs(unitA.featuresNorm.at<float>(i) - unitB.featuresNorm.at<float>(i)),0.5);
//cout << "unitA.featuresNorm[" << i << "]: " << unitA.featuresNorm[i] << endl;
//cout << "unitB.featuresNorm[" << i << "]: " << unitB.featuresNorm[i] << endl;
}
// for FG, don't use normalized colour features.
// TODO To include the area use i=4
} else if (FGBG == "FG") {
for (unsigned int i=4; i<unitA.features.rows; i++) {
distance += pow(abs(unitA.features.at<float>(i) - unitB.features.at<float>(i)),0.5);
}
} else {
cout << "FGBG argument was not FG or BG, returning 0." <<endl;
return 0;
}
return pow(distance,2);
}
Features used to be a vector of floats, and thus the normalization code was as follows:
void clustering::normalize(list<percepUnit> &scratch, list<percepUnit> &units) {
list<percepUnit>::iterator unit;
list<percepUnit*>::iterator unitPtr;
vector<float> min,max;
list<percepUnit*> masterList; // list of pointers.
// generate pointers
for (unit = scratch.begin(); unit != scratch.end(); unit++)
masterList.push_back(&(*unit)); // add pointer to where unit points to.
for (unit = units.begin(); unit != units.end(); unit++)
masterList.push_back(&(*unit)); // add pointer to where unit points to.
int numFeatures = masterList.front()->features.size(); // all percepts have the same number of features.
min.resize(numFeatures); // allocate for the number of features we have.
max.resize(numFeatures);
// Loop through all units to get feature values
for (int i=0; i<numFeatures; i++) {
min[i] = masterList.front()->features[i]; // starting point.
max[i] = min[i];
// calculate min and max for each feature.
for (unitPtr = masterList.begin(); unitPtr != masterList.end(); unitPtr++) {
if ((*unitPtr)->features[i] < min[i])
min[i] = (*unitPtr)->features[i];
if ((*unitPtr)->features[i] > max[i])
max[i] = (*unitPtr)->features[i];
}
}
// Normalize features according to min/max.
for (int i=0; i<numFeatures; i++) {
for (unitPtr = masterList.begin(); unitPtr != masterList.end(); unitPtr++) {
(*unitPtr)->featuresNorm[i] = ((*unitPtr)->features[i]-min[i]) / (max[i]-min[i]);
(*unitPtr)->normalized = true;
}
}
}
I changed the features type to a cv::Mat so I could use the opencv normalization function, so I rewrote the normalization function as follows:
void clustering::normalize(list<percepUnit> &scratch, list<percepUnit> &units) {
Mat featureMat = Mat(1,units.size()+scratch.size(), CV_32FC1, Scalar(0));
list<percepUnit>::iterator unit;
// For each feature
for (int i=0; i< units.begin()->features.rows; i++) {
// for each unit in units
int j=0;
float value;
for (unit = units.begin(); unit != units.end(); unit++) {
// Populate featureMat j is the unit index, i is the feature index.
value = unit->features.at<float>(i);
featureMat.at<float>(j) = value;
j++;
}
// for each unit in scratch
for (unit = scratch.begin(); unit != scratch.end(); unit++) {
// Populate featureMat j is the unit index, i is the feature index.
value = unit->features.at<float>(i);
featureMat.at<float>(j) = value;
j++;
}
// Normalize this featureMat in place
cv::normalize(featureMat, featureMat, 0, 1, NORM_MINMAX);
// set normalized values in percepUnits from featureMat
// for each unit in units
j=0;
for (unit = units.begin(); unit != units.end(); unit++) {
// Populate percepUnit featuresNorm, j is the unit index, i is the feature index.
value = featureMat.at<float>(j);
unit->featuresNorm.at<float>(i) = value;
j++;
}
// for each unit in scratch
for (unit = scratch.begin(); unit != scratch.end(); unit++) {
// Populate percepUnit featuresNorm, j is the unit index, i is the feature index.
value = featureMat.at<float>(j);
unit->featuresNorm.at<float>(i) = value;
j++;
}
}
}
I can't understand what the interaction between mergePerceps and normalization is, especially since normalization is an entirely rewritten function.
Update
Massif and my /proc memory reporting don't agree. Massif says normalization has no effect on memory usage; only commenting out the percepUnit::clone() operation bypasses the leak.
Here is all the code, in case the interaction is somewhere else I am missing.
Here is another version of the same code with the dependence on OpenCV GPU removed, to facilitate testing...
It was recommended by Nghia (on the opencv forum) that I try and make the percepts a constant size. Sure enough, if I fix the dimensions and type of the cv::Mat members of percepUnit, then the leak disappears.
So it seems to me this is a bug in OpenCV that affects calling clone() and copyTo() on Mats of different sizes that are class members. So far I have been unable to reproduce it in a simple program. The leak does seem small enough that it may be the headers leaking rather than the underlying image data.

Stack Overflow with Pathfinding Algorithm

I have been working on a project that will, in short, generate a 2D matrix of numbers, with "empty" spaces represented by 0's. Each number is connected by a list of nodes. The nodes contain the number value, the number's X and Y position, and a list of all spaces adjacent to it (its "neighbors"), excluding diagonally adjacent spaces, because the algorithm only allows movements up, down, left, and right. As the title suggests, I am running into stack overflow issues. I will post my code below; if anyone could help, I would be most appreciative.
CoordList* Puzzle::GeneratePath(CoordList* Path, int GoalX, int GoalY)
{
int CurrX;
int CurrY;
CurrX = Path->NeighborX;
CurrY = Path->NeighborY;
if(CurrX == GoalX && CurrY == GoalY)
{
return(Path);
}
else
{
int NewX;
int NewY;
double NewDistance;
int OldX;
int OldY;
double OldDistance;
CoordList* PointNeighbors = NULL;
CoordList* BestChoice = NULL;
for(int i = 0; i < NumDirections; i++)
{
CoordList* NewNeighbor = new CoordList;
NewX = CurrX + DirectsX[i];
NewY = CurrY + DirectsY[i];
if(IsPossible(NewX, NewY))
{
NewNeighbor->NeighborX = NewX;
NewNeighbor->NeighborY = NewY;
if(PointNeighbors == NULL)
{
NewNeighbor->next = NULL;
PointNeighbors = NewNeighbor;
}
else
{
NewNeighbor->next = PointNeighbors;
PointNeighbors = NewNeighbor;
}
}
//delete NewNeighbor;
}
while(PointNeighbors != NULL)
{
if(BestChoice == NULL)
{
CoordList* AChoice = new CoordList;
AChoice->next = NULL;
NewX = PointNeighbors->NeighborX;
NewY = PointNeighbors->NeighborY;
AChoice->NeighborX = NewX;
AChoice->NeighborY = NewY;
BestChoice = AChoice;
PointNeighbors = PointNeighbors->next;
//delete AChoice;
}
else
{
NewX = PointNeighbors->NeighborX;
NewY = PointNeighbors->NeighborY;
NewDistance = DetermineDistance(NewX, NewY, GoalX, GoalY);
OldX = BestChoice->NeighborX;
OldY = BestChoice->NeighborY;
OldDistance = DetermineDistance(OldX, OldY, GoalX, GoalY);
if(NewDistance < OldDistance)
{
BestChoice->NeighborX = NewX;
BestChoice->NeighborY = NewY;
}
PointNeighbors = PointNeighbors->next;
}
}
BestChoice->next = Path;
Path = BestChoice;
return(GeneratePath(Path, GoalX, GoalY));
}
}
I was asked to provide my determine distance function. This is just a simple implementation of the traditional Point Distance formula. Provided below.
double Puzzle::DetermineDistance(int OneX, int OneY, int TwoX, int TwoY)
{
int DifX;
int DifY;
double PointSum;
DifX = (TwoX - OneX);
DifY = (TwoY - OneY);
DifX = (DifX * DifX);
DifY = (DifY * DifY);
PointSum = (DifX + DifY);
return (sqrt(PointSum));
}
The following is the IsPossible function, which determines if an X and Y value lies within the possible grid space.
bool Puzzle::IsPossible(int x, int y)
{
if(x + 1 > Size - 1 || x - 1 < 0
|| y + 1 > Size - 1 || y - 1 < 0)
{
return false;
}
return true;
}
You might have an infinite recursion that causes the stack overflow, since every recursive call creates new local variables, and the oscillation you observed points the same way. I assume you don't have that problem with small matrices. It's just a shot in the dark :-)
The oscillation problem suggests that you don't check whether you have already visited a position.
Anyway, you may want to consider another pathfinding algorithm. I would suggest an agent-based solution. I used the following approach to solve a maze of similar structure: I started an agent with a "PositionsList" of the spots where it had been, so in the beginning it held only the starting point. The agent then copied itself to every reachable position not already in its own PositionsList, adding the new position to that list, and then destroyed itself. Repeat that pattern with all new agents until the first agent reaches the goal. That way you are guaranteed to find the optimal path. But it might get pretty memory heavy for big matrices, especially when there are many different ways to reach the goal and many possible directions per position! There are plenty of other very good pathfinding algorithms out there, though. Maybe one of them suits you well :-)
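The agent scheme is close in spirit to a breadth-first search, so here is a sketch of plain BFS with a visited record, which is a swapped-in technique rather than the exact agent approach. It uses a queue instead of recursion, so the call stack does not grow with the path length. The Grid alias, the function name, and the convention that 0 means a walkable cell are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;   // 0 = free cell, nonzero = blocked (assumed)

// Breadth-first search over up/down/left/right moves.
// Returns the number of steps on a shortest path, or -1 if the goal is unreachable.
int shortestPathLength(const Grid& grid, int startX, int startY, int goalX, int goalY) {
    const int h = static_cast<int>(grid.size());
    const int w = h > 0 ? static_cast<int>(grid[0].size()) : 0;
    auto inside = [&](int x, int y) { return x >= 0 && x < w && y >= 0 && y < h; };
    if (!inside(startX, startY) || !inside(goalX, goalY)) return -1;

    // dist doubles as the "already visited" record: -1 means not seen yet.
    std::vector<std::vector<int>> dist(h, std::vector<int>(w, -1));
    std::queue<std::pair<int, int>> frontier;
    dist[startY][startX] = 0;
    frontier.push({startX, startY});

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        const std::pair<int, int> cur = frontier.front();
        frontier.pop();
        if (cur.first == goalX && cur.second == goalY) return dist[cur.second][cur.first];
        for (int i = 0; i < 4; ++i) {
            const int nx = cur.first + dx[i];
            const int ny = cur.second + dy[i];
            // Skip cells that are off-grid, blocked, or already visited.
            if (inside(nx, ny) && grid[ny][nx] == 0 && dist[ny][nx] == -1) {
                dist[ny][nx] = dist[cur.second][cur.first] + 1;
                frontier.push({nx, ny});
            }
        }
    }
    return -1;
}
```

This returns only the path length; recovering the path itself would need a parent record per cell, filled in at the same place where dist is set.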
Good Luck!

C++ Heap Corruption: Local heap variable causing issues

I am working on some simple terrain with DirectX9 by manually assembling the verts for the ground.
On the part of my code where I set up the indices I get an error though:
Windows has triggered a breakpoint in test.exe.
This may be due to a corruption of the heap, which indicates a bug in test.exe or any of the DLLs it has loaded.
Here is the part of my code that is giving me problems, and I'm almost 100% sure that it is linked to my indices pointer, but I delete it when I'm finished... so I'm not sure what the problem is.
int total = widthQuads * heightQuads * 6;
DWORD *indices = new DWORD[totalIdx];
for (int y = 0; y < heightQuads; y++)
{
for (int x = 0; x < widthQuads; x++)
{ //Width of nine:
int lowerLeft = x + y * 9;
int lowerRight = (x + 1) + y * 9;
int topLeft = x + (y + 1) * 9;
int topRight = (x + 1) + (y + 1) * 9;
//First triangle:
indices[counter++] = topLeft;
indices[counter++] = lowerRight;
indices[counter++] = lowerLeft;
//Second triangle:
indices[counter++] = topLeft;
indices[counter++] = topRight;
indices[counter++] = lowerRight;
}
}
d3dDevice->CreateIndexBuffer(sizeof(DWORD)* total, 0, D3DFMT_INDEX16,
D3DPOOL_MANAGED, &groundindex, 0);
void* mem = 0;
groundindex->Lock(0, 0, &mem, 0);
memcpy(mem, indices, total * sizeof (DWORD));
groundindex->Unlock();
delete[] indices;
When I remove this block my program runs OK.
The code you've given looks OK - with one caveat: the initial value of counter is not in the code itself. So either you don't start at counter = 0, or some other piece of code is stomping on your indices buffer.
That's the beauty of heap corruption. There is no guarantee that the bug is in the removed portion of the code. Removing it may simply hide a bug that exists somewhere else in your code.
int total = widthQuads * heightQuads * 6;
DWORD *indices = new DWORD[totalIdx];
Shouldn't you be doing "new DWORD[total];" here?
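One way to rule out this kind of size mismatch is to let a std::vector own the indices, so the element count lives in one place. A sketch under the assumption that each vertex row has widthQuads + 1 vertices (which matches the hard-coded 9 if widthQuads is 8); buildGridIndices is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build two triangles per quad for a grid of widthQuads x heightQuads quads.
// The vertex grid is assumed to have (widthQuads + 1) vertices per row.
std::vector<std::uint32_t> buildGridIndices(int widthQuads, int heightQuads) {
    assert(widthQuads > 0 && heightQuads > 0);
    const int vertsPerRow = widthQuads + 1;
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(widthQuads) * heightQuads * 6);
    for (int y = 0; y < heightQuads; ++y) {
        for (int x = 0; x < widthQuads; ++x) {
            const auto lowerLeft  = static_cast<std::uint32_t>(x + y * vertsPerRow);
            const auto lowerRight = lowerLeft + 1;
            const auto topLeft    = static_cast<std::uint32_t>(x + (y + 1) * vertsPerRow);
            const auto topRight   = topLeft + 1;
            // First triangle
            indices.push_back(topLeft);
            indices.push_back(lowerRight);
            indices.push_back(lowerLeft);
            // Second triangle
            indices.push_back(topLeft);
            indices.push_back(topRight);
            indices.push_back(lowerRight);
        }
    }
    return indices;
}
```

The copy into the locked buffer would then use indices.data() and indices.size() * sizeof(std::uint32_t). It may also be worth checking the format flag: the indices here are 32-bit, which corresponds to D3DFMT_INDEX32, while the question passes D3DFMT_INDEX16.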

Why is free() bogging my program down?

I am using free() to release the memory allocated for a bunch of temporary arrays in a recursive function. I would post the code, but it is pretty long. When I comment out these free() calls, the program runs in less than a second. However, when I use them, the program takes about 20 seconds to run. Why is this happening, and how can it be fixed? This is 100 MB or so, so I'd rather not just leave the memory leak.
Additionally, when I run the program that includes all of the free() calls with profiling enabled, it runs in less than a second. I don't know how that would have an effect, but it does.
After using only some of the free() calls, it seems that there are a few in particular that cause the program to slow down. The rest do not seem to have an effect.
Ok... here's the code as requested:
void KDTree::BuildBranch(int height, Mailbox** objs, int nObjects)
{
int dnObjects = nObjects * 2;
int dnmoObjects = dnObjects - 1;
//Check for termination
if(height == -1 || nObjects < minObjectsPerNode)
{
//Create leaf
tree[nodeIndex] = KDTreeNode();
if(nObjects == 1)
tree[nodeIndex].InitializeLeaf(objs[0], 1);
else
tree[nodeIndex].InitializeLeaf(objs, nObjects);
//Added a node, increment index
nodeIndex++;
return;
}
//Save this node's index and increment the current index to save space for this node
int thisNodeIndex = nodeIndex;
nodeIndex++;
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
//Find all possible split locations
int index = 0;
BoundingBox* tempBox = new BoundingBox();
for(int i = 0; i < nObjects; i++)
{
//Get bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add mins to split lists
xMins[index] = tempBox->x0;
yMins[index] = tempBox->y0;
zMins[index] = tempBox->z0;
//Add maxs
xMaxs[index] = tempBox->x1;
yMaxs[index] = tempBox->y1;
zMaxs[index] = tempBox->z1;
index++;
}
//Sort lists
Util::sortFloats(xMins, nObjects);
Util::sortFloats(yMins, nObjects);
Util::sortFloats(zMins, nObjects);
Util::sortFloats(xMaxs, nObjects);
Util::sortFloats(yMaxs, nObjects);
Util::sortFloats(zMaxs, nObjects);
//Allocate bin lists
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* xRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zRight = (Bin*)malloc(dnObjects * sizeof(Bin));
//Initialize all bins
for(int i = 0; i < dnObjects; i++)
{
xLeft[i] = Bin(0, 0.0f);
xRight[i] = Bin(0, 0.0f);
yLeft[i] = Bin(0, 0.0f);
yRight[i] = Bin(0, 0.0f);
zLeft[i] = Bin(0, 0.0f);
zRight[i] = Bin(0, 0.0f);
}
//Construct min and max bins bins from split locations
//Merge min/max lists together for each axis
int minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (xMins[minIndex] <= xMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
xLeft[i].rightEdge = xMins[minIndex];
xRight[i].rightEdge = xMins[minIndex];
//Add geometry to mins counter
xLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
xLeft[i].rightEdge = xMaxs[maxIndex];
xRight[i].rightEdge = xMaxs[maxIndex];
//Add geometry to maxs counter
xRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Repeat for y axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (yMins[minIndex] <= yMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
yLeft[i].rightEdge = yMins[minIndex];
yRight[i].rightEdge = yMins[minIndex];
//Add geometry to mins counter
yLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
yLeft[i].rightEdge = yMaxs[maxIndex];
yRight[i].rightEdge = yMaxs[maxIndex];
//Add geometry to maxs counter
yRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Repeat for z axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
//Check minIndex first so zMins is never read past its end
if(maxIndex == nObjects || (minIndex != nObjects && zMins[minIndex] <= zMaxs[maxIndex]))
{
//Add split location to both bin lists
zLeft[i].rightEdge = zMins[minIndex];
zRight[i].rightEdge = zMins[minIndex];
//Add geometry to mins counter
zLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
zLeft[i].rightEdge = zMaxs[maxIndex];
zRight[i].rightEdge = zMaxs[maxIndex];
//Add geometry to maxs counter
zRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Free split memory
free(xMins);
free(xMaxs);
free(yMins);
free(yMaxs);
free(zMins);
free(zMaxs);
//PreCalcs
float voxelL = xRight[dnmoObjects].rightEdge - xLeft[0].rightEdge;
float voxelD = zRight[dnmoObjects].rightEdge - zLeft[0].rightEdge;
float voxelH = yRight[dnmoObjects].rightEdge - yLeft[0].rightEdge;
float voxelSA = 2.0f * voxelL * voxelD + 2.0f * voxelL * voxelH + 2.0f * voxelD * voxelH;
//Minimum cost preset to no split at all
float minCost = (float)nObjects;
float splitLoc;
int minLeftCounter = 0, minRightCounter = 0;
int axis = -1;
//---------------------------------------------------------------------------------------------
//Check costs of x-axis split planes keeping track of derivative using
//the fact that there is a minimum point on the graph costs vs split location
//Since there is one object per split plane
int splitIndex = 1;
float lastCost = nObjects * voxelL;
float tempCost;
float lastSplit = xLeft[1].rightEdge;
int leftCount = xLeft[1].objectBoundCounter, rightCount = nObjects - xRight[1].objectBoundCounter;
int lastLO = 0, lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (xLeft[splitIndex].rightEdge - xLeft[0].rightEdge) + rightCount * (xLeft[dnmoObjects].rightEdge - xLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = xLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters (skipped on the final pass to stay inside the bin arrays)
splitIndex++;
if(splitIndex < dnObjects)
{
leftCount += xLeft[splitIndex].objectBoundCounter;
rightCount -= xRight[splitIndex].objectBoundCounter;
}
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - xLeft[0].rightEdge) * voxelD + 2 * (lastSplit - xLeft[0].rightEdge) * voxelH + 2 * voxelD * voxelH)) + (lastRO * (2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelD * voxelH))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 0;
}
//---------------------------------------------------------------------------------------------
//Repeat for y axis
splitIndex = 1;
lastCost = nObjects * voxelH;
lastSplit = yLeft[1].rightEdge;
leftCount = yLeft[1].objectBoundCounter;
rightCount = nObjects - yRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (yLeft[splitIndex].rightEdge - yLeft[0].rightEdge) + rightCount * (yLeft[dnmoObjects].rightEdge - yLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = yLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters (skipped on the final pass to stay inside the bin arrays)
splitIndex++;
if(splitIndex < dnObjects)
{
leftCount += yLeft[splitIndex].objectBoundCounter;
rightCount -= yRight[splitIndex].objectBoundCounter;
}
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - yLeft[0].rightEdge) * voxelD + 2 * (lastSplit - yLeft[0].rightEdge) * voxelL + 2 * voxelD * voxelL)) + (lastRO * (2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * voxelD * voxelL))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 1;
}
//---------------------------------------------------------------------------------------------
//Repeat for z axis
splitIndex = 1;
lastCost = nObjects * voxelD;
lastSplit = zLeft[1].rightEdge;
leftCount = zLeft[1].objectBoundCounter;
rightCount = nObjects - zRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (zLeft[splitIndex].rightEdge - zLeft[0].rightEdge) + rightCount * (zLeft[dnmoObjects].rightEdge - zLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = zLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters (skipped on the final pass to stay inside the bin arrays)
splitIndex++;
if(splitIndex < dnObjects)
{
leftCount += zLeft[splitIndex].objectBoundCounter;
rightCount -= zRight[splitIndex].objectBoundCounter;
}
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - zLeft[0].rightEdge) * voxelL + 2 * (lastSplit - zLeft[0].rightEdge) * voxelH + 2 * voxelH * voxelL)) + (lastRO * (2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelH * voxelL))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 2;
}
//Free bin memory
free(xLeft);
free(xRight);
free(yLeft);
free(yRight);
free(zLeft);
free(zRight);
//---------------------------------------------------------------------------------------------
//Make sure a split is in our best interest
if(axis == -1)
{
//If not decrement the node counter
nodeIndex--;
BuildBranch(-1, objs, nObjects);
return;
}
//Allocate space for left and right lists
Mailbox** leftList = (Mailbox**)malloc(minLeftCounter * sizeof(void*));
Mailbox** rightList = (Mailbox**)malloc(minRightCounter * sizeof(void*));
//Sort objects into lists of those to the left and right of the split plane
int leftIndex = 0, rightIndex = 0;
leftCount = 0;
rightCount = 0;
switch(axis)
{
case 0:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->x0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->x1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
case 1:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->y0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->y1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
case 2:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->z0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->z1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
};
//Delete the bounding box
delete tempBox;
//Delete old objects array
free(objs);
//Construct left and right branches
BuildBranch(height - 1, leftList, leftCount);
BuildBranch(height - 1, rightList, rightCount);
//Build this node
tree[thisNodeIndex] = KDTreeNode();
tree[thisNodeIndex].InitializeInterior(axis, splitLoc, nodeIndex - 1);
return;
}
EDIT:
OK, I tried replacing malloc/free with new/delete, and that had no effect on the speed. I also found that only the free() calls on the xLeft/xRight arrays seem to affect the execution time significantly. I was able to eliminate the problem by moving the free() calls to after the recursive calls, although I do not know why this makes a difference, because I don't see anywhere these arrays are used after the original location of the free() calls. As for why I am using malloc: some portions of this program use cache-aligned memory, so I had been using _aligned_malloc. There is probably a way to get new to return cache-aligned memory, but this is the only way I know to do it.
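On the last point, one way to get cache-aligned memory from new is the aligned form of operator new that C++17 added. The following is only a sketch under assumptions: it needs a C++17 compiler, the 64-byte line size is a guess that should be checked for the target CPU, and allocAlignedFloats/freeAlignedFloats are hypothetical names, not part of the posted program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Assumed cache-line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: raw storage for `count` floats, aligned to kCacheLine.
// Uses the C++17 overload operator new(std::size_t, std::align_val_t).
float* allocAlignedFloats(std::size_t count) {
    void* raw = ::operator new(count * sizeof(float),
                               static_cast<std::align_val_t>(kCacheLine));
    return static_cast<float*>(raw);
}

// Memory from the aligned operator new must go back through the
// matching aligned operator delete, with the same alignment value.
void freeAlignedFloats(float* ptr) {
    ::operator delete(ptr, static_cast<std::align_val_t>(kCacheLine));
}
```

Whether this is any faster than _aligned_malloc is not something the sketch shows; it only keeps the allocation on the C++ side.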
Is it possible that you are linking against a debug version of the runtime library that is doing something extra in free(), like filling the memory with a garbage value? I have seen this behavior when linking against overly aggressive memory debugging libraries. The code that you have posted does not look strange. I would be interested to know what would happen if you replaced the arrays with std::vector or std::deque, though. std::vector should behave much like the arrays, and std::deque may actually improve the speed a little if the arrays are large, because the memory manager will not have to provide contiguous space.
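A minimal sketch of the std::vector substitution, assuming Bin has just the two members the posted code touches (objectBoundCounter and rightEdge); the real Bin may differ, and makeBins is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout, inferred from the members used in the posted code.
struct Bin {
    int objectBoundCounter;
    float rightEdge;
    Bin(int count = 0, float edge = 0.0f)
        : objectBoundCounter(count), rightEdge(edge) {}
};

// Hypothetical helper: stands in for the malloc call plus the
// "Initialize all bins" loop. The vector releases its storage
// automatically when it goes out of scope, so no free() is needed.
std::vector<Bin> makeBins(std::size_t count) {
    return std::vector<Bin>(count, Bin(0, 0.0f));
}
```

With this, a line such as `std::vector<Bin> xLeft = makeBins(dnObjects);` would replace one malloc and one free, and indexing with xLeft[i] stays the same.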
If your program is doing all of the free()ing on exit, then you might as well just skip the calls. The entire process heap is freed when your app exits.
Edit: ----
OK, now that the code is posted, it appears to me that you aren't just freeing on exit, so you should definitely try to figure out whether this is a weird symptom of a bug or just a costly implementation of free(). Instead of removing the free() calls, time how long it takes to execute them. Is the heap manager really using up the whole 19 seconds?
I do see several places where multiple allocations have the same scope and lifetime. You could turn these into a single malloc/free call, although that would make the code less clear and harder to maintain. So you have to ask yourself: how much does that 20 seconds matter?
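One way to do that timing is to wrap each free() in a small helper and add up the results. This is a sketch only; timedFreeMicros is a hypothetical name, and std::chrono::steady_clock may be too coarse to resolve a single fast call, so summing over the whole build is more meaningful than any one reading.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>

// Hypothetical helper: frees ptr and returns the elapsed time in microseconds.
double timedFreeMicros(void* ptr) {
    const auto start = std::chrono::steady_clock::now();
    std::free(ptr);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Accumulating the return values into a running total across all recursive calls, then comparing that total with the overall run time, would show whether free() itself accounts for the delay.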
Probably just the behavior of the heap manager your CRT uses. It's probably updating free lists, or some other internal structure to manage memory.
You probably should reexamine how your program allocates and uses memory if your bottleneck is here.
Having had a look at the code, one big thing that comes to mind is the mixture of malloc(...), new(...), delete(...), and free(...):
BoundingBox* tempBox = new BoundingBox();
// ....
//Delete the bounding box
delete tempBox;
yet in other places you have
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
// ....
free(xMins);
In short, you are mixing C++'s new(...) and delete(...) with C's malloc(...) and free(...). This is C++ after all, so a question for you:
Why did you use malloc(...) and free(...), which come from C, in the middle of this C++ code? The repercussion I can see is that the C++ runtime may manage memory differently from the C allocator, since new and delete also handle object construction and destruction.
Having said this, your best bet is:
Replace all calls to malloc with new (new[] for arrays).
Replace all calls to free with delete (delete[] for arrays).
Rerun the program and see if that makes a difference. Can you confirm this?
Hope this helps,
Best regards,
Tom.
+1 to malloc/free making my eyes hurt in C++. Ignoring that for a second and looking at the code, three ideas:
Roll up your malloc calls into one large malloc and free (for the x/y/left/right/etc. structures) instead of 12. Set the pointers into this large buffer as appropriate.
Still talking about the x/y/left/right variables: employ a small stack-based buffer that you can use when the number of objects is small. When the number of objects is large, allocate dynamically; when it is not, just point at the local stack buffer. This can avoid dynamic memory management altogether for small inputs.
Right now, your "object" list is dynamically allocated, freed, and reallocated with each recursive call (!!). This is confusing because ownership isn't clear, and it is also a performance issue. Consider reworking the code so that only one list of "objects" is ever used.
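The first idea can be sketched as follows for the six float arrays. SplitArrays, allocSplits, and freeSplits are hypothetical names, and the sketch covers only the float arrays; the Bin arrays could be carved out of a second block in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical holder: six views into one allocation.
struct SplitArrays {
    float* block;  // owns the memory; the other pointers only point into it
    float* xMins;
    float* yMins;
    float* zMins;
    float* xMaxs;
    float* yMaxs;
    float* zMaxs;
};

// One malloc instead of six. On failure every pointer is left null.
SplitArrays allocSplits(std::size_t nObjects) {
    SplitArrays s = {};
    s.block = static_cast<float*>(std::malloc(6 * nObjects * sizeof(float)));
    if (s.block == nullptr) {
        return s;
    }
    s.xMins = s.block + 0 * nObjects;
    s.yMins = s.block + 1 * nObjects;
    s.zMins = s.block + 2 * nObjects;
    s.xMaxs = s.block + 3 * nObjects;
    s.yMaxs = s.block + 4 * nObjects;
    s.zMaxs = s.block + 5 * nObjects;
    return s;
}

// One free instead of six; resets the views so they cannot be reused.
void freeSplits(SplitArrays& s) {
    std::free(s.block);
    s = SplitArrays{};
}
```

The trade-off is the one mentioned elsewhere in this thread: fewer heap calls, at the cost of the six arrays no longer having independent lifetimes.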
C++ may store some extra information when you allocate using new, such as the element count in the case of an array. If you are using free, it could be a fragmentation problem where you are actually releasing only the chunks of data in between, but not the bookkeeping information stored by new. Just a thought.
When you corrupt the heap, it often becomes very slow. Try running it in debug mode with the debug version of your runtime as well.
It could be poor locality of reference for your code. For example, I see the following:
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
...
free(xMins);
free(xMaxs);
free(yMins);
free(yMaxs);
free(zMins);
free(zMaxs);
Now, assuming that the allocations proceed basically linearly, free(xMaxs); may need to dereference memory that was allocated some number of pages away from xMins (which was just dereferenced during free(xMins);), so you might need to swap in a page from the backing store in order to perform the free, which causes a huge slowdown in execution when it happens. Reordering the free() calls to match the allocation order could help. In this case, that would mean:
free(xMins);
free(yMins);
free(zMins);
free(xMaxs);
free(yMaxs);
free(zMaxs);
It sounds like you are running your program from a debugger in Windows, which by default causes a special debug heap to be used, which dramatically slows down memory deallocations. This applies even to non-debug builds, as long as they are launched from a debugger (such as Visual Studio). You should be able to disable this behavior by setting the environment variable _NO_DEBUG_HEAP=1 before running your program (I recommend setting it in the project configuration settings rather than in the system settings, if possible).
You didn't describe anything about your programming environment in the original question, however, so I had to make certain assumptions about it that might be wrong. If you're not running your program under Windows, for example, then my answer doesn't apply and I have no idea what the cause of your problem might be.
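To confirm that the variable actually reached the process, the program can read it back at startup. This is a diagnostic sketch with hypothetical helper names; it only reports what is in the environment and cannot switch heaps itself, since the variable has to be set before the program is launched.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true when the variable's value is exactly "1".
bool isFlagEnabled(const char* value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads _NO_DEBUG_HEAP from the current process environment.
bool debugHeapDisabledByEnv() {
    return isFlagEnabled(std::getenv("_NO_DEBUG_HEAP"));
}
```

Printing the result of debugHeapDisabledByEnv() once at startup makes it easy to compare timings with and without the setting.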