C++ Heap Corruption: Local heap variable causing issues - c++

I am working on some simple terrain with DirectX9 by manually assembling the verts for the ground.
On the part of my code where I set up the indices I get an error though:
Windows has triggered a breakpoint in test.exe.
This may be due to a corruption of the heap, which indicates a bug in test.exe or any of the DLLs it has loaded.
Here is the part of my code that is giving me problems, and I'm almost 100% sure that it is linked to my indices pointer, but I delete it when I'm finished... so I'm not sure what the problem is.
int total = widthQuads * heightQuads * 6;
DWORD *indices = new DWORD[totalIdx];
for (int y = 0; y < heightQuads; y++)
{
for (int x = 0; x < widthQuads; x++)
{ //Width of nine:
int lowerLeft = x + y * 9;
int lowerRight = (x + 1) + y * 9;
int topLeft = x + (y + 1) * 9;
int topRight = (x + 1) + (y + 1) * 9;
//First triangle:
indices[counter++] = topLeft;
indices[counter++] = lowerRight;
indices[counter++] = lowerLeft;
//Second triangle:
indices[counter++] = topLeft;
indices[counter++] = topRight;
indices[counter++] = lowerRight;
}
}
d3dDevice->CreateIndexBuffer(sizeof(DWORD)* total, 0, D3DFMT_INDEX16,
D3DPOOL_MANAGED, &groundindex, 0);
void* mem = 0;
groundindex->Lock(0, 0, &mem, 0);
memcpy(mem, indices, total * sizeof (DWORD));
groundindex->Unlock();
delete[] indices;
When I remove this block my program runs OK.

The code you've given looks OK - with one caveat: the initial value of counter is not in the code itself. So either you don't start at counter = 0, or some other piece of code is stomping on your indices buffer.
That's the beauty of heap corruptions. There is no guarantee that the bug is in the removed portion of the code. Removing it may simply hide a bug that exists somewhere else in your code.

int total = widthQuads * heightQuads * 6;
DWORD *indices = new DWORD[totalIdx];
Shouldn't you be doing "new DWORD[total];" here?
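A minimal sketch of the same index loop with the size mismatch removed, assuming the vertex grid has vertsPerRow vertices per row (9 in the question). buildGridIndices and its parameters are hypothetical names, not part of the original code. Because a single std::vector grows with push_back, the allocation size and the loop bounds cannot drift apart, and there is no separate counter to initialize. Separately, 32-bit indices need D3DFMT_INDEX32 when the index buffer is created; D3DFMT_INDEX16 describes 16-bit values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: emits two triangles (six indices) per quad.
// vertsPerRow is the number of vertices in one row of the grid.
std::vector<std::uint32_t> buildGridIndices(int widthQuads, int heightQuads, int vertsPerRow)
{
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(widthQuads) * heightQuads * 6);

    for (int y = 0; y < heightQuads; ++y) {
        for (int x = 0; x < widthQuads; ++x) {
            const std::uint32_t lowerLeft  = static_cast<std::uint32_t>(x + y * vertsPerRow);
            const std::uint32_t lowerRight = lowerLeft + 1;
            const std::uint32_t topLeft    = static_cast<std::uint32_t>(x + (y + 1) * vertsPerRow);
            const std::uint32_t topRight   = topLeft + 1;

            // First triangle
            indices.push_back(topLeft);
            indices.push_back(lowerRight);
            indices.push_back(lowerLeft);
            // Second triangle
            indices.push_back(topLeft);
            indices.push_back(topRight);
            indices.push_back(lowerRight);
        }
    }
    return indices;
}
```

The result can then be copied into the locked buffer with memcpy(mem, indices.data(), indices.size() * sizeof(std::uint32_t)).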

Related

munmap_chunk() - Invalid pointer error

I'm writing a renderer using low-level SDL functions to learn how it all works. I am now trying to do polygon drawing, but I run into errors possibly due to my inexperience with C++. When running the code I get a munmap_chunk() - Invalid pointer error. Searching reveals that it is most likely due to free()-ing the memory twice. The error happens when returning from the function. I realize that the error comes from automatically free()ing memory which has been automatically free()d before, but I'm not experienced enough with C++ to spot the error. Any clues?
My code:
void DrawPolygon (const vector<vec3> & verts, vec3 color){
// 0. Project to the screen
vector<ivec2> vertices(verts.size());
for(int i = 0; i < verts.size(); i++){
VertexShader(verts.at(i), vertices.at(i));
}
// 1. Find max and min y-value of the polygon
// and compute the number of rows it occupies.
int miny = vertices[0].y;
int maxy = vertices[0].y;
for (int i = 1; i < 3; i++){
if (vertices[i].y < miny){
miny = vertices[i].y;
}
if (vertices[i].y > maxy){
maxy = vertices[i].y;
}
}
int rows = abs(maxy - miny) + 1;
// 2. Resize leftPixels and rightPixels
// so that they have an element for each row.
vector<ivec2> leftPixels(rows);
vector<ivec2> rightPixels(rows);
// 3. Initialize the x-coordinates in leftPixels
// to some really large value and the x-coordinates
// in rightPixels to some really small value.
for(int i = 0; i < rows; i++){
leftPixels[i].x = std::numeric_limits<int>::max();
rightPixels[i].x = std::numeric_limits<int>::min();
leftPixels[i].y = miny + i;
rightPixels[i].y = miny + i;
}
// 4. Loop through all edges of the polygon and use
// linear interpolation to find the x-coordinate for
// each row it occupies. Update the corresponding
// values in rightPixels and leftPixels.
for(int i = 0; i < 3; i++){
ivec2 a = vertices[i];
ivec2 b = vertices[(i+1)%3];
// find the number of pixels to draw
ivec2 delta = glm::abs(a - b);
int pixels = glm::max(delta.x, delta.y) + 1;
// interpolate to find the pixels
vector<ivec2> line (pixels);
Interpolate(a, b, line);
for(int j = 0; j < pixels; j++){
ivec2 p = line[j];
ivec2 cmpl = leftPixels[p.y - miny];
ivec2 cmpr = rightPixels[p.y - miny];
if(p.x < cmpl.x){
leftPixels[p.y - miny].x = p.x;
//leftPixels[p.y - miny] = cmpl;
}
if(p.x > cmpr.x){
rightPixels[p.y - miny].x = p.x;
//cmpr.x = p.x;
//rightPixels[p.y - miny] = cmpr;
}
}
}
for(int i = 0; i < leftPixels.size(); i++){
ivec2 l = leftPixels.at(i);
ivec2 r = rightPixels.at(i);
// y coord the same, iterate over x
int y = l.y;
for(int x = l.x; x <= r.x; x++){
PutPixelSDL(screen, x, y, color);
}
}
}
Using valgrind gives me this output (this is the first error it reports). Weirdly, the program recovers and keeps running with the expected result, apparently not getting the same error again:
==5706== Invalid write of size 4
==5706== at 0x40AD61: DrawPolygon(std::vector<glm::detail::tvec3<float>, std::allocator<glm::detail::tvec3<float> > > const&, glm::detail::tvec3<float>) (in /home/actimia/prog/dgi14/lab3/ThirdLab)
==5706== by 0x409C78: Draw() (in /home/actimia/prog/dgi14/lab3/ThirdLab)
==5706== by 0x409668: main (in /home/actimia/prog/dgi14/lab3/ThirdLab)
I think my previous post on a similar topic would be useful.
https://stackoverflow.com/a/22658693/2724703
From your Valgrind report, it looks like your program is corrupting memory through an overflow. This does not look like a "double free" error; an invalid write is an overflow scenario. You mentioned that Valgrind sometimes does not report any error, which makes the problem harder to track down. However, there is certainly memory corruption, and you must fix it. Memory errors sometimes occur intermittently for various reasons (different input parameters, multithreading, a change in execution sequence).
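One way to localize an overflow like this is to make the suspicious accesses bounds-checked. The sketch below is an assumption about where to look, not a diagnosis: in the posted code the row index p.y - miny is used unchecked, and the min/max loop only visits the first three vertices. checkedRow is a hypothetical helper that throws instead of writing outside the vector.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the row slot for screen coordinate y,
// throwing std::out_of_range rather than touching memory outside `rows`.
template <typename T>
T& checkedRow(std::vector<T>& rows, int y, int miny)
{
    const long long index = static_cast<long long>(y) - miny;
    if (index < 0 || index >= static_cast<long long>(rows.size()))
        throw std::out_of_range("row index outside polygon bounds");
    return rows[static_cast<std::size_t>(index)];
}
```

With this in place, leftPixels[p.y - miny].x = p.x would become checkedRow(leftPixels, p.y, miny).x = p.x, and an out-of-range row surfaces as an exception at the faulty line.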

Program crashes when calling new operator (C++)

I'm working my way through some tutorials I found on creating an ASCII game engine in C and writing my program in C++ to practice. I'm currently working on some stuff with allocating image data on the heap in the form of an Image struct (containing an int width, int height, and two char pointers to locations on the heap holding arrays of chars [width * height] in size)... however, I'm having some problems calling the new operator. The function where I'm allocating the memory for the struct itself, as well as its character and colour data, looks like this:
Image *allocateImage(int width, int height) {
Image *image;
image = new Image;
if (image == NULL)
return NULL;
image->width = width;
image->height = height;
image->chars = new CHAR[width * height];
image->colours = new COL[width * height];
//image->colours = (CHAR*) PtrAdd(image->chars, sizeof(CHAR) + width * height);
for (int i = 0; i < width * height; ++i) { //initializes transparent image
*(&image->chars + i) = 0;
*(&image->colours + i) = 0;
}
return image;
}
The main function itself (where this function is called twice) looks like this:
int main() {
int x, y, offsetx, offsety;
DWORD i;
srand(time(0));
bool write = FALSE;
INPUT_RECORD *eventBuffer;
COLORREF palette[16] =
{
0x00000000, 0x00800000, 0x00008000, 0x00808000,
0x00000080, 0x00800080, 0x00008080, 0x00c0c0c0,
0x00808080, 0x00ff0000, 0x0000ff00, 0x00ffff00,
0x000000ff, 0x00ff00ff, 0x0000ffff, 0x00ffffff
};
COORD bufferSize = {WIDTH, HEIGHT};
DWORD num_events_read = 0;
SMALL_RECT windowSize = {0, 0, WIDTH - 1, HEIGHT - 1};
COORD characterBufferSize = {WIDTH, HEIGHT};
COORD characterPosition = {0, 0};
SMALL_RECT consoleWriteArea = {0, 0, WIDTH - 1, HEIGHT - 1};
wHnd = GetStdHandle(STD_OUTPUT_HANDLE);
rHnd = GetStdHandle(STD_INPUT_HANDLE);
SetConsoleTitle("Title!");
SetConsolePalette(palette, 8, 8, L"Sunkure Font");
SetConsoleScreenBufferSize(wHnd, bufferSize);
SetConsoleWindowInfo(wHnd, TRUE, &windowSize);
for (y = 0; y < HEIGHT; ++y) {
for (x = 0; x < WIDTH; ++x) {
consoleBuffer[x + WIDTH * y].Char.AsciiChar = (unsigned char)219;
consoleBuffer[x + WIDTH * y].Attributes = FOREGROUND_BLUE;
}
}
write = TRUE;
Image *sun_image = allocateImage(SUNW, SUNH);
Image *cloud_image = allocateImage(CLOUDW, CLOUDH);
setImage(sun_image, SUN.chars, SUN.colors);
setImage(cloud_image, Cloud.chars, Cloud.colours);
I can post more code if anyone feels it's necessary, but the program only reaches this point - in fact, a little before, as it crashes on the second call to allocateImage, at the point in the function where the new operator is called. The program has been working just fine until this point - the only recent additions have been the functions for allocation of image data on the heap (for creation of images with variable sizes) as well as deallocation (which isn't reached by this program). Since the program I'm learning from is written in C this is one place where looking at the source code won't help me, and Google's been not much help either. Can anyone point me to what's going wrong?
These lines
*(&image->chars + i) = 0;
*(&image->colours + i) = 0;
are the problem. image->chars is already a pointer to the array, so &image->chars is the address of that pointer member inside the struct, not of the array it points to. Simply remove the &.
As written, the loop writes over the struct's own fields and whatever follows them on the heap, so anything can happen. It is not surprising that the allocator's bookkeeping gets damaged and the next call to new crashes.
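A minimal sketch of the allocation with the stray & removed. The Image layout and the CHAR and COL typedefs are assumptions based on the question's description, since the real definitions are not shown, and freeImage is a hypothetical counterpart. Two details differ from the original on purpose: new CHAR[count]() value-initializes the array to zero, so no loop is needed, and plain new reports failure by throwing std::bad_alloc, so the NULL check is dropped.

```cpp
#include <cstddef>

// Assumed definitions; the question does not show the real ones.
typedef char CHAR;
typedef char COL;

struct Image {
    int width;
    int height;
    CHAR* chars;
    COL* colours;
};

Image* allocateImage(int width, int height)
{
    const std::size_t count = static_cast<std::size_t>(width) * height;

    Image* image = new Image;            // throws std::bad_alloc on failure
    image->width = width;
    image->height = height;
    image->chars = new CHAR[count]();    // () zero-initializes every element
    image->colours = new COL[count]();   // (a throw here would leak; kept simple for the sketch)
    return image;
}

// Releases everything allocateImage created; arrays need delete[].
void freeImage(Image* image)
{
    if (!image) return;
    delete[] image->chars;
    delete[] image->colours;
    delete image;
}
```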

A method for indexing triangles from a loaded heightmap?

I am currently making a method to load in a noisy heightmap, but lack the triangles to do so. I want to make an algorithm that will take an image, its width and height and construct a terrain node out of it.
Here's what I have so far, in rough pseudocode:
Vertex* vertices = new Vertices[image.width * image.height];
Index* indices; // How do I judge how many indices I will have?
float scaleX = 1 / image.width;
float scaleY = 1 / image.height;
float currentYScale = 0;
for(int y = 0; y < image.height; ++y) {
float currentXScale = 0;
for (int x = 0; x < image.width; ++x) {
Vertex* v = vertices[x * y];
v.x = currentXScale;
v.y = currentYScale;
v.z = image[x,y];
currentXScale += scaleX;
}
currentYScale += scaleY;
}
This works well enough for my needs; my only problem is this: how would I calculate the number of indices and their positions for drawing the triangles? I have some familiarity with indices, but not with how to calculate them programmatically; I can only do that statically.
As far as your code above goes, using vertices[x * y] isn't right - if you use that, then e.g. vert(2,3) == vert(3,2). What you want is something like vertices[y * image.width + x], but you can do it more efficiently by incrementing a counter (see below).
Here's the equivalent code I use. It's in C# unfortunately, but hopefully it should illustrate the point:
/// <summary>
/// Constructs the vertex and index buffers for the terrain (for use when rendering the terrain).
/// </summary>
private void ConstructBuffers()
{
int heightmapHeight = Heightmap.GetLength(0);
int heightmapWidth = Heightmap.GetLength(1);
int gridHeight = heightmapHeight - 1;
int gridWidth = heightmapWidth - 1;
// Construct the individual vertices for the terrain.
var vertices = new VertexPositionTexture[heightmapHeight * heightmapWidth];
int vertIndex = 0;
for(int y = 0; y < heightmapHeight; ++y)
{
for(int x = 0; x < heightmapWidth; ++x)
{
var position = new Vector3(x, y, Heightmap[y,x]);
var texCoords = new Vector2(x * 2f / heightmapWidth, y * 2f / heightmapHeight);
vertices[vertIndex++] = new VertexPositionTexture(position, texCoords);
}
}
// Create the vertex buffer and fill it with the constructed vertices.
this.VertexBuffer = new VertexBuffer(Renderer.GraphicsDevice, typeof(VertexPositionTexture), vertices.Length, BufferUsage.WriteOnly);
this.VertexBuffer.SetData(vertices);
// Construct the index array.
var indices = new short[gridHeight * gridWidth * 6]; // 2 triangles per grid square x 3 vertices per triangle
int indicesIndex = 0;
for(int y = 0; y < gridHeight; ++y)
{
for(int x = 0; x < gridWidth; ++x)
{
int start = y * heightmapWidth + x;
indices[indicesIndex++] = (short)start;
indices[indicesIndex++] = (short)(start + 1);
indices[indicesIndex++] = (short)(start + heightmapWidth);
indices[indicesIndex++] = (short)(start + 1);
indices[indicesIndex++] = (short)(start + 1 + heightmapWidth);
indices[indicesIndex++] = (short)(start + heightmapWidth);
}
}
// Create the index buffer.
this.IndexBuffer = new IndexBuffer(Renderer.GraphicsDevice, typeof(short), indices.Length, BufferUsage.WriteOnly);
this.IndexBuffer.SetData(indices);
}
I guess the key point is that given a heightmap of size heightmapHeight * heightmapWidth, you need (heightmapHeight - 1) * (heightmapWidth - 1) * 6 indices, since you're drawing:
2 triangles per grid square
3 vertices per triangle
(heightmapHeight - 1) * (heightmapWidth - 1) grid squares in your terrain.
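Since the question was asked in C++, the two formulas above can be sketched as small helpers. vertexIndex and indexCount are hypothetical names; width and height are the heightmap dimensions in samples.

```cpp
#include <cstddef>

// Row-major slot of the vertex at column x, row y.
// Unlike x * y, this is unique for every (x, y) pair with x < width.
inline std::size_t vertexIndex(std::size_t x, std::size_t y, std::size_t width)
{
    return y * width + x;
}

// Two triangles per grid square, three indices per triangle.
inline std::size_t indexCount(std::size_t width, std::size_t height)
{
    if (width < 2 || height < 2)
        return 0;  // fewer than two samples along an axis: no grid squares
    return (width - 1) * (height - 1) * 6;
}
```

For a 9 x 9 heightmap this gives 8 * 8 * 6 = 384 indices, and the vertices at (2,3) and (3,2) land in slots 29 and 21 rather than colliding.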

Sometimes I get EXC_BAD_ACCESS (Access violation) when reversing an array

I am loading an image using the OpenEXR library.
This works fine, except the image is loaded rotated 180 degrees. I use the loop shown below to reverse the array, but sometimes the program will quit and Xcode will give me an EXC_BAD_ACCESS error (which I assume is the same as an access violation in MSVC). It does not happen every time, just once every 5-10 times.
Ideally I'd want to reverse the array in place, although that led to errors every time, and using memcpy would fail without causing an error, giving just a blank image. I'd like to know what's causing this problem first.
Here is the code I am using: (Rgba is a struct of 4 "Half"s r, g, b, and a, defined in OpenEXR)
Rgba* readRgba(const char filename[], int& width, int& height){
Rgba* pixelBuffer = new Rgba[width * height];
Rgba* temp = new Rgba[width * height];
// ....EXR Loading code....
// TODO: *Sometimes* the following code results in a bad memory access error. No idea why.
// Flip the image to conform with OpenGL coordinates.
for (int i = 0; i < height; i++){
for(int j = 0; j < width; j++){
temp[(i*width)+j] = pixelBuffer[(width*height)-(i*width)+j];
}
}
delete pixelBuffer;
return temp;
}
Thanks in advance!
Change:
temp[(i*width)+j] = pixelBuffer[(width*height)-(i*width)+j];
to:
temp[(i*width)+j] = pixelBuffer[(width*height)-(i*width)-j - 1];
(Hint: think about what happens when i = 0 and j = 0 !)
And here's how you can optimize this code to save memory and loop iterations by reversing the buffer in place:
Rgba* readRgba(const char filename[], int& width, int& height)
{
Rgba* pixelBuffer = new Rgba[width * height];
Rgba tempPixel;
// ....EXR Loading code....
// Flip the image to conform with OpenGL coordinates.
for (int i = 0; i <= height/2; i++)
for(int j = 0; j < width && (i*width + j) < (height*width/2); j++)
{
tempPixel = pixelBuffer[i*width + j];
pixelBuffer[i*width + j] = pixelBuffer[height*width - (i*width + j) -1];
pixelBuffer[height*width - (i*width + j) -1] = tempPixel;
}
return pixelBuffer;
}
Note that the best practice from a memory-management point of view would be to have the caller allocate pixelBuffer and pass it in as a parameter. It is good practice to allocate and release memory in the same piece of code.
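For comparison, a minimal sketch of the same 180-degree rotation using the standard library. The Pixel struct is a stand-in for OpenEXR's Rgba (an assumption, so the example compiles without the library), and rotate180 is a hypothetical name. Reversing the whole row-major buffer maps element k to N - 1 - k, which is the mapping the in-place loop above uses. Holding the pixels in a std::vector also sidesteps the delete pixelBuffer call in the question: memory obtained with new[] has to be released with delete[].

```cpp
#include <algorithm>
#include <vector>

// Stand-in for OpenEXR's Rgba; assumed here so the sketch is self-contained.
struct Pixel {
    float r, g, b, a;
};

// Rotates a row-major image by 180 degrees by reversing its pixel order.
void rotate180(std::vector<Pixel>& pixels)
{
    std::reverse(pixels.begin(), pixels.end());
}
```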

Why is free() bogging my program down?

I am using free to free the memory allocated for a bunch of temporary arrays in a recursive function. I would post the code but it is pretty long. When I comment out these free() calls, the program runs in less than a second. However, when I am using them, the program takes about 20 seconds to run. Why is this happening, and how can it be fixed? This is like 100 or so MB so I'd rather not just leave the memory leak.
Additionally, when I run the program that includes all of the free() calls with profiling enabled, it runs in less than a second. I don't know how that would have an effect, but it does.
After using only some of the free() calls, it seems that there are a few in particular that cause the program to slow down. The rest do not seem to have an effect.
Ok... here's the code as requested:
void KDTree::BuildBranch(int height, Mailbox** objs, int nObjects)
{
int dnObjects = nObjects * 2;
int dnmoObjects = dnObjects - 1;
//Check for termination
if(height == -1 || nObjects < minObjectsPerNode)
{
//Create leaf
tree[nodeIndex] = KDTreeNode();
if(nObjects == 1)
tree[nodeIndex].InitializeLeaf(objs[0], 1);
else
tree[nodeIndex].InitializeLeaf(objs, nObjects);
//Added a node, increment index
nodeIndex++;
return;
}
//Save this node's index and increment the current index to save space for this node
int thisNodeIndex = nodeIndex;
nodeIndex++;
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
//Find all possible split locations
int index = 0;
BoundingBox* tempBox = new BoundingBox();
for(int i = 0; i < nObjects; i++)
{
//Get bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add mins to split lists
xMins[index] = tempBox->x0;
yMins[index] = tempBox->y0;
zMins[index] = tempBox->z0;
//Add maxs
xMaxs[index] = tempBox->x1;
yMaxs[index] = tempBox->y1;
zMaxs[index] = tempBox->z1;
index++;
}
//Sort lists
Util::sortFloats(xMins, nObjects);
Util::sortFloats(yMins, nObjects);
Util::sortFloats(zMins, nObjects);
Util::sortFloats(xMaxs, nObjects);
Util::sortFloats(yMaxs, nObjects);
Util::sortFloats(zMaxs, nObjects);
//Allocate bin lists
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* xRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zRight = (Bin*)malloc(dnObjects * sizeof(Bin));
//Initialize all bins
for(int i = 0; i < dnObjects; i++)
{
xLeft[i] = Bin(0, 0.0f);
xRight[i] = Bin(0, 0.0f);
yLeft[i] = Bin(0, 0.0f);
yRight[i] = Bin(0, 0.0f);
zLeft[i] = Bin(0, 0.0f);
zRight[i] = Bin(0, 0.0f);
}
//Construct min and max bins bins from split locations
//Merge min/max lists together for each axis
int minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (xMins[minIndex] <= xMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
xLeft[i].rightEdge = xMins[minIndex];
xRight[i].rightEdge = xMins[minIndex];
//Add geometry to mins counter
xLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
xLeft[i].rightEdge = xMaxs[maxIndex];
xRight[i].rightEdge = xMaxs[maxIndex];
//Add geometry to maxs counter
xRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Repeat for y axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (yMins[minIndex] <= yMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
yLeft[i].rightEdge = yMins[minIndex];
yRight[i].rightEdge = yMins[minIndex];
//Add geometry to mins counter
yLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
yLeft[i].rightEdge = yMaxs[maxIndex];
yRight[i].rightEdge = yMaxs[maxIndex];
//Add geometry to maxs counter
yRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Repeat for z axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (zMins[minIndex] <= zMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
zLeft[i].rightEdge = zMins[minIndex];
zRight[i].rightEdge = zMins[minIndex];
//Add geometry to mins counter
zLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
zLeft[i].rightEdge = zMaxs[maxIndex];
zRight[i].rightEdge = zMaxs[maxIndex];
//Add geometry to maxs counter
zRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Free split memory
free(xMins);
free(xMaxs);
free(yMins);
free(yMaxs);
free(zMins);
free(zMaxs);
//PreCalcs
float voxelL = xRight[dnmoObjects].rightEdge - xLeft[0].rightEdge;
float voxelD = zRight[dnmoObjects].rightEdge - zLeft[0].rightEdge;
float voxelH = yRight[dnmoObjects].rightEdge - yLeft[0].rightEdge;
float voxelSA = 2.0f * voxelL * voxelD + 2.0f * voxelL * voxelH + 2.0f * voxelD * voxelH;
//Minimum cost preset to no split at all
float minCost = (float)nObjects;
float splitLoc;
int minLeftCounter = 0, minRightCounter = 0;
int axis = -1;
//---------------------------------------------------------------------------------------------
//Check costs of x-axis split planes keeping track of derivative using
//the fact that there is a minimum point on the graph costs vs split location
//Since there is one object per split plane
int splitIndex = 1;
float lastCost = nObjects * voxelL;
float tempCost;
float lastSplit = xLeft[1].rightEdge;
int leftCount = xLeft[1].objectBoundCounter, rightCount = nObjects - xRight[1].objectBoundCounter;
int lastLO = 0, lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (xLeft[splitIndex].rightEdge - xLeft[0].rightEdge) + rightCount * (xLeft[dnmoObjects].rightEdge - xLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = xLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += xLeft[splitIndex].objectBoundCounter;
rightCount -= xRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - xLeft[0].rightEdge) * voxelD + 2 * (lastSplit - xLeft[0].rightEdge) * voxelH + 2 * voxelD * voxelH)) + (lastRO * (2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelD * voxelH))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 0;
}
//---------------------------------------------------------------------------------------------
//Repeat for y axis
splitIndex = 1;
lastCost = nObjects * voxelH;
lastSplit = yLeft[1].rightEdge;
leftCount = yLeft[1].objectBoundCounter;
rightCount = nObjects - yRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (yLeft[splitIndex].rightEdge - yLeft[0].rightEdge) + rightCount * (yLeft[dnmoObjects].rightEdge - yLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = yLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += yLeft[splitIndex].objectBoundCounter;
rightCount -= yRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - yLeft[0].rightEdge) * voxelD + 2 * (lastSplit - yLeft[0].rightEdge) * voxelL + 2 * voxelD * voxelL)) + (lastRO * (2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * voxelD * voxelL))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 1;
}
//---------------------------------------------------------------------------------------------
//Repeat for z axis
splitIndex = 1;
lastCost = nObjects * voxelD;
lastSplit = zLeft[1].rightEdge;
leftCount = zLeft[1].objectBoundCounter;
rightCount = nObjects - zRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (zLeft[splitIndex].rightEdge - zLeft[0].rightEdge) + rightCount * (zLeft[dnmoObjects].rightEdge - zLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = zLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += zLeft[splitIndex].objectBoundCounter;
rightCount -= zRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - zLeft[0].rightEdge) * voxelL + 2 * (lastSplit - zLeft[0].rightEdge) * voxelH + 2 * voxelH * voxelL)) + (lastRO * (2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelH * voxelL))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 2;
}
//Free bin memory
free(xLeft);
free(xRight);
free(yLeft);
free(yRight);
free(zLeft);
free(zRight);
//---------------------------------------------------------------------------------------------
//Make sure a split is in our best interest
if(axis == -1)
{
//If not decrement the node counter
nodeIndex--;
BuildBranch(-1, objs, nObjects);
return;
}
//Allocate space for left and right lists
Mailbox** leftList = (Mailbox**)malloc(minLeftCounter * sizeof(void*));
Mailbox** rightList = (Mailbox**)malloc(minRightCounter * sizeof(void*));
//Sort objects into lists of those to the left and right of the split plane
int leftIndex = 0, rightIndex = 0;
leftCount = 0;
rightCount = 0;
switch(axis)
{
case 0:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->x0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->x1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
case 1:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->y0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->y1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
case 2:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->z0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->z1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
};
//Delete the bounding box
delete tempBox;
//Delete old objects array
free(objs);
//Construct left and right branches
BuildBranch(height - 1, leftList, leftCount);
BuildBranch(height - 1, rightList, rightCount);
//Build this node
tree[thisNodeIndex] = KDTreeNode();
tree[thisNodeIndex].InitializeInterior(axis, splitLoc, nodeIndex - 1);
return;
}
EDIT:
Ok well I tried to replace the malloc/free with new/delete and that had no effect on the speed. I also found that it is only the free() on xLeft/xRight arrays that seem to affect the execution time significantly. I was able to eliminate the problem by moving the free() calls to after the recursive calls, although I do not know why this is making a difference because I don't see anywhere that these arrays are used after the original location for free(). As for why I am using malloc... some portions of this program use cache aligned memory, so I had been using _aligned_malloc. Although there probably is a way to get new to cache align, this is the only way I know to do it.
Is it possible that you are linking against a debug version of the runtime library that is doing something extra in free() like filling the memory with a garbage value? I have seen this behavior when you link against overly aggressive memory debugging libraries. The code that you have posted does not look strange. I would be interested to know what would happen if you replaced the arrays with std::vector or std::deque though. Vector should have behavior quite similar to the arrays and Deque may actually improve the speed a little if the arrays are large because the memory manager will not have to guarantee contiguous space.
If your program is doing all of the free()ing on exit, then you might as well just skip the calls. The entire process heap is freed when your app exits.
Edit: ----
Ok, now that the code is posted, it appears to me that you aren't just freeing on exit, so you should definitely try to figure out whether this is a weird symptom of a bug or just a costly implementation of free(). Instead of removing the free() calls, time how long it takes to execute them. Is the heap manager really using up the whole 19 seconds?
I do see several places where multiple allocations have the same scope and lifetime. You could turn these into a single malloc/free call, although that would make the code less clear and harder to maintain. So you have to ask yourself, how much does that 20 seconds matter?
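A minimal sketch of timing the free() calls directly, as suggested above. timeFrees is a hypothetical helper; in the real program the same std::chrono bracketing would go around the existing free() lines, with the elapsed time accumulated across recursive calls.

```cpp
#include <chrono>
#include <cstdlib>
#include <vector>

// Hypothetical helper: frees every block in `blocks` and returns the
// wall-clock time spent doing so, in seconds.
double timeFrees(std::vector<void*>& blocks)
{
    const auto start = std::chrono::steady_clock::now();
    for (void* p : blocks)
        std::free(p);
    const auto stop = std::chrono::steady_clock::now();

    blocks.clear();  // the pointers are dangling now; drop them
    return std::chrono::duration<double>(stop - start).count();
}
```

If the summed time is nowhere near the 19 seconds, the slowdown is coming from somewhere other than the heap manager itself.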
Probably just the behavior of the heap manager your CRT uses. It's probably updating free lists, or some other internal structure to manage memory.
You probably should reexamine how your program allocates and uses memory if your bottleneck is here.
Having had a look at the code one big thing that comes to my mind is this - mixture of malloc(...), new(...), delete(...), free(...)
BoundingBox* tempBox = new BoundingBox();
// ....
//Delete the bounding box
delete tempBox;
yet in other places you have
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
// ....
free(xMins);
In short, you are mixing C++'s new and delete with C's malloc(...) and free(...). This is C++ after all, so a question for you:
Why did you use malloc(...) and free(...), which come from C, in the middle of this C++ code? The repercussion I can see is that the C++ runtime handles memory allocation differently from C, since new and delete also construct and destroy objects.
Having said this, your best bet is:
Replace all calls to malloc with new (new[] for arrays).
Replace all calls to free with delete (delete[] for arrays).
Rerun the program and see if that makes a difference. Can you confirm this?
Hope this helps,
Best regards,
Tom.
+1 to malloc/free making my eyes hurt in C++. Ignoring that for a second and looking at the code, three ideas:
Roll up your malloc calls to one large malloc and free (for the x/y/left/right/etc structures) instead of 12. Set the pointers into this large buffer as appropriate.
Still talking about the x/y/left/right variables: Employ a small stack-based buffer that you can use when the number of objects is small. When the number of objects is large, allocate dynamically. When it is not, just set your pointer to the local stack buffer. This can avoid dynamic memory management altogether for small inputs.
Right now, your "object" list is dynamically allocated, freed, and reallocated with each recursive call (!!). This is confusing because ownership isn't clear, but it's also a performance issue. Consider reworking the code so that only one list of "objects" is ever used.
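The first idea above, rolling the separate allocations into one, can be sketched as follows for the six float arrays. SplitLists is a hypothetical name, and std::vector is swapped in for malloc so the single block is released automatically; the same carving would work on one malloc'd buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical holder: one allocation of 6 * n floats, carved into the
// six per-axis arrays used by the question's BuildBranch.
struct SplitLists {
    std::vector<float> storage;  // declared first so it is initialized first
    float* xMins;
    float* yMins;
    float* zMins;
    float* xMaxs;
    float* yMaxs;
    float* zMaxs;

    explicit SplitLists(std::size_t n)
        : storage(6 * n),
          xMins(storage.data()),
          yMins(xMins + n),
          zMins(yMins + n),
          xMaxs(zMins + n),
          yMaxs(xMaxs + n),
          zMaxs(yMaxs + n)
    {
    }

    // Copying would leave the pointers aimed at the other object's storage.
    SplitLists(const SplitLists&) = delete;
    SplitLists& operator=(const SplitLists&) = delete;
};
```

Everything is freed in one step when the SplitLists object goes out of scope, which also removes the question of what order to call free() in.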
C++ may store some extra information when you allocate using new, such as the number of elements (in the case of an array). If you are using free, it could be a fragmentation problem where you are actually releasing only the chunks of data in between but not the bookkeeping information stored by new. Just a thought.
When you corrupt the heap, it often becomes very slow. Try running it in debug mode with the debug version of your runtime as well.
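On that theory, one spot in the posted code is worth checking: the merge loops read xMins[minIndex] before testing minIndex != nObjects, and xLeft[i+1] would land one element past an allocation of dnObjects bins if the minimum branch ever ran on the last iteration. Whether either actually damages the heap here is not established. The sketch below is an assumption about how the loop could be made safe, not the asker's fix: Bin is reduced to the two fields the question uses, mergeAxis is a hypothetical name, and the left list gets one spare slot so that i + 1 stays in range.

```cpp
#include <cstddef>
#include <vector>

// Reduced stand-in for the question's Bin type (assumed layout).
struct Bin {
    int objectBoundCounter = 0;
    float rightEdge = 0.0f;
};

// mins and maxs must be sorted ascending and have the same length n.
// left receives 2n + 1 bins (one spare so left[i + 1] is valid); right receives 2n.
void mergeAxis(const std::vector<float>& mins, const std::vector<float>& maxs,
               std::vector<Bin>& left, std::vector<Bin>& right)
{
    const std::size_t n = mins.size();
    left.assign(2 * n + 1, Bin());
    right.assign(2 * n, Bin());

    std::size_t minIndex = 0, maxIndex = 0;
    for (std::size_t i = 0; i < 2 * n; ++i) {
        // Test the indices before dereferencing, so neither list is read past its end.
        const bool takeMin = (maxIndex == n) ||
                             (minIndex != n && mins[minIndex] <= maxs[maxIndex]);
        if (takeMin) {
            left[i].rightEdge = mins[minIndex];
            right[i].rightEdge = mins[minIndex];
            left[i + 1].objectBoundCounter++;   // in range thanks to the spare slot
            ++minIndex;
        } else {
            left[i].rightEdge = maxs[maxIndex];
            right[i].rightEdge = maxs[maxIndex];
            right[i].objectBoundCounter++;
            ++maxIndex;
        }
    }
}
```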
It could be poor locality of reference for your code. For example, I see the following:
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
...
free(xMins);
free(xMaxs);
free(yMins);
free(yMaxs);
free(zMins);
free(zMaxs);
Now, assuming that the allocations proceed basically linearly, then free(xMaxs); may need to dereference memory that was allocated some number of pages away from xMins (which was just dereferenced during free(xMins);), so you might need to swap in a page from the backing store in order to perform the free (which causes a huge slowdown in execution when that happens). Re-ordering the free()'s to match the allocation order could help... In this case, that'd mean
free(xMins);
free(yMins);
free(zMins);
free(xMaxs);
free(yMaxs);
free(zMaxs);
It sounds like you are running your program from a debugger in Windows, which by default causes a special debug heap to be used, which dramatically slows down memory deallocations. This applies even to non-debug builds, as long as they are launched from a debugger (such as Visual Studio). You should be able to disable this behavior by setting the environment variable _NO_DEBUG_HEAP=1 before running your program (I recommend setting it in the project configuration settings rather than in the system settings, if possible).
You didn't describe anything about your programming environment in the original question, however, so I had to make certain assumptions about it that might be wrong. If you're not running your program under Windows, for example, then my answer doesn't apply and I have no idea what the cause of your problem might be.