Why is free() bogging my program down? - c++

I am using free() to release the memory allocated for a bunch of temporary arrays in a recursive function. I would post the code but it is pretty long. When I comment out these free() calls, the program runs in less than a second. However, when I use them, the program takes about 20 seconds to run. Why is this happening, and how can it be fixed? This is about 100 MB or so, so I'd rather not just leave the memory leak.
Additionally, when I run the program that includes all of the free() calls with profiling enabled, it runs in less than a second. I don't know how that would have an effect, but it does.
After using only some of the free() calls, it seems that there are a few in particular that cause the program to slow down. The rest do not seem to have an effect.
Ok... here's the code as requested:
void KDTree::BuildBranch(int height, Mailbox** objs, int nObjects)
int dnObjects = nObjects * 2;
int dnmoObjects = dnObjects - 1;
//Check for termination
if(height == -1 || nObjects < minObjectsPerNode)
//Create leaf
tree[nodeIndex] = KDTreeNode();
if(nObjects == 1)
tree[nodeIndex].InitializeLeaf(objs[0], 1);
tree[nodeIndex].InitializeLeaf(objs, nObjects);
//Added a node, increment index
//Save this node's index and increment the current index to save space for this node
int thisNodeIndex = nodeIndex;
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
//Find all possible split locations
int index = 0;
BoundingBox* tempBox = new BoundingBox();
for(int i = 0; i < nObjects; i++)
//Get bounding box
//Add mins to split lists
xMins[index] = tempBox->x0;
yMins[index] = tempBox->y0;
zMins[index] = tempBox->z0;
//Add maxs
xMaxs[index] = tempBox->x1;
yMaxs[index] = tempBox->y1;
zMaxs[index] = tempBox->z1;
//Sort lists
Util::sortFloats(xMins, nObjects);
Util::sortFloats(yMins, nObjects);
Util::sortFloats(zMins, nObjects);
Util::sortFloats(xMaxs, nObjects);
Util::sortFloats(yMaxs, nObjects);
Util::sortFloats(zMaxs, nObjects);
//Allocate bin lists
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* xRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zRight = (Bin*)malloc(dnObjects * sizeof(Bin));
//Initialize all bins
for(int i = 0; i < dnObjects; i++)
xLeft[i] = Bin(0, 0.0f);
xRight[i] = Bin(0, 0.0f);
yLeft[i] = Bin(0, 0.0f);
yRight[i] = Bin(0, 0.0f);
zLeft[i] = Bin(0, 0.0f);
zRight[i] = Bin(0, 0.0f);
//Construct min and max bins from split locations
//Merge min/max lists together for each axis
int minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
if(maxIndex == nObjects || (xMins[minIndex] <= xMaxs[maxIndex] && minIndex != nObjects))
//Add split location to both bin lists
xLeft[i].rightEdge = xMins[minIndex];
xRight[i].rightEdge = xMins[minIndex];
//Add geometry to mins counter
//Add split location to both bin lists
xLeft[i].rightEdge = xMaxs[maxIndex];
xRight[i].rightEdge = xMaxs[maxIndex];
//Add geometry to maxs counter
//Repeat for y axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
if(maxIndex == nObjects || (yMins[minIndex] <= yMaxs[maxIndex] && minIndex != nObjects))
//Add split location to both bin lists
yLeft[i].rightEdge = yMins[minIndex];
yRight[i].rightEdge = yMins[minIndex];
//Add geometry to mins counter
//Add split location to both bin lists
yLeft[i].rightEdge = yMaxs[maxIndex];
yRight[i].rightEdge = yMaxs[maxIndex];
//Add geometry to maxs counter
//Repeat for z axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
if(maxIndex == nObjects || (zMins[minIndex] <= zMaxs[maxIndex] && minIndex != nObjects))
//Add split location to both bin lists
zLeft[i].rightEdge = zMins[minIndex];
zRight[i].rightEdge = zMins[minIndex];
//Add geometry to mins counter
//Add split location to both bin lists
zLeft[i].rightEdge = zMaxs[maxIndex];
zRight[i].rightEdge = zMaxs[maxIndex];
//Add geometry to maxs counter
//Free split memory
float voxelL = xRight[dnmoObjects].rightEdge - xLeft[0].rightEdge;
float voxelD = zRight[dnmoObjects].rightEdge - zLeft[0].rightEdge;
float voxelH = yRight[dnmoObjects].rightEdge - yLeft[0].rightEdge;
float voxelSA = 2.0f * voxelL * voxelD + 2.0f * voxelL * voxelH + 2.0f * voxelD * voxelH;
//Minimum cost preset to no split at all
float minCost = (float)nObjects;
float splitLoc;
int minLeftCounter = 0, minRightCounter = 0;
int axis = -1;
//Check costs of x-axis split planes keeping track of derivative using
//the fact that there is a minimum point on the graph costs vs split location
//Since there is one object per split plane
int splitIndex = 1;
float lastCost = nObjects * voxelL;
float tempCost;
float lastSplit = xLeft[1].rightEdge;
int leftCount = xLeft[1].objectBoundCounter, rightCount = nObjects - xRight[1].objectBoundCounter;
int lastLO = 0, lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
tempCost = leftCount * (xLeft[splitIndex].rightEdge - xLeft[0].rightEdge) + rightCount * (xLeft[dnmoObjects].rightEdge - xLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
lastCost = tempCost;
lastSplit = xLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
//Update counters
leftCount += xLeft[splitIndex].objectBoundCounter;
rightCount -= xRight[splitIndex].objectBoundCounter;
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - xLeft[0].rightEdge) * voxelD + 2 * (lastSplit - xLeft[0].rightEdge) * voxelH + 2 * voxelD * voxelH)) + (lastRO * (2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelD * voxelH))) / voxelSA;
if(lastCost < minCost)
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 0;
//Repeat for y axis
splitIndex = 1;
lastCost = nObjects * voxelH;
lastSplit = yLeft[1].rightEdge;
leftCount = yLeft[1].objectBoundCounter;
rightCount = nObjects - yRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
tempCost = leftCount * (yLeft[splitIndex].rightEdge - yLeft[0].rightEdge) + rightCount * (yLeft[dnmoObjects].rightEdge - yLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
lastCost = tempCost;
lastSplit = yLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
//Update counters
leftCount += yLeft[splitIndex].objectBoundCounter;
rightCount -= yRight[splitIndex].objectBoundCounter;
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - yLeft[0].rightEdge) * voxelD + 2 * (lastSplit - yLeft[0].rightEdge) * voxelL + 2 * voxelD * voxelL)) + (lastRO * (2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * voxelD * voxelL))) / voxelSA;
if(lastCost < minCost)
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 1;
//Repeat for z axis
splitIndex = 1;
lastCost = nObjects * voxelD;
lastSplit = zLeft[1].rightEdge;
leftCount = zLeft[1].objectBoundCounter;
rightCount = nObjects - zRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
tempCost = leftCount * (zLeft[splitIndex].rightEdge - zLeft[0].rightEdge) + rightCount * (zLeft[dnmoObjects].rightEdge - zLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
lastCost = tempCost;
lastSplit = zLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
//Update counters
leftCount += zLeft[splitIndex].objectBoundCounter;
rightCount -= zRight[splitIndex].objectBoundCounter;
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - zLeft[0].rightEdge) * voxelL + 2 * (lastSplit - zLeft[0].rightEdge) * voxelH + 2 * voxelH * voxelL)) + (lastRO * (2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelH * voxelL))) / voxelSA;
if(lastCost < minCost)
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 2;
//Free bin memory
//Make sure a split is in our best interest
if(axis == -1)
//If not decrement the node counter
BuildBranch(-1, objs, nObjects);
//Allocate space for left and right lists
Mailbox** leftList = (Mailbox**)malloc(minLeftCounter * sizeof(void*));
Mailbox** rightList = (Mailbox**)malloc(minRightCounter * sizeof(void*));
//Sort objects into lists of those to the left and right of the split plane
int leftIndex = 0, rightIndex = 0;
leftCount = 0;
rightCount = 0;
case 0:
for(int i = 0; i < nObjects; i++)
//Get object bounding box
//Add to left and right lists when necessary
if(tempBox->x0 < splitLoc)
leftList[leftIndex++] = objs[i];
if(tempBox->x1 > splitLoc)
rightList[rightIndex++] = objs[i];
case 1:
for(int i = 0; i < nObjects; i++)
//Get object bounding box
//Add to left and right lists when necessary
if(tempBox->y0 < splitLoc)
leftList[leftIndex++] = objs[i];
if(tempBox->y1 > splitLoc)
rightList[rightIndex++] = objs[i];
case 2:
for(int i = 0; i < nObjects; i++)
//Get object bounding box
//Add to left and right lists when necessary
if(tempBox->z0 < splitLoc)
leftList[leftIndex++] = objs[i];
if(tempBox->z1 > splitLoc)
rightList[rightIndex++] = objs[i];
//Delete the bounding box
delete tempBox;
//Delete old objects array
//Construct left and right branches
BuildBranch(height - 1, leftList, leftCount);
BuildBranch(height - 1, rightList, rightCount);
//Build this node
tree[thisNodeIndex] = KDTreeNode();
tree[thisNodeIndex].InitializeInterior(axis, splitLoc, nodeIndex - 1);
Ok well I tried to replace the malloc/free with new/delete and that had no effect on the speed. I also found that it is only the free() on xLeft/xRight arrays that seem to affect the execution time significantly. I was able to eliminate the problem by moving the free() calls to after the recursive calls, although I do not know why this is making a difference because I don't see anywhere that these arrays are used after the original location for free(). As for why I am using malloc... some portions of this program use cache aligned memory, so I had been using _aligned_malloc. Although there probably is a way to get new to cache align, this is the only way I know to do it.

Is it possible that you are linking against a debug version of the runtime library that is doing something extra in free() like filling the memory with a garbage value? I have seen this behavior when you link against overly aggressive memory debugging libraries. The code that you have posted does not look strange. I would be interested to know what would happen if you replaced the arrays with std::vector or std::deque though. Vector should have behavior quite similar to the arrays and Deque may actually improve the speed a little if the arrays are large because the memory manager will not have to guarantee contiguous space.
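As a rough illustration of the std::vector suggestion, the sketch below gathers and sorts the per-axis minimums without any explicit malloc/free. The `Box` struct and `sortedMins` function are hypothetical stand-ins for the question's BoundingBox and split-list code, and std::sort stands in for Util::sortFloats. The vector releases its storage automatically when it goes out of scope.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's BoundingBox (only the x extent is used here).
struct Box {
    float x0;
    float x1;
};

// Collects the lower x bound of every box and returns them sorted ascending.
// The returned vector owns its memory, so no free() call is needed anywhere.
std::vector<float> sortedMins(const std::vector<Box>& boxes) {
    std::vector<float> mins;
    mins.reserve(boxes.size());  // one allocation up front, like malloc(nObjects * sizeof(float))
    for (const Box& b : boxes) {
        mins.push_back(b.x0);
    }
    std::sort(mins.begin(), mins.end());
    return mins;
}
```

Whether this changes the timing depends on the runtime's heap, so it is worth measuring rather than assuming.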

If your program is doing all of the free()ing on exit, then you might as well just skip the calls. The entire process heap is freed when your app exits.
Edit: ----
Ok, now that the code is posted, it appears to me that you aren't just freeing on exit, so you should definitely try to figure out whether this is a weird symptom of a bug or just a costly implementation of free(). Instead of removing the free() calls, time how long it takes to execute them. Is the heap manager really using up the whole 19 seconds?
I do see several places where multiple allocations have the same scope and lifetime. You could turn these into a single malloc/free call, although that would make the code less clear and harder to maintain. So you have to ask yourself, how much does that 20 seconds matter?
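To time the calls rather than remove them, something like the following could be wrapped around the suspect free() calls. This is a minimal sketch: `timeMs` and `timeFreeOf` are hypothetical helpers, not part of the question's code.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Runs fn once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double timeMs(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Example use: allocate a buffer, touch it, and time only the free() call.
double timeFreeOf(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p == nullptr) return -1.0;  // allocation failed, nothing to time
    static_cast<char*>(p)[0] = 1;   // touch the block so it is really used
    return timeMs([p] { std::free(p); });
}
```

Summing the returned values across the recursion would show whether the heap manager really accounts for the extra seconds.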

Probably just the behavior of the heap manager your CRT uses. It's probably updating free lists, or some other internal structure to manage memory.
You probably should reexamine how your program allocates and uses memory if your bottleneck is here.

Having had a look at the code one big thing that comes to my mind is this - mixture of malloc(...), new(...), delete(...), free(...)
BoundingBox* tempBox = new BoundingBox();
// ....
//Delete the bounding box
delete tempBox;
yet in other places you have
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
// ....
In short, you are mixing the C++ runtime's new(...) and delete(...) with malloc(...) and free(...). After all, this is C++, so a question for you here...
Why did you use malloc(...) and free(...), which come from C, in the middle of this C++ code? The repercussion I can see here is that the C++ runtime handles memory allocation differently from C, because new and delete are tied to the object model.
Having said this, your best bet is:
Replace all calls to malloc with new (new[] for arrays).
Replace all calls to free with delete (delete[] for arrays).
Rerun the program and see if that makes a difference. Can you confirm this?
Hope this helps,
Best regards,

+1 to malloc/free making my eyes hurt in C++. Ignoring that for a second and looking at the code, three ideas:
Roll up your malloc calls to one large malloc and free (for the x/y/left/right/etc structures) instead of 12. Set the pointers into this large buffer as appropriate.
Still talking about the x/y/left/right variables: Employ a small stack-based buffer that you can use when the number of objects is small. When the number of objects is large, allocate dynamically. When it is not, just set your pointer to the local stack buffer. This can avoid dynamic memory management altogether for small inputs.
Right now, your "object" list is dynamically allocated, freed, and reallocated with each recursive call (!!). This is confusing because ownership isn't clear, but it's also a performance issue. Consider reworking the code so that only one list of "objects" is ever used.
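The first two ideas can be combined roughly as follows. This is only a sketch under stated assumptions: `ScratchArrays` is a hypothetical helper, the threshold of 256 floats is arbitrary, and it handles only the float arrays (the Bin arrays would need a second instance or a templated version).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Provides `count` float arrays of `n` elements each, carved out of one block.
// Small requests use a fixed in-object buffer (on the stack when the object is
// a local variable); larger ones fall back to a single heap allocation.
class ScratchArrays {
public:
    static constexpr std::size_t kInlineFloats = 256;  // arbitrary threshold (assumption)

    ScratchArrays(std::size_t count, std::size_t n) : n_(n) {
        const std::size_t total = count * n;
        if (total <= kInlineFloats) {
            base_ = inline_.data();
        } else {
            heap_.resize(total);  // the only heap allocation
            base_ = heap_.data();
        }
    }

    // Non-copyable: base_ may point into this object's own inline buffer.
    ScratchArrays(const ScratchArrays&) = delete;
    ScratchArrays& operator=(const ScratchArrays&) = delete;

    // Pointer to the start of the i-th array (valid for n elements).
    float* array(std::size_t i) { return base_ + i * n_; }
    bool usesHeap() const { return !heap_.empty(); }

private:
    std::size_t n_;
    std::array<float, kInlineFloats> inline_;
    std::vector<float> heap_;
    float* base_ = nullptr;
};
```

In BuildBranch this would look like `ScratchArrays scratch(6, nObjects); float* xMins = scratch.array(0);` and so on, with everything released at once when `scratch` goes out of scope.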

C++ may store some extra information when you allocate using new, such as the number of elements in the case of an array. If you are using free on such memory, it could be a fragmentation problem where you release only the chunks of data in between but not the bookkeeping information stored by new. Just a thought.

When you corrupt the heap, it often becomes very slow. Try running it in debug mode with the debug version of your runtime as well.

It could be poor locality of reference for your code. For example, I see the following:
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
Now, assuming that the allocations proceed basically linearly, then free(xMaxs); may need to dereference memory that was allocated some number of pages away from xMins (which was just dereferenced during free(xMins);), so you might need to swap in a page from the backing store in order to perform the free (which causes a huge slowdown in execution when that happens). Re-ordering the free()'s to match the allocation order could help... In this case, that'd mean

It sounds like you are running your program from a debugger in Windows, which by default causes a special debug heap to be used, which dramatically slows down memory deallocations. This applies even to non-debug builds, as long as they are launched from a debugger (such as Visual Studio). You should be able to disable this behavior by setting the environment variable _NO_DEBUG_HEAP=1 before running your program (I recommend setting it in the project configuration settings rather than in the system settings, if possible).
You didn't describe anything about your programming environment in the original question, however, so I had to make certain assumptions about it that might be wrong. If you're not running your program under Windows, for example, then my answer doesn't apply and I have no idea what the cause of your problem might be.


Function started with std::async crashes after quite a few iterations

I am trying to develop a simple evolution algorithm in C++. To make my calculations faster I decided to use async functions to run multiple calculations at once:
std::vector<std::future<int> > compute(8);
unsigned nptr = 0;
int syncp = 0;
while(nptr != network::networks.size()){
compute.at(syncp) = std::async(&network::analyse, &network::networks.at(nptr), data, width, height, sw, dFnum.at(idx));
if(syncp == 8){
syncp = 0;
for(unsigned i = 0; i < 8; i++){
This is how I start my calculating function. The function is called analyse, and for each "network" it assigns a score depending on how well it identifies the image.
This is part of the analyse function:
for(unsigned i = 0; i < entry.size(); i++){
double sum = 0;
data * d = &entry.at(i);
pattern * p = &pattern::patterns.at(d->patNo);
int sx = iWidth;
int sy = iHeight;
if(d->xPercentage*iWidth + d->xSpan*iWidth < sx) sx = d->xPercentage*iWidth + d->xSpan*iWidth;
if(d->yPercentage*iHeight + d->xSpan*iWidth < sy) sy = d->yPercentage*iHeight + d->xSpan*iWidth;
int xdisp = sx-d->xPercentage*iWidth;
int ydisp = sy-d->yPercentage*iHeight;
for(int x = d->xPercentage*iWidth; x < sx; x++){
for(int y = d->yPercentage*iHeight; y < sy; y++){
double xpl = x-d->xPercentage*iWidth;
double ypl = y-d->yPercentage*iHeight;
xpl /= xdisp;
ypl /= ydisp;
unsigned idx = (unsigned)(xpl*(p->width) + ypl*(p->height)*(p->width));
if(idx >= p->lweight.size()) idx = p->lweight.size()-1;
double weight = p->lweight.at(idx) - 5;
sum += weight;
sum -= 2*weight;
digitWeight[d->digit-1] += sum;
Now, there is no need to analyse the function itself - I'm sure it works, I have tested it on a single thread, and it runs just fine. The only problem is, after some time of execution, I get errors like segmentation fault, or vector range check error.
They mostly happen at this line:
digitWeight[d->digit-1] += sum;
Now, you can be sure that d->digit-1 is a valid range for this array.
The problem is that the value of the d pointer is different than it was here:
data * d = &entry.at(i);
It magically changes during the execution of the function, and starts pointing to different data, leading to errors. I have tried saving the value of d->digit to some variable and later use this variable, and it worked fine for just a while longer, before crashing on another shared resource, imageData this time.
I'm thinking this might be something related to data sharing: all the async functions share the same array of data, which is a static vector. But this data is only read, never written, so why would it stop working? I know of something called mutex locking, but it would make no sense to lock these async functions, as the program would then run just as slowly as a single-threaded one.
I have also tried running the functions like this:
std::vector<std::thread*> threads(8);
unsigned nptr = 0;
int threadp = 0;
while(nptr != network::networks.size()){
threads.at(threadp) = new std::thread(&network::analyse, &network::networks.at(nptr), data, width, height, sw, dFnum.at(idx));
if(threadp == 8){
threadp = 0;
for(unsigned i = 0; i < 8; i++){
if(threads.at(i)->joinable()) threads.at(i)->join();
delete threads.at(i);
and it did work for a second, but after some time a very similar error appeared.
Data is a structure containing 7 integers, one of which is the ID of a pattern, and pattern is a class that contains two integers (width and height) and a vector of chars.
Why does it happen on read-only data and how can I prevent it?
Here is an example of what happens:

FFT Spectrum not displaying correctly

I'm currently trying to display an audio spectrum using FFTW3 and SFML. I've followed the directions found here and looked at numerous references on FFTs, spectra, and FFTW, yet somehow my bars are almost all aligned to the left like below. Another issue I'm having is that I can't find information on what the scale of the FFT output is. Currently I'm dividing it by 64, yet it still reaches beyond that occasionally. Furthermore, I have found no information on why the output from FFTW has to be the same size as the input. So my questions are:
Why is the majority of my spectrum aligned to the left unlike the image below mine?
Why isn't the output between 0.0 and 1.0?
Why is the input sample count related to the fft output count?
What I get:
What I'm looking for:
const int bufferSize = 256 * 8;
void init() {
sampleCount = (int)buffer.getSampleCount();
channelCount = (int)buffer.getChannelCount();
for (int i = 0; i < bufferSize; i++) {
window.push_back(0.54f - 0.46f * cos(2.0f * GMath::PI * (float)i / (float)bufferSize));
plan = fftwf_plan_dft_1d(bufferSize, signal, results, FFTW_FORWARD, FFTW_ESTIMATE);
void update() {
int mark = (int)(sound.getPlayingOffset().asSeconds() * sampleRate);
for (int i = 0; i < bufferSize; i++) {
float s = 0.0f;
if (i + mark < sampleCount) {
s = (float)buffer.getSamples()[(i + mark) * channelCount] / (float)SHRT_MAX * window[i];
signal[i][0] = s;
signal[i][1] = 0.0f;
void draw() {
int inc = bufferSize / 2 / size.x;
int y = size.y - 1;
int max = size.y;
for (int i = 0; i < size.x; i ++) {
float total = 0.0f;
for (int j = 0; j < inc; j++) {
int index = i * inc + j;
total += std::sqrt(results[index][0] * results[index][0] + results[index][1] * results[index][1]);
total /= (float)(inc * 64);
Rectangle2I rect = Rectangle2I(i, y, 1, -(int)(total * max)).absRect();
g->setPixel(rect, Pixel(254, toColor(BLACK, GREEN)));
All of your questions are related to FFT theory. Study the properties of the FFT in any standard textbook or reference and you will be able to answer them yourself.
The least you can start from is here:
Many FFT implementations are energy preserving. That means the scale of the output is linearly related to the scale and/or size of the input.
An FFT is a DFT, which is a square matrix transform. So the number of outputs will always be equal to the number of inputs (or half that by ignoring the redundant complex conjugate half given strictly real input), unless some outputs are thrown away. If not, it's not an FFT. If you want fewer outputs, there are ways to downsample the FFT output or post-process it in other ways.
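The scaling point can be made concrete with a small experiment. FFTW computes an unnormalized DFT, so for a real sinusoid of amplitude A that fits a whole number of cycles into N samples, the matching bin has magnitude A·N/2. The sketch below uses a naive O(N²) DFT with the same convention instead of FFTW itself, so it runs without the library; `naiveDft`, `makeSine`, and `binAmplitude` are hypothetical helper names.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) DFT of a real signal, unnormalized (the same convention FFTW uses).
std::vector<std::complex<double>> naiveDft(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            sum += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Generates n samples of a sine with the given amplitude and `cycles` whole periods.
std::vector<double> makeSine(std::size_t n, std::size_t cycles, double amplitude) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(n);
    for (std::size_t t = 0; t < n; ++t) {
        x[t] = amplitude * std::sin(2.0 * pi * static_cast<double>(cycles * t) / static_cast<double>(n));
    }
    return x;
}

// Recovers the sinusoid amplitude from bin k (valid for 0 < k < N/2, real input):
// divide the bin magnitude by N/2.
double binAmplitude(const std::vector<std::complex<double>>& spectrum, std::size_t k) {
    return std::abs(spectrum[k]) / (static_cast<double>(spectrum.size()) / 2.0);
}
```

With the question's bufferSize of 2048, that divisor would be 1024 rather than 64, and the Hamming window lowers the peak further. For real input, only the first N/2 + 1 bins carry independent information.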

C++ Heap Corruption: Local heap variable causing issues

I am working on some simple terrain with DirectX9 by manually assembling the verts for the ground.
On the part of my code where I set up the indices I get an error though:
Windows has triggered a breakpoint in test.exe.
This may be due to a corruption of the heap, which indicates a bug in test.exe or any of the DLLs it has loaded.
Here is the part of my code that is giving me problems, and I'm almost 100% sure that it is linked to my indices pointer, but I delete it when I'm finished... so I'm not sure what the problem is.
int total = widthQuads * heightQuads * 6;
DWORD *indices = new DWORD[totalIdx];
for (int y = 0; y < heightQuads; y++)
{
    for (int x = 0; x < widthQuads; x++)
    { //Width of nine:
        int lowerLeft = x + y * 9;
        int lowerRight = (x + 1) + y * 9;
        int topLeft = x + (y + 1) * 9;
        int topRight = (x + 1) + (y + 1) * 9;
        //First triangle:
        indices[counter++] = topLeft;
        indices[counter++] = lowerRight;
        indices[counter++] = lowerLeft;
        //Second triangle:
        indices[counter++] = topLeft;
        indices[counter++] = topRight;
        indices[counter++] = lowerRight;
    }
}
d3dDevice->CreateIndexBuffer(sizeof(DWORD)* total, 0, D3DFMT_INDEX16,
D3DPOOL_MANAGED, &groundindex, 0);
void* mem = 0;
groundindex->Lock(0, 0, &mem, 0);
memcpy(mem, indices, total * sizeof (DWORD));
delete[] indices;
When I remove this block my program runs OK.
The code you've given looks OK - with one caveat: the initial value of counter is not in the code itself. So either you don't start at counter = 0, or some other piece of code is stomping on your indices buffer.
That's the beauty of heap corruptions. There is no guarantee that the bug is in the removed portion of the code. Removing it may simply hide a bug that exists somewhere else in your code.
int total = widthQuads * heightQuads * 6;
DWORD *indices = new DWORD[totalIdx];
Shouldn't you be doing "new DWORD[total];" here?
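One way to rule out this kind of size mismatch is to let a container track the element count. The sketch below is an illustration only: `buildGridIndices` is a hypothetical helper, and it assumes the "width of nine" in the question means widthQuads + 1 vertices per row. It keeps the question's winding order and uses no Direct3D calls.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds two triangles per quad for a grid of widthQuads x heightQuads quads.
// Assumes each vertex row holds widthQuads + 1 vertices.
std::vector<std::uint32_t> buildGridIndices(int widthQuads, int heightQuads) {
    const int vertsPerRow = widthQuads + 1;
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(widthQuads) * static_cast<std::size_t>(heightQuads) * 6);
    for (int y = 0; y < heightQuads; ++y) {
        for (int x = 0; x < widthQuads; ++x) {
            const auto lowerLeft  = static_cast<std::uint32_t>(x + y * vertsPerRow);
            const auto lowerRight = static_cast<std::uint32_t>((x + 1) + y * vertsPerRow);
            const auto topLeft    = static_cast<std::uint32_t>(x + (y + 1) * vertsPerRow);
            const auto topRight   = static_cast<std::uint32_t>((x + 1) + (y + 1) * vertsPerRow);
            // First triangle
            indices.push_back(topLeft);
            indices.push_back(lowerRight);
            indices.push_back(lowerLeft);
            // Second triangle
            indices.push_back(topLeft);
            indices.push_back(topRight);
            indices.push_back(lowerRight);
        }
    }
    return indices;
}
```

The byte count for the copy is then `indices.size() * sizeof(std::uint32_t)`. Note that 32-bit indices such as DWORD correspond to D3DFMT_INDEX32, while D3DFMT_INDEX16 expects 16-bit indices.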

Sometimes I get EXEC_BAD_ACCESS (Access violation) when reversing an array

I am loading an image using the OpenEXR library.
This works fine, except the image is loaded rotated 180 degrees. I use the loop shown below to reverse the array, but sometimes the program will quit and Xcode will give me an EXEC_BAD_ACCESS error (which I assume is the same as an access violation in MSVC). It does not happen every time, just once every 5-10 times.
Ideally I'd want to reverse the array in place, although that led to errors every time, and using memcpy would fail without causing an error, producing just a blank image. I'd like to know what's causing this problem first.
Here is the code I am using: (Rgba is a struct of 4 "Half"s r, g, b, and a, defined in OpenEXR)
Rgba* readRgba(const char filename[], int& width, int& height){
Rgba* pixelBuffer = new Rgba[width * height];
Rgba* temp = new Rgba[width * height];
// ....EXR Loading code....
// TODO: *Sometimes* the following code results in a bad memory access error. No idea why.
// Flip the image to conform with OpenGL coordinates.
for (int i = 0; i < height; i++){
    for(int j = 0; j < width; j++){
        temp[(i*width)+j] = pixelBuffer[(width*height)-(i*width)+j];
    }
}
delete pixelBuffer;
return temp;
}
Thanks in advance!
This line reads past the end of the buffer:
temp[(i*width)+j] = pixelBuffer[(width*height)-(i*width)+j];
It should be:
temp[(i*width)+j] = pixelBuffer[(width*height)-((i*width)+j) - 1];
(Hint: think about what happens when i = 0 and j = 0!)
And here's how you can optimize this code, to save memory and loop iterations:
Rgba* readRgba(const char filename[], int& width, int& height)
{
    Rgba* pixelBuffer = new Rgba[width * height];
    Rgba tempPixel;
    // ....EXR Loading code....
    // Flip the image to conform with OpenGL coordinates.
    // Swap each pixel in the first half with its mirror in the second half.
    for (int i = 0; i <= height/2; i++)
    {
        // Stop strictly before the midpoint so no pair is swapped twice.
        for (int j = 0; j < width && (i*width + j) < (height*width/2); j++)
        {
            tempPixel = pixelBuffer[i*width + j];
            pixelBuffer[i*width + j] = pixelBuffer[height*width - (i*width + j) - 1];
            pixelBuffer[height*width - (i*width + j) - 1] = tempPixel;
        }
    }
    return pixelBuffer;
}
Note that the best approach (from a memory-management point of view) would be to pass an already-allocated pixelBuffer in as a parameter. It's good practice to allocate and release memory in the same piece of code.
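Since a 180 degree rotation of a row-major image is the same as reversing the whole pixel array, the swap loop can also be written with std::reverse. A minimal sketch, with `Pixel` as a hypothetical stand-in for OpenEXR's Rgba so the example compiles without the library:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for OpenEXR's Rgba.
struct Pixel {
    float r, g, b, a;
};

// Rotates a row-major width x height image by 180 degrees in place.
// Reversing the array moves pixel k to position (width*height - 1 - k).
void rotate180(Pixel* pixels, int width, int height) {
    const std::size_t count = static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    std::reverse(pixels, pixels + count);
}
```

This needs no temporary buffer and has no index arithmetic to get wrong.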

C++ vector element is different when accessed at different times

I'm developing a 3D game using SDL and OpenGL on Ubuntu 9.04 using Eclipse CDT. I've got a class that holds the mesh data in vectors, one for each type, such as Vertex, Normal, and UVcoord (texture coordinates), as well as a vector of faces. Each face has 3 int vectors which hold indexes into the other data. So far my game has been rendering at nice rates. But then again, I only had fewer than one hundred vertices among two objects for testing purposes.
The loop accessing this data looks like this:
void RenderFace(oFace face)
* More Stuff
oVertice gvert;
oUVcoord tvert;
oNormal nvert;
for (unsigned int fvIndex = 0; fvIndex < face.GeoVerts.size(); fvIndex++)
gvert = obj.TheMesh.GetVertice(face.GeoVerts[fvIndex] - 1);
tvert = obj.TheMesh.GetUVcoord(face.UV_Verts[fvIndex] - 1);
nvert = obj.TheMesh.GetNormal(face.NrmVerts[fvIndex] - 1);
glNormal3f(nvert.X, nvert.Y, nvert.Z);
glTexCoord2f(tvert.U, tvert.V);
glVertex3f(scale * gvert.X, scale * gvert.Y, scale * gvert.Z);
* More Stuff
There is a loop that calls the renderFace() function which includes the above for loop. The minus one is because Wavefront .obj files are 1 indexed (instead of c++ 0 index). Anyway, I discovered that once you have about 30 thousand or so faces, all those calls to glVertex3f() and the like slow the game down to about 10 FPS. That I can't allow. So I learned about vertex arrays, which require pointers to arrays. Following the example of a NeHe tutorial I continued to use my oVertice class and the others. Which just have floats x, y, z, or u, v. So I added the same function above to my OnLoad() function to build the arrays which are just "oVertice*" and similar.
Here is the code:
bool oEntity::OnLoad(std::string FileName)
if (!obj.OnLoad(FileName))
return false;
unsigned int flsize = obj.TheMesh.GetFaceListSize();
obj.TheMesh.VertListPointer = new oVertice[flsize];
obj.TheMesh.UVlistPointer = new oUVcoord[flsize];
obj.TheMesh.NormListPointer = new oNormal[flsize];
oFace face = obj.TheMesh.GetFace(0);
oVertice gvert;
oUVcoord tvert;
oNormal nvert;
unsigned int counter = 0;
unsigned int temp = 0;
for (unsigned int flIndex = 0; flIndex < obj.TheMesh.GetFaceListSize(); flIndex++)
face = obj.TheMesh.GetFace(flIndex);
for (unsigned int fvIndex = 0; fvIndex < face.GeoVerts.size(); fvIndex++)
temp = face.GeoVerts[fvIndex];
gvert = obj.TheMesh.GetVertice(face.GeoVerts[fvIndex] - 1);
temp = face.UV_Verts[fvIndex];
tvert = obj.TheMesh.GetUVcoord(face.UV_Verts[fvIndex] - 1);
temp = face.NrmVerts[fvIndex];
nvert = obj.TheMesh.GetNormal(face.NrmVerts[fvIndex] - 1);
obj.TheMesh.VertListPointer[counter].X = gvert.X;
obj.TheMesh.VertListPointer[counter].Y = gvert.Y;
obj.TheMesh.VertListPointer[counter].Z = gvert.Z;
obj.TheMesh.UVlistPointer[counter].U = tvert.U;
obj.TheMesh.UVlistPointer[counter].V = tvert.V;
obj.TheMesh.NormListPointer[counter].X = nvert.X;
obj.TheMesh.NormListPointer[counter].Y = nvert.Y;
obj.TheMesh.NormListPointer[counter].Z = nvert.Z;
return true;
The unsigned int temp variable is for debugging purposes. Apparently I don't have a default constructor for oFace that doesn't have something to initialize with. Anyway, as you can see it's pretty much that same exact routine. Only instead of calling a gl function I add the data to three arrays.
Here's the kicker:
I'm loading a typical cube made of triangles.
When I access element 16 (0 indexed) of the UV_Verts vector from the RenderFace() function I get 12.
But when I access element 16 (0 indexed) of the same UV_Verts vector from the OnLoad() function I get something like 3045472189
I am so confused.
Does anyone know what's causing this? And if so how to resolve it?
One possible reason could be that you're creating arrays with size flsize:
obj.TheMesh.VertListPointer = new oVertice[flsize];
obj.TheMesh.UVlistPointer = new oUVcoord[flsize];
obj.TheMesh.NormListPointer = new oNormal[flsize];
but use the arrays with indices up to flsize * face.GeoVerts.size():
for (...; flIndex < obj.TheMesh.GetFaceListSize(); ...) { // flsize = GetFaceListSize
for (...; fvIndex < face.GeoVerts.size(); ...) {
obj.TheMesh.UVlistPointer[counter].U = ...;
so your array creation code should actually be more like
obj.TheMesh.VertListPointer = new oVertice[flsize * face.GeoVerts.size()];
obj.TheMesh.UVlistPointer = new oUVcoord[flsize * face.GeoVerts.size()];
obj.TheMesh.NormListPointer = new oNormal[flsize * face.GeoVerts.size()];
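If faces can have different numbers of corners, counting them first avoids guessing the size. The sketch below shows that idea with std::vector. `Face`, `Vertex`, `countCorners`, and `flattenVertices` are hypothetical stand-ins for the question's oFace, oVertice, and loading code, not the actual classes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the question's oFace and oVertice types.
struct Face { std::vector<unsigned int> GeoVerts; };  // 1-based indices, as in .obj files
struct Vertex { float X, Y, Z; };

// Counts every face corner so the output can be sized exactly.
std::size_t countCorners(const std::vector<Face>& faces) {
    std::size_t total = 0;
    for (const Face& f : faces) total += f.GeoVerts.size();
    return total;
}

// Expands indexed faces into a flat vertex list, one entry per face corner.
std::vector<Vertex> flattenVertices(const std::vector<Face>& faces,
                                    const std::vector<Vertex>& vertices) {
    std::vector<Vertex> flat;
    flat.reserve(countCorners(faces));
    for (const Face& f : faces) {
        for (unsigned int idx : f.GeoVerts) {
            // .at() throws on a bad index instead of silently corrupting memory.
            flat.push_back(vertices.at(idx - 1));
        }
    }
    return flat;
}
```

Because a vector's storage is contiguous, `flat.data()` can be handed to glVertexPointer(3, GL_FLOAT, 0, ...) in place of the manually allocated array.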