I've tried to solve a memory leak in the GLU callback by creating a global variable, but now it does not draw anything:
GLdouble *gluptr = NULL;
void CALLBACK combineCallback(GLdouble coords[3], GLdouble *vertex_data[4],
GLfloat weight[4], GLdouble **dataOut)
{
GLdouble *vertex;
if(gluptr == NULL)
{
gluptr = (GLdouble *) malloc(6 * sizeof(GLdouble));
}
vertex = (GLdouble*)gluptr;
vertex[0] = coords[0];
vertex[1] = coords[1];
vertex[2] = coords[2];
for (int i = 3; i < 6; i++)
{
vertex[i] = weight[0] * vertex_data[0][i] +
weight[1] * vertex_data[1][i] +
weight[2] * vertex_data[2][i] +
weight[3] * vertex_data[3][i];
}
*dataOut = vertex;
}
Basically, instead of calling malloc each time in the loop (hence the memory leak), I'm using a global pointer, but this doesn't work (nothing is drawn to the screen), which suggests dataOut is not receiving the vertex data pointed to by my pointer. Why would malloc into a pointer created in the function behave any differently from a global variable?
Thanks
You allocate the data only once -- but GLUtesselator needs more than one set of data at a time!
What you do here is put all the vertex data into a single place in memory, whereas in the original code you had memory per vertex. GLUtesselator needs more than one vertex to function properly.
You do call
void gluDeleteTess(GLUtesselator *tessobj);
...afterwards, do you?
Most likely the reason is that something outside of your callback is holding on to the returned data across calls to combineCallback(), and subsequent calls to combineCallback() now clobber the data from the older calls.
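A minimal sketch of the point made in the answers above: every combine call gets its own block of memory, and the blocks are remembered so they can be released once tessellation has finished. The GLdouble/GLfloat aliases are stand-ins so the snippet compiles without the GL headers, g_combined and releaseCombined are hypothetical names, and with the real GLU on Windows the callback would also need the CALLBACK calling convention. The null check on vertex_data is an assumption about slots the tessellator may leave unused.

```cpp
#include <array>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-ins for the GL typedefs (assumption: real code includes the GL/GLU headers).
using GLdouble = double;
using GLfloat  = float;

// Owns every vertex created by the combine callback until tessellation is done.
static std::vector<std::unique_ptr<std::array<GLdouble, 6>>> g_combined;

void combineCallback(GLdouble coords[3], GLdouble* vertex_data[4],
                     GLfloat weight[4], GLdouble** dataOut)
{
    // A fresh block per call, so earlier results are never overwritten.
    auto vertex = std::make_unique<std::array<GLdouble, 6>>();
    (*vertex)[0] = coords[0];
    (*vertex)[1] = coords[1];
    (*vertex)[2] = coords[2];

    // Blend the extra per-vertex data (elements 3..5) of the four neighbours.
    for (int i = 3; i < 6; i++) {
        GLdouble sum = 0.0;
        for (int j = 0; j < 4; j++) {
            if (vertex_data[j] != nullptr)  // assumption: unused slots may be null
                sum += weight[j] * vertex_data[j][i];
        }
        (*vertex)[i] = sum;
    }

    *dataOut = vertex->data();               // the heap block outlives this call
    g_combined.push_back(std::move(vertex)); // keep ownership for later cleanup
}

// Call this only after gluTessEndPolygon() has returned.
void releaseCombined()
{
    g_combined.clear();
}
```

This removes the leak without sharing one buffer between vertices: the memory lives exactly as long as the tessellator may still read it.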
Looking at the code, you need to rewrite this. A few things are wrong, and they suggest a shaky understanding of pointers and of call-by-reference parameters such as dataOut. Secondly, there is no check on the call to malloc, which can and WILL fail; the code blindly assumes that the memory is available. Thirdly, you have redundant pointer variables, vertex and gluptr, for no reason. You are actually trying to build up a block of memory by copying the contents from gluptr to vertex, using the coords pointer to a block of data of type GLdouble, then building up the vertex block of memory, and finally assigning it back to dataOut. Forgive me if I misunderstand, but read on.
This is the code with the redundant variables removed and a check for a NULL pointer added:
GLdouble *gluptr = NULL;
void CALLBACK combineCallback(GLdouble coords[3], GLdouble *vertex_data[4],
GLfloat weight[4], GLdouble **dataOut)
{
if((*dataOut) == NULL)
{
(*dataOut) = (GLdouble *) malloc(6 * sizeof(GLdouble));
}
if (*dataOut != NULL){
/* PASSED MEMORY ALLOC! */
(*dataOut)[0] = coords[0];
(*dataOut)[1] = coords[1];
(*dataOut)[2] = coords[2];
for (int i = 3; i < 6; i++)
{
(*dataOut)[i] = weight[0] * vertex_data[0][i] +
weight[1] * vertex_data[1][i] +
weight[2] * vertex_data[2][i] +
weight[3] * vertex_data[3][i];
}
}
}
The last parameter of combineCallback is a call-by-reference parameter, hence the double asterisk.
I must ask: is dataOut definitely a fixed size of 6 elements? If so, the parameter would need to be tweaked to look like *(*dataOut[6]), off the top of my head (it's late and past my bedtime).
I'm making a small OpenGL program for my intro to C++ class in Uni. I have a program that is complete but I want to change it up a bit to make it more unique. I have a Cube class:
class Cube {
public:
Cube(Mesh* mesh, Texture2D* texture, float x, float y, float z);
~Cube();
void Draw();
void Update(float rSpeed);
Vector3 position;
private:
GLfloat rotationSpeed;
Vector3 rotationVector;
Mesh* _mesh;
Texture2D* _texture;
};
I then create an array of type Cube:
Cube* cubes[CUBE_AMOUNT];
I then fill each index of this array with data to draw the cube on screen later in the program:
for (int i = 0; i < CUBE_AMOUNT; i++) {
float x = ((rand() % 400) / 10.0f) - 20.0f;
float y = ((rand() % 200) / 10.0f) - 10.0f;
float z = -(rand() % 1000);
if (i % 2 == 1) {
cubes[i] = new Cube(cubeMesh, textureStars, x, y, z);
}
else {
cubes[i] = new Cube(cubeMesh, texturePenguins, x, y, z);
}
}
With this new thing I want to add to the program, I want to check whether an index of cubes[] has been filled with the data yet. However I keep getting exceptions when running. I have tried to check whether cubes[i] is equal to nullptr, and tried checking whether it is NULL too, but neither seem to match.
Sorry for any errors in terminology that I used. New to C++, and having come from only doing Python before this, it is confusing!
Solution:
When I create the array, I changed it to Cube* cubes[CUBE_AMOUNT] = { NULL }, and now when checking the array, cubes[i] == NULL!
If cubes is not a global variable, you can use:
Cube* cubes[CUBE_AMOUNT] = {};
to initialize all the elements to nullptr.
You can also use:
std::vector<std::unique_ptr<Cube>> cubes(CUBE_AMOUNT);
to remove the burden of having to deallocate dynamic memory in your code.
In either case, you can use:
if ( cubes[index] )
{
// Got a valid pointer. Use it.
}
Your cubes variable is not automatically initialized with nullptr values. Until you fill it with either nullptr or valid pointers, its elements hold indeterminate garbage.
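As a rough sketch of the suggestions above, the slots can be held as smart pointers, which start out null without any explicit initializer and can then be tested directly. Cube here is a simplified stand-in for the class in the question, and CubeSlots, isFilled, countFilled and the value of CUBE_AMOUNT are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Simplified stand-in for the Cube class from the question.
struct Cube {
    float x, y, z;
};

constexpr std::size_t CUBE_AMOUNT = 8;  // hypothetical value

// Every std::unique_ptr in the array is default-constructed, i.e. null.
using CubeSlots = std::array<std::unique_ptr<Cube>, CUBE_AMOUNT>;

// True when the index is in range and the slot already holds a cube.
bool isFilled(const CubeSlots& cubes, std::size_t index)
{
    return index < cubes.size() && cubes[index] != nullptr;
}

// Counts the slots that currently hold a cube.
std::size_t countFilled(const CubeSlots& cubes)
{
    std::size_t filled = 0;
    for (const auto& cube : cubes) {
        if (cube) {
            ++filled;
        }
    }
    return filled;
}
```

A slot is filled with cubes[i] = std::make_unique<Cube>(Cube{x, y, z}); and released automatically when the array goes out of scope.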
I think this would work
// This bit should check whether there's anything stored currently.
cout << "\nWhich slot would you like to store the information in? (1-10)";
cin >> i;
i--;
if (information[i] != NULL){
// Already written
cout << "THERES SOMETHING HERE";
}
else{
cout << "\nEMPTY!!!!!!!!!";
}
EDIT: This turned out to be an uninitialized variable creating chaotic behavior. See this post about getting more compiler warnings for JUCE
I was attempting to create a basic synthesizer and I quickly ran into an absurd problem when simply attempting to assign a value to a newly declared variable.
After following along with the JUCE simple sine synthesis tutorial I ran into the problem. This is the basic code of my getNextAudioBlock() function when it is producing white noise. Note how there are four integers declared and assigned throughout:
const int numChannels = bufferToFill.buffer->getNumChannels();
const int numSamples = bufferToFill.numSamples;
for (int channel = 0; channel < numChannels; channel++){
float* const buffer = bufferToFill.buffer -> getWritePointer(channel, bufferToFill.startSample);
for (int sample; sample < numSamples; sample++){
buffer[sample] = (randomGen.nextFloat() * 2.0f - 1.0f);
}
}
However, as soon as I attempt to add another int I no longer get sound. Just simply adding the line int unusedVariable = 0; anywhere in the getNextAudioBlock() function but before the buffer[sample] assignment immediately returns from the function and it therefore produces no audio.
If I simply declare the new variable (int unusedVariable;) then it still works. It is only specifically the assignment part that causes the error. Also, if I declare the variable as a global member then the assignment within the function works just fine.
To reiterate, this works:
buffer[sample] = (randomGen.nextFloat() * 2.0f - 1.0f);
This works:
int unusedVariable;
buffer[sample] = (randomGen.nextFloat() * 2.0f - 1.0f);
But this doesn't:
int unusedVariable = 0;
buffer[sample] = (randomGen.nextFloat() * 2.0f - 1.0f);
My only idea was that allocating new memory on the Audio thread causes the error but I have seen declaration and assignment done in other online sources and even in my exact same function with numChannels, numSamples, channel, and sample all allocated and assigned just fine. I also considered that it has something to do with using the Random class, but I get the same problem even when it is generating sine waves.
EDIT: Here is the exact code copied from the project. Right here nextSample is declared globally, as the buffer does not get filled when it is declared locally
void MainContentComponent::getNextAudioBlock (const AudioSourceChannelInfo& bufferToFill)
{
const int numChannels = bufferToFill.buffer->getNumChannels();
const int numSamples = bufferToFill.numSamples;
for (int channel = 0; channel < numChannels; channel++){
float* const buffer = bufferToFill.buffer -> getWritePointer (channel, bufferToFill.startSample);
for (int sample; sample < numSamples; sample++){
// nextSample = (randomGen.nextFloat() * 2.0f - 1.0f); // For Randomly generated White Noise
nextSample = (float) std::sin (currentAngle);
currentAngle += angleDelta;
buffer[sample] = nextSample * volumeLevel;
}
}
}
I created a new AudioApplication project in the Projucer and pasted this block of code into the getNextAudioBlock() method (adding sensible member variables as you're referencing them here).
The compiler pointed at the problem right away -- the loop variable sample below isn't initialized (and C++ won't default init it for you), so if the memory used by that variable happened to have contained a value that's less than the buffer size, you'll generate some audio; if not, the buffer passed into this function is unaffected because the loop never runs.
for (int sample; sample < numSamples; sample++){
nextSample = (randomGen.nextFloat() * 2.0f - 1.0f); // For Randomly generated White Noise
//nextSample = (float) std::sin (currentAngle);
//currentAngle += angleDelta;
buffer[sample] = nextSample * volumeLevel;
}
See if changing that to for (int sample = 0; fixes things for you.
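To make the fix concrete, here is a small stand-alone sketch of the same loop with the counter initialized. It does not use JUCE: std::mt19937 stands in for the Random member, a std::vector<float> stands in for the audio buffer, and fillWhiteNoise is a hypothetical name.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Fills the whole buffer with white noise in the range [-1, 1].
// std::mt19937 is only a stand-in for JUCE's Random class.
void fillWhiteNoise(std::vector<float>& buffer, std::mt19937& rng)
{
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    const int numSamples = static_cast<int>(buffer.size());

    // The counter starts at 0 explicitly; "int sample;" would leave it
    // indeterminate, and the loop might then never run at all.
    for (int sample = 0; sample < numSamples; sample++) {
        buffer[sample] = dist(rng);
    }
}
```

This also explains why adding an unrelated int changed the behavior: it most likely shifted which stale stack value the uninitialized counter picked up. Enabling compiler warnings (for example -Wall -Wextra with GCC or Clang) usually flags this kind of mistake.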
I use GLU tessellation to tessellate complex polygons. A simplified version of the code is listed below.
It always crashes at gluTessEndPolygon(GLUtessobj) with error:
Error: 0xC0000005: Access violation reading location 0x57783b39;
The code works when the number of points of the polygon is small (<100).
I just can't figure out why.
typedef boost::geometry::model::point<float, 2, boost::geometry::cs::cartesian> pt;
typedef boost::geometry::model::polygon<pt> Polygon;
typedef boost::geometry::model::ring<pt> Ring;
vector<Polygon> g_myPolys;
// ------Static variables used in glu tessellation------
static GLUtesselator *GLUtessobj;
static unsigned int s_gltri_type;
static int s_tess_orient;
static int s_cur_pt_idx;
// Create an array to hold pointers to allocated vertices created by "combine" callback,
// so that they may be deleted after tessellation.
static std::vector<GLdouble*> s_combineVertexArray;
// Store tessellated results
static std::vector<double> s_vecTriVerts; // Tessellated vertices of the area objects (triangle fans, triangle strips and triangles).
static std::vector<int> s_vecTriStripVertCnts; // Vertex count of every triangle strip, counted from its start index.
static std::vector<int> s_vecTriStripFirstIdx; // Start index of every triangle strip in s_vecTriVerts.
static std::vector<int> s_vecTriFanVertCnts; // Vertex count of every triangle fan, counted from its start index.
static std::vector<int> s_vecTriFanFirstIdx; // Start index of every triangle fan in s_vecTriVerts.
static std::vector<int> s_vecTrisVertCnts; // Vertex count of every triangle list, counted from its start index.
static std::vector<int> s_vecTrisFirstIdx; // Start index of every triangle list in s_vecTriVerts.
static int s_cur_tri_fans_vert_cnt;
static int s_cur_tri_strips_vert_cnt;
static int s_cur_tris_vert_cnt;
static std::vector<double*> s_vecTmp;
void beginCallback(GLenum which)
{
s_gltri_type = which;
switch ( s_gltri_type)
{
case GL_TRIANGLE_FAN:
s_vecTriFanFirstIdx.push_back(s_cur_pt_idx);
s_cur_tri_fans_vert_cnt = 0;
break;
case GL_TRIANGLE_STRIP:
s_vecTriStripFirstIdx.push_back(s_cur_pt_idx);
s_cur_tri_strips_vert_cnt = 0;
break;
case GL_TRIANGLES:
s_vecTrisFirstIdx.push_back(s_cur_pt_idx);
s_cur_tris_vert_cnt = 0;
break;
}
}
void vertexCallback(GLvoid *vertex)
{
GLdouble *pv = (GLdouble *) vertex;
s_vecTriVerts.push_back(pv[0]);
s_vecTriVerts.push_back(pv[1]);
s_cur_pt_idx ++;
switch ( s_gltri_type)
{
case GL_TRIANGLE_FAN:
s_cur_tri_fans_vert_cnt ++;
break;
case GL_TRIANGLE_STRIP:
s_cur_tri_strips_vert_cnt ++;
break;
case GL_TRIANGLES:
s_cur_tris_vert_cnt ++;
break;
}
}
void combineCallback(GLdouble coords[3],
GLdouble *vertex_data[4],
GLfloat weight[4], GLdouble **dataOut )
{
GLdouble *vertex = (GLdouble *)malloc(6 * sizeof(GLdouble));
vertex[0] = coords[0];
vertex[1] = coords[1];
vertex[2] = coords[2];
vertex[3] = vertex[4] = vertex[5] = 0.0;
*dataOut = vertex;
s_combineVertexArray.push_back(vertex);
}
void endCallback()
{
switch ( s_gltri_type)
{
case GL_TRIANGLE_FAN:
s_vecTriFanVertCnts.push_back(s_cur_tri_fans_vert_cnt);
break;
case GL_TRIANGLE_STRIP:
s_vecTriStripVertCnts.push_back(s_cur_tri_strips_vert_cnt);
break;
case GL_TRIANGLES:
s_vecTrisVertCnts.push_back(s_cur_tris_vert_cnt);
break;
}
}
void errorCallback(GLenum errorCode)
{
const GLubyte *estring;
estring = gluErrorString(errorCode);
printf ("Tessellation Error: %s\n", estring);
}
void Tessellate()
{
// Create tessellate object
GLUtessobj = gluNewTess();
// Register the callbacks
gluTessCallback(GLUtessobj, GLU_TESS_BEGIN, (void (__stdcall*)())&beginCallback);
gluTessCallback(GLUtessobj, GLU_TESS_VERTEX, (void (__stdcall*)())&vertexCallback);
gluTessCallback(GLUtessobj, GLU_TESS_END, (void (__stdcall*)())&endCallback);
gluTessCallback(GLUtessobj, GLU_TESS_COMBINE, (void (__stdcall*)())&combineCallback);
gluTessCallback(GLUtessobj, GLU_TESS_ERROR, (void (__stdcall*)())&errorCallback);
gluTessProperty(GLUtessobj, GLU_TESS_WINDING_RULE, GLU_TESS_WINDING_POSITIVE );
gluTessBeginPolygon(GLUtessobj, NULL);
gluTessBeginContour(GLUtessobj);
Polygon pp = g_myPolys[0];
for ( int i = 0; i < pp.outer().size(); i ++)
{
GLdouble *p = new GLdouble[3];
s_vecTmp.push_back(p);
p[0] = pp.outer()[i].get<0>();
p[1] = pp.outer()[i].get<1>();
p[2] = 0.0;
gluTessVertex( GLUtessobj, p, p ) ;
}
gluTessEndContour(GLUtessobj);
gluTessEndPolygon(GLUtessobj);
gluDeleteTess(GLUtessobj);
for ( int i = 0; i < s_vecTmp.size(); i ++)
delete[] s_vecTmp[i];
s_vecTmp.clear();
// Free up any "Combine" vertices created
for(unsigned int i = 0; i < s_combineVertexArray.size(); i++)
free (s_combineVertexArray[i]);
s_combineVertexArray.clear();
}
One thing that immediately strikes me as odd is that you cast to __stdcall there.
gluTessCallback(GLUtessobj, GLU_TESS_BEGIN, (void (__stdcall*)())&beginCallback);
Why are you doing that? If your compiler complains about incompatible calling conventions, then the last thing you should do there is casting the calling convention. Only despair and horror await if you cast a calling convention. It's already a bad idea to cast pointers (in C++ casting from/to void* is kind of okay, but that's it).
And then there are a few other weird things you do with pointers. For example you're mixing std::vector with manually managed memory (new GLdouble[3]). Seriously, why?!
I strongly suggest you simplify your data structures and clean up that pointer juggling. Most likely you have some out of bounds buffer write somewhere in your code, but it's difficult to see where exactly.
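One way to simplify the vertex handling, sketched under a few assumptions: the input coordinates are copied once into a single std::vector that is sized up front, so there is no per-vertex new/delete[] and every element keeps a stable address as long as the vector is not resized. GLdouble is aliased so the snippet builds without the GL headers, the ring is represented as plain (x, y) pairs instead of the Boost.Geometry type, and makeVertexStorage is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

using GLdouble = double;  // stand-in for the GL typedef

using Vertex3 = std::array<GLdouble, 3>;

// Copies a 2D ring into contiguous 3-component vertices with z = 0.
// The returned vector owns all vertex memory; nothing has to be freed by hand.
std::vector<Vertex3> makeVertexStorage(const std::vector<std::pair<float, float>>& ring)
{
    std::vector<Vertex3> storage;
    storage.reserve(ring.size());
    for (const auto& point : ring) {
        storage.push_back(Vertex3{static_cast<GLdouble>(point.first),
                                  static_cast<GLdouble>(point.second),
                                  0.0});
    }
    return storage;
}
```

Each storage[i].data() could then be passed to gluTessVertex as both the location and the data pointer. The vector has to stay alive and unmodified until gluTessEndPolygon has returned, because the data pointers are handed back to the vertex callback at that point.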
The title is a bit too general, so let me get straight to it.
I have an application with a large number of potential structures, called Player.
So I thought, let's make an array of pointers to Player, because usually you won't need all the players and you will save dynamic memory compared to directly allocating an array with the maximum size, so (C++):
Player* a[max];
for loop
a[i] = new Player
end
vs
Player* a;
a = new Player[max]
The first example is what I use in a function. Every time you call that function, it allocates the next pointer in the array. Everything works, but the strange thing is that sometimes one of the pointers seems to lose its reference to the heap memory. When I have 2 players and I allocate 10, it works, but after a while (several frames guaranteed) it displays "-1.#J" as the value of a float in that Player structure, which I suppose means that it is an undefined value.
I could not post the entire code (it would be too much), and I checked for all other possible bugs, but I could not find one. I am starting to suspect that it has to do with the fact that I allocate the memory using new in a function in a different .cpp file (so a different obj file). Could this be the case? And can you use an array of pointers in this situation?
I could be horribly wrong with my ideas about saving memory using this method. If so, please give me advice on how to do it properly.
Thanks.
(EDIT)
The code for spawning the player (I don't know how to do the layout, sorry for that):
void SpawnPlayer(float xpos, float ypos, float angle, unsigned short ammo[WP_TOTAL], unsigned char weapon)
{
player[game.players] = new Player;
// Properties
player[game.players]->pos[0] = xpos;
player[game.players]->pos[1] = ypos;
player[game.players]->angle = angle;
player[game.players]->dir[0] = 0.0f;
player[game.players]->dir[1] = 0.0f;
player[game.players]->speed = PL_SPEED;
player[game.players]->health = PL_HEALTH;
player[game.players]->shoot = true;
player[game.players]->shootwait = 0;
unsigned char i;
CLoops(i, 0, WP_TOTAL)
{
player[game.players]->ammo[i] = ammo[i];
}
player[game.players]->weapon = weapon;
// Model
float cx = cos(angle) * PL_RADIUS, cy = sin(angle) * PL_RADIUS;
player[game.players]->model.vertexcnt = 4;
player[game.players]->model.vertex = new float[8];
player[game.players]->model.vertex[0] = xpos - cx - cy ;player[game.players]->model.vertex[1] = ypos - cy + cx;
player[game.players]->model.vertex[2] = xpos - cx + cy ;player[game.players]->model.vertex[3] = ypos - cy - cx;
player[game.players]->model.vertex[4] = xpos + cx - cy ;player[game.players]->model.vertex[5] = ypos + cy + cx;
player[game.players]->model.vertex[6] = xpos + cx + cy ;player[game.players]->model.vertex[7] = ypos + cy - cx;
player[game.players]->model.texcoord = new float[8];
player[game.players]->model.texcoord[0] = 0.0f;player[game.players]->model.texcoord[1] = 0.0f;
player[game.players]->model.texcoord[2] = 1.0f;player[game.players]->model.texcoord[3] = 0.0f;
player[game.players]->model.texcoord[4] = 0.0f;player[game.players]->model.texcoord[5] = 1.0f;
player[game.players]->model.texcoord[6] = 1.0f;player[game.players]->model.texcoord[7] = 1.0f;
core.CCreateModel(player[game.players]->model, CMODEL_TRISTRIPS, CMODEL_DYNAMIC);
// AI
player[game.players]->target = PL_MAXCNT;
player[game.players]->move = 0;
// Add global counter
game.players++;
}
Why not use a vector of Player object pointers?
std::vector<Player*> vec;
Then, when you want to allocate a new player, do:
vec.push_back(new Player);
Here, the problem with your code may be that a Player stored in the array is getting overwritten. With a vector you can use const iterators and so make sure that the players are not changed when you don't want them to be.
Also, you don't have to worry about allocating a constant size.
Also, if your Player object contains any member pointers, make sure that you don't pass any Player object by value or assign one to another. If you do, you might want to write your own copy constructor and assignment operator so that a shallow copy does not occur.
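A minimal sketch of both suggestions combined, assuming a much reduced Player: the raw float arrays in the model become std::vector members, so copying a Player copies the data instead of sharing pointers, and the players themselves are owned by a vector of smart pointers. Model, Player, g_players and spawnPlayer are simplified, hypothetical versions of the names in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Simplified model: vectors own their data, so no manual new[]/delete[].
struct Model {
    std::vector<float> vertex;
    std::vector<float> texcoord;
};

// Reduced stand-in for the Player structure from the question.
struct Player {
    float pos[2] = {0.0f, 0.0f};
    float angle  = 0.0f;
    Model model;
};

// Owns all players; grows on demand, so no fixed maximum is needed.
std::vector<std::unique_ptr<Player>> g_players;

// Creates one player and returns a reference that stays valid while the
// player remains in g_players, even when the vector itself reallocates.
Player& spawnPlayer(float xpos, float ypos, float angle)
{
    auto player = std::make_unique<Player>();
    player->pos[0] = xpos;
    player->pos[1] = ypos;
    player->angle  = angle;
    player->model.vertex.assign(8, 0.0f);
    player->model.texcoord = {0.0f, 0.0f, 1.0f, 0.0f, 0.0f, 1.0f, 1.0f, 1.0f};
    g_players.push_back(std::move(player));
    return *g_players.back();
}
```

With this layout g_players.size() replaces the separate player counter, which removes one way for the counter and the array to get out of sync.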
I am using free() to release the memory allocated for a bunch of temporary arrays in a recursive function. I would post the code but it is pretty long. When I comment out these free() calls, the program runs in less than a second. However, when I use them, the program takes about 20 seconds to run. Why is this happening, and how can it be fixed? The arrays total 100 MB or so, so I'd rather not just leave the memory leak.
Additionally, when I run the program that includes all of the free() calls with profiling enabled, it runs in less than a second. I don't know how that would have an effect, but it does.
After using only some of the free() calls, it seems that there are a few in particular that cause the program to slow down. The rest do not seem to have an effect.
Ok... here's the code as requested:
void KDTree::BuildBranch(int height, Mailbox** objs, int nObjects)
{
int dnObjects = nObjects * 2;
int dnmoObjects = dnObjects - 1;
//Check for termination
if(height == -1 || nObjects < minObjectsPerNode)
{
//Create leaf
tree[nodeIndex] = KDTreeNode();
if(nObjects == 1)
tree[nodeIndex].InitializeLeaf(objs[0], 1);
else
tree[nodeIndex].InitializeLeaf(objs, nObjects);
//Added a node, increment index
nodeIndex++;
return;
}
//Save this node's index and increment the current index to save space for this node
int thisNodeIndex = nodeIndex;
nodeIndex++;
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
//Find all possible split locations
int index = 0;
BoundingBox* tempBox = new BoundingBox();
for(int i = 0; i < nObjects; i++)
{
//Get bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add mins to split lists
xMins[index] = tempBox->x0;
yMins[index] = tempBox->y0;
zMins[index] = tempBox->z0;
//Add maxs
xMaxs[index] = tempBox->x1;
yMaxs[index] = tempBox->y1;
zMaxs[index] = tempBox->z1;
index++;
}
//Sort lists
Util::sortFloats(xMins, nObjects);
Util::sortFloats(yMins, nObjects);
Util::sortFloats(zMins, nObjects);
Util::sortFloats(xMaxs, nObjects);
Util::sortFloats(yMaxs, nObjects);
Util::sortFloats(zMaxs, nObjects);
//Allocate bin lists
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* xRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zRight = (Bin*)malloc(dnObjects * sizeof(Bin));
//Initialize all bins
for(int i = 0; i < dnObjects; i++)
{
xLeft[i] = Bin(0, 0.0f);
xRight[i] = Bin(0, 0.0f);
yLeft[i] = Bin(0, 0.0f);
yRight[i] = Bin(0, 0.0f);
zLeft[i] = Bin(0, 0.0f);
zRight[i] = Bin(0, 0.0f);
}
//Construct min and max bins from split locations
//Merge min/max lists together for each axis
int minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (xMins[minIndex] <= xMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
xLeft[i].rightEdge = xMins[minIndex];
xRight[i].rightEdge = xMins[minIndex];
//Add geometry to mins counter
xLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
xLeft[i].rightEdge = xMaxs[maxIndex];
xRight[i].rightEdge = xMaxs[maxIndex];
//Add geometry to maxs counter
xRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Repeat for y axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (yMins[minIndex] <= yMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
yLeft[i].rightEdge = yMins[minIndex];
yRight[i].rightEdge = yMins[minIndex];
//Add geometry to mins counter
yLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
yLeft[i].rightEdge = yMaxs[maxIndex];
yRight[i].rightEdge = yMaxs[maxIndex];
//Add geometry to maxs counter
yRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Repeat for z axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (zMins[minIndex] <= zMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
zLeft[i].rightEdge = zMins[minIndex];
zRight[i].rightEdge = zMins[minIndex];
//Add geometry to mins counter
zLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
zLeft[i].rightEdge = zMaxs[maxIndex];
zRight[i].rightEdge = zMaxs[maxIndex];
//Add geometry to maxs counter
zRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Free split memory
free(xMins);
free(xMaxs);
free(yMins);
free(yMaxs);
free(zMins);
free(zMaxs);
//PreCalcs
float voxelL = xRight[dnmoObjects].rightEdge - xLeft[0].rightEdge;
float voxelD = zRight[dnmoObjects].rightEdge - zLeft[0].rightEdge;
float voxelH = yRight[dnmoObjects].rightEdge - yLeft[0].rightEdge;
float voxelSA = 2.0f * voxelL * voxelD + 2.0f * voxelL * voxelH + 2.0f * voxelD * voxelH;
//Minimum cost preset to no split at all
float minCost = (float)nObjects;
float splitLoc;
int minLeftCounter = 0, minRightCounter = 0;
int axis = -1;
//---------------------------------------------------------------------------------------------
//Check costs of x-axis split planes keeping track of derivative using
//the fact that there is a minimum point on the graph costs vs split location
//Since there is one object per split plane
int splitIndex = 1;
float lastCost = nObjects * voxelL;
float tempCost;
float lastSplit = xLeft[1].rightEdge;
int leftCount = xLeft[1].objectBoundCounter, rightCount = nObjects - xRight[1].objectBoundCounter;
int lastLO = 0, lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (xLeft[splitIndex].rightEdge - xLeft[0].rightEdge) + rightCount * (xLeft[dnmoObjects].rightEdge - xLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = xLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += xLeft[splitIndex].objectBoundCounter;
rightCount -= xRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - xLeft[0].rightEdge) * voxelD + 2 * (lastSplit - xLeft[0].rightEdge) * voxelH + 2 * voxelD * voxelH)) + (lastRO * (2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelD * voxelH))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 0;
}
//---------------------------------------------------------------------------------------------
//Repeat for y axis
splitIndex = 1;
lastCost = nObjects * voxelH;
lastSplit = yLeft[1].rightEdge;
leftCount = yLeft[1].objectBoundCounter;
rightCount = nObjects - yRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (yLeft[splitIndex].rightEdge - yLeft[0].rightEdge) + rightCount * (yLeft[dnmoObjects].rightEdge - yLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = yLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += yLeft[splitIndex].objectBoundCounter;
rightCount -= yRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - yLeft[0].rightEdge) * voxelD + 2 * (lastSplit - yLeft[0].rightEdge) * voxelL + 2 * voxelD * voxelL)) + (lastRO * (2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * voxelD * voxelL))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 1;
}
//---------------------------------------------------------------------------------------------
//Repeat for z axis
splitIndex = 1;
lastCost = nObjects * voxelD;
lastSplit = zLeft[1].rightEdge;
leftCount = zLeft[1].objectBoundCounter;
rightCount = nObjects - zRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (zLeft[splitIndex].rightEdge - zLeft[0].rightEdge) + rightCount * (zLeft[dnmoObjects].rightEdge - zLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = zLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += zLeft[splitIndex].objectBoundCounter;
rightCount -= zRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - zLeft[0].rightEdge) * voxelL + 2 * (lastSplit - zLeft[0].rightEdge) * voxelH + 2 * voxelH * voxelL)) + (lastRO * (2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelH * voxelL))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 2;
}
//Free bin memory
free(xLeft);
free(xRight);
free(yLeft);
free(yRight);
free(zLeft);
free(zRight);
//---------------------------------------------------------------------------------------------
//Make sure a split is in our best interest
if(axis == -1)
{
//If not decrement the node counter
nodeIndex--;
BuildBranch(-1, objs, nObjects);
return;
}
//Allocate space for left and right lists
Mailbox** leftList = (Mailbox**)malloc(minLeftCounter * sizeof(void*));
Mailbox** rightList = (Mailbox**)malloc(minRightCounter * sizeof(void*));
//Sort objects into lists of those to the left and right of the split plane
int leftIndex = 0, rightIndex = 0;
leftCount = 0;
rightCount = 0;
switch(axis)
{
case 0:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->x0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->x1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
case 1:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->y0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->y1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
case 2:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->z0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->z1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
};
//Delete the bounding box
delete tempBox;
//Delete old objects array
free(objs);
//Construct left and right branches
BuildBranch(height - 1, leftList, leftCount);
BuildBranch(height - 1, rightList, rightCount);
//Build this node
tree[thisNodeIndex] = KDTreeNode();
tree[thisNodeIndex].InitializeInterior(axis, splitLoc, nodeIndex - 1);
return;
}
EDIT:
Ok well I tried to replace the malloc/free with new/delete and that had no effect on the speed. I also found that it is only the free() on xLeft/xRight arrays that seem to affect the execution time significantly. I was able to eliminate the problem by moving the free() calls to after the recursive calls, although I do not know why this is making a difference because I don't see anywhere that these arrays are used after the original location for free(). As for why I am using malloc... some portions of this program use cache aligned memory, so I had been using _aligned_malloc. Although there probably is a way to get new to cache align, this is the only way I know to do it.
Is it possible that you are linking against a debug version of the runtime library that is doing something extra in free() like filling the memory with a garbage value? I have seen this behavior when you link against overly aggressive memory debugging libraries. The code that you have posted does not look strange. I would be interested to know what would happen if you replaced the arrays with std::vector or std::deque though. Vector should have behavior quite similar to the arrays and Deque may actually improve the speed a little if the arrays are large because the memory manager will not have to guarantee contiguous space.
If your program is doing all of the free()ing on exit, then you might as well just skip the calls. The entire process heap is freed when your app exits.
Edit: ----
OK, now that the code is posted, it appears to me that you aren't just freeing on exit, so you should definitely try to figure out whether this is a weird symptom of a bug or just a costly implementation of free(). Instead of removing the free() calls, time how long it takes to execute them. Is the heap manager really using up the whole 19 seconds?
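One way to do that timing, as a minimal sketch: route the suspect calls through a small wrapper that accumulates the time spent in free(). TimedFree and g_freeSeconds are hypothetical names introduced here, not anything from the question's code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>

// Running total of wall-clock time spent inside free(), in seconds.
static double g_freeSeconds = 0.0;

// Hypothetical wrapper: frees ptr and adds the elapsed time to g_freeSeconds.
void TimedFree(void* ptr) {
    const auto start = std::chrono::steady_clock::now();
    std::free(ptr);
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start); // steady_clock is monotonic
    g_freeSeconds += std::chrono::duration<double>(stop - start).count();
}
```

Replace the free(xLeft)-style calls with TimedFree(xLeft) and print g_freeSeconds once the tree is built. Reading the clock twice per call adds some overhead of its own, so the total is only an estimate, but it should show whether free() accounts for most of the 19 seconds.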
I do see several places where multiple allocations have the same scope and lifetime. You could turn these into a single malloc/free call, although that would make the code less clear and harder to maintain. So you have to ask yourself: how much does that 20 seconds matter?
Probably just the behavior of the heap manager your CRT uses. It's probably updating free lists, or some other internal structure to manage memory.
You probably should reexamine how your program allocates and uses memory if your bottleneck is here.
Having had a look at the code, one big thing that comes to mind is the mixture of malloc(...), new, delete, and free(...):
BoundingBox* tempBox = new BoundingBox();
// ....
//Delete the bounding box
delete tempBox;
yet in other places you have
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
// ....
free(xMins);
In short, you are mixing the C++ runtime's new and delete with C's malloc(...) and free(...). After all, this is C++, so a question for you here...
Why did you use malloc(...) and free(...), which come from C, in the middle of this C++ code? The concern I see is that the C++ runtime handles allocation differently from C: new and delete also run constructors and destructors, whereas malloc and free only deal with raw bytes.
Having said this, your best bet is:
Replace all calls to malloc with new (new[] for arrays).
Replace all calls to free with delete (delete[] for arrays).
Rerun the program and see whether that makes a difference. Can you confirm this?
Hope this helps,
Best regards,
Tom.
+1 to malloc/free making my eyes hurt in C++. Ignoring that for a second and looking at the code, three ideas:
Roll up your malloc calls to one large malloc and free (for the x/y/left/right/etc structures) instead of 12. Set the pointers into this large buffer as appropriate.
Still talking about the x/y/left/right variables: employ a small stack-based buffer that you can use when the number of objects is small. When the number of objects is large, allocate dynamically. When it is not, just set your pointer to the local stack buffer. This can avoid dynamic memory management altogether for small inputs.
Right now, your "object" list is dynamically allocated, freed, and reallocated with each recursive call (!!). This is confusing because ownership isn't clear, but it's also a performance issue. Consider reworking the code so that only one list of "objects" is ever used.
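The first two ideas can be combined. Below is a minimal sketch, assuming six per-object float arrays like the xMins ... zMaxs ones in the question. SplitArrays, CarveSplitArrays, BuildSplitOptions and the kSmallCount threshold are all hypothetical names, and the fill loop is a placeholder for the real work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Six per-object float arrays carved out of one contiguous buffer.
struct SplitArrays {
    float* xMins; float* yMins; float* zMins;
    float* xMaxs; float* yMaxs; float* zMaxs;
};

// Points the six arrays at consecutive nObjects-sized slices of buffer.
// buffer must hold at least 6 * nObjects floats.
SplitArrays CarveSplitArrays(float* buffer, std::size_t nObjects) {
    assert(buffer != nullptr);
    return SplitArrays{
        buffer + 0 * nObjects, buffer + 1 * nObjects, buffer + 2 * nObjects,
        buffer + 3 * nObjects, buffer + 4 * nObjects, buffer + 5 * nObjects};
}

// Hypothetical threshold: at or below this many objects, use the stack.
const std::size_t kSmallCount = 64;

// Fills the arrays with placeholder values and returns how many heap
// allocations were needed (0 or 1), or -1 if malloc fails.
int BuildSplitOptions(std::size_t nObjects) {
    float stackBuffer[6 * kSmallCount]; // 1536 bytes per call
    float* buffer = stackBuffer;
    int heapAllocations = 0;
    if (nObjects > kSmallCount) {
        // One malloc replaces six separate ones.
        buffer = static_cast<float*>(std::malloc(6 * nObjects * sizeof(float)));
        if (buffer == nullptr) return -1;
        heapAllocations = 1;
    }
    SplitArrays a = CarveSplitArrays(buffer, nObjects);
    for (std::size_t i = 0; i < nObjects; ++i) {
        a.xMins[i] = a.yMins[i] = a.zMins[i] = 0.0f; // placeholder work
        a.xMaxs[i] = a.yMaxs[i] = a.zMaxs[i] = 1.0f;
    }
    if (heapAllocations == 1) std::free(buffer); // one free instead of six
    return heapAllocations;
}
```

Two caveats: the stack buffer is paid for in every recursive frame, so keep the threshold modest, and the slices are only float-aligned, so code that relies on cache-aligned arrays would need padding between them.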
C++ may store some extra information when you allocate using new, such as the number of elements in the case of an array. If you are using free on that memory, it could be a fragmentation problem where you are releasing only the chunks of data in between but not the bookkeeping information stored by new. Just a thought.
When you corrupt the heap, it often becomes very slow. Try running it in debug mode with the debug version of your runtime as well.
It could be poor locality of reference for your code. For example, I see the following:
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
...
free(xMins);
free(xMaxs);
free(yMins);
free(yMaxs);
free(zMins);
free(zMaxs);
Now, assuming that the allocations proceed basically linearly, free(xMaxs); may need to dereference memory that was allocated some number of pages away from xMins (which was just dereferenced during free(xMins);), so you might need to swap in a page from the backing store in order to perform the free, which causes a huge slowdown in execution when it happens. Reordering the free() calls to match the allocation order could help. In this case, that would mean
free(xMins);
free(yMins);
free(zMins);
free(xMaxs);
free(yMaxs);
free(zMaxs);
It sounds like you are running your program from a debugger in Windows, which by default causes a special debug heap to be used, which dramatically slows down memory deallocations. This applies even to non-debug builds, as long as they are launched from a debugger (such as Visual Studio). You should be able to disable this behavior by setting the environment variable _NO_DEBUG_HEAP=1 before running your program (I recommend setting it in the project configuration settings rather than in the system settings, if possible).
You didn't describe anything about your programming environment in the original question, however, so I had to make certain assumptions about it that might be wrong. If you're not running your program under Windows, for example, then my answer doesn't apply and I have no idea what the cause of your problem might be.