There's some code where it creates a float array like this:
mData = new float[channelCount * maxFrames];
then it does
memcpy(&mData[sampleIndex],
buffer,
(numSamples * sizeof(float)));
What does &mData[sampleIndex] mean? Well, we have a float array, we take an element of that array, and then take the address of that element. Wouldn't the address of that element be mData + sampleIndex?
What if I wanted to replace memcpy with a for loop? I did this and it worked:
for (int i=0; i< numSamples * sizeof(float); i++) {
(&mData[sampleIndex])[i] = buffer[i];
}
but I don't know what (&mData[sampleIndex])[i] means. Should it be mData + sampleIndex + i?
This code is supposed to record microphone WAV data, so we should be able to store samples in multiple channels. How does this code manage such channels?
What does &mData[sampleIndex] mean? Well, we have a float array, we take an element of that array, and then take the address of that element. Wouldn't the address of that element be mData + sampleIndex?
Yes.
What if I wanted to replace memcpy with a for loop?
Your loop doesn't quite do the same thing. memcpy copies numSamples * sizeof(float) bytes, while your loop copies numSamples * sizeof(float) floats. Since sizeof(float) is typically 4, the loop runs about four times too many iterations and can read and write past the end of both arrays (a buffer overflow). The loop bound should be numSamples.
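A loop that matches the memcpy call copies numSamples elements rather than numSamples * sizeof(float). The following is a minimal sketch, assuming buffer holds float data; copySamples is a hypothetical helper name that does not appear in the original code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies numSamples floats from buffer into mData,
// starting at element sampleIndex. The caller must guarantee that mData
// has room for at least sampleIndex + numSamples elements.
void copySamples(float* mData, std::size_t sampleIndex,
                 const float* buffer, std::size_t numSamples)
{
    assert(mData != nullptr && buffer != nullptr);
    for (std::size_t i = 0; i < numSamples; ++i) {
        // Equivalent to *(mData + sampleIndex + i) = *(buffer + i);
        mData[sampleIndex + i] = buffer[i];
    }
}
```

The bound is an element count, so no sizeof appears in the loop; sizeof is only needed when a function such as memcpy works in bytes.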
but I don't know what (&mData[sampleIndex])[i] means. Should it be mData + sampleIndex + i?
It's not quite the same. mData + sampleIndex + i is the address of the element, while (&mData[sampleIndex])[i] is the element itself, so it is equal to *(mData + sampleIndex + i).
How does this code manage such channels?
This code simply copies values from one array into another. It doesn't "manage" anything.
The syntax array[index] is the same as *(array + index), thus:
&mData[sampleIndex]
is the same as:
&(*(mData + sampleIndex))
Which is simply:
mData + sampleIndex
And so, (&mData[sampleIndex])[i] is getting a float* pointer to the mData element at index sampleIndex, and then applying the index i to that pointer. So yes, in this case:
(&mData[sampleIndex])[i] = buffer[i];
is the same as:
*(mData + sampleIndex + i) = *(buffer + i);
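The equivalences above can be checked directly by comparing addresses. This is a small sketch; sameElement is a made-up name used only for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Returns true when the three spellings refer to the same element.
// The caller must make sure sampleIndex + i is a valid index into mData.
bool sameElement(float* mData, std::size_t sampleIndex, std::size_t i)
{
    assert(mData != nullptr);
    float* viaAddressOf = &(&mData[sampleIndex])[i];  // index through the element's address
    float* viaArithmetic = mData + sampleIndex + i;   // plain pointer arithmetic
    float* viaOneIndex = &mData[sampleIndex + i];     // single combined index
    return viaAddressOf == viaArithmetic && viaArithmetic == viaOneIndex;
}
```

In practice the combined form mData[sampleIndex + i] is usually the easiest to read.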
I was programming a dynamic array for my own use that I wanted pre-set with zeros.
template <class T>
dynArr<T>::dynArr()
{
rawData = malloc(sizeof(T) * 20); //we allocate space for 20 elems
memset(this->rawData, 0, sizeof(T) * 20); //we zero it!
currentSize = 20;
dataPtr = static_cast<T*>(rawData); //we cast pointer to required datatype.
}
And this part works: iterating in a loop and dereferencing dataPtr works great. Zeros.
Yet reallocation behaves (in my opinion) at least a bit strangely. First you have to look at the reallocation code:
template <class T>
void dynArr<T>::insert(const int index, const T& data)
{
if (index < currentSize - 1)
{
dataPtr[index] = data; //we can just insert things, array is zero-d
}
else
{
//TODO we should increase size exponentially, not just to the element we want
const size_t lastSize = currentSize; //store current size (before realloc). this is count not bytes.
rawData = realloc(rawData, index + 1); //rawData points now to new location in the memory
dataPtr = (T*)rawData;
memset(dataPtr + lastSize - 1, 0, sizeof(T) * index - lastSize - 1); //we zero from ptr+last size to index
dataPtr[index] = data;
currentSize = index + 1;
}
}
Simple, we realloc data up to index+1, and set yet-non-zeroed memory to 0.
As a test, I first inserted 5 at position 5 of this array. The expected thing happened: 0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Yet inserting something else, like insert(30,30), gives me strange behavior:
0, 0, 0, 0, 0, 5, 0, -50331648, 16645629, 0, 523809160, 57600, 50928864, 50922840, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30,
What the hell, am I not understanding something here? Shouldn't realloc take all the 20 previously set memory bytes into account? What sorcery is going on here?
Problem 1:
You are using the wrong size in the call to realloc. Change it to:
rawData = realloc(rawData, sizeof(T)*(index + 1));
If rawData is of type T*, prefer
rawData = realloc(rawData, sizeof(*rawData)*(index + 1));
Problem 2:
The last term of the following is not right.
memset(dataPtr + lastSize - 1, 0, sizeof(T) * index - lastSize - 1);
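You need to parenthesize the count so that sizeof(T) multiplies the whole number of objects. The start and the count are also each off by one: the new elements begin at index lastSize, and there are index + 1 - lastSize of them up to and including index:
memset(dataPtr + lastSize, 0, sizeof(T) * (index + 1 - lastSize));
// size * (the number of objects)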
Problem 3:
Assigning to dataPtr using
dataPtr[index] = data;
is a problem when memory is obtained using malloc or realloc. The malloc family of functions returns just raw memory. They don't initialize objects.
Assigning to uninitialized objects is a problem for all non-POD types.
Problem 4:
If T is a type with virtual member functions, using memset to zero out memory will most likely lead to problems.
Suggestion for fixing all the problems:
It will be much better to use new and delete since you are in C++ land.
template <class T>
dynArr<T>::dynArr()
{
currentSize = 20;
dataPtr = new T[currentSize](); // () value-initializes, so built-in types start at zero
// Not sure why you need rawData
}
template <class T>
void dynArr<T>::insert(const int index, const T& data)
{
if (index < currentSize - 1)
{
dataPtr[index] = data;
}
else
{
const size_t lastSize = currentSize;
T* newData = new T[index+1](); // () value-initializes the new tail to zero
std::copy(dataPtr, dataPtr+lastSize, newData);
delete [] dataPtr;
dataPtr = newData;
dataPtr[index] = data;
currentSize = index + 1;
}
}
Please note that the suggested change will work only if T is default constructible.
This will also take care of the problems 3 and 4 outlined above.
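Another option, not proposed in the answer above, is to let std::vector own the storage; resize value-initializes new elements, which gives the zero fill for built-in types. A minimal sketch, where the class name DynArr and its interface are illustrative rather than the asker's actual class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a zero-filled growable array built on std::vector.
template <class T>
class DynArr {
public:
    DynArr() : data_(20) {}  // 20 value-initialized elements (zero for built-in types)

    void insert(std::size_t index, const T& value)
    {
        if (index >= data_.size()) {
            data_.resize(index + 1);  // new elements are value-initialized
        }
        data_[index] = value;
    }

    const T& at(std::size_t index) const
    {
        assert(index < data_.size());
        return data_[index];
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<T> data_;
};
```

With this version there is no manual size arithmetic, so the byte-versus-element mix-ups from Problems 1 and 2 cannot occur.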
I have a function which returns the address of a 4x2 matrix whose name is 'a'.
This function computes the elements of 'a' matrix inside and returns the address of the matrix. When I use that function, I want to assign its output to a matrix called 'a1' but when I do so, 'a1' becomes a zero matrix. However, when I assign the output to the same 'a' matrix, everything works fine. Can anyone help me? The code is written on Arduino IDE.
double a[4][2], a1[4][2];
double T0E[4][4]={
{0.1632, -0.3420, 0.9254, 297.9772},
{0.0594, 0.9397, 0.3368, 108.4548},
{-0.9848, 0, 0.1736, -280.5472},
{0, 0, 0, 1}
};
const int axis_limits[4][2]=
{
{ -160, 160 },
{ -135, 60 },
{ -135, 135 },
{ -90, 90 }
};
const unsigned int basex = 50, basez = 100, link1 = 200, link2 = 200, link3=30, endeff=link3+50;
double *inversekinematic(double target[4][4])
{
// angle 1
a[0][0] = -asin(target[0][1]);
a[0][1] = a[0][0];
if (a[0][0]<axis_limits[0][0] || a[0][0]>axis_limits[0][1] || isnan(a[0][0]))
{
bool error=true;
}
// angle 2
double A = sqrt(pow(target[0][3]-cos(a[0][0])*endeff*target[2][2], 2) + pow(target[1][3]-sin(a[0][0])*endeff*target[2][2], 2));
double N = (A - basex) / link1;
double M = -(target[2][3]-endeff*target[2][0] - basez) / link2;
double theta = acos(N / sqrt(pow(N, 2) + pow(M, 2)));
a[1][0] = theta + acos(sqrt(pow(N, 2) + pow(M, 2)) / 2);
a[1][1] = theta - acos(sqrt(pow(N, 2) + pow(M, 2)) / 2);
// angle 3
for (int i = 0; i <= 1; i++)
{
a[2][i] = {asin(-(target[2][3]-endeff*target[2][0]-basez)/link2-sin(a[1][i]))-a[1][i]};
}
// angle 4
for(int i = 0; i <=1; i++)
{
a[3][i] = {-asin(target[2][0])-a[1][i]-a[2][i]};
}
return &a[4][2];
}
void setup(){
Serial.begin(9600);
}
void loop() {
a1[4][2]={*inversekinematic(T0E)};
}
When you write return &a[4][2]; you are returning the address of the 3rd element of the 5th row. This is out of bounds, since C++ uses zero-based indexing and the array was declared as double a[4][2];. To return a pointer to the start of the matrix from a function declared as returning double*, use return &a[0][0]; (a plain return a; yields a double (*)[2], which does not match that return type).
Also, you're doing lots of strange things like declaring the parameter double target[4][4] with a size and using initializer lists to assign single elements, which look unusual to me.
I'll try to be a little more detailed. In C and C++, arrays are not pointers, although an array converts to a pointer to its first element in most expressions. Built-in arrays cannot be assigned to one another: a = b; does not compile. The statement a1[4][2]={...}; in loop() does not copy a matrix either; it assigns a single double to the out-of-bounds element a1[4][2], which is why a1 never receives the results. What you will have to do is copy the elements with loops, or use memcpy(dest, src, size). For example, to copy the contents of double a[4][2] to double b[4][2], you would use something like memcpy(b, a, sizeof(double) * 8);.
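As a sketch of the memcpy approach, the helper below copies one 4x2 matrix into another. The name copyMatrix is hypothetical; the parameters are references to arrays so that sizeof reports the size of the whole matrix instead of the size of a pointer:

```cpp
#include <cassert>
#include <cstring>

// Copies a 4x2 matrix of doubles. memcpy requires that the source and
// destination do not overlap, so copying a matrix onto itself is rejected.
void copyMatrix(double (&dest)[4][2], const double (&src)[4][2])
{
    assert(static_cast<const void*>(dest) != static_cast<const void*>(src));
    std::memcpy(dest, src, sizeof(dest));  // sizeof(dest) == 8 * sizeof(double)
}
```

In the question's loop(), a call along the lines of copyMatrix(a1, a) after inversekinematic(T0E) has filled a would take the place of the assignment to a1[4][2].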
Two points:
1. your code says the function inversekinematic() returns a pointer to a double, not an array.
2. you return a pointer to a double, but it's always the same address.
Maybe typedefs will help simplify the code?
typedef double Mat42[4][2];
Mat42 a, a1;
Mat42 *inversekinematic(double target[4][4])
{
// ...
return &a;
}
But, for the code you've shown, I don't see why you need to return the address of a fixed global value. Perhaps your real code might return the address of 'a' or 'a1', but if it doesn't ...
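To show how the typedef version might be used end to end, here is a sketch under stated assumptions: computeAngles stands in for inversekinematic and fills the matrix with placeholder values, and storeResult is a hypothetical helper that copies the result into a caller-owned matrix.

```cpp
#include <cassert>
#include <cstring>

typedef double Mat42[4][2];

Mat42 a;  // written by the function, as in the question

// Stand-in for the real computation: fills the global matrix with
// placeholder values and returns a pointer to the whole 4x2 array.
Mat42* computeAngles(double seed)
{
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 2; ++col) {
            a[row][col] = seed + row * 2 + col;
        }
    }
    return &a;
}

// Copies the pointed-to matrix into a caller-owned matrix.
void storeResult(Mat42& dest, Mat42* result)
{
    assert(result != nullptr);
    std::memcpy(dest, *result, sizeof(Mat42));
}
```

Because the function always returns the address of the same global, the copy is what keeps an earlier result from being overwritten by the next call.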
As an exercise, I'm translating my master's thesis finite-difference time-domain code for simulation of wave propagation from Matlab to C++, and I've come across the following problem.
I would like to create a class that corresponds to a non-physical absorbing layer called cpml. The size of the layer depends on the desired parameters of the simulation, so the arrays that define the absorbing layer have to be dynamic.
#ifndef fdtd_h
#define fdtd_h
#include <cmath>
#include <iostream>
#include <sstream>
using namespace std;
class cpml {
public:
int thickness;
int n_1, n_2, n_3;
double cut_off_freq;
double kappa_x_max, sigma_x_1_max, sigma_x_2_max, alpha_x_max;
double *kappa_x_tau_xy, *sigma_x_tau_xy, *alpha_x_tau_xy;
void set_cpml_parameters_tau_xy();
};
void cpml::set_cpml_parameters_tau_xy(){
double temp1[thickness], temp2[thickness], temp3[thickness];
for(int j = 1; j < thickness; j++){
temp1[j] = 1 + kappa_x_max * pow((double)(thickness - j - 0.5) / (double)(thickness - 1), n_1);
temp2[j] = sigma_x_1_max * pow((double)(thickness - j - 0.5) / (double)(thickness - 1), n_1 + n_2);
temp3[j] = alpha_x_max * pow((double)(j - 0.5) / (double)(thickness - 1), n_3);
}
kappa_x_tau_xy = temp1;
sigma_x_tau_xy = temp2;
for(int i = 1; i < thickness; i++){
cout << sigma_x_tau_xy[i] << endl;
}
alpha_x_tau_xy = temp3;
}
#endif /* fdtd_h */
When I call the function cpml::set_cpml_parameters_tau_xy() in my main function, the first value of the array sigma_x_tau_xy is correct. However, the further values aren't.
#include "fdtd.h"
using namespace std;
int main() {
cpml cpml;
int cpml_thickness = 10;
cpml.thickness = cpml_thickness;
int n_1 = 3, n_2 = 0, n_3 = 3;
cpml.n_1 = n_1; cpml.n_2 = n_2; cpml.n_3 = n_3;
double cut_off_freq = 1;
cpml.cut_off_freq = cut_off_freq;
double kappa_x_max = 0;
double sigma_x_1_max = 0.8 * (n_1 + 1) / (sqrt(simulation_medium.mu/simulation_medium.rho) * simulation_grid.big_delta_x), sigma_x_2_max = 0.8 * (n_1 + 1) / (sqrt(simulation_medium.mu/simulation_medium.rho) * simulation_grid.big_delta_x);
double alpha_x_max = 2 * PI * cpml.cut_off_freq;
double kappa_y_max = 0;
double sigma_y_1_max = 0.8 * (n_1 + 1) / (sqrt(simulation_medium.mu/simulation_medium.rho) * simulation_grid.big_delta_y), sigma_y_2_max = 0.8 * (n_1 + 1) / (sqrt(simulation_medium.mu/simulation_medium.rho) * simulation_grid.big_delta_y);
double alpha_y_max = 2 * PI * cpml.cut_off_freq;
cpml.kappa_x_max = kappa_x_max; cpml.sigma_x_1_max = sigma_x_1_max; cpml.sigma_x_2_max = sigma_x_2_max; cpml.alpha_x_max = alpha_x_max;
cpml.kappa_y_max = kappa_y_max; cpml.sigma_y_1_max = sigma_y_1_max; cpml.sigma_y_2_max = sigma_y_2_max; cpml.alpha_y_max = alpha_y_max;
cpml.set_cpml_parameters_tau_xy();
for(int j = 1; j < cpml.thickness; j++){
cout << *(cpml.sigma_x_tau_xy + j) << endl;
}
}
What am I doing wrong, and how do I make the dynamic array members of the class cpml contain the correct values when called in the main function?
Two problems: The lesser of them is that your program is technically not a valid C++ program, since C++ doesn't have variable-length arrays (which your arrays temp1, temp2 and temp3 are).
The more serious problem is that you save pointers to local variables. When a function returns, local variables go out of scope and no longer exist. Pointers to them will become invalid, and using those pointers will lead to undefined behavior.
Both problems are easily solved by using std::vector instead of arrays and pointers.
You cannot declare an array in C++ without a "constant" expression for its size (the bounds must be known at compile time). That means this code is invalid:
double temp1[thickness], temp2[thickness], temp3[thickness];
What you should instead do is the following:
class cpml
{
//...
std::vector<double> kappa_x_tau_xy, sigma_x_tau_xy, alpha_x_tau_xy;
// ...
};
void cpml::set_cpml_parameters_tau_xy(){
alpha_x_tau_xy.resize(thickness);
kappa_x_tau_xy.resize(thickness);
sigma_x_tau_xy.resize(thickness);
//...
std::vector will handle all the dynamic allocation under the hood for you. If your code compiled, it was because you were using a nonstandard GCC extension for variable-length arrays. Compile with -Wall -pedantic -Werror and the compiler should reject them.
Note that you also have issues in array bounds. Whereas Matlab is 1-indexed, C++ is 0-indexed, so you'll need to do this, too:
for(int j = 0; j < thickness; j++){
kappa_x_tau_xy[j] = 1 + kappa_x_max * pow((double)(thickness - j - 0.5) / (double)(thickness - 1), n_1);
sigma_x_tau_xy[j] = sigma_x_1_max * pow((double)(thickness - j - 0.5) / (double)(thickness - 1), n_1 + n_2);
alpha_x_tau_xy[j] = alpha_x_max * pow((double)(j - 0.5) / (double)(thickness - 1), n_3);
}
You have a similar issue in main:
for(int j = 1; j < cpml.thickness; j++){
cout << *(cpml.sigma_x_tau_xy + j) << endl;
}
Should become:
for(int j = 0; j < cpml.thickness; j++){
cout << cpml.sigma_x_tau_xy[j] << endl;
}
Additional Notes:
Your code is very unstructured. Consider putting all of the cpml-related getting and setting into the cpml class ([Encapsulation](https://en.wikipedia.org/wiki/Encapsulation_(computer_programming))). This will make it easier for the client (you in this case) to interact with the object.
This will include hiding your class data as protected or private and exposing functions to get and set those variables (don't forget const where appropriate).
Add a constructor to initialize all of the fields at once. As it stands now, your class consists of mostly uninitialized garbage for much of its lifetime. If someone were to prematurely try to access a field, you're in Undefined Behavior territory.
std::endl is good for printing newline characters, but restrict that to Debug-only code. The reason is that it flushes the buffer every time it's called, which can make your code slower overall if it's printing a lot. Use a newline character "\n" instead for Release.
An additional benefit of std::vector is that it makes copying and assigning a cpml well behaved. With raw pointer members, the compiler-generated copy constructor and copy assignment perform a shallow copy instead of the deep copy that you'd want.
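The notes above can be sketched as a small class. This is only an illustration under assumptions: the name CpmlProfile, the reduced parameter list, and the single sigma profile are simplifications, and the formula is the question's expression evaluated for a 0-based j.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of an encapsulated absorbing-layer profile. All state is set
// in the constructor, so no field is ever left uninitialized.
class CpmlProfile {
public:
    CpmlProfile(int thickness, int n_1, int n_2, double sigma_x_1_max)
    {
        assert(thickness > 1);  // the formula divides by (thickness - 1)
        sigma_x_tau_xy_.reserve(static_cast<std::size_t>(thickness));
        for (int j = 0; j < thickness; ++j) {
            sigma_x_tau_xy_.push_back(
                sigma_x_1_max *
                std::pow((thickness - j - 0.5) / (thickness - 1), n_1 + n_2));
        }
    }

    // Read-only access. The vector owns its storage, so the values stay
    // valid for as long as the object lives.
    const std::vector<double>& sigma_x_tau_xy() const { return sigma_x_tau_xy_; }

private:
    std::vector<double> sigma_x_tau_xy_;
};
```

Since the member is a std::vector, copies of a CpmlProfile get their own data without a hand-written copy constructor.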
After restructuring your class, your main might look something like this:
int main() {
int cpml_thickness = 10;
int n_1 = 3, n_2 = 0, n_3 = 3;
double cut_off_freq = 1;
double kappa_x_max = 0;
double sigma_x_1_max = 0.8 * (n_1 + 1) / (sqrt(simulation_medium.mu/simulation_medium.rho) * simulation_grid.big_delta_x), sigma_x_2_max = 0.8 * (n_1 + 1) / (sqrt(simulation_medium.mu/simulation_medium.rho) * simulation_grid.big_delta_x);
double alpha_x_max = 2 * PI * cut_off_freq;
double kappa_y_max = 0;
double sigma_y_1_max = 0.8 * (n_1 + 1) / (sqrt(simulation_medium.mu/simulation_medium.rho) * simulation_grid.big_delta_y), sigma_y_2_max = 0.8 * (n_1 + 1) / (sqrt(simulation_medium.mu/simulation_medium.rho) * simulation_grid.big_delta_y);
double alpha_y_max = 2 * PI * cut_off_freq;
cpml cpml(cpml_thickness, n_1, n_2, n_3, cut_off_freq, kappa_x_max, kappa_y_max, sigma_x_1_max, sigma_x_2_max, alpha_x_max, alpha_y_max);
cpml.set_cpml_parameters_tau_xy();
cpml.PrintSigmaTauXY(std::cout);
}
Which is arguably better. (You might use a getter to get sigma_tau_xy from the class and then print it yourself, though.) And then you can think about how to simplify things even further by creating objects that represent the logical groupings of alpha_x_max and alpha_y_max etc. This could be a std::pair or a full-on struct with its own getters and setters. Now their own logic is grouped together and is easy to pass around/reference/think about. Your constructor for cpml also becomes simpler, where you accept a single parameter that represents both x and y instead of separate ones for both.
Matlab doesn't really encourage an Object-Oriented approach in my (admittedly brief) experience, but in C++ it's easy.
I am using free to free the memory allocated for a bunch of temporary arrays in a recursive function. I would post the code but it is pretty long. When I comment out these free() calls, the program runs in less than a second. However, when I am using them, the program takes about 20 seconds to run. Why is this happening, and how can it be fixed? This is like 100 or so MB so I'd rather not just leave the memory leak.
Additionally, when I run the program that includes all of the free() calls with profiling enabled, it runs in less than a second. I don't know how that would have an effect, but it does.
After using only some of the free() calls, it seems that there are a few in particular that cause the program to slow down. The rest do not seem to have an effect.
Ok... here's the code as requested:
void KDTree::BuildBranch(int height, Mailbox** objs, int nObjects)
{
int dnObjects = nObjects * 2;
int dnmoObjects = dnObjects - 1;
//Check for termination
if(height == -1 || nObjects < minObjectsPerNode)
{
//Create leaf
tree[nodeIndex] = KDTreeNode();
if(nObjects == 1)
tree[nodeIndex].InitializeLeaf(objs[0], 1);
else
tree[nodeIndex].InitializeLeaf(objs, nObjects);
//Added a node, increment index
nodeIndex++;
return;
}
//Save this node's index and increment the current index to save space for this node
int thisNodeIndex = nodeIndex;
nodeIndex++;
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
//Find all possible split locations
int index = 0;
BoundingBox* tempBox = new BoundingBox();
for(int i = 0; i < nObjects; i++)
{
//Get bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add mins to split lists
xMins[index] = tempBox->x0;
yMins[index] = tempBox->y0;
zMins[index] = tempBox->z0;
//Add maxs
xMaxs[index] = tempBox->x1;
yMaxs[index] = tempBox->y1;
zMaxs[index] = tempBox->z1;
index++;
}
//Sort lists
Util::sortFloats(xMins, nObjects);
Util::sortFloats(yMins, nObjects);
Util::sortFloats(zMins, nObjects);
Util::sortFloats(xMaxs, nObjects);
Util::sortFloats(yMaxs, nObjects);
Util::sortFloats(zMaxs, nObjects);
//Allocate bin lists
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* xRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* yRight = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
Bin* zRight = (Bin*)malloc(dnObjects * sizeof(Bin));
//Initialize all bins
for(int i = 0; i < dnObjects; i++)
{
xLeft[i] = Bin(0, 0.0f);
xRight[i] = Bin(0, 0.0f);
yLeft[i] = Bin(0, 0.0f);
yRight[i] = Bin(0, 0.0f);
zLeft[i] = Bin(0, 0.0f);
zRight[i] = Bin(0, 0.0f);
}
//Construct min and max bins bins from split locations
//Merge min/max lists together for each axis
int minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (xMins[minIndex] <= xMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
xLeft[i].rightEdge = xMins[minIndex];
xRight[i].rightEdge = xMins[minIndex];
//Add geometry to mins counter
xLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
xLeft[i].rightEdge = xMaxs[maxIndex];
xRight[i].rightEdge = xMaxs[maxIndex];
//Add geometry to maxs counter
xRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Repeat for y axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (yMins[minIndex] <= yMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
yLeft[i].rightEdge = yMins[minIndex];
yRight[i].rightEdge = yMins[minIndex];
//Add geometry to mins counter
yLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
yLeft[i].rightEdge = yMaxs[maxIndex];
yRight[i].rightEdge = yMaxs[maxIndex];
//Add geometry to maxs counter
yRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Repeat for z axis
minIndex = 0, maxIndex = 0;
for(int i = 0; i < dnObjects; i++)
{
if(maxIndex == nObjects || (zMins[minIndex] <= zMaxs[maxIndex] && minIndex != nObjects))
{
//Add split location to both bin lists
zLeft[i].rightEdge = zMins[minIndex];
zRight[i].rightEdge = zMins[minIndex];
//Add geometry to mins counter
zLeft[i+1].objectBoundCounter++;
minIndex++;
}
else
{
//Add split location to both bin lists
zLeft[i].rightEdge = zMaxs[maxIndex];
zRight[i].rightEdge = zMaxs[maxIndex];
//Add geometry to maxs counter
zRight[i].objectBoundCounter++;
maxIndex++;
}
}
//Free split memory
free(xMins);
free(xMaxs);
free(yMins);
free(yMaxs);
free(zMins);
free(zMaxs);
//PreCalcs
float voxelL = xRight[dnmoObjects].rightEdge - xLeft[0].rightEdge;
float voxelD = zRight[dnmoObjects].rightEdge - zLeft[0].rightEdge;
float voxelH = yRight[dnmoObjects].rightEdge - yLeft[0].rightEdge;
float voxelSA = 2.0f * voxelL * voxelD + 2.0f * voxelL * voxelH + 2.0f * voxelD * voxelH;
//Minimum cost preset to no split at all
float minCost = (float)nObjects;
float splitLoc;
int minLeftCounter = 0, minRightCounter = 0;
int axis = -1;
//---------------------------------------------------------------------------------------------
//Check costs of x-axis split planes keeping track of derivative using
//the fact that there is a minimum point on the graph costs vs split location
//Since there is one object per split plane
int splitIndex = 1;
float lastCost = nObjects * voxelL;
float tempCost;
float lastSplit = xLeft[1].rightEdge;
int leftCount = xLeft[1].objectBoundCounter, rightCount = nObjects - xRight[1].objectBoundCounter;
int lastLO = 0, lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (xLeft[splitIndex].rightEdge - xLeft[0].rightEdge) + rightCount * (xLeft[dnmoObjects].rightEdge - xLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = xLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += xLeft[splitIndex].objectBoundCounter;
rightCount -= xRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - xLeft[0].rightEdge) * voxelD + 2 * (lastSplit - xLeft[0].rightEdge) * voxelH + 2 * voxelD * voxelH)) + (lastRO * (2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (xLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelD * voxelH))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 0;
}
//---------------------------------------------------------------------------------------------
//Repeat for y axis
splitIndex = 1;
lastCost = nObjects * voxelH;
lastSplit = yLeft[1].rightEdge;
leftCount = yLeft[1].objectBoundCounter;
rightCount = nObjects - yRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (yLeft[splitIndex].rightEdge - yLeft[0].rightEdge) + rightCount * (yLeft[dnmoObjects].rightEdge - yLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = yLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += yLeft[splitIndex].objectBoundCounter;
rightCount -= yRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - yLeft[0].rightEdge) * voxelD + 2 * (lastSplit - yLeft[0].rightEdge) * voxelL + 2 * voxelD * voxelL)) + (lastRO * (2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelD + 2 * (yLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * voxelD * voxelL))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 1;
}
//---------------------------------------------------------------------------------------------
//Repeat for z axis
splitIndex = 1;
lastCost = nObjects * voxelD;
lastSplit = zLeft[1].rightEdge;
leftCount = zLeft[1].objectBoundCounter;
rightCount = nObjects - zRight[1].objectBoundCounter;
lastLO = 0;
lastRO = nObjects;
//Keep looping while cost is decreasing
while(splitIndex < dnObjects)
{
tempCost = leftCount * (zLeft[splitIndex].rightEdge - zLeft[0].rightEdge) + rightCount * (zLeft[dnmoObjects].rightEdge - zLeft[splitIndex].rightEdge);
if(tempCost < lastCost)
{
lastCost = tempCost;
lastSplit = zLeft[splitIndex].rightEdge;
lastLO = leftCount;
lastRO = rightCount;
}
//Update counters
splitIndex++;
leftCount += zLeft[splitIndex].objectBoundCounter;
rightCount -= zRight[splitIndex].objectBoundCounter;
}
//Calculate full SAH cost
lastCost = ((lastLO * (2 * (lastSplit - zLeft[0].rightEdge) * voxelL + 2 * (lastSplit - zLeft[0].rightEdge) * voxelH + 2 * voxelH * voxelL)) + (lastRO * (2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelL + 2 * (zLeft[dnmoObjects].rightEdge - lastSplit) * voxelH + 2 * voxelH * voxelL))) / voxelSA;
if(lastCost < minCost)
{
minCost = lastCost;
splitLoc = lastSplit;
minLeftCounter = lastLO;
minRightCounter = lastRO;
axis = 2;
}
//Free bin memory
free(xLeft);
free(xRight);
free(yLeft);
free(yRight);
free(zLeft);
free(zRight);
//---------------------------------------------------------------------------------------------
//Make sure a split is in our best interest
if(axis == -1)
{
//If not decrement the node counter
nodeIndex--;
BuildBranch(-1, objs, nObjects);
return;
}
//Allocate space for left and right lists
Mailbox** leftList = (Mailbox**)malloc(minLeftCounter * sizeof(void*));
Mailbox** rightList = (Mailbox**)malloc(minRightCounter * sizeof(void*));
//Sort objects into lists of those to the left and right of the split plane
int leftIndex = 0, rightIndex = 0;
leftCount = 0;
rightCount = 0;
switch(axis)
{
case 0:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->x0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->x1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
case 1:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->y0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->y1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
case 2:
for(int i = 0; i < nObjects; i++)
{
//Get object bounding box
objs[i]->prim->MakeBoundingBox(tempBox);
//Add to left and right lists when necessary
if(tempBox->z0 < splitLoc)
{
leftList[leftIndex++] = objs[i];
leftCount++;
}
if(tempBox->z1 > splitLoc)
{
rightList[rightIndex++] = objs[i];
rightCount++;
}
}
break;
};
//Delete the bounding box
delete tempBox;
//Delete old objects array
free(objs);
//Construct left and right branches
BuildBranch(height - 1, leftList, leftCount);
BuildBranch(height - 1, rightList, rightCount);
//Build this node
tree[thisNodeIndex] = KDTreeNode();
tree[thisNodeIndex].InitializeInterior(axis, splitLoc, nodeIndex - 1);
return;
}
EDIT:
Ok well I tried to replace the malloc/free with new/delete and that had no effect on the speed. I also found that it is only the free() on xLeft/xRight arrays that seem to affect the execution time significantly. I was able to eliminate the problem by moving the free() calls to after the recursive calls, although I do not know why this is making a difference because I don't see anywhere that these arrays are used after the original location for free(). As for why I am using malloc... some portions of this program use cache aligned memory, so I had been using _aligned_malloc. Although there probably is a way to get new to cache align, this is the only way I know to do it.
Is it possible that you are linking against a debug version of the runtime library that is doing something extra in free() like filling the memory with a garbage value? I have seen this behavior when you link against overly aggressive memory debugging libraries. The code that you have posted does not look strange. I would be interested to know what would happen if you replaced the arrays with std::vector or std::deque though. Vector should have behavior quite similar to the arrays and Deque may actually improve the speed a little if the arrays are large because the memory manager will not have to guarantee contiguous space.
If your program is doing all of the free()ing on exit, then you might as well just skip the calls. The entire process heap is freed when your app exits.
Edit: ----
Ok, now that the code is posted, it appears to me that you aren't just freeing on exit, so you should definitely try to figure out whether this is a weird symptom of a bug or just a costly implementation of free(). Instead of removing the free() calls, time how long it takes to execute them. Is the heap manager really using up the whole 19 seconds?
I do see several places where multiple allocations have the same scope and lifetime. You could turn these into a single malloc/free call, although that would make the code less clear and harder to maintain. So you have to ask yourself, how much does that 20 seconds matter?
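One way to time the deallocations in isolation is to bracket them with std::chrono::steady_clock. The sketch below measures a synthetic batch of blocks rather than the asker's arrays; timeFreesMicros is a hypothetical name, and in the real program the same two clock reads would go around the existing free() calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Allocates `count` blocks of `bytes` each, then returns the number of
// microseconds spent in the free() calls alone.
long long timeFreesMicros(std::size_t count, std::size_t bytes)
{
    assert(bytes > 0);
    std::vector<void*> blocks(count, nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        blocks[i] = std::malloc(bytes);
        assert(blocks[i] != nullptr);
    }

    const auto start = std::chrono::steady_clock::now();
    for (void* block : blocks) {
        std::free(block);
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Accumulating such measurements across the recursion would show whether the roughly 19 seconds are really spent inside free() or somewhere else.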
Probably just the behavior of the heap manager your CRT uses. It's probably updating free lists, or some other internal structure to manage memory.
You probably should reexamine how your program allocates and uses memory if your bottleneck is here.
Having had a look at the code one big thing that comes to my mind is this - mixture of malloc(...), new(...), delete(...), free(...)
BoundingBox* tempBox = new BoundingBox();
// ....
//Delete the bounding box
delete tempBox;
yet in other places you have
Bin* xLeft = (Bin*)malloc(dnObjects * sizeof(Bin));
// ....
free(xMins);
In short, you are mixing C++'s new and delete with C's malloc(...) and free(...). After all, this is C++, so a question for you here...
Why did you use malloc(...) and free(...), which come from C, in the middle of this C++ code? The repercussion I could see here is that the C++ runtime may manage memory differently from the C allocator, given its object-oriented model.
Having said this, your best bet is:
Replace all calls to malloc with new (new[] for arrays).
Replace all calls to free with delete (delete[] for arrays).
Rerun the program and see if that makes a difference. Can you confirm this?
Hope this helps,
Best regards,
Tom.
+1 to malloc/free making my eyes hurt in C++. Ignoring that for a second and looking at the code, three ideas:
Roll up your malloc calls to one large malloc and free (for the x/y/left/right/etc structures) instead of 12. Set the pointers into this large buffer as appropriate.
Still talking about the x/y/left/right variables: Employ a small stack-based buffer that you can use when the number of objects is small. When the number of objects is large, allocate dynamically. When it is not, just set your pointer to the local stack buffer. This can avoid dynamic memory management altogether for small inputs.
Right now, your "object" list is dynamically allocated, freed, and reallocated with each recursive call (!!). This is confusing because ownership isn't clear, but it's also a performance issue. Consider reworking the code so that only one list of "objects" is ever used.
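The first idea, one large allocation with pointers set into it, might look like the sketch below for the six float arrays. SplitBuffers is a hypothetical name, and std::vector is used here in place of a raw malloc so the single block is released automatically:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One allocation that backs six equally sized float arrays.
struct SplitBuffers {
    explicit SplitBuffers(std::size_t nObjects)
        : n(nObjects), storage(6 * nObjects) {}

    // Slice k (0..5) starts at offset k * n in the shared storage.
    float* slice(std::size_t k)
    {
        assert(k < 6);
        return storage.data() + k * n;
    }

    float* xMins() { return slice(0); }
    float* yMins() { return slice(1); }
    float* zMins() { return slice(2); }
    float* xMaxs() { return slice(3); }
    float* yMaxs() { return slice(4); }
    float* zMaxs() { return slice(5); }

    std::size_t n;               // elements per slice
    std::vector<float> storage;  // released in a single deallocation
};
```

This replaces six malloc/free pairs per call with one allocation and one deallocation; the Bin arrays could be grouped the same way.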
C++ stores some extra information when you allocate using new, such as the number of elements (in the case of an array). If you are using free, it could be a fragmentation problem where you are actually deleting only the chunks of data in between but not freeing the actual information stored by new. Just a thought.
When you corrupt the heap, it often becomes very slow. Try to run it in debug mode with debug version of your runtime as well.
It could be poor locality of reference for your code. For example, I see the following:
//Allocate memory for split options
float* xMins = (float*)malloc(nObjects * sizeof(float));
float* yMins = (float*)malloc(nObjects * sizeof(float));
float* zMins = (float*)malloc(nObjects * sizeof(float));
float* xMaxs = (float*)malloc(nObjects * sizeof(float));
float* yMaxs = (float*)malloc(nObjects * sizeof(float));
float* zMaxs = (float*)malloc(nObjects * sizeof(float));
...
free(xMins);
free(xMaxs);
free(yMins);
free(yMaxs);
free(zMins);
free(zMaxs);
Now, assuming that the allocations proceed basically linearly, then free(xMaxs); may need to dereference memory that was allocated some number of pages away from xMins (which was just dereferenced during free(xMins);), so you might need to swap in a page from the backing store in order to perform the free (which causes a huge slowdown in execution when that happens). Re-ordering the free()'s to match the allocation order could help... In this case, that'd mean
free(xMins);
free(yMins);
free(zMins);
free(xMaxs);
free(yMaxs);
free(zMaxs);
It sounds like you are running your program from a debugger in Windows, which by default causes a special debug heap to be used, which dramatically slows down memory deallocations. This applies even to non-debug builds, as long as they are launched from a debugger (such as Visual Studio). You should be able to disable this behavior by setting the environment variable _NO_DEBUG_HEAP=1 before running your program (I recommend setting it in the project configuration settings rather than in the system settings, if possible).
You didn't describe anything about your programming environment in the original question, however, so I had to make certain assumptions about it that might be wrong. If you're not running your program under Windows, for example, then my answer doesn't apply and I have no idea what the cause of your problem might be.