Backpropagation() to file - c++

I want to write the backpropagation process to a file in as much detail as possible.
I tried adding an ofstream member to the Matrix class so that each method could append its result to a file, but then most matrices in the SimpleNeuralNetwork class were flagged with errors (for example _weightMatrices).
class SimpleNeuralNetwork
std::vector<uint32_t> _topology;
std::vector<Matrix2D<float>> _weightMatrices;
std::vector<Matrix2D<float>> _valueMatrices;
std::vector<Matrix2D<float>> _biasMatrices;
float _learningRate;
// topology defines the no.of neurons for each layer
// learning rate defines how much modification should be done in each backward propagation, i.e. training
SimpleNeuralNetwork(std::vector<uint32_t> topology,float learningRate = 0.1f)
bool backPropagate(std::vector<float> targetOutput)
if(targetOutput.size() != _topology.back())
return false;
// determine the simple error
// error = target - output
Matrix2D<float> errors(targetOutput.size(), 1);
errors._vals = targetOutput;
errors = errors.add(_valueMatrices.back().negetive());
// back propagating the error from output layer to input layer
// and adjusting weights of weight matrices and bias matrices
for(int32_t i = _weightMatrices.size() - 1; i >= 0; i--)
//calculating errors for previous layer
Matrix2D<float> prevErrors = errors.multiply(_weightMatrices[i].transpose());
//calculating gradient i.e. delta weight (dw)
//dw = lr * error * d/dx(activated value)
Matrix2D<float> dOutputs = _valueMatrices[i + 1].applyFunction(DSigmoid);
Matrix2D<float> gradients = errors.multiplyElements(dOutputs);
gradients = gradients.multiplyScaler(_learningRate);
Matrix2D<float> weightGradients = _valueMatrices[i].transpose().multiply(gradients);
//adjusting bias and weight
_biasMatrices[i] = _biasMatrices[i].add(gradients);
_weightMatrices[i] = _weightMatrices[i].add(weightGradients);
errors = prevErrors;
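One likely cause of those errors: std::ofstream cannot be copied, so a class that stores one by value loses its implicit copy constructor and copy assignment, and statements such as `errors = prevErrors;` stop compiling. A minimal sketch of an alternative is to leave the matrix class untouched and pass the stream to a free logging function. The `Matrix` struct, its row-major layout, and the helper names below are hypothetical stand-ins, not the question's actual Matrix2D.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Matrix2D<float>; row-major storage is an assumption.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> vals;
};

// Write one labelled matrix to any output stream (a file, std::cout, ...).
// The stream is a parameter, so Matrix itself stays copyable.
void dumpMatrix(std::ostream& out, const std::string& label, const Matrix& m) {
    assert(m.vals.size() == m.rows * m.cols);
    out << label << " (" << m.rows << "x" << m.cols << ")\n";
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t c = 0; c < m.cols; ++c) {
            out << m.vals[r * m.cols + c] << (c + 1 < m.cols ? " " : "");
        }
        out << '\n';
    }
}

// Convenience wrapper that returns the same text as a string.
std::string matrixToString(const std::string& label, const Matrix& m) {
    std::ostringstream out;
    dumpMatrix(out, label, m);
    return out.str();
}
```

Inside backPropagate() one could then open a single `std::ofstream log("backprop.txt", std::ios::app);` (the file name is only an example) and call `dumpMatrix(log, "gradients", ...)` after each step, with an adapter for the real Matrix2D type.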


ScalableTSDFVolume Integrate from TUM-RGBD Dataset

I am using Open3D 0.15 and C++11 on Ubuntu 18.04.
The main function I'm interested in is the ScalableTSDFVolume Integrate() function, using the TUM RGBD dataset (the xyz set to be exact), based on the IntegrateRGBD example from the Open3D repo.
Since the TUM-RGBD dataset does not provide an association file that matches the RGBD images to the trajectory info, I've written a small piece of code that matches the timestamps of the TUM dataset's image data with the trajectory information and converts the 7-dimensional [x y z rx ry rz rw] trajectory entries into an Eigen::Matrix4d, using the same equation that Open3D's FileTUM.cpp uses:
// Read the timestamp first
gt >> p_gt.timestamp;
double poseArr[7];
// push the remaining 7 numbers to the poseArr
for (int i = 0; i < 7; i++)
gt >> poseArr[i];
// copy paste of the tum trajectory reader
Eigen::Matrix4d transform;
transform.topLeftCorner<3, 3>() =
Eigen::Quaterniond(poseArr[6], poseArr[3], poseArr[4], poseArr[5]).toRotationMatrix();
transform.topRightCorner<3, 1>() = Eigen::Vector3d(poseArr[0], poseArr[1], poseArr[2]);
p_gt.pose = transform.inverse();
} while (std::getline(gt, line));
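For comparison, here is a dependency-free sketch of the same conversion that fills every entry of the 4x4 matrix explicitly. It may be worth checking against the snippet above, because a default-constructed Eigen::Matrix4d is not zero- or identity-initialized, and the snippet as shown never writes the bottom row. The names `Mat4`, `poseToMatrix` and `invertRigid` are hypothetical, and the quaternion order (qx, qy, qz, qw) follows the [x y z rx ry rz rw] layout described above.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Build a 4x4 rigid transform from a pose given as (tx ty tz qx qy qz qw).
Mat4 poseToMatrix(const std::array<double, 7>& p) {
    double qx = p[3], qy = p[4], qz = p[5], qw = p[6];
    const double n = std::sqrt(qx * qx + qy * qy + qz * qz + qw * qw);
    assert(n > 0.0);
    qx /= n; qy /= n; qz /= n; qw /= n;  // normalise to a unit quaternion

    Mat4 T;
    T[0] = {{1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw), p[0]}};
    T[1] = {{2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw), p[1]}};
    T[2] = {{2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy), p[2]}};
    T[3] = {{0.0, 0.0, 0.0, 1.0}};  // bottom row set explicitly
    return T;
}

// Inverse of a rigid transform [R t; 0 1] is [R^T, -R^T t; 0 1].
Mat4 invertRigid(const Mat4& T) {
    Mat4 inv;
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t j = 0; j < 3; ++j) {
            inv[i][j] = T[j][i];
        }
    }
    for (std::size_t i = 0; i < 3; ++i) {
        inv[i][3] = -(inv[i][0] * T[0][3] + inv[i][1] * T[1][3] + inv[i][2] * T[2][3]);
    }
    inv[3] = {{0.0, 0.0, 0.0, 1.0}};
    return inv;
}
```

With Eigen, the equivalent safeguard would be calling `transform.setIdentity()` before filling the rotation and translation blocks.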
The code runs fine, but the issue is when I try to integrate multiple frames into the same volume and extract its pointcloud or mesh.
I can tell that the RGBD information is being fed into the program correctly, by extracting the mesh at the very first frame:
[Image: first frame mesh extraction]
But there is a significant artifact when I try to extract the mesh when more frames are integrated, like this:
[Image: 30 frames mesh extraction]
From my previous experience, this is probably because the transformation matrices are not expressed in the correct axes. If anyone has used the TUM dataset with Open3D and encountered the same problem, I would greatly appreciate any info on this.
For reference, this is the modified code I'm using for the reconstruction.
int main(int argc, char *argv[]) {
using namespace open3d;
std::string filebase("/home/geometry/Documents/rgbd_dataset_freiburg1_xyz");
VirtualSensor::CameraParameters kinect{ 525.0,525.0,319.5,239.5,5000};
VirtualSensor::CameraParameters camPar = kinect;
VirtualSensor v1(filebase,camPar);
bool save_pointcloud = true;
bool save_mesh = true;
bool save_voxel = false;
int every_k_frames = 50;
double length = 4.0;
double uLength = 6.0;
int resolution = 512;
double sdf_trunc_percentage = 0.01;
int verbose = 2;
auto camera_intrinsic = camera::PinholeCameraIntrinsic(640, 480, 525.0, 525.0, 319.5, 239.5);
int index = 0;
int save_index = 0;
int pairSize = 30;
// initialise TSDF
pipelines::integration::ScalableTSDFVolume volume(
length / (double)resolution, length * sdf_trunc_percentage,
//pipelines::integration::UniformTSDFVolume uVolume(uLength, resolution, uLength*sdf_trunc_percentage, pipelines::integration::TSDFVolumeColorType::RGB8);
utility::FPSTimer timer("Process RGBD stream",
geometry::Image depth, color;
// start loop
for(int i = 0; i < pairSize; i++){
utility::LogInfo("Processing frame {:d} ...", index);
io::ReadImage(v1.GetDepthPath(i), depth);
io::ReadImage(v1.GetColorPath(i), color);
auto rgbd = geometry::RGBDImage::CreateFromColorAndDepth(
color, depth, 5000.0, 6.0, false);
if (index == 0 ||
(every_k_frames > 0 && index % every_k_frames == 0))
camera_intrinsic, // intrinsic never changes
v1.GetCounterGT(i)); // get the groundtruth pose from my class
// saving mesh/pc logic
if (index == pairSize ||
(every_k_frames > 0 && index % every_k_frames == 0)) {
utility::LogInfo("Saving fragment {:d} ...", save_index);
std::string save_index_str = std::to_string(save_index);
if (save_pointcloud) {
utility::LogInfo("Saving pointcloud {:d} ...", save_index);
auto pcd = volume.ExtractPointCloud();
io::WritePointCloud("pointcloud_" + save_index_str + ".ply",
if (save_mesh) {
utility::LogInfo("Saving mesh {:d} ...", save_index);
auto mesh = volume.ExtractTriangleMesh();
io::WriteTriangleMesh("mesh_" + save_index_str + ".ply",
if (save_voxel) {
utility::LogInfo("Saving voxel {:d} ...", save_index);
auto voxel = volume.ExtractVoxelPointCloud();
io::WritePointCloud("voxel_" + save_index_str + ".ply",
return 0;

Applying a peak detection algorithm to real-time data

I have a function to detect peaks in real-time data. The algorithm is described in this thread and looks like this:
std::vector<int> smoothedZScore(std::vector<float> input)
//lag 5 for the smoothing functions
int lag = 5;
//3.5 standard deviations for signal
float threshold = 3.5;
//between 0 and 1, where 1 is normal influence, 0.5 is half
float influence = .5;
if (input.size() <= lag + 2)
std::vector<int> emptyVec;
return emptyVec;
//Initialise variables
std::vector<int> signal(input.size(), 0.0);
std::vector<float> filteredY(input.size(), 0.0);
std::vector<float> avgFilter(input.size(), 0.0);
std::vector<float> stdFilter(input.size(), 0.0);
std::vector<float> subVecStart(input.begin(), input.begin() + lag);
double sum = std::accumulate(std::begin(subVecStart), std::end(subVecStart), 0.0);
double mean = sum / subVecStart.size();
double accum = 0.0;
std::for_each (std::begin(subVecStart), std::end(subVecStart), [&](const double d) {
accum += (d - mean) * (d - mean);
double stdev = sqrt(accum / (subVecStart.size()-1));
//avgFilter[lag] = mean(subVecStart);
avgFilter[lag] = mean;
//stdFilter[lag] = stdDev(subVecStart);
stdFilter[lag] = stdev;
for (size_t i = lag + 1; i < input.size(); i++)
if (std::abs(input[i] - avgFilter[i - 1]) > threshold * stdFilter[i - 1])
if (input[i] > avgFilter[i - 1])
signal[i] = 1; //# Positive signal
signal[i] = -1; //# Negative signal
//Make influence lower
filteredY[i] = influence* input[i] + (1 - influence) * filteredY[i - 1];
signal[i] = 0; //# No signal
filteredY[i] = input[i];
//Adjust the filters
std::vector<float> subVec(filteredY.begin() + i - lag, filteredY.begin() + i);
// avgFilter[i] = mean(subVec);
// stdFilter[i] = stdDev(subVec);
return signal;
In my code, I'm reading real-time 3-axis accelerometer values from an IMU sensor and displaying them as a graph. I need to detect the peaks of the signal using the above algorithm. I added the function to my code.
Let's say the real-time values are the following:
double x = sample->acceleration_g[0];
double y = sample->acceleration_g[1];
double z = sample->acceleration_g[2];
How do I pass these values to the above function and detect the peak?
I tried calling this:
but it gives me an error:
settings.cpp:230:40: error: no matching function for call to 'smoothedZScore'
settings.cpp:92:18: note: candidate function not viable: no known conversion from 'double' to 'std::vector<float>' for 1st argument
The algorithm needs a minimum of 7 samples as input, so I guess I need to store my real-time data in a buffer.
But I have difficulty understanding how to store samples in a buffer and apply the peak detection algorithm to them.
Can you show me a possible solution?
You will need to rewrite the algorithm. Your problem isn't just a real-time problem; you also need a causal solution. The function you have is not causal.
Practically speaking, you will need a class, and that class will need to calculate the standard deviation incrementally.
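A minimal sketch of what such a class could look like is shown below. It is not a line-for-line port of smoothedZScore: the class name `StreamingPeakDetector` and its interface are hypothetical, and it keeps the last `lag` filtered samples in a deque together with a running sum and sum of squares, so that the mean and sample standard deviation are updated per sample instead of being recomputed over the whole input.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>

// Hypothetical streaming variant of the smoothed z-score detector.
class StreamingPeakDetector {
public:
    StreamingPeakDetector(std::size_t lag, double threshold, double influence)
        : lag_(lag), threshold_(threshold), influence_(influence), sum_(0.0), sumSq_(0.0) {
        assert(lag_ >= 2);  // a sample standard deviation needs at least two values
    }

    // Feed one sample. Returns +1 (positive peak), -1 (negative peak) or 0.
    int update(double x) {
        if (window_.size() < lag_) {  // still filling the initial window
            push(x);
            return 0;
        }
        const double n = static_cast<double>(window_.size());
        const double mean = sum_ / n;
        // Clamp at zero to guard against small negative values from rounding.
        const double var = std::max(0.0, (sumSq_ - sum_ * sum_ / n) / (n - 1.0));
        const double stdev = std::sqrt(var);

        int signal = 0;
        double filtered = x;
        if (std::abs(x - mean) > threshold_ * stdev) {
            signal = (x > mean) ? 1 : -1;
            // Damp the influence of the outlier on the running statistics.
            filtered = influence_ * x + (1.0 - influence_) * window_.back();
        }
        push(filtered);
        return signal;
    }

private:
    // Append a value and drop the oldest one once the window is full.
    void push(double v) {
        window_.push_back(v);
        sum_ += v;
        sumSq_ += v * v;
        if (window_.size() > lag_) {
            const double old = window_.front();
            window_.pop_front();
            sum_ -= old;
            sumSq_ -= old * old;
        }
    }

    std::size_t lag_;
    double threshold_;
    double influence_;
    double sum_;
    double sumSq_;
    std::deque<double> window_;
};
```

For the accelerometer case, one detector per axis could be kept alive between callbacks, for example `detX.update(sample->acceleration_g[0])`, so no std::vector has to be passed on each call.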

Cross entropy applied to backpropagation in neural network

I watched this awesome video by Dave Miller on making a neural network from scratch in C++ here:
Here is the full source code referenced in the video:
It uses mean squared error as the cost function. I'm interested in using a neural network for binary classification though and so would like to use cross-entropy as the cost function. I was hoping to add this to this code if possible, since I've already been playing around with it.
How would that be applied specifically here?
Would the only difference be in how the error is calculated for the output layer, or do the equations change all the way through backpropagation?
Does anything change at all? Is MSE versus cross-entropy solely used to get an idea of the overall error and not independently relevant to backpropagation?
Edit for clarity:
Here are the relevant functions.
//output layer - seems like error is just target value minus calculated value
void Neuron::calcOutputGradients(double targetVal)
double delta = targetVal - m_outputVal;
m_gradient = delta * Neuron::transferFunctionDerivative(m_outputVal);
double Neuron::sumDOW(const Layer &nextLayer) const
double sum = 0.0;
// Sum our contributions of the errors at the nodes we feed.
for (unsigned n = 0; n < nextLayer.size() - 1; ++n) {
sum += m_outputWeights[n].weight * nextLayer[n].m_gradient;
return sum;
void Neuron::calcHiddenGradients(const Layer &nextLayer)
double dow = sumDOW(nextLayer);
m_gradient = dow * Neuron::transferFunctionDerivative(m_outputVal);
void Neuron::updateInputWeights(Layer &prevLayer)
// The weights to be updated are in the Connection container in the neurons in the preceding layer
for (unsigned n = 0; n < prevLayer.size(); ++n) {
Neuron &neuron = prevLayer[n];
double oldDeltaWeight = neuron.m_outputWeights[m_myIndex].deltaWeight;
//calculate new weight for neuron with momentum
double newDeltaWeight = eta * neuron.getOutputVal() * m_gradient + alpha * oldDeltaWeight;
neuron.m_outputWeights[m_myIndex].deltaWeight = newDeltaWeight;
neuron.m_outputWeights[m_myIndex].weight += newDeltaWeight;
Finally found the answer here:
You only have to change how the error at the output layer is calculated.
The relevant function to be changed is:
void Neuron::calcOutputGradients(double targetVal)
For mean square errors use:
double delta = targetVal - m_outputVal;
m_gradient = delta * Neuron::transferFunctionDerivative(m_outputVal);
For cross entropy just use:
m_gradient = targetVal - m_outputVal;
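Why only the output layer changes can be seen from the chain rule. The following is a sketch under an assumption that the listing above does not confirm: the output neuron uses a logistic sigmoid, y = σ(z), with a single target t in {0, 1}. The sign convention follows the code, where m_gradient is the negative derivative of the loss L with respect to the neuron's input z.

```latex
\begin{aligned}
\text{MSE:}\quad & L = \tfrac{1}{2}(t - y)^2,
  & -\frac{\partial L}{\partial z} &= (t - y)\,\sigma'(z) = (t - y)\,y(1 - y) \\
\text{Cross-entropy:}\quad & L = -\bigl[t \ln y + (1 - t)\ln(1 - y)\bigr],
  & -\frac{\partial L}{\partial z} &= \frac{t - y}{y(1 - y)} \cdot y(1 - y) = t - y
\end{aligned}
```

The factor σ'(z) cancels only for this pairing of loss and activation. With a different output activation (for example tanh), the simplification to targetVal - m_outputVal does not follow from this derivation. The hidden-layer functions stay the same because they only consume m_gradient from the layer after them.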

How to replace an instance with another instance via pointer?

I'm doing online destructive clustering (clusters replace clustered objects) on a list of class instances (std::list).
My list of current percepUnits is: std::list<percepUnit> units; and for each iteration I get a new list of input percepUnits std::list<percepUnit> scratch; that need to be clustered with the units.
I want to maintain a fixed number of percepUnits (so units.size() is constant), so for each new scratch percepUnit I need to merge it with the nearest percepUnit in units. Following is a code snippet that builds a list (dists) of structures (percepUnitDist) that contain pointers to each pair of items in scratch and units percepDist.scratchUnit = &(*scratchUnit); and percepDist.unit = &(*unit); and their distance. Additionally, for each item in scratch I keep track of which item in units has the least distance minDists.
// For every scratch percepUnit:
for (scratchUnit = scratch.begin(); scratchUnit != scratch.end(); scratchUnit++) {
float minDist=2025.1172; // This is the max possible distance in unnormalized CIELuv, and much larger than the normalized dist.
// For every percepUnit:
for (unit = units.begin(); unit != units.end(); unit++) {
// compare pairs
float dist = featureDist(*scratchUnit, *unit, FGBG);
//cout << "distance: " << dist << endl;
// Put pairs in a structure that caches their distances
percepUnitDist percepDist;
percepDist.scratchUnit = &(*scratchUnit); // address of where scratchUnit points to.
percepDist.unit = &(*unit);
percepDist.dist = dist;
// Figure out the percepUnit that is closest to this scratchUnit.
if (dist < minDist)
minDist = dist;
dists.push_back(percepDist); // append dist struct
minDists.push_back(minDist); // append the min distance to the nearest percepUnit for this particular scratchUnit.
So now I just need to loop through the percepUnitDist items in dists and match the distances with the minimum distances to figure out which percepUnit in scratch should be merged with which percepUnit in units. The merging process mergePerceps() creates a new percepUnit which is a weighted average of the "parent" percepUnits in scratch and units.
I want to replace the instance in the units list with the new percepUnit constructed by mergePerceps(), but I would like to do so in the context of looping through the percepUnitDists. This is my current code:
// Loop through dists and merge all the closest pairs.
// Loop through all dists
for (distIter = dists.begin(); distIter != dists.end(); distIter++) {
// Loop through all minDists for each scratchUnit.
for (minDistsIter = minDists.begin(); minDistsIter != minDists.end(); minDistsIter++) {
// if this is the closest cluster, and the closest cluster has not already been merged, and the scratch has not already been merged.
if (*minDistsIter == distIter->dist and not distIter->scratchUnit->remove) {
percepUnit newUnit;
mergePerceps(*(distIter->scratchUnit), *(distIter->unit), newUnit, FGBG);
*(distIter->unit) = newUnit; // replace the cluster with the new merged version.
distIter->scratchUnit->remove = true;
I thought that I could replace the instance in units via the percepUnitDist pointer with the new percepUnit instance using *(distIter->unit) = newUnit;, but that does not seem to be working as I'm seeing a memory leak, implying the instances in the units are not getting replaced.
How do I delete the percepUnit in the units list and replace it with a new percepUnit instance such that the new unit is located in the same location?
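As a point of reference, assigning through a pointer to a std::list element does overwrite that element in place, and the node keeps its address. The sketch below shows this with a hypothetical `Unit` struct whose std::vector member stands in for the cv::Mat members. One difference matters for the real class: std::vector assignment deep-copies, whereas plain cv::Mat assignment shares the pixel buffer through reference counting and only clone() allocates a new one.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical stand-in for percepUnit: one owning member plus plain data.
struct Unit {
    std::vector<unsigned char> image;  // stands in for the cv::Mat members
    int numMerges;
};

// Overwrite the element that `target` points to with `merged`.
// The list node itself is untouched, so pointers and iterators stay valid.
void replaceInPlace(Unit* target, const Unit& merged) {
    assert(target != nullptr);
    *target = merged;  // copy-assignment replaces the old contents of `image`
}
```

So no explicit delete is needed for the list element itself; whatever the old element owned is released by its members' assignment operators.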
Here is the percepUnit class. Note the cv::Mat members. Following is the mergePerceps() function and the mergeImages() function on which it depends:
// Function to construct an accumulation.
void clustering::mergeImages(Mat &scratch, Mat &unit, cv::Mat &merged, const string maskOrImage, const string FGBG, const float scratchWeight, const float unitWeight) {
int width, height, type=CV_8UC3;
Mat scratchImagePad, unitImagePad, scratchImage, unitImage;
// use the resolution and aspect of the largest of the pair.
if (unit.cols > scratch.cols)
width = unit.cols;
width = scratch.cols;
if (unit.rows > scratch.rows)
height = unit.rows;
height = scratch.rows;
if (maskOrImage == "mask")
type = CV_8UC1; // single channel mask
else if (maskOrImage == "image")
type = CV_8UC3; // three channel image
cout << "maskOrImage is not 'mask' or 'image'\n";
merged = Mat(height, width, type, Scalar::all(0));
scratchImagePad = Mat(height, width, type, Scalar::all(0));
unitImagePad = Mat(height, width, type, Scalar::all(0));
// weight images before summation.
// because these pass by reference, they mess up the images in memory!
scratch *= scratchWeight;
unit *= unitWeight;
// copy images into padded images.
merged = scratchImagePad+unitImagePad;
// Merge two perceps and return a new percept to replace them.
void clustering::mergePerceps(percepUnit scratch, percepUnit unit, percepUnit &mergedUnit, const string FGBG) {
Mat accumulation;
Mat accumulationMask;
Mat meanColour;
int x, y, w, h, area;
float l,u,v;
int numMerges=0;
std::vector<float> featuresVar; // Normalized, Sum, Variance.
//float featuresVarMin, featuresVarMax; // min and max variance across all features.
float scratchWeight, unitWeight;
if (FGBG == "FG") {
// foreground percepts don't get merged as much.
scratchWeight = 0.65;
unitWeight = 1-scratchWeight;
} else {
scratchWeight = 0.85;
unitWeight = 1-scratchWeight;
// Images TODO remove the meanColour if needbe.
mergeImages(scratch.image, unit.image, accumulation, "image", FGBG, scratchWeight, unitWeight);
mergeImages(scratch.mask, unit.mask, accumulationMask, "mask", FGBG, scratchWeight, unitWeight);
mergeImages(scratch.meanColour, unit.meanColour, meanColour, "image", "FG", scratchWeight, unitWeight); // merge images
// Position and size.
x = (scratch.x1*scratchWeight) + (unit.x1*unitWeight);
y = (scratch.y1*scratchWeight) + (unit.y1*unitWeight);
w = (scratch.w*scratchWeight) + (unit.w*unitWeight);
h = (scratch.h*scratchWeight) + (unit.h*unitWeight);
// area
area = (scratch.area*scratchWeight) + (unit.area*unitWeight);
// colour
l = (scratch.l*scratchWeight) + (unit.l*unitWeight);
u = (scratch.u*scratchWeight) + (unit.u*unitWeight);
v = (scratch.v*scratchWeight) + (unit.v*unitWeight);
// Number of merges
if (scratch.numMerges < 1 and unit.numMerges < 1) { // both units are patches
numMerges = 1;
} else if (scratch.numMerges < 1 and unit.numMerges >= 1) { // unit A is a patch, B a percept
numMerges = unit.numMerges + 1;
} else if (scratch.numMerges >= 1 and unit.numMerges < 1) { // unit A is a percept, B a patch.
numMerges = scratch.numMerges + 1;
cout << "merged scratch??" <<endl;
// TODO this may be an impossible case.
} else { // both units are percepts
numMerges = scratch.numMerges + unit.numMerges;
cout << "Merging two already merged Percepts" <<endl;
// TODO this may be an impossible case.
// Create unit.
mergedUnit = percepUnit(accumulation, accumulationMask, x, y, w, h, area); // time is the earliest value in times?
mergedUnit.l = l; // members not in the constructor.
mergedUnit.u = u;
mergedUnit.v = v;
mergedUnit.numMerges = numMerges;
mergedUnit.meanColour = meanColour;
mergedUnit.pActivated = unit.pActivated; // new clusters retain parent's history of activation.
mergedUnit.scratch = false;
mergedUnit.habituation = unit.habituation; // we inherit the habituation of the cluster we merged with.
Changing the copy and assignment operators had performance side effects and did not seem to resolve the problem. So I've added a custom function to do the replacement, which, just like the copy operator, makes copies of each member and makes sure those copies are deep. The problem is that I still end up with a leak.
So I've changed this line: *(distIter->unit) = newUnit;
to this: (*(distIter->unit)).clone(newUnit)
Where the clone method is as follows:
// Deep Copy of members
void percepUnit::clone(const percepUnit &source) {
// Deep copy of Mats
this->image = source.image.clone();
this->mask = source.mask.clone();
this->alphaImage = source.alphaImage.clone();
this->meanColour = source.meanColour.clone();
// shallow copies of everything else
this->alpha = source.alpha;
this->fadingIn = source.fadingIn;
this->fadingHold = source.fadingHold;
this->fadingOut = source.fadingOut;
this->l = source.l;
this->u = source.u;
this->v = source.v;
this->x1 = source.x1;
this->y1 = source.y1;
this->w = source.w;
this->h = source.h;
this->x2 = source.x2;
this->y2 = source.y2;
this->cx = source.cx;
this->cy = source.cy;
this->numMerges = source.numMerges;
this->id = source.id;
this->area = source.area;
this->features = source.features;
this->featuresNorm = source.featuresNorm;
this->remove = source.remove;
this->fgKnockout = source.fgKnockout;
this->colourCalculated = source.colourCalculated;
this->normalized = source.normalized;
this->activation = source.activation;
this->activated = source.activated;
this->pActivated = source.pActivated;
this->habituation = source.habituation;
this->scratch = source.scratch;
this->FGBG = source.FGBG;
And yet, I still see a memory increase. The increase does not happen if I comment out that single replacement line. So I'm still stuck.
I can prevent memory from increasing if I disable the cv::Mat cloning code in the function above:
// Deep Copy of members
void percepUnit::clone(const percepUnit &source) {
/* try releasing Mats first?
// No effect on memory increase, but the refCount is decremented.
/* Deep copy of Mats
this->image = source.image.clone();
this->mask = source.mask.clone();
this->alphaImage = source.alphaImage.clone();
this->meanColour = source.meanColour.clone();*/
// shallow copies of everything else
this->alpha = source.alpha;
this->fadingIn = source.fadingIn;
this->fadingHold = source.fadingHold;
this->fadingOut = source.fadingOut;
this->l = source.l;
this->u = source.u;
this->v = source.v;
this->x1 = source.x1;
this->y1 = source.y1;
this->w = source.w;
this->h = source.h;
this->x2 = source.x2;
this->y2 = source.y2;
this->cx = source.cx;
this->cy = source.cy;
this->numMerges = source.numMerges;
this->id = source.id;
this->area = source.area;
this->features = source.features;
this->featuresNorm = source.featuresNorm;
this->remove = source.remove;
this->fgKnockout = source.fgKnockout;
this->colourCalculated = source.colourCalculated;
this->normalized = source.normalized;
this->activation = source.activation;
this->activated = source.activated;
this->pActivated = source.pActivated;
this->habituation = source.habituation;
this->scratch = source.scratch;
this->FGBG = source.FGBG;
While I still can't explain this issue, I did notice another hint. I realized that this leak can also be stopped if I don't normalize those features I use to cluster via featureDist() (but continue to clone cv::Mats). The really odd thing is that I rewrote that code entirely and still the problem persists.
Here is the featureDist function:
float clustering::featureDist(percepUnit unitA, percepUnit unitB, const string FGBG) {
float distance=0;
if (FGBG == "BG") {
for (unsigned int i=0; i<unitA.featuresNorm.rows; i++) {
distance += pow(abs(unitA.featuresNorm.at<float>(i) - unitB.featuresNorm.at<float>(i)),0.5);
//cout << "unitA.featuresNorm[" << i << "]: " << unitA.featuresNorm[i] << endl;
//cout << "unitB.featuresNorm[" << i << "]: " << unitB.featuresNorm[i] << endl;
// for FG, don't use normalized colour features.
// TODO To include the area use i=4
} else if (FGBG == "FG") {
for (unsigned int i=4; i<unitA.features.rows; i++) {
distance += pow(abs(unitA.features.at<float>(i) - unitB.features.at<float>(i)),0.5);
} else {
cout << "FGBG argument was not FG or BG, returning 0." <<endl;
return 0;
return pow(distance,2);
Features used to be a vector of floats, and thus the normalization code was as follows:
void clustering::normalize(list<percepUnit> &scratch, list<percepUnit> &units) {
list<percepUnit>::iterator unit;
list<percepUnit*>::iterator unitPtr;
vector<float> min,max;
list<percepUnit*> masterList; // list of pointers.
// generate pointers
for (unit = scratch.begin(); unit != scratch.end(); unit++)
masterList.push_back(&(*unit)); // add pointer to where unit points to.
for (unit = units.begin(); unit != units.end(); unit++)
masterList.push_back(&(*unit)); // add pointer to where unit points to.
int numFeatures = masterList.front()->features.size(); // all percepts have the same number of features.
min.resize(numFeatures); // allocate for the number of features we have.
// Loop through all units to get feature values
for (int i=0; i<numFeatures; i++) {
min[i] = masterList.front()->features[i]; // starting point.
max[i] = min[i];
// calculate min and max for each feature.
for (unitPtr = masterList.begin(); unitPtr != masterList.end(); unitPtr++) {
if ((*unitPtr)->features[i] < min[i])
min[i] = (*unitPtr)->features[i];
if ((*unitPtr)->features[i] > max[i])
max[i] = (*unitPtr)->features[i];
// Normalize features according to min/max.
for (int i=0; i<numFeatures; i++) {
for (unitPtr = masterList.begin(); unitPtr != masterList.end(); unitPtr++) {
(*unitPtr)->featuresNorm[i] = ((*unitPtr)->features[i]-min[i]) / (max[i]-min[i]);
(*unitPtr)->normalized = true;
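The min/max scaling that this listing performs can be sketched without the pointer list as follows. Two details of the listing may be worth checking against this sketch: only `min` is resized before `max[i]` is written, and a feature whose values are all equal makes `max[i]-min[i]` zero. The function name `minMaxNormalize` and the rows-of-features layout are hypothetical, and mapping a constant feature to 0 is an assumption, not something the original code specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Min-max normalise each feature (column) across all units (rows) to [0, 1].
std::vector<std::vector<float>> minMaxNormalize(
        const std::vector<std::vector<float>>& features) {
    std::vector<std::vector<float>> out(features);
    if (features.empty()) {
        return out;
    }
    const std::size_t numFeatures = features.front().size();
    for (std::size_t i = 0; i < numFeatures; ++i) {
        // Find the range of feature i over all units.
        float lo = features.front()[i];
        float hi = lo;
        for (const auto& row : features) {
            assert(row.size() == numFeatures);
            lo = std::min(lo, row[i]);
            hi = std::max(hi, row[i]);
        }
        const float range = hi - lo;
        for (std::size_t u = 0; u < features.size(); ++u) {
            // Assumption: a constant feature is mapped to 0 to avoid dividing by zero.
            out[u][i] = (range > 0.0f) ? (features[u][i] - lo) / range : 0.0f;
        }
    }
    return out;
}
```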
I changed the features type to a cv::Mat so I could use the opencv normalization function, so I rewrote the normalization function as follows:
void clustering::normalize(list<percepUnit> &scratch, list<percepUnit> &units) {
Mat featureMat = Mat(1,units.size()+scratch.size(), CV_32FC1, Scalar(0));
list<percepUnit>::iterator unit;
// For each feature
for (int i=0; i< units.begin()->features.rows; i++) {
// for each unit in units
int j=0;
float value;
for (unit = units.begin(); unit != units.end(); unit++) {
// Populate featureMat j is the unit index, i is the feature index.
value = unit->features.at<float>(i);
featureMat.at<float>(j) = value;
// for each unit in scratch
for (unit = scratch.begin(); unit != scratch.end(); unit++) {
// Populate featureMat j is the unit index, i is the feature index.
value = unit->features.at<float>(i);
featureMat.at<float>(j) = value;
// Normalize this featureMat in place
cv::normalize(featureMat, featureMat, 0, 1, NORM_MINMAX);
// set normalized values in percepUnits from featureMat
// for each unit in units
for (unit = units.begin(); unit != units.end(); unit++) {
// Populate percepUnit featuresNorm, j is the unit index, i is the feature index.
value = featureMat.at<float>(j);
unit->featuresNorm.at<float>(i) = value;
// for each unit in scratch
for (unit = scratch.begin(); unit != scratch.end(); unit++) {
// Populate percepUnit featuresNorm, j is the unit index, i is the feature index.
value = featureMat.at<float>(j);
unit->featuresNorm.at<float>(i) = value;
I can't understand what the interaction is between mergePerceps and normalization, especially since normalization is an entirely rewritten function.
Massif and my /proc memory reporting don't agree. Massif says there is no effect of normalization on memory usage, only commenting out the percepUnit::clone() operation bypasses the leak.
Here is all the code, in case the interaction is somewhere else I am missing.
Here is another version of the same code with the dependence on OpenCV GPU removed, to facilitate testing...
It was recommended by Nghia (on the opencv forum) that I try and make the percepts a constant size. Sure enough, if I fix the dimensions and type of the cv::Mat members of percepUnit, then the leak disappears.
So it seems to me this is a bug in OpenCV that affects calling clone() and copyTo() on Mats of different sizes that are class members. So far I have been unable to reproduce it in a simple program. The leak does seem small enough that it may be the headers leaking, rather than the underlying image data.

Using OpenCV Cascade - Working solely with haartraining XML file

I'm trying to implement the Viola-Jones face detection algorithm on the CUDA platform (I'm aware that OpenCV has already done that; I'm doing it for school).
My first phase is to implement the algorithm on the CPU.
I'm using the OpenCV library. I know OpenCV can do face detection, but in order to understand it, I would like to go back to basics and do it my own way.
I created the integral sum representation and the squared-sum integral representation using OpenCV functions.
I iterated through the cascade: through the stages, classifiers and rects. I normalized each window, calculated the sum of each classifier and compared it to the threshold. Sadly, it seems like I'm missing something, because I can't detect faces.
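For reference, the four-corner lookup on an integral image can be sketched as below, using a table with one extra row and column of zeros (the same (rows + 1) x (cols + 1) shape that cv::integral produces). The struct and function names are hypothetical. Note the index order: the table is addressed as (row, column), so a rectangle's height is added to the row index and its width to the column index, which is worth comparing with the cvGetReal2D calls in the listing further down.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summed-area table with an extra leading row and column of zeros.
struct IntegralImage {
    std::size_t rows;              // rows of the source image
    std::size_t cols;              // columns of the source image
    std::vector<long long> table;  // (rows + 1) x (cols + 1), row-major

    long long at(std::size_t r, std::size_t c) const {
        return table[r * (cols + 1) + c];
    }

    // Sum of the pixels in the rectangle with top-left corner (x, y),
    // where x is the column and y is the row: R4 - R2 - R3 + R1.
    long long rectSum(std::size_t x, std::size_t y, std::size_t w, std::size_t h) const {
        assert(x + w <= cols && y + h <= rows);
        return at(y + h, x + w) - at(y, x + w) - at(y + h, x) + at(y, x);
    }
};

IntegralImage buildIntegral(const std::vector<unsigned char>& pixels,
                            std::size_t rows, std::size_t cols) {
    assert(pixels.size() == rows * cols);
    IntegralImage ii{rows, cols, std::vector<long long>((rows + 1) * (cols + 1), 0)};
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Each entry is the pixel plus the sums above and to the left,
            // minus the overlap that was counted twice.
            ii.table[(r + 1) * (cols + 1) + (c + 1)] =
                pixels[r * cols + c] + ii.at(r, c + 1) + ii.at(r + 1, c) - ii.at(r, c);
        }
    }
    return ii;
}
```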
It seems like I need a better understanding of the cascade XML file.
Here is an example:
<!-- tree 158 -->
<!-- root node -->
<_>3 6 2 2 -1.</_>
<_>3 6 1 1 2.</_>
<_>4 7 1 1 2.</_></rects>
<!-- tree 159 -->
<!-- tree 159 -->
<!-- root node -->
<_>16 6 3 2 -1.</_>
<_>16 7 3 1 2.</_></rects>
I'd like to understand the meaning of left_val and right_val. What do the parent and next values mean? How is each classifier's normalized sum calculated? Is there anything I'm doing wrong here?
See my code attached.
int RunHaarClassifierCascadeSum(CascadeClassifier * face_cascade, CvMat* image , CvMat* sum , CvMat* sqsum,
CvMat* tilted,CvSize *scaningWindowSize, int iteratorRow, int iteratorCol )
// Normalize the current scanning window - Detection window
// Variance(x) = E(x^2) - (E(x))^2 = detectionWindowSquereExpectancy - detectionWindowExpectancy^2
// Expectancy(x) = E(x) = sum_of_pixels / size_of_window
double detectionWindowTotalSize = scaningWindowSize->height * scaningWindowSize->width;
// calculate the detection Window Expectancy , e.g the E(x)
double sumDetectionWindowPoint1,sumDetectionWindowPoint2,sumDetectionWindowPoint3,sumDetectionWindowPoint4; // ______________________
sumDetectionWindowPoint1 = cvGetReal2D(sum,iteratorRow,iteratorCol); // |R1 R2|
sumDetectionWindowPoint2 = cvGetReal2D(sum,iteratorRow+scaningWindowSize->width,iteratorCol); // | | Sum = R4-R2-R3+R1
sumDetectionWindowPoint3 = cvGetReal2D(sum,iteratorRow,iteratorCol+scaningWindowSize->height); // |R3________________R4|
sumDetectionWindowPoint4 = cvGetReal2D(sum,iteratorRow+scaningWindowSize->width,iteratorCol+scaningWindowSize->height);
double detectionWindowSum = calculateSum(sumDetectionWindowPoint1,sumDetectionWindowPoint2,sumDetectionWindowPoint3,sumDetectionWindowPoint4);
const double detectionWindowExpectancy = detectionWindowSum / detectionWindowTotalSize; // E(x)
// calculate the Square detection Window Expectancy , e.g the E(x^2)
double squareSumDetectionWindowPoint1,squareSumDetectionWindowPoint2,squareSumDetectionWindowPoint3,squareSumDetectionWindowPoint4; // ______________________
squareSumDetectionWindowPoint1 = cvGetReal2D(sqsum,iteratorRow,iteratorCol); // |R1 R2|
squareSumDetectionWindowPoint2 = cvGetReal2D(sqsum,iteratorRow+scaningWindowSize->width,iteratorCol); // | | Sum = R4-R2-R3+R1
squareSumDetectionWindowPoint3 = cvGetReal2D(sqsum,iteratorRow,iteratorCol+scaningWindowSize->height); // |R3________________R4|
squareSumDetectionWindowPoint4 = cvGetReal2D(sqsum,iteratorRow+scaningWindowSize->width,iteratorCol+scaningWindowSize->height);
double detectionWindowSquareSum = calculateSum(squareSumDetectionWindowPoint1,squareSumDetectionWindowPoint2,squareSumDetectionWindowPoint3,squareSumDetectionWindowPoint4);
const double detectionWindowSquareExpectancy = detectionWindowSquareSum / detectionWindowTotalSize; // E(x^2)
const double detectionWindowVariance = detectionWindowSquareExpectancy - std::pow(detectionWindowExpectancy,2); // Variance(x) = E(x^2) - (E(x))^2
const double detectionWindowStandardDeviation = std::sqrt(detectionWindowVariance);
if (detectionWindowVariance<=0)
return -1 ; // Error
// Normalize the cascade window to the normal scale window
double normalizeScaleWidth = double(scaningWindowSize->width / face_cascade->oldCascade->orig_window_size.width);
double normalizeScaleHeight = double(scaningWindowSize->height / face_cascade->oldCascade->orig_window_size.height);
// Calculate the cascade for each one of the windows
for( int stageIterator=0; stageIterator< face_cascade->oldCascade->count; stageIterator++ ) // Stage iterator
CvHaarStageClassifier* pCvHaarStageClassifier = face_cascade->oldCascade->stage_classifier + stageIterator;
for (int CvHaarStageClassifierIterator=0;CvHaarStageClassifierIterator<pCvHaarStageClassifier->count;CvHaarStageClassifierIterator++) // Classifier iterator
CvHaarClassifier* classifier = pCvHaarStageClassifier->classifier + CvHaarStageClassifierIterator;
float classifierSum=0.;
for( int CvHaarClassifierIterator = 0; CvHaarClassifierIterator < classifier->count;CvHaarClassifierIterator++ ) // Feature iterator
CvHaarFeature * pCvHaarFeature = classifier->haar_feature;
// Remark
if (pCvHaarFeature->tilted==1)
// Remark
for( int CvHaarFeatureIterator = 0; CvHaarFeatureIterator< CV_HAAR_FEATURE_MAX; CvHaarFeatureIterator++ ) // 3 Features iterator
CvRect * currentRect = &(pCvHaarFeature->rect[CvHaarFeatureIterator].r);
// Normalize the rect to the scaling window scale
CvRect normalizeRec;
normalizeRec.x = (int)(currentRect->x*normalizeScaleWidth);
normalizeRec.y = (int)(currentRect->y*normalizeScaleHeight);
normalizeRec.width = (int)(currentRect->width*normalizeScaleWidth);
normalizeRec.height = (int)(currentRect->height*normalizeScaleHeight);
double sumRectPoint1,sumRectPoint2,sumRectPoint3,sumRectPoint4; // ______________________
sumRectPoint1 = cvGetReal2D(sum,normalizeRec.x,normalizeRec.y); // |R1 R2|
sumRectPoint2 = cvGetReal2D(sum,normalizeRec.x+normalizeRec.width,normalizeRec.y); // | | Sum = R4-R2-R3+R1
sumRectPoint3 = cvGetReal2D(sum,normalizeRec.x,normalizeRec.y+normalizeRec.height); // |R3________________R4|
sumRectPoint4 = cvGetReal2D(sum,normalizeRec.x+normalizeRec.width,normalizeRec.y+normalizeRec.height);
double nonNormalizeRect = calculateSum(sumRectPoint1,sumRectPoint2,sumRectPoint3,sumRectPoint4); //
double sumMean = detectionWindowExpectancy*(normalizeRec.width*normalizeRec.height); // sigma(Pi) = normalizeRect = (sigma(Pi- rect) - sigma(mean)) / detectionWindowStandardDeviation
double normalizeRect = (nonNormalizeRect - sumMean)/detectionWindowStandardDeviation; //
classifierSum += (normalizeRect*(pCvHaarFeature->rect[CvHaarFeatureIterator].weight));
// if (classifierSum > (*(classifier->threshold)) )
// return 0; // That's not a face !
if (classifierSum > ((*(classifier->threshold))*detectionWindowStandardDeviation) )
return -stageIterator; // That's not a face ! , failed on stage number
return 1; // That's a face
You need to make some big changes. First of all, classifier->threshold is a threshold for each feature. classifier->alpha points to an array of 2 elements, left_val and right_val (to my understanding). You should put something like this after the classifier loop:
a = classifier->alpha[0]
b = classifier->alpha[1]
t = *(classifier->threshold)
stage_sum += classifierSum < t ? a : b
Then compare stage_sum with CvHaarStageClassifier::threshold, which is the stage threshold, and loop through stage_classifiers[i]. If the window passes all of them, then it's a face!
'parent' and 'next' are useless here if you use haarcascade_frontalface_alt.xml; it is a stump-based cascade, not a tree-based one.
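The control flow described above can be sketched as follows for a stump-based cascade. The `Stump` and `Stage` structs are hypothetical simplifications, not OpenCV types, and the feature values are assumed to be computed already (the weighted rectangle sums for the current window). Scaling each feature threshold by the window's standard deviation mirrors what the question's code attempts and is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified mirror of a stump-based cascade.
struct Stump {
    double threshold;  // per-feature threshold
    double leftVal;    // added when the feature value is below the threshold
    double rightVal;   // added otherwise
};

struct Stage {
    std::vector<Stump> stumps;
    double threshold;  // stage threshold
};

// featureValues[s][k] is the weighted rectangle sum for stump k of stage s.
// Returns 1 if every stage passes, otherwise -s for the rejecting stage s
// (so any result <= 0 means "not a face").
int evaluateCascade(const std::vector<Stage>& stages,
                    const std::vector<std::vector<double>>& featureValues,
                    double windowStdDev) {
    assert(stages.size() == featureValues.size());
    for (std::size_t s = 0; s < stages.size(); ++s) {
        assert(stages[s].stumps.size() == featureValues[s].size());
        double stageSum = 0.0;
        for (std::size_t k = 0; k < stages[s].stumps.size(); ++k) {
            const Stump& stump = stages[s].stumps[k];
            const double t = stump.threshold * windowStdDev;
            // Each stump votes with one of its two values.
            stageSum += (featureValues[s][k] < t) ? stump.leftVal : stump.rightVal;
        }
        if (stageSum < stages[s].threshold) {
            return -static_cast<int>(s);  // rejected at this stage
        }
    }
    return 1;  // passed all stages
}
```

The key difference from the code in the question is that the comparison against the feature threshold only selects a vote, and the accept/reject decision is made once per stage on the accumulated votes.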