Rcpp moving average - boundary error leads to fatal error - c++

I use a rolling weighted moving average function, whose code is provided below. It is written in C++ via Rcpp.
This function works for most time series; there are no loop issues or anything like that. Below is a time series of length 2 that sometimes triggers the fatal error.
I could not find the cause of the error.
Thanks for your help! =)
Here is the R code:
# Load Rcpp and compile the C++ source
library(Rcpp)
sourceCpp("partialMA.cpp")
spencer_weights=c( -3, -6, -5, 3, 21, 46, 67, 0, 67, 46, 21, 3, -5, -6, -3)
spencer_ma <- function(x) roll_mean(x,spencer_weights)
x=c(11.026420323685528,0.25933761651337001)
spencer_ma(x) # works
for(i in 1:1000) spencer_ma(x) # triggers the fatal error
The C++ code of my roll_mean function is below:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector roll_mean(const NumericVector& x,
const NumericVector& w) {
int n = x.size();
int w_size = w.size();
int size = (w_size - 1) / 2;
NumericVector res(n);
int i, ind_x, ind_w;
double w_sum = Rcpp::sum(w), tmp_wsum, tmp_xwsum, tmp_w;
// beginning
for (i = 0; i < size; i++) {
tmp_xwsum = tmp_wsum = 0;
for (ind_x = i + size, ind_w = w_size - 1; ind_x >= 0; ind_x--, ind_w--) {
tmp_w = w[ind_w];
tmp_wsum += tmp_w;
tmp_xwsum += x[ind_x] * tmp_w;
}
res[i] = tmp_xwsum / tmp_wsum;
}
// middle
int lim2 = n - size;
for (; i < lim2; i++) {
tmp_xwsum = 0;
for (ind_x = i - size, ind_w = 0; ind_w < w_size; ind_x++, ind_w++) {
tmp_xwsum += x[ind_x] * w[ind_w];
}
res[i] = tmp_xwsum / w_sum;
}
// end
for (; i < n; i++) {
tmp_xwsum = tmp_wsum = 0;
for (ind_x = i - size, ind_w = 0; ind_x < n; ind_x++, ind_w++) {
tmp_w = w[ind_w];
tmp_wsum += tmp_w;
tmp_xwsum += x[ind_x] * tmp_w;
}
res[i] = tmp_xwsum / tmp_wsum;
}
return res;
}

A Wild Index Out of Bounds Error Appeared!
You can pinpoint the issue by switching the element accessors from [] to (). The latter has a built-in bounds check, i.e. it verifies that the index is between 0 and n-1.
Running the code with the built-in check gives:
Error in roll_mean(x, spencer_weights) :
Index out of bounds: [index=7; extent=2].
So the indices being used greatly exceed the length of the vector. Adding a trace statement indicates that it's the first loop that is wrong.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector roll_mean(const NumericVector& x,
const NumericVector& w) {
int n = x.size();
int w_size = w.size();
int size = (w_size - 1) / 2;
Rcpp::Rcout << n << ", w_size: " << w_size << ", size: " << size << std::endl;
NumericVector res(n);
int i, ind_x, ind_w;
double w_sum = Rcpp::sum(w), tmp_wsum, tmp_xwsum, tmp_w;
// beginning
for (i = 0; i < size; i++) {
tmp_xwsum = tmp_wsum = 0;
// Fix this line
for (ind_x = i + size, ind_w = w_size - 1; ind_x >= 0; ind_x--, ind_w--) {
tmp_w = w(ind_w);
Rcpp::Rcout << "Loop at: " << ind_w << std::endl;
tmp_wsum += tmp_w;
tmp_xwsum += x(ind_x) * tmp_w;
}
res(i) = tmp_xwsum / tmp_wsum;
}
Rcpp::Rcout << "success" << std::endl;
return res;
}
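One possible way to repair the boundary handling is to clamp the window to the valid index range for every position, instead of using three separate loops. The sketch below is plain C++ with std::vector standing in for NumericVector so that it compiles without Rcpp; the name roll_mean_clamped is made up for this example, and it assumes an odd number of weights centred on the current element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Centred weighted moving average with partial windows at both ends.
// Hypothetical stand-in for roll_mean: std::vector replaces NumericVector.
std::vector<double> roll_mean_clamped(const std::vector<double>& x,
                                      const std::vector<double>& w) {
    assert(w.size() % 2 == 1);  // assumption: odd number of weights
    const long n = static_cast<long>(x.size());
    const long half = static_cast<long>(w.size() / 2);
    std::vector<double> res(x.size());
    for (long i = 0; i < n; ++i) {
        // Clamp the window to the valid index range [0, n - 1].
        const long lo = std::max(0L, i - half);
        const long hi = std::min(n - 1, i + half);
        double wsum = 0.0, xwsum = 0.0;
        for (long j = lo; j <= hi; ++j) {
            // x[i + k] is paired with w[half + k], as in the original loops.
            const double wj = w[static_cast<std::size_t>(j - i + half)];
            wsum += wj;
            xwsum += x[static_cast<std::size_t>(j)] * wj;
        }
        // The result is NaN or infinite if the weights in the window sum to zero.
        res[static_cast<std::size_t>(i)] = xwsum / wsum;
    }
    return res;
}
```

Because both window ends are clamped, a series shorter than the half-width of the weights (such as the length-2 example) never reads outside x or w.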
And that's all folks!

Related

Low Accuracy of DNN

I've recently been implementing a neural network based on http://neuralnetworksanddeeplearning.com/. I wrote the whole backprop and SGD algorithm in almost the same way as the author of the book. The problem is that while he gets around 90% accuracy after one epoch, I get 30% after 5 epochs, even though I use the same hyperparameters. Do you have any idea what might be the cause?
Here is my repository:
https://github.com/PiPower/Deep-Neural-Network
Here is the part with the backprop and SGD algorithm, implemented in Network.cpp:
void Network::Train(MatrixD_Array& TrainingData, MatrixD_Array& TrainingLabels, int BatchSize,int epochs, double LearningRate)
{
assert(TrainingData.size() == TrainingLabels.size() && CostFunc != nullptr && CostFuncDer != nullptr && LearningRate > 0);
std::vector<long unsigned int > indexes;
for (int i = 0; i < TrainingData.size(); i++) indexes.push_back(i);
std::random_device rd;
std::mt19937 g(rd());
std::vector<Matrix<double>> NablaWeights;
std::vector<Matrix<double>> NablaBiases;
NablaWeights.resize(Layers.size());
NablaBiases.resize(Layers.size());
for (int i = 0; i < Layers.size(); i++)
{
NablaWeights[i] = Matrix<double>(Layers[i].GetInDim(), Layers[i].GetOutDim());
NablaBiases[i] = Matrix<double>(1, Layers[i].GetOutDim());
}
//---- Epoch iterating
for (int i = 0; i < epochs; i++)
{
cout << "Epoch number: " << i << endl;
shuffle(indexes.begin(), indexes.end(), g);
// Batch iterating
for (int batch = 0; batch < TrainingData.size(); batch = batch + BatchSize)
{
for (int i = 0; i < Layers.size(); i++)
{
NablaWeights[i].Clear();
NablaBiases[i].Clear();
}
int i = 0;
while( i < BatchSize && (i+batch)< TrainingData.size())
{
std::vector<Matrix<double>> ActivationOutput;
std::vector<Matrix<double>> Z_Output;
ActivationOutput.resize(Layers.size() + 1);
Z_Output.resize(Layers.size());
ActivationOutput[0] = TrainingData[indexes[i + batch]];
int index = 0;
// Pushing values through
for (auto layer : Layers)
{
Z_Output[index] = layer.Mul(ActivationOutput[index]);
ActivationOutput[index + 1] = layer.ApplyActivation(Z_Output[index]);
index++;
}
// ---- Calculating Nabla that will later be divided by batch size element-wise
auto DeltaNabla = BackPropagation(ActivationOutput, Z_Output, TrainingLabels[indexes[i + batch]]);
for (int i = 0; i < Layers.size(); i++)
{
NablaWeights[i] = NablaWeights[i] + DeltaNabla.first[i];
NablaBiases[i] = NablaBiases[i] + DeltaNabla.second[i];
}
i++;
}
for (int g = 0; g < Layers.size(); g++)
{
Layers[g].Weights = Layers[g].Weights - NablaWeights[g] * LearningRate;
Layers[g].Biases = Layers[g].Biases - NablaBiases[g] * LearningRate;
}
// std::transform(NablaWeights.begin(), NablaWeights.end(), NablaWeights.begin(),[BatchSize, LearningRate](Matrix<double>& Weight) {return Weight * (LearningRate / BatchSize);});
//std::transform(NablaBiases.begin(), NablaBiases.end(), NablaBiases.begin(), [BatchSize, LearningRate](Matrix<double>& Bias) {return Bias * (LearningRate / BatchSize); });
}
}
}
std::pair<MatrixD_Array, MatrixD_Array> Network::BackPropagation( MatrixD_Array& ActivationOutput, MatrixD_Array& Z_Output,Matrix<double>& label)
{
MatrixD_Array NablaWeight;
MatrixD_Array NablaBias;
NablaWeight.resize(Layers.size());
NablaBias.resize(Layers.size());
auto zs = Layers[Layers.size() - 1].ActivationPrime(Z_Output[Z_Output.size() - 1]);
Matrix<double> Delta_L = Hadamard(CostFuncDer(ActivationOutput[ActivationOutput.size() - 1],label), zs);
NablaWeight[Layers.size() - 1] = Delta_L * ActivationOutput[ActivationOutput.size() - 2].Transpose();
NablaBias[Layers.size() - 1] = Delta_L;
for (int j = 2; j <= Layers.size() ; j++)
{
auto sp = Layers[Layers.size() - j].ActivationPrime(Z_Output[Layers.size() -j]);
Delta_L = Hadamard(Layers[Layers.size() - j+1 ].Weights.Transpose() * Delta_L, sp);
NablaWeight[Layers.size() - j] = Delta_L * ActivationOutput[ActivationOutput.size() -j-1].Transpose();
NablaBias[Layers.size() - j] = Delta_L;
}
return make_pair(NablaWeight, NablaBias);
}
It turned out that the MNIST loader didn't work correctly.

C++ neural network implemented from scratch cannot get above 50% on MNIST

So I have implemented a fully connected one hidden layer neural network in C++ using Eigen for matrix multiplication. It uses minibatch gradient descent.
However, my model cannot get above 50% accuracy on MNIST. I have tried learning rates between 0.0001 and 10. The model does overfit on training sizes < 100 (with ~90% accuracy, which is still pretty bad), albeit extremely slowly.
What might be causing this low accuracy and extremely slow learning? My main concern is that the backpropagation is incorrect. Furthermore, I would prefer not to add any other optimization techniques (learning rate schedule, regularization, etc.).
Feed forward and backprop code:
z1 = (w1 * mbX).colwise() + b1;
a1 = sigmoid(z1);
z2 = (w2 * a1).colwise() + b2;
a2 = sigmoid(z2);
MatrixXd err = ((double) epsilon)/((double) minibatch_size) * ((a2 - mbY).array() * sigmoid_derivative(z2).array()).matrix();
b2 = b2 - err * ones;
w2 = w2 - (err * a1.transpose());
err = ((w2.transpose() * err).array() * sigmoid_derivative(z1).array()).matrix();
b1 = b1 - err * ones;
w1 = w1 - (err * mbX.transpose());
Full program code:
#include <iostream>
#include <fstream>
#include <math.h>
#include <cstdlib>
#include <Eigen/Dense>
#include <vector>
#include <string>
using namespace Eigen;
#define N 30
#define epsilon 0.7
#define epoch 1000
//sizes
const int minibatch_size = 10;
const int training_size = 10000;
const int val_size = 10;
unsigned int num, magic, rows, cols;
//images
unsigned int image[training_size][28][28];
unsigned int val_image[val_size][28][28];
//labels
unsigned int label[training_size];
unsigned int val_label[val_size];
//inputs
MatrixXd X(784, training_size);
MatrixXd Y = MatrixXd::Zero(10, training_size);
//minibatch
MatrixXd mbX(784, minibatch_size);
MatrixXd mbY = MatrixXd::Zero(10, minibatch_size);
//validation
MatrixXd Xv(784, val_size);
MatrixXd Yv = MatrixXd::Zero(10, val_size);
//Image processing courtesy of https://stackoverflow.com/users/11146076/%e5%bc%a0%e4%ba%91%e9%93%ad
unsigned int in(std::ifstream& icin, unsigned int size) {
unsigned int ans = 0;
for (int i = 0; i < size; i++) {
unsigned char x;
icin.read((char*)&x, 1);
unsigned int temp = x;
ans <<= 8;
ans += temp;
}
return ans;
}
void input(std::string ipath, std::string lpath, std::string ipath2, std::string lpath2) {
std::ifstream icin;
//training data
icin.open(ipath, std::ios::binary);
magic = in(icin, 4), num = in(icin, 4), rows = in(icin, 4), cols = in(icin, 4);
for (int i = 0; i < training_size; i++) {
int val = 0;
for (int x = 0; x < rows; x++) {
for (int y = 0; y < cols; y++) {
image[i][x][y] = in(icin, 1);
X(val, i) = image[i][x][y]/255;
val++;
}
}
}
icin.close();
//training labels
icin.open(lpath, std::ios::binary);
magic = in(icin, 4), num = in(icin, 4);
for (int i = 0; i < training_size; i++) {
label[i] = in(icin, 1);
Y(label[i], i) = 1;
}
icin.close();
//validation data
icin.open(ipath2, std::ios::binary);
magic = in(icin, 4), num = in(icin, 4), rows = in(icin, 4), cols = in(icin, 4);
for (int i = 0; i < val_size; i++) {
int val = 0;
for (int x = 0; x < rows; x++) {
for (int y = 0; y < cols; y++) {
val_image[i][x][y] = in(icin, 1);
Xv(val, i) = val_image[i][x][y]/255;
val++;
}
}
}
icin.close();
//validation labels
icin.open(lpath2, std::ios::binary);
magic = in(icin, 4), num = in(icin, 4);
for (int i = 0; i < val_size; i++) {
val_label[i] = in(icin, 1);
Yv(val_label[i], i) = 1;
}
icin.close();
}
//Neural Network calculations
MatrixXd sigmoid(MatrixXd m) {
m *= -1;
return (1/(1 + m.array().exp())).matrix();
}
MatrixXd sigmoid_derivative(MatrixXd m) {
return (sigmoid(m).array() * (1 - sigmoid(m).array())).matrix();
}
//Initialize weights and biases
//hidden layer
VectorXd b1 = MatrixXd::Zero(N, 1);
MatrixXd w1 = MatrixXd::Random(N, 784);
//output
VectorXd b2 = MatrixXd::Zero(10, 1);
MatrixXd w2 = MatrixXd::Random(10, N);
//Initialize intermediate values
MatrixXd z1, z2, a1, a2, z1v, z2v, a1v, a2v;
MatrixXd ones = MatrixXd::Constant(minibatch_size, 1, 1);
int main() {
input("C:\\Users\\Aaron\\Documents\\Test\\train-images-idx3-ubyte\\train-images.idx3-ubyte", "C:\\Users\\Aaron\\Documents\\Test\\train-labels-idx1-ubyte\\train-labels.idx1-ubyte", "C:\\Users\\Aaron\\Documents\\Test\\t10k-images-idx3-ubyte\\t10k-images.idx3-ubyte", "C:\\Users\\Aaron\\Documents\\Test\\t10k-labels-idx1-ubyte\\t10k-labels.idx1-ubyte");
std::cout << "Finished Image Processing" << std::endl;
//std::cout << w1 << std::endl;
std::vector<double> val_ac;
std::vector<double> c;
std::vector<int> order;
for (int i = 0; i < training_size; i++) {
order.push_back(i);
}
for (int i = 0; i < epoch; i++) {
//feed forward
std::random_shuffle(order.begin(), order.end());
for (int j = 0; j < training_size/minibatch_size; j++) {
for (int k = 0; k < minibatch_size; k++) {
int index = order[j * minibatch_size + k];
mbX.col(k) = X.col(index);
mbY.col(k) = Y.col(index);
}
z1 = (w1 * mbX).colwise() + b1;
a1 = sigmoid(z1);
z2 = (w2 * a1).colwise() + b2;
a2 = sigmoid(z2);
MatrixXd err = ((double) epsilon)/((double) minibatch_size) * ((a2 - mbY).array() * sigmoid_derivative(z2).array()).matrix();
//std::cout << err << std::endl;
b2 = b2 - err * ones;
w2 = w2 - (err * a1.transpose());
err = ((w2.transpose() * err).array() * sigmoid_derivative(z1).array()).matrix();
//std::cout << err << std::endl;
b1 = b1 - err * ones;
w1 = w1 - (err * mbX.transpose());
}
//validation
z1 = (w1 * X).colwise() + b1;
a1 = sigmoid(z1);
z2 = (w2 * a1).colwise() + b2;
a2 = sigmoid(z2);
double cost = 1/((double) training_size) * ((a2 - Y).array() * (a2 - Y).array()).matrix().sum();
c.push_back(cost);
int correct = 0;
for (int i = 0; i < training_size; i++) {
double maxP = -1;
int na;
for (int j = 0; j < 10; j++) {
if (a2(j, i) > maxP) {
maxP = a2(j, i);
na = j;
}
}
if (na == label[i]) correct++;
}
val_ac.push_back(((double) correct) / ((double) training_size));
std::cout << "Finished Epoch " << i + 1 << std::endl;
std::cout << "Cost: " << cost << std::endl;
std::cout << "Accuracy: " << ((double) correct) / ((double) training_size) << std::endl;
}
//plot accuracy
FILE * gp = _popen("gnuplot", "w");
fprintf(gp, "set terminal wxt size 600,400 \n");
fprintf(gp, "set grid \n");
fprintf(gp, "set title '%s' \n", "NN");
fprintf(gp, "plot '-' w line, '-' w lines \n");
for (int i = 0; i < epoch; i++) {
fprintf(gp, "%f %f \n", i + 1.0, c[i]);
}
fprintf(gp, "e\n");
//validation accuracy
for (int i = 0; i < epoch; i++) {
fprintf(gp, "%f %f \n", i + 1.0, val_ac[i]);
}
fprintf(gp, "e\n");
fflush(gp);
system("pause");
_pclose(gp);
return 0;
}
Update:
Here is a graph of the accuracy on the training dataset (green) and the loss (purple):
https://i.stack.imgur.com/Ya2yR.png
Here is a graph of the loss for the training data and validation data:
https://imgur.com/a/4gmFCrk
The loss of the validation data is increasing past a certain point, which shows signs of overfitting. However, the accuracy still remains abysmal even on the training data.
unsigned int val_image[val_size][28][28];
Xv(val, i) = val_image[i][x][y]/255;
Can you try again with Xv(val, i) = val_image[i][x][y] / 255.0;
The same applies here:
X(val, i) = image[i][x][y]/255;
With the code as written, Xv is 0 for almost every pixel and 1 only when the pixel has the value 255, because both operands are integers and the division truncates. With a floating-point division, you'll get values between 0.0 and 1.0.
You'll need to check your code for other places where you may be dividing integers.
N.B.: In C++, 240/255 is 0.
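To make the difference concrete, here is a tiny sketch that contrasts the two divisions. The function names scale_truncating and scale_pixel are invented for this illustration and do not appear in the question's code.

```cpp
// Integer division: both operands are integers, so the fractional part is
// discarded before the result is converted to double.
double scale_truncating(unsigned int pixel) { return pixel / 255; }

// Floating-point division: 255.0 is a double, so pixel is converted to
// double first and the fraction is kept.
double scale_pixel(unsigned int pixel) { return pixel / 255.0; }
```

For a pixel value of 240, the first version returns 0 while the second returns roughly 0.94, so with the truncating version almost every input to the network ends up as zero.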

How to select threshold values automatically using the peaks of the histogram?

Using the OpenCV library, I want to threshold an image like this:
threshold(image, thresh, 220, 255, THRESH_BINARY_INV)
But I want to find the threshold value (220) automatically.
I tried Otsu's method to estimate the threshold, but it doesn't work in my case.
Therefore, I want to use the histogram peak technique: find the two peaks in the histogram corresponding to the background and the object of the image, and set the threshold value automatically halfway between the two peaks.
I am following this book (pages 117 and 496-505): "Image Processing in C" by Dwayne Phillips (http://homepages.inf.ed.ac.uk/rbf/BOOKS/PHILLIPS/), and I use its source code to find the two peaks in the histogram corresponding to the background and the object of the image.
This is my image:
This is my C++ code:
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/opencv.hpp>
#include <iostream>
#include <stdio.h>
#include <fstream>
using namespace std;
using namespace cv;
int main()
{
Mat image0 = imread("C:/Users/Alireza/Desktop/contrast950318/2.bmp");
imshow("image0", image0);
Mat image, thresh, Tafrigh;
cvtColor(image0, image, CV_RGB2GRAY);
int N = image.rows*image.cols;
int histogram[256];
for (int i = 0; i < 256; i++) {
histogram[i] = 0;
}
//create histo
for (int i = 0; i < image.rows; i++){
for (int j = 0; j < image.cols; j++){
histogram[((int)image.at<uchar>(i, j))]++;
}
}
int peak1, peak2;
#define PEAKS 30
int distance[PEAKS], peaks[PEAKS][2];
int i, j = 0, max = 0, max_place = 0;
for (int i = 0; i<PEAKS; i++){
distance[i] = 0;
peaks[i][0] = -1;
peaks[i][1] = -1;
}
for (i = 0; i <= 255; i++){
max = histogram[i];
max_place = i;
//insert_into_peaks(peaks, max, max_place);
//int max, max_place, peaks[PEAKS][2];
//int i, j;
/* first case */
if (max > peaks[0][0]){
for (i = PEAKS - 1; i > 0; i--){
peaks[i][0] = peaks[i - 1][0];
peaks[i][1] = peaks[i - 1][1];
}
peaks[0][0] = max;
peaks[0][1] = max_place;
} /* ends if */
/* middle cases */
for (j = 0; j < PEAKS - 3; j++){
if (max < peaks[j][0] && max > peaks[j + 1][0]){
for (i = PEAKS - 1; i > j + 1; i--){
peaks[i][0] = peaks[i - 1][0];
peaks[i][1] = peaks[i - 1][1];
}
peaks[j + 1][0] = max;
peaks[j + 1][1] = max_place;
} /* ends if */
} /* ends loop over j */
/* last case */
if (max < peaks[PEAKS - 2][0] &&
max > peaks[PEAKS - 1][0]){
peaks[PEAKS - 1][0] = max;
peaks[PEAKS - 1][1] = max_place;
} /* ends if */
}/* ends loop over i */
for (int i = 1; i<PEAKS; i++){
distance[i] = peaks[0][1] - peaks[i][1];
if (distance[i] < 0)
distance[i] = distance[i] * (-1);
}
peak1 = peaks[0][1];
cout << " peak1= " << peak1;
for (int i = PEAKS - 1; i > 0; i--){
if (distance[i] > 1)
peak2 = peaks[i][1];
}
cout << " peak2= " << peak2;
int mid_point;
//int peak1, peak2;
short hi, low;
unsigned long sum1 = 0, sum2 = 0;
if (peak1 > peak2)
mid_point = ((peak1 - peak2) / 2) + peak2;
if (peak1 < peak2)
mid_point = ((peak2 - peak1) / 2) + peak1;
for (int i = 0; i<mid_point; i++)
sum1 = sum1 + histogram[i];
for (int i = mid_point; i <= 255; i++)
sum2 = sum2 + histogram[i];
if (sum1 >= sum2){
low = mid_point;
hi = 255;
}
else{
low = 0;
hi = mid_point;
}
cout << " low= " << low << " hi= " << hi;
double threshnum = 0.5* (low + hi);
threshold(image, thresh, threshnum, hi, THRESH_BINARY_INV);
waitKey(0);
return 0;
}
But I don't know whether this code is correct. If it is correct, why is the threshold value 202?
What ideas would you suggest for solving this task? Or where on the internet can I find help?
You can also use maximum entropy thresholding. In some cases, using only the high-range part of the entropy could be better.
#include <cmath>
#include <opencv2/core.hpp>
int maxentropie(const cv::Mat1b& src)
{
// Histogram
cv::Mat1d hist(1, 256, 0.0);
for (int r=0; r<src.rows; ++r)
for (int c=0; c<src.cols; ++c)
hist(src(r,c))++;
// Normalize
hist /= double(src.rows * src.cols);
// Cumulative histogram
cv::Mat1d cumhist(1, 256, 0.0);
float sum = 0;
for (int i = 0; i < 256; ++i)
{
sum += hist(i);
cumhist(i) = sum;
}
cv::Mat1d hl(1, 256, 0.0);
cv::Mat1d hh(1, 256, 0.0);
for (int t = 0; t < 256; ++t)
{
// low range entropy
double cl = cumhist(t);
if (cl > 0)
{
for (int i = 0; i <= t; ++i)
{
if (hist(i) > 0)
{
hl(t) = hl(t) - (hist(i) / cl) * log(hist(i) / cl);
}
}
}
// high range entropy
double ch = 1.0 - cl; // constraint cl + ch = 1
if (ch > 0)
{
for (int i = t+1; i < 256; ++i)
{
if (hist(i) > 0)
{
hh(t) = hh(t) - (hist(i) / ch) * log(hist(i) / ch);
}
}
}
}
// choose best threshold
cv::Mat1d entropie(1, 256, 0.0);
double h_max = hl(0) + hh(0);
int threshold = 0;
entropie(0) = h_max;
for (int t = 1; t < 256; ++t)
{
entropie(t) = hl(t) + hh(t);
if (entropie(t) > h_max)
{
h_max = entropie(t);
threshold = uchar(t);
}
}
if(threshold==0) threshold=255;
return threshold;
}
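For comparison, the peak-based idea described in the question (take the two dominant histogram peaks and threshold halfway between them) can be sketched without OpenCV. This is a simplified variant, not the book's routine: the function name peak_midpoint_threshold and the min_separation parameter are assumptions introduced here to keep the second peak from being a direct neighbour of the first.

```cpp
#include <array>
#include <cstdlib>

// Returns the grey level halfway between the two dominant histogram peaks.
// min_separation (hypothetical parameter) is the minimum distance, in grey
// levels, that the second peak must keep from the first one.
int peak_midpoint_threshold(const std::array<int, 256>& hist, int min_separation) {
    // First peak: the most populated bin.
    int peak1 = 0;
    for (int g = 1; g < 256; ++g) {
        if (hist[g] > hist[peak1]) peak1 = g;
    }
    // Second peak: the most populated bin far enough away from the first.
    int peak2 = -1;
    for (int g = 0; g < 256; ++g) {
        if (std::abs(g - peak1) < min_separation) continue;
        if (peak2 < 0 || hist[g] > hist[peak2]) peak2 = g;
    }
    if (peak2 < 0) return peak1;  // no bin is far enough away
    return (peak1 + peak2) / 2;
}
```

The returned value can then be passed as the threshold argument of cv::threshold. Smoothing the histogram before the search may help on noisy images, but that is outside this sketch.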

Gradient descent converging towards the wrong value

I'm trying to implement a gradient descent algorithm in C++. Here's the code I have so far :
#include <iostream>
double X[] {163,169,158,158,161,172,156,161,154,145};
double Y[] {52, 68, 49, 73, 71, 99, 50, 82, 56, 46 };
double m, p;
int n = sizeof(X)/sizeof(X[0]);
int main(void) {
double alpha = 0.00004; // 0.00007;
m = (Y[1] - Y[0]) / (X[1] - X[0]);
p = Y[0] - m * X[0];
for (int i = 1; i <= 8; i++) {
gradientStep(alpha);
}
return 0;
}
double Loss_function(void) {
double res = 0;
double tmp;
for (int i = 0; i < n; i++) {
tmp = Y[i] - m * X[i] - p;
res += tmp * tmp;
}
return res / 2.0 / (double)n;
}
void gradientStep(double alpha) {
double pg = 0, mg = 0;
for (int i = 0; i < n; i++) {
pg += Y[i] - m * X[i] - p;
mg += X[i] * (Y[i] - m * X[i] - p);
}
p += alpha * pg / n;
m += alpha * mg / n;
}
This code converges towards m = 2.79822, p = -382.666, and an error of 102.88. But if I use my calculator to find the correct linear regression model, I find that the correct values of m and p should be 1.601 and -191.1 respectively.
I also noticed that the algorithm won't converge for alpha > 0.00007, which seems quite low, and the value of p barely changes during the 8 iterations (or even after 2000 iterations).
What's wrong with my code?
Here's a good overview of the algorithm I'm trying to implement. The values of theta0 and theta1 are called p and m in my program.
Other implementation in python
More about the algorithm
This link gives a comprehensive view of the algorithm; it turns out I was following a completely wrong approach.
The following code does not work properly (and I have no plans to work on it further), but it should put anyone facing the same problem on the right track:
#include <vector>
#include <iostream>
typedef std::vector<double> vect;
std::vector<double> y, omega(2, 0), omega2(2, 0);;
std::vector<std::vector<double>> X;
int n = 10;
int main(void) {
/* Initialize X so that each member contains (1, x_i) */
/* Initialize y so that each member contains y_i */
double alpha = 0.00001;
display();
for (int i = 1; i <= 8; i++) {
gradientStep(alpha);
display();
}
return 0;
}
double f_function(const std::vector<double> &x) {
double c = 0;
for (unsigned int i = 0; i < omega.size(); i++) {
c += omega[i] * x[i];
}
return c;
}
void gradientStep(double alpha) {
for (int i = 0; i < n; i++) {
for (unsigned int j = 0; j < X[0].size(); j++) {
omega2[j] -= alpha/(double)n * (f_function(X[i]) - y[i]) * X[i][j];
}
}
omega = omega2;
}
void display(void) {
double res = 0, tmp = 0;
for (int i = 0; i < n; i++) {
tmp = y[i] - f_function(X[i]);
res += tmp * tmp; // Loss functionn
}
std::cout << "omega = ";
for (unsigned int i = 0; i < omega.size(); i++) {
std::cout << "[" << omega[i] << "] ";
}
std::cout << "\tError : " << res * .5/(double)n << std::endl;
}
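For reference, here is a minimal sketch of batch gradient descent for a straight-line fit that can be run as is. It subtracts the mean of x before the descent, which is an assumption of this sketch and not something from the original post; with x values around 160, centring tends to make the intercept converge far faster. The name fit_line and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Fits y = m * x + p by batch gradient descent on the mean squared error.
// x is centred around its mean first (assumption of this sketch).
// Returns the pair (m, p) for the original, uncentred x.
std::pair<double, double> fit_line(const std::vector<double>& x,
                                   const std::vector<double>& y,
                                   double alpha, int iterations) {
    assert(!x.empty() && x.size() == y.size());
    const double n = static_cast<double>(x.size());
    double mean_x = 0.0;
    for (double v : x) mean_x += v;
    mean_x /= n;

    double m = 0.0, c = 0.0;  // model on centred data: y = m * (x - mean_x) + c
    for (int it = 0; it < iterations; ++it) {
        double grad_m = 0.0, grad_c = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double xc = x[i] - mean_x;
            const double residual = m * xc + c - y[i];
            grad_m += residual * xc;
            grad_c += residual;
        }
        m -= alpha * grad_m / n;
        c -= alpha * grad_c / n;
    }
    return {m, c - m * mean_x};  // convert the centred intercept back to p
}
```

With the X and Y arrays from the question, alpha = 0.01 and 5000 iterations, this should end up close to the m = 1.601 and p = -191.1 obtained with the calculator.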

Why do vectors use less memory than pointers in this code?

I wrote a parallel program based on the Strassen multiplication algorithm using pointers.
The program returns the product of two matrices of the same size.
When the size is 256, the program uses about 1 GB of RAM; when it is 512, the RAM fills up completely, Windows stops responding, and I have to restart.
I replaced all the pointers with vectors, and RAM usage dropped dramatically: for size 1024, only 80 MB of RAM is used.
I know a little about vectors: their storage is bound statically at first, and if more space is needed at runtime it is bound dynamically.
Why do pointers need more space than vectors?
This is my first code:
#include <iostream>
#include<cilk\cilk.h>
#include <cilk/cilk_api.h>
#include<conio.h>
#include<ctime>
#include<string>
#include<random>
#include <Windows.h>
#include <Psapi.h>
#include<vector>
using namespace std;
int ** matrix_1;
int ** matrix_2;
#define number_thread 4
void show(string name, int n, int **show)
{
cout << " matrix " << name << " :" << endl;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
cout << show[i][j] << " ";
cout << endl;
}
}
int ** strassen(int n, int **matrix_a, int ** matrix_b)
{
int ** A11;
int ** A12;
int ** A21;
int ** A22;
int ** B11;
int ** B12;
int ** B21;
int ** B22;
int ** result;
int **m1, **m2, **m3, ** m4, ** m5, ** m6, ** m7, ** m8;
A11 = new int*[n / 2];
A12 = new int*[n / 2];
A21 = new int*[n / 2];
A22 = new int*[n / 2];
B11 = new int*[n / 2];
B12 = new int*[n / 2];
B21 = new int*[n / 2];
B22 = new int*[n / 2];
result = new int *[n];
m1 = new int*[n / 2];
m2 = new int*[n / 2];
m3 = new int*[n / 2];
m4 = new int*[n / 2];
m5 = new int*[n / 2];
m6 = new int*[n / 2];
m7 = new int*[n / 2];
m8 = new int*[n / 2];
cilk_for(int i = 0; i < n / 2; i++)
{
//cout << " value i : " << i << endl;
A11[i] = new int[n / 2];
A12[i] = new int[n / 2];
A21[i] = new int[n / 2];
A22[i] = new int[n / 2];
B11[i] = new int[n / 2];
B12[i] = new int[n / 2];
B21[i] = new int[n / 2];
B22[i] = new int[n / 2];
m1[i] = new int[n / 2];
m2[i] = new int[n / 2];
m3[i] = new int[n / 2];
m4[i] = new int[n / 2];
m5[i] = new int[n / 2];
m6[i] = new int[n / 2];
m7[i] = new int[n / 2];
m8[i] = new int[n / 2];
}
cilk_for(int i = 0; i < n; i++) // matrix result
result[i] = new int[n];
if (n == 2)
{
result[0][0] = matrix_a[0][0] * matrix_b[0][0] + matrix_a[0][1] * matrix_b[1][0];
result[0][1] = matrix_a[0][0] * matrix_b[0][1] + matrix_a[0][1] * matrix_b[1][1];
result[1][0] = matrix_a[1][0] * matrix_b[0][0] + matrix_a[1][1] * matrix_b[1][0];
result[1][1] = matrix_a[1][0] * matrix_b[0][1] + matrix_a[1][1] * matrix_b[1][1];
return result;
}
// for (int i = 0; i < n;i++)
cilk_for(int i = 0; i < (n / 2); i++)
{
for (int j = 0; j < (n / 2); j++)
{
A11[i][j] = matrix_a[i][j];
B11[i][j] = matrix_b[i][j];
A12[i][j] = matrix_a[i][j + n / 2];
B12[i][j] = matrix_b[i][j + n / 2];
A21[i][j] = matrix_a[i + n / 2][j];
B21[i][j] = matrix_b[i + n / 2][j];
A22[i][j] = matrix_a[i + n / 2][j + n / 2];
B22[i][j] = matrix_b[i + n / 2][j + n / 2];
}
}
/*
show("A11", n / 2, A11);
show("A12", n / 2, A12);
show("A21", n / 2, A21);
show("A22", n / 2, A22);
show("B11", n / 2, B11);
show("B12", n / 2, B12);
show("B21", n / 2, B21);
show("B22", n / 2, B22);*/
// Run By eight_thread
m1 = cilk_spawn(strassen(n / 2, A11, B11));// A11B11
m2 = cilk_spawn(strassen(n / 2, A12, B21));// A12B21
m3 = cilk_spawn(strassen(n / 2, A11, B12));// A11B12
m4 = cilk_spawn(strassen(n / 2, A12, B22));// A12B22
m5 = cilk_spawn(strassen(n / 2, A21, B11));// A21B11
m6 = cilk_spawn(strassen(n / 2, A22, B21));// A22B21
m7 = cilk_spawn(strassen(n / 2, A21, B12));// A21B12
m8 = cilk_spawn(strassen(n / 2, A22, B22));// A22B22
cilk_sync;
/*
cout << "****************************\n";
cout << "*********** before add :\n";
show("m1", n / 2, m1);
show("m2", n / 2, m2);
show("m3", n / 2, m3);
show("m4", n / 2, m4);
show("m5", n / 2, m5);
show("m6", n / 2, m6);
show("m7", n / 2, m7);
show("m8", n / 2, m8);*/
cilk_for(int i = 0; i < n / 2; i++)
for (int j = 0; j < n / 2; j++)
{
m1[i][j] = m1[i][j] + m2[i][j];
m3[i][j] = m3[i][j] + m4[i][j];
m5[i][j] = m5[i][j] + m6[i][j];
m7[i][j] = m7[i][j] + m8[i][j];
}
/*cout << "after adding hello \n";
show("m1", n / 2, m1);
show("m3", n / 2, m3);
show("m5", n / 2, m5);
show("m7", n / 2, m7);*/
cilk_for(int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
if (i < n / 2 && j < n / 2)
{
result[i][j] = m1[i][j];
}
else if (i < n / 2 && j >= n / 2)
{
result[i][j] = m3[i][j - n / 2];
}
else if (i >= n / 2 && j < n / 2)
{
result[i][j] = m5[i - n / 2][j];
}
else if (i >= n / 2 && j >= n / 2)
{
result[i][j] = m7[i - n / 2][j - n / 2];
}
}
}
/*
cilk_for(int i = 0; i < n / 2; i++)
{
for (int j = 0; j < n / 2; j++)
{
delete A11[i][j];
delete A12[i][j];
delete A21[i][j];
delete A22[i][j];
delete B11[i][j];
delete B12[i][j];
delete B21[i][j];
delete B22[i][j];
delete m1[i][j];
delete m2[i][j];
delete m3[i][j];
delete m4[i][j];
delete m5[i][j];
delete m6[i][j];
delete m7[i][j];
delete m8[i][j];*/
/* }
delete[] A11[i];
delete[] A12[i];
delete[] A21[i];
delete[] A22[i];
delete[] B11[i];
delete[] B12[i];
delete[] B21[i];
delete[] B22[i];
delete[] m1[i];
delete[] m2[i];
delete[] m3[i];
delete[] m4[i];
delete[] m5[i];
delete[] m6[i];
delete[] m7[i];
delete[] m8[i];
}*/
delete[] A11;
delete[] A12;
delete[] A21;
delete[] A22;
delete[] B11;
delete[] B12;
delete[] B21;
delete[] B22;
delete[] m1;
delete[] m2;
delete[] m3;
delete[] m4;
delete[] m5;
delete[] m6;
delete[] m7;
delete[] m8;
return result;
}
int main()
{
int size;
freopen("in.txt", "r", stdin);
freopen("out.txt", "w", stdout);
__cilkrts_set_param("nworkers", "4");
//cout << " please Enter the size OF ur matrix /n";
cin >> size;
matrix_1 = new int*[size];
matrix_2 = new int*[size];
if (size % 2 == 0)
{
//instialize matrix1
//cout << "matrix_1 :" << endl;
for (int i = 0; i < size; i++)
{
matrix_1[i] = new int[size];
for (int j = 0; j < size; j++)
{
matrix_1[i][j] = rand() % 3;
//cin >> matrix_1[i][j];
//cout << matrix_1[i][j] << " ";
}
//cout << endl;
}
//instialize matrix2
//cout << "matrix2_is :\n";
for (int i = 0; i < size; i++)
{
matrix_2[i] = new int[size];
for (int j = 0; j < size; j++)
{
matrix_2[i][j] = rand() % 3;
//cout << matrix_2[i][j]<<" ";
//cin >> matrix_2[i][j];
}
// cout << endl;
}
clock_t begin = clock();
matrix_2 = strassen(size, matrix_1, matrix_2);
clock_t end = clock();
double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
cout << "*******\ntime is : " << elapsed_secs << endl;
//answer:
/* for (int i = 0; i < size; i++)
{
for (int j = 0; j < size; j++)
{
cout<< matrix_2[i][j]<<" ";
}
cout << endl;
}*/
}
else
cout << " we couldnt use strasen ";
cout << "\nTotal Virtual Memory:" << endl;
MEMORYSTATUSEX memInfo;
memInfo.dwLength = sizeof(MEMORYSTATUSEX);
GlobalMemoryStatusEx(&memInfo);
DWORDLONG totalVirtualMem = memInfo.ullTotalPageFile;
printf("%u", totalVirtualMem);
cout << "\nVirtual Memory currently used:" << endl;
// MEMORYSTATUSEX memInfo;
memInfo.dwLength = sizeof(MEMORYSTATUSEX);
GlobalMemoryStatusEx(&memInfo);
DWORDLONG virtualMemUsed = memInfo.ullTotalPageFile - memInfo.ullAvailPageFile;
printf("%u", virtualMemUsed);
cout << "\nVirtual Memory currently used by current process:" << endl;
PROCESS_MEMORY_COUNTERS_EX pmc;
GetProcessMemoryInfo(GetCurrentProcess(), (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc));
SIZE_T virtualMemUsedByMe = pmc.PrivateUsage;
printf("%u", virtualMemUsedByMe);
cout << "\nPhysical Memory currently used: " << endl;
//MEMORYSTATUSEX memInfo;
memInfo.dwLength = sizeof(MEMORYSTATUSEX);
GlobalMemoryStatusEx(&memInfo);
DWORDLONG physMemUsed = memInfo.ullTotalPhys - memInfo.ullAvailPhys;
printf("%u", physMemUsed);
cout << endl;
cout << "\nPhysical Memory currently used by current process : " << endl;
// PROCESS_MEMORY_COUNTERS_EX pmc;
GetProcessMemoryInfo(GetCurrentProcess(), (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc));
SIZE_T physMemUsedByMe = pmc.WorkingSetSize;
printf("%u", physMemUsedByMe);
//cout << "memory usage :"<<double(totalVirtualMem) << endl;
//_getch();
return 0;
}
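One observation about the listing above, offered as a likely contributor rather than a measured diagnosis: the loop that would delete[] the individual rows is commented out, and m1 to m8 are allocated and then overwritten by the pointers returned from the recursive calls, so those blocks are never released. The sketch below contrasts the two ownership styles; make_matrix, alloc_raw and free_raw are hypothetical helper names.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// A vector-based matrix releases all of its rows automatically when it
// goes out of scope or is overwritten by assignment.
Matrix make_matrix(std::size_t n, int fill = 0) {
    return Matrix(n, std::vector<int>(n, fill));
}

// Raw-pointer equivalent: every new[] needs a matching delete[].
int** alloc_raw(std::size_t n) {
    int** m = new int*[n];
    for (std::size_t i = 0; i < n; ++i) {
        m[i] = new int[n]();  // zero-initialised row
    }
    return m;
}

void free_raw(int** m, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        delete[] m[i];  // rows first
    }
    delete[] m;  // then the array of row pointers
}
```

In a recursive routine, any raw matrix that is no longer needed would have to go through something like free_raw before its pointer is reused, whereas a vector frees its old contents on assignment.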
I replaced all the pointer arrays with vectors:
#include <iostream>
#include<cilk\cilk.h>
#include <cilk/cilk_api.h>
#include<conio.h>
#include<ctime>
#include<string>
#include<random>
#include <Windows.h>
#include <Psapi.h>
#include<vector>
using namespace std;
vector<vector<int> > matrix_1, matrix_2;
//int matrix_1;
//int ** matrix_2;
#define number_thread 4
void show(string name ,int n, int **show)
{
cout << " matrix " << name<<" :" << endl;
for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
cout << show[i][j] << " ";
cout << endl;
}
}
vector<vector<int>> strassen(int n, vector<vector<int>> matrix_a, vector<vector<int>> matrix_b)
{
vector<vector<int>> A11;
vector<vector<int>> A12;
vector<vector<int>> A21;
vector<vector<int>> A22;
vector<vector<int>> B11;
vector<vector<int>> B12;
vector<vector<int>> B21;
vector<vector<int>> B22;
vector<vector<int>> result;
vector<int> help;
vector<vector<int>> m1, m2, m3, m4, m5, m6, m7, m8;
help.clear();
for (int j = 0; j < n / 2; j++)
{
help.push_back(2);
}
for(int i = 0; i < n / 2; i++)
{
A11.push_back(help);
A12.push_back(help);
A21.push_back(help);
A22.push_back(help);
B11.push_back(help);
B12.push_back(help);
B21.push_back(help);
B22.push_back(help);
m1.push_back(help);
m2.push_back(help);
m3.push_back(help);
m4.push_back(help);
m5.push_back(help);
m6.push_back(help);
m7.push_back(help);
m8.push_back(help);
}
for (int j = 0; j < n / 2; j++)
help.push_back(2);
for(int i = 0; i < n; i++)
{
result.push_back(help);
}
if (n == 2)
{
result[0][0] = matrix_a[0][0] * matrix_b[0][0] + matrix_a[0][1] * matrix_b[1][0];
result[0][1] = matrix_a[0][0] * matrix_b[0][1] + matrix_a[0][1] * matrix_b[1][1];
result[1][0] = matrix_a[1][0] * matrix_b[0][0] + matrix_a[1][1] * matrix_b[1][0];
result[1][1] = matrix_a[1][0] * matrix_b[0][1] + matrix_a[1][1] * matrix_b[1][1];
return result;
}
// for (int i = 0; i < n;i++)
for(int i = 0; i < (n / 2); i++)
{
for(int j = 0; j <( n / 2); j++)
{
A11[i][j] = matrix_a[i][j];
B11[i][j] = matrix_b[i][j];
A12[i][j] = matrix_a[i][j + n / 2];
B12[i][j] = matrix_b[i][j + n / 2];
A21[i][j] = matrix_a[i + n / 2][j];
B21[i][j] = matrix_b[i + n / 2][j];
A22[i][j] = matrix_a[i + n / 2][j + n / 2];
B22[i][j] = matrix_b[i + n / 2][j + n / 2];
}
}
/*
show("A11", n / 2, A11);
show("A12", n / 2, A12);
show("A21", n / 2, A21);
show("A22", n / 2, A22);
show("B11", n / 2, B11);
show("B12", n / 2, B12);
show("B21", n / 2, B21);
show("B22", n / 2, B22);*/
// Run By eight_thread
m1 = cilk_spawn(strassen(n / 2, A11, B11));// A11B11
m2 = cilk_spawn(strassen(n / 2, A12, B21));// A12B21
m3 = cilk_spawn(strassen(n / 2, A11, B12));// A11B12
m4 = cilk_spawn(strassen(n / 2, A12, B22));// A12B22
m5 = cilk_spawn(strassen(n / 2, A21, B11));// A21B11
m6 = cilk_spawn(strassen(n / 2, A22, B21));// A22B21
m7 = cilk_spawn(strassen(n / 2, A21, B12));// A21B12
m8 = cilk_spawn(strassen(n / 2, A22, B22));// A22B22
cilk_sync;
/*
cout << "****************************\n";
cout << "*********** before add :\n";
show("m1", n / 2, m1);
show("m2", n / 2, m2);
show("m3", n / 2, m3);
show("m4", n / 2, m4);
show("m5", n / 2, m5);
show("m6", n / 2, m6);
show("m7", n / 2, m7);
show("m8", n / 2, m8);*/
for(int i = 0; i < n / 2; i++)
for (int j = 0; j < n / 2; j++)
{
m1[i][j] = m1[i][j] + m2[i][j];
m3[i][j] = m3[i][j] + m4[i][j];
m5[i][j] = m5[i][j] + m6[i][j];
m7[i][j] = m7[i][j] + m8[i][j];
}
/*cout << "after adding hello \n";
show("m1", n / 2, m1);
show("m3", n / 2, m3);
show("m5", n / 2, m5);
show("m7", n / 2, m7);*/
for(int i = 0; i < n ; i++)
{
for(int j = 0; j < n ; j++)
{
if (i < n / 2 && j < n / 2)
{
result[i][j] = m1[i][j];
}
else if (i < n / 2 && j >= n / 2)
{
result[i][j] = m3[i][j - n / 2];
}
else if (i >= n / 2 && j < n / 2)
{
result[i][j] = m5[i - n / 2][j];
}
else if (i >= n / 2 && j >= n / 2)
{
result[i][j] = m7[i - n / 2][j - n / 2];
}
}
}
/*
cilk_for(int i = 0; i < n / 2; i++)
{
for (int j = 0; j < n / 2; j++)
{
delete A11[i][j];
delete A12[i][j];
delete A21[i][j];
delete A22[i][j];
delete B11[i][j];
delete B12[i][j];
delete B21[i][j];
delete B22[i][j];
delete m1[i][j];
delete m2[i][j];
delete m3[i][j];
delete m4[i][j];
delete m5[i][j];
delete m6[i][j];
delete m7[i][j];
delete m8[i][j];*/
/* }
delete[] A11[i];
delete[] A12[i];
delete[] A21[i];
delete[] A22[i];
delete[] B11[i];
delete[] B12[i];
delete[] B21[i];
delete[] B22[i];
delete[] m1[i];
delete[] m2[i];
delete[] m3[i];
delete[] m4[i];
delete[] m5[i];
delete[] m6[i];
delete[] m7[i];
delete[] m8[i];
}*/
/* for (int i = 0; i < n; i++)
{
for (int j = 0; j < n; j++)
{
cout << result[i][j] << " ";
}
cout << endl;
}*/
return result;
}
int main()
{
int size;
freopen("in.txt","r",stdin);
freopen("out.txt", "w", stdout);
__cilkrts_set_param("nworkers", "1");
//cout << " please Enter the size OF ur matrix /n";
cin >> size;
vector<int> inner;
if (size % 2 == 0)
{
//initialize matrix1
cout << "matrix_1 :" << endl;
for (int i = 0; i < size; i++)
{
inner.clear();
for (int j = 0; j < size; j++)
{
inner.push_back(rand()%3);
//cin >> matrix_1[i][j];
cout << inner[j]<<" ";
}
cout << endl;
matrix_1.push_back(inner);
}
//initialize matrix2
cout << "matrix2_is :\n";
inner.clear();
for (int i = 0; i < size; i++)
{
inner.clear();
//matrix_2[i] = new int[size];
for (int j = 0; j < size; j++)
{
inner.push_back(rand()%3);
cout << inner[j]<<" ";
//cin >> matrix_2[i][j];
}
cout << endl;
matrix_2.push_back(inner);
}
clock_t begin = clock();
matrix_2 = strassen(size, matrix_1, matrix_2);
clock_t end = clock();
double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
cout << "*******\ntime is : " << elapsed_secs << endl;
//answer:
cout << "answerrr :" << endl;
for (int i = 0; i < size; i++)
{
for (int j = 0; j < size; j++)
{
cout<< matrix_2[i][j]<<" ";
}
cout << endl;
}
}
else
cout << " we couldnt use strasen ";
cout << "\nTotal Virtual Memory:" << endl;
MEMORYSTATUSEX memInfo;
memInfo.dwLength = sizeof(MEMORYSTATUSEX);
GlobalMemoryStatusEx(&memInfo);
DWORDLONG totalVirtualMem = memInfo.ullTotalPageFile;
printf("%llu", (unsigned long long)totalVirtualMem);
cout << "\nVirtual Memory currently used:" << endl;
// MEMORYSTATUSEX memInfo;
memInfo.dwLength = sizeof(MEMORYSTATUSEX);
GlobalMemoryStatusEx(&memInfo);
DWORDLONG virtualMemUsed = memInfo.ullTotalPageFile - memInfo.ullAvailPageFile;
printf("%llu", (unsigned long long)virtualMemUsed);
cout << "\nVirtual Memory currently used by current process:" << endl;
PROCESS_MEMORY_COUNTERS_EX pmc;
GetProcessMemoryInfo(GetCurrentProcess(), (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc));
SIZE_T virtualMemUsedByMe = pmc.PrivateUsage;
printf("%llu", (unsigned long long)virtualMemUsedByMe);
cout << "\nPhysical Memory currently used: " << endl;
//MEMORYSTATUSEX memInfo;
memInfo.dwLength = sizeof(MEMORYSTATUSEX);
GlobalMemoryStatusEx(&memInfo);
DWORDLONG physMemUsed = memInfo.ullTotalPhys - memInfo.ullAvailPhys;
printf("%llu", (unsigned long long)physMemUsed);
cout << endl;
cout << "\nPhysical Memory currently used by current process : " << endl;
// PROCESS_MEMORY_COUNTERS_EX pmc;
GetProcessMemoryInfo(GetCurrentProcess(), (PROCESS_MEMORY_COUNTERS*)&pmc, sizeof(pmc));
SIZE_T physMemUsedByMe = pmc.WorkingSetSize;
printf("%llu", (unsigned long long)physMemUsedByMe);
//cout << "memory usage :"<<double(totalVirtualMem) << endl;
//_getch();
return 0;
}
Two likely reasons come to mind:
If you allocate memory manually and don't free it correctly, you create memory leaks. This is much more likely to happen with raw pointers than with vectors.
If you allocate 1000 integers in 1000 separate allocations it will take much more space than allocating a single block of 1000 integers (what vectors do). Each allocation requires some additional memory for bookkeeping.
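To make the second point concrete, here is a minimal sketch that counts how many heap allocations a row-by-row matrix needs compared with one flat block. The CountingAllocator type, the g_allocations counter and both count_* functions are hypothetical names made up for this illustration; they are not part of the question's code. Each counted allocation carries its own bookkeeping overhead, and how large that overhead is depends on the runtime's heap implementation.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical global counter, incremented on every allocate() call.
static std::size_t g_allocations = 0;

// Minimal C++11 allocator that forwards to operator new and counts the calls.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using Row = std::vector<int, CountingAllocator<int>>;
using Matrix = std::vector<Row, CountingAllocator<Row>>;

// n x n matrix stored row by row: one block for the outer vector,
// plus one separate block for every row.
std::size_t count_nested_allocations(std::size_t n) {
    g_allocations = 0;
    Matrix m;
    m.reserve(n);
    for (std::size_t i = 0; i < n; ++i) m.emplace_back(n);
    return g_allocations;
}

// The same n * n ints requested as a single contiguous block.
std::size_t count_flat_allocations(std::size_t n) {
    g_allocations = 0;
    Row flat(n * n);
    return g_allocations;
}
```

Calling count_nested_allocations(n) typically reports about n + 1 allocations, while count_flat_allocations(n) needs only one or two. The exact numbers depend on the standard library and build mode, but the gap grows with n.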
I am going to guess this is an allocation issue. From what I have seen, allocating memory from the OS is quite time-consuming.
Just a guess, but maybe the std::vector default allocator grabs a much larger contiguous block of memory from the OS and draws from that to satisfy the smaller vector allocations?
This answer may provide some insight:
https://stackoverflow.com/a/29659791/3807729
I managed to reduce the time taken to run a test program simply by allocating, then deallocating a large std::vector before running the timing operations.
I am speculating that the C++ runtime system (in some implementations) may hold on to memory it has received from the OS even after it has been deallocated because grabbing small chunks from the OS each time is much more expensive.
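The warm-up experiment described above could look roughly like the following sketch. The names warm_up_heap and time_small_allocations are hypothetical, and whether the warm-up changes the timing at all depends on the C++ runtime and the OS, so treat it as something to measure on your own machine, not as a guaranteed speed-up.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: allocate one large block, touch every byte,
// then release it when the vector goes out of scope.
void warm_up_heap(std::size_t bytes) {
    std::vector<char> big(bytes, 1);
}

// Time `count` small vector allocations of `elems` ints each.
// Returns the elapsed time in microseconds.
long long time_small_allocations(std::size_t count, std::size_t elems) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::vector<int>> keep;
    keep.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        keep.emplace_back(elems, 2);  // each row is its own heap allocation
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

To try it, call time_small_allocations once in a fresh process without the warm-up and once in another fresh process after warm_up_heap, then compare the two numbers over several runs.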