I have a program that solves 1D Brownian motion in general using Euler's method.
Since it is a stochastic process, I want to average it over many particles. But as I ramp up the number of particles, the program runs out of memory and I get a std::bad_alloc error, which I understand is a memory error.
Here is my full code:
#include <iostream>
#include <vector>
#include <fstream>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <ctime>
using namespace std;
// Box-Muller Method to generate gaussian numbers
double generateGaussianNoise(double mu, double sigma) {
const double epsilon = std::numeric_limits<double>::min();
const double tau = 2.0 * 3.14159265358979323846;
static double z0, z1;
static bool generate;
generate = !generate;
if (!generate) return z1 * sigma + mu;
double u1, u2;
do {
u1 = rand() * (1.0 / RAND_MAX);
u2 = rand() * (1.0 / RAND_MAX);
} while (u1 <= epsilon);
z0 = sqrt(-2.0 * log(u1)) * cos(tau * u2);
z1 = sqrt(-2.0 * log(u1)) * sin(tau * u2);
return z0 * sigma + mu;
}
int main() {
// Initialize Variables
double gg; // Gaussian Number Picked from distribution
// Integrator
double t0 = 0; // Setting the Time Window
double tf = 10;
double n = 5000; // Number of Steps
double h = (tf - t0) / n; // Time Step Size
// Set Constants
const double pii = atan(1) * 4; // pi
const double eta = 1; // viscous constant
const double m = 1; // mass
const double aa = 1; // radius
const double Temp = 30; // Temperature in Kelvins
const double KB = 1; // Boltzmann Constant
const double alpha = (6 * pii * eta * aa);
// More Constants
const double mu = 0; // Gaussian Mean
const double sigma = 1; // Gaussian Std Deviation
const double ng = n; // No. of pts to generate for Gauss distribution
const double npart = 1000; // No. of Particles
// Initial Conditions
double x0 = 0;
double y0 = 0;
double t = t0;
// Vectors
vector<double> storX; // Vector that keeps displacement values
vector<double> storY; // Vector that keeps velocity values
vector<double> storT; // Vector to store time
vector<double> storeGaussian; // Vector to store Gaussian numbers generated
vector<double> holder; // Placeholder Vector for calculation operations
vector<double> mainstore; // Vector that holds the final value desired
  // Prepares mainstore
  for (int z = 0; z < (n+1); z++) {
    mainstore.push_back(0);
  }
  for (int NN = 0; NN < npart; NN++) {
    holder.clear();
    storX.clear();
    storY.clear();
    storT.clear();
    storT.push_back(t0);
    // Prepares holder
    for (int z = 0; z < (n+1); z++) {
      holder.push_back(0);
      storX.push_back(0);
      storY.push_back(0);
    }
    // Gaussian Generator
    for (double iiii = 0; iiii < ng; iiii++) {
      gg = generateGaussianNoise(0, 1); // generateGaussianNoise(mu,sigma)
      storeGaussian.push_back(gg);
    }
    // Solver
    for (int ii = 0; ii < n; ii++) {
      storY[ii + 1] =
          storY[ii] - (alpha / m) * storY[ii] * h +
          (sqrt(2 * alpha * KB * Temp) / m) * sqrt(h) * storeGaussian[ii];
      storX[ii + 1] = storX[ii] + storY[ii] * h;
      holder[ii + 1] =
          pow(storX[ii + 1], 2); // Finds the displacement squared
      t = t + h;
      storT.push_back(t);
    }
    // Updates the Main Storage
    for (int z = 0; z < storX.size(); z++) {
      mainstore[z] = mainstore[z] + holder[z];
    }
  }
  // Average over the number of particles
  for (int z = 0; z < storX.size(); z++) {
    mainstore[z] = mainstore[z] / (npart);
  }
  // Outputs the data
  ofstream fout("LangevinEulerTest.txt");
  for (int jj = 0; jj < storX.size(); jj++) {
    fout << storT[jj] << '\t' << mainstore[jj] << '\t' << storX[jj] << endl;
  }
  return 0;
}
As you can see, npart is the variable that I change to vary the number of particles. After each iteration I clear my storage vectors (storX, storY, ...), so on paper the number of particles should not affect memory use. I am only asking the program to repeat the loop more times and add the results onto the main storage vector mainstore. I am running my code on a computer with 4 GB of RAM.
I would greatly appreciate it if anyone could point out my errors in logic or suggest improvements.
Edit: Currently the number of particles is set to npart = 1000.
When I try to ramp it up to, say, npart = 20000 or npart = 50000, it gives me memory errors.
Edit 2: I've edited the code to allocate an extra index in each of the storage vectors, but that does not seem to fix the memory overflow.
There is an out-of-bounds access in the solver part. storY has size n and you access index ii+1, where ii goes up to n-1. So in the code as first posted, storY has size 5000: valid indices are 0 to 4999 inclusive, but you try to access index 5000. The same applies to storX, holder and mainstore.
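To make the indexing problem concrete, here is a minimal sketch. The helper name solverWritesStayInBounds is made up for illustration; it repeats the solver's "write to ii + 1" pattern but uses at(), which throws std::out_of_range instead of silently writing past the end of the array:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of the question's code): runs the same
// "write to ii + 1" loop as the solver on a vector of the given size and
// reports whether every access stayed in bounds.
bool solverWritesStayInBounds(std::size_t vectorSize, std::size_t steps) {
    assert(vectorSize > 0);  // the loop reads element 0 first
    std::vector<double> stor(vectorSize, 0.0);
    try {
        for (std::size_t ii = 0; ii < steps; ++ii) {
            // at() checks the index; operator[] would not.
            stor.at(ii + 1) = stor.at(ii) + 1.0;
        }
    } catch (const std::out_of_range&) {
        return false;
    }
    return true;
}
```

With 5000 steps, a vector of size 5000 fails the check and a vector of size 5001 passes it, which is why the storage vectors need n + 1 elements.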
Also, storeGaussian is not cleared before new values are added, so it grows by n on every pass of the npart loop. The solver only reads its first n values anyway.
Please note that vector::clear removes all elements from the vector but does not necessarily change the vector's capacity (i.e. its storage array); see the documentation.
This won't cause the problem here, because the same array is reused in the next runs, but it is something to be aware of when using vectors.
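Both points can be checked with a small sketch. The helper names finalSize and clearKeepsCapacity are assumptions for illustration and do not come from the question's code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `perPass` values on each of `passes` passes and
// returns the final size. With clearEachPass == false the vector keeps
// growing, which is what happens to storeGaussian in the question.
std::size_t finalSize(std::size_t passes, std::size_t perPass, bool clearEachPass) {
    std::vector<double> store;
    for (std::size_t p = 0; p < passes; ++p) {
        if (clearEachPass) store.clear();
        for (std::size_t i = 0; i < perPass; ++i) store.push_back(0.0);
    }
    return store.size();
}

// clear() drops the elements but keeps the allocated storage.
bool clearKeepsCapacity() {
    std::vector<double> v(1000, 0.0);
    const std::size_t before = v.capacity();
    v.clear();
    return v.empty() && v.capacity() == before;
}
```

For scale: with n = 5000 and npart = 50000, an uncleared storeGaussian would end up holding 2.5e8 doubles, roughly 2 GB, and a growing vector briefly needs room for both the old and the new buffer when it reallocates. That would be consistent with std::bad_alloc on a 4 GB machine, although I have not run the original program to confirm it.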
I am currently implementing Clenshaw's algorithm with Rcpp to speed up my previous implementation in R. My current implementation is as follows (note that I am using RcppParallel for other functions defined in the same source file; RcppParallel is not used in this specific function, but I've left the headers in, in case this is somehow relevant):
#include <Rcpp.h>
#include <RcppParallel.h>
using namespace Rcpp;
using namespace RcppParallel;
// [[Rcpp::plugins("cpp11")]]
// [[Rcpp::export]]
NumericVector clenshawAllDerivatives(double t, int N, double Ta, double Tb, NumericVector Coeffs, int derivativesOrder) {
double tau = (2*t-Ta-Tb)/(Tb-Ta);
double helperValues[derivativesOrder + 1][3];
double scale;
    for(double i = N; i > 1; i--) {
        helperValues[0][2] = helperValues[0][1];
        helperValues[0][1] = helperValues[0][0];
        helperValues[0][0] = 2*tau*helperValues[0][1]-helperValues[0][2] + Coeffs[i - 1];
        for(int j = 1; j <= derivativesOrder; j++) {
            helperValues[j][2] = helperValues[j][1];
            helperValues[j][1] = helperValues[j][0];
            helperValues[j][0] = scale*helperValues[j-1][1] + 2*tau*helperValues[j][1] - helperValues[j][2];
            scale += 2.0;
        }
    }
    NumericVector output(derivativesOrder + 1);
    output[0] = tau*helperValues[0][0] - helperValues[0][1] + Coeffs[0];
    scale = 1.0;
    double scale2initial = ((Tb-Ta)/2 * 86400.0), scale2 = scale2initial;
    for(int j = 1; j <= derivativesOrder; j++) {
        output[j] = (scale*helperValues[j-1][0] + tau*helperValues[j][0] - helperValues[j][1]) / scale2;
        scale += 1.0;
        scale2 = scale2 * scale2initial;
    }
    return output;
}
An example of application of the function, with example input values:
clenshawAllDerivatives(59568.5, 11, 59568, 59584, c(-1.281626e+06, -4.069960e+03, 2.725817e+01, -9.715712e-02, -1.115373e-03, -5.121949e-04, -9.068147e-05, -6.829206e-06, 1.657523e-07 , 1.406006e-07, 2.273966e-08), 1)
When run multiple times, this most often returns the expected correct output of c(-1.277790e+06, -6.037188e-03). However, it sometimes returns wrong values instead, typically very large numbers.
Any help to identify the cause of this unexpected behavior would be greatly appreciated!
Suppose we need to generate a very long harmonic signal, ideally infinitely long. At first glance, the solution seems trivial:
float t = 0;
while (runned)
{
    float v = sinf(w * t);
    t += dt;
}
Unfortunately, this does not work. For t >> dt, the limited precision of float produces incorrect values. Fortunately, sin(2*PI*n + x) = sin(x) for any integer n, so the example is easy to modify into an "infinite" version:
float t = 0;
float tau = 2 * M_PI / w;
while (runned)
{
    float v = sinf(w * t);
    t += dt;
    if (t > tau) t -= tau;
}
For one physical simulation, I needed an infinite signal that is a sum of harmonic signals, like this:
float getSignal(float x)
{
    float ret = 0;
    for (int i = 0; i < modNum; i++)
        ret += sin(w[i] * x);
    return ret;
}
float t = 0;
while (runned)
{
    float v = getSignal(t);
    t += dt;
}
In this form, the code does not work correctly for large t, for the same reasons as Sample1. The question is: how do I get an "infinite" implementation of the Sample3 algorithm? I assume the solution should look like Sample2. A very important note: in general, the w[i] are arbitrary and not harmonics, that is, the frequencies are not all multiples of some base frequency, so I can't find a common tau. Using types with greater precision (double, long double) is not allowed.
Thanks for your advice!
You can choose an arbitrary tau and store the phase remainder for each mode when subtracting it from t (as @Damien suggested in the comments).
Also, representing the time as t = dt * it, where it is an integer, can improve numerical stability (I think).
Maybe something like this:
int ndt = 1000; // accumulate phase every 1000 steps for example
float tau = dt * ndt;
std::vector<float> phases(modNum, 0.0f);
int it = 0;
float t = 0.0f;
while (runned)
{
    t = dt * it;
    float v = 0.0f;
    for (int i = 0; i < modNum; i++)
    {
        v += sinf(w[i] * t + phases[i]);
    }
    if (++it >= ndt)
    {
        it = 0;
        for (int i = 0; i < modNum; ++i)
        {
            phases[i] = fmod(w[i] * tau + phases[i], 2 * M_PI);
        }
    }
}
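On the point about t = dt * it: the following is a small sketch, assuming IEEE-754 single-precision float arithmetic, that compares the two ways of advancing time. The function names accumulatedTime and countedTime are illustrative only.

```cpp
#include <cassert>

// Time reached by adding dt to a float accumulator `steps` times.
// Each addition rounds, and the rounding errors pile up as t grows.
float accumulatedTime(float dt, int steps) {
    assert(steps >= 0);
    float t = 0.0f;
    for (int i = 0; i < steps; ++i) t += dt;
    return t;
}

// Time computed from an integer step counter, as in t = dt * it.
// Only one multiplication is rounded, whatever the step count.
float countedTime(float dt, int steps) {
    assert(steps >= 0);
    return dt * static_cast<float>(steps);
}
```

For dt = 0.001f and a million steps, the counted version should land much closer to 1000 than the accumulated one. Resetting it to zero every ndt steps, as in the loop above, additionally keeps the counter and t small.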
I am trying to build a sparse matrix using the Eigen or Armadillo library in C++ to solve a system of linear equations Ax = b. A is the n-by-n coefficient matrix and b is the right-hand-side vector of dimension n.
The sparse matrix A looks like this (see the figure).
I had a look through the Eigen documentation, but I am having trouble defining and filling the sparse matrix in C++.
Could you please give me example code that defines a sparse matrix and fills in its values using the Eigen library in C++?
Consider for example a simple sparse matrix A:
1 2 0 0
0 3 0 0
0 0 4 5
0 0 6 7
int main()
{
    SparseMatrix<double> A;
    // fill the A matrix ????
    VectorXd b, x;
    SparseCholesky<SparseMatrix<double> > solver;
    x = solver.solve(b);
    return 0;
}
The sparse matrix could be filled with the values mentioned in the post by using the .coeffRef() member function, as shown in this routine:
SparseMatrix<double> fillMatrix() {
    int N = 4;
    int M = 4;
    SparseMatrix<double> m1(N,M);
    m1.reserve(VectorXi::Constant(M, 4)); // 4: estimated number of non-zero entries per column
    m1.coeffRef(0,0) = 1;
    m1.coeffRef(0,1) = 2.;
    m1.coeffRef(1,1) = 3.;
    m1.coeffRef(2,2) = 4.;
    m1.coeffRef(2,3) = 5.;
    m1.coeffRef(3,2) = 6.;
    m1.coeffRef(3,3) = 7.;
    return m1;
}
However, the SparseCholesky module (SimplicialCholesky<SparseMatrix<double> >) won't work in this case because the matrix is not Hermitian. The system could be solved with an LU or BiCGStab solver instead. Also note that the sizes of x and b need to be defined:
VectorXd b(A.rows()), x(A.cols());
For larger sparse matrices you may also want to look at the .reserve() function, which allocates memory before the elements are filled in. It takes an estimate of the number of non-zero entries per column (or per row, depending on the storage order; the default is column-major). In the example above that estimate is 4, although reserving makes little sense for such a small matrix. The documentation states that it is preferable to overestimate the number of non-zeros per column.
Since this question also asks about Armadillo, here is the corresponding Armadillo-based code. It is best to use Armadillo version 9.100 or later, and to link with SuperLU.
#include <armadillo>
using namespace arma;
int main()
{
    sp_mat A(4,4); // don't need to explicitly reserve the number of non-zeros
    // fill with direct element access
    A(0,0) = 1.0;
    A(0,1) = 2.0;
    A(1,1) = 3.0;
    A(2,2) = 4.0;
    A(2,3) = 5.0;
    A(3,2) = 6.0;
    A(3,3) = 7.0; // etc
    // or load the sparse matrix from a text file with the data stored in coord format
    sp_mat AA;
    AA.load("my_sparse_matrix.txt", coord_ascii);
    vec b; // ... fill b here ...
    vec x = spsolve(A,b); // solve sparse system
    return 0;
}
See also the documentation for SpMat, element access, .load(), spsolve().
The coord file format is simple. It stores only the non-zero values.
Each line contains:
row col value
The row and column counts start at zero. Example:
0 0 1.0
0 1 2.0
1 1 3.0
2 2 4.0
2 3 5.0
3 2 6.0
3 3 7.0
1000 2000 9.0
Values not explicitly listed are assumed to be zero.
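To illustrate the format itself, here is a standard-library-only sketch of a reader. This is not how Armadillo loads the file: CoordMatrix, parseCoord and valueAt are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical container for illustration: maps (row, col) to a stored value.
using CoordMatrix = std::map<std::pair<std::size_t, std::size_t>, double>;

// Parses text in which each line holds "row col value" with zero-based indices.
CoordMatrix parseCoord(const std::string& text) {
    CoordMatrix m;
    std::istringstream in(text);
    std::size_t row = 0, col = 0;
    double value = 0.0;
    while (in >> row >> col >> value) {
        m[std::make_pair(row, col)] = value;
    }
    return m;
}

// Entries that were never listed read back as zero.
double valueAt(const CoordMatrix& m, std::size_t row, std::size_t col) {
    const auto it = m.find(std::make_pair(row, col));
    return it == m.end() ? 0.0 : it->second;
}
```

Only the listed entries take up storage, so a line such as "1000 2000 9.0" costs one entry rather than a dense 1001-by-2001 array.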
#include <vector>
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <Eigen/Core>
#include <cstdlib>
using namespace Eigen;
using namespace std;
int main()
{
    double L = 5; // Length
    const int N = 120; // No of cells
    double L_cell = L / N;
    double k = 100; // Thermal Conductivity
    double T_A = 100.;
    double T_B = 200.;
    double S = 1000.;
    Vector<double, N> d, D, A, aL, aR, aP, S_u, S_p;
    // Cell-centre coordinates
    vector<double> xp;
    xp.push_back((0 + L_cell) / 2.0);
    double xm = xp[0];
    for (int i = 0; i < N - 1; i++)
    {
        xm = xm + L_cell;
        xp.push_back(xm);
    }
    for (int i = 0; i < N; i++)
    {
        A(i) = .1;
        d(i) = L_cell;
        D(i) = k / d(i);
    }
    // First cell (boundary at T_A)
    aL(0) = 0;
    aR(0) = D(0) * A(0);
    S_p(0) = -2 * D(0) * A(0);
    aP(0) = aL(0) + aR(0) - S_p(0);
    S_u(0) = 2 * D(0) * A(0) * T_A + S * L_cell * A(0);
    // Interior cells
    for (int i = 1; i < N - 1; i++)
    {
        aL(i) = D(i) * A(i);
        aR(i) = D(i) * A(i);
        S_p(i) = 0;
        aP(i) = aL(i) + aR(i) - S_p(i);
        S_u(i) = S * A(i) * L_cell;
    }
    // Last cell (boundary at T_B)
    aL(N - 1) = D(N - 1) * A(N - 1);
    aR(N - 1) = 0;
    S_p(N - 1) = -2 * D(N - 1) * A(N - 1);
    aP(N - 1) = aL(N - 1) + aR(N - 1) - S_p(N - 1);
    S_u(N - 1) = 2 * D(N - 1) * A(N - 1) * T_B + S * L_cell * A(N - 1);
    typedef Eigen::Triplet<double> T;
    std::vector<T> tripletList;
    tripletList.reserve(N * 3);
    Matrix<double, N, 3> v; // v is declared here
    v << (-1) * aL, aP, (-1) * aR;
    // One triplet per non-zero entry of the tridiagonal matrix
    for (int i = 0, j = 0; i < N && j < N; i++, j++)
    {
        tripletList.push_back(T(i, j, v(i, 1)));
        if (i + 1 < N && j + 1 < N)
        {
            tripletList.push_back(T(i + 1, j, v(i + 1, 0)));
            tripletList.push_back(T(i, j + 1, v(i, 2)));
        }
    }
    SparseMatrix<double> coeff(N, N);
    coeff.setFromTriplets(tripletList.begin(), tripletList.end());
    SimplicialLDLT<SparseMatrix<double> > solver;
    solver.compute(coeff);
    if (solver.info() != Success) {
        cout << "decomposition failed" << endl;
    }
    Vector<double, N> temparature;
    temparature = solver.solve(S_u);
    if (solver.info() != Success)
    {
        cout << "Solving failed" << endl;
    }
    // Collect the N cell values plus the two boundary values for output
    vector<double> Te = {}, x = {};
    Te.push_back(T_A);
    x.push_back(0);
    for (int i = 0; i < N; i++)
    {
        Te.push_back(temparature(i));
        x.push_back(xp[i]);
    }
    Te.push_back(T_B);
    x.push_back(L);
    for (int i = 0; i < N + 2; i++)
    {
        cout << x[i] << " " << Te[i] << endl;
    }
    return 0;
}
The code above is a complete solution to a numerical problem that uses SparseMatrix. Look at the matrix v: it holds the values of all the non-zero elements of the coeff matrix, which at that point has yet to be defined. The loop that follows makes a series of tripletList.push_back(...) calls, adding for each non-zero element of coeff a triplet consisting of the row index, the column index and the corresponding value taken from v. After that, declare a sparse matrix coeff of the appropriate size and use the method setFromTriplets (documentation) to set its non-zero elements from the triplets in tripletList.
I need to evaluate a double integral where the upper bound of the inner integral is variable:
integral from -5 to 5 of ( integral from 0 to y of f(x) dx ) dy.
I'm stuck on the outer loop, which depends on the inner loop. My code runs for a really long time but returns zero.
How can I calculate an integral with variable limits?
First I created a function doubleIntegrate. It starts by setting up the arrays of coefficients for the trapezoidal rule.
double NumericIntegrationDouble::doubleIntegrate(double (*doubleFunc
(const double &x), double dy, const double &innerLowBound, const double
double innerValue = 0.0;
double outerValue = 0.0;
// arrays which store function values for the inner (X) and the outer (Y) integration loop
// vector filled with coefficients for the inner loop (trapezoidal rule)
std::vector<double> vecCoeffsX(numberOfIntervalsDouble+1, 2);
vecCoeffsX[0] = 1; // first coeff = 1
vecCoeffsX[vecCoeffsX.size()-1] = 1; // last coeff = 1
std::vector<double> funcValuesX(numberOfIntervalsDouble+1);
// vector filled with coefficients for the outer loop (trapezoidal rule)
std::vector<double> vecCoeffsY(numberOfIntervalsDouble+1, 2);
vecCoeffsY[0] = 1; // same as above
vecCoeffsY[vecCoeffsY.size()-1] = 1; // same as above
std::vector<double> funcValuesY(numberOfIntervalsDouble+1);
// Then I created a loop within a loop, where dx and dy stand for the integration step sizes. The variables xi and yi stand for the current x and y values.
// outer integration loop dy
for(int i=0; i<=numberOfIntervalsDouble; i++)
double yi = outerLowBound + dy*i;
funcValuesY[i] = (*doubleFunc)(yi);
// inner integration loop dx
for(int j=0; j<=numberOfIntervalsDouble; j++)
double dx = abs(yi - innerLowBound) / (double)numberOfIntervalsDouble;
double xi = innerLowBound + j*dx;
funcValuesX[j] = (*doubleFunc)(xi);
double multValueX = std::inner_product(vecCoeffsX.begin(), vecCoeffsX.end(), funcValuesX.begin(), 0.0);
double innerValue = 0.5 * dx * multValueX;
suminnerValue = suminnerValue + innerValue;
//auto multValueY = std::inner_product(vecCoeffsY.begin(), vecCoeffsY.end(), funcValuesY.begin(), 0.0);
outerValue = 0.5 * dy * suminnerValue;
return outerValue;
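One way to structure this, offered as a sketch rather than a drop-in fix for the class above, is to write a one-dimensional trapezoidal rule once and then treat the inner integral as a function of y. The names trapezoid and doubleIntegrateVariableLimit are assumptions for illustration. The step is deliberately signed (no abs), so an upper limit below the lower limit gives a negative contribution.

```cpp
#include <cassert>
#include <functional>

// Composite trapezoidal rule for f on [a, b] with n intervals.
// The step is signed, so b < a yields the negated integral.
double trapezoid(const std::function<double(double)>& f, double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) {
        sum += f(a + i * h);
    }
    return h * sum;
}

// Integral over y in [yLow, yHigh] of ( integral over x in [xLow, y] of f(x) dx ) dy.
double doubleIntegrateVariableLimit(const std::function<double(double)>& f,
                                    double xLow, double yLow, double yHigh, int n) {
    // The inner integral is itself a function of y, so wrap it in a lambda
    // and hand that to the same one-dimensional rule.
    const auto inner = [&](double y) { return trapezoid(f, xLow, y, n); };
    return trapezoid(inner, yLow, yHigh, n);
}
```

As a check, for f(x) = x with the limits from the question the exact value is 125/3, and for f(x) = 1 it is 0 because the contributions from y < 0 and y > 0 cancel. The cost is about n * n function evaluations, so n does not need to be huge.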
I am converting equations to C++. Is this correct for a running standard deviation?
this->runningStandardDeviation = (this->sumOfProcessedSquaredSamples - sumSquaredDividedBySampleCount) / (sampleCount - 1);
Here is the full function:
void BM_Functions::standardDeviationForRunningSamples (float samples [], int sampleCount)
{
    // update the running process samples count
    this->totalSamplesProcessed += sampleCount;
    // get the mean of the samples
    double mean = meanForSamples(samples, sampleCount);
    // sum the deviations
    // sum the squared deviations
    for (int i = 0; i < sampleCount; i++)
    {
        // update the deviation sum of processed samples
        double deviation = samples[i] - mean;
        this->sumOfProcessedSamples += deviation;
        // update the squared deviations sum
        double deviationSquared = deviation * deviation;
        this->sumOfProcessedSquaredSamples += deviationSquared;
    }
    // get the sum squared
    double sumSquared = this->sumOfProcessedSamples * this->sumOfProcessedSamples;
    // get the sum/N
    double sumSquaredDividedBySampleCount = sumSquared / this->totalSamplesProcessed;
    this->runningStandardDeviation = sqrt((this->sumOfProcessedSquaredSamples - sumSquaredDividedBySampleCount) / (sampleCount - 1));
}
A numerically stable and efficient algorithm for computing the running mean and variance/SD is Welford's algorithm.
One C++ implementation would be:
std::pair<double,double> getMeanVariance(const std::vector<double>& vec) {
    double mean = 0, M2 = 0, variance = 0;
    size_t n = vec.size();
    for(size_t i = 0; i < n; ++i) {
        double delta = vec[i] - mean;
        mean += delta / (i + 1);
        M2 += delta * (vec[i] - mean);
        variance = M2 / (i + 1);
        if (i >= 2) {
            // <-- You can use the running mean and variance here
        }
    }
    return std::make_pair(mean, variance);
}
Note: to get the SD, just take sqrt(variance)
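Since the question processes samples in batches, the same update can be kept in a small accumulator that persists between calls. This is a sketch under my own assumptions: RunningStats and its members are hypothetical names, and it reports the sample variance (dividing by count - 1) rather than the population variance computed above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical accumulator (not from the question): Welford's update applied
// one sample at a time, so batches can be fed in as they arrive.
struct RunningStats {
    std::size_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the current mean

    void add(double x) {
        ++count;
        const double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);
    }

    void addBatch(const float* samples, int sampleCount) {
        assert(samples != nullptr || sampleCount == 0);
        for (int i = 0; i < sampleCount; ++i) add(samples[i]);
    }

    // Sample variance (divides by count - 1); returns 0 until two samples exist.
    double sampleVariance() const {
        return count > 1 ? m2 / static_cast<double>(count - 1) : 0.0;
    }

    double sampleStandardDeviation() const { return std::sqrt(sampleVariance()); }
};
```

Feeding {2, 4, 4, 4} and then {5, 5, 7, 9} as two batches gives the same mean and variance as feeding all eight values at once, because the state carried between calls is just count, mean and m2.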
You may want to check that sampleCount is large enough (a value of 1 would cause a division by zero).
Make sure that the variables have a suitable data type (floating point).
Otherwise this looks correct...