C++ - Improve computation time for complex number math

I am trying to calculate complex numbers for a 2D array in C++. The code is running very slowly and I have narrowed down the main cause to be the exp function (the program runs quickly when I comment out that line, even though I have 4 nested loops).
int main() {
typedef vector< complex<double> > complexVect;
typedef vector<double> doubleVect;
const int SIZE = 256;
vector<doubleVect> phi_w(SIZE, doubleVect(SIZE));
vector<complexVect> phi_k(SIZE, complexVect(SIZE));
complex<double> i (0, 1), cmplx (0, 0);
complex<double> temp;
int x, y, t, k, w;
double dk = 2.0*M_PI / (SIZE-1);
double dt = M_PI / (SIZE-1);
int xPos, yPos;
double arg, arg2, arg4;
complex<double> arg3;
double angle;
vector<complexVect> newImg(SIZE, complexVect(SIZE));
for (x = 0; x < SIZE; ++x) {
xPos = -127 + x;
for (y = 0; y < SIZE; ++y) {
yPos = -127 + y;
for (t = 0; t < SIZE; ++t) {
temp = cmplx;
angle = dt * t;
arg = xPos * cos(angle) + yPos * sin(angle);
for (k = 0; k < SIZE; ++k) {
arg2 = -M_PI + dk*k;
arg3 = exp(-i * arg * arg2);
arg4 = abs(arg) * M_PI / (abs(arg) + M_PI);
temp = temp + arg4 * arg3 * phi_k[k][t];
}
}
newImg[y][x] = temp;
}
}
}
Is there a way I can improve computation time? I have tried using the following helper function but it doesn't noticeably help.
complex<double> complexexp(double arg) {
complex<double> temp (sin(arg), cos(arg));
return temp;
}
I am using clang++ to compile my code
edit: I think the problem is the fact that I'm trying to calculate complex numbers. Would it be faster if I just used Euler's formula to calculate the real and imaginary parts in separate arrays and not have to deal with the complex class?

maybe this will work for you:
http://martin.ankerl.com/2007/02/11/optimized-exponential-functions-for-java/

I've had a look with callgrind. The only marginal improvement (~1.3% with size = 50) I could find was to change:
temp = temp + arg4 * arg3 * phi_k[k][t];
to
temp += arg4 * arg3 * phi_k[k][t];

The most costly function calls were sin()/cos(). I suspect that calling exp() with a complex number argument calls those functions in the background.
Computing these functions at full precision is slow, and there doesn't seem to be a way around that. However, you could trade precision for speed, which seems to be what game developers do; see: sin and cos are slow, is there an alternative?
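One way to cut the number of sin()/cos() calls without giving up much precision is to notice that, inside the k loop of the question, the exponent changes by a constant amount per step: exp(-i*arg*(-pi + dk*k)) = exp(i*arg*pi) * exp(-i*arg*dk)^k. The following is a minimal sketch of that idea, not a drop-in replacement; the names sumDirect and sumRecurrence and the alias cd are made up for illustration, and phi stands for one column phi_k[..][t]. The factor arg4 does not depend on k either, so it could be applied once to the finished sum.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;  // hypothetical alias, for brevity

// Direct form: one std::exp call per k, as in the question's inner loop.
cd sumDirect(double arg, double dk, const std::vector<cd>& phi) {
    const double pi = std::acos(-1.0);
    const cd i(0.0, 1.0);
    cd sum(0.0, 0.0);
    for (std::size_t k = 0; k < phi.size(); ++k) {
        const double arg2 = -pi + dk * static_cast<double>(k);
        sum += std::exp(-i * arg * arg2) * phi[k];
    }
    return sum;
}

// Recurrence form: two std::polar calls in total, then one complex
// multiplication per k to rotate the phase by a fixed step.
cd sumRecurrence(double arg, double dk, const std::vector<cd>& phi) {
    const double pi = std::acos(-1.0);
    cd phase = std::polar(1.0, arg * pi);        // exp(-i*arg*(-pi)), the k = 0 value
    const cd step = std::polar(1.0, -arg * dk);  // exp(-i*arg*dk), factor from k to k+1
    cd sum(0.0, 0.0);
    for (const cd& p : phi) {
        sum += phase * p;
        phase *= step;
    }
    return sum;
}
```

Rounding error in phase grows slowly with the number of steps, so for very long loops it may be worth recomputing phase with std::polar every few hundred iterations. Whether this is actually faster should be measured with the real compiler flags.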

You can define the number e as a constant and use the std::pow() function.


Stack around the variable 'Yarray' was corrupted

When I declare an array to store the Y values of each coordinate, define its values, and then pass each element to a rounding function, I get the error "Run-Time Check Failure #2 - Stack around the variable 'Yarray' was corrupted". The output is mostly what I expect, but I'm wondering why this is happening and whether I can fix it. Cheers.
void EquationElement::getPolynomial(int * values)
{
//Takes in coefficients to calculate Y values for a polynomial//
double size = 40;
double step = 1;
int Yarray[40];
int third = *values;
int second = *(values + 1);
int first = *(values + 2);
int constant = *(values + 3);
double x, Yvalue;
for (int i = 0; i < size + size + 1; ++i) {
x = (i - (size));
x = x * step;
double Y = (third *(x*x*x)) + (second *(x*x)) + (first * (x));
Yvalue = Y / step;
Yarray[i] = int(round(Yvalue)); //<-MAIN ISSUE HERE?//
cout << Yarray[i] << endl;
}
}
double EquationElement::round(double number)
{
return number < 0.0 ? ceil(number - 0.5) : floor(number + 0.5);
// if n<0 then ceil(n-0.5) else if >0 floor(n+0.5) ceil to round up floor to round down
}
// values could be null, you should check that
// if instead of int* values, you took std::vector<int>& values
// You know besides the values, the quantity of them
void EquationElement::getPolynomial(const int* values)
{
//Takes in coefficients to calculate Y values for a polynomial//
static const int size = 40; // No reason for size to be double
static const int step = 1; // No reason for step to be double
int Yarray[2*size+1]{}; // 40 elements are not enough; {} zero-initializes the array (C++11 onwards)
int third = values[0];
int second = values[1]; // avoid pointer arithmetic
int first = values[2]; // [] will work with std::vector and is clearer
int constant = values[3]; // values must point to at least 4 numbers; that responsibility lies with the caller
for (int i = 0; i < 2*size + 1; ++i) {
double x = (i - (size)) * step; // x goes from -40 to 40
double Y = (third *(x*x*x)) + (second *(x*x)) + (first * (x)) + constant;
// Seems unnatural that x^3 is values[0] and x^1 is values[2], with the constant at values[3]
double Yvalue= Y / step; // as x and Yvalue will not be used outside the loop, no need to declare them there
Yarray[i] = int(round(Yvalue)); //<-MAIN ISSUE HERE?//
// Yep, big issue, i goes from 0 to size*2; you need size+size+1 elements
cout << Yarray[i] << endl;
}
}
Instead of
void EquationElement::getPolynomial(const int* values)
You could also declare
void EquationElement::getPolynomial(const int (&values)[4])
This means the function must now be called with an array of exactly 4 ints (passed by reference); no more and no less.
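As a small illustration of the array-reference signature, here is a sketch with a free function; evalCubic is a made-up name, not part of the original class, and the coefficient order (cubic term first, constant last) follows the code above.

```cpp
// Evaluates c[0]*x^3 + c[1]*x^2 + c[2]*x + c[3].
// Only an int array of exactly 4 elements binds to the parameter;
// passing an int[3], an int[5] or a plain int* does not compile.
double evalCubic(const int (&c)[4], double x) {
    return c[0] * x * x * x + c[1] * x * x + c[2] * x + c[3];
}
```

Called as `const int coeffs[4] = {1, 0, -2, 5}; evalCubic(coeffs, 2.0);`, the size check happens at compile time, so the function no longer relies on the caller to pass enough values.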
Also, with std::vector:
void EquationElement::getPolynomial(const std::vector<int>& values)
{
//Takes in coefficients to calculate Y values for a polynomial//
static const int size = 40; // No reason for size to be double
static const int step = 1; // No reason for step to be double
std::vector<int> Yarray;
Yarray.reserve(2*size+1); // This is just an optimization. Yarray *can* grow above this limit.
int third = values[0];
int second = values[1]; // avoid pointer arithmetic
int first = values[2]; // [] will work with std::vector and is clearer
int constant = values[3]; // values must hold at least 4 numbers; that responsibility lies with the caller
for (int i = 0; i < 2*size + 1; ++i) {
double x = (i - (size)) * step; // x goes from -40 to 40
double Y = (third *(x*x*x)) + (second *(x*x)) + (first * (x)) + constant;
// Seems unnatural that x^3 is values[0] and x^1 is values[2], with the constant at values[3]
double Yvalue= Y / step; // as x and Yvalue will not be used outside the loop, no need to declare them there
Yarray.push_back(int(round(Yvalue)));
cout << Yarray.back() << endl;
}
}

Limited float precision and infinitely harmonic signal generation problem

Suppose we need to generate a very long harmonic signal, ideally infinitely long. At first glance, the solution seems trivial:
Sample1:
float t = 0;
while (runned)
{
float v = sinf(w * t);
t += dt;
}
Unfortunately, this solution does not work: once t >> dt, limited float precision means that t += dt no longer advances t accurately, so incorrect values are produced. Fortunately, sin(2*PI*n + x) = sin(x) for any integer n, so it is not difficult to modify the example into an "infinite" version:
Sample2:
float t = 0;
float tau = 2 * M_PI / w;
while (runned)
{
float v = sinf(w * t);
t += dt;
if (t > tau) t -= tau;
}
For one physical simulation, I needed to get an infinite signal, which is the sum of harmonic signals, like that:
Sample3:
float getSignal(float x)
{
float ret = 0;
for (int i = 0; i < modNum; i++)
ret += sin(w[i] * x);
return ret;
}
float t = 0;
while (runned)
{
float v = getSignal(t);
t += dt;
}
In this form, the code does not work correctly for large t, for the same reason as Sample1. The question is: how do I get an "infinite" implementation of the Sample3 algorithm? I assume the solution should look like Sample2. A very important note: generally speaking, the w[i] are arbitrary and not harmonics, that is, the frequencies are not all multiples of some base frequency, so I can't find a common tau. Using types with greater precision (double, long double) is not allowed.
Thanks for your advice!
You can choose an arbitrary tau and, whenever you subtract it from t, store the phase remainder for each mode (as @Damien suggested in the comments).
Also, representing the time as t = dt * it, where it is an integer, can improve numerical stability (I think).
Maybe something like this:
int ndt = 1000; // accumulate phase every 1000 steps for example
float tau = dt * ndt;
std::vector<float> phases(modNum, 0.0f);
int it = 0;
float t = 0.0f;
while (runned)
{
t = dt * it;
float v = 0.0f;
for (int i = 0; i < modNum; i++)
{
v += sinf(w[i] * t + phases[i]);
}
if (++it >= ndt)
{
it = 0;
for (int i = 0; i < modNum; ++i)
{
phases[i] = fmod(w[i] * tau + phases[i], 2 * M_PI);
}
}
}
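A variant of the same idea, sketched here under assumptions, is to drop t altogether and keep one phase per mode that is advanced by w[i] * dt each step and wrapped back below 2*pi. The class name HarmonicSum and its members are hypothetical. The sketch assumes 0 <= w[i] * dt < 2*pi, and it has a cost: the rounded float increment w[i] * dt is added every step, so each mode's frequency is slightly off, even though the phase itself never grows large.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

class HarmonicSum {
public:
    HarmonicSum(const std::vector<float>& w, float dt) : phase_(w.size(), 0.0f) {
        for (float wi : w) dphi_.push_back(wi * dt);  // per-step phase increment
    }

    // Returns the current sample, then advances every phase by one time step.
    float next() {
        const float twoPi = 6.28318530718f;
        float v = 0.0f;
        for (std::size_t i = 0; i < phase_.size(); ++i) {
            v += std::sin(phase_[i]);
            phase_[i] += dphi_[i];
            if (phase_[i] >= twoPi) phase_[i] -= twoPi;  // keep the phase small
        }
        return v;
    }

    const std::vector<float>& phases() const { return phase_; }

private:
    std::vector<float> phase_;  // current phase of each mode, in [0, 2*pi)
    std::vector<float> dphi_;   // w[i] * dt for each mode
};
```

With this layout the loop body of Sample3 would reduce to `float v = signal.next();` for some `HarmonicSum signal(w, dt);`.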

Memory Overflow? std::bad_alloc

I have a program that solves 1D Brownian motion using Euler's method.
Since it is a stochastic process, I want to average it over many particles. But I find that as I ramp up the number of particles, it overloads and I get the std::bad_alloc error, which I understand is a memory error.
Here is my full code
#include <iostream>
#include <vector>
#include <fstream>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <ctime>
using namespace std;
// Box-Muller Method to generate gaussian numbers
double generateGaussianNoise(double mu, double sigma) {
const double epsilon = std::numeric_limits<double>::min();
const double tau = 2.0 * 3.14159265358979323846;
static double z0, z1;
static bool generate;
generate = !generate;
if (!generate) return z1 * sigma + mu;
double u1, u2;
do {
u1 = rand() * (1.0 / RAND_MAX);
u2 = rand() * (1.0 / RAND_MAX);
} while (u1 <= epsilon);
z0 = sqrt(-2.0 * log(u1)) * cos(tau * u2);
z1 = sqrt(-2.0 * log(u1)) * sin(tau * u2);
return z0 * sigma + mu;
}
int main() {
// Initialize Variables
double gg; // Gaussian Number Picked from distribution
// Integrator
double t0 = 0; // Setting the Time Window
double tf = 10;
double n = 5000; // Number of Steps
double h = (tf - t0) / n; // Time Step Size
// Set Constants
const double pii = atan(1) * 4; // pi
const double eta = 1; // viscous constant
const double m = 1; // mass
const double aa = 1; // radius
const double Temp = 30; // Temperature in Kelvins
const double KB = 1; // Boltzmann Constant
const double alpha = (6 * pii * eta * aa);
// More Constants
const double mu = 0; // Gaussian Mean
const double sigma = 1; // Gaussian Std Deviation
const double ng = n; // No. of pts to generate for Gauss distribution
const double npart = 1000; // No. of Particles
// Initial Conditions
double x0 = 0;
double y0 = 0;
double t = t0;
// Vectors
vector<double> storX; // Vector that keeps displacement values
vector<double> storY; // Vector that keeps velocity values
vector<double> storT; // Vector to store time
vector<double> storeGaussian; // Vector to store Gaussian numbers generated
vector<double> holder; // Placeholder Vector for calculation operations
vector<double> mainstore; // Vector that holds the final value desired
storT.push_back(t0);
// Prepares mainstore
for (int z = 0; z < (n+1); z++) {
mainstore.push_back(0);
}
for (int NN = 0; NN < npart; NN++) {
holder.clear();
storX.clear();
storY.clear();
storT.clear();
storT.push_back(0);
// Prepares holder
for (int z = 0; z < (n+1); z++) {
holder.push_back(0);
storX.push_back(0);
storY.push_back(0);
}
// Gaussian Generator
srand(time(NULL));
for (double iiii = 0; iiii < ng; iiii++) {
gg = generateGaussianNoise(0, 1); // generateGaussianNoise(mu,sigma)
storeGaussian.push_back(gg);
}
// Solver
for (int ii = 0; ii < n; ii++) {
storY[ii + 1] =
storY[ii] - (alpha / m) * storY[ii] * h +
(sqrt(2 * alpha * KB * Temp) / m) * sqrt(h) * storeGaussian[ii];
storX[ii + 1] = storX[ii] + storY[ii] * h;
holder[ii + 1] =
pow(storX[ii + 1], 2); // Finds the displacement squared
t = t + h;
storT.push_back(t);
}
// Updates the Main Storage
for (int z = 0; z < storX.size(); z++) {
mainstore[z] = mainstore[z] + holder[z];
}
}
// Average over the number of particles
for (int z = 0; z < storX.size(); z++) {
mainstore[z] = mainstore[z] / (npart);
}
// Outputs the data
ofstream fout("LangevinEulerTest.txt");
for (int jj = 0; jj < storX.size(); jj++) {
fout << storT[jj] << '\t' << mainstore[jj] << '\t' << storX[jj] << endl;
}
return 0;
}
As you can see, npart is the variable that I change to vary the number of particles. But after each iteration, I do clear my storage vectors like storX, storY... So on paper, the number of particles should not affect memory? I am only asking the program to repeat the loop many more times and add onto the main storage vector mainstore. I am running my code on a computer with 4 GB of RAM.
Would greatly appreciate it if anyone could point out my errors in logic or suggest improvements.
Edit: Currently the number of particles is set to npart = 1000.
So when I try to ramp it up to like npart = 20000 or npart = 50000, it gives me memory errors.
Edit2 I've edited the code to allocate an extra index to each of the storage vectors. But it does not seem to fix the memory overflow
There is an out-of-bounds access in the solver part. If storY has size n, you access index ii+1 where ii goes up to n-1. So with n = 5000, storY may be accessed with indices from 0 to 4999 (inclusive), but you try to access index 5000. Note that operator[] does not check bounds, so this is undefined behaviour rather than a guaranteed error. The same goes for storX, holder and mainstore.
Also, storeGaussian is not cleared before new values are added. It grows by n for each pass of the npart loop, although the solver only ever reads its first n values. With n = 5000 and npart = 50000 that is 250 million doubles, about 2 GB, before counting the extra capacity the vector reserves as it grows, which is what exhausts the memory.
Please note that vector::clear removes all elements from the vector, but does not necessarily change the vector's capacity (i.e. its storage array); see the documentation.
This won't cause the problem here, because you'll reuse the same array in the next runs, but it's something to be aware when using vectors.
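To make the memory point concrete, here is a minimal sketch of the same Euler scheme that keeps only the current state of one particle plus the running sum, so memory use does not depend on the number of particles. It is not the asker's program: the function name meanSquaredDisplacement and its parameters are made up for illustration, kT stands for KB * Temp, and the Box-Muller/rand() generator is swapped for std::normal_distribution from <random>, seeded once.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Mean squared displacement at each of the n+1 time points,
// averaged over nPart particles. Memory use is O(n), independent of nPart.
std::vector<double> meanSquaredDisplacement(int n, int nPart, double h,
                                            double alpha, double m, double kT,
                                            unsigned seed) {
    std::mt19937 gen(seed);  // seeded once, outside all loops
    std::normal_distribution<double> gauss(0.0, 1.0);
    const double kick = std::sqrt(2.0 * alpha * kT) / m * std::sqrt(h);

    std::vector<double> msd(n + 1, 0.0);
    for (int p = 0; p < nPart; ++p) {
        double x = 0.0, v = 0.0;  // only the current state is kept
        for (int i = 0; i < n; ++i) {
            const double xNext = x + v * h;  // uses the old velocity, as in the question
            v = v - (alpha / m) * v * h + kick * gauss(gen);
            x = xNext;
            msd[i + 1] += x * x;  // accumulate displacement squared
        }
    }
    for (double& s : msd) s /= nPart;  // average over particles
    return msd;
}
```

Sizing msd to n + 1 up front also avoids the off-by-one access, since index i + 1 never exceeds n.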

pseudo code for sqrt function

I managed to get my sqrt function to run perfectly, but I'm second guessing if I wrote this code correctly based on the pseudo code I was given.
Here is the pseudo code:
x = 1
repeat 10 times: x = (x + n / x) / 2
return x.
The code I wrote,
#include <iostream>
#include <math.h>
using namespace std;
double my_sqrt_1(double n)
{
double x= 1; x<10; ++x;
return (x+n/x)/2;
}
No, your code is not following your pseudo-code. For example, you're not repeating anything in your code. You need to add a loop to do that:
#include <iostream>
#include <math.h>
using namespace std;
double my_sqrt_1(double n)
{
double x = 1;
for(int i = 0; i < 10; ++i) // repeat 10 times
x = (x+n/x)/2;
return x;
}
Let's analyze your code:
double x = 1;
// Ok, x set to 1
x < 10;
// This is true, as 1 is less than 10, but it is not used anywhere
++x;
// Increment x - now x == 2
return (x + n / x) / 2;
// return value is always (2 + n / 2) / 2
As you don't have a loop, the function always returns after a single pass, with the value (2 + n / 2) / 2.
As another approach, you can use binary search; another, rather elegant, solution is Newton's method.
Newton's method finds roots of a function by making use of the function's derivative. At each step, a new value is calculated as: x(step) = x(step-1) - f(x(step-1))/f'(x(step-1)) (see Newton's method).
This might be faster than binary search. My implementation in C++:
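double NewtonMethod(double n) { // square root of n, assumes n > 0; needs <cmath> for fabs
double eps = 0.0001; // the precision
double x0 = 10; // previous estimate
double x = x0; // current estimate, starting from the initial guess
do {
x0 = x;
double a = x0*x0-n; // f(x0) = x0^2 - n
double r = a/(2*x0); // f(x0) / f'(x0)
x = x0 - r;
} while( fabs(x-x0) > eps);
return x;
}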
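For the square root of n, Newton's method is applied to f(x) = x^2 - n. Working the step out shows that it is the same update as the pseudo code in the question:

```latex
f(x) = x^2 - n, \qquad f'(x) = 2x

x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}
        = x_k - \frac{x_k^2 - n}{2 x_k}
        = \frac{1}{2}\left(x_k + \frac{n}{x_k}\right)
```

So "x = (x + n / x) / 2" repeated 10 times is Newton's method with a fixed iteration count instead of a tolerance check.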
Since people are showing different approaches to calculating the square root, I couldn't resist ;)...
Below is the exact copy (with the original comments, but without preprocessor directives) of the inverse square root implementation from Quake III Arena:
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the...?
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}

Recursive Sine function

I'm writing a sine function that has to be recursive. I have written a sine function but am not really sure how to do it recursively. Could someone explain how to get started on this?
This is what I have so far:
/*--------------------------------------------------------------
Name: sine( double X );
Return: Function "sine" will return the
sine of X, where X is measured in radians.
--------------------------------------------------------------*/
double sine(double X)
{
double result = 0;
double term;
int k;
double lim;
k = 0;
lim = power(10, -8);
term = power(-1, k)*power(X, ((2*k) + 1)) / (factorial((2*k)+1));
result = term;
while (absolute(term) > lim)
{
k += 1;
term = power(-1, k)*power(X, ((2*k) + 1)) / (factorial((2*k)+1));
result += term;
}
return result;
}
EDIT: I used a wrapper function to solve this. Basically created another function called
double sine_rec(double X, double k)
and changed around the current code to fit in with that.
The way I would approach this would be to have another function sine(double X, int n) which takes another integer parameter - the number of terms to include in the power series approximation. Then this function could return something like [nth term in series] + sine(X, n - 1) (just remember a prior if statement to deal with n = 1).
You can eliminate the while loop by recursion in following way:
double sine(double X, int k = 0)
{
double result = 0;
double term;
double lim;
lim = power(10, -8);
term = power(-1, k)*power(X, ((2*k) + 1)) / (factorial((2*k)+1));
if (absolute(term) > lim)
{
return sine(X, k+1) + term;
}
else
{
return term;
}
}
But I cannot recommend doing this at all. (There are better solutions even to this recursion, but find them on your own)