Is this C++ FFT function equivalent to MATLAB's "fft" function?

I have always used the function fft(x) in MATLAB, where x is a vector of complex numbers. I am looking for an easy-to-use C++ function that returns complex numbers.
I have found this code: http://paulbourke.net/miscellaneous/dft/
If it is equivalent, how can I use it? Thank you for your time!
/*
   This computes an in-place complex-to-complex FFT.
   x and y are the real and imaginary arrays of 2^m points.
   dir =  1 gives forward transform
   dir = -1 gives reverse transform
*/
#include <math.h> /* for sqrt(); needed for this to compile on its own */

short FFT(short int dir, long m, double *x, double *y)
{
   long n, i, i1, j, k, i2, l, l1, l2;
   double c1, c2, tx, ty, t1, t2, u1, u2, z;

   /* Calculate the number of points */
   n = 1;
   for (i = 0; i < m; i++)
      n *= 2;

   /* Do the bit reversal */
   i2 = n >> 1;
   j = 0;
   for (i = 0; i < n - 1; i++) {
      if (i < j) {
         tx = x[i];
         ty = y[i];
         x[i] = x[j];
         y[i] = y[j];
         x[j] = tx;
         y[j] = ty;
      }
      k = i2;
      while (k <= j) {
         j -= k;
         k >>= 1;
      }
      j += k;
   }

   /* Compute the FFT */
   c1 = -1.0;
   c2 = 0.0;
   l2 = 1;
   for (l = 0; l < m; l++) {
      l1 = l2;
      l2 <<= 1;
      u1 = 1.0;
      u2 = 0.0;
      for (j = 0; j < l1; j++) {
         for (i = j; i < n; i += l2) {
            i1 = i + l1;
            t1 = u1 * x[i1] - u2 * y[i1];
            t2 = u1 * y[i1] + u2 * x[i1];
            x[i1] = x[i] - t1;
            y[i1] = y[i] - t2;
            x[i] += t1;
            y[i] += t2;
         }
         z  = u1 * c1 - u2 * c2;
         u2 = u1 * c2 + u2 * c1;
         u1 = z;
      }
      c2 = sqrt((1.0 - c1) / 2.0);
      if (dir == 1)
         c2 = -c2;
      c1 = sqrt((1.0 + c1) / 2.0);
   }

   /* Scaling for forward transform */
   if (dir == 1) {
      for (i = 0; i < n; i++) {
         x[i] /= n;
         y[i] /= n;
      }
   }

   return 1; /* TRUE in the original; returning 1 avoids needing a TRUE macro */
}

Alternative Suggestion:
I had the same problem. I used the FFT library FFTW:
http://www.fftw.org/download.html
Its performance is similar to MATLAB's.
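For reference, a minimal 1D complex-to-complex forward transform with FFTW3's documented API looks roughly like this (a sketch, not a drop-in for the question's code; compile and link with -lfftw3):

#include <fftw3.h>
#include <cstdio>

int main()
{
    const int N = 8;
    fftw_complex *in  = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);

    // Fill the input: in[i][0] is the real part, in[i][1] the imaginary part.
    for (int i = 0; i < N; i++) {
        in[i][0] = i;   // real
        in[i][1] = 0.0; // imaginary
    }

    // FFTW_FORWARD matches MATLAB's fft(); like MATLAB, FFTW does not
    // normalize the forward transform.
    fftw_plan p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p);

    for (int i = 0; i < N; i++)
        printf("%d: %f + %fi\n", i, out[i][0], out[i][1]);

    fftw_destroy_plan(p);
    fftw_free(in);
    fftw_free(out);
}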

The code looks fine at first glance. The original FFT is not a lot of code.
One feature of the FFT is that it is an in-place operation. Many higher-level bindings effectively hide that fact.
So you put your real and imaginary parts into the x and y arrays, and after executing the function you read those same arrays back for your result.
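For example, a minimal (hypothetical) driver for the FFT() function quoted above might look like this, with arbitrary sample data:

#include <cstdio>

short FFT(short int dir, long m, double *x, double *y); // as defined above

int main()
{
    // 2^3 = 8 points, so m = 3
    double re[8] = {1, 4, 7, 4, 1, 4, 7, 4}; // real parts of the input
    double im[8] = {0, 0, 0, 0, 0, 0, 0, 0}; // imaginary parts

    FFT(1, 3, re, im); // forward transform, done in place

    // re[k] + i*im[k] now holds the k-th frequency bin. Note that this
    // code scales the forward transform by 1/n, so multiply by n = 8
    // if you want to match MATLAB's fft() output directly.
    for (int k = 0; k < 8; k++)
        printf("%d: %f + %fi\n", k, re[k], im[k]);
}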
This particularly simple implementation only works for lengths that are powers of 2, like the original FFT. If your inputs have lengths other than powers of 2, you can zero-pad your signal.
Google the book Numerical Recipes together with FFT (older editions are freely available) if you want to read up on the background of the FFT. The version in that book differs from other implementations in that you have to feed in the real and imaginary parts interleaved.
What I was missing in the implementation you are quoting is the use of pi or trigonometric functions; on a closer look, the twiddle factors are generated recursively by half-angle identities (the two sqrt calls at the end of each stage), so no explicit sin/cos is needed. Note also that this code scales the forward transform (dir == 1) by 1/n, whereas MATLAB's fft leaves the forward transform unscaled and normalizes ifft instead, so the raw outputs will differ by that factor. You'll have to try it out to compare against MATLAB.

Related

Mapping points to and from a Hilbert curve

I have been trying to write a function for the Hilbert curve map and inverse map. Fortunately there was another SE post on it; the accepted answer was highly upvoted and featured code based on a paper in a peer-reviewed academic journal.
Unfortunately, I played around with the code and looked at the paper, and I'm not sure how to get this to work. What appears to be broken is that my code draws the second half of a 2-bit, 2-dimensional Hilbert curve backwards. If you draw out the 2-d coordinates in the last column, you'll see the second half of the curve (position 8 and on) backwards.
I don't think I'm allowed to post the original C code, but the C++ version below is only lightly edited. A few things are different in my code:
C is less strict about types, so I had to use std::bitset.
In addition to the bug mentioned by @PaulChernoch in the aforementioned SE post, the next for loop segfaults, too.
The paper represents one-dimensional coordinates weirdly. They call it the number's "Transpose." I wrote a function that produces a "Transpose" from a regular integer.
Another thing about this algorithm: it doesn't produce a map between unit intervals and unit hypercubes. Rather, it stretches the problem out and maps between intervals and cubes with unit spacing.
NB: HTranspose is the representation of H they use in the paper
H, HTranspose, mappedCoordinates
------------------------------------
0: (00, 00), (0, 0)
1: (00, 01), (1, 0)
2: (01, 00), (1, 1)
3: (01, 01), (0, 1)
4: (00, 10), (0, 2)
5: (00, 11), (0, 3)
6: (01, 10), (1, 3)
7: (01, 11), (1, 2)
8: (10, 00), (3, 2)
9: (10, 01), (3, 3)
10: (11, 00), (2, 3)
11: (11, 01), (2, 2)
12: (10, 10), (2, 1)
13: (10, 11), (3, 1)
14: (11, 10), (3, 0)
15: (11, 11), (2, 0)
Here's the code (in C++).
#include <array>
#include <bitset>
#include <iostream>
#include <cmath>

namespace hilbert {

/// The Hilbert index is expressed as an array of transposed bits.
///
/// Example: 5 bits for each of n=3 coordinates.
/// 15-bit Hilbert integer = A B C D E F G H I J K L M N O is stored
/// as its Transpose                            ^
/// X[0] = A D G J M                        X[2]|  7
/// X[1] = B E H K N            <------->       | /X[1]
/// X[2] = C F I L O               axes         |/
///        high  low                            0------> X[0]
template<size_t num_bits, size_t num_dims>
std::array<std::bitset<num_bits>, num_dims>
TransposeToAxes(std::array<std::bitset<num_bits>, num_dims> X)
{
    using coord_t = std::bitset<num_bits>;
    using coords_t = std::array<coord_t, num_dims>;

    coord_t N = 2 << (num_bits - 1);

    // Gray decode by H ^ (H/2)
    coord_t t = X[num_dims - 1] >> 1;
    for (size_t i = num_dims - 1; i > 0; i--) // https://stackoverflow.com/a/10384110
        X[i] ^= X[i - 1];
    X[0] ^= t;

    // Undo excess work
    for (coord_t Q = 2; Q != N; Q <<= 1) {
        coord_t P = Q.to_ulong() - 1;
        for (size_t i = num_dims - 1; i > 0; i--) { // did the same stackoverflow thing
            if ((X[i] & Q).any())
                X[0] ^= P;
            else {
                t = (X[0] ^ X[i]) & P;
                X[0] ^= t;
                X[i] ^= t;
            }
        }
    }
    return X;
}

template<size_t num_bits, size_t num_dims>
std::array<std::bitset<num_bits>, num_dims>
AxesToTranspose(std::array<std::bitset<num_bits>, num_dims> X)
{
    using coord_t = std::bitset<num_bits>;
    using coords_t = std::array<coord_t, num_dims>;

    coord_t M = 1 << (num_bits - 1);

    // Inverse undo
    for (coord_t Q = M; Q.to_ulong() > 1; Q >>= 1) {
        coord_t P = Q.to_ulong() - 1;
        for (size_t i = 0; i < num_bits; i++) {
            if ((X[i] & Q).any())
                X[0] ^= P;
            else {
                coord_t t = (X[0] ^ X[i]) & P;
                X[0] ^= t;
                X[i] ^= t;
            }
        }
    } // exchange

    // Gray encode
    for (size_t i = 1; i < num_bits; i++)
        X[i] ^= X[i - 1];
    coord_t t = 0;
    for (coord_t Q = M; Q.to_ulong() > 1; Q >>= 1) {
        if ((X[num_dims - 1] & Q).any())
            t ^= Q.to_ulong() - 1;
    }
    for (size_t i = 0; i < num_bits; i++)
        X[i] ^= t;

    return X;
}

template<size_t num_bits, size_t num_dims>
std::array<std::bitset<num_bits>, num_dims> makeHTranspose(unsigned int H)
{
    using coord_t = std::bitset<num_bits>;
    using coords_t = std::array<coord_t, num_dims>;
    using big_coord_t = std::bitset<num_bits * num_dims>;

    big_coord_t Hb = H;
    coords_t X;
    for (size_t dim = 0; dim < num_dims; ++dim) {
        coord_t tmp;
        unsigned c = num_dims - 1;
        for (size_t rbit = dim; rbit < num_bits * num_dims; rbit += num_dims) {
            tmp[c] = Hb[num_bits * num_dims - 1 - rbit];
            c--;
        }
        X[dim] = tmp;
    }
    return X;
}

} // namespace hilbert

int main()
{
    constexpr unsigned nb = 2;
    constexpr unsigned nd = 2;
    using coord_t = std::bitset<nb>;
    using coords_t = std::array<coord_t, nd>;

    std::cout << "NB: HTranspose is the representation of H they use in the paper\n";
    std::cout << "H, HTranspose, mappedCoordinates \n";
    std::cout << "------------------------------------\n";
    for (unsigned H = 0; H < pow(2, nb * nd); ++H) {
        // H with the representation they use in the paper
        coords_t weirdH = hilbert::makeHTranspose<nb, nd>(H);
        std::cout << H << ": ("
                  << weirdH[0] << ", "
                  << weirdH[1] << "), ("
                  << hilbert::TransposeToAxes<nb, nd>(weirdH)[0].to_ulong() << ", "
                  << hilbert::TransposeToAxes<nb, nd>(weirdH)[1].to_ulong() << ")\n";
    }
}
Some strange things I noticed about the other post:
In addition to the bug mentioned by @PaulChernoch in the above-mentioned SE post, the next for loop segfaults, too.
Nobody is talking about how the paper doesn't provide a mapping between the unit interval and the unit cube, but rather a mapping from integers to big cubes, and
I don't see any mention there of the weird "Transpose" representation the paper uses for unsigned integers.
"In addition to the bug mentioned by @PaulChernoch in the above-mentioned SE post, the next for loop segfaults, too." Actually, that was a bug in my own code: I was having a hard time iterating over a container backwards. I started looking at my own code after I realized there were other Python packages (e.g. this and this) that use the same underlying C code; neither of them complained about the other for loop.
Second, there was a bug in my function above that generates the transpose, too. And third, the inverse function, AxesToTranspose, confused the number of bits with the number of dimensions.
All four corrected functions (the two from the paper that do all the heavy lifting, and two more for converting between integers and "Transpose"s) are as follows:
/**
 * @brief Converts an integer in a transpose form to a position on the Hilbert curve.
 * Code is based off of John Skilling, "Programming the Hilbert curve",
 * AIP Conference Proceedings 707, 381-387 (2004) https://doi.org/10.1063/1.1751381
 * @file resamplers.h
 * @tparam num_bits how "accurate/fine/squiggly" you want the Hilbert curve
 * @tparam num_dims the number of dimensions the curve is in
 * @param X an unsigned integer in a "Transpose" form
 * @return a position on the Hilbert curve
 */
template<size_t num_bits, size_t num_dims>
std::array<std::bitset<num_bits>, num_dims>
TransposeToAxes(std::array<std::bitset<num_bits>, num_dims> X)
{
    using coord_t = std::bitset<num_bits>;
    using coords_t = std::array<coord_t, num_dims>;

    // Gray decode by H ^ (H/2)
    coord_t t = X[num_dims - 1] >> 1;
    for (int i = num_dims - 1; i > 0; i--) // https://stackoverflow.com/a/10384110
        X[i] ^= X[i - 1];
    X[0] ^= t;

    // Undo excess work
    coord_t N = 2 << (num_bits - 1);
    for (coord_t Q = 2; Q != N; Q <<= 1) {
        coord_t P = Q.to_ulong() - 1;
        for (int i = num_dims - 1; i >= 0; i--) {
            if ((X[i] & Q).any()) { // invert low bits of X[0]
                X[0] ^= P;
            } else {                // exchange low bits of X[i] and X[0]
                t = (X[0] ^ X[i]) & P;
                X[0] ^= t;
                X[i] ^= t;
            }
        }
    }
    return X;
}
/**
 * @brief Converts a position on the Hilbert curve into an integer in a "transpose" form.
 * Code is based off of John Skilling, "Programming the Hilbert curve",
 * AIP Conference Proceedings 707, 381-387 (2004) https://doi.org/10.1063/1.1751381
 * @file resamplers.h
 * @tparam num_bits how "accurate/fine/squiggly" you want the Hilbert curve
 * @tparam num_dims the number of dimensions the curve is in
 * @param X a position on the Hilbert curve (each dimension coordinate is in base 2)
 * @return a position on the real line (in a "Transpose" form)
 */
template<size_t num_bits, size_t num_dims>
std::array<std::bitset<num_bits>, num_dims>
AxesToTranspose(std::array<std::bitset<num_bits>, num_dims> X)
{
    using coord_t = std::bitset<num_bits>;
    using coords_t = std::array<coord_t, num_dims>;

    // Inverse undo
    coord_t M = 1 << (num_bits - 1);
    for (coord_t Q = M; Q.to_ulong() > 1; Q >>= 1) {
        coord_t P = Q.to_ulong() - 1;
        for (size_t i = 0; i < num_dims; i++) {
            if ((X[i] & Q).any())
                X[0] ^= P;
            else {
                coord_t t = (X[0] ^ X[i]) & P;
                X[0] ^= t;
                X[i] ^= t;
            }
        }
    } // exchange

    // Gray encode
    for (size_t i = 1; i < num_dims; i++)
        X[i] ^= X[i - 1];
    coord_t t = 0;
    for (coord_t Q = M; Q.to_ulong() > 1; Q >>= 1) {
        if ((X[num_dims - 1] & Q).any())
            t ^= Q.to_ulong() - 1;
    }
    for (size_t i = 0; i < num_dims; i++)
        X[i] ^= t;

    return X;
}
/**
 * @brief Converts a positive integer into its "Transpose" representation.
 * This code supplements the above two functions that are
 * based off of John Skilling, "Programming the Hilbert curve",
 * AIP Conference Proceedings 707, 381-387 (2004) https://doi.org/10.1063/1.1751381
 * @file resamplers.h
 * @tparam num_bits how "accurate/fine/squiggly" you want the Hilbert curve
 * @tparam num_dims the number of dimensions the curve is in
 * @param H a position on the Hilbert curve (0, 1, ..., 2^(num_dims * num_bits))
 * @return a position on the real line (in a "Transpose" form)
 */
template<size_t num_bits, size_t num_dims>
std::array<std::bitset<num_bits>, num_dims> makeHTranspose(unsigned int H)
{
    using coord_t = std::bitset<num_bits>;
    using coords_t = std::array<coord_t, num_dims>;
    using big_coord_t = std::bitset<num_bits * num_dims>;

    big_coord_t Hb = H;
    coords_t X;
    for (size_t dim = 0; dim < num_dims; ++dim) {
        coord_t dim_coord_tmp;
        unsigned start_bit = num_bits * num_dims - 1 - dim;
        unsigned int c = num_bits - 1;
        for (int bit = start_bit; bit >= 0; bit -= num_dims) {
            dim_coord_tmp[c] = Hb[bit];
            c--;
        }
        X[dim] = dim_coord_tmp;
    }
    return X;
}
/**
 * @brief Converts an integer in its "Transpose" representation into a positive integer.
 * This code supplements the two functions above that are
 * based off of John Skilling, "Programming the Hilbert curve",
 * AIP Conference Proceedings 707, 381-387 (2004) https://doi.org/10.1063/1.1751381
 * @file resamplers.h
 * @tparam num_bits how "accurate/fine/squiggly" you want the Hilbert curve
 * @tparam num_dims the number of dimensions the curve is in
 * @param Htrans a position on the real line (in a "Transpose" form)
 * @return a position on the Hilbert curve (0, 1, ..., 2^(num_dims * num_bits))
 */
template<size_t num_bits, size_t num_dims>
unsigned int makeH(std::array<std::bitset<num_bits>, num_dims> Htrans)
{
    using coord_t = std::bitset<num_bits>;
    using coords_t = std::array<coord_t, num_dims>;
    using big_coord_t = std::bitset<num_bits * num_dims>;

    big_coord_t H;
    unsigned int which_dim = 0;
    unsigned which_bit;
    for (int i = num_bits * num_dims - 1; i >= 0; i--) {
        which_bit = i / num_dims;
        H[i] = Htrans[which_dim][which_bit];
        which_dim = (which_dim + 1) % num_dims;
    }
    return H.to_ulong();
}
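As a quick sanity check (a hypothetical round trip through all four corrected functions above, not the unit tests mentioned below):

#include <array>
#include <bitset>
#include <iostream>

// assumes TransposeToAxes, AxesToTranspose, makeHTranspose and makeH
// from above are in scope

int main()
{
    constexpr size_t nb = 2, nd = 2;
    for (unsigned H = 0; H < (1u << (nb * nd)); ++H) {
        auto Ht   = makeHTranspose<nb, nd>(H);     // integer -> "Transpose"
        auto axes = TransposeToAxes<nb, nd>(Ht);   // "Transpose" -> 2-d point
        auto back = AxesToTranspose<nb, nd>(axes); // 2-d point -> "Transpose"
        std::cout << H << " -> (" << axes[0].to_ulong() << ", "
                  << axes[1].to_ulong() << ") -> " << makeH(back) << "\n";
    }
}

If the two directions really invert each other, each line prints H, the mapped coordinates, and then H again.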
I have some unit tests here as well.

C++ Induction Algorithm very slow and Dynamic Programming

I have a mathematical control problem which I solve through Backward induction. The recursion, as implemented in the code below, is

J(K, Z, W) = min over y in {0, 1/M, 2/M, ..., 1} of
             n*y^2 + 0.5*J(K+1, Z+y, W + 1/sqrt(n)) + 0.5*J(K+1, Z+y, W - 1/sqrt(n))

with K less than n. And final conditions

J(n, Z, W) = 0 if Z >= 1_{W > 0}, and +infinity otherwise.

What is J(0,0,0)?
For this purpose I am using C++ and MinGW 32-bit as a compiler.
The problem is that the code below, which solves the problem by plain recursion, does not provide any results when n, M > 15.
I tried to launch n = M = 100 for 4 days but got no results.
Does anyone have a solution? Is there a compiler option to change (is the processor memory not enough)? Is the complexity too big?
Here is my code:
#include <cmath>     // for exp(), sqrt()
#include <algorithm> // for std::min

const int n = 10;
const int M = 10;

double J_naive(double K, double Z, double W)
{
    double J_tmp = exp(100.0); // stands in for infinity
    double WGreaterThanZero = 0.0;

    // Final condition: boundaries
    if (K == n)
    {
        if (W > 0) WGreaterThanZero = 1.0;
        else       WGreaterThanZero = 0.0;

        if (Z >= WGreaterThanZero) return 0.0;
        return exp(100.0); // infinity
    }
    // Induction
    else if (K < n)
    {
        double y;
        for (int i = 0; i <= M; i++)
        {
            y = ((double) i) / M;
            J_tmp = std::min(J_tmp, ((double) n) * y * y +
                    0.5 * J_naive(K + 1.0, Z + y, W + 1.0 / sqrt(n)) +
                    0.5 * J_naive(K + 1.0, Z + y, W - 1.0 / sqrt(n)));
        }
    }
    return J_tmp;
}

int main()
{
    J_naive(0.0, 0.0, 0.0);
}
int main()
{
J_naive(0.0, 0.0, 0.0);
}
You can try the following, completely untested DP code. It needs around 24*n^3*M bytes of memory; if you have that much memory, it should run within a few seconds. If there is some value that will never appear as a true return value, you can get rid of seen_[][][] and use that value in result_[][][] to indicate that the subproblem has not yet been solved; this will reduce memory requirements by about a third. It's based on your code before you made edits to fix bugs.
#include <cmath>     // for exp(), sqrt()
#include <algorithm> // for std::min

const int n = 10;
const int M = 10;

bool   seen_  [n][n * M][2 * n]; // initially all false
double result_[n][n * M][2 * n];

double J_naive(unsigned K, unsigned ZM, double W0, int Wdsqrtn)
{
    double J_tmp = exp(100.0);
    double WGreaterThanZero = 0.0;
    double Z = (double) ZM / M;
    double W = W0 + Wdsqrtn * 1. / sqrt(n);

    // Final condition: boundaries
    if (K == n)
    {
        if (W > 0) WGreaterThanZero = 1.0;
        else       WGreaterThanZero = 0.0;

        if (Z >= WGreaterThanZero) return 0.0;
        return exp(100.0); // infinity
    }
    // Induction
    else if (K < n)
    {
        if (!seen_[K][ZM][Wdsqrtn + n]) {
            // Haven't seen this subproblem yet: compute the answer
            for (int i = 0; i <= M; i++)
            {
                J_tmp = std::min(J_tmp, ((double) n) * i / M * i / M +
                        0.5 * J_naive(K + 1, ZM + i, W0, Wdsqrtn + 1) +
                        0.5 * J_naive(K + 1, ZM + i, W0, Wdsqrtn - 1));
            }
            result_[K][ZM][Wdsqrtn + n] = J_tmp;
            seen_  [K][ZM][Wdsqrtn + n] = true;
        }
    }
    return result_[K][ZM][Wdsqrtn + n];
}
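A hypothetical driver, matching the original main(): the two integer arguments encode Z = ZM/M and W = W0 + Wdsqrtn/sqrt(n), so J(0,0,0) corresponds to all-zero arguments.

#include <iostream>

int main()
{
    // J(0, 0, 0): K = 0, ZM = 0 (Z = 0), W0 = 0 and Wdsqrtn = 0 (W = 0)
    std::cout << "J(0,0,0) = " << J_naive(0, 0, 0.0, 0) << "\n";
}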

Time-dependent 1D Schrodinger equation in C++

I wrote C++ code which solves the time-dependent 1D Schrodinger equation for the anharmonic potential V = x^2/2 + lambda*x^4, using the Thomas algorithm. My code runs, and I animate the results in Mathematica to check what is going on. I tested the code against the known solution for the harmonic potential (I set lambda = 0), but the animation shows that abs(Psi) changes with time, and I know that is not correct for the harmonic potential. Actually, I see that at one point in time it becomes constant, but before that it oscillates.
So I understand that I need the magnitude of the wave function to stay constant over the time interval, but I don't know how to achieve that, or where I am making a mistake.
Here are my code and the animation for 100 time steps and 100 points on the grid.
#include <iostream>
#include <iomanip>
#include <cmath>
#include <vector>
#include <cstdlib>
#include <complex>
#include <fstream>

using namespace std;

// Mandatory parameters
const int L = 1;              // length of domain in x direction
const int tmax = 10;          // end time
const int nx = 100, nt = 100; // number of grid points and time steps respectively
double lambda;                // dictates the shape of the potential (we can use
                              // lambda = 0.0 to test the code against the known
                              // solution for the harmonic oscillator)
complex<double> I(0.0, 1.0);  // imaginary unit

// Derived parameters
double delta_x = 1. / (nx - 1);           // spacing between the grid points
double delta_t = 1. / (nt - 1);           // the time step
double r = delta_t / (delta_x * delta_x); // used to simplify expressions for the
                                          // coefficients of the lhs and rhs of
                                          // the matrix eqn

// Algorithm for solving the tridiagonal matrix system
vector<complex<double> > thomas_algorithm(vector<double>& a,
                                          vector<complex<double> >& b,
                                          vector<double>& c,
                                          vector<complex<double> >& d)
{
    // Temporary wave function
    vector<complex<double> > y(nx + 1, 0.0);
    // Modified matrix coefficients
    vector<complex<double> > c_prime(nx + 1, 0.0);
    vector<complex<double> > d_prime(nx + 1, 0.0);

    // This updates the coefficients in the first row
    c_prime[0] = c[0] / b[0];
    d_prime[0] = d[0] / b[0];

    // Create the c_prime and d_prime coefficients in the forward sweep
    for (int i = 1; i < nx + 1; i++)
    {
        complex<double> m = 1.0 / (b[i] - a[i] * c_prime[i - 1]);
        c_prime[i] = c[i] * m;
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) * m;
    }

    // This gives the value of the last equation in the system
    y[nx] = d_prime[nx];

    // This is the reverse sweep, used to update the solution vector
    for (int i = nx - 1; i > 0; i--)
    {
        y[i] = d_prime[i] - c_prime[i] * y[i + 1];
    }
    return y;
}

void calc()
{
    // First create the vectors to store the coefficients
    vector<double> a(nx + 1, 1.0);
    vector<complex<double> > b(nx + 1, 0.0);
    vector<double> c(nx + 1, 1.0);
    vector<complex<double> > d(nx + 1, 0.0);
    vector<complex<double> > psi(nx + 1, 0.0);
    vector<complex<double> > phi(nx + 1, 0.0);
    vector<double> V(nx + 1, 0.0);
    vector<double> x(nx + 1, 0);
    vector<vector<complex<double> > > PSI(nt + 1,
                                          vector<complex<double> >(nx + 1, 0.0));
    vector<double> prob(nx + 1, 0);

    // We don't have the first member of the left diagonal and the last member
    // of the right diagonal
    a[0] = 0.0;
    c[nx] = 0.0;

    for (int i = 0; i < nx + 1; i++)
    {
        x[i] = (-nx / 2) + i; // values on the x axis
        // Eigenfunction of the harmonic oscillator in the ground state
        phi[i] = exp(-pow(x[i] * delta_x, 2) / 2) / (pow(M_PI, 0.25));
        // Anharmonic potential
        V[i] = pow(x[i] * delta_x, 2) / 2 + lambda * pow(x[i] * delta_x, 4);
        // The main diagonal coefficients
        b[i] = 2.0 * I / r - 2.0 + V[i] * delta_x * delta_x;
    }

    double sum0 = 0.0;
    for (int i = 0; i < nx + 1; i++)
    {
        PSI[0][i] = phi[i];             // initial condition for the wave function
        sum0 += abs(pow(PSI[0][i], 2)); // needed for the normalization
    }
    sum0 = sum0 * delta_x;
    for (int i = 0; i < nx + 1; i++)
    {
        PSI[0][i] = PSI[0][i] / sqrt(sum0); // normalization of the initial
                                            // wave function
    }

    for (int j = 0; j < nt; j++)
    {
        PSI[j][0] = 0.0;
        PSI[j][nx] = 0.0; // boundary conditions for the wave function
        d[0] = 0.0;
        d[nx] = 0.0;      // boundary conditions for the rhs

        // Fill in the current time step vector d representing the rhs.
        // Note: the loop must stop at i < nx; the original i < nx + 1 read
        // PSI[j][i + 1] past the end of the row when i == nx.
        for (int i = 1; i < nx; i++)
        {
            d[i] = PSI[j][i + 1]
                 + (2.0 - 2.0 * I / r - V[i] * delta_x * delta_x) * PSI[j][i]
                 + PSI[j][i - 1];
        }

        // Now solve the tridiagonal system
        psi = thomas_algorithm(a, b, c, d);
        for (int i = 1; i < nx; i++)
        {
            PSI[j + 1][i] = psi[i]; // assign values to the wave function
        }
        for (int i = 0; i < nx + 1; i++)
        {
            // Probability density of the wave function in the next time step
            prob[i] = abs(PSI[j + 1][i] * conj(PSI[j + 1][i]));
        }

        double sum = 0.0;
        for (int i = 0; i < nx + 1; i++)
        {
            sum += prob[i] * delta_x;
        }
        for (int i = 0; i < nx + 1; i++)
        {
            // Normalization of the wave function in the next time step
            PSI[j + 1][i] /= sqrt(sum);
        }
    }

    // Opening files for writing the results
    ofstream file_psi_re, file_psi_imag, file_psi_abs, file_potential, file_phi0;
    file_psi_re.open("psi_re.dat");
    file_psi_imag.open("psi_imag.dat");
    file_psi_abs.open("psi_abs.dat");

    for (int i = 0; i < nx + 1; i++)
    {
        file_psi_re   << fixed << x[i] << " ";
        file_psi_imag << fixed << x[i] << " ";
        file_psi_abs  << fixed << x[i] << " ";
        for (int j = 0; j < nt + 1; j++)
        {
            file_psi_re   << fixed << setprecision(6) << PSI[j][i].real() << " ";
            file_psi_imag << fixed << setprecision(6) << PSI[j][i].imag() << " ";
            file_psi_abs  << fixed << setprecision(6) << abs(PSI[j][i])   << " ";
        }
        file_psi_re << endl;
        file_psi_imag << endl;
        file_psi_abs << endl;
    }
}

int main(int argc, char **argv)
{
    calc();
    return 0;
}
The black line is abs(psi), the red one is Im(psi) and the blue one is Re(psi).
(Bear in mind that my computational physics course was ten years ago now)
You say you are solving a time-dependent system, but I don't see any time dependence (even if lambda != 0). In the Schrodinger equation, if the potential function does not depend on time then the differential equation is called separable, because you can solve the time component and the spatial component of the differential equation separately.
The general solution in that case is just the solution of the time-independent Schrodinger equation multiplied by exp(-iEt/h_bar). When you plot the magnitude of the probability, that term just becomes 1, so the probability doesn't change over time. In these cases people quite typically ignore the time component altogether.
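Written out (the standard separation-of-variables result, stated here for reference):

\Psi(x,t) = \phi(x)\, e^{-iEt/\hbar},
\qquad
|\Psi(x,t)| = |\phi(x)| \, \bigl| e^{-iEt/\hbar} \bigr| = |\phi(x)|

so for a stationary state the magnitude is constant in time, which is exactly the property being tested with lambda = 0.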
All this is to say that since your potential function doesn't depend on time, you aren't really solving a time-dependent Schrodinger equation. The tridiagonal matrix algorithm can only be used to solve ordinary differential equations; if your potential depended on time, you would have a partial differential equation and would need a different method to solve it. As a result, plotting the probability density over time is rarely interesting.
As for why your magnitude is not constant: numerical methods for finding eigenvalues and eigenvectors rarely produce the normalised eigenvectors naturally, so are you manually normalising your eigenvector before computing your probabilities?

C++: replicating MATLAB's interp1 spline interpolation function

Can anyone give me some direction on replicating MATLAB's interp1 function using spline interpolation? I tried closely following the algorithm on the Wikipedia page, but the results don't really match up.
#include <stdio.h>
#include <stdint.h>
#include <iostream>
#include <vector>
#include <cstdlib> // for system()

// MATLAB: interp1(x, test_array, query_points, 'spline')
int main()
{
    int size = 10;
    std::vector<float> test_array(10);
    test_array[0] = test_array[4] = test_array[8] = 1;
    test_array[1] = test_array[3] = test_array[5] = test_array[7] = test_array[9] = 4;
    test_array[2] = test_array[6] = 7;

    std::vector<float> query_points;
    for (int i = 0; i < 10; i++)
        query_points.push_back(i + .05);

    int n = (size - 1);
    std::vector<float> a(n + 1);
    std::vector<float> x(n + 1); // sample_points vector
    for (int i = 0; i < (n + 1); i++) {
        x[i] = i + 1.0;
        a[i] = test_array[i];
    }

    std::vector<float> b(n);
    std::vector<float> d(n);
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i)
        h[i] = x[i + 1] - x[i];

    std::vector<float> alpha(n);
    for (int i = 1; i < n; ++i)
        alpha[i] = ((3 / h[i]) * (a[i + 1] - a[i])) - ((3 / h[i - 1]) * (a[i] - a[i - 1]));

    std::vector<float> c(n + 1);
    std::vector<float> l(n + 1);
    std::vector<float> u(n + 1);
    std::vector<float> z(n + 1);
    l[0] = 1.0;
    u[0] = z[0] = 0.0;
    for (int i = 1; i < n; ++i) {
        l[i] = (2 * (x[i + 1] - x[i - 1])) - (h[i - 1] * u[i - 1]);
        u[i] = h[i] / l[i];
        z[i] = (alpha[i] - (h[i - 1] * z[i - 1])) / l[i];
    }
    l[n] = 1.0;
    z[n] = c[n] = 0.0;
    for (int j = n - 1; j >= 0; j--) {
        c[j] = z[j] - (u[j] * c[j + 1]);
        b[j] = ((a[j + 1] - a[j]) / h[j]) - ((h[j] / 3) * (c[j + 1] + (2 * c[j])));
        d[j] = (c[j + 1] - c[j]) / (3 * h[j]);
    }

    std::vector<float> output_array(10);
    for (int i = 0; i < n - 1; i++) {
        float eval_point = (query_points[i] - x[i]);
        output_array[i] = a[i] + (eval_point * b[i])
                        + (eval_point * eval_point * c[i])
                        + (eval_point * eval_point * eval_point * d[i]);
        std::cout << output_array[i] << std::endl;
    }
    system("pause");
    return 0;
}
In hindsight, your code does seem to follow the Wikipedia article properly. However, there is something you need to know about interp1 which I don't think you have taken into account when using it to check your answers.
MATLAB's interp1, when you specify the spline flag, assumes that the end-point conditions are not-a-knot. The algorithm specified on Wikipedia is the code for a natural spline.
As such, this is probably why your points do not match up. FWIW, consult http://www.cs.tau.ac.il/~turkel/notes/numeng/spline_note.pdf and look at the diagram on the last page. You'll see that not-a-knot splines and natural splines bear the same shape, but have different y-values when your data consists of just the end points of your spline. However, should you have data points in between the end points, all of the different kinds of splines (more or less) have the same y-values.
For the sake of completeness, here is the figure extracted from the PDF notes I referenced above:
If you want to use natural splines, use csape instead of interp1. This provides a cubic spline where you can prescribe the end conditions. You call csape like this:
pp = csape(x,y);
x and y are the control points defined for your spline. By default, this returns a natural spline, which is what you're after, and is a struct of type ppform. You can then figure out what the spline evaluates to by using fnval:
yval = fnval(pp, xval);
xval is the input x co-ordinate and yval is the spline evaluated at this particular x.
Use this, then check to see if your code matches up with the values provided by csape.
Minor Note
You need the Curve Fitting Toolbox in MATLAB to use csape. If you don't have this, then unfortunately this method will not work.
I think interp1 is supported by MATLAB Coder.
Just use Coder to generate the C code and you have what you need.

OpenCV Sum of squared differences speed

I've been using OpenCV to do some block matching, and I've noticed that its sum-of-squared-differences code is very fast compared to a straightforward for loop like this:
int SSD = 0;
for (int i = 0; i < arraySize; i++)
    SSD += (array1[i] - array2[i]) * (array1[i] - array2[i]);
If I look at the source code to see where the heavy lifting happens, the OpenCV folks have their for loops do 4 squared-difference calculations at a time in each iteration of the loop. The function that does the block matching looks like this:
int64
icvCmpBlocksL2_8u_C1( const uchar * vec1, const uchar * vec2, int len )
{
    int i, s = 0;
    int64 sum = 0;

    for( i = 0; i <= len - 4; i += 4 )
    {
        int v = vec1[i] - vec2[i];
        int e = v * v;

        v = vec1[i + 1] - vec2[i + 1];
        e += v * v;
        v = vec1[i + 2] - vec2[i + 2];
        e += v * v;
        v = vec1[i + 3] - vec2[i + 3];
        e += v * v;
        sum += e;
    }

    for( ; i < len; i++ )
    {
        int v = vec1[i] - vec2[i];
        s += v * v;
    }
    return sum + s;
}
This calculation is for unsigned 8 bit integers. They perform a similar calculation for 32-bit floats in this function:
double
icvCmpBlocksL2_32f_C1( const float *vec1, const float *vec2, int len )
{
    double sum = 0;
    int i;

    for( i = 0; i <= len - 4; i += 4 )
    {
        double v0 = vec1[i] - vec2[i];
        double v1 = vec1[i + 1] - vec2[i + 1];
        double v2 = vec1[i + 2] - vec2[i + 2];
        double v3 = vec1[i + 3] - vec2[i + 3];
        sum += v0 * v0 + v1 * v1 + v2 * v2 + v3 * v3;
    }

    for( ; i < len; i++ )
    {
        double v = vec1[i] - vec2[i];
        sum += v * v;
    }
    return sum;
}
I was wondering if anyone had any idea whether breaking a loop up into chunks of 4 like this might speed up code? I should add that there is no multithreading occurring in this code.
My guess is that this is just a simple instance of loop unrolling - it saves 3 additions and 3 compares on each pass of the loop, which can be a great savings if, for example, checking len involves a cache miss. The downside is that this optimization adds code complexity (e.g. the additional for loop at the end to handle the len % 4 items left over when the length is not evenly divisible by 4) and, of course, it's an architecture-dependent optimization whose magnitude of improvement will vary by hardware/compiler/etc...
Still, it's straightforward to follow compared to most optimizations and will probably result in some sort of performance increase regardless of the architecture, so it's low risk to just throw it in there and hope for the best. Since OpenCV is such a well-supported chunk of code, I'm sure that someone instrumented these chunks of code and found them to be well worth it - as you yourself have done.
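Pushing the same idea further, this kind of kernel is also a natural fit for SIMD. As a hedged illustration only (SSE2 intrinsics, not OpenCV's actual implementation; the function name and structure are invented here), a squared-difference sum over 8-bit data could look like:

#include <emmintrin.h> // SSE2 intrinsics
#include <cstdint>

// Sketch: sums squared differences of two 8-bit arrays, 16 pixels per
// iteration; a scalar tail handles the len % 16 leftovers. The four 32-bit
// accumulator lanes can overflow on very long arrays, but are fine for
// typical block-matching sizes.
int64_t ssd_sse2(const uint8_t* a, const uint8_t* b, int len)
{
    __m128i zero = _mm_setzero_si128();
    __m128i acc  = _mm_setzero_si128(); // four 32-bit partial sums
    int i = 0;

    for (; i <= len - 16; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i*)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i*)(b + i));

        // Widen 8-bit pixels to 16 bits so the subtraction cannot wrap.
        __m128i alo = _mm_unpacklo_epi8(va, zero);
        __m128i ahi = _mm_unpackhi_epi8(va, zero);
        __m128i blo = _mm_unpacklo_epi8(vb, zero);
        __m128i bhi = _mm_unpackhi_epi8(vb, zero);

        __m128i dlo = _mm_sub_epi16(alo, blo);
        __m128i dhi = _mm_sub_epi16(ahi, bhi);

        // madd squares each 16-bit lane and adds adjacent products,
        // yielding four 32-bit sums of squares per register.
        acc = _mm_add_epi32(acc, _mm_madd_epi16(dlo, dlo));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(dhi, dhi));
    }

    // Horizontal sum of the four 32-bit lanes.
    int32_t lanes[4];
    _mm_storeu_si128((__m128i*)lanes, acc);
    int64_t sum = (int64_t)lanes[0] + lanes[1] + lanes[2] + lanes[3];

    for (; i < len; i++) { // scalar tail
        int v = a[i] - b[i];
        sum += v * v;
    }
    return sum;
}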
There is one obvious optimisation of your code, viz:
int SSD = 0;
for (int i = 0; i < arraySize; i++)
{
    int v = array1[i] - array2[i];
    SSD += v * v;
}