Changing the center of a gaussian using FFTW in C++

Changing the center of a gaussian using FFTW in C++ - c++

For a project I need to propagate a gaussian in real space using the Fourier transform of a gaussian centered at the origin using
What I want to calculate
Here is the latex code, since I can't include images yet
N(x | \mu, \sigma) = F^{-1}{F{ N(x |0, \sigma)} e^{-i\ 2\pi \mu\omega} \right},
where \omega is the frequency in Fourier space.
Now the problem I am having is I don't know how to calculate the frequency for some bin after doing the fft with fftw. Here is the code of what I am trying to do.
int main(){
int N = 128; //Number of "pixels" in real space
int N_fft = N/2 + 1;
double *in, *result, *x;
fftw_complex *out;
fftw_plan fw, bw;
in = (double*) fftw_malloc(sizeof(double) * N);
x = (double*) malloc(sizeof(double) * N);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N_fft);
result = (double*) fftw_malloc(sizeof(double) * N);
fw = fftw_plan_dft_r2c_1d(N, in, out, FFTW_ESTIMATE);
bw = fftw_plan_dft_c2r_1d(N, out, result, FFTW_ESTIMATE);
double min_x = -9.0, max_x = 9.0; //Limits in real space
for (int i=0; i<N; i++){
x[i] = min_x + 2*max_x*i / (N - 1);
in[i] = std::exp(-(x[i]) * (x[i]));
}
for (int i=0; i<N_fft; i++){
out[i][0] = 0.0;
out[i][1] = 0.0;
}
fftw_execute(fw);
double w;
fftw_complex z;
double w_norm;
for (int i=0; i<N_fft; i++){
w = -2*M_PI*i / (max_x - min_x); //WHAT I DON'T KNOW
// Calculating the product with the exponential for translating the gaussian
z[0] = out[i][0]*std::cos(w) - out[i][1]*std::sin(w);
z[1] = out[i][0]*std::sin(w) + out[i][0]*std::cos(w);
out[i][0] = z[0];
out[i][1] = z[1];
}
fftw_execute(bw);
for (int i=0; i<N; i++){
std::cout << x[i] << " " << result[i]/N << " " << std::exp(-x[i] * x[i]) << std::endl;
}
fftw_destroy_plan(fw);
fftw_free(in);
fftw_free(out);
return 0;
}
For the moment I've tried using w-nth = -2*np.pi * 1/(max_x - min_x) * n, which worked in python, but for some reason it doesn't work in c++
Here is the result I am obtaining with c++
result
Here ref is the gaussian centered at 0, the one I obtaing should be centered at 1.0, but that's clearly is not happening.
Here is the latex code, since I can't include images yet
(Here is the latex code, since I can't include images yet)

Generally, the more obvious is the mistake, the more time it takes to find it.
This was verified here.
The mistake is simply here:
z[1] = out[i][0]*std::sin(w) + out[i][0]*std::cos(w);
It should be:
z[1] = out[i][0]*std::sin(w) + out[i][1]*std::cos(w);
Besides, I don't know why you didn't use N__ft = N, but I guess it is related the way fftw works.

Related

Screened Poisson solution with FFTW3

I want to solve the screened Poisson equation described in this paper, in order to compute an image from a coarse approximation and its vertical and horizontal gradients. I am using FFTW3 for the first time for this. However, I have very weird results.
My coarse approximation of the image looks like this:
But the output of my solution looks like this:
.
To debug it, I tried multiple things –
First, I suspected that my computation of the spatial frequencies was off, so I tried to compute the horizontal gradient of the image in the Fourier domain, which resulted in this:
.
Finally, when I only compute the FT of my image and its inverse transform, I get this:
But when I simply copy the FT in a new array and compute its inverse FT, I get this:
.
Here is my function:
void RayTrace::computeFT(double* &image_in, double* &dx_in, double* &dy_in, double* &result){
fftw_complex image_FT[width*height];
fftw_complex dx_FT[width*height];
fftw_complex dy_FT[width*height];
fftw_plan image_plan = fftw_plan_dft_r2c_2d(width, height, image_in, image_FT, FFTW_ESTIMATE);
fftw_plan dx_plan = fftw_plan_dft_r2c_2d(width, height, dx_in, dx_FT, FFTW_ESTIMATE);
fftw_plan dy_plan = fftw_plan_dft_r2c_2d(width, height, dy_in, dy_FT, FFTW_ESTIMATE);
//compute the FTs
fftw_execute(image_plan);
fftw_execute(dx_plan);
fftw_execute(dy_plan);
//delete plans after that
fftw_destroy_plan(image_plan);
fftw_destroy_plan(dx_plan);
fftw_destroy_plan(dy_plan);
fftw_cleanup();
//modify the image in the Fourier domain
fftw_complex result_FT[width*height];
for(int i=0 ; i<height ; i++){
for(int j=0 ; j<width ; j++){
float sx = (i>height/2 ? i-height : i) / float(height);
float sy = (j>width/2 ? j-width : j) / float(width);
int idx = i*width+j;
//These are the things that I tried to do
//compute the screened Poisson solution
result_FT[idx][0] = (alpha*image_FT[idx][0] + 2*M_PI*sx*dx_FT[idx][1] + 2*M_PI*sy*dy_FT[idx][1])/(alpha + 4*M_PI*M_PI*(sx*sx+sy*sy));
result_FT[idx][1] = (alpha*image_FT[idx][1] - 2*M_PI*sx*dx_FT[idx][0] - 2*M_PI*sy*dy_FT[idx][0])/(alpha + 4*M_PI*M_PI*(sx*sx+sy*sy));
//horizontal gradient
result_FT[idx][0] = -2*M_PI*sx*image_FT[idx][1];
result_FT[idx][1] = 2*M_PI*sx*image_FT[idx][0];
//simple copy
result_FT[idx][0] = image_FT[idx][0];
result_FT[idx][1] = image_FT[idx][1];
}
}
//inverse transform
fftw_plan r_back_plan = fftw_plan_dft_c2r_2d(width, height, image_FT, result, FFTW_ESTIMATE);
fftw_execute(r_back_plan);
//delete plan after that
fftw_destroy_plan(r_back_plan);
fftw_cleanup();
//normalize the result
float norm = height * width;
for(int i=0 ; i<height ; i++){
for(int j=0 ; j<width ; j++){
result[i*width + j] /= norm;
}
}
}
Please tell me if you see an error in my implementation
Also, here is the method from which I call the method, once for each color component in my image:
void RayTrace::computeGradient(glm::vec3* &image, glm::vec3* &xGradient, glm::vec3* &zGradient, glm::vec3* &gradient){
//compute the gradient of the image in the fourier domain
double *r_in = new double[width*height];
double *r_dx_in = new double[width*height];
double *r_dy_in = new double[width*height];
double *r_out = new double[width*height];
double *g_in = new double[width*height];
double *g_dx_in = new double[width*height];
double *g_dy_in = new double[width*height];
double *g_out = new double[width*height];
double *b_in = new double[width*height];
double *b_dx_in = new double[width*height];
double *b_dy_in = new double[width*height];
double *b_out = new double[width*height];
for(int i=0 ; i<height ; i++){
for(int j=0 ; j<width ; j++){
r_in[i*width + j] = image[i*width + j].x;
r_dx_in[i*width + j] = xGradient[i*width + j].x;
r_dy_in[i*width + j] = zGradient[i*width + j].x;
g_in[i*width + j] = image[i*width + j].y;
g_dx_in[i*width + j] = xGradient[i*width + j].y;
g_dy_in[i*width + j] = zGradient[i*width + j].y;
b_in[i*width + j] = image[i*width + j].z;
b_dx_in[i*width + j] = xGradient[i*width + j].z;
b_dy_in[i*width + j] = zGradient[i*width + j].z;
}
}
std::cout << width << " " << height << std::endl;
computeFT(r_in, r_dx_in, r_dy_in, r_out);
computeFT(g_in, g_dx_in, g_dy_in, g_out);
computeFT(b_in, b_dx_in, b_dy_in, b_out);
for(int i=0 ; i<height ; i++){
for(int j=0 ; j<width ; j++){
gradient[i*width + j] = glm::vec3(r_out[i*width + j], g_out[i*width + j], b_out[i*width + j]);
}
}
delete r_in;
delete r_out;
delete r_dx_in;
delete r_dy_in;
delete g_in;
delete g_out;
delete g_dx_in;
delete g_dy_in;
delete b_in;
delete b_out;
delete b_dx_in;
delete b_dy_in;
}
EDIT: It turns out that my path tracer output was not normalized at all, therefore the values in my arrays ranged from 1e-3 to 1e6, so I guess FFTW had trouble scaling that. Now that I normalized it, I can get my image back after computing its FFT and then its IFFT without modification which is some progress. However, the gradient computation gives this:
So I guess there is still a problem with my spatial frequency computation

Time dependent 1D Schrodinger equation C++

I wrote the code in C++ which solves the time-dependent 1D Schrodinger equation for the anharmonic potential V = x^2/2 + lambda*x^4, using Thomas algorithm. My code is working and I animate the results in Mathematica, to check what is going on. I test the code against the known solution for the harmonic potential (I put lambda = 0), but the animation shows that abs(Psi) is changing with time, and I know that is not correct for the harmonic potential. Actually, I see that in one point it time it becomes constant, but before that is oscillating.
So I understand that I need to have constant magnitude of the wave function over the time interval, but I don't know how to do it, or where am I doing mistake.
Here is my code and the animation for 100 time steps and 100 points on the grid.
#include <iostream>
#include <iomanip>
#include <cmath>
#include <vector>
#include <cstdlib>
#include <complex>
#include <fstream>
using namespace std;
// Mandatory parameters
const int L = 1; //length of domain in x direction
const int tmax = 10; //end time
const int nx = 100, nt = 100; //number of the grid points and time steps respectively
double lambda; //dictates the shape of the potential (we can use lambda = 0.0
// to test the code against the known solution for the harmonic
// oscillator)
complex<double> I(0.0, 1.0); //imaginary unit
// Derived parameters
double delta_x = 1. / (nx - 1);
//spacing between the grid points
double delta_t = 1. / (nt - 1);
//the time step
double r = delta_t / (delta_x * delta_x); //used to simplify expressions for
// the coefficients of the lhs and
// rhs of the matrix eqn
// Algorithm for solving the tridiagonal matrix system
vector<complex<double> > thomas_algorithm(vector<double>& a,
vector<complex<double> >& b,
vector<double>& c,
vector<complex<double> >& d)
{
// Temporary wave function
vector<complex<double> > y(nx + 1, 0.0);
// Modified matrix coefficients
vector<complex<double> > c_prime(nx + 1, 0.0);
vector<complex<double> > d_prime(nx + 1, 0.0);
// This updates the coefficients in the first row
c_prime[0] = c[0] / b[0];
d_prime[0] = d[0] / b[0];
// Create the c_prime and d_prime coefficients in the forward sweep
for (int i = 1; i < nx + 1; i++)
{
complex<double> m = 1.0 / (b[i] - a[i] * c_prime[i - 1]);
c_prime[i] = c[i] * m;
d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) * m;
}
// This gives the value of the last equation in the system
y[nx] = d_prime[nx];
// This is the reverse sweep, used to update the solution vector
for (int i = nx - 1; i > 0; i--)
{
y[i] = d_prime[i] - c_prime[i] * y[i + 1];
}
return y;
}
void calc()
{
// First create the vectors to store the coefficients
vector<double> a(nx + 1, 1.0);
vector<complex<double> > b(nx + 1, 0.0);
vector<double> c(nx + 1, 1.0);
vector<complex<double> > d(nx + 1, 0.0);
vector<complex<double> > psi(nx + 1, 0.0);
vector<complex<double> > phi(nx + 1, 0.0);
vector<double> V(nx + 1, 0.0);
vector<double> x(nx + 1, 0);
vector<vector<complex<double> > > PSI(nt + 1,
vector<complex<double> >(nx + 1,
0.0));
vector<double> prob(nx + 1, 0);
// We don't have the first member of the left diagonal and the last member
// of the right diagonal
a[0] = 0.0;
c[nx] = 0.0;
for (int i = 0; i < nx + 1; i++)
{
x[i] = (-nx / 2) + i; // Values on the x axis
// Eigenfunction of the harmonic oscillator in the ground state
phi[i] = exp(-pow(x[i] * delta_x, 2) / 2) / (pow(M_PI, 0.25));
// Anharmonic potential
V[i] = pow(x[i] * delta_x, 2) / 2 + lambda * pow(x[i] * delta_x, 4);
// The main diagonal coefficients
b[i] = 2.0 * I / r - 2.0 + V[i] * delta_x * delta_x;
}
double sum0 = 0.0;
for (int i = 0; i < nx + 1; i++)
{
PSI[0][i] = phi[i]; // Initial condition for the wave function
sum0 += abs(pow(PSI[0][i], 2)); // Needed for the normalization
}
sum0 = sum0 * delta_x;
for (int i = 0; i < nx + 1; i++)
{
PSI[0][i] = PSI[0][i] / sqrt(sum0); // Normalization of the initial
// wave function
}
for (int j = 0; j < nt; j++)
{
PSI[j][0] = 0.0;
PSI[j][nx] = 0.0; // Boundary conditions for the wave function
d[0] = 0.0;
d[nx] = 0.0; // Boundary conditions for the rhs
// Fill in the current time step vector d representing the rhs
for (int i = 1; i < nx + 1; i++)
{
d[i] = PSI[j][i + 1]
+ (2.0 - 2.0 * I / r - V[i] * delta_x * delta_x) * PSI[j][i]
+ PSI[j][i - 1];
}
// Now solve the tridiagonal system
psi = thomas_algorithm(a, b, c, d);
for (int i = 1; i < nx; i++)
{
PSI[j + 1][i] = psi[i]; // Assign values to the wave function
}
for (int i = 0; i < nx + 1; i++)
{
// Probability density of the wave function in the next time step
prob[i] = abs(PSI[j + 1][i] * conj(PSI[j + 1][i]));
}
double sum = 0.0;
for (int i = 0; i < nx + 1; i++)
{
sum += prob[i] * delta_x;
}
for (int i = 0; i < nx + 1; i++)
{
// Normalization of the wave function in the next time step
PSI[j + 1][i] /= sqrt(sum);
}
}
// Opening files for writing the results
ofstream file_psi_re, file_psi_imag, file_psi_abs, file_potential,
file_phi0;
file_psi_re.open("psi_re.dat");
file_psi_imag.open("psi_imag.dat");
file_psi_abs.open("psi_abs.dat");
for (int i = 0; i < nx + 1; i++)
{
file_psi_re << fixed << x[i] << " ";
file_psi_imag << fixed << x[i] << " ";
file_psi_abs << fixed << x[i] << " ";
for (int j = 0; j < nt + 1; j++)
{
file_psi_re << fixed << setprecision(6) << PSI[j][i].real() << " ";
file_psi_imag << fixed << setprecision(6) << PSI[j][i].imag()
<< " ";
file_psi_abs << fixed << setprecision(6) << abs(PSI[j][i]) << " ";
}
file_psi_re << endl;
file_psi_imag << endl;
file_psi_abs << endl;
}
}
int main(int argc, char **argv)
{
calc();
return 0;
}
The black line is abs(psi), the red one is Im(psi) and the blue one is Re(psi).

(Bear in mind that my computational physics course was ten years ago now)
You say you are solving a time-dependent system, but I don't see any time-dependence (even if lambda != 0). In the Schrodinger Equation, if the potential function does not depend on time then the different equation is called separable because you can solve the time component and spatial component of the differential equation separately.
The general solution in that case is just the solution to the time-independent Schrodinger Equation multiplied by exp(-iE/h_bar). When you plot the magnitude of the probability that term just becomes 1 and so the probability doesn't change over time. In these cases people quite typically just ignore the time component altogether.
All this is to say that since your potential function doesn't depend on time then you aren't solving a time-dependent Schrodinger Equation. The Tridiagonal Matrix Algorithm can only be used to solve ordinary differential equations, whereas if your potential depended on time you would have a partial differential equation and would need a different method to solve it. Also as a result of that plotting the probability density over time is rarely interesting.
As for why your potential is not constant, numerical methods for finding eigenvalues and eigenvectors rarely produce the normalised eigenvectors naturally, so are you manually normalising your eigenvector before computing your probabilities?

Vectors and matrices in C++ for generating a spectrogram

This is my first attempt to generate a spectrogram of a sinusoidal signal with C++.
To generate the spectrogram:
I divided the real sinusoidal signal into B blocks
Applied Hanning window on each block (I assumed there is no overlap). This should give me the inputs for the fft, in[j][k] where k is the block number
Apply fft on in[j][k] for each block and store it.
Here is the script:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <fftw3.h>
#include <iostream>
#include <cmath>
#include <fstream>
using namespace std;
int main(){
int i;
int N = 500; // sampled
int Windowsize = 100;
double Fs = 200; // sampling frequency
double T = 1 / Fs; // sample time
double f = 50; // frequency
double *in;
fftw_complex *out;
double t[N]; // time vector
fftw_plan plan_forward;
std::vector<double> signal(N);
int B = N / Windowsize; //number of blocks
in = (double*)fftw_malloc(sizeof(double) * N);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
//Generating the signal
for(int i = 0; i < = N; i++){
t[i] = i * T;
signal[i] = 0.7 * sin(2 * M_PI * f * t[i]);// generate sine waveform
}
//Applying the Hanning window function on each block B
for(int k = 0; i <= B; k++){
for(int j = 0; j <= Windowsize; j++){
double multiplier = 0.5 * (1 - cos(2 * M_PI * j / (N-1))); // Hanning Window
in[j][k] = multiplier * signal[j];
}
plan_forward = fftw_plan_dft_r2c_1d (Windowsize, in, out, FFTW_ESTIMATE );
fftw_execute(plan_forward);
v[j][k]=(20 * log(sqrt(out[i][0] * out[i][0] + out[i][1] * out[i][1]))) / N;
}
fftw_destroy_plan(plan_forward);
fftw_free(in);
fftw_free(out);
return 0;
}
So, the question is: What is the correct way to declare in[j][k] and v[j][k] variables.
Update:I have declared my v [j] [k] as a matrix : double v [5][249]; according to this site :http://www.cplusplus.com/doc/tutorial/arrays/ so now my script looks like:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <fftw3.h>
#include <iostream>
#include <cmath>
#include <fstream>
using namespace std;
int main()
{
int i;
double y;
int N=500;//Number of pints acquired inside the window
double Fs=200;//sampling frequency
int windowsize=100;
double dF=Fs/N;
double T=1/Fs;//sample time
double f=50;//frequency
double *in;
fftw_complex *out;
double t[N];//time vector
double tt[5];
double ff[N];
fftw_plan plan_forward;
double v [5][249];
in = (double*) fftw_malloc(sizeof(double) * N);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
plan_forward = fftw_plan_dft_r2c_1d ( N, in, out, FFTW_ESTIMATE );
for (int i=0; i<= N;i++)
{
t[i]=i*T;
in[i] =0.7 *sin(2*M_PI*f*t[i]);// generate sine waveform
}
for (int k=0; k< 5;k++){
for (int i = 0; i<windowsize; i++){
double multiplier = 0.5 * (1 - cos(2*M_PI*i/(windowsize-1)));//Hanning Window
in[i] = multiplier * in[i+k*windowsize];
fftw_execute ( plan_forward );
for (int i = 0; i<= (N/2); i++)
{
v[k][i]=(20*log10(sqrt(out[i][0]*out[i][0]+ out[i][1]*out[i] [1])));//Here I have calculated the y axis of the spectrum in dB
}
}
}
for (int k=0; k< 5;k++)//Center time for each block
{
tt[k]=(2*k+1)*T*(windowsize/2);
}
fstream myfile;
myfile.open("example2.txt",fstream::out);
myfile << "plot '-' using 1:2" << std::endl;
for (int k=0; k< 5;k++){
for (int i = 0; i<= ((N/2)-1); i++)
{
myfile << v[k][i]<< " " << tt[k]<< std::endl;
}
}
myfile.close();
fftw_destroy_plan ( plan_forward );
fftw_free ( in );
fftw_free ( out );
return 0;
}
I do not get errors anymore but the spectrogram plot is not right.

As indicated in FFTW's documentation, the size of the output (out in your case) when using fftw_plan_dft_r2c_1d is not the same as the size of the input. More specifically for an input of N real samples, the output consists of N/2+1 complex values. You may then allocate out with:
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (N/2 + 1));
For the spectrogram output you will then similarly have (N/2+1) magnitudes for each of the B blocks, resulting in the 2D array:
double** v = new double*[B];
for (int i = 0; i < B; i++){
v[i] = new double[(N/2+1)];
}
Also, note that you may reuse the input buffer in for each iteration (filling it with data for a new block). However since you have chosen to compute an N-point FFT and will be storing smaller blocks of Windowsize samples (in this case N=500 and Windowsize=100), make sure to initialize the remaining samples with zeros:
in = (double*)fftw_malloc(sizeof(double) * N);
for (int i = 0; i < N; i++){
in[i] = 0;
}
Note that in addition to the declaration and allocation of the in and v variables, the code you posted suffers from a few additional issues:
When computing the Hanning window, you should divide by the Windowsize-1 not N-1 (since in your case N correspond to the FFT size).
You are taking the FFT of the same block of signal over and over again since you are always indexing with j in the [0,Windowsize] range. You would most likely want to add an offset each time you process a different block.
Since the FFT size does not change, you only need to create the plan once. At the very least if you are going to create your plan at every iteration, you should similarly destroy it (with fftw_destroy_plan) at every iteration.
And a few additional points which may require some thoughts:
Scaling the log-scaled magnitudes by dividing by N might not do what you think. You are much more likely to want to scale the linear-scale magnitudes (ie. divide the magnitude before taking the logarithm). Note that this will result in a constant offset of the spectrum curve, which for many application is not that significant. If the scaling is important for your application, you may have a look at another answer of mine for more details.
The common formula 20*log10(x) typically used to convert linear scale to decibels uses a base-10 logarithm instead of the natural log (base e~2.7182) function which you've used. This would result in a multiplicative scaling (stretching), which may or may not be significant depending on your application.
To summarize, the following code might be more in line with what you are trying to do:
// Allocate & initialize buffers
in = (double*)fftw_malloc(sizeof(double) * N);
for (int i = 0; i < N; i++){
in[i] = 0;
}
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (N/2 + 1));
v = new (double*)[B];
for (int i = 0; i < B; i++){
v[i] = new double[(N/2+1)];
}
// Generate the signal
...
// Create the plan once
plan_forward = fftw_plan_dft_r2c_1d (Windowsize, in, out, FFTW_ESTIMATE);
// Applying the Hanning window function on each block B
for(int k = 0; k < B; k++){
for(int j = 0; j < Windowsize; j++){
// Hanning Window
double multiplier = 0.5 * (1 - cos(2 * M_PI * j / (Windowsize-1)));
in[j] = multiplier * signal[j+k*Windowsize];
}
fftw_execute(plan_forward);
for (int j = 0; j <= N/2; j++){
// Factor of 2 is to account for the fact that we are only getting half
// the spectrum (the other half is not return by a R2C plan due to symmetry)
v[k][j] = 2*(out[j][0] * out[j][0] + out[j][1] * out[j][1])/(N*N);
}
// DC component and at Nyquist frequency do not have a corresponding symmetric
// value, so should not have been doubled up above. Correct those special cases.
v[k][0] *= 0.5;
v[k][N/2] *= 0.5;
// Convert to decibels
for (int j = 0; j <= N/2; j++){
// 20*log10(sqrt(x)) is equivalent to 10*log10(x)
// also use some small epsilon (e.g. 1e-5) to avoid taking the log of 0
v[k][j] = 10 * log10(v[k][j] + epsilon);
}
}
// Clean up
fftw_destroy_plan(plan_forward);
fftw_free(in);
fftw_free(out);
// Delete this last one after you've done something useful with the spectrogram
for (int i = 0; i < B; i++){
delete[] v[i];
}
delete[] v;

Looks like you're missing the initial declaration for 'v' altogether, and 'in' is not declared properly.
See this page for a related question about creating 2D arrays in C++. As I understand, fftw_malloc() is basically new() or malloc() but aligns the variable properly for the FFTW algorithm.
Since you're not supplying 'v' to the anything related to FFTW, you could use standard malloc() for that.

fftw in C++ gets slower for power of 2?

I'm working with the fftw library in C++. I know that the calculation of the fft is most efficient for powers of 2, but I created a minimal example of a two-dimensional fft and I get a different result. The 2d-fft with no power of 2 is calculated much faster than the other one. Here is my code:
int N = 2083;
int M = 2087;
int Npow2 = pow(2, ceil(log2(N)));
int Mpow2 = pow(2, ceil(log2(M)));
fftw_complex * signala = (fftw_complex *)fftw_malloc(sizeof(fftw_complex)* N * M);
for (int i = 0; i < N; i++)
{
for (int j = 0; j < M; j++)
{
signala[i*M + j][0] = rand();
signala[i*M + j][0] = 0;
}
}
fftw_complex * signala_ext = (fftw_complex *)fftw_malloc(sizeof(fftw_complex)* Npow2 * Mpow2);
fftw_complex * outa = (fftw_complex *)fftw_malloc(sizeof(fftw_complex)* N * M);
fftw_complex * outaext = (fftw_complex *)fftw_malloc(sizeof(fftw_complex)* Npow2 * Mpow2);
//Create Plans
fftw_plan pa = fftw_plan_dft_2d(N, M, signala, outa, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_plan paext = fftw_plan_dft_2d(Npow2, Mpow2, signala_ext, outaext, FFTW_FORWARD, FFTW_ESTIMATE);
//zeropadding
memset(signala_ext, 0, sizeof(fftw_complex)* Npow2 * Mpow2); //Null setzen
for (int i = 0; i < N; i++)
{
for (int j = 0; j < M; j++)
{
signala_ext[i*Mpow2 + j][0] = signala[i*M + j][0];
signala_ext[i*Mpow2 + j][1] = signala[i*M + j][1];
}
}
//Execute FFT
double tstart1 = clock();
fftw_execute(pa);
double time1 = (clock() - tstart1) / CLOCKS_PER_SEC;
printf("Time: %f sec\n", time1);
double tstart2 = clock();
fftw_execute(paext);
double time2 = (clock() - tstart2) / CLOCKS_PER_SEC;
printf("Time: %f sec\n", time2);
I choosed prime numbers for N and M. My programms returns:
For signala (non-power-of-2): 2.95 sec
For signala_ext (power-of-2): 5.232 sec
Why is the fft with power of 2 so much slower? What have I done wrong?
I will be thankful for any help!

FFTW likes dimensions which are products of powers of small primes. The nearest value above 2083 or 2087 which meets this criterion is 2100 (2100 = 22 * 3 * 52 * 7), so if you go for dimensions of 2100 x 2100 then you should see decent performance.

Confusion testing fftw3 - poisson equation 2d test

I am having trouble explaining/understanding the following phenomenon:
To test fftw3 i am using the 2d poisson test case:
laplacian(f(x,y)) = - g(x,y) with periodic boundary conditions.
After applying the fourier transform to the equation we obtain : F(kx,ky) = G(kx,ky) /(kx² + ky²) (1)
if i take g(x,y) = sin (x) + sin(y) , (x,y) \in [0,2 \pi] i have immediately f(x,y) = g(x,y)
which is what i am trying to obtain with the fft :
i compute G from g with a forward Fourier transform
From this i can compute the Fourier transform of f with (1).
Finally, i compute f with the backward Fourier transform (without forgetting to normalize by 1/(nx*ny)).
In practice, the results are pretty bad?
(For instance, the amplitude for N = 256 is twice the amplitude obtained with N = 512)
Even worse, if i try g(x,y) = sin(x)*sin(y) , the curve has not even the same form of the solution.
(note that i must change the equation; i divide by two the laplacian in this case : (1) becomes F(kx,ky) = 2*G(kx,ky)/(kx²+ky²)
Here is the code:
/*
* fftw test -- double precision
*/
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <fftw3.h>
using namespace std;
int main()
{
int N = 128;
int i, j ;
double pi = 3.14159265359;
double *X, *Y ;
X = (double*) malloc(N*sizeof(double));
Y = (double*) malloc(N*sizeof(double));
fftw_complex *out1, *in2, *out2, *in1;
fftw_plan p1, p2;
double L = 2.*pi;
double dx = L/(N - 1);
in1 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*(N*N) );
out2 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*(N*N) );
out1 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*(N*N) );
in2 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*(N*N) );
p1 = fftw_plan_dft_2d(N, N, in1, out1, FFTW_FORWARD,FFTW_MEASURE );
p2 = fftw_plan_dft_2d(N, N, in2, out2, FFTW_BACKWARD,FFTW_MEASURE);
for(i = 0; i < N; i++){
X[i] = -pi + i*dx ;
for(j = 0; j < N; j++){
Y[j] = -pi + j*dx ;
in1[i*N + j][0] = sin(X[i]) + sin(Y[j]) ; // row major ordering
//in1[i*N + j][0] = sin(X[i]) * sin(Y[j]) ; // 2nd test case
in1[i*N + j][1] = 0 ;
}
}
fftw_execute(p1); // FFT forward
for ( i = 0; i < N; i++){ // f = g / ( kx² + ky² )
for( j = 0; j < N; j++){
in2[i*N + j][0] = out1[i*N + j][0]/ (i*i+j*j+1e-16);
in2[i*N + j][1] = out1[i*N + j][1]/ (i*i+j*j+1e-16);
//in2[i*N + j][0] = 2*out1[i*N + j][0]/ (i*i+j*j+1e-16); // 2nd test case
//in2[i*N + j][1] = 2*out1[i*N + j][1]/ (i*i+j*j+1e-16);
}
}
fftw_execute(p2); //FFT backward
// checking the results computed
double erl1 = 0.;
for ( i = 0; i < N; i++) {
for( j = 0; j < N; j++){
erl1 += fabs( in1[i*N + j][0] - out2[i*N + j][0]/N/N )*dx*dx;
cout<< i <<" "<< j<<" "<< sin(X[i])+sin(Y[j])<<" "<< out2[i*N+j][0]/N/N <<" "<< endl; // > output
}
}
cout<< erl1 << endl ; // L1 error
fftw_destroy_plan(p1);
fftw_destroy_plan(p2);
fftw_free(out1);
fftw_free(out2);
fftw_free(in1);
fftw_free(in2);
return 0;
}
I can't find any (more) mistakes in my code (i installed the fftw3 library last week) and i don't see a problem with the maths either but i don't think it's the fft's fault. Hence my predicament. I am all out of ideas and all out of google as well.
Any help solving this puzzle would be greatly appreciated.
note :
compiling : g++ test.cpp -lfftw3 -lm
executing : ./a.out > output
and i use gnuplot in order to plot the curves :
(in gnuplot ) splot "output" u 1:2:4 ( for the computed solution )

Here are some little points to be modified :
You need to account for all small frequencies, including the negative ones ! Index i corresponds to the frequency 2PI i/N but also to the frequency 2PI (i-N)/N. In the Fourier space, the end of the array matters as much as the beginning ! In our case, we keep the smallest frequency : it's 2PI i/N for the first half of the array, and 2PI(i-N)/N on the second half.
Of course, as Paul said, N-1 should be Nin double dx = L/(N - 1); => double dx = L/(N ); N-1 does not correspond to a continious periodic signal. It woud be hard to use it as a test case...
Scaling...I did it empirically
The result i obtain is closer to the expected one, for both cases. Here is the code :
/*
* fftw test -- double precision
*/
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <fftw3.h>
using namespace std;
int main()
{
int N = 128;
int i, j ;
double pi = 3.14159265359;
double *X, *Y ;
X = (double*) malloc(N*sizeof(double));
Y = (double*) malloc(N*sizeof(double));
fftw_complex *out1, *in2, *out2, *in1;
fftw_plan p1, p2;
double L = 2.*pi;
double dx = L/(N );
in1 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*(N*N) );
out2 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*(N*N) );
out1 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*(N*N) );
in2 = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*(N*N) );
p1 = fftw_plan_dft_2d(N, N, in1, out1, FFTW_FORWARD,FFTW_MEASURE );
p2 = fftw_plan_dft_2d(N, N, in2, out2, FFTW_BACKWARD,FFTW_MEASURE);
for(i = 0; i < N; i++){
X[i] = -pi + i*dx ;
for(j = 0; j < N; j++){
Y[j] = -pi + j*dx ;
in1[i*N + j][0] = sin(X[i]) + sin(Y[j]) ; // row major ordering
// in1[i*N + j][0] = sin(X[i]) * sin(Y[j]) ; // 2nd test case
in1[i*N + j][1] = 0 ;
}
}
fftw_execute(p1); // FFT forward
for ( i = 0; i < N; i++){ // f = g / ( kx² + ky² )
for( j = 0; j < N; j++){
double fact=0;
in2[i*N + j][0]=0;
in2[i*N + j][1]=0;
if(2*i<N){
fact=((double)i*i);
}else{
fact=((double)(N-i)*(N-i));
}
if(2*j<N){
fact+=((double)j*j);
}else{
fact+=((double)(N-j)*(N-j));
}
if(fact!=0){
in2[i*N + j][0] = out1[i*N + j][0]/fact;
in2[i*N + j][1] = out1[i*N + j][1]/fact;
}else{
in2[i*N + j][0] = 0;
in2[i*N + j][1] = 0;
}
//in2[i*N + j][0] = out1[i*N + j][0];
//in2[i*N + j][1] = out1[i*N + j][1];
// in2[i*N + j][0] = out1[i*N + j][0]*(1.0/(i*i+1e-16)+1.0/(j*j+1e-16)+1.0/((N-i)*(N-i)+1e-16)+1.0/((N-j)*(N-j)+1e-16))*N*N;
// in2[i*N + j][1] = out1[i*N + j][1]*(1.0/(i*i+1e-16)+1.0/(j*j+1e-16)+1.0/((N-i)*(N-i)+1e-16)+1.0/((N-j)*(N-j)+1e-16))*N*N;
//in2[i*N + j][0] = 2*out1[i*N + j][0]/ (i*i+j*j+1e-16); // 2nd test case
//in2[i*N + j][1] = 2*out1[i*N + j][1]/ (i*i+j*j+1e-16);
}
}
fftw_execute(p2); //FFT backward
// checking the results computed
double erl1 = 0.;
for ( i = 0; i < N; i++) {
for( j = 0; j < N; j++){
erl1 += fabs( in1[i*N + j][0] - out2[i*N + j][0]/(N*N))*dx*dx;
cout<< i <<" "<< j<<" "<< sin(X[i])+sin(Y[j])<<" "<< out2[i*N+j][0]/(N*N) <<" "<< endl; // > output
// cout<< i <<" "<< j<<" "<< sin(X[i])*sin(Y[j])<<" "<< out2[i*N+j][0]/(N*N) <<" "<< endl; // > output
}
}
cout<< erl1 << endl ; // L1 error
fftw_destroy_plan(p1);
fftw_destroy_plan(p2);
fftw_free(out1);
fftw_free(out2);
fftw_free(in1);
fftw_free(in2);
return 0;
}
This code is far from being perfect, it is neither optimized nor beautiful. But it gives almost what is expected.
Bye,

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Changing the center of a gaussian using FFTW in C++ - c++

Related

Screened Poisson solution with FFTW3

Time dependent 1D Schrodinger equation C++

Vectors and matrices in C++ for generating a spectrogram

fftw in C++ gets slower for power of 2?

Confusion testing fftw3 - poisson equation 2d test

Categories

Resources