What is the exact equivalent of this MATLAB line of code in C++ using FFTW?
Note: X is an array of 4096 double values.
Currently I use these lines of C++ and FFTW code to compute the FFT:
int n = 4096;
fftw_complex *x;
fftw_complex *y;
x = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * n);
y = (fftw_complex *)fftw_malloc(sizeof(fftw_complex) * n);
for (int i=0; i<n; i++) {
    x[i][REAL] = MyDoubleData[i];
    x[i][IMAG] = 0;
}
fftw_plan plan = fftw_plan_dft_1d(n, x, y, FFTW_FORWARD, FFTW_ESTIMATE);
This is just the equivalent of the fft function in MATLAB.
Is there an equivalent of fftshift in the FFTW library?

The FFTW function calls you've provided would be the equivalent of fft(x,4096). If x is real, MATLAB knows to give you the conjugate-symmetric FFT (I think). If you want to do this with FFTW, you need to use the r2c and c2r functions (real-to-complex/complex-to-real).
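You have to do the shift yourself. You can do direct substitution (poor performance, but should be intuitive):
// shifted is a second fftw_complex buffer of length n (n even)
for (int i = 0; i < n; i++) {
    int dst = (i + n/2) % n;
    shifted[dst][0] = y[i][0];
    shifted[dst][1] = y[i][1];
}
Alternatively, use a couple of memcpy calls (and/or memmove), or modify your input data: for even n, multiplying x[i] by (-1)^i before the transform shifts the output by n/2.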
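A minimal sketch of the "do the shift yourself" approach, assuming the FFT output has been copied into a std::vector<std::complex<double>>. The helper name fftshift_inplace is hypothetical and not part of FFTW. It rotates the buffer so that the zero-frequency bin lands at index floor(n/2), which is the ordering MATLAB's fftshift produces for both even and odd lengths:

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an FFTW function): reorder FFT output so that
// bin 0 (DC) moves to index n/2, like MATLAB's fftshift.
void fftshift_inplace(std::vector<std::complex<double>>& bins) {
    const std::size_t n = bins.size();
    if (n < 2) return;
    // The element at index ceil(n/2) becomes the new first element.
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>((n + 1) / 2);
    std::rotate(bins.begin(), bins.begin() + mid, bins.end());
}
```

The FFTW manual describes fftw_complex as layout-compatible with std::complex<double>, so the same std::rotate call should also work on a pointer obtained by reinterpret_cast from an fftw_malloc'ed buffer.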

The output of FFTW is stored in the following order of frequency indices:
[0, 1, ..., N-1]
where N is the number of frequencies and is even.
fftshift changes it to:
[-N/2, ..., -1, 0, 1, ..., N/2-1]
Note that the FFTW output order is equivalent to:
[0, 1, ..., N/2-1, -N/2, ..., -1]
This means that in the DFT, frequency -i is the same as N-i.
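The equivalence between bin N-i and frequency -i can be written as a small index mapping. This is only a sketch; the function name signed_frequency_index is hypothetical, and it assumes the usual convention that for even N the Nyquist bin N/2 is reported as -N/2:

```cpp
#include <cstddef>

// Hypothetical helper: map FFT output bin k (0 <= k < n) to its signed
// frequency index, so that bin n - i is reported as frequency -i.
// For even n the Nyquist bin n/2 is reported as -n/2.
long signed_frequency_index(std::size_t k, std::size_t n) {
    const std::size_t first_negative = (n + 1) / 2;
    return (k < first_negative)
               ? static_cast<long>(k)
               : static_cast<long>(k) - static_cast<long>(n);
}
```

Multiplying the returned index by the sample rate divided by n gives the frequency in Hz.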


Generating a correct spectrogram using FFTW and a window function

For a project I need to be able to generate a spectrogram from a .WAV file. I've read the following should be done:
Get N (transform size) samples
Apply a window function
Do a Fast Fourier Transform using the samples
Normalise the output
Generate spectrogram
In the image below you see two spectrograms of a 10000 Hz sine wave, both using the Hann window function. On the left is a spectrogram generated by Audacity and on the right is my version. As you can see, my version has a lot more lines/noise. Is this leakage into different bins? How would I get a clear image like the one Audacity generates? Should I do some post-processing? I have not yet done any normalisation because I do not fully understand how to do so.
I found this tutorial explaining how to generate a spectrogram in C++. I compiled the source to see what differences I could find.
My math is very rusty to be honest so I'm not sure what the normalisation does here:
for(i = 0; i < half; i++){
    out[i][0] *= (2./transform_size);
    out[i][6] *= (2./transform_size);
    processed[i] = out[i][0]*out[i][0] + out[i][7]*out[i][8];
    //sets values between 0 and 1?
    processed[i] =10. * (log (processed[i] + 1e-6)/log(10)) /-60.;
}
After doing this I got this image (btw I've inverted the colors):
I then looked at the difference between the input samples provided by my sound library and those of the tutorial. Mine were much larger, so I manually normalised them by dividing by 32767.9. I then got this image, which looks pretty OK I think. But dividing by this number seems wrong, and I would like to see a different solution.
Here is the full relevant source code.
void Spectrogram::process(){
    int i;
    int transform_size = 1024;
    int half = transform_size/2;
    int step_size = transform_size/2;
    double in[transform_size];
    double processed[half];
    fftw_complex *out;
    fftw_plan p;
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);
    for(int x=0; x < wavFile->getSamples()/step_size; x++){
        int j = 0;
        for(i = step_size*x; i < (x * step_size) + transform_size - 1; i++, j++){
            in[j] = wavFile->getSample(i)/32767.9;
        }
        //apply window function
        for(i = 0; i < transform_size; i++){
            in[i] *= windowHanning(i, transform_size);
            // in[i] *= windowBlackmanHarris(i, transform_size);
        }
        p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);
        fftw_execute(p); /* repeat as needed */
        for(i = 0; i < half; i++){
            out[i][0] *= (2./transform_size);
            out[i][11] *= (2./transform_size);
            processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13];
            processed[i] =10. * (log (processed[i] + 1e-6)/log(10)) /-60.;
        }
        for (i = 0; i < half; i++){
            if(processed[i] > 0.99)
                processed[i] = 1;
        }
    }
}
This is not exactly an answer as to what is wrong but rather a step by step procedure to debug this.
What do you think this line does? processed[i] = out[i][0]*out[i][0] + out[i][12]*out[i][13] Likely that is incorrect: fftw_complex is typedef double fftw_complex[2], so you only have out[i][0] and out[i][1], where the first is the real and the second the imaginary part of the result for that bin. If the array is contiguous in memory (which it is), then out[i][12] is likely the same as out[i+6][0] and so forth. Some of these will go past the end of the array, adding random values.
Is your window function correct? Print out windowHanning(i, transform_size) for every i and compare with a reference version (for example numpy.hanning or the MATLAB equivalent). This is the most likely cause; what you see looks somewhat like the result of a bad window function.
Print out processed, and compare with a reference version (given the same input, of course you'd have to print the input and reformat it to feed into pylab/matlab etc). However, the -60 and 1e-6 are fudge factors which you don't want, the same effect is better done in a different way. Calculate like this:
power_in_db[i] = 10 * log(out[i][0]*out[i][0] + out[i][1]*out[i][1])/log(10)
Print out the values of power_in_db[i] for the same i but for all x (a horizontal line). Are they approximately the same?
If everything so far is good, the remaining suspect is setting the pixel values. Be very explicit about clipping to range, scaling and rounding.
int pixel_value = (int)round( 255 * (power_in_db[i] - min_db) / (max_db - min_db) );
if (pixel_value < 0) { pixel_value = 0; }
if (pixel_value > 255) { pixel_value = 255; }
Here, again, print out the values in a horizontal line, and compare with the grayscale values in your pgm (by hand, using the colorpicker in photoshop or gimp or similar).
At this point, you will have validated everything from end to end, and likely found the bug.
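For the window check in the procedure above, a reference Hann window can be printed next to the output of windowHanning(). The sketch below uses the symmetric definition that numpy.hanning uses; the name hann_window is hypothetical, and the asker's windowHanning() is not shown in the question, so this is only something to compare against:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hann window of length n:
//   w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1)))
// Intended as a reference to compare a custom window function against.
std::vector<double> hann_window(std::size_t n) {
    assert(n >= 2);  // the formula divides by n - 1
    const double pi = std::acos(-1.0);
    std::vector<double> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = 0.5 * (1.0 - std::cos(2.0 * pi * static_cast<double>(i) /
                                     static_cast<double>(n - 1)));
    }
    return w;
}
```

The window should start and end at 0 and rise smoothly to 1 in the middle; a window that is off by one sample or never reaches 0 at the edges would be worth investigating.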
The code you produced was almost correct, so you didn't leave me much to correct:
void Spectrogram::process(){
    int transform_size = 1024;
    int half = transform_size/2;
    int step_size = transform_size/2;
    double in[transform_size];
    double processed[half];
    fftw_complex *out;
    fftw_plan p;
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * transform_size);
    for (int x=0; x < wavFile->getSamples()/step_size; x++) {
        // Fill the transformation array with a sample frame and apply the window function.
        // Normalization is performed later
        // (One error was here: you didn't set the last value of the array in)
        for (int j = 0, i = x * step_size; i < x * step_size + transform_size; i++, j++)
            in[j] = wavFile->getSample(i) * windowHanning(j, transform_size);
        p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE);
        fftw_execute(p); /* repeat as needed */
        for (int i=0; i < half; i++) {
            // (Here were some flaws concerning the access of the complex values)
            out[i][0] *= (2./transform_size); // real values
            out[i][1] *= (2./transform_size); // imaginary values
            processed[i] = out[i][0]*out[i][0] + out[i][1]*out[i][1]; // power spectrum
            processed[i] = 10./log(10.) * log(processed[i] + 1e-6); // dB
            // The resulting spectral values in 'processed' are in dB and related to a maximum
            // value of about 96dB. Normalization to a value range between 0 and 1 can be done
            // in several ways. I would suggest to set values below 0dB to 0dB and divide by 96dB:
            // Transform all dB values to a range between 0 and 1:
            if (processed[i] <= 0) {
                processed[i] = 0;
            } else {
                processed[i] /= 96.; // Reduce the divisor if you prefer darker peaks
                if (processed[i] > 1)
                    processed[i] = 1;
            }
        }
        // This should be called each time fftw_plan_dft_r2c_1d()
        // was called to avoid a memory leak:
        fftw_destroy_plan(p);
    }
    fftw_free(out);
}
The two corrected bugs were most probably responsible for the slight variation of successive transformation results. The Hanning window is very well suited to minimizing the "noise", so a different window would not have solved the problem. (Actually, @Alex I already pointed to the 2nd bug in his point 2. But in his point 3 he added a -Inf bug, since log(0) is not defined, which can happen if your wave file contains a stretch of exact 0 values. To avoid this, the constant 1e-6 is good enough.)
Not asked, but there are some optimizations:
put p = fftw_plan_dft_r2c_1d(transform_size, in, out, FFTW_ESTIMATE); outside the main loop,
precalculate the window function outside the main loop,
abandon the array processed and just use a temporary variable to hold one spectral line at a time,
the two multiplications of out[i][0] and out[i][1] can be abandoned in favour of one multiplication with a constant in the following line. I left this (and other things) for you to improve.
Thanks to @Maxime Coorevits, a memory leak could additionally be avoided: "Each time you call fftw_plan_dft_r2c_1d() memory are allocated by FFTW3. In your code, you only call fftw_destroy_plan() outside the outer loop. But in fact, you need to call this each time you request a plan."
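As a rough sketch of the "one multiplication with a constant" suggestion, the two per-component scalings can be folded into a single factor applied to the power. The helper below uses std::complex<double> instead of fftw_complex so that it compiles without FFTW; the name normalized_line is hypothetical, and the 1e-6 offset and 96 dB divisor are taken from the answer above:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert one frame of raw FFT bins into values in [0, 1].
// transform_size is the FFT length; only the first half of the bins is used.
std::vector<double> normalized_line(const std::vector<std::complex<double>>& bins,
                                    std::size_t transform_size) {
    assert(transform_size > 0);
    // Scaling both parts by 2/N and squaring equals scaling the power by (2/N)^2.
    const double amp = 2.0 / static_cast<double>(transform_size);
    const double power_scale = amp * amp;
    const std::size_t half = std::min(bins.size(), transform_size / 2);
    std::vector<double> line(half);
    for (std::size_t i = 0; i < half; ++i) {
        const double power = std::norm(bins[i]) * power_scale;  // re^2 + im^2, scaled
        const double db = 10.0 * std::log10(power + 1e-6);      // 1e-6 avoids log(0)
        line[i] = std::min(1.0, std::max(0.0, db / 96.0));      // clamp to [0, 1]
    }
    return line;
}
```

A silent bin maps to 0, and a bin holding a full-scale 16-bit sine (amplitude 32768 after scaling) maps to roughly 0.94.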
Audacity typically doesn't map one frequency bin to one horizontal line, nor one sample period to one vertical line. The visual effect in Audacity may be due to resampling of the spectrogram picture in order to fit the drawing area.

Matrix multiplication in a cpp file for Matlab

How would I do a matrix multiplication in cpp format that would after be compiled into a mex file?
My normal matrix multiplication in a MATLAB script is as follows:
cMatrix = (1 / r) * pfMatrix * wcMatrix; %here pfMatrix is 2x3 and wcMatrix is 3x8
% Hence cMatrix is 2x8
% r is a scalar
The pfMatrix, wcMatrix and r are declared correctly in the cpp file and they have the same values as in the script. However, cMatrix doesn't give me the same results. Here is the implementation of the matrix multiplication in the cpp file:
int i, n, j;
for (i = 0; i<1; i++) {
    for (n = 0; n<7; n++) {
        for (j = 0; j<2; j++) {
            d->cMatrix[i][n] += (d->pfMatrix[i][j]) * (d->wcMatrix[j][n]);
        }
        d->cMatrix[i][n] = (1 / d->r) * d->cMatrix[i][n];
    }
}
I modified the loop following Ben Voigt's answer. The results in cMatrix are still not identical to the ones calculated by the MATLAB script.
For example :
pfMatrix = [7937.91049469652,0,512;0,7933.81033431703,384];
wcMatrix = [-0.880633810389421,-1.04063381038942,-1.04063381038942,-0.880633810389421,-0.815633810389421,-1.10563381038942,-1.10563381038942,-0.815633810389421;-0.125,-0.125,0.125,0.125,-0.29,-0.29,0.29,0.29;100,100,100,100,100,100,100,100];
r = 100;
In this case, cMatrix(1,1) is :
(pfMatrix(1,1)*wcMatrix(1,1) + pfMatrix(1,2)*wcMatrix(2,1) + pfMatrix(1,3)*wcMatrix(3,1)) / r = 442.09
However, with the mex file the equivalent result is 959.
Edit #2:
I found the error: an element of pfMatrix was not declared correctly (missing a division by 2). So Ben Voigt's answer works correctly. However, there is still a slight difference between the two results (the MATLAB script gives 442 and the mex file gives 447; could it be a result of different data types?).
Edit #3:
Found the error and it was not related with the matrix multiplication loop.
Using your result matrix as scratch space is not a great idea. The compiler has to worry about aliasing, which means it can't optimize.
Try an explicit working variable, which also provides a convenient place to zero it:
for (int i = 0; i < 2; ++i) {
    for (int n = 0; n < 8; ++n) {
        double accum = 0.0;
        for (int j = 0; j < 3; ++j) {
            accum += (d->pfMatrix[i][j]) * (d->wcMatrix[j][n]);
        }
        d->cMatrix[i][n] = accum / d->r;
    }
}
Your ranges were also wrong, which I've fixed.
(Also note that good performance on large matrices requires banding to get good cache behavior, however that shouldn't be an issue on a product of this size.)
Matrix multiplication must have this form: A[m][n] * B[n][p] = R[m][p].
The conditions you wrote in the for loops are not correct and don't respect the matrix dimensions.
Also look at the Eigen library, which is open source and provides a simple way to do matrix multiplications.
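To make the dimension rule concrete, here is a small self-contained sketch that takes the loop bounds from the matrices themselves rather than hard-coding them. The Matrix alias and the function name scaled_product are hypothetical and only for illustration; inside a mex file the data would normally live in the arrays MATLAB passes in:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: returns (1 / r) * a * b for an m x k matrix a and a
// k x p matrix b, accumulating each element in a local variable.
Matrix scaled_product(const Matrix& a, const Matrix& b, double r) {
    assert(!a.empty() && !b.empty() && a[0].size() == b.size());
    const std::size_t m = a.size(), k = b.size(), p = b[0].size();
    Matrix c(m, std::vector<double>(p, 0.0));
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t n = 0; n < p; ++n) {
            double accum = 0.0;
            for (std::size_t j = 0; j < k; ++j) {
                accum += a[i][j] * b[j][n];
            }
            c[i][n] = accum / r;
        }
    }
    return c;
}
```

With the pfMatrix values from the question, the first column of wcMatrix and r = 100, the first element comes out at about 442.10, in line with the MATLAB result quoted above.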

Efficient 2D FFT of fixed length real input data in C/C++

I'm developing an algorithm that calls an FFT function several times. I have tight time constraints (real-time is desired), so I need to minimize the time spent in every FFT call.
I'm working with the OpenCV library and I have already implemented my code with two different approaches:
Using the FFTW library. Data/memory management + FFT (8 ms) = 14 ms (on average, FFTW_MEASURE flag).
Using the OpenCV fft function. Data/memory management + FFT (21 ms) = 23 ms (on average).
As my input data is always a real image of fixed size 512x512 pixels, do you think that if I implement the FFT algorithm myself, based on the mathematical definition of the DFT and storing the sine/cosine tables, I can achieve better performance, or is the FFTW library really that well optimized? Any better ideas?
All ideas and suggestions will be really appreciated. For now, I am not considering parallelization or a GPU implementation.
Thank you
System: Intel Xeon 5130 2.0GHz CPU in Windows 7, Visual Studio 10.0 and FFTW 3.3.3 (compiled following instructions in the site), OpenCV 2.4.3.
Code example for the FFT call with FFTW (input: OpenCV Mat CV_32F (1 channel, float type); output: OpenCV Mat CV_32FC2 (2 channels, float type)):
float *im_data;
fftwf_complex *data_in;
fftwf_complex *fft;
fftwf_plan plan_f;
int i, j, k;
int height=I.rows;
int width=I.cols;
int N=height*width;
float* outdata = new float[2*N];
im_data = ( float* ) I.data;
data_in = ( fftwf_complex* )fftwf_malloc( sizeof( fftwf_complex ) * N );
fft = ( fftwf_complex* )fftwf_malloc( sizeof( fftwf_complex ) * N );
plan_f = fftwf_plan_dft_2d( height , width , data_in , fft , FFTW_FORWARD , FFTW_MEASURE );
// copy the real image into the complex input buffer (imaginary part = 0)
for(int i = 0,k=0; i < height; ++i) {
    float* row = I.ptr<float>(i);
    for(int j = 0; j < width; j++, k++) {
        data_in[k][0] = row[j];
        data_in[k][1] =(float)0.0;
    }
}
fftwf_execute( plan_f );
int width2=2*width;
// writing output matrix: RealFFT[0],ImaginaryFFT[0],RealFFT[1],ImaginaryFFT[1],...
for( i = 0, k = 0 ; i < height ; i++ ) {
    for( j = 0 ; j < width2 ; j += 2, k++ ) {
        outdata[i * width2 + j] = ( float )fft[k][0];
        outdata[i * width2 + j+1] = ( float )fft[k][1];
    }
}
Mat fft_I(height,width,CV_32FC2,outdata);
fftwf_destroy_plan( plan_f );
fftwf_free( data_in );
fftwf_free( fft );
return fft_I;
Your FFT time with FFTW seems very high. To get the best out of FFTW with fixed-size FFTs you should generate a plan using the FFTW_PATIENT flag and then ideally save the generated "wisdom" for subsequent re-use. You can generate wisdom either from your own code or using the fftw-wisdom tool.
The FFT from the Intel Math Kernel Library (separate from the Intel compiler) is faster than FFTW most of the time. I don't know if it will be enough of an improvement in your case to justify the price though.
I will agree with the others that rolling your own FFT is probably not a good use of your time (unless you are wanting to learn how to do it). The available FFT implementations (FFTW, MKL) have been so finely tuned over many years. I'm not saying that you can't do better, but it would probably be a lot of work and time for marginal gains.
Believe me, FFTW is really very well optimized; there is very little chance that you can do better.
Which compiler did you use to compile FFTW? Sometimes the Intel compiler gives better performance than gcc.

Access violation at run time (0xC0000005)

I want to compute the FFT of an image (stored in "imagen"), multiply it by a filter 'H', and then compute the inverse FFT.
The code is shown below:
int ancho;
int alto;
ancho=ui.imageframe->imagereader->GetBufferedRegion().GetSize()[0]; //ancho=width of the image
alto=ui.imageframe->imagereader->GetBufferedRegion().GetSize()[1]; //alto=height of the image
double *H ;
H =matrix2D_H(ancho,alto,eta,sigma); // H is calculated
// We want to get: F= fft(f) ; H*F ; f'=ifft(H*F)
// Initialization of the necessary elements for the calculation of the FFT
fftw_complex *out;
fftw_plan p;
int N= (ancho/2+1)*alto; //number of points of the image
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*N);
double *in = (double*) imagen.GetPointer(); // conversion of itk.smartpointer --> double*
p = fftw_plan_dft_r2c_2d(ancho, alto, in, out, FFTW_ESTIMATE); // FFT planning
fftw_execute(p); // FFT calculation
/* Multiplication of the Output of the FFT with the Filter H*/
int a = alto;
int b = ancho/2 +1; // The second dimension has this value because the FFT of a real image only computes the non-redundant outputs; this is why the FFT output and the filter 'H' have the same size.
// Point-by-point matrix multiplication: [axb]*[axb]
fftw_complex* res ; // result will be stored here
res = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*a*b);
res = multiply_matrix_2D(out,H, a, b);
The problem is located here, in the loop inside the function ‘multiply_matrix_2D’:
fftw_complex* prueba_r01::multiply_matrix_2D(fftw_complex* out, double* H, int M ,int N){
    /* The matrix out[MxN] or [n0x(n1/2)+1] is the image after the FFT, and out_H[MxN] is the filter in the frequency domain;
       both are multiplied POINT BY POINT. It has to be called twice, once for the imaginary part and once for the real part */
    fftw_complex *H_cast;
    H_cast = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*M*N);
    H_cast= reinterpret_cast<fftw_complex*> (H); // casting from double* to fftw_complex*
    fftw_complex *res; // the result of the multiplication will be stored here
    res = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*M*N);
    //Loop for calculating the matrix point-to-point multiplication
    for (int x = 0; x<M ; x++){
        for (int y = 0; y<N ; y++){
            res[x*N+y][0] = out[x*N+y][0]*(H_cast[x*N+y][0]+H_cast[x*N+y][1]);
            res[x*N+y][1] = out[x*N+y][1]*(H_cast[x*N+y][0]+H_cast[x*N+y][1]);
        }
    }
    return res;
}
The crash occurs at x = 95 and y = 93, with M = 191 and N = 96:
Unhandled exception at 0x004273ab in prueba_r01.exe: 0xC0000005: Access violation reading location 0x01274000.
Image: http://img846.imageshack.us/img846/4585/accessviolationproblem.png
In the debugger many of the variable values are shown in red and, translated from the original, the value box for H_cast[][1] reads: "Error30CXX0000 : impossible to evaluate the expression".
I would really appreciate any help with this!
This part of the code
H_cast = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*M*N);
H_cast= reinterpret_cast<fftw_complex*> (H); // casting from double* to fftw_complex*
first allocates a new buffer for H_cast and then immediately sets it to point to the original H instead. It doesn't copy the data, just the pointer.
At the end of the function a buffer is freed through H_cast, which seems to free the data pointed to by H and not the buffer allocated in the function.
When getting back to the caller, the H there is lost!
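One more thing that may be worth checking, offered as an assumption since matrix2D_H is not shown: the reported crash index x*N+y = 95*96+93 = 9213 lies just beyond M*N/2 = 9168, which would be consistent with H holding M*N real values. Reading such a buffer through a fftw_complex pointer consumes two doubles per element and runs off the end about halfway through. If H really is real-valued, no cast is needed at all. A minimal sketch using std::complex<double> in place of fftw_complex (the name apply_real_filter is hypothetical):

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiply each complex FFT bin by a real filter
// coefficient. ASSUMPTION: h holds one real value per bin (M * N doubles),
// so no cast to a complex type is needed and nothing is freed here.
std::vector<std::complex<double>> apply_real_filter(
        const std::vector<std::complex<double>>& spectrum,
        const std::vector<double>& h) {
    assert(spectrum.size() == h.size());
    std::vector<std::complex<double>> result(spectrum.size());
    for (std::size_t i = 0; i < spectrum.size(); ++i) {
        result[i] = spectrum[i] * h[i];  // scales real and imaginary parts alike
    }
    return result;
}
```

Returning a new container also avoids the ownership confusion described above, because the function neither frees nor re-points any buffer owned by the caller.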
There is an FFT class inside of ITK that can use fftw (USE_FFTW) from cmake for configuration. This class describes how to reference the ITK raw buffer memory from fftw.
PS: The upcoming ITKv4 has greatly improved the fftw compatibility.

Differences between FFTW and CUFFT output

In the chart I have posted below, I am comparing the results from an IFFT run in FFTW and CUFFT.
What are the possible reasons this is coming out different? Is it really THAT much round off error?
Here is the relevant code snippet:
cufftHandle plan;
cufftComplex *d_data;
cufftComplex *h_data;
cudaMalloc((void**)&d_data, sizeof(cufftComplex)*W);
complex<float> *temp = (complex<float>*)fftwf_malloc(sizeof(fftwf_complex) * W);
h_data = (cufftComplex *)malloc(sizeof(cufftComplex)*W);
memset(h_data, 0, W*sizeof(cufftComplex));
/* Create a 1D FFT plan. */
cufftPlan1d(&plan, W, CUFFT_C2C, 1);
if (!reader->getData(rowBuff, row))
return 0;
// copy from read buffer to our FFT input buffer
memcpy(indata, rowBuff, fCols * sizeof(complex<float>));
for(int c = 0; c < W; c++)
h_data[c] = make_cuComplex(indata[c].real(), indata[c].imag());
cutilSafeCall(cudaMemcpy(d_data, h_data, W* sizeof(cufftComplex), cudaMemcpyHostToDevice));
cufftExecC2C(plan, d_data, d_data, CUFFT_INVERSE);
cutilSafeCall(cudaMemcpy(h_data, d_data,W * sizeof(cufftComplex), cudaMemcpyDeviceToHost));
for(int c = 0; c < W; c++)
temp[c] =(cuCrealf(h_data[c]), cuCimagf(h_data[c]));
//execute ifft plan on "indata"
//dump out abs() values of the first 50 temp and outdata values. Had to convert h_data back to a normal complex
ifft was defined like so:
ifft = fftwf_plan_dft_1d(freqCols, reinterpret_cast<fftwf_complex*>(indata),
and to generate the graph I dumped out h_data and outdata after the fftw_execute
W is the width of the row of the image I am processing.
See anything glaringly obvious?
So it looks like CUFFT is returning both a real and an imaginary part, and FFTW only the real part. The cuCabsf() function that comes with the CUFFT complex library causes this to give me a multiple of sqrt(2) when I have both parts of the complex number.
As an aside - I never have been able to get exactly matching results in the intermediate steps between FFTW and CUFFT. If you do both the IFFT and FFT though, you should get something close.
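Two details may help when comparing the two outputs, sketched here without CUDA or FFTW. First, in C++ an expression of the form (a, b) is the comma operator and evaluates to b alone, so the line temp[c] = (cuCrealf(...), cuCimagf(...)) in the question stores only the imaginary part, as a real number; constructing the complex value explicitly keeps both parts. Second, both FFTW and cuFFT compute unnormalized transforms, so a forward transform followed by an inverse one returns the input scaled by N. The Float2 struct below is a stand-in for cufftComplex and, like the function name, is an assumption for illustration:

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Stand-in for cufftComplex (a struct with float members x and y), so the
// sketch compiles without CUDA. This type is an assumption for illustration.
struct Float2 { float x; float y; };

// Convert device-style complex values to std::complex<float>, keeping both
// parts, and optionally divide by n to undo the scaling that an
// unnormalized forward + inverse transform pair introduces.
std::vector<std::complex<float>> to_std_complex(const std::vector<Float2>& in,
                                                bool scale_by_n) {
    const float scale = (scale_by_n && !in.empty())
                            ? 1.0f / static_cast<float>(in.size())
                            : 1.0f;
    std::vector<std::complex<float>> out;
    out.reserve(in.size());
    for (const Float2& v : in) {
        out.emplace_back(v.x * scale, v.y * scale);  // (real, imaginary)
    }
    return out;
}
```

Comparing std::abs() of values converted this way on both sides avoids mixing magnitudes computed from one part with magnitudes computed from both.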