ifft results are different from original signal - c++

The FFT works fine, but when I take the IFFT I always get the same graph from its results. The results are complex, and the graph is always the same regardless of the original signal.
In the real part, the graph is a -sin with a period equal to the frame size.
In the imaginary part, it is a -cos with the same period.
Where could the problem be?
Original signal:
IFFT real values (the pictures show only half of the frame):
The FFT algorithm that I use:
double** FFT(double** f, int s, bool inverse) {
if (s == 1) return f;
int sH = s / 2;
double** fOdd = new double*[sH];
double** fEven = new double*[sH];
for (int i = 0; i < sH; i++) {
int j = 2 * i;
fOdd[i] = f[j];
fEven[i] = f[j + 1];
}
double** sOdd = FFT(fOdd, sH, inverse);
double** sEven = FFT(fEven, sH, inverse);
double**spectr = new double*[s];
double arg = inverse ? DoublePI / s : -DoublePI / s;
double*oBase = new double[2]{ cos(arg),sin(arg) };
double*o = new double[2]{ 1,0 };
for (int i = 0; i < sH; i++) {
double* sO1 = Mul(o, sOdd[i]);
spectr[i] = Sum(sEven[i], sO1);
spectr[i + sH] = Dif(sEven[i], sO1);
o = Mul(o, oBase);
}
return spectr;
}

The "butterfly" portion is applying the coefficients incorrectly:
for (int i = 0; i < sH; i++) {
double* sO1 = sOdd[i];
double* sE1 = Mul(o, sEven[i]);
spectr[i] = Sum(sO1, sE1);
spectr[i + sH] = Dif(sO1, sE1);
o = Mul(o, oBase);
}
Side Note:
I kept your notation but it makes things confusing:
fOdd has indexes 0, 2, 4, 6, ... so it should be fEven
fEven has indexes 1, 3, 5, 7, ... so it should be fOdd
really sOdd should be sLower and sEven should be sUpper since they correspond to the 0:s/2 and s/2:s-1 elements of the spectrum respectively:
sLower = FFT(fEven, sH, inverse); // fEven is 0, 2, 4, ...
sUpper = FFT(fOdd, sH, inverse); // fOdd is 1, 3, 5, ...
Then the butterfly becomes:
for (int i = 0; i < sH; i++) {
double* sL1 = sLower[i];
double* sU1 = Mul(o, sUpper[i]);
spectr[i] = Sum(sL1, sU1);
spectr[i + sH] = Dif(sL1, sU1);
o = Mul(o, oBase);
}
When written like this, it is easier to compare to the pseudocode example on Wikipedia.
And @Dai is correct: you are going to leak a lot of memory.
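One way to check a butterfly fix like the one above is to compare the FFT output against a direct O(N^2) DFT on a small input. The sketch below is only such a reference, not the asker's code: `naiveDft` and `maxAbsDiff` are hypothetical helper names, and it uses std::complex instead of the question's double[2] pairs. The sign convention follows the question (negative exponent forward, positive inverse); the 1/N scaling of the inverse is an assumption, since the question's code does not scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Direct O(N^2) DFT, meant only as a reference to test an FFT against.
// Forward uses exp(-2*pi*i*k*t/N), inverse uses exp(+2*pi*i*k*t/N).
// ASSUMPTION: the inverse is scaled by 1/N here; the question's code is not.
std::vector<Complex> naiveDft(const std::vector<Complex>& x, bool inverse) {
    const std::size_t n = x.size();
    assert(n > 0);
    const double twoPi = 2.0 * std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Complex sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            // Reduce k*t modulo n so the angle stays small and accurate.
            const double angle = sign * twoPi *
                static_cast<double>((k * t) % n) / static_cast<double>(n);
            sum += x[t] * Complex(std::cos(angle), std::sin(angle));
        }
        out[k] = inverse ? sum / static_cast<double>(n) : sum;
    }
    return out;
}

// Largest absolute difference between two equally sized complex arrays.
double maxAbsDiff(const std::vector<Complex>& a, const std::vector<Complex>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}
```

If the FFT is correct, maxAbsDiff between its output and naiveDft(x, false) should be close to zero for any power-of-two length; an unscaled inverse FFT will differ from naiveDft(X, true) by exactly a factor of N.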

Regarding the memory, you can use the std::vector to encapsulate dynamically-allocated arrays and to ensure they're deallocated when execution leaves scope. You could use unique_ptr<double[]> but the performance gains are not worth it IMO and you lose the safety of the at() method.
(Based on @Robb's answer)
A few other tips:
Avoid cryptic identifiers - programs should be readable, and names like "f" and "s" make your program harder to read and maintain.
Type-based Hungarian notation is frowned upon as modern editors show type information automatically so it adds unnecessary complication to identifier names.
Use size_t for indexes, not int
The STL is your friend, use it!
Preemptively prevent bugs by using const to prevent accidental mutation of read-only data.
Like so:
#include <cmath>
#include <complex>
#include <vector>
using namespace std;
using Complex = complex<double>;
// Assumption: DoublePI in the question means 2*pi.
const double DoublePI = 2.0 * acos( -1.0 );
// Recursive radix-2 FFT; signal.size() must be a power of two.
// As in the question's code, the inverse transform is not scaled by 1/N.
vector<Complex> fastFourierTransform(const vector<Complex>& signal, const bool inverse) {
    if( signal.size() < 2 ) return signal;
    const size_t half = signal.size() / 2;
    vector<Complex> lower; lower.reserve( half );
    vector<Complex> upper; upper.reserve( half );
    bool isEven = true;
    for( size_t i = 0; i < signal.size(); i++ ) {
        if( isEven ) lower.push_back( signal.at( i ) ); // indexes 0, 2, 4, ...
        else upper.push_back( signal.at( i ) );         // indexes 1, 3, 5, ...
        isEven = !isEven;
    }
    const vector<Complex> lowerFft = fastFourierTransform( lower, inverse );
    const vector<Complex> upperFft = fastFourierTransform( upper, inverse );
    // Size the result up front: at() throws on a vector that was only reserve()d.
    vector<Complex> result( signal.size() );
    const double arg = ( inverse ? 1 : -1 ) * ( DoublePI / signal.size() );
    const Complex oBase( cos(arg), sin(arg) );
    Complex o( 1.0, 0.0 ); // the twiddle factor starts at 1 + 0i
    for( size_t i = 0; i < half; i++ ) {
        const Complex lower1 = lowerFft.at( i );
        const Complex upper1 = o * upperFft.at( i );
        result.at( i ) = lower1 + upper1;
        result.at( i + half ) = lower1 - upper1;
        o *= oBase;
    }
    // Returning a local vector by value is fine: it is moved (or elided), not copied.
    return result;
}

Expression must be modifiable lvalue

I'm trying to store a structure of capacity 6 in another structure.
struct eachElement {
float centerX;
float centerY;
int flagMountain;
};
eachElement cn[6];
struct characters {
eachElement each[6];
};
characters chars[1500];
float strtPt = 235.0;
float initializer = strtPt;
float endPt = 120.0;
float holder = endPt;
int count = 0;
int ctr = 0;
int cr = 0;
int countCharacters = 0;
int dup = 0;
while (holder < m_img_height) {
for (float i = initializer ; i < m_img_width - 500; ) {
float j = holder;
int ck = 1;
while (ck < 4) {
cvCircle(image, cvPoint(i, j), 3, cvScalar(0, 255, 0), 1);
cr = ctr++;
cn[cr].centerX = i;
cn[cr].centerY = j;
cn[cr].flagMountain = 1;
cvCircle(image, cvPoint(i + 5, j + 5), 1, cvScalar(0, 255, 0), 1);
j += 9.448;
count++;
ck++;
}
if (count == 6) {
i += 23.811;
count = 0;
ctr = 0;
dup = countCharacters++;
chars[dup].each = cn;
}
else
i += 9.448;
}
holder += 56.686;
}
In this line,
chars[dup].each = cn;
it gives me an error saying expression must be a modifiable lvalue.
Even though I'm assigning it to the same type, I got this error.
Any help would be appreciated.
I don't know what you are trying to achieve with your code, but you must specify an array index for the each member array to access any of its elements:
chars[dup].each[0] = cn[0];
//             ^^^     ^^^
The array start address cannot be changed
chars[dup].each = cn;
hence the compiler error.
To fix that use std::copy():
std::copy(std::begin(cn),std::end(cn),std::begin(chars[dup].each));
Arrays do not have the copy assignment operator. You have to copy each element of one array into another array.
Thus this statement
chars[dup].each = cn;
is wrong.
You can use standard algorithm std::copy to copy one array into another. For example
#include <algorithm>
//...
std::copy( std::begin( cn ), std::end( cn ), std::begin( chars[dup].each ) );
Or if the compiler does not support function std::begin and std::end then you can just write
#include <algorithm>
//...
std::copy( cn, cn + 6, chars[dup].each );
or
std::copy( cn, cn + sizeof( cn ) / sizeof( *cn ), chars[dup].each );
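For completeness, here is a minimal self-contained sketch of the std::copy approach using the structs from the question. `storeGroup` is a hypothetical helper name introduced only for illustration; the array reference parameter simply makes the size (6) part of the type.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

struct eachElement {
    float centerX;
    float centerY;
    int flagMountain;
};

struct characters {
    eachElement each[6];
};

// Copies all six elements of `src` into the array member of `dst`.
// `storeGroup` is a hypothetical helper, not part of the question's code.
void storeGroup(characters& dst, const eachElement (&src)[6]) {
    std::copy(std::begin(src), std::end(src), std::begin(dst.each));
}
```

In the question's loop this would be called as storeGroup(chars[dup], cn); in place of the failing assignment. Note that, unlike raw arrays, a whole characters object can be assigned with =, because a struct that contains an array is still copy-assignable.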
Your struct has a fixed-size buffer:
struct characters {
eachElement each[6];
};
and your assignment
chars[dup].each = cn;
would be legal (I don't know whether it would make sense) if the struct were like this:
struct characters {
eachElement * each;
};
The code is hard for me to understand, so I cannot propose an equivalent solution for your goal, but that is the background of the error message.
Maybe, given your declaration (though I don't understand the goal), this style would work:
for(int i=0;i<6;i++)
chars[dup].each[i] = cn[i];

Modifying a function to use SSE intrinsics

I am trying to calculate the approximate value of the radical sqrt(i + sqrt(i + sqrt(i + ...))) using SSE in order to get a speedup from vectorization (I also read that the SIMD square-root function runs approximately 4.7x faster than the innate FPU square-root function). However, I am having problems getting the same functionality in the vectorized version; I am getting the incorrect value and I'm not sure why.
My original function is this:
template <typename T>
T CalculateRadical( T tValue, T tEps = std::numeric_limits<T>::epsilon() )
{
static std::unordered_map<T,T> setResults;
auto it = setResults.find( tValue );
if( it != setResults.end() )
{
return it->second;
}
T tPrev = std::sqrt(tValue + std::sqrt(tValue)), tCurr = std::sqrt(tValue + tPrev);
// Keep iterating until we get convergence:
while( std::abs( tPrev - tCurr ) > tEps )
{
tPrev = tCurr;
tCurr = std::sqrt(tValue + tPrev);
}
setResults.insert( std::make_pair( tValue, tCurr ) );
return tCurr;
}
And the SIMD equivalent (when this template function is instantiated with T = float and given tEps = 0.0005f) I have written is:
// SSE intrinsics hard-coded function:
__m128 CalculateRadicals( __m128 values )
{
static std::unordered_map<float, __m128> setResults;
// Store our epsilon as a vector for quick comparison:
__declspec(align(16)) float flEps[4] = { 0.0005f, 0.0005f, 0.0005f, 0.0005f };
__m128 eps = _mm_load_ps( flEps );
union U {
__m128 vec;
float flArray[4];
};
U u;
u.vec = values;
float flFirstVal = u.flArray[0];
auto it = setResults.find( flFirstVal );
if( it != setResults.end( ) )
{
return it->second;
}
__m128 prev = _mm_sqrt_ps( _mm_add_ps( values, _mm_sqrt_ps( values ) ) );
__m128 curr = _mm_sqrt_ps( _mm_add_ps( values, prev ) );
while( _mm_movemask_ps( _mm_cmplt_ps( _mm_sub_ps( curr, prev ), eps ) ) != 0xF )
{
prev = curr;
curr = _mm_sqrt_ps( _mm_add_ps( values, prev ) );
}
setResults.insert( std::make_pair( flFirstVal, curr ) );
return curr;
}
I am calling the function in a loop using the following code:
long long N;
std::cin >> N;
float flExpectation = 0.0f;
long long iMultipleOf4 = (N / 4LL) * 4LL;
for( long long i = iMultipleOf4; i > 0LL; i -= 4LL )
{
__declspec(align(16)) float flArray[4] = { static_cast<float>(i - 3), static_cast<float>(i - 2), static_cast<float>(i - 1), static_cast<float>(i) };
__m128 arg = _mm_load_ps( flArray );
__m128 vec = CalculateRadicals( arg );
float flSum = Sum( vec );
flExpectation += flSum;
}
for( long long i = iMultipleOf4; i < N; ++i )
{
flExpectation += CalculateRadical( static_cast<float>(i), 0.0005f );
}
flExpectation /= N;
I get the following outputs for input 5:
With SSE version: 2.20873
With FPU version: 1.69647
Where does the discrepancy come from, what am I doing wrong in the SIMD equivalent?
EDIT: I've realized that the Sum function is relevant here:
float Sum( __m128 vec1 )
{
float flTemp[4];
_mm_storeu_ps( flTemp, vec1 );
return flTemp[0] + flTemp[1] + flTemp[2] + flTemp[3];
}
SSE intrinsics can be pretty tedious sometimes...
But not here. You just got your loop wrong:
for( long long i = iMultipleOf4; i > 0LL; i -= 4LL )
I doubt it's doing what you expected. If iMultipleOf4 is 4, then your function will compute with 4, 3, 2, 1 but not 0, and then your second loop redoes the computation with 4.
The two functions give the same results for me, and the loops give the same flExpectation after the correction. There is still a small difference, though, probably because the SSE unit and the FPU differ slightly in how they compute.
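To make the overlap concrete, the sketch below records which values of i the question's two loops actually feed into the computation, next to one possible corrected partition. It involves no SSE at all; `indicesAsWritten` and `indicesCorrected` are hypothetical names, and the corrected version assumes the intended range is 0 .. N-1, which the question does not state explicitly.

```cpp
#include <cassert>
#include <vector>

// Values of i processed by the question's loops: the vector loop packs
// i-3, i-2, i-1, i into one __m128, then the scalar loop handles the rest.
std::vector<long long> indicesAsWritten(long long n) {
    std::vector<long long> visited;
    const long long multipleOf4 = (n / 4) * 4;
    for (long long i = multipleOf4; i > 0; i -= 4)
        for (long long lane = 3; lane >= 0; --lane)
            visited.push_back(i - lane);
    for (long long i = multipleOf4; i < n; ++i)
        visited.push_back(i);
    return visited;
}

// One possible correction, ASSUMING the intended range is 0 .. n-1:
// full blocks of four start at 0 and the scalar loop picks up the remainder.
std::vector<long long> indicesCorrected(long long n) {
    std::vector<long long> visited;
    const long long multipleOf4 = (n / 4) * 4;
    for (long long base = 0; base < multipleOf4; base += 4)
        for (long long lane = 0; lane < 4; ++lane)
            visited.push_back(base + lane);
    for (long long i = multipleOf4; i < n; ++i)
        visited.push_back(i);
    return visited;
}
```

For N = 5 the loops as written visit 1, 2, 3, 4 and then 4 again, so 0 is skipped and 4 is counted twice, while the corrected partition visits 0, 1, 2, 3, 4 exactly once each.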

3D FFT Using Intel MKL with Zero Padding

I want to compute 3D FFT using Intel MKL of an array which has about 300×200×200 elements. This 3D array is stored as a 1D array of type double in a columnwise fashion:
for( int k = 0; k < nk; k++ ) // Loop through the height.
for( int j = 0; j < nj; j++ ) // Loop through the rows.
for( int i = 0; i < ni; i++ ) // Loop through the columns.
{
ijk = i + ni * j + ni * nj * k;
my3Darray[ ijk ] = 1.0;
}
I want to perform not-in-place FFT on the input array and prevent it from getting modified (I need to use it later in my code) and then do the backward computation in-place. I also want to have zero padding.
My questions are:
How can I perform the zero-padding?
How should I deal with the size of the arrays used by FFT functions when zero padding is included in the computation?
How can I take out the zero padded results and get the actual result?
Here is my attempt at the problem. I would be very thankful for any comment, suggestion, or hint.
#include <stdio.h>
#include "mkl.h"
int max(int a, int b, int c)
{
int m = a;
(m < b) && (m = b);
(m < c) && (m = c);
return m;
}
void FFT3D_R2C( // Real to Complex 3D FFT.
double *in, int nRowsIn , int nColsIn , int nHeightsIn ,
double *out )
{
int n = max( nRowsIn , nColsIn , nHeightsIn );
// Round up to the next highest power of 2.
unsigned int N = (unsigned int) n; // compute the next highest power of 2 of 32-bit n.
N--;
N |= N >> 1;
N |= N >> 2;
N |= N >> 4;
N |= N >> 8;
N |= N >> 16;
N++;
/* Strides describe data layout in real and conjugate-even domain. */
MKL_LONG rs[4], cs[4];
// DFTI descriptor.
DFTI_DESCRIPTOR_HANDLE fft_desc = 0;
// Variables needed for out-of-place computations.
MKL_Complex16 *in_fft = new MKL_Complex16 [ N*N*N ];
MKL_Complex16 *out_fft = new MKL_Complex16 [ N*N*N ];
double *out_ZeroPadded = new double [ N*N*N ];
/* Compute strides */
rs[3] = 1; cs[3] = 1;
rs[2] = (N/2+1)*2; cs[2] = (N/2+1);
rs[1] = N*(N/2+1)*2; cs[1] = N*(N/2+1);
rs[0] = 0; cs[0] = 0;
// Create DFTI descriptor.
MKL_LONG sizes[] = { N, N, N };
DftiCreateDescriptor( &fft_desc, DFTI_DOUBLE, DFTI_REAL, 3, sizes );
// Configure DFTI descriptor.
DftiSetValue( fft_desc, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX );
DftiSetValue( fft_desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE ); // Out-of-place transformation.
DftiSetValue( fft_desc, DFTI_INPUT_STRIDES , rs );
DftiSetValue( fft_desc, DFTI_OUTPUT_STRIDES , cs );
DftiCommitDescriptor( fft_desc );
DftiComputeForward ( fft_desc, in , in_fft );
// Change strides to compute backward transform.
DftiSetValue ( fft_desc, DFTI_INPUT_STRIDES , cs);
DftiSetValue ( fft_desc, DFTI_OUTPUT_STRIDES, rs);
DftiCommitDescriptor( fft_desc );
DftiComputeBackward ( fft_desc, out_fft, out_ZeroPadded );
// Printing the zero padded 3D FFT result.
for( long long i = 0; i < (long long)N*N*N; i++ )
printf("%f\n", out_ZeroPadded[i] );
/* I don't know how to take out the zero padded results and
save the actual result in the variable named "out" */
DftiFreeDescriptor ( &fft_desc );
delete[] in_fft;
delete[] out_ZeroPadded ;
}
int main()
{
int n = 10;
double *a = new double [n*n*n]; // This array is real.
double *afft = new double [n*n*n];
// Fill the array with some 'real' numbers.
for( int i = 0; i < n*n*n; i++ )
a[ i ] = 1.0;
// Calculate FFT.
FFT3D_R2C( a, n, n, n, afft );
printf("FFT results:\n");
for( int i = 0; i < n*n*n; i++ )
printf( "%15.8f\n", afft[i] );
delete[] a;
delete[] afft;
return 0;
}
Just a few hints:
Power of 2 size
I don't like the way you are computing the size.
Let Nx,Ny,Nz be the size of the input matrix
and nx,ny,nz the size of the padded matrix:
for (nx=1;nx<Nx;nx<<=1);
for (ny=1;ny<Ny;ny<<=1);
for (nz=1;nz<Nz;nz<<=1);
Now zero pad: memset the padded matrix to zero first and then copy the matrix lines into it.
Padding to N^3 instead of nx*ny*nz can result in big slowdowns
if nx,ny,nz are not close to each other.
Output is complex
If I get it right, a is the input real matrix
and afft is the output complex matrix,
so why not allocate the space for it correctly?
double *afft = new double [2*nx*ny*nz];
A complex number has a real and an imaginary part, so that is 2 values per number.
That also goes for the final print of the result,
and some "\r\n" after lines would be good for viewing.
3D DFFT
I neither use nor know your DFFT library;
I use my own, but anyway a 3D DFFT can be done by 1D DFFTs
if you do it line by line ... see this 2D DFCT by 1D DFFT.
In 3D it is the same, but you need to add one pass and a different normalization constant.
This way you can have a single line buffer double lin[2*max(nx,ny,nz)];
and do the zero padding on the fly (so there is no need to keep the bigger matrix in memory)...
but that involves copying the lines for each 1D DFFT ...
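The "Power of 2 size" hint (zero-fill, then copy the lines) and the question about taking the padding back out can be sketched as follows. This is only an illustration that does not touch MKL: `nextPow2`, `zeroPad` and `crop` are hypothetical helper names, std::vector stands in for the raw arrays, and the layout is the question's i + ni*j + ni*nj*k indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two that is >= n (same idea as the loops in the hint).
std::size_t nextPow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Copy an Nx x Ny x Nz block (index = i + Nx*j + Nx*Ny*k) into the corner
// of a zero-filled nx x ny x nz block that uses the same layout.
std::vector<double> zeroPad(const std::vector<double>& in,
                            std::size_t Nx, std::size_t Ny, std::size_t Nz,
                            std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(in.size() == Nx * Ny * Nz);
    assert(nx >= Nx && ny >= Ny && nz >= Nz);
    std::vector<double> out(nx * ny * nz, 0.0);
    for (std::size_t k = 0; k < Nz; ++k)
        for (std::size_t j = 0; j < Ny; ++j)
            for (std::size_t i = 0; i < Nx; ++i)
                out[i + nx * j + nx * ny * k] = in[i + Nx * j + Nx * Ny * k];
    return out;
}

// Inverse of zeroPad: keep only the leading Nx x Ny x Nz corner.
std::vector<double> crop(const std::vector<double>& padded,
                         std::size_t nx, std::size_t ny, std::size_t nz,
                         std::size_t Nx, std::size_t Ny, std::size_t Nz) {
    assert(padded.size() == nx * ny * nz);
    assert(nx >= Nx && ny >= Ny && nz >= Nz);
    std::vector<double> out(Nx * Ny * Nz);
    for (std::size_t k = 0; k < Nz; ++k)
        for (std::size_t j = 0; j < Ny; ++j)
            for (std::size_t i = 0; i < Nx; ++i)
                out[i + Nx * j + Nx * Ny * k] = padded[i + nx * j + nx * ny * k];
    return out;
}
```

For the 300×200×200 array from the question, nextPow2 gives 512×256×256, which is considerably smaller than the 512×512×512 cube produced by padding every dimension to the largest one.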

3D Convolution with Intel MKL

I have written a C/C++ code which uses Intel MKL to compute the 3D convolution of an array which has about 300×200×200 elements. I want to apply a kernel which is either 3×3×3 or 5×5×5. Both the 3D input array and the kernel have real values.
This 3D array is stored as a 1D array of type double in a columnwise fashion. Similarly the kernel is of type double and is saved columnwise. For example,
for( int k = 0; k < nk; k++ ) // Loop through the height.
for( int j = 0; j < nj; j++ ) // Loop through the rows.
for( int i = 0; i < ni; i++ ) // Loop through the columns.
{
ijk = i + ni * j + ni * nj * k;
my3Darray[ ijk ] = 1.0;
}
For the computation of convolution, I want to perform not-in-place FFT on the input array and the kernel and prevent them from getting modified (I need to use them later in my code) and then do the backward computation in-place.
When I compare the result obtained from my code with the one obtained by MATLAB they are very different. Could someone kindly help me fix the issue? What is missing in my code?
Here is the MATLAB code I used:
a = ones( 10, 10, 10 );
kernel = ones( 3, 3, 3 );
aconvolved = convn( a, kernel, 'same' );
Here is my C/C++ code:
#include <stdio.h>
#include "mkl.h"
void Conv3D(
double *in, double *ker, double *out,
int nRows, int nCols, int nHeights)
{
int NI = nRows;
int NJ = nCols;
int NK = nHeights;
double *in_fft = new double [NI*NJ*NK];
double *ker_fft = new double [NI*NJ*NK];
DFTI_DESCRIPTOR_HANDLE fft_desc = 0;
MKL_LONG sizes[] = { NK, NJ, NI };
MKL_LONG strides[] = { 0, NJ*NI, NI, 1 };
DftiCreateDescriptor( &fft_desc, DFTI_DOUBLE, DFTI_REAL, 3, sizes );
DftiSetValue ( fft_desc, DFTI_PLACEMENT , DFTI_NOT_INPLACE); // Out-of-place computation.
DftiSetValue ( fft_desc, DFTI_INPUT_STRIDES , strides );
DftiSetValue ( fft_desc, DFTI_OUTPUT_STRIDES, strides );
DftiSetValue ( fft_desc, DFTI_BACKWARD_SCALE, 1/NI/NJ/NK );
DftiCommitDescriptor( fft_desc );
DftiComputeForward ( fft_desc, in , in_fft );
DftiComputeForward ( fft_desc, ker, ker_fft );
for (long long i = 0; i < (long long)NI*NJ*NK; ++i )
out[i] = in_fft[i]*ker_fft[i];
// In-place computation.
DftiSetValue ( fft_desc, DFTI_PLACEMENT, DFTI_INPLACE );
DftiCommitDescriptor( fft_desc );
DftiComputeBackward ( fft_desc, out );
DftiFreeDescriptor ( &fft_desc );
delete[] in_fft;
delete[] ker_fft;
}
int main(int argc, char* argv[])
{
int n = 10;
int nkernel = 3;
double *a = new double [n*n*n]; // This array is real.
double *aconvolved = new double [n*n*n]; // The convolved array is also real.
double *kernel = new double [nkernel*nkernel*nkernel]; // kernel is real.
// Fill the array with some 'real' numbers.
for( int i = 0; i < n*n*n; i++ )
a[ i ] = 1.0;
// Fill the kernel with some 'real' numbers.
for( int i = 0; i < nkernel*nkernel*nkernel; i++ )
kernel[ i ] = 1.0;
// Calculate the convolution.
Conv3D( a, kernel, aconvolved, n, n, n );
printf("Convolved:\n");
for( int i = 0; i < n*n*n; i++ )
printf( "%15.8f\n", aconvolved[i] );
delete[] a;
delete[] kernel;
delete[] aconvolved;
return 0;
}
You can't reverse the FFT with real-valued frequency data (just the magnitude). A forward FFT needs to output complex data. This is done by setting the DFTI_FORWARD_DOMAIN setting to DFTI_COMPLEX.
DftiCreateDescriptor( &fft_desc, DFTI_DOUBLE, DFTI_COMPLEX, 3, sizes );
Doing this implicitly sets the backward domain to complex too.
You will also need a complex data type. Probably something like,
MKL_Complex16* in_fft = new MKL_Complex16[NI*NJ*NK];
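This means the element-wise product has to be a full complex multiplication, (a+bi)(c+di) = (ac-bd) + (ad+bc)i, not a separate product of the real parts and of the imaginary parts:
for (size_t i = 0; i < (size_t)NI*NJ*NK; ++i) {
out_fft[i].real = in_fft[i].real * ker_fft[i].real - in_fft[i].imag * ker_fft[i].imag;
out_fft[i].imag = in_fft[i].real * ker_fft[i].imag + in_fft[i].imag * ker_fft[i].real;
}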
The output of the inverse FFT is also complex, and assuming your input data is real, you can just grab the .real component and that is your result. This means you'll need a temporary complex output array (say, out_fft as above).
Also note that to avoid artifacts, you want the size of your fft to be (at least) M+N-1 on each dimension. Generally you would choose the next highest power of two for speed.
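The reason for the M+N-1 rule is that multiplying two DFTs of length L yields a circular convolution of length L, so anything that would land past index L-1 wraps around to the start. The 1D sketch below computes that circular convolution directly (no FFT and no MKL involved) to show the wrap-around; `circularConvolve` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Circular convolution of length `len`: inputs are treated as zero-padded
// to `len`, and output indices wrap around modulo `len`. This is what the
// pointwise product of two length-`len` DFTs corresponds to.
std::vector<double> circularConvolve(const std::vector<double>& a,
                                     const std::vector<double>& b,
                                     std::size_t len) {
    assert(a.size() <= len && b.size() <= len);
    std::vector<double> out(len, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            out[(i + j) % len] += a[i] * b[j];
    return out;
}
```

With a = {1, 1, 1, 1} (M = 4) and b = {1, 1, 1} (N = 3), a length of 4 gives {3, 3, 3, 3} because the tail wraps onto the head, while a length of M+N-1 = 6 gives the ordinary linear convolution {1, 2, 3, 3, 2, 1}. The same applies independently along each of the three dimensions.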
I strongly suggest you implement it in MATLAB first, using FFTs. There are many such implementations available (example), but I would start from the basics and make a simple function on your own.

FFTW and OpenCV's C++ interface, real and imaginary part in Mat output

I'm trying to code a FFT/IFFT function with FFTW 3.3 and OpenCV 2.1 using the C++ interface. I've seen a lot of examples using the old OpenCV formats and I did a direct conversion, but something doesn't work.
The objective of my function is to return a Mat object with the real part and the imaginary part of the FFT, like OpenCV's default dft function does. Here is the code of the function. The program stops with a memory error in the lines that copy im_data to data_in.
Does anybody know what I am doing wrong? Thank you.
Mat fft_sr(Mat& I)
{
double *im_data;
double *realP_data;
double *imP_data;
fftw_complex *data_in;
fftw_complex *fft;
fftw_plan plan_f;
int width = I.cols;
int height = I.rows;
int step = I.step;
int i, j, k;
Mat realP=Mat::zeros(height,width,CV_64F); // Real Part FFT
Mat imP=Mat::zeros(height,width,CV_64F); // Imaginary Part FFT
im_data = ( double* ) I.data;
realP_data = ( double* ) realP.data;
imP_data = ( double* ) imP.data;
data_in = ( fftw_complex* )fftw_malloc( sizeof( fftw_complex ) * width * height );
fft = ( fftw_complex* )fftw_malloc( sizeof( fftw_complex ) * width * height );
// Problem Here
for( i = 0, k = 0 ; i < height ; i++ ) {
for( j = 0 ; j < width ; j++ ) {
data_in[k][0] = ( double )im_data[i * step + j];
data_in[k][1] = ( double )0.0;
k++;
}
}
plan_f = fftw_plan_dft_2d( height, width, data_in, fft, FFTW_FORWARD, FFTW_ESTIMATE );
fftw_execute( plan_f );
// Copy real and imaginary data
for( i = 0, k = 0 ; i < height ; i++ ) {
for( j = 0 ; j < width ; j++ ) {
realP_data[i * step + j] = ( double )fft[k][0];
imP_data[i * step + j] = ( double )fft[k][1];
k++;
}
}
Mat fft_I(I.size(),CV_64FC2);
Mat fftplanes[] = {Mat_<double>(realP), Mat_<double>(imP)};
merge(fftplanes, 2, fft_I);
fftw_destroy_plan(plan_f);
fftw_free(data_in);
fftw_free(fft);
return fft_I;
}
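You are using step incorrectly. It is a row stride in bytes, meant for indexing into Mat::data (a uchar*). Since you already cast Mat::data to double* when assigning it to im_data, you can index into im_data "normally" (this assumes the matrix is continuous, i.e. its rows are not padded):
data_in[k][0] = im_data[i * width + j];
When using step, apply the byte offset to the byte pointer first and only then read the address as a double:
data_in[k][0] = *( double* )( I.data + i * step + j * sizeof( double ) );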
Update:
Try to access your images row-wise. That way you avoid running into problems with stride/step, while still exploiting fast access:
for (int i = 0; i < I.rows; i++)
{
double* row = I.ptr<double>(i);
for (int j = 0; j < I.cols; j++)
{
// Do something with the current pixel.
double someValue = row[j];
}
}
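The difference between a byte stride and an element index can be shown without OpenCV. In the sketch below, a plain byte buffer with padded rows stands in for Mat::data and `stepBytes` plays the role of Mat::step; `readDouble` and `makePaddedImage` are hypothetical helpers, and std::memcpy is used for the reads and writes to sidestep alignment concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Read the double at (row, col) from a raw byte buffer whose rows start
// `stepBytes` apart. The stride counts bytes, not elements, and may be
// larger than cols * sizeof(double) when rows are padded.
double readDouble(const unsigned char* data, std::size_t stepBytes,
                  std::size_t row, std::size_t col) {
    assert(data != nullptr);
    double value;
    std::memcpy(&value, data + row * stepBytes + col * sizeof(double),
                sizeof(double));
    return value;
}

// Build a hypothetical padded image: rows x cols doubles with `padBytes`
// unused bytes after every row, and pixel(r, c) = 10 * r + c.
std::vector<unsigned char> makePaddedImage(std::size_t rows, std::size_t cols,
                                           std::size_t padBytes) {
    const std::size_t stepBytes = cols * sizeof(double) + padBytes;
    std::vector<unsigned char> buffer(rows * stepBytes, 0);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c) {
            const double v = 10.0 * static_cast<double>(r) + static_cast<double>(c);
            std::memcpy(buffer.data() + r * stepBytes + c * sizeof(double),
                        &v, sizeof(double));
        }
    return buffer;
}
```

Indexing a double* with i * step + j, as in the question, multiplies the byte stride by sizeof(double) a second time and therefore reads far past the end of the buffer for all but the first rows.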
I know this is old, but when you are using FFTW you should initialize fftw_complex *data_in
only after creating the plan for the FFT. The planner may overwrite the input and output arrays while it
tries out candidate algorithms (with flags such as FFTW_MEASURE; FFTW_ESTIMATE, as used in the question, leaves the arrays untouched).
So allocate before creating the plan and initialize after!
The statement
im_data = ( double* ) I.data;
treats the image data as an array of doubles.
I think this requires that I really is an image of double values (type CV_64F).