Discrete Fourier Transform implementation gives different result than OpenCV DFT - C++

We have implemented the DFT and wanted to test it against OpenCV's implementation, but the results are different.
Our DFT's results are ordered from smallest to biggest, whereas OpenCV's results are not in any particular order.
The first (0th) value is the same for both calculations, since in this case the complex part is 0 (because e^0 = 1 in the formula). The other values are different; for example, OpenCV's results contain negative values, whereas ours do not.
This is our implementation of the DFT:
// complex number
std::complex<float> j;
j = -1;
j = std::sqrt(j); // j is now the imaginary unit i
std::complex<float> result;
std::vector<std::complex<float>> fourier; // output
// this->N = length of contour, 512 in our case
// for each Fourier descriptor
for (int n = 0; n < this->N; ++n)
{
    // summation in the formula
    for (int t = 0; t < this->N; ++t)
    {
        result += (this->centroidDistance[t] * std::exp((-j*PI2 *((float)n)*((float)t)) / ((float)N)));
    }
    fourier.push_back((1.0f / this->N) * result);
}
and this is how we calculate the DFT with OpenCV:
std::vector<std::complex<float>> fourierCV; // output
cv::dft(std::vector<float>(centroidDistance, centroidDistance + this->N), fourierCV, cv::DFT_SCALE | cv::DFT_COMPLEX_OUTPUT);
The variable centroidDistance is calculated in a previous step.
Note: please avoid answers suggesting that we just use OpenCV instead of our own implementation.

You forgot to initialise result for each iteration of n:
for (int n = 0; n < this->N; ++n)
{
    result = 0.0f; // initialise `result` to 0 here <<<
    // summation in the formula
    for (int t = 0; t < this->N; ++t)
    {
        result += (this->centroidDistance[t] * std::exp((-j*PI2 *((float)n)*((float)t)) / ((float)N)));
    }
    fourier.push_back((1.0f / this->N) * result);
}
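As an aside, the missing reset can be made impossible by construction if the accumulator is declared inside the outer loop, and the twiddle factor can be built with std::polar instead of taking std::sqrt of -1. A minimal sketch of that variant, assuming the same this->N and this->centroidDistance as in the question, and that PI2 is a float equal to 2*pi (which the question's code implies):

for (int n = 0; n < this->N; ++n)
{
    std::complex<float> result(0.0f, 0.0f); // fresh accumulator for every descriptor
    for (int t = 0; t < this->N; ++t)
    {
        // e^(-i * 2*pi * n * t / N), built directly from magnitude and phase
        result += this->centroidDistance[t]
                * std::polar(1.0f, -PI2 * (float)n * (float)t / (float)this->N);
    }
    fourier.push_back(result / (float)this->N);
}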

Related

What is wrong with my 2D Array Gaussian Blur function in C++?

I am making a simple Gaussian blur function for a 2D array that is supposed to represent an image. The function just prints out the array values at the end (no actual image processing going on here). I was pretty sure that I had implemented everything correctly, but the values I am getting for (N=3, sigma=1.5) are much lower than expected based on this calculator: http://dev.theomader.com/gaussian-kernel-calculator/
I am following this equation: G(x, y) = exp(-(x^2 + y^2) / (2*sigma^2)) / (2*pi*sigma^2)
void gaussian_filter(int N, double sigma) {
    double k[N][N];
    for(int i=0; i<N; i++) { //Initialize kernel to 0
        for(int j=0; j<N; j++) {
            k[i][j] = 0;
        }
    }
    double sum = 0.0; //There is an issue somewhere in this block of code
    int change = (N/2);
    double r, s = change * sigma * sigma;
    for (int x = -change; x <= change; x++) {
        for(int y = -change; y <= change; y++) {
            r = sqrt(x*x + y*y);
            k[x + change][y + change] = (exp(-(r*r)/s))/(M_PI * s);
            sum += k[x + change][y + change];
        }
    }
    for(int i = 0; i < N; ++i) { //Normalize
        for(int j = 0; j < N; ++j) {
            k[i][j] /= sum;
        }
    }
    for(int i = 0; i < N; ++i) { //Print out array
        for (int j = 0; j < N; ++j) {
            cout << k[i][j] << "\t";
        }
        cout << endl;
    }
}
(The post included two images: the expected output for N=3 and sigma=1.5 from the calculator, and the current broken output.)
Why does s depend on change? I think you should do:
double r, s = 2 * sigma * sigma;
// instead of
// double r, s = change * sigma * sigma;
That website computes Gaussian kernels in an unorthodox manner:
The weights are calculated by numerical integration of the continuous gaussian distribution over each discrete kernel tap.
That is, it samples a continuous Gaussian kernel that has been convolved with a uniform ("box") filter 1 pixel wide. The resulting Gaussian is wider than advertised. I advise against this method.
The proper way to create a Gaussian kernel is to just sample the Gaussian function at given integer locations, for example x = [-3, -2, -1, 0, 1, 2, 3].
Do note that a 3-pixel kernel is not wide enough to represent a Gaussian. It is important to sample the tail of the curve; without it, the kernel doesn't have the good properties of the Gaussian kernel. I recommend sampling up to 3 sigma to each side, leading to 2*ceil(3*sigma)+1 pixels. 2 sigma is the bare minimum, useful only when speed is more important than good results.
Do also note that the Gaussian is separable: you can apply two 1D kernels in succession rather than a single 2D kernel. For the 9x9 kernel you get for sigma=1.5, this translates to 9+9=18 multiplications and additions per pixel, compared to 9x9=81 for the 2D kernel. This is a significant saving!
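To make the recommended approach concrete, here is a minimal sketch of a 1D kernel built by sampling the Gaussian at integer locations out to 3 sigma and then normalizing. The function name and the use of std::vector are my own choices, not from the original answer:

#include <cmath>
#include <vector>

// sample the Gaussian at integer positions x = -half..half, then normalize
std::vector<double> gaussian_kernel_1d(double sigma) {
    const int half = (int)std::ceil(3 * sigma);  // 2*ceil(3*sigma)+1 taps in total
    std::vector<double> k(2 * half + 1);
    double sum = 0.0;
    for (int x = -half; x <= half; ++x) {
        k[x + half] = std::exp(-(x * x) / (2 * sigma * sigma));
        sum += k[x + half];
    }
    for (double& v : k) v /= sum;  // normalizing makes the 1/(2*pi*sigma^2) factor unnecessary
    return k;
}

Because the Gaussian is separable, applying this 1D kernel along the rows and then along the columns is equivalent to the full 2D kernel.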

convolution implementation in c++

I want to implement a 2D convolution function in C++ by myself, without using filter2D(). I'm trying to iterate over all pixels of the input image and the kernel, then assign a new value to each pixel of dst.
However, I got this error:
Thread 1: EXC_BAD_ACCESS (code=1, address=0x0)
I found that this error means I'm accessing a nullptr, but I could not solve the problem. Here is my C++ code.
cv::Mat_<float> spatialConvolution(const cv::Mat_<float>& src, const cv::Mat_<float>& kernel)
{
    // declare variables
    Mat_<float> dst;
    Mat_<float> flipped_kernel;
    float tmp = 0.0;
    // flip kernel
    flip(kernel, flipped_kernel, -1);
    // multiply and integrate
    // input rows
    for(int i=0;i<src.rows;i++){
        // input columns
        for(int j=0;j<src.cols;j++){
            // kernel rows
            for(int k=0;k<flipped_kernel.rows;k++){
                // kernel columns
                for(int l=0;l<flipped_kernel.cols;l++){
                    tmp += src.at<float>(i,j) * flipped_kernel.at<float>(k,l);
                }
            }
            dst.at<float>(i,j) = tmp;
        }
    }
    return dst.clone();
}
To simplify, let's suppose you have a 3x3 kernel:
k(0,0) k(0,1) k(0,2)
k(1,0) k(1,1) k(1,2)
k(2,0) k(2,1) k(2,2)
To calculate the convolution you scan the input image (marked as I) from left to right, from top to bottom,
and for every pixel of the input image you assign one value calculated from the formula below:
newValue(y,x) = I(y-1,x-1) * k(0,0) + I(y-1,x) * k(0,1) + I(y-1,x+1) * k(0,2)
              + I(y,x-1)   * k(1,0) + I(y,x)   * k(1,1) + I(y,x+1)   * k(1,2)
              + I(y+1,x-1) * k(2,0) + I(y+1,x) * k(2,1) + I(y+1,x+1) * k(2,2)
------------------x------------>
|
|
| [k(0,0) k(0,1) k(0,2)]
y [k(1,0) k(1,1) k(1,2)]
| [k(2,0) k(2,1) k(2,2)]
|
(y,x) of the input image I is the anchor point of the kernel. To assign a new value to I(y,x),
you need to multiply every k coefficient by the corresponding point of I; your code doesn't do that.
First you need to create the dst matrix with the same dimensions as the original image, and the same pixel type.
Then you need to rewrite your loops to reflect the formula described above:
cv::Mat_<float> spatialConvolution(const cv::Mat_<float>& src, const cv::Mat_<float>& kernel)
{
    Mat dst(src.rows,src.cols,src.type());
    Mat_<float> flipped_kernel;
    flip(kernel, flipped_kernel, -1);
    const int dx = kernel.cols / 2;
    const int dy = kernel.rows / 2;
    for (int i = 0; i<src.rows; i++)
    {
        for (int j = 0; j<src.cols; j++)
        {
            float tmp = 0.0f;
            for (int k = 0; k<flipped_kernel.rows; k++)
            {
                for (int l = 0; l<flipped_kernel.cols; l++)
                {
                    int x = j - dx + l;
                    int y = i - dy + k;
                    if (x >= 0 && x < src.cols && y >= 0 && y < src.rows)
                        tmp += src.at<float>(y, x) * flipped_kernel.at<float>(k, l);
                }
            }
            dst.at<float>(i, j) = saturate_cast<float>(tmp);
        }
    }
    return dst.clone();
}
Your memory access error presumably happens on the line:
dst.at<float>(i,j) = tmp;
because dst is not initialized. You can't assign to an index of the matrix if it has no size/data. Instead, initialize the matrix first; Mat_<float> dst; is only a declaration, not an initialization. Use one of the Mat constructors where you can specify a cv::Size or the number of rows/columns (see the docs). For example, you can initialize dst with:
Mat dst{src.size(), src.type()};
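For completeness, a minimal usage sketch of the corrected spatialConvolution from the first answer (the image and kernel values here are my own, purely for illustration):

cv::Mat_<float> img = cv::Mat_<float>::ones(5, 5);        // flat 5x5 test image
cv::Mat_<float> box = cv::Mat_<float>::ones(3, 3) / 9.0f; // 3x3 box (averaging) kernel
cv::Mat_<float> out = spatialConvolution(img, box);       // interior pixels stay 1.0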

HOG optimization with using SIMD

There are several attempts to optimize the calculation of the HOG descriptor using SIMD instructions: OpenCV, Dlib, and Simd. All of them use scalar code to add the resulting magnitude to the HOG histogram:
float histogram[height/8][width/8][18];
float ky[height], kx[width];
int idx[size];
float val[size];
for(size_t i = 0; i < size; ++i)
{
    histogram[y/8][x/8][idx[i]] += val[i]*ky[y]*kx[x];
    histogram[y/8][x/8 + 1][idx[i]] += val[i]*ky[y]*kx[x + 1];
    histogram[y/8 + 1][x/8][idx[i]] += val[i]*ky[y + 1]*kx[x];
    histogram[y/8 + 1][x/8 + 1][idx[i]] += val[i]*ky[y + 1]*kx[x + 1];
}
The value of size depends on the implementation, but in general the meaning is the same.
I know that the problem of histogram calculation with SIMD does not have a simple and effective solution. But in this case we have a small histogram (18 bins). Can that help with SIMD optimization?
I have found a solution: a temporary buffer. First we accumulate the histogram contributions into the temporary buffer (and this operation can be vectorized). Then we add the sums from the buffer to the output histogram (and this operation can also be vectorized):
float histogram[height/8][width/8][18];
float ky[height], kx[width];
int idx[size];
float val[size];
float buf[18][4] = {}; // the temporary buffer must start zeroed
for(size_t i = 0; i < size; ++i)
{
    buf[idx[i]][0] += val[i]*ky[y]*kx[x];
    buf[idx[i]][1] += val[i]*ky[y]*kx[x + 1];
    buf[idx[i]][2] += val[i]*ky[y + 1]*kx[x];
    buf[idx[i]][3] += val[i]*ky[y + 1]*kx[x + 1];
}
for(size_t i = 0; i < 18; ++i)
{
    histogram[y/8][x/8][i] += buf[i][0];
    histogram[y/8][x/8 + 1][i] += buf[i][1];
    histogram[y/8 + 1][x/8][i] += buf[i][2];
    histogram[y/8 + 1][x/8 + 1][i] += buf[i][3];
}
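A sketch of why this layout vectorizes well: buf[idx[i]] is four contiguous floats, so each iteration of the first loop becomes a single 4-wide load, multiply-add, and store. A minimal SSE illustration of that inner update follows (my own code, not from OpenCV, Dlib, or Simd; x and y are derived from i exactly as in the scalar version above):

#include <xmmintrin.h> // SSE intrinsics

for (size_t i = 0; i < size; ++i)
{
    // weights for the four cells; note _mm_set_ps takes the highest lane first
    __m128 w = _mm_set_ps(ky[y + 1]*kx[x + 1], ky[y + 1]*kx[x],
                          ky[y]*kx[x + 1],     ky[y]*kx[x]);
    __m128 v = _mm_set1_ps(val[i]); // broadcast val[i] to all four lanes
    float* b = buf[idx[i]];         // four contiguous bins for this sample
    _mm_storeu_ps(b, _mm_add_ps(_mm_loadu_ps(b), _mm_mul_ps(v, w)));
}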
You can do a partial optimisation by using SIMD to calculate all the (flattened) histogram indices and the bin increments. Then process these in a scalar loop afterwards. You probably also want to strip-mine this such that you process one row at a time, in order to keep the temporary bin indices and increments in cache. It might appear that this would be inefficient, due to the use of temporary intermediate buffers, but in practice I have seen a useful overall gain in similar scenarios.
uint32_t i = 0;
for (y = 0; y < height; ++y) // for each row
{
    uint32_t indices[width * 4]; // flattened histogram indices for this row
    float vals[width * 4];       // histogram bin increments for this row
    // SIMD loop for this row - calculate flattened histogram indices and bin
    // increments (scalar code shown for reference - converting this loop to
    // SIMD is left as an exercise for the reader...)
    for (x = 0; x < width; ++x, ++i)
    {
        indices[4*x]   = (y/8)*(width/8)*18 + (x/8)*18 + idx[i];
        indices[4*x+1] = (y/8)*(width/8)*18 + (x/8 + 1)*18 + idx[i];
        indices[4*x+2] = (y/8 + 1)*(width/8)*18 + (x/8)*18 + idx[i];
        indices[4*x+3] = (y/8 + 1)*(width/8)*18 + (x/8 + 1)*18 + idx[i];
        vals[4*x]   = val[i]*ky[y]*kx[x];
        vals[4*x+1] = val[i]*ky[y]*kx[x+1];
        vals[4*x+2] = val[i]*ky[y+1]*kx[x];
        vals[4*x+3] = val[i]*ky[y+1]*kx[x+1];
    }
    // scalar loop for this row
    float * const histogram_base = &histogram[0][0][0]; // pointer to flattened histogram
    for (x = 0; x < width * 4; ++x) // for each index/increment in this row
    {
        histogram_base[indices[x]] += vals[x]; // update the (flattened) histogram
    }
}

Dividing cv::Mat by a number using integer division

In OpenCV, if a cv::Mat (CV_8U) is divided by a number (int), the result is rounded to the nearest integer. For example:
cv::Mat temp(1, 1, CV_8UC1, cv::Scalar(5));
temp /= 3;
std::cout <<"OpenCV Integer Division:" << temp;
std::cout << "\nNormal Integer Division:" << 5 / 3;
The result is:
OpenCV Integer Division: 2
Normal Integer Division: 1
It is obvious that OpenCV does not use integer division even though the type of the cv::Mat is CV_8U: 5/3 ≈ 1.67, which rounds to 2.
My questions are:
Why? Aren't integers supposed to be divided as integers? Why this strange behaviour in OpenCV?
Can I obtain integer division without iterating pixel by pixel and dividing each one?
My current solution is:
for (int r = 0; r < temp.rows; r++){
    auto row_ptr = temp.ptr<uchar>(r);
    for (int c = 0; c < temp.cols; c++){
        row_ptr[c] /= 3;
    }
}
Firstly: the overloaded division operator performs the operation by converting the matrix elements to double; internally it uses the multiplication operator, as Mat / a = Mat * (1/a).
Secondly: there is a very easy way to do it with one small for loop:
for(int i = 0; i < temp.total(); i++)
    ((unsigned char*)temp.data)[i] /= 3;
The solution I used is the following (based on @Afshine's answer and @Miki's comment):
if (frame.isContinuous()){
    // one flat pass over every element (elements = pixels * channels)
    for (size_t i = 0; i < frame.total() * frame.channels(); i++){
        frame.data[i] /= 3;
    }
}
else{
    // row by row, again covering every channel of every pixel
    for (int r = 0; r < frame.rows; r++){
        auto row_ptr = frame.ptr<uchar>(r);
        for (int c = 0; c < frame.channels() * frame.cols; c++){
            row_ptr[c] /= 3;
        }
    }
}
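For 8-bit matrices, one way to get integer division without writing the loop yourself is a 256-entry lookup table applied with cv::LUT. A minimal sketch, assuming a CV_8U frame (the lut and result names are mine):

cv::Mat lut(1, 256, CV_8U);
for (int i = 0; i < 256; ++i)
    lut.at<uchar>(i) = static_cast<uchar>(i / 3); // true integer division, precomputed once

cv::Mat result;
cv::LUT(frame, lut, result); // applies the table to every element of frame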

DFT algorithm and convolution. what is wrong?

#include <vector>
using std::vector;
#include <complex>
using std::complex;
using std::polar;

typedef complex<double> Complex;
#define Pi 3.14159265358979323846

// direct Fourier transform
vector<Complex> dF( const vector<Complex>& in )
{
    const int N = in.size();
    vector<Complex> out( N );
    for (int k = 0; k < N; k++)
    {
        out[k] = Complex( 0.0, 0.0 );
        for (int n = 0; n < N; n++)
        {
            out[k] += in[n] * polar<double>( 1.0, - 2 * Pi * k * n / N );
        }
    }
    return out;
}

// inverse Fourier transform
vector<Complex> iF( const vector<Complex>& in )
{
    const int N = in.size();
    vector<Complex> out( N );
    for (int k = 0; k < N; k++)
    {
        out[k] = Complex( 0.0, 0.0 );
        for (int n = 0; n < N; n++)
        {
            out[k] += in[n] * polar<double>( 1.0, 2 * Pi * k * n / N );
        }
        out[k] *= Complex( 1.0 / N , 0.0 );
    }
    return out;
}
Who can say what is wrong? Maybe I don't understand some detail of implementing this algorithm, but I can't find it.
I also need to calculate convolution, but I can't find a test example.
UPDATE
// convolution. I suppose that x0.size == x1.size
vector<Complex> convolution( const vector<Complex>& x0, const vector<Complex>& x1 )
{
    const int N = x0.size();
    vector<Complex> tmp( N );
    for ( int i = 0; i < N; i++ )
    {
        tmp[i] = x0[i] * x1[i];
    }
    return iF( tmp );
}
I really don't know exactly what you're asking, but your DFT and IDFT algorithms look correct to me. Convolution can be performed using the DFT and IDFT via the circular convolution theorem, which states that f**g = IDFT(DFT(f) * DFT(g)), where ** is circular convolution and * is element-wise multiplication.
To compute linear (non-circular) convolution using the DFT, you must zero-pad each of the inputs so that the circular wrap-around only occurs over zero-valued samples and does not affect the output. Each input sequence needs to be zero-padded to a length N >= L+M-1, where L and M are the lengths of the input sequences. Then you perform circular convolution as shown above, and the first L+M-1 samples are the linear convolution output (samples beyond this should be zero).
Note: performing convolution with the DFT and IDFT algorithms you have shown is much less efficient than just computing it directly. The advantage only comes when using FFT and IFFT algorithms (O(N log N)) in place of the DFT and IDFT (O(N^2)).
Check the FFTW library ("for computing the discrete Fourier transform (DFT)") and its C# wrapper. Maybe this too.
Good luck!
The transforms look fine, but there's nothing in the program that is doing convolution.
UPDATE: the convolution code needs to forward transform the inputs first before the element-wise multiplication.
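Putting that update into code: a minimal sketch of the corrected convolution, reusing the dF and iF functions above (my reconstruction based on the answers, not the original poster's final code). It computes circular convolution; zero-pad both inputs to length L+M-1 first if you want linear convolution:

// corrected convolution via the circular convolution theorem:
// f**g = IDFT(DFT(f) * DFT(g))
vector<Complex> convolution( const vector<Complex>& x0, const vector<Complex>& x1 )
{
    const vector<Complex> X0 = dF( x0 );  // forward transform the inputs first
    const vector<Complex> X1 = dF( x1 );
    const int N = x0.size();
    vector<Complex> tmp( N );
    for ( int i = 0; i < N; i++ )
    {
        tmp[i] = X0[i] * X1[i];  // element-wise product in the frequency domain
    }
    return iF( tmp );            // inverse transform back to the time domain
}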