I'm currently experimenting with eye tracking. I've successfully built an iris tracking algorithm using OpenCV with contours and the Hough transform, but the next step is unclear to me. I want to know whether the calculations I'm doing are correct for translating the center of the eye to a position on the screen. The user's head has a fixed position.
What I want, of course, is an algorithm that works on all eyes. Is there some kind of angle calculation involved, so that the mapping stays linear when the user looks further to the right?
What I do right now is: first I let the user look at specific points and use RANSAC to detect the iris position that is closest to each position on the screen. I do that with four 2D points on the screen and four iris positions, and I compute a homography between them to get the correct mapping.
void gaussian_elimination(float *input, int n){
    // ported to c from pseudocode in
    // http://en.wikipedia.org/wiki/Gaussian_elimination
    float * A = input;
    int i = 0;
    int j = 0;
    int m = n-1;
    while (i < m && j < n){
        // Find pivot in column j, starting in row i:
        int maxi = i;
        for(int k = i+1; k<m; k++){
            if(fabs(A[k*n+j]) > fabs(A[maxi*n+j])){
                maxi = k;
            }
        }
        if (A[maxi*n+j] != 0){
            //swap rows i and maxi, but do not change the value of i
            if(i!=maxi)
                for(int k=0;k<n;k++){
                    float aux = A[i*n+k];
                    A[i*n+k]=A[maxi*n+k];
                    A[maxi*n+k]=aux;
                }
            //Now A[i,j] will contain the old value of A[maxi,j].
            //divide each entry in row i by A[i,j]
            float A_ij=A[i*n+j];
            for(int k=0;k<n;k++){
                A[i*n+k]/=A_ij;
            }
            //Now A[i,j] will have the value 1.
            for(int u = i+1; u< m; u++){
                //subtract A[u,j] * row i from row u
                float A_uj = A[u*n+j];
                for(int k=0;k<n;k++){
                    A[u*n+k]-=A_uj*A[i*n+k];
                }
                //Now A[u,j] will be 0, since A[u,j] - A[i,j] * A[u,j] = A[u,j] - 1 * A[u,j] = 0.
            }
            i++;
        }
        j++;
    }
    //back substitution
    for(int i=m-2;i>=0;i--){
        for(int j=i+1;j<n-1;j++){
            A[i*n+m]-=A[i*n+j]*A[j*n+m];
            //A[i*n+j]=0;
        }
    }
}
ofMatrix4x4 findHomography(ofPoint src[4], ofPoint dst[4]){
ofMatrix4x4 matrix;
// create the equation system to be solved
//
// from: Multiple View Geometry in Computer Vision 2ed
// Hartley R. and Zisserman A.
//
// x' = xH
// where H is the homography: a 3 by 3 matrix
// that transformed to inhomogeneous coordinates for each point
// gives the following equations for each point:
//
// x' * (h31*x + h32*y + h33) = h11*x + h12*y + h13
// y' * (h31*x + h32*y + h33) = h21*x + h22*y + h23
//
// as the homography is scale independent we can let h33 be 1 (indeed any of the terms)
// so for 4 points we have 8 equations for 8 terms to solve: h11 - h32
// after ordering the terms it gives the following matrix
// that can be solved with gaussian elimination:
float P[8][9]={
{-src[0].x, -src[0].y, -1, 0, 0, 0, src[0].x*dst[0].x, src[0].y*dst[0].x, -dst[0].x }, // h11
{ 0, 0, 0, -src[0].x, -src[0].y, -1, src[0].x*dst[0].y, src[0].y*dst[0].y, -dst[0].y }, // h12
{-src[1].x, -src[1].y, -1, 0, 0, 0, src[1].x*dst[1].x, src[1].y*dst[1].x, -dst[1].x }, // h13
{ 0, 0, 0, -src[1].x, -src[1].y, -1, src[1].x*dst[1].y, src[1].y*dst[1].y, -dst[1].y }, // h21
{-src[2].x, -src[2].y, -1, 0, 0, 0, src[2].x*dst[2].x, src[2].y*dst[2].x, -dst[2].x }, // h22
{ 0, 0, 0, -src[2].x, -src[2].y, -1, src[2].x*dst[2].y, src[2].y*dst[2].y, -dst[2].y }, // h23
{-src[3].x, -src[3].y, -1, 0, 0, 0, src[3].x*dst[3].x, src[3].y*dst[3].x, -dst[3].x }, // h31
{ 0, 0, 0, -src[3].x, -src[3].y, -1, src[3].x*dst[3].y, src[3].y*dst[3].y, -dst[3].y }, // h32
};
gaussian_elimination(&P[0][0],9);
matrix(0,0)=P[0][8];
matrix(0,1)=P[1][8];
matrix(0,2)=0;
matrix(0,3)=P[2][8];
matrix(1,0)=P[3][8];
matrix(1,1)=P[4][8];
matrix(1,2)=0;
matrix(1,3)=P[5][8];
matrix(2,0)=0;
matrix(2,1)=0;
matrix(2,2)=0;
matrix(2,3)=0;
matrix(3,0)=P[6][8];
matrix(3,1)=P[7][8];
matrix(3,2)=0;
matrix(3,3)=1;
return matrix;
}
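For reference, here is a minimal sketch (my own illustration, assuming findHomography(src, dst) was called with the calibration iris positions as src and the screen positions as dst) of applying the solved homography to a new iris position, reading the terms back out exactly as they were stored above:

ofPoint irisToScreen(ofMatrix4x4 m, ofPoint iris){
    // the 3x3 homography terms, in the slots findHomography() used above
    float h11 = m(0,0), h12 = m(0,1), h13 = m(0,3);
    float h21 = m(1,0), h22 = m(1,1), h23 = m(1,3);
    float h31 = m(3,0), h32 = m(3,1), h33 = m(3,3); // h33 == 1

    // x' * (h31*x + h32*y + h33) = h11*x + h12*y + h13
    // y' * (h31*x + h32*y + h33) = h21*x + h22*y + h23
    float w = h31*iris.x + h32*iris.y + h33;
    return ofPoint((h11*iris.x + h12*iris.y + h13) / w,
                   (h21*iris.x + h22*iris.y + h23) / w);
}

Usage would then be something like irisToScreen(findHomography(irisCalib, screenCalib), currentIris), where irisCalib and screenCalib are the four calibration pairs (hypothetical names).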
You should have a look at existing solutions for this:
Eyewriter, for painting with your eyes (I tested this to control the mouse only):
Eyewriter.org
Eyewriter walkthrough
Eyewriter on Github
EyeLike pupil tracking
EyeLike info page (an algorithm similar to what you want is discussed here)
EyeLike on Github
Good luck!
Maybe this link is helpful to you. Best of luck!
cv::Mat computeMatXGradient(const cv::Mat &mat) {
cv::Mat out(mat.rows,mat.cols,CV_64F);
for (int y = 0; y < mat.rows; ++y) {
const uchar *Mr = mat.ptr<uchar>(y);
double *Or = out.ptr<double>(y);
Or[0] = Mr[1] - Mr[0];
for (int x = 1; x < mat.cols - 1; ++x) {
Or[x] = (Mr[x+1] - Mr[x-1])/2.0;
}
Or[mat.cols-1] = Mr[mat.cols-1] - Mr[mat.cols-2];
}
return out;
}
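Since this routine only produces the horizontal gradient, the vertical gradient can be obtained with the same routine by transposing first and transposing the result back, for example (a small sketch; eyeROI is just an assumed input image name):

// central differences along x, then along y via transpose
cv::Mat gradX = computeMatXGradient(eyeROI);
cv::Mat gradY = computeMatXGradient(eyeROI.t()).t();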
The parallelized code below is taking much more time than the original serial version. I have used a BFS approach to solve the problem. I am getting the correct output, but it is taking too much time.
(x, y) represents the matrix cell coordinates, and dist represents its minimum distance from the source:
struct Node
{
int x, y, dist;
};
// Below arrays detail all four possible movements from a cell
int row[] = { -1, 0, 0, 1 };
int col[] = { 0, -1, 1, 0 };
Function to check if it is possible to go to position (row, col) from the current position. The function returns false if (row, col) is not a valid position, has a value of 0, or has already been visited.
bool isValid(vector<vector<int>> const &mat, vector<vector<bool>> &visited, int row, int col)
{
return (row >= 0 && row < mat.size()) && (col >= 0 && col < mat[0].size())
&& mat[row][col] && !visited[row][col];
}
Find the shortest possible route in a matrix mat from source
cell (i, j) to destination cell (x, y)
int findShortestPathLength(vector<vector<int>> const &mat, pair<int, int> &src,
                           pair<int, int> &dest)
{
    if (mat.size() == 0 || mat[src.first][src.second] == 0 ||
        mat[dest.first][dest.second] == 0) {
        return -1;
    }
    // `M × N` matrix
    int M = mat.size();
    int N = mat[0].size();
    // construct a `M × N` matrix to keep track of visited cells
    vector<vector<bool>> visited;
    visited.resize(M, vector<bool>(N));
    // create an empty queue
    queue<Node> q;
    // get source cell (i, j)
    int i = src.first;
    int j = src.second;
    // mark the source cell as visited and enqueue the source node
    visited[i][j] = true;
    q.push({i, j, 0});
    // stores length of the shortest path from source to destination
    int min_dist = INT_MAX;
    // loop till queue is empty
    while (!q.empty())
    {
        // dequeue front node and process it
        Node node = q.front();
        q.pop();
        // (i, j) represents a current cell, and `dist` stores its
        // minimum distance from the source
        int i = node.x, j = node.y, dist = node.dist;
        // if the destination is found, update `min_dist` and stop
        if (i == dest.first && j == dest.second)
        {
            min_dist = dist;
            break;
        }
        // check for all four possible movements from the current cell
        // and enqueue each valid movement
        #pragma omp parallel for
        for (int k = 0; k < 4; k++)
        {
            // check if it is possible to go to position
            // (i + row[k], j + col[k]) from current position
            #pragma omp task shared(i,visited,j)
            {
                if (isValid(mat, visited, i + row[k], j + col[k]))
                {
                    // mark next cell as visited and enqueue it
                    visited[i + row[k]][j + col[k]] = true;
                    q.push({ i + row[k], j + col[k], dist + 1 });
                }
            }
        }
    }
    if (min_dist != INT_MAX) {
        return min_dist;
    }
    return -1;
}
The main part of the code only contains the matrix and the source and destination coordinates:
int main()
{
vector<vector<int>> mat =
{
{ 1, 1, 1, 1, 1, 0, 0, 1, 1, 1 },
{ 0, 1, 1, 1, 1, 1, 0, 1, 0, 1 },
{ 0, 0, 1, 0, 1, 1, 1, 0, 0, 1 },
{ 1, 0, 1, 1, 1, 0, 1, 1, 0, 1 },
{ 0, 0, 0, 1, 0, 0, 0, 1, 0, 1 },
{ 1, 0, 1, 1, 1, 0, 0, 1, 1, 0 },
{ 0, 0, 0, 0, 1, 0, 0, 1, 0, 1 },
{ 0, 1, 1, 1, 1, 1, 1, 1, 0, 0 },
{ 1, 1, 1, 1, 1, 0, 0, 1, 1, 1 },
{ 0, 0, 1, 0, 0, 1, 1, 0, 0, 1 },
};
pair<int, int> src = make_pair(0, 0);
pair<int, int> dest = make_pair(7, 5);
int min_dist = findShortestPathLength(mat, src, dest);
if (min_dist != -1)
{
cout << min_dist<<endl;
}
else {
cout << "Destination cannot be reached from a given source"<<endl;
}
return 0;
}
I have used a shared variable, but it is still taking too much time. Can anyone help me?
As people have remarked, you only get parallelism over the 4 directions; a better approach is to keep a set of sets of points: first the starting point, then all points you can reach from there in one step, then all the ones you can reach in two steps, and so on.
With this approach you get more parallelism: you're building what's known as a "wave front" and all the points can be tackled simultaneously.
// Sketch for a general DAG: `reachable` is a list of BFS levels with a `has()`
// membership test, and `graph.row(n)` yields the neighbours of node n.
auto last_level = reachable.back();
vector<int> newly_reachable;
for ( auto n : last_level ) {
    const auto& row = graph.row(n);
    for ( auto j : row ) {
        if ( not reachable.has(j)
             and not any_of
                 ( newly_reachable.begin(), newly_reachable.end(),
                   [j] (int i) { return i==j; } ) )
            newly_reachable.push_back(j);
    }
}
if (newly_reachable.size()>0)
    reachable.push_back(newly_reachable);
(I have written this for a general DAG; writing your maze as a DAG is an exercise for the reader.)
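For concreteness, `reachable` in that snippet could be a small helper container like this (purely hypothetical, only to make the sketch self-contained):

#include <vector>

// Hypothetical container: a list of BFS levels plus a membership test.
struct Levels {
    std::vector<std::vector<int>> levels;
    const std::vector<int>& back() const { return levels.back(); }
    void push_back(std::vector<int> lvl) { levels.push_back(std::move(lvl)); }
    bool has(int n) const {
        for (const auto& lvl : levels)
            for (int m : lvl)
                if (m == n) return true;
        return false;
    }
};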
However, this approach still has big problems: if two points on the current wave front decide to add the same new point, you have to resolve that.
For a very parallel approach you need to abandon the "push" model of adding new points altogether, and move to a "pull" model: in every "distance" iteration you loop over all points and ask: if I am not yet reachable, was one of my neighbors reachable? If so, mark me as reachable in one step more than that neighbor.
If you think about that last approach for a second (or two) you'll see that you are essentially doing a sequence of matrix-vector products with the adjacency matrix and the currently reachable set, except that you replace the scalar "+" operation by "min" and the scalar "*" by "+1". Read any tutorial about the interpretation of graph operations as linear algebra. (Except that it's not really linear.)
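A minimal sketch of that pull model on the question's grid (my own illustration, assuming OpenMP; pullBFS is a hypothetical helper, not code from the question):

#include <vector>
#include <climits>
#include <utility>

int pullBFS(const std::vector<std::vector<int>> &mat,
            std::pair<int,int> src, std::pair<int,int> dest)
{
    const int M = mat.size(), N = mat[0].size();
    const int di[] = { -1, 0, 0, 1 };
    const int dj[] = { 0, -1, 1, 0 };
    std::vector<std::vector<int>> dist(M, std::vector<int>(N, INT_MAX));
    dist[src.first][src.second] = 0;

    for (int level = 0; ; ++level)
    {
        // snapshot of the previous wave front: threads read prev and write dist, so no races
        const std::vector<std::vector<int>> prev = dist;
        bool changed = false;
        #pragma omp parallel for collapse(2) reduction(||:changed)
        for (int i = 0; i < M; ++i)
            for (int j = 0; j < N; ++j)
            {
                if (mat[i][j] == 0 || prev[i][j] != INT_MAX) continue;
                // "was one of my neighbours reachable in the previous wave?"
                for (int k = 0; k < 4; ++k)
                {
                    const int ni = i + di[k], nj = j + dj[k];
                    if (ni >= 0 && ni < M && nj >= 0 && nj < N && prev[ni][nj] == level)
                    {
                        dist[i][j] = level + 1;   // one step more than that neighbour
                        changed = true;
                        break;
                    }
                }
            }
        if (!changed) break;   // nothing advanced: all reachable cells are settled
    }
    const int d = dist[dest.first][dest.second];
    return d == INT_MAX ? -1 : d;
}

Each sweep does O(M*N) work, but all cells within a sweep are independent, which is where the parallelism comes from.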
I'm trying to implement the Hough transform using the gradient direction. I know that there is an implementation in OpenCV, but I want to do it myself.
I'm using Sobel to get the X and Y gradients. Then for every pixel:
magnitude --> sqrt(sobelX^2 + sobelY^2)
direction --> atan2(sobelY, sobelX) * 180/PI
If the magnitude is higher than 220 (so almost black), the pixel is treated as an edge, and the direction is then used in the circle equation.
But the results are not acceptable. Any help?
I know there are cv::polar and cv::cartToPolar, but I want to optimize the code so that all equations are calculated on the fly, with no extra loops.
cv::Mat sobelX,sobelY;
Sobel(mat, sobelX, CV_32F, 1, 0, kernelSize, 1, 0, cv::BORDER_REPLICATE);
Sobel(mat, sobelY, CV_32F, 0, 1, kernelSize, 1, 0, cv::BORDER_REPLICATE);
//cv::Canny(mat,mat,100,200,kernelSize,false);
debug::showImage("sobelX",sobelX);
debug::showImage("SobelY",sobelY);
debug::showImage("MAT",mat);
cv::Mat magnitudeMap,angleMap;
magnitudeMap = cv::Mat::zeros(mat.rows,mat.cols,mat.type());
angleMap = cv::Mat::zeros(mat.rows,mat.cols,mat.type());
std::vector<cv::Mat> hough_spaces(max);
for(int i=0; i<max; ++i)
{
hough_spaces[i] = cv::Mat::zeros(mat.rows,mat.cols,mat.type());
}
for(int x=0; x<mat.rows; ++x)
{
for(int y=0; y<mat.cols; ++y)
{
const float magnitude = sqrt(sobelX.at<uchar>(x,y)*sobelX.at<uchar>(x,y)+sobelY.at<uchar>(x,y)*sobelY.at<uchar>(x,y));
const float theta= atan2(sobelY.at<uchar>(x,y),sobelX.at<uchar>(x,y)) * 180/CV_PI;
magnitudeMap.at<uchar>(x,y) = magnitude;
if(magnitude > 225)//mat.at<const uchar>(x,y) == 255)
{
for(int radius=min; radius<max; ++radius)
{
const int a = x - radius * cos(theta);//lookup::cosArray[static_cast<int>(theta)];//+ 0.5f;
const int b = y - radius * sin(theta);//lookup::sinArray[static_cast<int>(theta)]; //+ 0.5f;
if(a >= 0 && a <hough_spaces[radius].rows && b >= 0 && b<hough_spaces[radius].cols) {
hough_spaces[radius].at<uchar>(a,b)+=10;
}
}
}
}
}
debug::showImage("magnitude",magnitudeMap);
for(int radius=min; radius<max; ++radius)
{
double min_f,max_f;
cv::Point min_loc,max_loc;
cv::minMaxLoc(hough_spaces[radius],&min_f,&max_f,&min_loc,&max_loc);
if(max_f>=treshold)
{
circles.emplace_back(cv::Point3f(max_loc.x,max_loc.y,radius));
// debug::showImage(std::to_string(radius).c_str(),hough_spaces[radius]);
}
}
circles.shrink_to_fit();
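For reference, this is roughly how the per-pixel magnitude and direction could be read straight from the CV_32F Sobel maps created above (an illustrative sketch, not the code in question; it assumes the mat, sobelX and sobelY from above and <cmath>, and that the maps hold floats):

cv::Mat magnitude32(mat.rows, mat.cols, CV_32F);
cv::Mat angle32(mat.rows, mat.cols, CV_32F);
for (int r = 0; r < mat.rows; ++r)
{
    const float* gx = sobelX.ptr<float>(r);
    const float* gy = sobelY.ptr<float>(r);
    float* mag = magnitude32.ptr<float>(r);
    float* ang = angle32.ptr<float>(r);
    for (int c = 0; c < mat.cols; ++c)
    {
        mag[c] = std::sqrt(gx[c]*gx[c] + gy[c]*gy[c]);
        ang[c] = std::atan2(gy[c], gx[c]);   // radians; multiply by 180/CV_PI only for display
    }
}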
I just implemented bicubic interpolation for resizing images.
I have a 6x6 pixel test image (grayscale); its columns are black and white (3 of each).
I am comparing the results of my code with the results from the tool ffmpeg, and they are not correct. I cannot understand why; I think I may be calculating the neighbourhood of pixels wrong, or maybe the distance of the resized pixel to the original ones.
Can someone look into my code (I have simplified it for better reading) and tell me where the error is?
// Iterate through each line
for(int lin = 0; lin < dstHeight; lin++){
// Original coordinates
float linInOriginal = (lin - 0.5) / scaleHeightRatio;
// Calculate original pixels coordinates to interpolate
int linTopFurther = clamp(floor(linInOriginal) - 1, 0, srcHeight - 1);
int linTop = clamp(floor(linInOriginal), 0, srcHeight - 1);
int linBottom = clamp(ceil(linInOriginal), 0, srcHeight - 1);
int linBottomFurther = clamp(ceil(linInOriginal) + 1, 0, srcHeight - 1);
// Calculate distance to the top left pixel
float linDist = linInOriginal - floor(linInOriginal);
// Iterate through each column
for(int col = 0; col < dstWidth; col++){
// Original coordinates
float colInOriginal = (col - 0.5) / scaleWidthRatio;
// Calculate original pixels coordinates to interpolate
int colLeftFurther = clamp(floor(colInOriginal) - 1, 0, srcWidth - 1);
int colLeft = clamp(floor(colInOriginal), 0, srcWidth - 1);
int colRight = clamp(ceil(colInOriginal), 0, srcWidth - 1);
int colRightFurther = clamp(ceil(colInOriginal) + 1, 0, srcWidth - 1);
// Calculate distance to the top left pixel
float colDist = colInOriginal - floor(colInOriginal);
// Gets the original pixels values
// 1st row
uint8_t p00 = srcSlice[0][linTopFurther * srcWidth + colLeftFurther];
// ...
// 2nd row
uint8_t p01 = srcSlice[0][linTop * srcWidth + colLeftFurther];
// ...
// 3rd row
// ...
// 4th row
// ...
// Bicubic interpolation operation
// Y
float value = cubicInterpolate(
cubicInterpolate(static_cast<float>(p00), static_cast<float>(p10), static_cast<float>(p20), static_cast<float>(p30), colDist),
cubicInterpolate(static_cast<float>(p01), static_cast<float>(p11), static_cast<float>(p21), static_cast<float>(p31), colDist),
cubicInterpolate(static_cast<float>(p02), static_cast<float>(p12), static_cast<float>(p22), static_cast<float>(p32), colDist),
cubicInterpolate(static_cast<float>(p03), static_cast<float>(p13), static_cast<float>(p23), static_cast<float>(p33), colDist),
linDist);
dstSlice[0][lin * dstWidth + col] = double2uint8_t(clamp(value, 0.0f, 255.0f));
}
}
I was forgetting to set the values of the second-degree terms of the interpolation matrix. They were set to 0, so the resulting interpolation resembled bilinear interpolation.
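For reference, the cubicInterpolate helper used above is not shown; a common Catmull-Rom style implementation looks roughly like this (a sketch, not necessarily the exact helper used in the question):

// Catmull-Rom cubic interpolation of four consecutive samples p0..p3,
// evaluated at fractional position t in [0,1] between p1 and p2.
float cubicInterpolate(float p0, float p1, float p2, float p3, float t)
{
    return p1 + 0.5f * t * (p2 - p0
         + t * (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3
         + t * (3.0f * (p1 - p2) + p3 - p0)));
}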
I am trying to reduce my image's colors to some predefined colors using the following function:
void quantize_img(cv::Mat &lab_img, std::vector<cv::Scalar> &lab_colors) {
float min_dist, dist;
int min_idx;
for (int i = 0; i < lab_img.rows*lab_img.cols * 3; i += lab_img.cols * 3) {
for (int j = 0; j < lab_img.cols * 3; j += 3) {
min_dist = FLT_MAX;
uchar &l = *(lab_img.data + i + j + 0);
uchar &a = *(lab_img.data + i + j + 1);
uchar &b = *(lab_img.data + i + j + 2);
for (int k = 0; k < lab_colors.size(); k++) {
double &lc = lab_colors[k](0);
double &ac = lab_colors[k](1);
double &bc = lab_colors[k](2);
dist = (l - lc)*(l - lc)+(a - ac)*(a - ac)+(b - bc)*(b - bc);
if (min_dist > dist) {
min_dist = dist;
min_idx = k;
}
}
l = lab_colors[min_idx](0);
a = lab_colors[min_idx](1);
b = lab_colors[min_idx](2);
}
}
}
However, it does not seem to work properly! For example, the output for the following input is far from what I expect:
if (!(src = imread("im0.png")).data)
return -1;
cvtColor(src, lab, COLOR_BGR2Lab);
std::vector<cv::Scalar> lab_color_plate_({
Scalar(100, 0 , 0), //white
Scalar(50 , 0 , 0), //gray
Scalar(0 , 0 , 0), //black
Scalar(50 , 127, 127), //red
Scalar(50 ,-128, 127), //green
Scalar(50 , 127,-128), //violet
Scalar(50 ,-128,-128), //blue
Scalar(68 , 46 , 75), //orange
Scalar(100,-16 , 93) //yellow
});
//convert from conventional Lab to OpenCV Lab
for (int k = 0; k < lab_color_plate_.size(); k++) {
lab_color_plate_[k](0) *= 255.0 / 100.0;
lab_color_plate_[k](1) += 128;
lab_color_plate_[k](2) += 128;
}
quantize_img(lab, lab_color_plate_);
cvtColor(lab, lab, COLOR_Lab2BGR);
imwrite("im0_lab.png", lab);
Input image:
Output image:
Can anyone explain where the problem is?
After checking your algorithm, I noticed that it is 100% correct and that the problem is your color space. Let's take one of the colors that is changed "wrongly", like the green from the trees.
Using a color picker tool in GIMP, you can see that at least one of the greens used is RGB (111, 139, 80). Converted to Lab, that is (54.4, -20.7, 28.3). By your formula, the distance to your green is 21274.34, while the distance to grey is 1248.74, so it will choose grey over green, even though it is a green color.
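Working that out against the (conventional) reference colors: (54.4 - 50)^2 + (-20.7 - (-128))^2 + (28.3 - 127)^2 = 19.36 + 11513.29 + 9741.69 = 21274.34 for green (50, -128, 127), versus (54.4 - 50)^2 + (-20.7 - 0)^2 + (28.3 - 0)^2 = 19.36 + 428.49 + 800.89 = 1248.74 for grey (50, 0, 0).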
A lot of values in Lab can produce a green color. You can test out the color ranges on this webpage. I would suggest you use HSV or HSL instead and compare only the H value, which is the hue. The other values change only the tone of the green, but a small range of hue determines that it is green. This will probably give you more accurate results.
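A minimal sketch of that hue-only comparison (my own illustration; in OpenCV's 8-bit HSV the hue channel is stored in [0, 180), so the distance has to wrap around):

#include <vector>
#include <cstdlib>
#include <algorithm>

// Pick the index of the closest reference hue, treating hue as circular.
int closestHue(int h, const std::vector<int> &referenceHues)
{
    int bestIdx = 0, bestDist = 181;
    for (size_t k = 0; k < referenceHues.size(); ++k) {
        int d = std::abs(h - referenceHues[k]);
        d = std::min(d, 180 - d);               // wrap-around distance on the hue circle
        if (d < bestDist) { bestDist = d; bestIdx = int(k); }
    }
    return bestIdx;
}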
As a suggestion to improve your code, use Vec3b and the cv::Mat accessors, like this:
for (int i = 0; i < lab_img.rows; ++i) {
    for (int j = 0; j < lab_img.cols; ++j) {
        Vec3b pixel = lab_img.at<Vec3b>(i,j);
        // ... read or modify `pixel` here (take a Vec3b& if you need to write back)
    }
}
This way the code is more readable, and some checks are done in debug mode.
The other way would be to use a single loop, since you don't care about the indices:
auto currentData = reinterpret_cast<Vec3b*>(lab_img.data);
for (size_t i = 0; i < lab_img.rows*lab_img.cols; i++)
{
auto& pixel = currentData[i];
}
This way is also fine. This last part is just a suggestion; there is nothing wrong with your current code, it is just harder to read and understand for an outside viewer.
I'm trying to make sure FFTW does what I think it should do, but I am having problems. I'm using OpenCV's cv::Mat. I made a test program that, given a Mat f, computes ifft(fft(f)) and compares the result to f. I would expect the difference between the two to be negligible, but there's a strange pattern in the data.
In this case, f is initialized to be an 8x8 array of floats with positive values less than 1.
Here's my test program code:
Mat f = .. //populate f
if (f.type() != CV_32FC1)
DLOG << "Bad f type";
const int y = f.rows;
const int x = f.cols;
double* input = fftw_alloc_real(y * 2*(x/2 + 1));
// forward fft
fftw_plan plan = fftw_plan_dft_r2c_2d(x, y, input, (fftw_complex*)input, FFTW_MEASURE);
// inverse fft
fftw_plan iplan = fftw_plan_dft_c2r_2d(x, y, (fftw_complex*)input, input, FFTW_MEASURE);
// populate fftw data from f
for (int yi = 0; yi < y; ++yi)
{
const float* yptr = f.ptr<float>(yi);
for (int xi = 0; xi < x; ++xi)
input[yi*x + xi] = (double)yptr[xi];
}
fftw_execute(plan);
fftw_execute(iplan);
// put data into another cv::Mat for comparison
Mat check(y, x, f.type());
for (int yi = 0; yi < y; ++yi)
{
float* yptr = check.ptr<float>(yi);
for (int xi = 0; xi < x ; ++xi)
yptr[xi] = (float)input[yi*x + xi];
}
DLOG << Util::summary(f, "f");
DLOG << f;
DLOG << Util::summary(check, "check");
DLOG << check;
Mat diff = f*x*y - check;
DLOG << Util::summary(diff, "diff");
DLOG << diff;
Where DLOG is my logger and Util::summary(cv::Mat m) just prints the passed string and the dimensions, channels, min, and max of the mat.
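For context, such a summary helper might look roughly like this (a hypothetical sketch that matches the output format below; the real Util::summary is not shown):

#include <sstream>
#include <string>
#include <opencv2/core.hpp>

std::string summary(const cv::Mat &m, const std::string &name)
{
    double lo, hi;
    cv::minMaxLoc(m.reshape(1), &lo, &hi);   // min/max over all channels
    std::ostringstream os;
    os << name << ": rows:" << m.rows << " cols:" << m.cols
       << " chans:" << m.channels() << " min:" << lo << " max:" << hi;
    return os.str();
}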
Here's what the data looks like (output):
f: rows:8 cols:8 chans:1 min:0.00257996 max:0.4
[0.050668437, 0.04509116, 0.033668514, 0.10986148, 0.12855141, 0.048241843, 0.12613985, 0.09731093;
0.028602425, 0.0092236707, 0.037089188, 0.118964, 0.075040311, 0.40000001, 0.11959606, 0.071930833;
0.0025799556, 0.051522054, 0.22233701, 0.052993439, 0.032000393, 0.12673819, 0.015244827, 0.044803992;
0.13946071, 0.019708242, 0.0112687, 0.047459811, 0.019342113, 0.030085485, 0.018739942, 0.0098618753;
0.041809395, 0.029681522, 0.026837418, 0.16038358, 0.29034778, 0.17247421, 0.1789207, 0.042179305;
0.025630442, 0.017192598, 0.060540862, 0.1854037, 0.21287154, 0.04813192, 0.042614728, 0.034764063;
0.0030835248, 0.018511582, 0.0071733585, 0.017076733, 0.064545207, 0.0026390438, 0.088922881, 0.045725599;
0.12798512, 0.23215951, 0.027465452, 0.03174505, 0.04352935, 0.025079668, 0.044403922, 0.035459157]
check: rows:8 cols:8 chans:1 min:-3.26489 max:25.6
[3.24278, 2.8858342, 2.1547849, 7.0311346, 8.2272902, 3.0874779, 8.0729504, 6.2278996;
0.30818239, 0, 2.373708, 7.6136961, 4.8025799, 25.6, 7.6541481, 4.6035733;
0.16511716, 3.2974114, -3.2648909, 0, 2.0480251, 8.1112442, 0.97566891, 2.8674555;
8.9254856, 1.2613275, 0.72119683, 3.0374279, -0.32588482, 0, 1.1993563, 0.63116002;
2.6758013, 1.8996174, 1.7175947, 10.264549, 18.582258, 11.038349, 0.042666838, 0;
1.6403483, 1.1003263, 3.8746152, 11.865837, 13.623778, 3.0804429, 2.7273426, 2.2249;
0.44932228, 0, 0.45909494, 1.0929109, 4.1308932, 0.16889881, 5.6910644, 2.9264383;
8.1910477, 14.858209, -0.071794562, 0, 2.7858784, 1.6050987, 2.841851, 2.2693861]
diff: rows:8 cols:8 chans:1 min:-0.251977 max:17.4945
[0, 0, 0, 0, 0, 0, 0, 0;
1.5223728, 0.59031492, 0, 0, 0, 0, 0, 0;
0, 0, 17.494459, 3.3915801, 0, 0, 0, 0;
0, 0, 0, 0, 1.5637801, 1.9254711, 0, 0;
0, 0, 0, 0, 0, 0, 11.408258, 2.6994755;
0, 0, 0, 0, 0, 0, 0, 0;
-0.2519767, 1.1847413, 0, 0, 0, 0, 0, 0;
0, 0, 1.8295834, 2.0316832, 0, 0, 0, 0]
The difficult part for me is the nonzero entries in the diff matrix. I've accounted for the scaling FFTW does on the values and the padding needed to do an in-place fft on real data; what am I missing?
I find it surprising that the data could be off by a value of 17 (which is 66% of the max value), when there are so many zeros. Also, the data irregularities seem to form a diagonal pattern.
As you may have noticed when writing fftw_alloc_real(y * 2*(x/2 + 1)), FFTW needs extra space in the x direction to store the complex data. In your case, as x = 8, it needs 2*(x/2 + 1) = 10 reals per row.
http://www.fftw.org/doc/Real_002ddata-DFT-Array-Format.html#Real_002ddata-DFT-Array-Format
So you should take care of this stride as you populate the input array or retrieve values from it.
You may change
input[yi*x + xi] = (double)yptr[xi];
to
int xfft = 2*(x/2 + 1);
...
input[yi*xfft + xi] = (double)yptr[xi];
and
yptr[xi] = (float)input[yi*x + xi];
to
yptr[xi] = (float)input[yi*xfft + xi];
It should solve your problem, since the non-null points in your diff correspond to the extra padding.
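Put together, the populate and read-back loops from the question would then look roughly like this (a sketch of this suggestion, using the padded row length):

const int xfft = 2*(x/2 + 1);   // padded row length of the real array

// populate fftw data from f, honouring the padded stride
for (int yi = 0; yi < y; ++yi)
{
    const float* yptr = f.ptr<float>(yi);
    for (int xi = 0; xi < x; ++xi)
        input[yi*xfft + xi] = (double)yptr[xi];
}

fftw_execute(plan);
fftw_execute(iplan);

// read back with the same stride (values still scaled by x*y, as in the question)
for (int yi = 0; yi < y; ++yi)
{
    float* yptr = check.ptr<float>(yi);
    for (int xi = 0; xi < x; ++xi)
        yptr[xi] = (float)input[yi*xfft + xi];
}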
Bye,