Implementing Structured Tensor - c++

I am trying to implement a paper called Structured Tensor Based Image Interpolation. The paper uses the structure tensor to classify each pixel in an image into one of three classes (uniform, corner, or edge) based on the eigenvalues of the structure tensor.
To achieve this I have written the following code:
void tensorComputation(Mat dx, Mat dy, Mat magnitude)
{
Mat dx2, dy2, dxy;
GaussianBlur(magnitude, magnitude, Size(3, 3), 0, 0, BORDER_DEFAULT);
// Calculate image derivatives
multiply(dx, dx, dx2);
multiply(dy, dy, dy2);
multiply(dx, dy, dxy);
Mat t(2, 2, CV_32F); // tensor matrix
// Insert values to the tensor matrix.
t.at<float>(0, 0) = sum(dx2)[0];
t.at<float>(0, 1) = sum(dxy)[0];
t.at<float>(1, 0) = sum(dxy)[0];
t.at<float>(1, 1) = sum(dy2)[0];
// eigen decomposition to get the main gradient direction.
Mat eigVal, eigVec;
eigen(t, eigVal, eigVec);
// This should compute the angle of the gradient direction based on the first eigenvector.
float* eVec1 = eigVec.ptr<float>(0);
float* eVec2 = eigVec.ptr<float>(1);
cout << fastAtan2(eVec1[0], eVec1[1]) << endl;
cout << fastAtan2(eVec2[0], eVec2[1]) << endl;
}
Here dx, dy, and magnitude are the derivative along the x-axis, the derivative along the y-axis, and the gradient magnitude of the image, respectively.
I know that what I have computed is the structure tensor for the entire image. But I need to compute the structure tensor for each pixel in the image. How can I achieve this?

In your code you blur magnitude, but then don't use it. You don't need this magnitude at all.
You build the structure tensor correctly, but you average over the whole image. What you want to do is apply local averaging. For each pixel, the structure tensor is the average of your matrix over the pixels in the neighborhood. You compute this by applying a Gaussian blur to each of the components of the tensor: dx2, dy2, and dxy.
The larger the sigma of the Gaussian, the larger the neighborhood you average over. You get more regularization (less sensitivity to noise) but also less resolution (less sensitivity to small variations and short edges). Play around with the parameter until you get what you need. Sigma values between 2 and 5 are quite common.
Next, you need to compute the eigendecomposition per pixel. I don't know if OpenCV makes this easy. I recommend you use DIPlib 3 instead. It has the right infrastructure to compute and use the structure tensor. See here how easy it can be.
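Since the tensor at each pixel is a symmetric 2x2 matrix, its eigenvalues have a closed form, so a general eigen solver is not strictly needed per pixel. The following is a minimal sketch in plain Python, assuming jxx, jxy and jyy are the already blurred values of dx2, dxy and dy2 at one pixel (the function name and the sample values are made up for illustration):

```python
import math

def structure_tensor_eigen(jxx, jxy, jyy):
    """Eigenvalues (largest first) and orientation of [[jxx, jxy], [jxy, jyy]]."""
    half_trace = 0.5 * (jxx + jyy)
    # Square root of the discriminant; its argument is never negative
    # for a symmetric matrix.
    root = math.sqrt((0.5 * (jxx - jyy)) ** 2 + jxy ** 2)
    lam1 = half_trace + root
    lam2 = half_trace - root
    # Angle (radians) of the eigenvector that belongs to lam1,
    # i.e. the dominant gradient direction.
    theta = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)
    return lam1, lam2, theta

# Hypothetical smoothed components at a pixel with a purely horizontal gradient.
lam1, lam2, theta = structure_tensor_eigen(4.0, 0.0, 0.0)
print(lam1, lam2, theta)  # 4.0 0.0 0.0
```

Roughly speaking, two small eigenvalues suggest a uniform region, one large and one small eigenvalue suggest an edge, and two large eigenvalues suggest a corner. The actual thresholds depend on the paper and on your data.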

Related

3D Reconstruction Of Planar Markers using OpenCV

I am trying to perform 3D reconstruction (Structure from Motion) from multiple images of planar markers. I am very new to MVG and OpenCV.
As far as I have understood, I have to do the following steps:
1. Identify the corresponding 2D corner points in the first image.
2. Calculate the camera pose of the first image using cv::solvePnP (assuming the origin to be the center of the marker).
3. Repeat 1 and 2 for the second image.
4. Estimate the relative motion of the camera by Rot_relative = R2 - R1, Trans_relative = T2 - T1.
5. Now, assuming the first camera to be the origin, construct the 3x4 projection matrix for both views: P1 = [I|0]*CameraMatrix (known from calibration) and P2 = [Rot_relative|Trans_relative].
6. Use the created projection matrices and the 2D corner points to triangulate the 3D coordinates using cv::triangulatePoints(P1,P2,point1,point2,OutMat).
7. The 3D coordinates can be found by dividing each row of OutMat by the 4th row.
I was hoping to keep my "first view" as my origin and iterate through n views, repeating steps 1-7 (I suppose this is called Global SfM).
I was hoping to get (n-1) 3D points of the corners with the first view as origin, which we could optimize using bundle adjustment.
But the result I get is very disappointing: the 3D points calculated are displaced by a huge factor.
These are my questions:
1. Is there something wrong with the steps I followed?
2. Should I use cv::findHomography() and cv::decomposeHomographyMat() to find the relative motion of the camera?
3. Should point1 and point2 in cv::triangulatePoints(P1,P2,point1,point2,OutMat) be normalized and undistorted? If yes, how should "OutMat" be interpreted?
Can anyone with insight into the topic point out my mistake?
P.S. I have come to above understanding after reading "MultiView Geometry in Computer Vision"
Please find the code snippet below:
cv::Mat Reconstruction::Triangulate(std::vector<cv::Point2f>
ImagePointsFirstView, std::vector<cv::Point2f>ImagePointsSecondView)
{
cv::Mat rVectFirstView, tVecFristView;
cv::Mat rVectSecondView, tVecSecondView;
cv::Mat RotMatFirstView = cv::Mat(3, 3, CV_64F);
cv::Mat RotMatSecondView = cv::Mat(3, 3, CV_64F);
cv::solvePnP(RealWorldPoints, ImagePointsFirstView, cameraMatrix, distortionMatrix, rVectFirstView, tVecFristView);
cv::solvePnP(RealWorldPoints, ImagePointsSecondView, cameraMatrix, distortionMatrix, rVectSecondView, tVecSecondView);
cv::Rodrigues(rVectFirstView, RotMatFirstView);
cv::Rodrigues(rVectSecondView, RotMatSecondView);
cv::Mat RelativeRot = RotMatFirstView-RotMatSecondView ;
cv::Mat RelativeTrans = tVecFristView-tVecSecondView ;
cv::Mat RelativePose;
cv::hconcat(RelativeRot, RelativeTrans, RelativePose);
cv::Mat ProjectionMatrix_0 = cameraMatrix*cv::Mat::eye(3, 4, CV_64F);
cv::Mat ProjectionMatrix_1 = cameraMatrix* RelativePose;
cv::Mat X;
cv::undistortPoints(ImagePointsFirstView, ImagePointsFirstView, cameraMatrix, distortionMatrix, cameraMatrix);
cv::undistortPoints(ImagePointsSecondView, ImagePointsSecondView, cameraMatrix, distortionMatrix, cameraMatrix);
cv::triangulatePoints(ProjectionMatrix_0, ProjectionMatrix_1, ImagePointsFirstView, ImagePointsSecondView, X);
X.row(0) = X.row(0) / X.row(3);
X.row(1) = X.row(1) / X.row(3);
X.row(2) = X.row(2) / X.row(3);
return X;
}
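One thing worth checking in the relative motion step: rigid poses compose by matrix multiplication, not by element-wise subtraction, and the difference of two rotation matrices is in general not a rotation matrix. A minimal sketch in plain Python, assuming (R1, t1) and (R2, t2) are world-to-camera poses as returned by solvePnP followed by Rodrigues (the helper names and the sample poses are made up for illustration):

```python
def mat_mul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def relative_pose(r1, t1, r2, t2):
    """Pose of camera 2 relative to camera 1; translations are 3x1 columns."""
    r_rel = mat_mul(r2, transpose(r1))      # R_rel = R2 * R1^T
    r_rel_t1 = mat_mul(r_rel, t1)
    t_rel = [[t2[i][0] - r_rel_t1[i][0]] for i in range(3)]  # t2 - R_rel * t1
    return r_rel, t_rel

# Hypothetical poses: camera 1 unrotated, camera 2 rotated 90 degrees about z.
r1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t1 = [[0.0], [0.0], [5.0]]
r2 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t2 = [[1.0], [0.0], [5.0]]
r_rel, t_rel = relative_pose(r1, t1, r2, t2)
print(r_rel, t_rel)
```

With this definition, a point expressed in the first camera frame maps into the second camera frame as X2 = R_rel * X1 + t_rel, which is the form that P2 = K * [R_rel | t_rel] expects when P1 = K * [I | 0].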

How to get the distance of the object and How to use camera calibration matrix correctly?

I successfully calibrated my camera using OpenCV. This is the camera lens I am using:
https://www.baslerweb.com/en/products/vision-components/lenses/basler-lens-c125-0418-5m-f1-8-f4mm/
The camera matrix and distortion coefficients are given below.
cv::Mat cameraMatrix(3, 3, cv::DataType<double>::type);
cameraMatrix.at<double>(0) = 1782.80;//fx //432.2 in mm
cameraMatrix.at<double>(1) = 0;
cameraMatrix.at<double>(2) = 3.0587694283633488e+002;//cx
cameraMatrix.at<double>(3) = 0;
cameraMatrix.at<double>(4) = 1782.80;//fy
cameraMatrix.at<double>(5) = 3.0535864258476721e+002;//cy
cameraMatrix.at<double>(6) = 0;
cameraMatrix.at<double>(7) = 0;
cameraMatrix.at<double>(8) = 1;
cv::Mat disCoeffs(1, 5, cv::DataType<double>::type);
disCoeffs.at<double>(0) = -8.1752937039996709e-001;//k1
disCoeffs.at<double>(1) = -2.5660653367749450e+001;//k2
disCoeffs.at<double>(2) = -1.5556922931812768e-002;//p1
disCoeffs.at<double>(3) = -4.4021541217208054e-002;//p2
disCoeffs.at<double>(4) = 1.5042036073609015e+002;//k3
I know this formula is used to calculate the distance of the object, but I am very confused about how to use it properly.
The resolution of my camera is 640x480.
Focal length = 1782.80 (px); I do not know how to correctly convert this to mm.
I know the focal length is the distance from the sensor to the image plane. So what does this value actually represent? A pixel is just a unit representing a dot on the screen.
The object I am using is a circle.
radius = 22 (width and height 44*44)
circle center point: 300,300 (x,y)
Sensor height: I do not know how to get it.
Where do I use the principal point?
How do I get the distance from the camera to the object? How do I get the real-world coordinates of the circle?
I know it is too much to ask. I have tried for one month and did not find a proper solution.
I use the function solvePnP to get the camera translation and rotation. But my problem is how to calculate the object points.
Your cx and cy seem to be wrong, because they should be about half the resolution: 640/2 and 480/2.
fx and fy are in pixel units, as obtained from the calibration process. They relate to the focal length in mm through these formulas:
fx (pixels) = (image width in pixels) * (focal length in mm) / (CCD width in mm)
fy (pixels) = (image height in pixels) * (focal length in mm) / (CCD height in mm)
When you calibrate your camera, you can use those formulas to check that you have the right values. To me cx and cy look wrong because they represent the center of the image (they shouldn't be equal unless your image is square, which is not the case). For fx and fy I can't tell, because I don't know the CCD of your camera. They can be equal if the pixels of the CCD are square.
Don't change those parameters manually; let your calibration software compute them.
Now that you have those parameters, how do you compute the distance?
The formula you presented is not very useful, in the sense that if you can measure the real height, you can usually measure the distance as well (at least in your case), so why use a camera?
So to compute the distance in the real world, you need two more things: the extrinsic parameters (your cameraMatrix holds the intrinsic parameters) and at least four points (the more points the better) in real-world coordinates.
Once you have those, you can use the solvePnP function to find the pose of the object. The pose represents the translation and rotation with respect to the camera frame.
http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#solvepnp
This piece of code can help you do that:
//Four points in real world with `x,y and z` coordinates
vector<Point3f> vec3d;
vec3d.push_back(Point3f(0, 0, 0));
vec3d.push_back(Point3f(0, 211, 0));
vec3d.push_back(Point3f(295, 211, 0));
vec3d.push_back(Point3f(295, 0, 0));
z=0 because your real points lie in a plane.
//The same four points but in your image plane, therefore there is no z and they're in pixel units
vector<Point2f> vec2d;
vec2d.push_back(Point2f(532, 412)); //(y,x)
vec2d.push_back(Point2f(583, 594));
vec2d.push_back(Point2f(927, 535));
vec2d.push_back(Point2f(817, 364));
//The pose of the object: rvec is your rotation vector, tvec is your translation vector
cv::Mat rvec, tvec;
solvePnP(vec3d, vec2d, cameraMatrix, distCoeffs, rvec, tvec);
Finally, you can compute the real distance from tvec as the Euclidean distance: d=std::sqrt(tx*tx+ty*ty+tz*tz).
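The distance formula can be sketched in a few lines of plain Python. The tvec values below are made up for illustration and are not the output of the solvePnP call above; the result is in the same unit as the object points (mm if vec3d is given in mm).

```python
import math

def distance_from_tvec(tvec):
    """Euclidean norm of a translation vector (tx, ty, tz)."""
    tx, ty, tz = tvec
    return math.sqrt(tx * tx + ty * ty + tz * tz)

# Hypothetical translation vector in mm.
print(distance_from_tvec((30.0, 40.0, 120.0)))  # prints 130.0
```

Note that this is the distance from the camera center to the origin of the object frame, i.e. to whichever point was given the coordinates (0, 0, 0).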
Your questions:
Sensor height: I do not know how to get it.
Look up your camera's specification on the internet or in the manual and you'll find it.
Where do I use the principal point?
It is part of your intrinsic parameters. You are not going to use it separately.
How do I get the distance from the camera to the object? How do I get the real-world coordinates of the circle?
I explained that above. You need four points, and with a circle you have only one, which is not enough to compute the pose.
But my problem is how to calculate the object points.
objectPoints in solvePnP are your real-world coordinates. For example, a chessboard has corners whose exact positions in mm we know with respect to a world frame that you choose on the chessboard. The origin can be at the top-left corner or something like that, and z=0 because the chessboard is printed on paper, just like your circle!
EDIT:
You can find more specifications in the manual, page 13, here. It says 7.4 x 7.4µm:
f (mm) = f (pixel) x pixel_size (mm) => f (mm) = 1782.80 x 7.2e-3 = 12.83616 (mm)
Which is not 4mm!! So you need to do the calibration again, something is wrong!
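The conversion can be checked in both directions with a small sketch in plain Python. The pixel size of 7.2 µm is the value used in the calculation above and is an assumption here; take the real value from the sensor data sheet.

```python
def focal_px_to_mm(focal_px, pixel_size_um):
    """Focal length in mm from a focal length in pixels and a pixel size in µm."""
    return focal_px * pixel_size_um / 1000.0

def focal_mm_to_px(focal_mm, pixel_size_um):
    """Focal length in pixels from a focal length in mm and a pixel size in µm."""
    return focal_mm * 1000.0 / pixel_size_um

print(focal_px_to_mm(1782.80, 7.2))  # about 12.84 mm
print(focal_mm_to_px(4.0, 7.2))      # about 555.6 px
```

The second call shows what the calibration would be expected to report for fx with a 4 mm lens and this pixel size, which makes the mismatch with 1782.80 easy to see.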
3D points:
vector<Point3f> vec3d;
vec3d is where you are going to store your 3D point coordinates. I gave you an example for the first point, which is the origin:
vec3d.push_back(Point3f(0, 0, 0)); //y,x,z
EDIT3
If you take a pattern like this
Then choose, for example, the circle in the top-left or top-right corner; it will have the coordinates (0,0,0), that is, the origin. The circle next to it is your second point, and it will have the coordinates (x,0,0), where x is the distance (in mm) between the two circles. You do the same for four points in your pattern. You can choose any pattern you want, as long as you can detect it in your image and retrieve the points' coordinates in pixels.
If you still don't understand, I advise you to take a course in projective geometry and camera models, so that you can understand what every parameter means.

FFTW gives wrong results in comparison to MATLAB [duplicate]

I aim to get the DFT of an image in OpenCV.
Using dft function, I'm able to calculate it, and then paint it by calculating its magnitude (then, apply the log and finally normalize it in order to paint values between 0 and 1).
For the following image, my result is the one shown below (with the quadrants swapped so that the lower frequencies are in the center of the image):
However, if I compare it to the result I obtain using other tools like Halcon, it seems incorrect to me, since it seems to have really "high" values (the OpenCV DFT magnitude, I mean):
I thought it might be for one of these reasons:
The difference between DFT (in OpenCV) and FFT (in Halcon)
The operations I'm performing in order to show the magnitude in OpenCV.
The problem with the first one is that it's quite hard for me to analyze: OpenCV doesn't have an FFT function, and Halcon doesn't have a DFT function (if I'm not wrong, of course), so I can't compare them directly.
The second one is where I've spent most of my time, but I still haven't found the reason, if it's there.
Here is the code I'm using to paint the magnitude of img (which is my DFT image):
// 1.- To split the image in Re | Im values
Mat planes[] = {Mat_<float>(img), Mat::zeros(img.size(), CV_32F)};
// 2.- To magnitude + phase
split(img, planes);
// Calculate magnitude. I overwrite it, I know, but this is inside a function so it will be never used again, doesn't matter
magnitude(planes[0], planes[1], planes[0]);
// Magnitude Mat
Mat magI = planes[0];
// 3.- We add 1 to all them in order to perform the log
magI += Scalar::all(1); // switch to logarithmic scale
log(magI, magI);
// 4.- Swap the quadrants to center frequency
magI = magI(Rect(0, 0, magI.cols & -2, magI.rows & -2));
int cx = magI.cols/2;
int cy = magI.rows/2;
Mat q0(magI, Rect(0, 0, cx, cy)); // Top-Left - Create a ROI per quadrant
Mat q1(magI, Rect(cx, 0, cx, cy)); // Top-Right
Mat q2(magI, Rect(0, cy, cx, cy)); // Bottom-Left
Mat q3(magI, Rect(cx, cy, cx, cy)); // Bottom-Right
// swap quadrants (Top-Left with Bottom-Right)
Mat tmp;
q0.copyTo(tmp);
q3.copyTo(q0);
tmp.copyTo(q3);
// swap quadrant (Top-Right with Bottom-Left)
q1.copyTo(tmp);
q2.copyTo(q1);
tmp.copyTo(q2);
// 5.- Normalize
// Transform the matrix with float values into a
// viewable image form (float between values 0 and 1).
normalize(magI, magI, 0, 1, CV_MINMAX);
// Paint it
imshow( "Magnitud DFT", magI);
So, summarizing: any idea why I get this difference between these two magnitudes?
I'll summarize my comments into an answer.
When one thinks of doing a Fourier transform to work in the inverse domain, the assumption is that doing the inverse transform will return the same function/vector/whatever. In other words, we assume
$$\mathcal{F}^{-1}\{\mathcal{F}\{x\}\} = x$$
This is the case with many programs and libraries (e.g. Mathematica, Matlab/octave, Eigen/unsupported/FFT, etc.). However, with many libraries (FFTW, KissFFT, etc.) this is not the case and there tends to be a scale
$$\mathcal{F}^{-1}\{\mathcal{F}\{x\}\} = s\,x$$
where s is usually the number of elements (m) in the array raised to some power (the power should be 1, unless the transform and the inverse are scaled in a mismatched fashion). This is done in order to avoid iterating over all m elements to multiply by a scale, which is often not important.
That being said, when looking at the scale in the inverse domain, the libraries that do scale the transforms are free to use different scales for the transform and the inverse transform. Common scaling pairs for the transform/inverse include {m^-1, m} and {m^-0.5, m^0.5}. Therefore, when comparing results from different libraries, we should be prepared for factors of m (scaled by m^-1 vs. not scaled), m^0.5 (scaled by m^-0.5 vs. not scaled, and scaled by m^-1 vs. scaled by m^-0.5), or even other factors if other scaling conventions were used.
Note: This scaling factor is not related to normalizing an array, such that all values are [0,1] or that the norm of the array is equal to 1.
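The effect of the scaling convention can be reproduced with a naive DFT in plain Python. This is only a sketch for illustration (an O(m^2) transform with a hypothetical scale parameter), not how any of the libraries mentioned above are implemented:

```python
import cmath

def dft(values, inverse=False, scale=1.0):
    """Naive O(m^2) DFT; `scale` multiplies every output element."""
    m = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(m):
        acc = 0j
        for n, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * n / m)
        out.append(acc * scale)
    return out

x = [1.0, 2.0, 3.0, 4.0]
m = len(x)
# No scaling in either direction: the round trip returns m * x.
unscaled = dft(dft(x), inverse=True)
# Inverse scaled by 1/m: the round trip returns x.
scaled = dft(dft(x), inverse=True, scale=1.0 / m)
print([round(v.real, 6) for v in unscaled])  # [4.0, 8.0, 12.0, 16.0]
print([round(v.real, 6) for v in scaled])    # [1.0, 2.0, 3.0, 4.0]
```

When two tools disagree by a constant factor on the magnitude spectrum, dividing one result by m or by the square root of m (with m the number of samples, i.e. rows times columns for an image) is a quick way to test whether the scaling convention is the cause.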

How to combine two remap() operations into one?

I have a tight loop, where I get a camera image, undistort it and also transform it according to some transformation (e.g. a perspective transform). I already figured out to use cv::remap(...) for each operation, which is already much more efficient than using plain matrix operations.
In my understanding it should be possible to combine the lookup maps into one and call remap just once in every loop iteration. Is there a canonical way to do this? I would prefer not to implement all the interpolation stuff myself.
Note: The procedure should work with differently sized maps. In my particular case the undistortion preserves the image dimensions, while the other transformation scales the image to a different size.
Code for illustration:
// input arguments
const cv::Mat_<math::flt> intrinsic = getIntrinsic();
const cv::Mat_<math::flt> distortion = getDistortion();
const cv::Mat newCameraMatrix = cv::getOptimalNewCameraMatrix(intrinsic, distortion, myImageSize, 0);
// output arguments
cv::Mat undistortMapX;
cv::Mat undistortMapY;
// computes undistortion maps
cv::initUndistortRectifyMap(intrinsic, distortion, cv::Mat(),
newCameraMatrix, myImageSize, CV_16SC2,
undistortMapX, undistortMapY);
// computes skew maps
// ...computation of mapX and mapY omitted
cv::convertMaps(mapX, mapY, skewMapX, skewMapY, CV_16SC2);
for(;;) {
cv::Mat originalImage = getNewImage();
cv::Mat undistortedImage;
cv::remap(originalImage, undistortedImage, undistortMapX, undistortMapY, cv::INTER_LINEAR);
cv::Mat skewedImage;
cv::remap(undistortedImage, skewedImage, skewMapX, skewMapY, cv::INTER_LINEAR);
outputImage(skewedImage);
}
You can apply remap on undistortMapX and undistortMapY.
cv::remap(undistortMapX, undistrtSkewX, skewMapX, skewMapY, cv::INTER_LINEAR);
cv::remap(undistortMapY, undistrtSkewY, skewMapX, skewMapY, cv::INTER_LINEAR);
Then you can use:
cv::remap(originalImage , skewedImage, undistrtSkewX, undistrtSkewY, cv::INTER_LINEAR);
It works because skewMaps and undistortMaps are arrays of coordinates in image, so it should be similar to taking location of location...
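The idea of "taking the location of a location" can be sketched in plain Python with small nested lists standing in for the maps. Note the assumptions: this sketch uses nearest-neighbour lookup instead of the linear interpolation that remap() performs, it does no bounds checking, and the function name and the toy maps are made up for illustration.

```python
def compose_maps(inner_x, inner_y, outer_x, outer_y):
    """combined(y, x) = inner(outer(y, x)), using nearest-neighbour lookup."""
    height, width = len(outer_x), len(outer_x[0])
    comb_x = [[0.0] * width for _ in range(height)]
    comb_y = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Where the second (outer) map looks in the intermediate image...
            ix = int(round(outer_x[y][x]))
            iy = int(round(outer_y[y][x]))
            # ...and where the first (inner) map looks in the source image.
            comb_x[y][x] = inner_x[iy][ix]
            comb_y[y][x] = inner_y[iy][ix]
    return comb_x, comb_y

# Toy maps: both maps read one pixel to the right of the destination pixel.
inner_x = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
inner_y = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
outer_x = [[1.0, 2.0], [1.0, 2.0]]
outer_y = [[0.0, 0.0], [1.0, 1.0]]
comb_x, comb_y = compose_maps(inner_x, inner_y, outer_x, outer_y)
print(comb_x)  # [[2.0, 3.0], [2.0, 3.0]]
print(comb_y)  # [[0.0, 0.0], [1.0, 1.0]]
```

The combined map reads two pixels to the right, which is what applying the two shifts one after the other would do. The combined map has the size of the second (outer) map, so differently sized maps are not a problem.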
Edit (answer to comments):
I think I need to make some clarifications. The remap() function calculates the pixels of the new image from the pixels of the old image. With linear interpolation, each pixel in the new image is a weighted average of 4 pixels from the old image. The weights differ from pixel to pixel according to the values in the provided maps. If a value is close to an integer, most of the weight comes from a single pixel, and the new image will be as sharp as the original image. On the other hand, if the value is far from an integer (i.e. integer + 0.5), the weights are similar, which creates a smoothing effect. To get a feeling for what I am talking about, look at the undistorted image. You will see that some parts of the image are sharper/smoother than other parts.
Now back to what happens when you combine two remap operations into one. The coordinates in the combined maps are correct, i.e. each pixel in skewedImage is calculated from the correct 4 pixels of originalImage with the correct weights. But the result is not identical to the result of two remap operations. Each pixel in undistortedImage is a weighted average of 4 pixels from originalImage. This means that, with two remaps, each pixel of skewedImage would be a weighted average of 9-16 pixels from originalImage. Conclusion: using a single remap() can NOT possibly give a result that is identical to two usages of remap().
The discussion about which of the two possible images (single remap() vs double remap()) is better is quite complicated. Normally it is good to perform as few interpolations as possible, because each interpolation introduces different artifacts, especially if the artifacts are not uniform across the image (some regions become smoother than others). In some cases those artifacts may have a good visual effect on the image, like reducing some of the jitter. But if this is what you want, you can achieve it in cheaper and more consistent ways, for example by smoothing the original image prior to remapping.
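The dependence of sharpness on the fractional part of the map coordinate can be seen in a minimal bilinear sampler, written here in plain Python. It is a sketch of the general technique under simple assumptions (coordinates inside the image, clamping at the border), not OpenCV's implementation:

```python
import math

def bilinear_sample(image, x, y):
    """Weighted average of the 4 pixels around the (x, y) source coordinate."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Clamp the right/bottom neighbours so border pixels stay inside the image.
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom

# A vertical step edge: dark on the left, bright on the right.
image = [[0.0, 100.0],
         [0.0, 100.0]]
print(bilinear_sample(image, 1.0, 0.0))  # 100.0, integer coordinate keeps the edge sharp
print(bilinear_sample(image, 0.5, 0.5))  # 50.0, half-pixel coordinate blends the edge
```

Applying such a sampler twice blends values that are themselves blends, which is why the double remap() result draws on more source pixels than the single remap() result.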
In the case of two general mappings, there is no choice but to use the approach suggested by @MichaelBurdinov.
However, in the special case of two mappings with known inverse mappings, an alternative approach is to compute the maps manually. This manual approach is more accurate than the double remap one, since it does not involve interpolation of coordinate maps.
In practice, most of the interesting applications match this special case. It does too in your case because your first map corresponds to image undistortion (whose inverse operation is image distortion, which is associated to a well known analytical model) and your second map corresponds to a perspective transform (whose inverse can be expressed analytically).
Computing the maps manually is actually quite easy. As stated in the documentation, these maps contain, for each pixel in the destination image, the (x,y) coordinates at which to find the appropriate intensity in the source image. The following code snippet shows how to compute the maps manually in your case:
int dst_width=...,dst_height=...; // Initialize the size of the output image
cv::Mat Hinv=H.inv(), Kinv=K.inv(); // Precompute the inverse perspective matrix and the inverse camera matrix
cv::Mat map_undist_warped_x32f(dst_height,dst_width,CV_32F); // Allocate the x map to the correct size (n.b. the data type used is float)
cv::Mat map_undist_warped_y32f(dst_height,dst_width,CV_32F); // Allocate the y map to the correct size (n.b. the data type used is float)
// Loop on the rows of the output image
for(int y=0; y<dst_height; ++y) {
std::vector<cv::Point3f> pts_undist_norm(dst_width);
// For each pixel on the current row, first use the inverse perspective mapping, then multiply by the
// inverse camera matrix (i.e. map from pixels to normalized coordinates to prepare use of projectPoints function)
for(int x=0; x<dst_width; ++x) {
cv::Mat_<float> pt(3,1); pt << x,y,1;
pt = Kinv*Hinv*pt;
pts_undist_norm[x].x = pt(0)/pt(2);
pts_undist_norm[x].y = pt(1)/pt(2);
pts_undist_norm[x].z = 1;
}
// For each pixel on the current row, compose with the inverse undistortion mapping (i.e. the distortion
// mapping) using projectPoints function
std::vector<cv::Point2f> pts_dist;
cv::projectPoints(pts_undist_norm,cv::Mat::zeros(3,1,CV_32F),cv::Mat::zeros(3,1,CV_32F),intrinsic,distortion,pts_dist);
// Store the result in the appropriate pixel of the output maps
for(int x=0; x<dst_width; ++x) {
map_undist_warped_x32f.at<float>(y,x) = pts_dist[x].x;
map_undist_warped_y32f.at<float>(y,x) = pts_dist[x].y;
}
}
// Finally, convert the float maps to signed-integer maps for best efficiency of the remap function
cv::Mat map_undist_warped_x16s,map_undist_warped_y16s;
cv::convertMaps(map_undist_warped_x32f,map_undist_warped_y32f,map_undist_warped_x16s,map_undist_warped_y16s,CV_16SC2);
Note: H above is your perspective transform, while K should be the camera matrix associated with the undistorted image, so it should be what in your code is called newCameraMatrix (which, BTW, is not an output argument of initUndistortRectifyMap). Depending on your specific data, there might also be some additional cases to handle (e.g. division by pt(2) when it might be zero, etc.).
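The division-by-zero case mentioned in the note can be handled as in the following sketch, written in plain Python for brevity. The function name, the eps threshold, and the sample matrices are assumptions made for illustration:

```python
def apply_homography(h, x, y, eps=1e-12):
    """Map (x, y) through a 3x3 matrix; None if the point maps to infinity."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < eps:
        return None  # no finite source coordinate exists for this pixel
    return u / w, v / w

# Hypothetical inverse transform: scale by 0.5, then shift by (10, 20).
h_inv = [[0.5, 0.0, 10.0],
         [0.0, 0.5, 20.0],
         [0.0, 0.0, 1.0]]
print(apply_homography(h_inv, 4.0, 6.0))  # (12.0, 23.0)

# Hypothetical degenerate case: the third coordinate vanishes at x = 4.
h_deg = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [1.0, 0.0, -4.0]]
print(apply_homography(h_deg, 4.0, 6.0))  # None
```

For pixels where no finite source coordinate exists, one option is to write a coordinate outside the source image (for example -1) into the maps, so that remap() fills them according to its border mode.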
I found this question when looking to combine dewarping (undistortion) and projection transforms in Python, but there is no direct Python answer.
Here is a direct conversion of BConic's answer to Python:
import numpy as np
import cv2
dst_width = ...
dst_height = ...
h_inv = np.linalg.inv(h)
k_inv = np.linalg.inv(new_camera_matrix)
map_x = np.zeros((dst_height, dst_width), dtype=np.float32)
map_y = np.zeros((dst_height, dst_width), dtype=np.float32)
for y in range(dst_height):
    pts_undist_norm = np.zeros((dst_width, 3, 1))
    for x in range(dst_width):
        pt = np.array([x, y, 1]).reshape(3,1)
        pt2 = k_inv @ h_inv @ pt
        pts_undist_norm[x][0] = pt2[0]/pt2[2]
        pts_undist_norm[x][1] = pt2[1]/pt2[2]
        pts_undist_norm[x][2] = 1
    r_vec = np.zeros((3,1))
    t_vec = np.zeros((3,1))
    pts_dist, _ = cv2.projectPoints(pts_undist_norm, r_vec, t_vec, intrinsic, distortion)
    pts_dist = pts_dist.squeeze()
    for x2 in range(dst_width):
        map_x[y][x2] = pts_dist[x2][0]
        map_y[y][x2] = pts_dist[x2][1]
# using CV_16SC2 introduced substantial image artifacts for me
map_x_final, map_y_final = cv2.convertMaps(map_x, map_y, cv2.CV_32FC1, cv2.CV_32FC1)
This is obviously really slow since it is using a double for loop and iterating through every pixel, so you can do it much faster using numpy. You should be able to do something similar in C++ to eliminate the for loops and do a single matrix multiplication.
import numpy as np
import cv2
dst_width = ...
dst_height = ...
h_inv = np.linalg.inv(h)
k_inv = np.linalg.inv(new_camera_matrix)
m_grid = np.mgrid[0:dst_width, 0:dst_height].reshape(2, dst_height*dst_width)
m_grid = np.insert(m_grid, 2, 1, axis=0)
m_grid_result = k_inv @ h_inv @ m_grid
pts_undist_norm = m_grid_result[:2, :] / m_grid_result[2, :]
pts_undist_norm = np.insert(pts_undist_norm, 2, 1, axis=0)
r_vec = np.zeros((3,1))
t_vec = np.zeros((3,1))
pts_dist, _ = cv2.projectPoints(pts_undist_norm, r_vec, t_vec, intrinsic, distortion)
pts_dist = pts_dist.squeeze().astype(np.float32)
map_x = pts_dist[:, 0].reshape(dst_width, dst_height).swapaxes(0,1)
map_y = pts_dist[:, 1].reshape(dst_width, dst_height).swapaxes(0,1)
# using CV_16SC2 introduced substantial image artifacts for me
map_x_final, map_y_final = cv2.convertMaps(map_x, map_y, cv2.CV_32FC1, cv2.CV_32FC1)
This numpy implementation is roughly 25-75x faster than the first method.
I came across the same problem. I tried to implement AldurDisciple's answer, but instead of calculating the transformation in a loop, I create a mat with mat.at<Vec2f>(x,y)=Vec2f(x,y) and apply perspectiveTransform to this mat. Then I add a 3rd channel of "1" to the result mat and apply projectPoints.
Here is my code
Mat xy(2000, 2500, CV_32FC2);
float *pxy = (float*)xy.data;
for (int y = 0; y < 2000; y++)
for (int x = 0; x < 2500; x++)
{
*pxy++ = x;
*pxy++ = y;
}
// perspective transformation of coordinates of destination image,
// which generates the map from destination image to norm points
Mat pts_undist_norm(2000, 2500, CV_32FC2);
Mat matPerspective =transRot3x3;
perspectiveTransform(xy, pts_undist_norm, matPerspective);
//add 3rd channel of 1
vector<Mat> channels;
split(pts_undist_norm, channels);
Mat channel3(2000, 2500, CV_32FC1, cv::Scalar(float(1.0)));
channels.push_back(channel3);
Mat pts_undist_norm_3D(2000, 2500, CV_32FC3);
merge(channels, pts_undist_norm_3D);
//projectPoints to extend the map from norm points back to the original captured image
pts_undist_norm_3D = pts_undist_norm_3D.reshape(0, 5000000);
Mat pts_dist(5000000, 1, CV_32FC2);
projectPoints(pts_undist_norm_3D, Mat::zeros(3, 1, CV_64F), Mat::zeros(3, 1, CV_64F), intrinsic, distCoeffs, pts_dist);
Mat maps[2];
pts_dist = pts_dist.reshape(0, 2000);
split(pts_dist, maps);
// apply map
remap(originalImage, skewedImage, maps[0], maps[1], INTER_LINEAR);
The transformation matrix used to map to norm points is a bit different from the one used in AldurDisciple's answer. transRot3x3 is composed from tvec and rvec generated by calibrateCamera.
double transData[] = { 0, 0, tvecs[0].at<double>(0), 0, 0,
tvecs[0].at<double>(1), 0, 0, tvecs[0].at<double>(2) };
Mat translate3x3(3, 3, CV_64F, transData);
Mat rotation3x3;
Rodrigues(rvecs[0], rotation3x3);
Mat transRot3x3(3, 3, CV_64F);
rotation3x3.col(0).copyTo(transRot3x3.col(0));
rotation3x3.col(1).copyTo(transRot3x3.col(1));
translate3x3.col(2).copyTo(transRot3x3.col(2));
Added:
I realized that, if the only map needed is the final map, why not just apply projectPoints to a mat with mat.at<Vec3f>(x,y)=Vec3f(x,y,0)?
//generate a 3-channel mat with each entry containing it's own coordinates
Mat xyz(2000, 2500, CV_32FC3);
float *pxyz = (float*)xyz.data;
for (int y = 0; y < 2000; y++)
for (int x = 0; x < 2500; x++)
{
*pxyz++ = x;
*pxyz++ = y;
*pxyz++ = 0;
}
// project coordinates of destination image,
// which generates the map from destination image to source image directly
xyz=xyz.reshape(0, 5000000);
Mat pts_dist(5000000, 1, CV_32FC2);
projectPoints(xyz, rvecs[0], tvecs[0], intrinsic, distCoeffs, pts_dist);
Mat maps[2];
pts_dist = pts_dist.reshape(0, 2000);
split(pts_dist, maps);
//apply map
remap(originalImage, skewedImage, maps[0], maps[1], INTER_LINEAR);

Matching small grayscale images

I want to test whether two images match. Partial matches also interest me.
The problem is that the images suffer from strong noise. Another problem is that the images might be rotated with an unknown angle. The objects shown in the images will roughly always have the same scale!
The images show area scans from a top-shot perspective. "Lines" are mostly walls and other objects are mostly trees and different kinds of plants.
Another problem was that the left image was very blurry and the right one's lines were very thin.
To compensate for this difference I used dilation. The resulting images are the ones I uploaded.
Although it can easily be seen that these images match almost perfectly, I cannot convince my algorithm of this fact.
My first idea was a feature based matching, but the matches are horrible. It only worked for a rotation angle of -90°, 0° and 90°. Although most descriptors are rotation invariant (in past projects they really were), the rotation invariance seems to fail for this example.
My second idea was to split the images into several smaller segments and to use template matching. So I segmented the images and, again, for the human eye they are pretty easy to match. The goal of this step was to segment the different walls and trees/plants.
The upper row are parts of the left, and the lower are parts of the right image. After the segmentation the segments were dilated again.
As already mentioned: Template matching failed, as did contour based template matching and contour matching.
I think the dilation of the images was very important, because it was nearly impossible for the human eye to match the segments without dilation before the segmentation. Another dilation after the segmentation made this even less difficult.
Your first job should be to fix the orientation. I am not sure what the best algorithm for that is, but here is an approach I would use: fix one of the images and start rotating the other. For each rotation, compute a histogram of the color intensity on each of the rows/columns. Compute some distance between the resulting vectors (e.g. use cross product). Choose the rotation that results in the smallest cross product. It may be a good idea to combine this approach with hill climbing.
Once you have the images aligned in approximately the same direction, I believe matching should be easier. As the two images are supposed to be at the same scale, compute something analogous to the geometrical center for both images: compute a weighted sum of the coordinates of all pixels, where a completely white pixel has a weight of 1 and a completely black pixel a weight of 0; the sum should be a vector of size 2 (x and y coordinate). After that, divide those values by the dimensions of the image and call this the "geometrical center of the image". Overlay the two images so that the two centers coincide, and then once more compute the cross product for the difference between the images. I would say this should be their difference.
You can also try following methods to find rotation and similarity.
Use image moments to get the rotation as shown here.
Once you rotate the image, use cross-correlation to evaluate the similarity.
EDIT
I tried this with OpenCV and C++ for the two sample images. I'm posting the code and results below as it seems to work well at least for the given samples.
Here's the function to calculate the orientation vector using image moments:
Mat orientVec(Mat& im)
{
Moments m = moments(im);
double cov[4] = {m.mu20/m.m00, m.mu11/m.m00, m.mu11/m.m00, m.mu02/m.m00};
Mat covMat(2, 2, CV_64F, cov);
Mat evals, evecs;
eigen(covMat, evals, evecs);
return evecs.row(0);
}
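For a 2x2 covariance matrix, the direction of the eigenvector with the largest eigenvalue can also be written in closed form as an angle. The following is a minimal sketch in plain Python, assuming the image is a nested list of intensities; the function name and the toy images are made up for illustration:

```python
import math

def orientation_from_moments(image):
    """Principal-axis angle in radians from intensity-weighted central moments."""
    m00 = sum(sum(row) for row in image)
    cx = sum(x * v for row in image for x, v in enumerate(row)) / m00
    cy = sum(y * v for y, row in enumerate(image) for v in row) / m00
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            mu20 += v * (x - cx) ** 2
            mu02 += v * (y - cy) ** 2
            mu11 += v * (x - cx) * (y - cy)
    # Angle of the major axis, measured in image coordinates (y points down).
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
horizontal = [[0, 0, 0],
              [1, 1, 1],
              [0, 0, 0]]
print(math.degrees(orientation_from_moments(diagonal)))    # about 45 degrees
print(math.degrees(orientation_from_moments(horizontal)))  # 0 degrees
```

The angle is only defined up to 180 degrees, because an axis has no preferred direction. When the angle between two such axes is taken with acos, the sign of the rotation is lost as well, so trying both +angle and -angle is a reasonable way to resolve it.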
Rotate and match sample images:
Mat im1 = imread(INPUT_FOLDER_PATH + string("WojUi.png"), 0);
Mat im2 = imread(INPUT_FOLDER_PATH + string("XbrsV.png"), 0);
// get the orientation vector
Mat v1 = orientVec(im1);
Mat v2 = orientVec(im2);
double angle = acos(v1.dot(v2))*180/CV_PI;
// rotate im2. try rotating with -angle and +angle. here using -angle
Mat rot = getRotationMatrix2D(Point(im2.cols/2, im2.rows/2), -angle, 1.0);
Mat im2Rot;
warpAffine(im2, im2Rot, rot, Size(im2.rows, im2.cols));
// add a border to rotated image
int borderSize = im1.rows > im2.cols ? im1.rows/2 + 1 : im1.cols/2 + 1;
Mat im2RotBorder;
copyMakeBorder(im2Rot, im2RotBorder, borderSize, borderSize, borderSize, borderSize,
BORDER_CONSTANT, Scalar(0, 0, 0));
// normalized cross-correlation
Mat& image = im2RotBorder;
Mat& templ = im1;
Mat nxcor;
matchTemplate(image, templ, nxcor, CV_TM_CCOEFF_NORMED);
// take the max
double max;
Point maxPt;
minMaxLoc(nxcor, NULL, &max, NULL, &maxPt);
// draw the match
Mat rgb;
cvtColor(image, rgb, CV_GRAY2BGR);
rectangle(rgb, maxPt, Point(maxPt.x+templ.cols-1, maxPt.y+templ.rows-1), Scalar(0, 255, 255), 2);
cout << "max: " << max << endl;
With -angle rotation in code, I get max = 0.758. Below is the rotated image in this case with the matching region.
Otherwise max = 0.293
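The score reported by CV_TM_CCOEFF_NORMED is a zero-mean normalized cross-correlation, evaluated at every position of the template. For a single position it can be sketched in plain Python as follows (the function name and the toy patches are made up for illustration):

```python
import math

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized 2D lists."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    mean_a = sum(fa) / len(fa)
    mean_b = sum(fb) / len(fb)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(fa, fb))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in fa) *
                    sum((q - mean_b) ** 2 for q in fb))
    # A constant patch has no structure to correlate with; report 0 for it.
    return num / den if den > 0 else 0.0

patch = [[0, 255], [255, 0]]
brighter = [[10, 137], [137, 10]]  # same pattern, other contrast and offset
inverted = [[255, 0], [0, 255]]
print(zncc(patch, brighter))  # close to 1
print(zncc(patch, inverted))  # close to -1
```

Because the mean is subtracted and the result is normalized, the score ignores differences in brightness and contrast, which is why values such as 0.758 and 0.293 can be compared directly.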