How to obtain the scale and rotation angle from LogPolar transform - c++

I'm trying to use the LogPolar transform to obtain the scale and the rotation angle between two images. Below are two 300x300 sample images. The first rectangle is 100x100, and the second rectangle is 150x150, rotated by 45 degrees.
The algorithm:
1. Convert both images to LogPolar.
2. Find the translational shift using phase correlation.
3. Convert the translational shift to scale and rotation angle (how to do this?).
My code:
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/imgproc/imgproc_c.h>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>
int main()
{
    cv::Mat a = cv::imread("rect1.png", 0);
    cv::Mat b = cv::imread("rect2.png", 0);

    if (a.empty() || b.empty())
        return -1;

    cv::imshow("a", a);
    cv::imshow("b", b);

    cv::Mat pa = cv::Mat::zeros(a.size(), CV_8UC1);
    cv::Mat pb = cv::Mat::zeros(b.size(), CV_8UC1);
    IplImage ipl_a = a, ipl_pa = pa;
    IplImage ipl_b = b, ipl_pb = pb;

    cvLogPolar(&ipl_a, &ipl_pa, cvPoint2D32f(a.cols >> 1, a.rows >> 1), 40);
    cvLogPolar(&ipl_b, &ipl_pb, cvPoint2D32f(b.cols >> 1, b.rows >> 1), 40);

    cv::imshow("logpolar a", pa);
    cv::imshow("logpolar b", pb);

    cv::Mat pa_64f, pb_64f;
    pa.convertTo(pa_64f, CV_64F);
    pb.convertTo(pb_64f, CV_64F);

    cv::Point2d pt = cv::phaseCorrelate(pa_64f, pb_64f);
    std::cout << "Shift = " << pt
              << "Rotation = " << cv::format("%.2f", pt.y*180/(a.cols >> 1))
              << std::endl;

    cv::waitKey(0);
    return 0;
}
The log polar images:
For the sample images above, the translational shift is (16.2986, 36.9105). I have successfully obtained the rotation angle, which is 44.29 degrees. But I have difficulty calculating the scale. How do I convert the given translational shift to obtain the scale?

You have two images f1, f2 with f1(m, n) = f2(m/a, n/a), i.e. f1 is scaled by the factor a.
In logarithmic coordinates this is equivalent to f1(log m, log n) = f2(log m − log a, log n − log a), where log a is the shift in your phase-correlated image.
Compare B. S. Reddy, B. N. Chatterji: "An FFT-Based Technique for Translation, Rotation and Scale-Invariant Image Registration", IEEE Transactions on Image Processing, Vol. 5, No. 8, 1996.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.4387&rep=rep1&type=pdf
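To make the shift-to-scale step explicit (a short derivation; M denotes the log-polar gain parameter, 40 in the code above):

\rho = M \ln r \;\Rightarrow\; \rho' = M \ln(ar) = \rho + M \ln a \;\Rightarrow\; \Delta\rho = M \ln a \;\Rightarrow\; a = e^{\Delta\rho / M}

So a shift of 16.2986 along the log-radius axis with M = 40 gives a = exp(16.2986 / 40) ≈ 1.50.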

Here is a Python version of the same idea, which computes:

    # f0 and f1 are assumed to be the 2-D FFTs of the two log-polar images, and r0 = abs(f0) * abs(f1)
    ir = abs(ifft2((f0 * f1.conjugate()) / r0))
    i0, i1 = numpy.unravel_index(numpy.argmax(ir), ir.shape)
    angle = 180.0 * i0 / ir.shape[0]
    scale = log_base ** i1

The scale factor is indeed obtained as an exponential, but of the shift along the log-radius axis, which is pt.x here. Since you used a "magnitude scale parameter" of 40 for the cvLogPolar function, you need to divide pt.x by 40 to get the correct value for the displacement:
Scale = exp(pt.x / 40) = exp(16.2986 / 40) = 1.503
The value of the "magnitude scale parameter" for the cvLogPolar function does not affect the displacement produced by the rotation angle (pt.y), because, according to the math, it cancels out. For that reason, your formula for the rotation gives the correct value.
On another note, I believe the formula for the rotation should actually be:
Rotation = pt.y*360/(a.cols)
The ">> 1" that you added divides a.cols by 2, which multiplies the result by 2 (you compensated for that by multiplying by 180 instead of 360). Remove it and you'll see what I mean.
Note that ">> 1" also performs a division by 2 in:
cvPoint2D32f(a.cols >> 1, a.rows >> 1)
which sets the center parameter of the cvLogPolar function to the center of the image (which is what you want). Writing it as
cvPoint2D32f(a.cols/2, a.rows/2)
and
cvPoint2D32f(b.cols/2, b.rows/2)
is equivalent and a bit clearer; either way you'll get the correct value for the rotation (i.e. the same value that you got), and for the scale.
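As a small sketch of the corrected output for the code in the question (it reuses pt and a, assumes #include <cmath>, and would replace the existing std::cout statement):

    const double M = 40.0;                    // the "magnitude scale parameter" passed to cvLogPolar
    double scale    = std::exp(pt.x / M);     // shift along the log-radius axis -> scale factor
    double rotation = pt.y * 360.0 / a.cols;  // shift along the angle axis -> degrees (same as pt.y*180/(a.cols >> 1))
    std::cout << "Scale = " << scale << ", Rotation = " << rotation << std::endl;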

This thread was helpful in getting me started on rotation-invariant phase correlation, so I hope my input will help resolve any lingering issues.
We aim to calculate the scale and rotation (which is incorrectly calculated in the code). Let's start by gathering the equations from the logPolar docs. There they state the following:
(1) I = (dx,dy) = (x-center.x, y-center.y)
(2) rho = M * ln(magnitude(I))
(3) phi = Ky * angle(I)_0..360
Note: rho is pt.x and phi is pt.y in the code above
We also know that
(4) M = src.cols/ln(maxRadius)
(5) Ky = src.rows/360
First, let's solve for scale. Solving for magnitude(I) (i.e. scale) in equation 2, we get
(6) magnitude(I) = scale = exp(rho/M)
Then we substitute for M and simplify to get
(7) magnitude(I) = scale = exp(rho*ln(maxRadius)/src.cols) = pow(maxRadius, rho/src.cols)
Now let's solve for rotation. Solving for angle(I) (i.e. rotation) in equation 3, we get
(8) angle(I) = rotation = phi/Ky
Then we substitute for Ky and simplify to get
(9) angle(I) = rotation = phi*360/src.rows
So, scale and rotation can be calculated using equations 7 and 9, respectively. It might be worth noting that you should use equation 4 for calculating M, and Point2f center( (float)a.cols/2, (float)a.rows/2 ) for calculating the center, as opposed to what is in the code above. There are good bits of info in this logpolar example opencv code.
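A small sketch of equations (7) and (9) in code (a hypothetical helper, not an OpenCV function; pt is the shift returned by cv::phaseCorrelate on the two log-polar images, srcSize their size, and maxRadius the value used in equation (4)):

    void scaleRotationFromLogPolarShift(cv::Point2d pt, cv::Size srcSize, double maxRadius,
                                        double& scale, double& rotationDeg)
    {
        double rho = pt.x;                                        // shift along the log-radius axis
        double phi = pt.y;                                        // shift along the angle axis
        scale       = std::pow(maxRadius, rho / srcSize.width);   // equation (7)
        rotationDeg = phi * 360.0 / srcSize.height;               // equation (9)
    }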

From the values returned by phase correlation, the coordinates are rectangular, so (16.2986, 36.9105) is (x, y). The scale is calculated as
scale = log((x^2 + y^2)^0.5), which is approximately 1.6 (close to 1.5).
When we calculate the angle using the formula theta = arctan(y/x), we get about 66.
That theta value is way off the real value (45 in this case).

Related

What is the correct way to Normalize corresponding points before estimation of Fundamental matrix in OpenCV C++?

I am trying to manually implement a fundamental matrix estimation function for corresponding points (based on similarities between two images). The corresponding points are obtained after performing ORB feature detection, extraction, matching and ratio test.
There is a lot of literature available from good sources about this topic. However, none of them appears to give good pseudo-code for the process. I went through various chapters of the Multiple View Geometry book, and also many online sources.
This source appears to give a formula for doing the normalization, and I followed the formula mentioned on page 13 of this source.
Based on this formula, I created the following algorithm. I am not sure if I am doing it the right way though!
Normalization.hpp
#include <vector>
#include <tuple>
#include <opencv2/core/core.hpp>

class Normalization {
    typedef std::vector<cv::Point2f> intercepts;
    typedef std::vector<cv::Mat> matVec;
public:
    Normalization() {}
    ~Normalization() {}

    void makeAverage(intercepts pointsVec);
    std::tuple<cv::Mat, cv::Mat> normalize(intercepts pointsVec);
    matVec getNormalizedPoints(intercepts pointsVec);

private:
    double xAvg = 0;
    double yAvg = 0;
    double count = 0;
    matVec normalizedPts;
    double distance = 0;
    matVec matVecData;
    cv::Mat forwardTransform;
    cv::Mat reverseTransform;
};
Normalization.cpp
#include "Normalization.hpp"
#include <cmath>

typedef std::vector<cv::Point2f> intercepts;
typedef std::vector<cv::Mat> matVec;

/*******
 * #brief  : The makeAverage function receives the input 2D coordinates (x, y)
 *           and creates the average of x and y
 * #params : The input parameter is a set of all matches (x, y pairs) in image A
 ************/
void Normalization::makeAverage(intercepts pointsVec) {
    count = pointsVec.size();
    for (auto& member : pointsVec) {
        xAvg = xAvg + member.x;
        yAvg = yAvg + member.y;
    }
    xAvg = xAvg / count;
    yAvg = yAvg / count;
}

/*******
 * #brief  : The normalize function accesses the average distance calculated
 *           in the previous step and calculates the forward and inverse transformation
 *           matrices
 * #params : The input to this function is a vector of corresponding points in the given image
 * #return : The returned data is a tuple of forward and inverse transformation matrices
 *************/
std::tuple<cv::Mat, cv::Mat> Normalization::normalize(intercepts pointsVec) {
    for (auto& member : pointsVec) {
        // Accumulate the distance for every point
        distance += ((1 / (count * std::sqrt(2))) *
                     (std::sqrt(std::pow((member.x - xAvg), 2)
                              + std::pow((member.y - yAvg), 2))));
    }

    forwardTransform = (cv::Mat_<double>(3, 3) << (1 / distance), 0, -(xAvg / distance),
                        0, (1 / distance), -(yAvg / distance),
                        0, 0, 1);

    reverseTransform = (cv::Mat_<double>(3, 3) << distance, 0, xAvg,
                        0, distance, yAvg,
                        0, 0, 1);

    return std::make_tuple(forwardTransform, reverseTransform);
}
/*******
 * #brief  : The getNormalizedPoints function transforms the raw image coordinates into
 *           transformed coordinates using the forwardTransform matrix estimated in the previous step
 * #params : The input to this function is a vector of corresponding points in the given image
 * #return : The returned data is a vector of transformed coordinates
 *************/
matVec Normalization::getNormalizedPoints(intercepts pointsVec) {
    cv::Mat triplet;
    for (auto& member : pointsVec) {
        triplet = (cv::Mat_<double>(3, 1) << member.x, member.y, 1);
        matVecData.emplace_back(forwardTransform * triplet);
    }
    return matVecData;
}
Is this the right way ? Are there other ways of Normalization ?
I think you can do it your way but in "Multiple View Geometry in Computer Vision" Hartley and Zisserman recommend isotropic scaling (p. 107):
Isotropic scaling. As a first step of normalization, the coordinates in each image are translated (by a different translation for each image) so as to bring the centroid of the set of all points to the origin. The coordinates are also scaled so that on the average a point x is of the form x = (x, y, w)T, with each of x, y and w having the same average magnitude. Rather than choose different scale factors for each coordinate direction, an isotropic scaling factor is chosen so that the x and y-coordinates of a point are scaled equally. To this end, we choose to scale the coordinates so that the average distance of a point x from the origin is equal to √2. This means that the "average" point is equal to (1, 1, 1)T. In summary the transformation is as follows:
(i) The points are translated so that their centroid is at the origin.
(ii) The points are then scaled so that the average distance from the origin is equal to √2.
(iii) This transformation is applied to each of the two images independently.
They state that it is important for the direct linear transformation (DLT) but even more important for the calculation of a Fundamental Matrix like you want to do.
The algorithm you chose normalizes the point coordinates, but it does not apply a scaling so that the average distance from the origin is equal to √2.
Here is some code for this type of normalization. The averaging step stayed the same:
#include <cmath>
#include <numeric>
#include <vector>
#include <opencv2/core/core.hpp>

std::vector<cv::Mat> normalize(std::vector<cv::Point2d> pointsVec) {
    // Averaging
    double count = (double) pointsVec.size();
    double xAvg = 0;
    double yAvg = 0;
    for (auto& member : pointsVec) {
        xAvg = xAvg + member.x;
        yAvg = yAvg + member.y;
    }
    xAvg = xAvg / count;
    yAvg = yAvg / count;

    // Normalization
    std::vector<cv::Mat> points3d;
    std::vector<double> distances;
    for (auto& member : pointsVec) {
        double distance = (std::sqrt(std::pow((member.x - xAvg), 2) + std::pow((member.y - yAvg), 2)));
        distances.push_back(distance);
    }
    double xy_norm = std::accumulate(distances.begin(), distances.end(), 0.0) / distances.size();

    // Create a matrix transforming the points into having mean (0,0) and mean distance to the center equal to sqrt(2)
    cv::Mat_<double> Normalization_matrix(3, 3);
    double diagonal_element = sqrt(2) / xy_norm;
    double element_13 = -sqrt(2) * xAvg / xy_norm;
    double element_23 = -sqrt(2) * yAvg / xy_norm;
    Normalization_matrix << diagonal_element, 0, element_13,
                            0, diagonal_element, element_23,
                            0, 0, 1;

    // Multiply the original points with the normalization matrix
    for (auto& member : pointsVec) {
        cv::Mat triplet = (cv::Mat_<double>(3, 1) << member.x, member.y, 1);
        points3d.emplace_back(Normalization_matrix * triplet);
    }
    return points3d;
}
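A minimal usage sketch (hypothetical point data; it assumes the normalize function above plus the usual OpenCV and <iostream> headers):

    std::vector<cv::Point2d> pts = { {10.0, 20.0}, {200.5, 44.2}, {315.0, 87.3}, {150.0, 301.7} };
    std::vector<cv::Mat> normalizedPts = normalize(pts);
    for (const cv::Mat& p : normalizedPts)
        std::cout << p << std::endl;   // each p is a 3x1 point in normalized homogeneous coordinates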

Irrlicht: draw 2D image in 3D space based on four corner coordinates

I would like to create a function to position a free-floating 2D raster image in space with the Irrlicht engine. The inspiration for this is the function rgl::show2d in the R package rgl. An example implementation in R can be found here.
The input data should be limited to the path to the image and a table with the four corner coordinates of the respective plot rectangle.
My first, pretty primitive and ultimately unsuccessful approach to realizing this with Irrlicht:
Create a cube:
ISceneNode * picturenode = scenemgr->addCubeSceneNode();
Flatten one side:
picturenode->setScale(vector3df(1, 0.001, 1));
Add image as texture:
picturenode->setMaterialTexture(0, driver->getTexture("path/to/image.png"));
Place flattened cube at the center position of the four corner coordinates. I just calculate the mean coordinates on all three axes with a small function position_calc().
vector3df position = position_calc(rcdf); picturenode->setPosition(position);
Determine the object rotation by calculating the normal of the plane defined by the four corner coordinates, normalizing the result and trying to somehow translate the resulting vector to rotation angles.
vector3df normal = normal_calc(rcdf);
vector3df angles = (normal.normalize()).getSphericalCoordinateAngles();
picturenode->setRotation(angles);
This solution doesn't produce the expected result. The rotation calculation is wrong. With this approach I'm also not able to scale the image correctly to its corner coordinates.
How can I fix my workflow? Or is there a much better way to achieve this with Irrlicht that I'm not aware of?
Edit: Thanks to @spug I believe I'm almost there. I tried to implement his method 2, because quaternions are already available in Irrlicht. Here's what I came up with to calculate the rotation:
#include <Rcpp.h>
#include <irrlicht.h>
#include <math.h>

using namespace Rcpp;
using namespace irr;

core::vector3df rotation_calc(DataFrame rcdf) {
    NumericVector x = rcdf["x"];
    NumericVector y = rcdf["y"];
    NumericVector z = rcdf["z"];

    // Z-axis
    core::vector3df zaxis(0, 0, 1);
    // resulting image's normal
    core::vector3df normal = normal_calc(rcdf);

    // calculate the rotation from the original image's normal (i.e. the Z-axis)
    // to the resulting image's normal => quaternion P.
    core::quaternion p;
    p.rotationFromTo(zaxis, normal);

    // take the midpoint of AB from the diagram in method 1, and rotate it with
    // the quaternion P => vector U.
    core::vector3df MAB(0, 0.5, 0);
    core::quaternion m(MAB.X, MAB.Y, MAB.Z, 0);
    core::quaternion rot = p * m * p.makeInverse();
    core::vector3df u(rot.X, rot.Y, rot.Z);

    // calculate the rotation from U to the midpoint of DE => quaternion Q
    core::vector3df MDE(
        (x(0) + x(1)) / 2,
        (y(0) + y(1)) / 2,
        (z(0) + z(1)) / 2
    );
    core::quaternion q;
    q.rotationFromTo(u, MDE);

    // multiply in the order Q * P, and convert to Euler angles
    core::quaternion f = q * p;
    core::vector3df euler;
    f.toEuler(euler);

    // to degrees
    core::vector3df degrees(
        euler.X * (180.0 / M_PI),
        euler.Y * (180.0 / M_PI),
        euler.Z * (180.0 / M_PI)
    );
    Rcout << "degrees: " << degrees.X << ", " << degrees.Y << ", " << degrees.Z << std::endl;
    return degrees;
}
The result is almost correct, but the rotation on one axis is wrong. Is there a way to fix this or is my implementation inherently flawed?
That's what the result looks like now. The points mark the expected corner points.
I've thought of two ways to do this; neither are very graceful - not helped by Irrlicht restricting us to spherical polars.
NB. the below assumes rcdf is centered at the origin; this is to make the rotation calculation a bit more straightforward. Easy to fix though:
Compute the center point (the translational offset) of rcdf
Subtract this from all the points of rcdf
Perform the procedures below
Add the offset back to the result points.
Pre-requisite: scaling
This is easy; simply calculate the ratios of width and height of your rcdf rectangle to your original image, then call setScale.
Method 1: matrix inversion
For this we need an external library which supports 3x3 matrices, since Irrlicht only has 4x4 (I believe).
We need to solve the matrix equation which rotates the image from X-Y to rcdf. For this we need 3 points in each frame of reference. Two of these we can immediately set to adjacent corners of the image; the third must point out of the plane of the image (since we need data in all three dimensions to form a complete basis) - so to calculate it, simply multiply the normal of each image by some offset constant (say 1).
(Note the points on the original image have been scaled)
The equation to solve is therefore R · [x1 x2 x3] = [p1 p2 p3] (using column notation, with the three image-frame points as the columns of the first matrix and the corresponding rcdf-frame points as the columns of the second), so R = [p1 p2 p3] · [x1 x2 x3]^-1.
The Eigen library offers an implementation for 3x3 matrices and inverse.
Then convert this matrix to spherical polar angles: https://www.learnopencv.com/rotation-matrix-to-euler-angles/
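A rough Eigen sketch of that solve step (hypothetical names; the columns of X are the three image-frame points and the columns of P are the corresponding rcdf-frame points, both centered at the origin):

    #include <Eigen/Dense>

    // Solve R * X = P for the matrix R that maps the image frame onto the rcdf frame.
    Eigen::Matrix3d solveRotation(const Eigen::Matrix3d& X, const Eigen::Matrix3d& P)
    {
        return P * X.inverse();
    }

The resulting matrix can then be converted to Euler angles as described in the learnopencv link above.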
Method 2:
To calculate the quaternion to rotate from direction vector A to B: Finding quaternion representing the rotation from one vector to another
Calculate the rotation from the original image's normal (i.e. the Z-axis) to rcdf's normal => quaternion P.
Take the midpoint of AB from the diagram in method 1, and rotate it with the quaternion P (http://www.geeks3d.com/20141201/how-to-rotate-a-vertex-by-a-quaternion-in-glsl/) => vector U.
Calculate the rotation from U to the midpoint of DE => quaternion Q
Multiply in the order Q * P, and convert to Euler angles: https://en.wikipedia.org/wiki/Conversion_between_quaternions_and_Euler_angles
(Not sure if Irrlicht has support for quaternions)

How to fit a plane to a 3D point cloud?

I want to fit a plane to a 3D point cloud. I use a RANSAC approach, where I sample several points from the point cloud, calculate the plane, and store the plane with the smallest error. The error is the distance between the points and the plane. I want to do this in C++, using Eigen.
So far, I sample points from the point cloud and center the data. Now I need to fit the plane to the sampled points. I know I need to solve Mx = 0, but how do I do this? So far I have M (my samples); I want to find x (the plane), and this fit needs to be as close to 0 as possible.
I have no idea where to continue from here. All I have are my sampled points and I need more data.
From your question I assume that you are familiar with the RANSAC algorithm, so I will spare you the lengthy talk.
In a first step, you sample three random points. You can use the Random class for that, but not picking them truly at random usually gives better results. To those points, you can simply fit a plane using Hyperplane::Through.
In the second step, you repetitively cross out some points with large Hyperplane::absDistance and perform a least-squares fit on the remaining ones. It may look like this:
Vector3f mu = mean(points);
Matrix3f covar = covariance(points, mu);
// the plane normal is the direction of smallest variance, i.e. the singular vector
// corresponding to the smallest singular value of the covariance matrix
JacobiSVD<Matrix3f> svd(covar, ComputeFullU);
Vector3f normal = svd.matrixU().col(2);
Hyperplane<float, 3> result(normal, mu);
Unfortunately, the functions mean and covariance are not built-in, but they are rather straightforward to code.
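The two helpers might look like this (a sketch only, assuming the points are stored in a std::vector<Eigen::Vector3f>; the names mean and covariance match the snippet above):

    #include <vector>
    #include <Eigen/Dense>

    // Sketch: centroid of the sampled points.
    Eigen::Vector3f mean(const std::vector<Eigen::Vector3f>& points)
    {
        Eigen::Vector3f sum = Eigen::Vector3f::Zero();
        for (const Eigen::Vector3f& p : points)
            sum += p;
        return sum / static_cast<float>(points.size());
    }

    // Sketch: 3x3 covariance matrix of the points about their mean mu.
    Eigen::Matrix3f covariance(const std::vector<Eigen::Vector3f>& points, const Eigen::Vector3f& mu)
    {
        Eigen::Matrix3f cov = Eigen::Matrix3f::Zero();
        for (const Eigen::Vector3f& p : points) {
            Eigen::Vector3f d = p - mu;
            cov += d * d.transpose();
        }
        return cov / static_cast<float>(points.size());
    }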
Recall that the equation for a plane passing through origin is Ax + By + Cz = 0, where (x, y, z) can be any point on the plane and (A, B, C) is the normal vector perpendicular to this plane.
The equation for a general plane (that may or may not pass through origin) is Ax + By + Cz + D = 0, where the additional coefficient D represents how far the plane is away from the origin, along the direction of the normal vector of the plane. [Note that in this equation (A, B, C) forms a unit normal vector.]
Now, we can apply a trick here and fit the plane using only provided point coordinates. Divide both sides by D and rearrange this term to the right-hand side. This leads to A/D x + B/D y + C/D z = -1. [Note that in this equation (A/D, B/D, C/D) forms a normal vector with length 1/D.]
We can set up a system of linear equations accordingly, and then solve it by an Eigen solver as follows.
// Example for 5 points
Eigen::Matrix<double, 5, 3> matA;   // row: 5 points; column: xyz coordinates
Eigen::Matrix<double, 5, 1> matB = -1 * Eigen::Matrix<double, 5, 1>::Ones();
// (fill each row of matA with the x, y, z coordinates of one point before solving)

// Find the plane normal
Eigen::Vector3d normal = matA.colPivHouseholderQr().solve(matB);

// Check if the fitting is healthy
double D = 1 / normal.norm();
normal.normalize();                 // normal is a unit vector from now on
bool planeValid = true;
for (int i = 0; i < 5; ++i) {       // compare Ax + By + Cz + D with 0.2 (ideally Ax + By + Cz + D = 0)
    if (fabs(normal(0) * matA(i, 0) + normal(1) * matA(i, 1) + normal(2) * matA(i, 2) + D) > 0.2) {
        planeValid = false;         // 0.2 is an experimental threshold; can be tuned
        break;
    }
}
This method is equivalent to the typical SVD-based method, but much faster. It is suitable when the points are known to be roughly in a plane shape. However, the SVD-based method is more numerically stable (when the plane is very far away from the origin) and more robust to outliers.

Calculating scale, rotation and translation from Homography matrix

I am trying to calculate scale, rotation and translation between two consecutive frames of a video. So basically I matched keypoints and then used opencv function findHomography() to calculate the homography matrix.
homography = findHomography(feature1 , feature2 , CV_RANSAC); //feature1 and feature2 are matched keypoints
My question is: How can I use this matrix to calculate scale, rotation and translation?.
Can anyone provide me the code or explanation as to how to do it?
if you can use opencv 3.0, this decomposition method is available
http://docs.opencv.org/3.0-beta/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#decomposehomographymat
The right answer is to use homography as it is defined dst = H ⋅ src and explore what it does to small segments around a particular point.
Translation
Given a single point, for translation do
T = dst - (H ⋅ src)
Rotation
Given two points p1 and p2,
p1' = H ⋅ p1
p2' = H ⋅ p2
Now just calculate the angle between the vectors p1p2 and p1'p2'.
Scale
You can use the same trick, but now just compare the lengths: |p1p2| and |p1'p2'|.
To be fair, use another segment orthogonal to the first and average the results. You will see that there is no constant scale factor or translation; they depend on the src location.
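A brief OpenCV sketch of that segment trick (illustrative only; p1 and p2 are two nearby cv::Point2f chosen around the location of interest, and H is the 3x3 homography matrix):

    std::vector<cv::Point2f> seg = { p1, p2 };
    std::vector<cv::Point2f> segT;
    cv::perspectiveTransform(seg, segT, H);            // applies H including the perspective divide

    cv::Point2f v  = seg[1]  - seg[0];                 // original segment p1 -> p2
    cv::Point2f vT = segT[1] - segT[0];                // transformed segment p1' -> p2'
    double rotation = std::atan2(vT.y, vT.x) - std::atan2(v.y, v.x);                    // local rotation, radians
    double scale    = std::sqrt(vT.x*vT.x + vT.y*vT.y) / std::sqrt(v.x*v.x + v.y*v.y);  // local scale factor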
Given the homography matrix H:

        | H_00  H_01  H_02 |
    H = | H_10  H_11  H_12 |
        | H_20  H_21  H_22 |
Assumptions:
H_20 = H_21 = 0 and normalized to H_22 = 1 to obtain 8 DOF.
The translation along x and y axes are directly calculated from H:
tx = H_02
ty = H_12
The 2x2 sub matrix on the top left corner is decomposed to calculate shear, scaling and rotation. An easy and quick decomposition method is explained here.
Note: this method assumes an invertible matrix.
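For illustration, one common QR-style factorisation of that 2x2 block (a sketch under the same assumptions, with H stored as a CV_64F cv::Mat; the method linked above may use a slightly different convention):

    double a = H.at<double>(0, 0), b = H.at<double>(0, 1);
    double c = H.at<double>(1, 0), d = H.at<double>(1, 1);

    double tx = H.at<double>(0, 2);              // translation along x
    double ty = H.at<double>(1, 2);              // translation along y

    double rotation = std::atan2(c, a);          // radians
    double scaleX   = std::sqrt(a*a + c*c);
    double shear    = (a*b + c*d) / scaleX;
    double scaleY   = (a*d - b*c) / scaleX;      // factorises the block as R(rotation) * [scaleX, shear; 0, scaleY]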
Since I had to struggle for a couple of days to create my homography transformation function, I'm going to put it here for the benefit of everyone.
Here you can see the main loop where every input position is multiplied by the homography matrix h. Then the result is used to copy the pixel from the original position to the destination position.
for (tempIn[0] = 0; tempIn[0] < stride; tempIn[0]++)
{
    for (tempIn[1] = 0; tempIn[1] < rows; tempIn[1]++)
    {
        double w = h[6] * tempIn[0] + h[7] * tempIn[1] + 1; // very important!
        //H_20 = H_21 = 0 and normalized to H_22 = 1 to obtain 8 DOF. <-- this is wrong
        tempOut[0] = ((h[0] * tempIn[0]) + (h[1] * tempIn[1]) + h[2]) / w;
        tempOut[1] = ((h[3] * tempIn[0]) + (h[4] * tempIn[1]) + h[5]) / w;
        if (tempOut[1] < destSize && tempOut[0] < destSize && tempOut[0] >= 0 && tempOut[1] >= 0)
            dest_[destStride * tempOut[1] + tempOut[0]] = src_[stride * tempIn[1] + tempIn[0]];
    }
}
After such a process, an image with some kind of grid will be produced. Some kind of filter is needed to remove the grid. In my code I have used a simple linear filter.
Note: Only the central part of the original image is really required for producing a correct image. Some rows and columns can be safely discarded.
For estimating a three-dimensional transform and rotation induced by a homography, multiple approaches exist. One of them provides closed formulas for decomposing the homography, but they are very complex. Also, the solutions are never unique.
Luckily, OpenCV 3 already implements this decomposition (decomposeHomographyMat). Given an homography and a correctly scaled intrinsics matrix, the function provides a set of four possible rotations and translations.
The question seems to be about 2D parameters. Homography matrix captures perspective distortion. If the application does not create much perspective distortion, one can approximate a real world transformation using affine transformation matrix (that uses only scale, rotation, translation and no shearing/flipping). The following link will give an idea about decomposing an affine transformation into different parameters.
https://math.stackexchange.com/questions/612006/decomposing-an-affine-transformation

findHomography, getPerspectiveTransform, & getAffineTransform

This question is on the OpenCV functions findHomography, getPerspectiveTransform & getAffineTransform
What is the difference between findHomography and getPerspectiveTransform? My understanding from the documentation is that getPerspectiveTransform computes the transform using exactly 4 correspondences (which is the minimum required to compute a homography/perspective transform), whereas findHomography computes the transform even if you provide more than 4 correspondences (presumably using something like a least squares method?).
Is this correct?
(In which case the only reason OpenCV still continues to support getPerspectiveTransform should be legacy? )
My next concern is that I want to know if there is an equivalent to findHomography for computing an affine transformation, i.e. a function which uses a least-squares or an equivalent robust method to compute an affine transformation.
According to the documentation getAffineTransform takes in only 3 correspondences (which is the min required to compute an affine transform).
Best,
Q #1: Right, the findHomography tries to find the best transform between two sets of points. It uses something smarter than least squares, called RANSAC, which has the ability to reject outliers - if at least 50% + 1 of your data points are OK, RANSAC will do its best to find them, and build a reliable transform.
The getPerspectiveTransform has a lot of useful reasons to stay - it is the base for findHomography, and it is useful in many situations where you only have 4 points and you know they are the correct ones. The findHomography is usually used with sets of points detected automatically - you can find many of them, but with low confidence. getPerspectiveTransform is good when you know for sure 4 corners - like manual marking, or automatic detection of a rectangle.
Q #2 There is no equivalent for affine transforms. You can use findHomography, because affine transforms are a subset of homographies.
I concur with everything @vasile has written. I just want to add some observations:
getPerspectiveTransform() and getAffineTransform() are meant to work on 4 or 3 points (respectively) that are known to be correct correspondences. On real-life images taken with a real camera, you can never get correspondences that accurate, neither with automatic nor with manual marking of the corresponding points.
There are always outliers. Just look at the simple case of wanting to fit a curve through points (e.g. take a generative equation with noise, y1 = f(x) = 3.12x + gauss_noise or y2 = g(x) = 0.1x^2 + 3.1x + gauss_noise): it will be much easier to find a good quadratic function to estimate the points in both cases than a good linear one. Quadratic might be overkill, but in most cases will not be (after removing outliers), and if you want to fit a straight line there, you had better be mightily sure that it is the right model, otherwise you are going to get unusable results.
That said, if you are mightily sure that affine transform is the right one, here's a suggestion:
use findHomography, that has RANSAC incorporated in to the functionality, to get rid of the outliers and get an initial estimate of the image transformation
select 3 correct match correspondences (that fit the homography found), or reproject 3 points from the 1st image to the 2nd (using the homography)
use those 3 matches (that are as close to correct as you can get) in getAffineTransform()
wrap all of that in your own findAffine() if you want - and voila! (A sketch of this workflow is shown below.)
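A minimal sketch of that workflow (findAffineSketch is a hypothetical name, not an OpenCV function; it assumes matched point sets pts1/pts2 and uses only findHomography, perspectiveTransform and getAffineTransform):

    cv::Mat findAffineSketch(const std::vector<cv::Point2f>& pts1,
                             const std::vector<cv::Point2f>& pts2)
    {
        // 1. Robust homography: RANSAC rejects the outliers.
        cv::Mat H = cv::findHomography(pts1, pts2, cv::RANSAC);   // CV_RANSAC in older OpenCV versions

        // 2. Reproject three points from the first image with the homography
        //    (here simply the first three; ideally pick well-spread ones).
        std::vector<cv::Point2f> src = { pts1[0], pts1[1], pts1[2] };
        std::vector<cv::Point2f> dst;
        cv::perspectiveTransform(src, dst, H);

        // 3. Fit the affine transform to these near-perfect correspondences.
        return cv::getAffineTransform(src, dst);
    }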
Re Q#2, estimateRigidTransform is the oversampled equivalent of getAffineTransform. I don't know if it was in OCV when this was first posted, but it's available in 2.4.
There is an easy solution for finding the affine transform for an over-determined system of equations.
Note that, in general, an affine transform is found as a solution to the over-determined system of linear equations Ax = B by using the pseudo-inverse or a similar technique, so
x = (A^T A)^-1 A^T B
Moreover, this is handled in the core openCV functionality by a simple call to solve(A, B, X).
Familiarize yourself with the code of Affine transform in opencv/modules/imgproc/src/imgwarp.cpp: it really does just two things:
a. rearranges inputs to create a system Ax=B;
b. then calls solve(A, B, X);
NOTE: ignore the function comments in the openCV code - they are confusing and don’t reflect the actual ordering of the elements in the matrices. If you are solving [u, v]’= Affine * [x, y, 1] the rearrangement is:
        | x1  y1  1   0   0   0 |
        | 0   0   0   x1  y1  1 |
        | x2  y2  1   0   0   0 |
    A = | 0   0   0   x2  y2  1 |
        | x3  y3  1   0   0   0 |
        | 0   0   0   x3  y3  1 |

    X = [Affine11, Affine12, Affine13, Affine21, Affine22, Affine23]'

    B = [u1, v1, u2, v2, u3, v3]'
All you need to do is to add more points. To make Solve(A, B, X) work on over-determined system add DECOMP_SVD parameter. To see the powerpoint slides on the topic, use this link. If you’d like to learn more about the pseudo-inverse in the context of computer vision, the best source is: ComputerVision, see chapter 15 and appendix C.
If you are still unsure how to add more points see my code below:
// extension for n points
cv::Mat getAffineTransformOverdetermined(const Point2f src[], const Point2f dst[], int n)
{
    Mat M(2, 3, CV_64F), X(6, 1, CV_64F, M.data);           // output; X is a 6x1 view of M's data
    double* a = (double*)malloc(12 * n * sizeof(double));
    double* b = (double*)malloc(2 * n * sizeof(double));
    Mat A(2 * n, 6, CV_64F, a), B(2 * n, 1, CV_64F, b);     // input

    for (int i = 0; i < n; i++)
    {
        int j = i * 12;     // 2 equations (in x, y) with 6 members: skip 12 elements
        int k = i * 12 + 6; // second equation: skip extra 6 elements
        a[j] = a[k + 3] = src[i].x;
        a[j + 1] = a[k + 4] = src[i].y;
        a[j + 2] = a[k + 5] = 1;
        a[j + 3] = a[j + 4] = a[j + 5] = 0;
        a[k] = a[k + 1] = a[k + 2] = 0;
        b[i * 2] = dst[i].x;
        b[i * 2 + 1] = dst[i].y;
    }

    solve(A, B, X, DECOMP_SVD);
    free(a);    // memory from malloc must be released with free, not delete
    free(b);
    return M;
}
// call original transform
vector<Point2f> src(3);
vector<Point2f> dst(3);
src[0] = Point2f(0.0, 0.0); src[1] = Point2f(1.0, 0.0); src[2] = Point2f(0.0, 1.0);
dst[0] = Point2f(0.0, 0.0); dst[1] = Point2f(1.0, 0.0); dst[2] = Point2f(0.0, 1.0);
Mat M = getAffineTransform(Mat(src), Mat(dst));
cout << M << endl;

// call new transform
src.resize(4); src[3] = Point2f(22, 2);
dst.resize(4); dst[3] = Point2f(22, 2);
Mat M2 = getAffineTransformOverdetermined(src.data(), dst.data(), src.size());
cout << M2 << endl;
getAffineTransform: an affine transform is a combination of translation, scale, shear, and rotation.
https://www.mathworks.com/discovery/affine-transformation.html
https://www.tutorialspoint.com/computer_graphics/2d_transformation.htm
getPerspectiveTransform: a perspective transform is a projective mapping.