Calculating scale, rotation and translation from Homography matrix

Calculating scale, rotation and translation from Homography matrix - c++

I am trying to calculate scale, rotation and translation between two consecutive frames of a video. So basically I matched keypoints and then used opencv function findHomography() to calculate the homography matrix.
homography = findHomography(feature1 , feature2 , CV_RANSAC); //feature1 and feature2 are matched keypoints
My question is: How can I use this matrix to calculate scale, rotation and translation?.
Can anyone provide me the code or explanation as to how to do it?

if you can use opencv 3.0, this decomposition method is available
http://docs.opencv.org/3.0-beta/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#decomposehomographymat

The right answer is to use homography as it is defined dst = H ⋅ src and explore what it does to small segments around a particular point.
Translation
Given a single point, for translation do
T = dst - (H ⋅ src)
Rotation
Given two points p1 and p2
p1 = H ⋅ p1
p2 = H ⋅ p2
Now just calculate the angle between vectors p1 p2 and p1' p2'.
Scale
You can use the same trick but now just compare the lengths: |p1 p2| and |p1' p2'|.
To be fair, use another segment orthogonal to the first and average the result. You will see that there is no constant scale factor or translation one. They will depend on the src location.

Given Homography matrix H:
|H_00, H_01, H_02|
H = |H_10, H_11, H_12|
|H_20, H_21, H_22|
Assumptions:
H_20 = H_21 = 0 and normalized to H_22 = 1 to obtain 8 DOF.
The translation along x and y axes are directly calculated from H:
tx = H_02
ty = H_12
The 2x2 sub matrix on the top left corner is decomposed to calculate shear, scaling and rotation. An easy and quick decomposition method is explained here.
Note: this method assumes invertible matrix.

Since i had to struggle for a couple of days to create my homography transformation function I'm going to put it here for the benefit of everyone.
Here you can see the main loop where every input position is multiplied by the homography matrix h. Then the result is used to copy the pixel from the original position to the destination position.
for (tempIn[0] = 0; tempIn[0] < stride; tempIn[0]++)
{
for (tempIn[1] = 0; tempIn[1] < rows; tempIn[1]++)
{
double w = h[6] * tempIn[0] + h[7] * tempIn[1] + 1; // very important!
//H_20 = H_21 = 0 and normalized to H_22 = 1 to obtain 8 DOF. <-- this is wrong
tempOut[0] = ((h[0] * tempIn[0]) + (h[1] * tempIn[1]) + h[2])/w;
tempOut[1] =(( h[3] * tempIn[0]) +(h[4] * tempIn[1]) + h[5])/w;
if (tempOut[1] < destSize && tempOut[0] < destSize && tempOut[0] >= 0 && tempOut[1] >= 0)
dest_[destStride * tempOut[1] + tempOut[0]] = src_[stride * tempIn[1] + tempIn[0]];
}
}
After such process an image with some kind of grid will be produced. Some kind of filter is needed to remove the grid. In my code i have used a simple linear filter.
Note: Only the central part of the original image is really required for producing a correct image. Some rows and columns can be safely discarded.

For estimating a tree-dimensional transform and rotation induced by a homography, there exist multiple approaches. One of them provides closed formulas for decomposing the homography, but they are very complex. Also, the solutions are never unique.
Luckily, OpenCV 3 already implements this decomposition (decomposeHomographyMat). Given an homography and a correctly scaled intrinsics matrix, the function provides a set of four possible rotations and translations.

The question seems to be about 2D parameters. Homography matrix captures perspective distortion. If the application does not create much perspective distortion, one can approximate a real world transformation using affine transformation matrix (that uses only scale, rotation, translation and no shearing/flipping). The following link will give an idea about decomposing an affine transformation into different parameters.
https://math.stackexchange.com/questions/612006/decomposing-an-affine-transformation

Related

Irrlicht: draw 2D image in 3D space based on four corner coordinates

I would like to create a function to position a free-floating 2D raster image in space with the Irrlicht engine. The inspiration for this is the function rgl::show2d in the R package rgl. An example implementation in R can be found here.
The input data should be limited to the path to the image and a table with the four corner coordinates of the respective plot rectangle.
My first, pretty primitive and finally unsuccessful approach to realize this with irrlicht:
Create a cube:
ISceneNode * picturenode = scenemgr->addCubeSceneNode();
Flatten one side:
picturenode->setScale(vector3df(1, 0.001, 1));
Add image as texture:
picturenode->setMaterialTexture(0, driver->getTexture("path/to/image.png"));
Place flattened cube at the center position of the four corner coordinates. I just calculate the mean coordinates on all three axes with a small function position_calc().
vector3df position = position_calc(rcdf); picturenode->setPosition(position);
Determine the object rotation by calculating the normal of the plane defined by the four corner coordinates, normalizing the result and trying to somehow translate the resulting vector to rotation angles.
vector3df normal = normal_calc(rcdf);
vector3df angles = (normal.normalize()).getSphericalCoordinateAngles();
picturenode->setRotation(angles);
This solution doesn't produce the expected result. The rotation calculation is wrong. With this approach I'm also not able to scale the image correctly to it's corner coordinates.
How can I fix my workflow? Or is there a much better way to achieve this with Irrlicht that I'm not aware of?
Edit: Thanks to #spug I believe I'm almost there. I tried to implement his method 2, because quaternions are already available in Irrlicht. Here's what I came up with to calculate the rotation:
#include <Rcpp.h>
#include <irrlicht.h>
#include <math.h>
using namespace Rcpp;
core::vector3df rotation_calc(DataFrame rcdf) {
NumericVector x = rcdf["x"];
NumericVector y = rcdf["y"];
NumericVector z = rcdf["z"];
// Z-axis
core::vector3df zaxis(0, 0, 1);
// resulting image's normal
core::vector3df normal = normal_calc(rcdf);
// calculate the rotation from the original image's normal (i.e. the Z-axis)
// to the resulting image's normal => quaternion P.
core::quaternion p;
p.rotationFromTo(zaxis, normal);
// take the midpoint of AB from the diagram in method 1, and rotate it with
// the quaternion P => vector U.
core::vector3df MAB(0, 0.5, 0);
core::quaternion m(MAB.X, MAB.Y, MAB.Z, 0);
core::quaternion rot = p * m * p.makeInverse();
core::vector3df u(rot.X, rot.Y, rot.Z);
// calculate the rotation from U to the midpoint of DE => quaternion Q
core::vector3df MDE(
(x(0) + x(1)) / 2,
(y(0) + y(1)) / 2,
(z(0) + z(1)) / 2
);
core::quaternion q;
q.rotationFromTo(u, MDE);
// multiply in the order Q * P, and convert to Euler angles
core::quaternion f = q * p;
core::vector3df euler;
f.toEuler(euler);
// to degrees
core::vector3df degrees(
euler.X * (180.0 / M_PI),
euler.Y * (180.0 / M_PI),
euler.Z * (180.0 / M_PI)
);
Rcout << "degrees: " << degrees.X << ", " << degrees.Y << ", " << degrees.Z << std::endl;
return degrees;
}
The result is almost correct, but the rotation on one axis is wrong. Is there a way to fix this or is my implementation inherently flawed?
That's what the result looks like now. The points mark the expected corner points.

I've thought of two ways to do this; neither are very graceful - not helped by Irrlicht restricting us to spherical polars.
NB. the below assumes rcdf is centered at the origin; this is to make the rotation calculation a bit more straightforward. Easy to fix though:
Compute the center point (the translational offset) of rcdf
Subtract this from all the points of rcdf
Perform the procedures below
Add the offset back to the result points.
Pre-requisite: scaling
This is easy; simply calculate the ratios of width and height in your rcdf to your original image, then call setScaling.
Method 1: matrix inversion
For this we need an external library which supports 3x3 matrices, since Irrlicht only has 4x4 (I believe).
We need to solve the matrix equation which rotates the image from X-Y to rcdf. For this we need 3 points in each frame of reference. Two of these we can immediately set to adjacent corners of the image; the third must point out of the plane of the image (since we need data in all three dimensions to form a complete basis) - so to calculate it, simply multiply the normal of each image by some offset constant (say 1).
(Note the points on the original image have been scaled)
The equation to solve is therefore:
(Using column notation). The Eigen library offers an implementation for 3x3 matrices and inverse.
Then convert this matrix to spherical polar angles: https://www.learnopencv.com/rotation-matrix-to-euler-angles/
Method 2:
To calculate the quaternion to rotate from direction vector A to B: Finding quaternion representing the rotation from one vector to another
Calculate the rotation from the original image's normal (i.e. the Z-axis) to rcdf's normal => quaternion P.
Take the midpoint of AB from the diagram in method 1, and rotate it with the quaternion P (http://www.geeks3d.com/20141201/how-to-rotate-a-vertex-by-a-quaternion-in-glsl/) => vector U.
Calculate the rotation from U to the midpoint of DE => quaternion Q
Multiply in the order Q * P, and convert to Euler angles: https://en.wikipedia.org/wiki/Conversion_between_quaternions_and_Euler_angles
(Not sure if Irrlicht has support for quaternions)

Pass vector<Point2f> to getAffineTransform

I'm trying to calculate affine transformation between two consecutive frames from a video. So I have found the features and got the matched points in the two frames.
FastFeatureDetector detector;
vector<Keypoints> frame1_features;
vector<Keypoints> frame2_features;
detector.detect(frame1 , frame1_features , Mat());
detector.detect(frame2 , frame2_features , Mat());
vector<Point2f> features1; //matched points in 1st image
vector<Point2f> features2; //matched points in 2nd image
for(int i = 0;i<frame2_features.size() && i<frame1_features.size();++i )
{
double diff;
diff = pow((frame1.at<uchar>(frame1_features[i].pt) - frame2.at<uchar>(frame2_features[i].pt)) , 2);
if(diff<SSD) //SSD is sum of squared differences between two image regions
{
feature1.push_back(frame1_features[i].pt);
feature2.push_back(frame2_features[i].pt);
}
}
Mat affine = getAffineTransform(features1 , features2);
The last line gives the following error :
OpenCV Error: Assertion failed (src.checkVector(2, CV_32F) == 3 && dst.checkVector(2, CV_32F) == 3) in getAffineTransform
Can someone please tell me how to calculate the affine transformation with a set of matched points between the two frames?

Your problem is that you need exactly 3 point correspondences between the images.
If you have more than 3 correspondences, you should optimize the transformation to fit all the correspondences (except of outliers).
Therefore, I recommend to take a look at findHomography()-function (http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#findhomography).
It calculates a perspective transformation between the correspondences and needs at least 4 point correspondences.
Because you have more than 3 correspondences and affine transformations are a subset of perspective transformations, this should be appropriate for you.
Another advantage of the function is that it is able to detect outliers (correspondences that do not fit to the transformation and the other points) and these are not considered for transformation calculation.
To sum up, use findHomography(features1 , features2, CV_RANSAC) instead of getAffineTransform(features1 , features2).
I hope I could help you.

As I read from your code and assertion, there is something wrong with your vectors.
int checkVector(int elemChannels,int depth) //
this function returns N if the matrix is 1-channel (N x ptdim) or ptdim-channel (1 x N) or (N x 1); negative number otherwise.
And according to the documentation; http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html#getaffinetransform: Calculates an affine transform from three pairs of the corresponding points.
You seem to have more or less than three points in one or both of your vectors.

How to calculate extrinsic parameters of one camera relative to the second camera?

I have calibrated 2 cameras with respect to some world coordinate system. I know rotation matrix and translation vector for each of them relative to the world frame. From these matrices how to calculate rotation matrix and translation vector of one camera with respect to the other??
Any help or suggestion please. Thanks!

Here is an easier solution, since you already have the 3x3 rotation matrices R1 and R2, and the 3x1 translation vectors t1 and t2.
These express the motion from the world coordinate frame to each camera, i.e. are the matrices such that, if p is a point expressed in world coordinate frame, then the same point expressed in, say, camera 1 frame is p1 = R1 * p + t1.
The motion from camera 1 to 2 is then simply the composition of (a) the motion FROM camera 1 TO the world frame, and (b) of the motion FROM the world frame TO camera 2. You can easily compute this composition as follows:
Form the 4x4 roto-translation matrices Qw1 = [R1 t1] and Qw2 = [ R2 t2 ], both with the 4th row equal to [0 0 0 1]. These matrices completely express the roto-translation FROM the world coordinate frame TO camera 1 and 2 respectively.
The motion FROM camera 1 TO the world frame is simply Q1w = inv(Qw1). Here inv() is the algebraic inverse matrix, i.e. the one such that inv(X) * X = X * inv(X) = IdentityMatrix, for every nonsingular matrix X.
The roto-translation from camera 1 to 2 is then Q12 = Q1w * Qw2, and viceversa, the one from camera 2 to 1 is Q21 = Q2w * Qw1 = inv(Qw2) * Qw1.
Once you have Q12 you can extract from it the rotation and translation parts, if you so wish, respectively from its upper 3x3 submatrix and right 3x1 sub-column.

First convert your rotation matrix into a rotation vector. Now you have 2 3d vectors for each camera, call them A1,A2,B1,B2. You have all 4 of them with respect to some origin O. The rule you need is
A relative to B = (A relative to O)- (B relative to O)
Apply that rule to your 2 vectors and you will have their pose relative to one another.
Some documentation on converting from rotation matrix to euler angles can be found here as well as many other places. If you are using openCV you can just use Rodrigues. Here is some matlab/octave code I found.

Here is very simple and easy solution. I suppose your 1st camera has R1 and T1, 2nd camera has R2 and T2 rotation matrixes and translation vector according to common reference point.
Translation from 1st to 2nd camera, rotation from 1st to 2nd camera can be calculated by following two line matlab code;
R=R2*R1';
T=T2-R*T1;
but note, that is true if you have just one R and T for each camera. (I mean rotations and translation for one unique world reference). if you have more reference translations and rotations, you should calcuate R,T for every single reference point. Probably they will be very close to each other. But those might be sligtly different. Then you can calculate mean of Translation vector and convert all found rotation matrix to rotation vector, caluculate its mean and then convert them as rotation matrix.

findHomography, getPerspectiveTransform, & getAffineTransform

This question is on the OpenCV functions findHomography, getPerspectiveTransform & getAffineTransform
What is the difference between findHomography and getPerspectiveTransform?. My understanding from the documentation is that getPerspectiveTransform computes the transform using 4 correspondences (which is the minimum required to compute a homography/perspective transform) where as findHomography computes the transform even if you provide more than 4 correspondencies (presumably using something like a least squares method?).
Is this correct?
(In which case the only reason OpenCV still continues to support getPerspectiveTransform should be legacy? )
My next concern is that I want to know if there is an equivalent to findHomography for computing an Affine transformation? i.e. a function which uses a least squares or an equivalent robust method to compute and affine transformation.
According to the documentation getAffineTransform takes in only 3 correspondences (which is the min required to compute an affine transform).
Best,

Q #1: Right, the findHomography tries to find the best transform between two sets of points. It uses something smarter than least squares, called RANSAC, which has the ability to reject outliers - if at least 50% + 1 of your data points are OK, RANSAC will do its best to find them, and build a reliable transform.
The getPerspectiveTransform has a lot of useful reasons to stay - it is the base for findHomography, and it is useful in many situations where you only have 4 points, and you know they are the correct ones. The findHomography is usually used with sets of points detected automatically - you can find many of them, but with low confidence. getPerspectiveTransform is good when you kn ow for sure 4 corners - like manual marking, or automatic detection of a rectangle.
Q #2 There is no equivalent for affine transforms. You can use findHomography, because affine transforms are a subset of homographies.

I concur with everything #vasile has written. I just want to add some observations:
getPerspectiveTransform() and getAffineTransform() are meant to work on 4 or 3 points (respectively), that are known to be correct correspondences. On real-life images taken with a real camera, you can never get correspondences that accurate, not with automatic nor manual marking of the corresponding points.
There are always outliers. Just look at the simple case of wanting to fit a curve through points (e.g. take a generative equation with noise y1 = f(x) = 3.12x + gauss_noise or y2 = g(x) = 0.1x^2 + 3.1x + gauss_noise): it will be much more easier to find a good quadratic function to estimate the points in both cases, than a good linear one. Quadratic might be an overkill, but in most cases will not be (after removing outliers), and if you want to fit a straight line there you better be mightily sure that is the right model, otherwise you are going to get unusable results.
That said, if you are mightily sure that affine transform is the right one, here's a suggestion:
use findHomography, that has RANSAC incorporated in to the functionality, to get rid of the outliers and get an initial estimate of the image transformation
select 3 correct matches-correspondances (that fit with the homography found), or reproject 3 points from the 1st image to the 2nd (using the homography)
use those 3 matches (that are as close to correct as you can get) in getAffineTransform()
wrap all of that in your own findAffine() if you want - and voila!

Re Q#2, estimateRigidTransform is the oversampled equivalent of getAffineTransform. I don't know if it was in OCV when this was first posted, but it's available in 2.4.

There is an easy solution for the finding the Affine transform for the system of over-determined equations.
Note that in general an Affine transform finds a solution to the over-determined system of linear equations Ax=B by using a pseudo-inverse or a similar technique, so
x = (A At )-1 At B
Moreover, this is handled in the core openCV functionality by a simple call to solve(A, B, X).
Familiarize yourself with the code of Affine transform in opencv/modules/imgproc/src/imgwarp.cpp: it really does just two things:
a. rearranges inputs to create a system Ax=B;
b. then calls solve(A, B, X);
NOTE: ignore the function comments in the openCV code - they are confusing and don’t reflect the actual ordering of the elements in the matrices. If you are solving [u, v]’= Affine * [x, y, 1] the rearrangement is:
x1 y1 1 0 0 1
0 0 0 x1 y1 1
x2 y2 1 0 0 1
A = 0 0 0 x2 y2 1
x3 y3 1 0 0 1
0 0 0 x3 y3 1
X = [Affine11, Affine12, Affine13, Affine21, Affine22, Affine23]’
u1 v1
B = u2 v2
u3 v3
All you need to do is to add more points. To make Solve(A, B, X) work on over-determined system add DECOMP_SVD parameter. To see the powerpoint slides on the topic, use this link. If you’d like to learn more about the pseudo-inverse in the context of computer vision, the best source is: ComputerVision, see chapter 15 and appendix C.
If you are still unsure how to add more points see my code below:
// extension for n points;
cv::Mat getAffineTransformOverdetermined( const Point2f src[], const Point2f dst[], int n )
{
Mat M(2, 3, CV_64F), X(6, 1, CV_64F, M.data); // output
double* a = (double*)malloc(12*n*sizeof(double));
double* b = (double*)malloc(2*n*sizeof(double));
Mat A(2*n, 6, CV_64F, a), B(2*n, 1, CV_64F, b); // input
for( int i = 0; i < n; i++ )
{
int j = i*12; // 2 equations (in x, y) with 6 members: skip 12 elements
int k = i*12+6; // second equation: skip extra 6 elements
a[j] = a[k+3] = src[i].x;
a[j+1] = a[k+4] = src[i].y;
a[j+2] = a[k+5] = 1;
a[j+3] = a[j+4] = a[j+5] = 0;
a[k] = a[k+1] = a[k+2] = 0;
b[i*2] = dst[i].x;
b[i*2+1] = dst[i].y;
}
solve( A, B, X, DECOMP_SVD );
delete a;
delete b;
return M;
}
// call original transform
vector<Point2f> src(3);
vector<Point2f> dst(3);
src[0] = Point2f(0.0, 0.0);src[1] = Point2f(1.0, 0.0);src[2] = Point2f(0.0, 1.0);
dst[0] = Point2f(0.0, 0.0);dst[1] = Point2f(1.0, 0.0);dst[2] = Point2f(0.0, 1.0);
Mat M = getAffineTransform(Mat(src), Mat(dst));
cout<<M<<endl;
// call new transform
src.resize(4); src[3] = Point2f(22, 2);
dst.resize(4); dst[3] = Point2f(22, 2);
Mat M2 = getAffineTransformOverdetermined(src.data(), dst.data(), src.size());
cout<<M2<<endl;

getAffineTransform:affine transform is combination of translation, scale, shear, and rotation
https://www.mathworks.com/discovery/affine-transformation.html
https://www.tutorialspoint.com/computer_graphics/2d_transformation.htm
getPerspectiveTransform：perspective transform is project mapping
enter image description here

Get 3D coordinates from 2D image pixel if extrinsic and intrinsic parameters are known

I am doing camera calibration from tsai algo. I got intrensic and extrinsic matrix, but how can I reconstruct the 3D coordinates from that inormation?
1) I can use Gaussian Elimination for find X,Y,Z,W and then points will be X/W , Y/W , Z/W as homogeneous system.
2) I can use the
OpenCV documentation approach:
as I know u, v, R , t , I can compute X,Y,Z.
However both methods end up in different results that are not correct.
What am I'm doing wrong?

If you got extrinsic parameters then you got everything. That means that you can have Homography from the extrinsics (also called CameraPose). Pose is a 3x4 matrix, homography is a 3x3 matrix, H defined as
H = K*[r1, r2, t], //eqn 8.1, Hartley and Zisserman
with K being the camera intrinsic matrix, r1 and r2 being the first two columns of the rotation matrix, R; t is the translation vector.
Then normalize dividing everything by t3.
What happens to column r3, don't we use it? No, because it is redundant as it is the cross-product of the 2 first columns of pose.
Now that you have homography, project the points. Your 2d points are x,y. Add them a z=1, so they are now 3d. Project them as follows:
p = [x y 1];
projection = H * p; //project
projnorm = projection / p(z); //normalize
Hope this helps.

As nicely stated in the comments above, projecting 2D image coordinates into 3D "camera space" inherently requires making up the z coordinates, as this information is totally lost in the image. One solution is to assign a dummy value (z = 1) to each of the 2D image space points before projection as answered by Jav_Rock.
p = [x y 1];
projection = H * p; //project
projnorm = projection / p(z); //normalize
One interesting alternative to this dummy solution is to train a model to predict the depth of each point prior to reprojection into 3D camera-space. I tried this method and had a high degree of success using a Pytorch CNN trained on 3D bounding boxes from the KITTI dataset. Would be happy to provide code but it'd be a bit lengthy for posting here.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js