Compensate rotation from successive images - c++

Let's say I have image1 and image2 obtained from a webcam. Between taking image1 and image2, the webcam undergoes a rotation (yaw, pitch, roll) and a translation.
What I want: remove the rotation from image2 so that only the translation remains (to be precise: my tracked (x, y) points from image2 should be rotated back to the same values as in image1, so that only the translation component remains).
What I have done/tried so far:
- Tracked corresponding features between image1 and image2.
- Calculated the fundamental matrix F with RANSAC to remove outliers.
- Calibrated the camera, so I have the camera matrix CAM_MATRIX (fx, fy and so on).
- Calculated the essential matrix from F with CAM_MATRIX (E = CAM_MATRIX^T * F * CAM_MATRIX).
- Decomposed E with OpenCV's SVD function, which gives a rotation matrix and a translation vector.
- I know that there are 4 combinations and only 1 of them is the right rotation matrix/translation vector pair.
So my thought was: I know that the camera movement from image1 to image2 won't be more than, let's say, about 20° per axis, so I can eliminate at least 2 of the possibilities where the angles are too far off.
For the 2 remaining ones I have to triangulate the points and see which combination is the correct one (I have read that 1 point is enough, but because of possible errors/outliers it should be done with a few more to be sure). I think I could use OpenCV's triangulation function for this? Is my thought right so far? Do I need to calculate the reprojection error?
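A sketch of the kind of check described above, with illustrative names only (not the poster's code): triangulate with each candidate pair and keep the one that places the most points in front of both cameras. Here K is assumed to be the calibrated camera matrix as a CV_64F cv::Mat, R/t one candidate from the decomposition, and pts1/pts2 the matched pixel points.

#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical helper: triangulates the matches with one candidate (R, t) and
// counts how many reconstructed points lie in front of both cameras.
static int countPointsInFront(const cv::Mat& K, const cv::Mat& R, const cv::Mat& t,
                              const std::vector<cv::Point2f>& pts1,
                              const std::vector<cv::Point2f>& pts2)
{
    cv::Mat Rt1 = cv::Mat::eye(3, 4, CV_64F);          // camera 1: [I | 0]
    cv::Mat Rt2(3, 4, CV_64F);                         // camera 2: [R | t]
    R.copyTo(Rt2(cv::Rect(0, 0, 3, 3)));
    t.copyTo(Rt2(cv::Rect(3, 0, 1, 3)));

    cv::Mat P1 = K * Rt1, P2 = K * Rt2;                // projection matrices
    cv::Mat pts4D;
    cv::triangulatePoints(P1, P2, pts1, pts2, pts4D);  // homogeneous 3D points, 4xN
    pts4D.convertTo(pts4D, CV_64F);

    int inFront = 0;
    for (int i = 0; i < pts4D.cols; ++i) {
        cv::Mat X = pts4D.col(i) / pts4D.at<double>(3, i); // de-homogenize
        double z1 = X.at<double>(2);                       // depth in camera 1
        double z2 = cv::Mat(Rt2 * X).at<double>(2);        // depth in camera 2
        if (z1 > 0 && z2 > 0)
            ++inFront;
    }
    return inFront;
}

Calling this for both remaining candidates and keeping the one with the larger count is the majority version of the single-point test mentioned above.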
Let's move on and assume that I finally obtained the right R|t matrix.
How do I continue? I tried multiplying a tracked point from image2 by the rotation matrix, as well as by its transpose, which should reverse the rotation (?). (For testing purposes I just tried both possible R|t combinations; I have not implemented the triangulation in code yet.) But the calculated point is way too far off from what it should be. Do I need the calibration matrix here as well?
So how can I invert the rotation applied to image2? (To be exact: how do I apply the inverse rotation to my std::vector<cv::Point2f>, which contains the tracked (x, y) points from image2?)
Displaying the de-rotated image would also be nice to have. Is this done with the warpPerspective function, like in this post?
(I just don't fully understand what the purpose of A1/A2 and dist in the T matrix is, or how I can adapt that solution to my problem.)
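A sketch of what the de-rotation could look like in pixel coordinates, under the (unverified) assumption that the calibration matrix is indeed needed on both sides, i.e. that the rotation acts on pixels as the conjugated homography H = K * R^T * K^(-1). K, R, pts2 and img2 are assumed to be the camera matrix (CV_64F cv::Mat), the recovered rotation, the tracked points and the second image from above.

cv::Mat H = K * R.t() * K.inv();        // homography that undoes the rotation in pixel space

// De-rotate the tracked 2D points (perspectiveTransform performs the homogeneous divide).
std::vector<cv::Point2f> pts2_derot;
cv::perspectiveTransform(pts2, pts2_derot, H);

// Optionally de-rotate the whole image for display.
cv::Mat img2_derot;
cv::warpPerspective(img2, img2_derot, H, img2.size());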

Related

Given a Fundamental Matrix and Image Points in one image plane, find exactly corresponding points in second Image Plane

As a relative beginner in this topic, I have read the literature but am not sure how to manipulate the equations for my purposes, and I would like advice on how to tackle this.
Preamble:
I have 2 cameras in a stereo rig which have been calibrated, giving me data structures such as each camera's camera matrix, K1 and K2, as well as the fundamental, essential, rotation and translation matrices F, E, R and T respectively. After rectifying I also have the projection matrices P1 and P2 as well as the disparity-to-depth matrix Q.
My aim, however, is to test the triangulation method of OpenCV, and to this end I would like to use synthetic images where the correspondence between the points in image1 and image2 is exact.
My idea was to take an image of a chessboard with one camera and use findChessboardCorners() and cornerSubPix() to get image points in the left camera; let's call them imagePoints1.
To synthetically generate image points that correspond exactly to the points on the left camera's image plane, I intend to use the property
x2' * F * x1 = 0, given the F matrix and x1 (which represents one homogeneous 2D point from imagePoints1),
to generate said set of image points.
This is where I am stuck, since the obvious solution would be the zero vector, which trivially makes the equation work; otherwise I get a parametric solution. How do I then get non-zero points that fulfill the property x2' * F * x1 = 0, given x1 and F?
Thank you.
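One way to obtain such points, sketched below with illustrative names: F * x1 is the epipolar line of x1 in the second image, and any point sampled on that line satisfies x2' * F * x1 = 0 (up to floating-point error), so sweeping a coordinate along the line yields as many points satisfying the constraint as needed.

// Assumed inputs: F is the 3x3 fundamental matrix (CV_64F), x1 a cv::Point2f from imagePoints1.
cv::Mat x1h = (cv::Mat_<double>(3, 1) << x1.x, x1.y, 1.0);  // homogeneous point in image 1
cv::Mat l2 = F * x1h;                                       // epipolar line a*x + b*y + c = 0 in image 2

double a = l2.at<double>(0), b = l2.at<double>(1), c = l2.at<double>(2);

// Pick an arbitrary x coordinate and solve the line equation for y (assumes b != 0).
double x = 100.0;
double y = -(a * x + c) / b;
cv::Point2f x2(static_cast<float>(x), static_cast<float>(y));  // satisfies x2' * F * x1 = 0

Note that this only pins x2 down to the epipolar line; which point on that line corresponds to a real 3D point is not determined by F alone.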

Obtaining world coordinates of an object from its image coordinates

I have been following this documentation to use OpenCV. Using the formula below, I have successfully calculated both the intrinsic and the extrinsic matrices (I made use of the solvePnP() procedure to obtain these matrices). Since the object is lying on the ground, I substituted Z = 0. Then I removed the third column of the extrinsic matrix and multiplied it by the intrinsic matrix to obtain a 3x3 projection matrix. I took its inverse and multiplied it by the image coordinates, i.e. (su, sv, s).
However, all points in world coordinates seem to be off by up to 1 mm, so the coordinates I get are not very accurate. Does anyone know where I might be going wrong?
Thanks
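For reference, a sketch of the back-projection procedure described in the question (illustrative names, not the original code): K is assumed to be the 3x3 intrinsic matrix, rvec/tvec the CV_64F outputs of solvePnP, and (u, v) a pixel lying on the ground plane Z = 0.

cv::Mat R;
cv::Rodrigues(rvec, R);                 // rotation vector -> 3x3 rotation matrix

cv::Mat Rt(3, 3, CV_64F);               // [r1 r2 t]: extrinsics with the third column dropped
R.col(0).copyTo(Rt.col(0));
R.col(1).copyTo(Rt.col(1));
tvec.copyTo(Rt.col(2));

cv::Mat H = K * Rt;                     // 3x3 projection for points with Z = 0
double u = 320.0, v = 240.0;            // example pixel coordinates
cv::Mat uv = (cv::Mat_<double>(3, 1) << u, v, 1.0);
cv::Mat sXY = H.inv() * uv;             // (s*X, s*Y, s)
double X = sXY.at<double>(0) / sXY.at<double>(2);
double Y = sXY.at<double>(1) / sXY.at<double>(2);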
The camera calibration will probably always be somewhat inaccurate, because with more than 2 calibration images you do not get one true solution to the equation system built from the calibration images; you get the solution with the smallest error.
The same goes for cv::solvePnP(): you use one of three methods of optimising over the many possible solutions for the given equation system.
I do not understand how you got the intrinsic and extrinsic matrices from cv::solvePnP(), which is used to calculate the rotation and translation of the object in the camera coordinate system.
What you can do:
Try to get better intrinsic parameters.
Try other methods for solvePnP, such as EPNP, or check the RANSAC version; a rough sketch of both follows.
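For example (assuming objectPoints, imagePoints, K and distCoeffs already exist, and OpenCV 3.x constant names):

cv::Mat rvec, tvec;

// Same problem solved with the EPnP method instead of the default iterative one.
cv::solvePnP(objectPoints, imagePoints, K, distCoeffs, rvec, tvec,
             false, cv::SOLVEPNP_EPNP);

// RANSAC variant, which additionally rejects outlier correspondences.
std::vector<int> inliers;
cv::solvePnPRansac(objectPoints, imagePoints, K, distCoeffs, rvec, tvec,
                   false, 100, 8.0, 0.99, inliers);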

projection matrix from homography

I'm working on stereo-vision with the stereoRectifyUncalibrated() method under OpenCV 3.0.
I calibrate my system with the following steps:
Detect and match SURF feature points between images from 2 cameras
Apply findFundamentalMat() with the matching pairs
Get the rectifying homographies with stereoRectifyUncalibrated().
For each camera, I compute a rotation matrix as follows:
R1 = cameraMatrix[0].inv()*H1*cameraMatrix[0];
To compute 3D points, I need to get the projection matrix, but I don't know how I can estimate the translation vector.
I tried decomposeHomographyMat() and this solution https://stackoverflow.com/a/10781165/3653104 but the rotation matrix is not the same as what I get with R1.
When I check the rectified images with R1/R2 (using initUndistortRectifyMap() followed by remap()), the result seems correct (I checked with epipolar lines).
I am a little lost, given my weak knowledge of vision, so it would be great if somebody could explain this to me. Thank you :)
The code in the link that you provided (https://stackoverflow.com/a/10781165/3653104) computes not the rotation but the 3x4 pose of the camera.
The last column of the pose is your translation vector.
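In code that amounts to something like the following (assuming pose is the 3x4 CV_64F matrix computed as in the linked answer):

cv::Mat R = pose(cv::Rect(0, 0, 3, 3)).clone();  // left 3x3 block: rotation
cv::Mat t = pose.col(3).clone();                 // last column: translation vector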

use homography to rotate around x/y axes

The Project
I am working on a texture tracking project for mobile. It exclusively tracks planar surfaces, so I have been using OpenCV's cv::findHomography() to calculate the homography between two frames. That function runs very, very slowly, however, and is the primary bottleneck in my pipeline. I decided that an algorithm that can take an initial estimate of the homography would run much faster, because my change in homography between frames is very small. Also, my outlier percentage is very small, so robust methods are optional. Unfortunately, to my knowledge OpenCV does not include a homography finder that takes an initial estimate. It does, however, include solvePnP(), which takes the original 3D world coordinates of the scene, the current 2D image coordinates, a camera matrix, distortion parameters and, most importantly, an initial estimate. I am trying to replace findHomography with solvePnP. Since I use only 2D coordinates throughout the pipeline and solvePnP asks for 3D coordinates, I am trying to go 2d -> 3d -> 3d_transform -> 2d_transform. Right now that process runs 6x faster than findHomography() if it is given a good initial guess, but it has issues.
The Problem
Something is wrong with how I am converting. My theory was that, since a camera matrix is not required to find a homography, it should not be required for this process either, because I only want the information contained in a homography in the end. I also assumed that, since I throw out all z information at the end, how I initialize z should not matter. My process is as follows.
First I convert all my initial 2D coordinates to 3D by giving them a z position of 1. I can assume that my original coordinates lie flat in the x-y plane. Then:
cv::Matx33d rot_mat;                 // 3x3 rotation matrix
cv::Vec3d pnp_rot(0, 0, 0);          // 3x1 rotation vector (initial guess, e.g. from the previous frame)
cv::Vec3d pnp_tran(0, 0, 0);         // 3x1 translation vector (initial guess)
cv::Matx33f camera_matrix(1, 0, 0,   // faked identity camera matrix
                          0, 1, 0,
                          0, 0, 1);
cv::Matx41f dist(0, 0, 0, 0);        // zero distortion coefficients

// useExtrinsicGuess = true: pnp_rot/pnp_tran are used as the initial estimate
cv::solvePnP(original_cord, current_cord, camera_matrix, dist,
             pnp_rot, pnp_tran, true);

// Rodrigues converts from a rotation vector to a rotation matrix
cv::Rodrigues(pnp_rot, rot_mat);

// First two columns of R, translation as third column; homography(2,2) absorbs the scale.
cv::Matx33f homography(rot_mat(0,0), rot_mat(0,1), pnp_tran(0),
                       rot_mat(1,0), rot_mat(1,1), pnp_tran(1),
                       rot_mat(2,0), rot_mat(2,1), pnp_tran(2) + 1);
The conversion to a homography here is simple. The first two columns of the homography come from the 3x3 rotation matrix and the last column is the translation vector. The one trick is that homography(2,2) corresponds to scale, while pnp_tran(2) corresponds to movement along the z axis. Given that I initialize my z coordinates to 1, the scale is z_translation + 1. This process works perfectly for 4 of 6 degrees of freedom: translation in x, translation in y, scale, and rotation about z all work. Rotation about x and y, however, shows significant error. I believe this is due to initializing my points at z = 1, but I don't know how to fix it.
The Question
Was my assumption correct that I can get good results from solvePnP by using a faked camera matrix and an initial z coordinate? If so, how should I set up my camera matrix and z coordinates to make x and y rotation work? Also, if anyone knows where I could get a homography-finding algorithm that takes an initial guess and works only in 2D, or information on techniques for writing my own, that would be very helpful. I will most likely be moving in that direction once I get this working.
Update
I built myself a test program which takes a homography, generates a set of coplanar points from that homography, and then runs the points through solvePnP to recover the specified homography. In the process of doing this I realized that I am fundamentally misunderstanding some part of how homographies are constructed. I have been assuming that a homography is constructed as follows.
hom(0,2) = x translation
hom(1,2) = y translation
hom(2,2) = scale, I can divide the entire matrix by this to normalize
The first two columns, I assumed, were the first two columns of a 3x3 rotation matrix. This essentially amounts to taking a 3x4 transform and throwing away column 2. I have discovered, however, that this is not true. The test case that showed me the error of my ways was trying to make a homography which rotates points by some small angle around the y axis.
// rotate by .0175 rad about the y axis (small-angle values)
cv::Matx33f rot_mat( 1,       0, 0.0174f,
                     0,       1, 0,
                    -0.0174f, 0, 1);
// my conversion method to make this a homography gives
cv::Matx33f homography( 1,       0, 0,
                        0,       1, 0,
                       -0.0174f, 0, 1);
The above homography does not work at all. Take, for example, a point (x, y, 1) with x > 58: the result will be (x, y, some_negative_number), and when I convert from homogeneous coordinates back to Cartesian, my x and y values both flip signs.
All that is to say, I now have a much simpler question that I think would let me solve everything: how do I construct a homography that rotates points by some angle around the x and y axes?
Homographies are not simple translation or rotation matrices. The aim is to map straight lines to straight lines rather than to map single points to other points. They take perspective into account to achieve this and are explained here.
Hence, homography matrices cannot be easily decomposed, but there are (complicated) ways to do so shown here. This may help you extract rotations and translations out of it.
This should help you better understand a homography, but the rest I am unfamiliar with.
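On the specific question of rotating image points about the x or y axis: a standard relation from multi-view geometry is that a camera rotating about its optical centre induces the homography H = K * R * K^(-1) in pixel coordinates, which is one reason a bare rotation matrix applied directly to pixel coordinates does not behave as expected. A small sketch, where K, src_pts and img are assumed inputs:

// Assumed inputs: K is the real 3x3 camera matrix (CV_64F), src_pts the points
// to transform, img an image to warp for visual inspection.
double angle = 0.0175;                                  // rotation about the y axis, in radians
cv::Mat R = (cv::Mat_<double>(3, 3) <<
              std::cos(angle), 0, std::sin(angle),
              0,               1, 0,
             -std::sin(angle), 0, std::cos(angle));

cv::Mat H = K * R * K.inv();                            // homography induced by the pure rotation

std::vector<cv::Point2f> dst_pts;
cv::perspectiveTransform(src_pts, dst_pts, H);          // apply to points

cv::Mat warped;
cv::warpPerspective(img, warped, H, img.size());        // or to a whole image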

Affine homography computation

Suppose you have a homography H between two images. The first image is the reference image, where the planar object covers the entire image (and is parallel to the image plane). The second image depicts the planar object from another arbitrary view (the run-time image). Now, given a point p = (x, y) in the reference image, I have a rectangular region of pixels of size SxS (with S <= 20 pixels) around p (call it the patch). I can unwarp this patch using the pixels in the run-time image and the inverse homography H^(-1).
Now, what I want to do is to compute, given H, an affine homography H_affine suitable for the patch around the point p. The naive way that I am using is to compute 4 point correspondences: the four corners of the patch and the corresponding points in the run-time image (computed using the full homography H). Given these four point correspondences (all belonging to a small neighborhood of the point p), one can compute the affine homography by solving a simple linear system (using the gold standard algorithm). The affine homography so computed approximates the full projective homography with reasonable precision (below 0.5 pixel), since we are in a small neighborhood of p (as long as the scale is not too unfavorable, that is, the SxS patch does not correspond to a big image region in the run-time image).
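For reference, the naive version described above looks roughly like this (a sketch only; cv::estimateAffine2D needs OpenCV 3.2 or newer, and p, S and H are as defined above):

// Corners of the SxS patch around p in the reference image.
float h = S * 0.5f;
std::vector<cv::Point2f> refCorners = {
    cv::Point2f(p.x - h, p.y - h), cv::Point2f(p.x + h, p.y - h),
    cv::Point2f(p.x + h, p.y + h), cv::Point2f(p.x - h, p.y + h) };

// Map the patch corners into the run-time image with the full homography H.
std::vector<cv::Point2f> runCorners;
cv::perspectiveTransform(refCorners, runCorners, H);

// Least-squares affine fit (2x3 matrix) to the four correspondences.
cv::Mat A = cv::estimateAffine2D(refCorners, runCorners);
// As a 3x3 affine homography, H_affine is A with the row (0, 0, 1) appended.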
Is there a faster way to compute H_affine given H (related to the point p and the patch SxS)?
You say that you already know H, but then it sounds like you're trying to compute it all over again, this time calling the result H_affine. The correct H would be a projective transformation, and it can be uniquely decomposed into 3 parts representing the projective part, the affine part and the similarity part. If you already know H and only want the affine part and below, then decompose H and ignore its projective component. If you don't know H, then the 4-point correspondence approach is the way to go.