Compute homography given rotation and translation between two cameras - computer-vision

I know that one can compute the homography matrix from at least four point correspondences.
I was wondering whether, and how, I can obtain a homography matrix if I already know the rotation and translation between two cameras, as well as the camera intrinsics.
I found something that looked like this
H = K R K^-1
but this assumes a pure rotation. What would be the case for a pure translation?
And what if I want to warp an image with a homography when the scene points do not all lie on a single plane?
I'm somewhat confused right now and would really really appreciate any explanations!
Thank you in advance!

If there is a nonzero translation, and the image contains more than a single plane (and you are not looking only at very distant objects), then the images are not related by a homography. You can convince yourself that this is the case by noticing that some points visible in one image may be occluded in the other one.
If the image shows a plane among other things, and you estimate a homography using only point correspondences on that plane, then the homography will correctly transform all points on that plane, but will map all other points incorrectly.
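For the planar case, the homography can be written directly from the known pose. The following is a minimal sketch, not taken from the original answer, under these assumed conventions: a point in camera 1 coordinates maps to camera 2 coordinates as X2 = R X1 + t, and the plane satisfies n . X1 = d in camera 1 coordinates (unit normal n, distance d). The function name and argument names are illustrative.

```python
import numpy as np

def plane_induced_homography(K1, K2, R, t, n, d):
    """Homography taking camera-1 pixels to camera-2 pixels for points on a plane.

    Assumed conventions (illustrative, not from the original post):
      X2 = R @ X1 + t    maps camera-1 coordinates to camera-2 coordinates
      n @ X1 = d         is the plane, expressed in camera-1 coordinates
    """
    t = np.asarray(t, dtype=float).reshape(3, 1)
    n = np.asarray(n, dtype=float).reshape(1, 3)
    # On the plane, (n @ X1) / d == 1, so t can be folded into a 3x3 matrix.
    H = K2 @ (R + (t @ n) / d) @ np.linalg.inv(K1)
    # A homography is defined up to scale; normalise for readability.
    return H / H[2, 2]
```

With t = 0 this reduces to K R K^-1, the pure-rotation formula from the question, and the same happens as d grows very large. With R equal to the identity it gives the pure-translation case, which is valid only for points on that plane.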

Related

Check if two points from two calibrated cameras can be 3D Triangulated

Currently I am using the triangulatePoints function from OpenCV to triangulate two 2D points from two calibrated cameras into a 3D Point.
cv::triangulatePoints(calibration[camera1Index].getProjectionMatrix(), calibration[camera2Index].getProjectionMatrix(), points1, points2, points4D);
My code uses the projection matrix of the cameras and the undistorted point for each camera to find the 3D point.
The problem is that sometimes the cameras send points that can't be correlated.
My question is if there is any method to find the zone where a point from the first camera should be projected in the second camera and discard all the points from the second camera outside of that zone.
I've been searching and I guess homography is the solution here but I wasn't able to find an approach for this algorithm or if there's an existing method in OpenCV for this.
Thanks in advance.

How to calculate the 3D coordinate from a picture with given Z-value

I've got a picture of a plane with 4 known points on it. I've got the intrinsic and extrinsic camera parameters and also (using the Rodrigues function) the position of the camera. The plane is defined as my ground level (Z = 0). If I select a point in my image, is there an easy way to calculate the coordinates where this point would be on my plane?
Not much can be labeled as 'easy' when dealing with 3D rendering.
For your question, I would look into ray tracing. I am not going to try to explain it, as most sites will do a better job of explaining it than I can.
If you look at the calib3d module of OpenCV, you will see this equation:
https://docs.opencv.org/master/d9/d0c/group__calib3d.html
Scroll down the page to the perspective transformation equations.
From what you say, you define the plane as ground level (Z = 0). You also know the intrinsic camera parameters (focal length in pixels, image center) and the extrinsic camera parameters (rotation and translation), and you want to pick a pixel in your image and estimate where it lies on the plane, is that right?
You can use the triangulatePoints() function in the calib3d module of OpenCV, but you need at least 2 images.
Your case seems unusual to me, though. If you detect 4 known points, you first have to define the world coordinates of the plane. Usually you take the top-left corner of the plane as the origin (0,0,0), and then you know the world coordinates of those 4 points by manual calculation. When you detect them in your OpenCV program, you get the pixel coordinates of those 4 points. What people usually compute from this is the pose (rotation and translation).
Alternatively, if your case is as you describe, you can write a simple matrix operation based on the perspective transformation equation.
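The matrix operation mentioned above could look roughly like this. It is a sketch assuming the OpenCV pinhole model s [u, v, 1]^T = K (R X + t), an already undistorted pixel, and a hypothetical function name; for Z = 0 the third column of R drops out, which leaves an invertible 3x3 mapping between the plane and the image.

```python
import numpy as np

def pixel_to_ground_plane(u, v, K, R, t):
    """Intersect the viewing ray of pixel (u, v) with the world plane Z = 0.

    Assumes s * [u, v, 1]^T = K @ (R @ X + t) with world point X = (X, Y, 0),
    and that (u, v) has already been undistorted.
    """
    t = np.asarray(t, dtype=float).reshape(3)
    # With Z = 0 only the first two columns of R and t contribute.
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    # Solve H @ [X, Y, 1]^T ~ [u, v, 1]^T up to scale.
    X, Y, w = np.linalg.solve(H, np.array([u, v, 1.0]))
    return X / w, Y / w
```

The result is expressed in the same world frame and units that were used when the extrinsic parameters were estimated.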

OpenCV Structure from Motion Reprojection Issue

I am currently facing an issue with my Structure from Motion program based on OpenCv.
I'll try to describe what it does, and what it is supposed to do.
This program relies on the classic "structure from motion" method.
The basic idea is to take a pair of images, detect their keypoints and compute the descriptors of those keypoints. Then the keypoints are matched, with a number of tests to ensure the result is good. That part works perfectly.
Once this is done, the following computations are performed : fundamental matrix, essential matrix, SVD decomposition of the essential matrix, camera matrix computation and finally, triangulation.
The result for a pair of images is a set of 3D coordinates, giving us points to be drawn in a 3D viewer. This works perfectly, for a pair.
Indeed, here is my problem : for a pair of images, the 3D points coordinates are calculated in the coordinate system of the first image of the image pair, taken as the reference image. When working with more than two images, which is the objective of my program, I have to reproject the 3D points computed in the coordinate system of the very first image, in order to get a consistent result.
My question is: how do I transform 3D point coordinates given in one camera's coordinate system into another camera's coordinate system? With the camera matrices?
My idea was to take the 3D point coordinates, and to multiply them by the inverse of each camera matrix before.
I clarify :
Suppose I am working on the third and fourth image (hence, the third pair of images, because I am working like 1-2 / 2-3 / 3-4 and so on).
I get my 3D point coordinates in the coordinate system of the third image. How do I properly transform them into the coordinate system of the very first image?
I would have done the following :
Get the 3D points coordinates matrix, apply the inverse of the camera matrix for image 2 to 3, and then apply the inverse of the camera matrix for image 1 to 2.
Is that even correct ?
Because those camera matrices are non-square, I can't invert them.
I am surely making a mistake somewhere, and I would be grateful if someone could enlighten me. I am pretty sure this is a relatively easy one, but I am obviously missing something...
Thanks a lot for reading :)
Let us say you have a 3 * 4 extrinsic parameter matrix called P. To match the notations of OpenCV documentation, this is [R|t].
This matrix P describes the projection from world space coordinates to the camera space coordinates. To quote the documentation:
[R|t] translates coordinates of a point (X, Y, Z) to a coordinate system, fixed with respect to the camera.
You are wondering why this matrix is non-square. That is because in the usual context of OpenCV, you are not expecting homogeneous coordinates as output. Therefore, to make it square, just add a fourth row containing (0,0,0,1). Let's call this new square matrix Q.
You have one such matrix for each pair of cameras, that is, you have one Qk matrix for each pair of images {k,k+1} that describes the transformation from the coordinate space of camera k to that of camera k+1. Those matrices are invertible because they describe rigid transformations (isometries) in homogeneous coordinates.
To go from the coordinate space of camera 3 to that of camera 1, just apply to your points the inverse of Q2 and then the inverse of Q1.
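As a rough illustration of the chaining described above, here is a sketch with hypothetical helper names. It assumes each Q_i maps camera i coordinates to camera i+1 coordinates and is built from a rotation R and translation t, so its inverse has the closed form [R^T | -R^T t].

```python
import numpy as np

def to_square(R, t):
    """Build the 4x4 matrix Q by appending the row (0, 0, 0, 1) to [R|t]."""
    Q = np.eye(4)
    Q[:3, :3] = R
    Q[:3, 3] = np.asarray(t, dtype=float).reshape(3)
    return Q

def invert_rigid(Q):
    """Invert a rigid transform without a general matrix inverse."""
    R, t = Q[:3, :3], Q[:3, 3]
    Q_inv = np.eye(4)
    Q_inv[:3, :3] = R.T
    Q_inv[:3, 3] = -R.T @ t
    return Q_inv

def points_to_first_camera(points_k, Qs):
    """Map Nx3 points from camera k's frame to camera 1's frame.

    Qs = [Q1, ..., Q_{k-1}], where Q_i maps camera i to camera i+1.
    """
    pts = np.hstack((points_k, np.ones((len(points_k), 1)))).T  # 4xN
    # Undo the most recent transform first: Q_{k-1}^-1, ..., then Q_1^-1.
    for Q in reversed(Qs):
        pts = invert_rigid(Q) @ pts
    return pts[:3].T
```

For points expressed in camera 3, `points_to_first_camera(points, [Q1, Q2])` applies the inverse of Q2 and then the inverse of Q1. Note that this only gives consistent results if the translations of the different pairs share a common scale, which pairwise essential-matrix decomposition does not guarantee by itself.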

Calculate Camera Position from Homography Decomposition

I have a reference image A with a known position and I want to calculate the relative position of the camera at image B (i.e. tx, ty, tz in meters). The images are taken with the same camera so the camera matrix stays the same. I'm using SIFT to detect and compute the keypoints and descriptors in both images and match them with FLANN. From there I can get the homography matrix which I decompose with cv::decomposeHomography(..). This function is based on this paper: PDF.
The paper states that the translation vector is normalized by d*, which is the plane depth.
In order to get the correct translation I need to know the plane depth. Is there a way to get this without knowing the size of an object found in the image?
The 3D translation obtained from homography decomposition is determined only up to an unknown scale factor. This is a classical limitation of computing 3D geometry from monocular images using only apparent motion in the images. For this reason, 3D reconstructions from monocular images are typically called metric reconstructions (rather than Euclidean reconstructions, where scale is resolved). Resolving the scale factor requires some additional information, such as the depth of a point on the plane or the distance moved by the camera between images.
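A small numerical check can make the ambiguity concrete. This sketch assumes the convention X2 = R X1 + t with the plane n . X1 = d in the first camera frame, and uses made-up intrinsics; it shows that the homography depends only on the ratio t/d, so the same images are consistent with any common scaling of the scene and the camera motion.

```python
import numpy as np

# Made-up intrinsics and a fronto-parallel plane, for illustration only.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
n = np.array([0.0, 0.0, 1.0])  # plane normal in the first camera frame

def homography(t, d):
    # Assumed convention: X2 = R @ X1 + t, plane n @ X1 = d.
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

t = np.array([0.1, 0.0, 0.02])
d = 5.0

# Scaling the plane depth and the translation by the same factor
# leaves the homography unchanged, so the images cannot reveal the scale.
same = np.allclose(homography(t, d), homography(3.0 * t, 3.0 * d))

def metric_translation(t_over_d, plane_depth):
    """Recover the metric translation once the plane depth is known."""
    return plane_depth * np.asarray(t_over_d, dtype=float)
```

Once the plane depth is known from some external measurement, multiplying the normalized translation by it, as in `metric_translation`, gives the translation in real units.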

Homography for Image Rectification

We can do image rectification given stereo images. For a single image we can find two vanishing points and then the vanishing line. Using this vanishing line we can do projective rectification. But what constraints are required for affine rectification?
I basically want to rectify an image to its frontal view such that parallel lines in the world are parallel in the image and also parallel to the x-axis. I hope I am making myself clear.
Also, what is the homography that maps a vanishing point (x,y) in the image to (1,0,0)?
Thanks in advance
A projective transformation can map an imaged vanishing point to an ideal point such as (1,0,0). Mapping the whole vanishing line to the line at infinity in this way rectifies the image up to an affine transformation of what people would normally want to see. This process is "affine rectification". Affine transformations keep lines parallel, but can skew the image and make it look weird, so I'm not sure if that's where you want to stop. Affine transformations can turn circles into ellipses and can change the angle of intersecting lines.
To remove the affine transformation (metric rectification), one must identify the "conic dual to the circular points". This can be done by identifying two pairs of perpendicular lines; identifying an ellipse that should really be a circle; or by using two known length ratios. Once this is done, the only differences between your image and how you would like it to look will be its scale, its x and y positions and the rotation of the whole image.
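The affine rectification step described above can be sketched as follows. This is an illustration with hypothetical function names, assuming two vanishing points of different line directions on the plane are already known in pixel coordinates. The first homography sends the vanishing line to the line at infinity; the second is a plain rotation that turns one rectified direction onto the x-axis, which sends that vanishing point to a multiple of (1,0,0).

```python
import numpy as np

def affine_rectification_homography(v1, v2):
    """H that maps the vanishing line through v1 and v2 to the line at infinity.

    v1, v2: (x, y) pixel coordinates of two vanishing points on the plane.
    H is singular if the vanishing line passes through the image origin
    (third line coefficient equal to zero).
    """
    p1 = np.array([v1[0], v1[1], 1.0])
    p2 = np.array([v2[0], v2[1], 1.0])
    line = np.cross(p1, p2)              # line through both vanishing points
    line = line / np.linalg.norm(line)   # overall scale is arbitrary
    H = np.eye(3)
    H[2] = line                          # last row sends the line to infinity
    return H

def align_direction_with_x_axis(ideal_point):
    """Rotation that maps the ideal point (x, y, 0) to a multiple of (1, 0, 0)."""
    x, y = ideal_point[0], ideal_point[1]
    norm = np.hypot(x, y)
    c, s = x / norm, y / norm
    return np.array([[c, s, 0.0],
                     [-s, c, 0.0],
                     [0.0, 0.0, 1.0]])
```

Composing the two, `align_direction_with_x_axis(H @ p1) @ H` with `p1 = (x, y, 1)`, gives one possible homography that takes the vanishing point (x,y) to the ideal point (1,0,0). The result is still only affinely rectified, so angles and length ratios are not yet correct.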