Camera Calibration with OpenCV: Using the distortion and rotation-translation matrix - c++

I am reading the following documentation: http://docs.opencv.org/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
I have managed to successfully calibrate the camera obtaining the camera matrix and the distortion matrix.
I had two sub-questions:
1) How do I use the distortion matrix as I don't know 'r'?
2) For each view I have the rotation and translation vectors, which transform the object points (given in the model coordinate space) to the camera coordinate space. So there are 6 parameters in total per image (3 rotational, 3 translational). How do I make use of this information to obtain the rotation-translation matrix?
Any help would be appreciated. Thanks!

Answers in order:
1) "r" is the pixel's radius with respect to the distortion center. That is:
r = sqrt((x - x_c)^2 + (y - y_c)^2)
where (x_c, y_c) is the center of the nonlinear distortion (i.e. the point in the image that has zero nonlinear distortion). This is usually (and approximately) identified with the principal point, i.e. the intersection of the camera's focal axis with the image plane. The coordinates of the principal point are in the 3rd column of the matrix of camera intrinsic parameters.
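A minimal sketch of this computation (plain C++, no OpenCV), taking the principal point (c_x, c_y) from the third column of K. Note that OpenCV's own distortion model actually evaluates r in normalized image coordinates (i.e. after multiplying by inv(K)), whereas this follows the pixel-space definition given above:

```cpp
#include <cmath>

// Radial distance r of a pixel (x, y) from the distortion center (c_x, c_y).
// (c_x, c_y) is read off the last column of the 3x3 intrinsics matrix:
//     [ f_x  0   c_x ]
// K = [ 0    f_y c_y ]
//     [ 0    0    1  ]
double radial_distance(double x, double y, double c_x, double c_y) {
    return std::sqrt((x - c_x) * (x - c_x) + (y - c_y) * (y - c_y));
}
```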
2) Use Rodrigues's formula to convert between rotation vectors and rotation matrices.

Related

Mapping between different camera views

I have a calibrated (virtual) camera in Blender that views a roughly planar object. I make an image from a first camera pose P0 and move the camera to a new pose P1. So I have the 4x4 camera matrix for both views, from which I can calculate the transformation between the cameras as given below. I also know the intrinsics matrix K. Using those, I want to map the points from the image for P0 to a new image seen from P1 (of course, I have the ground truth to compare, because I can render in Blender after the camera has moved to P1).

If I only rotate the camera between P0 and P1, I can calculate the homography perfectly. But if there is translation, the calculated homography matrix does not take that into account. The theory says that, after calculating M10, the last row and column should be dropped for a planar scene. However, when I check M10, I see that the translation values are in the rightmost column, which I drop to get the 3x3 homography matrix H10. Then, if there is no rotation, H10 is equal to the identity matrix. What is going wrong here?
Edit: I know that the images are related by a homography because given the two images from P0 and P1, I can find a homography (by feature matching) that perfectly maps the image from P0 to the image from P1, even in presence of a translational camera movement.
The theory became clearer to me after reading two other books: "Multiple View Geometry" by Hartley and Zisserman (Example 13.2) and particularly "An Invitation to 3-D Vision: From Images to Geometric Models" (Section 5.3.1, Planar homography). Below is an outline; please check the above-mentioned sources for a thorough explanation.
Consider two images of points p on a 2D plane P in 3D space. The transformation between the two camera frames can be written as:

X2 = R*X1 + T (1)

where X1 and X2 are the coordinates of the world point p in camera frames 1 and 2, respectively, R is the rotation and T the translation between the two camera frames. Denoting the unit normal vector of the plane P with respect to the first camera frame as N, and the distance from the plane P to the first camera as d, we can use the plane equation to write N.T*X1 = d (.T means transpose), or equivalently

(1/d)*N.T*X1 = 1 (2)

for all X1 on the plane P. Substituting (2) into (1) gives

X2 = R*X1 + T*(1/d)*N.T*X1 = (R + (1/d)*T*N.T)*X1.

Therefore, the planar homography matrix (3x3) can be extracted as H = R + (1/d)*T*N.T, that is, X2 = H*X1. This is a linear transformation from X1 to X2.
The distance d can be computed as the dot product between the plane normal and a point on the plane. Then, the camera intrinsics matrix K should be used to calculate the projective homography G = K * (R + (1/d)*T*N.T) * inv(K). If you are using software such as Blender or Unity, you can set the camera intrinsics yourself and thus obtain K. For Blender, a nice code snippet is given in this excellent answer.
OpenCV has a nice code example in this tutorial; see "Demo 3: Homography from the camera displacement".

Compute Homography Matrix based on intrinsic and extrinsic camera parameters

I want to perform 360° panorama stitching with 6 fisheye cameras.
In order to find the relation among cameras I need to compute the Homography Matrix. The latter is usually computed by finding features in the images and matching them.
However, for my camera setup I already know:
The intrinsic camera matrix K, which I computed through camera calibration.
Extrinsic camera parameters R and t. The camera orientation is fixed and does not change at any point. The cameras are located on a circle of known diameter d, each camera positioned with a shift of 60° with respect to the next along the circle.
Therefore, I think I could manually compute the Homography Matrix, which I am assuming would result in a more accurate approach than performing feature matching.
In the literature I found the following formula to compute the homography Matrix which relates image 2 to image 1:
H_2_1 = (K_2) * (R_2)^-1 * R_1 * K_1
This formula only takes into account a rotation angle among the cameras but not the translation vector that exists in my case.
How could I plug the translation t of each camera in the computation of H?
I have already tried to compute H without considering the translation, but as d > 1 meter, the images are not accurately aligned in the panorama picture.
EDIT:
Based on Francesco's answer below, I have the following questions:
After calibrating the fisheye lenses, I got a matrix K with focal length f=620 for an image of size 1024 x 768. Is that considered to be a big or small focal length?
My cameras are located on a circle with a diameter of 1 meter. The explanation below makes it clear to me that, due to this "big" translation between the cameras, I get noticeable ghosting effects with objects that are relatively close to them. Therefore, if the homography model cannot fully represent the position of the cameras, is it possible to use another model, like the fundamental/essential matrix, for image stitching?
You cannot "plug" the translation in: its presence along with a nontrivial rotation mathematically implies that the relationship between images is not a homography.
However, if the imaged scene is "far enough" from the cameras, i.e. if the translations between the cameras are small compared to the distances of the scene objects from the cameras, and the cameras' focal lengths are short enough, then you may use the homography induced by a pure rotation as an approximation.
Your equation is wrong. The correct formula is obtained as follows:
Take a pixel in camera 1: p_1 = (x, y, 1) in homogeneous coordinates
Back project it into a ray in 3D space: P_1 = inv(K_1) * p_1
Decompose the ray in the coordinates of camera 2: P_2 = R_2_1 * P_1
Project the ray into a pixel in camera 2: p_2 = K_2 * P_2
Put the equations together: p_2 = [K_2 * R_2_1 * inv(K_1)] * p_1
The product H = K_2 * R_2_1 * inv(K_1) is the homography induced by the pure rotation R_2_1. The rotation transforms points into frame 2 from frame 1. It is represented by a 3x3 matrix whose columns are the components of the x, y, z axes of frame 1 decomposed in frame 2. If your setup gives you the rotations of all the cameras with respect to a common frame 0, i.e. as R_i_0, then R_2_1 = R_2_0 * R_1_0.transposed.
Generally speaking, you should use the above homography as an initial estimation, to be refined by matching points and optimizing. This is because (a) the homography model itself is only an approximation (since it ignores the translation), and (b) the rotations given by the mechanical setup (even a calibrated one) are affected by errors. Using matched pixels to optimize the transformation will minimize the errors where it matters, on the image, rather than in an abstract rotation space.

solvePnP: Obtaining the rotation translation matrix

I am trying to map image coordinates to 3D coordinates. Using the solvePnP function (in C++) has given me a 3x1 rotation vector and a 3x1 translation vector. But isn't the [R|t] matrix supposed to be 3x4?
Any help will be greatly appreciated!
From the OpenCV documentation for solvePnP:
"rvec – Output rotation vector (see Rodrigues() ) that, together with tvec , brings points from the model coordinate system to the camera coordinate system."
Following the link to Rodrigues():
src – Input rotation vector (3x1 or 1x3) or rotation matrix (3x3).
dst – Output rotation matrix (3x3) or rotation vector (3x1 or 1x3), respectively.
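So the missing step is a cv::Rodrigues call on rvec, followed by stacking the resulting 3x3 matrix next to tvec. As an illustration of what that conversion does, here is a dependency-free sketch of Rodrigues' formula and the [R|t] assembly (in practice you would simply call cv::Rodrigues and cv::hconcat):

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

// Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix.
//   R = I + sin(a)*K + (1 - cos(a))*K^2
// where K is the skew-symmetric matrix of the unit axis. This is what
// cv::Rodrigues computes.
Mat3 rodrigues(const Vec3& rvec) {
    double a = std::sqrt(rvec[0]*rvec[0] + rvec[1]*rvec[1] + rvec[2]*rvec[2]);
    Mat3 R = {{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    if (a < 1e-12) return R;                        // near-zero rotation
    double kx = rvec[0]/a, ky = rvec[1]/a, kz = rvec[2]/a;
    Mat3 K = {{{0, -kz, ky}, {kz, 0, -kx}, {-ky, kx, 0}}};
    double s = std::sin(a), c = std::cos(a);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            double kk = 0;                          // (K*K)[i][j]
            for (int k = 0; k < 3; ++k) kk += K[i][k] * K[k][j];
            R[i][j] += s * K[i][j] + (1 - c) * kk;
        }
    return R;
}

// Stack R and t side by side to form the 3x4 [R|t] matrix.
std::array<std::array<double, 4>, 3> compose_Rt(const Mat3& R, const Vec3& t) {
    std::array<std::array<double, 4>, 3> Rt{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) Rt[i][j] = R[i][j];
        Rt[i][3] = t[i];
    }
    return Rt;
}
```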

Computer Vision: labelling camera pose

I am trying to create a dataset of images of objects at different poses, where each image is annotated with camera pose (or object pose).
For example, suppose I have a world coordinate system, I place the object of interest at the origin, and I place the camera at a known position (x,y,z), making it face the origin. Given this information, how can I calculate the pose (rotation matrix) of the camera or of the object?
I had one idea, which was to have a reference coordinate, i.e. (0,0,z'), where I can define the rotation of the object, i.e. its tilt, pitch and yaw. Then I can calculate the rotation from (0,0,z') and (x,y,z) to give me a rotation matrix. The problem is, how do I then combine the two rotation matrices?
BTW, I know the world position of the camera as I am rendering these with OpenGL from a CAD model as opposed to physically moving a camera around.
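One standard construction for this situation (a sketch, not from the original thread) is a "look-at" rotation: build the camera axes from the viewing direction and a world up vector, here assuming an OpenCV-style convention where the camera's z-axis points toward the target (for OpenGL, negate the z row). Combining two rotations is then just a matrix product, R_total = R_2 * R_1:

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]};
}
static Vec3 normalize(const Vec3& v) {
    double n = std::sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    return {v[0]/n, v[1]/n, v[2]/n};
}

// World-to-camera rotation for a camera at `eye` looking at `target`.
// Rows of R are the camera axes expressed in world coordinates, so
//   X_cam = R * (X_world - eye).
// Assumes `up` is not parallel to the viewing direction.
Mat3 look_at_rotation(const Vec3& eye, const Vec3& target, const Vec3& up) {
    Vec3 z = normalize({target[0]-eye[0], target[1]-eye[1], target[2]-eye[2]});
    Vec3 x = normalize(cross(up, z));   // camera right axis
    Vec3 y = cross(z, x);               // camera down/up axis (right-handed)
    return {{{x[0], x[1], x[2]}, {y[0], y[1], y[2]}, {z[0], z[1], z[2]}}};
}
```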
A homography maps homogeneous screen coordinates (i, j, 1) to homogeneous world coordinates on a plane.
Homogeneous coordinates are normal coordinates with a 1 appended. So (3,4) in screen coordinates is (3,4,1) in homogeneous screen coordinates.
If you have a set of homogeneous screen coordinates S and their associated homogeneous world locations W on a common plane, the 3x3 homography matrix H satisfies W ~ H * S for each correspondence (up to scale).
So it boils down to finding several features whose world coordinates you know and whose i,j positions you can also identify in screen coordinates, then fitting a "best fit" homography matrix (OpenCV has a function, findHomography, for this).
Whilst knowing the camera's xyz position provides helpful info, it's not enough to fully constrain the equation, and you will have to generate more screen-world pairs anyway. Thus I don't think it's worth your time integrating the camera's position into the mix.
I have done a similar experiment here: http://edinburghhacklab.com/2012/05/optical-localization-to-0-1mm-no-problemo/

In OpenCV, converting 2d image point to 3d world unit vector

I have calibrated my camera with OpenCV (findChessboard etc) so I have:
- Camera Distortion Coefficients & Intrinsics matrix
- Camera pose information (translation & rotation, computed separately via other means) as Euler angles & a 4x4 matrix
- 2D points within the camera frame
How can I convert these 2D points into 3D unit vectors pointing out into the world? I tried using cv::undistortPoints but that doesn't seem to do it (it only returns remapped 2D points), and I'm not exactly sure what matrix math to use to model the camera via the intrinsics I have.
Convert your 2d point into a homogenous point (give it a third coordinate equal to 1) and then multiply by the inverse of your camera intrinsics matrix. For example
cv::Matx31f hom_pt(point_in_image.x, point_in_image.y, 1);
hom_pt = camera_intrinsics_mat.inv()*hom_pt; //back-project to a ray in camera coordinates
cv::Point3f origin(0,0,0);
cv::Point3f direction(hom_pt(0),hom_pt(1),hom_pt(2));
//To get a unit vector, direction just needs to be normalized
direction *= 1/cv::norm(direction);
origin and direction now define the ray in world space corresponding to that image point. Note that here the origin is centered on the camera, you can use your camera pose to transform to a different origin. Distortion coefficients map from your actual camera to the pinhole camera model and should be used at the very beginning to find your actual 2d coordinate. The steps then are
Undistort 2d coordinate with distortion coefficients
Convert to ray (as shown above)
Move that ray to whatever coordinate system you like.
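Step 3 can be sketched as follows, assuming (this is an assumption about your pose format, not part of the original answer) the pose is available as a camera-to-world rotation R_cw and camera center C; if your 4x4 matrix is world-to-camera, use the transpose of its rotation block and C = -R^T * t:

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Ray { Vec3 origin, direction; };

// Express a camera-frame ray in world coordinates: the ray origin becomes
// the camera center C, and the direction is rotated into the world frame
// and re-normalized to a unit vector.
Ray camera_ray_to_world(const Mat3& R_cw, const Vec3& C, const Vec3& dir_cam) {
    Vec3 d{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            d[i] += R_cw[i][j] * dir_cam[j];       // rotate direction
    double n = std::sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
    return {C, {d[0]/n, d[1]/n, d[2]/n}};
}
```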