camera extrinsic calibration - computer-vision

I have a fisheye camera, which I have already calibrated. I need to calculate the camera pose w.r.t. a checkerboard using only a single image of said checkerboard, the intrinsic parameters, and the size of the checkerboard's squares. Unfortunately, many calibration libraries first calculate the extrinsic parameters from a set of images and then the intrinsic parameters, which is essentially the "inverse" of the procedure I want. Of course I could just add my checkerboard image to the set of images used for the calibration and run the calibration procedure again, but that is very tedious, and moreover I couldn't use a checkerboard of a different size from the one used for the intrinsic calibration. Can anybody point me in the right direction?
EDIT: After reading Francesco's answer, I realized that I didn't explain what I mean by calibrating the camera. My problem begins with the fact that I don't have the classic intrinsic parameter matrix (so I can't actually use the method Francesco described). In fact I calibrated the fisheye camera with Scaramuzza's procedure (https://sites.google.com/site/scarabotix/ocamcalib-toolbox), which basically finds a polynomial that maps 3D world points into pixel coordinates (or, alternatively, the polynomial that back-projects pixels onto the unit sphere). Now, I think this information is enough to find the camera pose w.r.t. a chessboard, but I'm not sure exactly how to proceed.

The solvePnP procedure calculates the extrinsic pose of the chessboard (CB) in camera coordinates. OpenCV added a fisheye module to its 3D reconstruction library to accommodate the significant distortions of cameras with a large field of view. Of course, if your intrinsic transformation is not a classical intrinsic matrix, you have to modify PnP:
Undo whatever back-projection you did. Now you have a so-called normalized camera, where the effect of the intrinsic matrix has been eliminated:
k * [u, v, 1]^T = [R|T] * [x, y, z, 1]^T
The way to solve this is to write out the expression for k first:
k = R20*x + R21*y + R22*z + Tz
and then substitute it into
k*u = R00*x + R01*y + R02*z + Tx
k*v = R10*x + R11*y + R12*z + Ty
You can rearrange the terms to get Ax = 0, subject to |x| = 1, where the unknown is
x = [R00, R01, R02, Tx, R10, R11, R12, Ty, R20, R21, R22, Tz]^T
and A is composed of the known u, v, x, y, z - the pixel and CB corner coordinates.
You then solve for x as the last column of V, where A = U*L*V^T is the SVD of A, and assemble the rotation and translation matrices from x. Then there are a few 'messy' steps that are actually very typical for this kind of processing:
A. Ensure that you get a real rotation matrix - perform an orthogonal Procrustes step: R2 = U*V^T, where R = U*L*V^T is the SVD of the estimated rotation block;
B. Calculate the scale factor scl = sum(R2(i,j)/R(i,j))/9;
C. Update the translation vector T2 = scl*T and check that Tz > 0; if it is negative, negate both T and R.
Now R2 and T2 give you a good starting point for a non-linear optimization such as Levenberg-Marquardt. It is needed because the previous linear step minimizes only an algebraic error in the parameters, while the non-linear step minimizes a proper metric such as the squared reprojection error in pixels. However, if you don't want to follow all these steps, you can take advantage of OpenCV's fisheye module.
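For concreteness, here is a minimal C++/Eigen sketch of the linear step and of clean-up steps A-C above, assuming the detected corners have already been back-projected to normalized coordinates (u, v) with the calibrated fisheye model; all names are illustrative and the non-linear refinement is omitted.

#include <Eigen/Dense>
#include <vector>

// Linear (DLT-style) pose from normalized image points (u, v) and matching
// chessboard corners (x, y, z), followed by steps A-C described above.
Eigen::Matrix<double, 3, 4> linearPose(const std::vector<Eigen::Vector2d>& uv,
                                       const std::vector<Eigen::Vector3d>& xyz)
{
    const int n = static_cast<int>(uv.size());
    Eigen::MatrixXd A(2 * n, 12);                      // each correspondence gives 2 rows
    for (int i = 0; i < n; ++i) {
        const double u = uv[i].x(), v = uv[i].y();
        const double x = xyz[i].x(), y = xyz[i].y(), z = xyz[i].z();
        // u*(R20*x + R21*y + R22*z + Tz) - (R00*x + R01*y + R02*z + Tx) = 0
        A.row(2 * i)     << -x, -y, -z, -1,  0,  0,  0,  0, u*x, u*y, u*z, u;
        // v*(R20*x + R21*y + R22*z + Tz) - (R10*x + R11*y + R12*z + Ty) = 0
        A.row(2 * i + 1) <<  0,  0,  0,  0, -x, -y, -z, -1, v*x, v*y, v*z, v;
    }
    // The solution of Ax = 0 with |x| = 1 is the right singular vector of A
    // belonging to the smallest singular value.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeFullV);
    Eigen::VectorXd p = svd.matrixV().col(11);
    Eigen::Matrix3d Rhat;
    Rhat << p(0), p(1), p(2),
            p(4), p(5), p(6),
            p(8), p(9), p(10);
    Eigen::Vector3d That(p(3), p(7), p(11));
    // Step A: orthogonal Procrustes projects Rhat onto a true rotation.
    Eigen::JacobiSVD<Eigen::Matrix3d> svdR(Rhat, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d R = svdR.matrixU() * svdR.matrixV().transpose();
    // Step B: least-squares scale s such that Rhat ~ s * R (note ||R||_F^2 = 3).
    const double s = R.cwiseProduct(Rhat).sum() / 3.0;
    // Step C: rescale the translation and make sure the board lies in front (Tz > 0).
    Eigen::Vector3d T = That / s;
    if (T.z() < 0) { T = -T; R = -R; }
    Eigen::Matrix<double, 3, 4> RT;
    RT << R, T;                                        // [R | T], a starting point for LM
    return RT;
}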

I assume that by "calibrated" you mean that you have a pinhole model for your camera.
Then the transformation between your chessboard plane and the image plane is a homography, which you can estimate from the image of the corners using the usual DLT algorithm. You can then express it, up to scale, as the product of the matrix of intrinsic parameters A and [x|y|t], where the x and y columns are the x and y unit vectors of the world's (i.e. chessboard's) coordinate frame, and t is the vector from the camera centre to the origin of that same frame. That is:
H = scale * A * [x|y|t]
Therefore
[x|y|t] = 1/scale * inv(A) * H
The scale is chosen so that x and y have unit length. Once you have x and y, the third axis is just their cross product.
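As a hedged illustration of this recipe (not code from the original answer), a small OpenCV/C++ sketch could look like the following, where A is the pinhole intrinsic matrix and H a chessboard-to-image homography, e.g. from cv::findHomography on the detected corners; names are illustrative.

#include <cmath>
#include <opencv2/core.hpp>

// Recover R and t from H = scale * A * [x|y|t], as described above.
void poseFromHomography(const cv::Matx33d& A, const cv::Matx33d& H,
                        cv::Matx33d& R, cv::Vec3d& t)
{
    cv::Matx33d M = A.inv() * H;                              // = 1/scale * [x | y | t]
    cv::Vec3d c0(M(0,0), M(1,0), M(2,0));                     // scaled x axis of the board
    cv::Vec3d c1(M(0,1), M(1,1), M(2,1));                     // scaled y axis of the board
    cv::Vec3d c2(M(0,2), M(1,2), M(2,2));                     // scaled camera-to-board vector
    double s = 1.0 / std::sqrt(cv::norm(c0) * cv::norm(c1));  // scale that makes x and y unit length
    if (M(2,2) < 0) s = -s;                                   // board must lie in front of the camera
    cv::Vec3d x = s * c0, y = s * c1;
    t = s * c2;
    cv::Vec3d z = x.cross(y);                                 // third axis = cross product
    R = cv::Matx33d(x(0), y(0), z(0),
                    x(1), y(1), z(1),
                    x(2), y(2), z(2));
    // With noisy corners x and y are only approximately orthogonal; an SVD-based
    // re-orthonormalization of R is a common final touch.
}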


Interchange the origin of a 3D plane

I am working on a fiducial marker system (like Aruco) to obtain a 3d pose of markers (3d coordinates (x, y, z) and the roll, pitch, yaw of the marker) with respect to the camera. The overall setup is as shown in the figure.
[Figure: Marker-Camera setup]
Right now, for some reason, I am getting the pose of the camera with respect to the marker (thus considering the marker as the origin). But for my purpose, I want the pose of the marker with respect to the camera. I cannot change the way I am getting the pose, so I must apply an external transformation. Currently, I am using the C++ Eigen library.
From what I have read so far, I have to do a rotation around the yaw (z) axis and then translate the obtained pose by the translation vector (x, y, z). But I am not sure how to represent this in Eigen. I tried to define my pose as an Affine3f but I am not getting correct results.
Can anyone help me? Thanks!
If you are using ArUco, this might answer your questions: https://stackoverflow.com/a/59754199/8371691
However, if you are using some other marker system, the most robust way is to construct the attitude matrix and take its inverse.
It is not clear how you represent your pose, but whether you use Euler angles or a quaternion, it can easily be converted into an attitude matrix, R.
Then, the inverse transformation is simply the inverse of R.
But given the nature of the configuration space that R belongs to (the rotation group SO(3)), the inverse of R is also the transpose of R, which is computationally less expensive.
In Eigen, it's simply R.transpose().
If you are using ArUco with OpenCV, you can simply use the built-in cv::Rodrigues function.
But note that with ArUco, rvec is actually the rotation of the marker with respect to the camera frame.
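Putting the Eigen part together, a minimal sketch, assuming (R, t) is the pose of the camera expressed in the marker frame (what the asker currently gets); names are illustrative:

#include <Eigen/Geometry>

// Invert a rigid pose: if (R, t) is the camera pose in the marker frame, its
// inverse (R^T, -R^T * t) is the marker pose in the camera frame.
Eigen::Isometry3d markerPoseInCamera(const Eigen::Matrix3d& R, const Eigen::Vector3d& t)
{
    Eigen::Isometry3d camera_in_marker = Eigen::Isometry3d::Identity();
    camera_in_marker.linear() = R;            // attitude matrix (from Euler angles or a quaternion)
    camera_in_marker.translation() = t;
    // Because R is in SO(3), the Isometry inverse uses R.transpose() internally
    // instead of a general matrix inversion.
    return camera_in_marker.inverse();
}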

How to estimate camera translation given relative rotation and intrinsic matrix for stereo images?

I have 2 images (left and right) of a scene captured by a single camera.
I know the intrinsic matrices K_L and K_R for both images and the relative rotation R between the two cameras.
How do I compute the precise relative translation t between the two cameras?
You can only do it up to scale, unless you have a separate means to resolve scale, for example by observing an object of known size, or by having a sensor (e.g. LIDAR) give you the distance from a ground plane or from an object visible in both views.
That said, the solution is quite easy. You could do it by calculating and then decomposing the essential matrix, but here is a more intuitive way. Let xl and xr be two matched pixels in the two views in homogeneous image coordinates, and let X be their corresponding 3D world point, expressed in left camera coordinates. Let Kli and Kri be respectively the inverse of the left and right camera matrices Kl and Kr. Denote with R and t the transform from the right to the left camera coordinates. It is then:
X = sl * Kli * xl = t + sr * R * Kri * xr
where sl and sr are scales for the left and right rays back-projecting to point X from left and right camera respectively.
The second equality above represents 3 scalar equations in 5 unknowns: the 3 components of t, sl and sr. Depending on what additional information you have, you can solve it in different ways.
For example, if you know (e.g. from LIDAR measurements) the distance from the cameras to X, you can remove the scale terms from the equations above and solve directly. If there is a segment of known length [X1, X2] that is visible in both images, you can write two equations like above and again solve directly.
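For illustration, a short Eigen sketch of the first option, where the scales sl and sr along the two rays are known (e.g. from LIDAR depths); names are illustrative:

#include <Eigen/Dense>

// Solve t from X = sl * inv(Kl) * xl = t + sr * R * inv(Kr) * xr when sl, sr are known.
Eigen::Vector3d translationFromKnownScales(
    const Eigen::Matrix3d& Kl, const Eigen::Matrix3d& Kr,     // camera matrices
    const Eigen::Matrix3d& R,                                  // right-to-left rotation
    const Eigen::Vector3d& xl, const Eigen::Vector3d& xr,      // homogeneous pixels [u, v, 1]
    double sl, double sr)                                      // known scales along the two rays
{
    const Eigen::Vector3d rayL = Kl.inverse() * xl;            // Kli * xl
    const Eigen::Vector3d rayR = Kr.inverse() * xr;            // Kri * xr
    return sl * rayL - sr * (R * rayR);                        // t = sl*Kli*xl - sr*R*Kri*xr
}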

Calculate baseline distance between 2 cameras (images)

I want to estimate the depth map between the left and right images from "http://perso.lcpc.fr/tarel.jean-philippe/syntim/paires/GrRub.html". I understand that I must first calculate depth from disparity using the formula Z = B * F / d.
The data set unfortunately does not provide Baseline distance B.
Could you suggest how I can calculate this(if possible) or how I could calculate depth map from given data alone?
Thank you for your help.
As I am new to stackoverflow and computer vision, do let me know if I should provide more details.
If you have the extrinsic parameters, i.e. the rotation matrices R and translation vectors t, there are two cases:
a) (most probable) one of your cameras (the main camera) is the centre of your coordinate system: its rotation R1 is the identity matrix and the related t1 equals [0, 0, 0]. In this case you can take the baseline B to be the Euclidean norm of the translation vector t2 of the other camera.
b) if neither of your cameras is the centre of your coordinate system, they should at least have been calibrated with respect to the same reference system. The baseline B is then the Euclidean norm of the difference vector (t1 - t2).
(I was not able to open the left/right camera links, so I could not verify)
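A tiny C++/Eigen sketch of the two cases, assuming t1 and t2 are the camera translation vectors expressed in the same reference frame (names are illustrative):

#include <Eigen/Dense>

// Baseline as the Euclidean norm of the difference of the two translations;
// in case (a), t1 is the zero vector and this reduces to the norm of t2.
double baseline(const Eigen::Vector3d& t1, const Eigen::Vector3d& t2)
{
    return (t1 - t2).norm();
}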

use homography to rotate around x/y axes

The Project
I am working on a texture tracking project for mobile. It exclusively tracks planar surfaces, so I have been using OpenCV's cv::findHomography() to calculate the homography between two frames. That function runs very slowly, however, and is the primary bottleneck in my pipeline. I decided that an algorithm that can take an initial estimate of the homography would run much faster, because my change in homography between frames is very small. Also, my outlier percentage is very small, so robust methods are optional. Unfortunately, to my knowledge OpenCV does not include a homography finder that takes an initial estimate. It does however include solvePnP(), which takes the original 3D world coordinates of the scene, the current 2D image coordinates, a camera matrix, distortion parameters, and most importantly an initial estimate. I am trying to replace findHomography with solvePnP. Since I use only 2D coordinates throughout the pipeline and solvePnP asks for 3D coordinates, I am trying to go from 2D -> 3D -> 3D transform -> 2D transform. Right now that process runs 6x faster than findHomography() if it is given a good initial guess, but it has issues.
The Problem
Something is wrong with how I am converting. My theory was that since a camera matrix is not required to find a homography, it should not be required for this process, since I only want the information contained in a homography in the end. I also assumed that since I throw out all z information at the end, how I initialize z should not matter. My process is as follows:
First I convert all my initial 2D coordinates to 3D by giving them a z position of 1. I can assume that my original coordinates lie flat in the x-y plane. Then:
cv::Mat rot_mat;   // 3x3 rotation matrix (filled by cv::Rodrigues below)
cv::Mat pnp_rot;   // 3x1 rotation vector
cv::Mat pnp_tran;  // 3x1 translation vector
// Note: with useExtrinsicGuess = true, pnp_rot and pnp_tran should already hold
// the previous frame's estimate before calling solvePnP.
cv::Matx33f camera_matrix(1, 0, 0,
                          0, 1, 0,
                          0, 0, 1);
cv::Matx41f dist(0, 0, 0, 0);
cv::solvePnP(original_cord, current_cord, camera_matrix, dist, pnp_rot, pnp_tran, true);
// Rodrigues converts from a rotation vector to a rotation matrix
cv::Rodrigues(pnp_rot, rot_mat);
// solvePnP and Rodrigues produce CV_64F matrices, so elements are read with at<double>()
cv::Matx33f homography(rot_mat.at<double>(0,0), rot_mat.at<double>(0,1), pnp_tran.at<double>(0),
                       rot_mat.at<double>(1,0), rot_mat.at<double>(1,1), pnp_tran.at<double>(1),
                       rot_mat.at<double>(2,0), rot_mat.at<double>(2,1), pnp_tran.at<double>(2) + 1);
The conversion to a homography here is simple. The first two columns of the homography come from the 3x3 rotation matrix, and the last column is the translation vector. The one trick is that homography(2,2) corresponds to scale, while pnp_tran(2) corresponds to movement along the z axis. Given that I initialize my z coordinates to 1, the scale is z_translation + 1. This process works perfectly for 4 of the 6 degrees of freedom: translation_x, translation_y, scale, and rotation about z all work. Rotation about x and y, however, shows significant error. I believe this is due to initializing my points at z = 1, but I don't know how to fix it.
The Question
Was my assumption correct that I can get good results from solvePnP by using a faked camera matrix and an initial z coordinate? If so, how should I set up my camera matrix and z coordinates to make x and y rotation work? Also, if anyone knows of a homography-finding algorithm that takes an initial guess and works only in 2D, or of techniques for writing my own, that would be very helpful. I will most likely move in that direction once I get this working.
Update
I built myself a test program which takes a homography, generates a set of coplanar points from that homography, and then runs the points through solvePnP to recover the specified homography. In the process of doing this I realized that I am fundamentally misunderstanding some part of how homographies are constructed. I have been assuming that a homography is constructed as follows.
hom(0,2) = x translation
hom(1,2) = y translation
hom(2,2) = scale, I can divide the entire matrix by this to normalize
I assumed the first two columns were the first two columns of a 3x3 rotation matrix. This essentially amounts to taking a 3x4 transform and throwing away column(2). I have discovered, however, that this is not true. The test case that showed me the error of my ways was trying to make a homography which rotates points by some small angle around the y axis.
// rotate by .0175 rad about the y axis
rot_mat = ( 1,      0, .0174,
            0,      1,     0,
           -.0174,  0,     1 );
// my conversion method to make this a homography gives
homography = ( 1,      0, 0,
               0,      1, 0,
              -.0174,  0, 1 );
The above homography does not work at all. Take for example a point x,y,1 where x > 58. The result will be x,y,some_negative_number. When I convert from homogeneous coordinates back to cartesian my x and y values will both flip signs.
All that is to say, I now have a much simpler question that I think would let me solve everything. How do I construct a homography that rotates points by some angle around the x and y axis?
Homographies are not simple translation or rotation matrices. Their aim is to map straight lines to straight lines, rather than to map single points to other points, and they take perspective into account to achieve this; they are explained here.
Hence, homography matrices cannot be decomposed so easily, but there are (more involved) ways to do so, shown here. This may help you extract the rotation and translation out of it.
This should help you better understand homographies, but I am unfamiliar with the rest.
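As an aside that is not from the answers above but may help with the "simpler question" at the end: for a pinhole camera with intrinsics K observing a plane with unit normal n at distance d, a camera motion (R, t) induces the homography H = K * (R - t*n^T/d) * K^-1; for a pure rotation this reduces to H = K*R*K^-1. A hedged OpenCV sketch, with all names illustrative:

#include <opencv2/core.hpp>

// Plane-induced homography H = K * (R - t*n^T/d) * K^-1.
cv::Matx33d planeInducedHomography(const cv::Matx33d& K, const cv::Matx33d& R,
                                   const cv::Vec3d& t, const cv::Vec3d& n, double d)
{
    // Outer product t * n^T, written out explicitly.
    cv::Matx33d tn(t(0)*n(0), t(0)*n(1), t(0)*n(2),
                   t(1)*n(0), t(1)*n(1), t(1)*n(2),
                   t(2)*n(0), t(2)*n(1), t(2)*n(2));
    return K * (R - (1.0 / d) * tn) * K.inv();
}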

How to calculate Rotation and Translation matrices from homography?

I have already compared 2 images of the same scene, taken by one camera from different view angles (say left and right), using SURF in Emgu CV (C#). It gave me a 3x3 homography matrix for the 2D transformation. But now I want to place those 2 images in a 3D environment (using DirectX). To do that I need to calculate the relative location and orientation of the 2nd image (right) with respect to the 1st image (left) in 3D. How can I calculate the rotation and translation matrices for the 2nd image?
I also need the z value for the 2nd image.
I read about something called 'homography decomposition'. Is that the way to go?
Is there anybody familiar with homography decomposition, and is there an algorithm which implements it?
Thanks in advance for any help.
A homography only works for planar scenes (i.e. all of your points are coplanar). If that is the case, then the homography is a projective transformation and it can be decomposed into its components.
But if your scene isn't coplanar (which I think is the case from your description) then it's going to take a bit more work. Instead of a homography you need to calculate the fundamental matrix (which Emgu CV will do for you). The fundamental matrix is a combination of the camera intrinsic matrix (K) and the relative rotation (R) and translation (t) between the two views. Recovering the rotation and translation is pretty straightforward if you know K. It looks like Emgu CV has methods for camera calibration. I am not familiar with their particular method, but these generally involve taking several images of a scene with known geometry.
To figure out the camera motion (exact rotation, and translation up to a scale factor) you need to:
1. Calculate the fundamental matrix F, for example using the eight-point algorithm.
2. Calculate the essential matrix E = A^T * F * A, where A is the intrinsic camera matrix.
3. Decompose E, which by definition equals Tx * R, via SVD into E = U*L*V^T.
4. Create the special 3x3 matrix
       [ 0 -1  0 ]
   W = [ 1  0  0 ]
       [ 0  0  1 ]
   that helps to run the decomposition:
   R = U*W^-1*V^T,  Tx = U*L*W*U^T,  where
        [  0  -tz   ty ]
   Tx = [  tz   0  -tx ]
        [ -ty  tx    0 ]
Since E can have an arbitrary sign and W can be replaced by W^-1, there are 4 distinct solutions, and you have to select the one which produces the most points in front of the camera.
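If you use OpenCV (3.0 or later), the whole chain above is also available through built-in helpers; a minimal sketch, assuming matched pixel coordinates pts1/pts2 and a shared intrinsic matrix K (names are illustrative):

#include <vector>
#include <opencv2/calib3d.hpp>

// Relative camera motion from point matches; t is recovered only up to scale.
void relativePose(const std::vector<cv::Point2d>& pts1,
                  const std::vector<cv::Point2d>& pts2,
                  const cv::Mat& K, cv::Mat& R, cv::Mat& t)
{
    // Essential matrix from the matches (RANSAC rejects outliers).
    cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC, 0.999, 1.0);
    // recoverPose performs the SVD decomposition and the cheirality check that
    // selects, out of the four (R, t) candidates, the one with the most points
    // in front of both cameras.
    cv::recoverPose(E, pts1, pts2, K, R, t);
}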
It's been a while since you asked this question. By now, there are some good references on this problem.
One of them is "An Invitation to 3-D Vision" by Ma et al.; chapter 5 of it is freely available here: http://vision.ucla.edu//MASKS/chapters.html
Also, Peter Corke's Machine Vision Toolbox includes the tools to perform this. However, he does not explain much of the math behind the decomposition.