I have been following this documentation to use OpenCV. In the formula below, I have successfully calculated both the intrinsic and the extrinsic matrices (I used the solvePnP() procedure to obtain these matrices). Since the object is lying on the ground, I substituted Z = 0. Then I removed the third column of the extrinsic matrix and multiplied the result by the intrinsic matrix to obtain a 3x3 projection matrix. I took its inverse and multiplied it by the image coordinates, i.e. su, sv and s.
However, all points in world coordinates seem to be off by 1 mm or less, so the coordinates I get are not very accurate. Does anyone know where I might be going wrong?
Thanks
The camera calibration will probably always be somewhat inaccurate: with more than 2 calibration images, the equation system built from those images has no single exact solution, so you get the solution with the smallest error.
The same goes for cv::solvePnP(). You choose one of three methods for optimising over the many possible solutions of the given equation system.
I do not understand how you got the intrinsic and extrinsic matrices from cv::solvePnP(), which is used to calculate the rotation and translation of the object in the camera coordinate system.
What you can do:
Try to get better intrinsic parameters
Try other methods for solvePnP like EPNP or check the RANSAC version
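For reference, the Z = 0 back-projection described in the question can be sketched in NumPy as below. This is only an illustration under assumptions: `plane_homography` and `pixel_to_plane` are hypothetical helper names, R and t are taken to be the 3x3 rotation matrix and translation vector that the solvePnP output converts to, and the pixel is assumed to be already undistorted. Note that the unknown scale s is not an input here: the inverse is applied to (u, v, 1) and the result is divided by its third component.

```python
import numpy as np

def plane_homography(K, R, t):
    """Return H = K [r1 r2 t], which maps (X, Y, 1) on the Z = 0 plane
    to homogeneous pixel coordinates s * (u, v, 1)."""
    K = np.asarray(K, dtype=float)
    R = np.asarray(R, dtype=float)
    t = np.asarray(t, dtype=float).reshape(3)
    # Dropping the third column of [R | t] is what setting Z = 0 amounts to.
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def pixel_to_plane(H, u, v):
    """Back-project an undistorted pixel (u, v) to (X, Y) on the Z = 0 plane."""
    p = np.linalg.solve(H, np.array([u, v, 1.0]))  # same as inv(H) @ [u, v, 1]
    return p[:2] / p[2]  # divide out the unknown scale
```

Residuals of the size mentioned in the question would then come from the estimated K, R and t (or from lens distortion left in the pixel coordinates), not from the inversion itself.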
I have a fisheye camera, which I have already calibrated. I need to calculate the camera pose w.r.t. a checkerboard using just a single image of said checkerboard, the intrinsic parameters, and the size of the checkerboard squares. Unfortunately, many calibration libraries first calculate the extrinsic parameters from a set of images and then the intrinsic parameters, which is essentially the "inverse" of the procedure I want. Of course I can just add my checkerboard image to the set of images I used for the calibration and run the calibration procedure again, but that is very tedious, and moreover I can't use a checkerboard of a different size from the ones used for the intrinsic calibration. Can anybody point me in the right direction?
EDIT: After reading Francesco's answer, I realized that I didn't explain what I mean by calibrating the camera. My problem begins with the fact that I don't have the classic intrinsic parameter matrix (so I can't actually use the method Francesco described). In fact, I calibrated the fisheye camera with Scaramuzza's procedure (https://sites.google.com/site/scarabotix/ocamcalib-toolbox), which basically finds a polynomial that maps 3D world points to pixel coordinates (or, alternatively, the polynomial that back-projects pixels to the unit sphere). Now, I think this information is enough to find the camera pose w.r.t. a chessboard, but I'm not sure exactly how to proceed.
The solvePnP procedure calculates the extrinsic pose of the chessboard (CB) in camera coordinates. OpenCV added a fisheye library to its 3D reconstruction module to accommodate the significant distortion of cameras with a large field of view. Of course, if your intrinsic matrix or transformation is not a classical intrinsic matrix, you have to modify PnP:
Undo whatever back projection you did
Now you have so-called normalized camera where intrinsic matrix effect was eliminated.
k*[u,v,1]T = R|T * [x, y, z, 1]T
The way to solve this is to write the expression for k first:
k=R20*x+R21*y+R22*z+Tz
then use the above expression in
k*u = R00*x+R01*y+R02*z+Tx
k*v = R10*x+R11*y+R12*z+Ty
You can rearrange the terms to get Ax=0, subject to |x|=1, where the unknown is
x=[R00, R01, R02, Tx, R10, R11, R12, Ty, R20, R21, R22, Tz]T
and A is composed of the known u, v, x, y, z (pixel and CB corner coordinates).
Then you solve for x = last column of V, where A=ULVT, and assemble the rotation and translation matrices from x. Then there are a few 'messy' steps that are actually very typical for this kind of processing:
A. Ensure that you got a real rotation matrix - perform orthogonal Procrustes on your R2 = UVT, where R=ULVT
B. Calculate scale factor scl=sum(R2(i,j)/R(i,j))/9;
C. Update the translation vector T2=scl*T and check for Tz>0; if it is negative, negate both T and R;
Now R2, T2 give you a good starting point for a non-linear optimization algorithm such as Levenberg-Marquardt. This is required because the previous linear step only minimizes an algebraic error of the parameters, while the non-linear one minimizes a meaningful metric such as the squared error in pixel distances. However, if you don't want to follow all these steps, you can take advantage of the fisheye library of OpenCV.
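The linear step and clean-up steps A to C above can be sketched roughly as follows in NumPy. This is a sketch under assumptions, not the answerer's exact code. `linear_pnp` is a hypothetical name. The image points are assumed to be already normalized (intrinsics and distortion removed). The 3D points are assumed to be non-coplanar, because for a flat board with z = 0 the columns of A belonging to R02, R12 and R22 vanish and this 12-unknown system becomes degenerate. Two details are swapped in for numerical robustness: the scale is taken from the singular values of the raw 3x3 block instead of the element-wise ratio, and the sign is fixed via det(R) instead of the Tz test.

```python
import numpy as np

def linear_pnp(pts3d, pts_norm):
    """Linear (DLT) pose estimate from >= 6 non-coplanar 3D points and their
    normalized image coordinates (u, v). Returns (R, T) with X_cam = R X + T."""
    pts3d = np.asarray(pts3d, dtype=float)
    pts_norm = np.asarray(pts_norm, dtype=float)
    n = len(pts3d)
    A = np.zeros((2 * n, 12))
    for i in range(n):
        Xh = np.append(pts3d[i], 1.0)       # [x, y, z, 1]
        u, v = pts_norm[i]
        A[2 * i, 0:4] = Xh                  # R00 R01 R02 Tx
        A[2 * i, 8:12] = -u * Xh            # minus u * (R20 R21 R22 Tz)
        A[2 * i + 1, 4:8] = Xh              # R10 R11 R12 Ty
        A[2 * i + 1, 8:12] = -v * Xh        # minus v * (R20 R21 R22 Tz)
    # Unit-norm solution of A x = 0: right singular vector of the smallest
    # singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    R_raw, T_raw = P[:, :3], P[:, 3]
    # Step A: orthogonal Procrustes, nearest orthogonal matrix to R_raw.
    U, S, Vt_r = np.linalg.svd(R_raw)
    R = U @ Vt_r
    # Step B: scale between the raw block and a true rotation.
    scale = 1.0 / S.mean()
    # Step C (variant): fix the overall sign so that R is a proper rotation.
    if np.linalg.det(R) < 0:
        R, scale = -R, -scale
    return R, scale * T_raw
```

The returned pair is meant only as the starting point for the non-linear refinement mentioned above.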
I assume that by "calibrated" you mean that you have a pinhole model for your camera.
Then the transformation between your chessboard plane and the image plane is a homography, which you can estimate from the image of the corners using the usual DLT algorithm. You can then express it as the product, up to scale, of the matrix of intrinsic parameters A and [x y t], where x and y columns are the x and y unit vectors of the world's (i.e. chessboard's) coordinate frame, and t is the vector from the camera centre to the origin of that same frame. That is:
H = scale * A * [x|y|t]
Therefore
[x|y|t] = 1/scale * inv(A) * H
The scale is chosen so that x and y have unit length. Once you have x and y, the third axis is just their cross product.
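A minimal NumPy sketch of this decomposition, assuming H has already been estimated (for example by DLT) and maps board coordinates (X, Y, 1) to pixels. `pose_from_homography` is a hypothetical name. The sign choice (board in front of the camera, so the z component of t is positive) and the final re-orthonormalisation are additions that the text above does not spell out.

```python
import numpy as np

def pose_from_homography(H, A):
    """Recover R = [x | y | x cross y] and t from H = scale * A * [x | y | t]."""
    M = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(H, dtype=float))
    # Choose the scale so that x and y have (on average) unit length.
    scale = 0.5 * (np.linalg.norm(M[:, 0]) + np.linalg.norm(M[:, 1]))
    M = M / scale
    # H is only defined up to sign; keep the board in front of the camera.
    if M[2, 2] < 0:
        M = -M
    x, y, t = M[:, 0], M[:, 1], M[:, 2]
    z = np.cross(x, y)                      # third axis
    R = np.column_stack((x, y, z))
    # With noisy data x and y are not exactly orthonormal, so snap R to the
    # nearest rotation matrix.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```

With real corner detections the result would normally be refined afterwards by minimising the reprojection error.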
I'm working on stereo-vision with the stereoRectifyUncalibrated() method under OpenCV 3.0.
I calibrate my system with the following steps:
Detect and match SURF feature points between images from 2 cameras
Apply findFundamentalMat() to the matching pairs
Get the rectifying homographies with stereoRectifyUncalibrated().
For each camera, I compute a rotation matrix as follows:
R1 = cameraMatrix[0].inv()*H1*cameraMatrix[0];
To compute 3D points, I need the projection matrix, but I don't know how to estimate the translation vector.
I tried decomposeHomographyMat() and this solution https://stackoverflow.com/a/10781165/3653104 but the rotation matrix is not the same as what I get with R1.
When I check the rectified images with R1/R2 (using initUndistortRectifyMap() followed by remap()), the result seems correct (I checked with epipolar lines).
I am a little lost because of my weak knowledge of computer vision, so I would be grateful if somebody could explain this to me. Thank you :)
The code in the link that you have provided (https://stackoverflow.com/a/10781165/3653104) computes not the rotation but the 3x4 pose of the camera.
The last column of the pose is your translation vector.
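As a small illustration of how such a 3x4 pose splits into rotation and translation and turns into a projection matrix, here is a NumPy sketch. The helper names `split_pose` and `projection_matrix` are hypothetical, and the pose is assumed to be laid out as [R | t] and to map world points into the camera frame.

```python
import numpy as np

def split_pose(pose):
    """Split a 3x4 pose [R | t] into its 3x3 rotation and 3-vector translation."""
    pose = np.asarray(pose, dtype=float)
    return pose[:, :3], pose[:, 3]

def projection_matrix(K, pose):
    """Build P = K [R | t] from the camera matrix K and a 3x4 pose."""
    return np.asarray(K, dtype=float) @ np.asarray(pose, dtype=float)
```

Keep in mind that a translation recovered from image correspondences alone is only known up to scale.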
Let's say I have image1 and image2 obtained from a webcam. Between image1 and image2, the webcam undergoes a rotation (yaw, pitch, roll) and a translation.
What I want: Remove the rotation from image2 so that only the translation remains (to be precise: my tracked points (x,y) from image2 will be rotated to the same orientation as in image1 so that only the translation component remains).
What I have done/tried so far:
I tracked corresponding features from image1 and image2.
Calculated the fundamental matrix F with RANSAC to remove outliers.
Calibrated the camera so that I got a CAM_MATRIX (fx, fy and so on).
Calculated Essential Matrix from F with CAM_Matrix (E = cam_matrix^t * F * cam_matrix)
Decomposed the E matrix with OpenCV's SVD function so that I have a rotation matrix and translation vector.
I know that there are 4 combinations and only 1 is the right translation vector/rotation matrix.
So my thought was: I know that the camera movement from image1 to image2 won't be more than, let's say, about 20° per axis, so I can eliminate at least 2 possibilities where the angles are too far off.
For the 2 remaining ones I have to triangulate the points and see which one is correct (I have read that I only need 1 point, but due to possible errors/outliers it should be done with some more to be sure which one is right). I think I could use OpenCV's triangulation function for this? Is my thinking right so far? Do I need to calculate the projection error?
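The selection among the 4 candidates can be sketched as follows in NumPy. This is an illustration under assumptions: the function names are hypothetical, the points are assumed to be in normalized coordinates (the inverse of the camera matrix already applied), camera 1 is taken as [I | 0] and camera 2 as [R | t], and a simple linear triangulation stands in for OpenCV's triangulation function. The candidate with the most points in front of both cameras is kept, which is why using several points helps against outliers.

```python
import numpy as np

def decompose_essential(E):
    """Return the four (R, t) candidates encoded in an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations; flipping a sign only rescales E by -1.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]                             # unit length, sign unknown
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def triangulate(x1, x2, R, t):
    """Linear triangulation of one correspondence, in camera-1 coordinates."""
    P1 = np.hstack((np.eye(3), np.zeros((3, 1))))
    P2 = np.column_stack((R, t))
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def pick_pose(E, pts1, pts2):
    """Keep the candidate that puts the most points in front of both cameras."""
    best, best_count = None, -1
    for R, t in decompose_essential(E):
        count = 0
        for x1, x2 in zip(pts1, pts2):
            X = triangulate(x1, x2, R, t)
            if X[2] > 0 and (R @ X + t)[2] > 0:
                count += 1
        if count > best_count:
            best, best_count = (R, t), count
    return best
```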
Let's move on and assume that I finally obtained the right R|t matrix.
How do I continue? I tried multiplying a tracked point in image2 by the rotation matrix, both as-is and transposed, which should reverse the rotation (?). (For testing purposes I just tried both possible combinations of R|t; I have not done the triangulation in code yet.) But the calculated point is way too far off from what it should be. Do I need the calibration matrix here as well?
So how can I invert the rotation applied to image2? (to be exact, apply the inverse rotation to my std::vector<cv::Point2f> array which contains the tracked (x,y) points from image2)
Displaying the de-rotated image would also be nice to have. Is this done with the warpPerspective function, like in this post?
(I just don't fully understand what the purpose of A1/A2 and dist in the T matrix is or how I can adopt this solution to solve my problem.)
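One way to express the de-rotation of pixel coordinates is through the homography K * R^T * K^-1, sketched below in NumPy. This is a sketch under assumptions: `derotate_points` is a hypothetical name, K stands for the calibration matrix (CAM_MATRIX above), the points are assumed to be undistorted, and R is assumed to follow the convention X2 = R * X1 + t (camera-1 coordinates to camera-2 coordinates). If R is defined the other way round, R and its transpose swap roles.

```python
import numpy as np

def derotate_points(pts, K, R):
    """Map pixel points from image2 to where they would appear if camera 2
    had the same orientation as camera 1 (translation left untouched)."""
    pts = np.asarray(pts, dtype=float).reshape(-1, 2)
    K = np.asarray(K, dtype=float)
    # Pixels -> rays (K^-1), undo the rotation (R^T), rays -> pixels (K).
    H = K @ np.asarray(R, dtype=float).T @ np.linalg.inv(K)
    homog = np.column_stack((pts, np.ones(len(pts))))
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]
```

Because pixel coordinates are involved, the calibration matrix is needed on both sides of the rotation. The same 3x3 matrix H is the kind of transform warpPerspective takes for warping the whole image.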
I have a coordinate system in which the orientation of a camera is represented as R=Rz(k) * Ry(p) * Rx(o) where R is a 3x3 matrix of the composition of 3x3 rotation matrices around each of X,Y,Z-axis. Moreover, I have a convention in which Z-axis is in the viewing direction of the camera. The X-axis is left-right and the Y-axis is bottom-up.
This R matrix is used in a multi-view stereo reconstruction algorithm. I have a data test set which comes with pre-calibrated camera information. I want to use the R matrices that come with this data set. However, I have no idea what kind of rotation order they assume or even their handed-ness.
How would I be able to figure this out? Any ideas?
R=Rz(k) * Ry(p) * Rx(o)
This is a very unstable way of doing it. Euler angles are prone to gimbal lock, so I strongly advise against their use.
How would I be able to figure this out?
Well, this problem is difficult to express as a closed-form solution. Your best bet is to treat it as an optimization problem in 3-space, where you try to find the values of k, p and o that match the given rotation matrix R. There are 6 possible orderings of the three axis rotations, so you run that optimization for each of them and take the best-matching result.
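As a starting point for such a comparison, the sketch below composes R for one assumed convention and recovers the angles again. It is an illustration only. The function names are hypothetical, the elementary rotations are assumed to be the usual right-handed ones, and only the ordering R = Rz(k) * Ry(p) * Rx(o) from the question is covered. For that single fixed ordering, and away from gimbal lock (|p| < 90°), the angles can be read directly from the matrix entries, so no optimizer is used here. The determinant check is one way to probe handedness: a proper rotation has det(R) = +1, while det(R) = -1 indicates a reflection.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def compose_zyx(k, p, o):
    """R = Rz(k) * Ry(p) * Rx(o), the convention stated in the question."""
    return rot_z(k) @ rot_y(p) @ rot_x(o)

def angles_zyx(R):
    """Recover (k, p, o) from R = Rz(k) * Ry(p) * Rx(o), assuming |p| < 90 deg."""
    p = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    o = np.arctan2(R[2, 1], R[2, 2])
    k = np.arctan2(R[1, 0], R[0, 0])
    return k, p, o

def is_proper_rotation(R, tol=1e-6):
    """True if R is orthonormal with det(R) = +1 (no reflection)."""
    R = np.asarray(R, dtype=float)
    return bool(np.allclose(R @ R.T, np.eye(3), atol=tol)
                and np.linalg.det(R) > 0)
```

Recomposing a dataset matrix from the recovered angles and comparing the resulting camera axes with what the images show is then a way to test a candidate convention.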