I'm currently working on a project to recover the camera's 6-DOF pose from two images using SIFT/SURF. In older versions of OpenCV, I use findFundamentalMat to find the fundamental matrix, then derive the essential matrix using the known camera intrinsics K, and finally get R and t by matrix decomposition. The result is very sensitive and unstable.
I saw some people have the same issue here
OpenCV findFundamentalMat very unstable and sensitive
Some people suggest applying Nistér's 5-point algorithm, which is implemented in the latest version, OpenCV 3.0.
I have read an example from the
OpenCV documentation
In the example, it uses focal = 1.0 and Point2d pp(0.0, 0.0).
Are these the real focal length and principal point of the camera? What are the units: pixels, or actual physical size? I am having trouble understanding these two parameters. I think they should be acquired from a calibration routine, right?
For my current camera (VGA mode), I used the MATLAB Camera Calibrator to get these two parameters, and they are:
Focal length (millimeters): [ 1104 1102]
Principal point (pixels):[ 259 262]
So, if I want to use my camera parameters instead, should I fill in these values directly, or convert them to actual size, like millimeters?
Also, the translation result I get looks like a direction rather than an actual distance. Is there any way I can get a translation with real magnitude rather than just a direction?
Any help is appreciated.
Focal Length
The focal length that you get from camera calibration is in pixels. It is actually the ratio of the "real" focal length (e.g. in mm) and the pixel size (also in mm). The world units cancel out, and you are left with pixels. Unfortunately, you cannot estimate both the focal length in world units and the pixel size, only their ratio.
Principal Point
The principal point is also in pixels. It is simply the point where the optical axis intersects the image. One caveat: the principal point you get from the Camera Calibrator in MATLAB uses 1-based pixel coordinates, where the center of the top-left pixel of the image is (1,1). OpenCV uses 0-based pixel coordinates. So if you want to use your camera parameters in OpenCV, you have to subtract 1 from the principal point.
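For concreteness, here is a minimal sketch of building the OpenCV camera matrix from the values in the question (note that both numbers reported by the Camera Calibrator are in pixels, despite the "millimeters" label in the question):

```cpp
#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    // Values from the MATLAB Camera Calibrator (both are in pixels).
    double fx = 1104.0, fy = 1102.0;   // focal length
    double cx = 259.0,  cy = 262.0;    // principal point (1-based, MATLAB convention)

    // MATLAB uses 1-based pixel coordinates, OpenCV uses 0-based ones,
    // so subtract 1 from the principal point before building K.
    cv::Mat K = (cv::Mat_<double>(3, 3) <<
        fx,  0.0, cx - 1.0,
        0.0, fy,  cy - 1.0,
        0.0, 0.0, 1.0);

    std::cout << "K = " << std::endl << K << std::endl;
    return 0;
}
```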
Translation Vector
The translation vector you get from the essential matrix is a unit vector, because the essential matrix is only defined up to scale. In other words, you get a reconstruction where the unit is the distance between the cameras. If you need a metric reconstruction (in actual world units), you either need to know the actual distance between the cameras, or you need to be able to detect an object of known size in the scene. See this example.
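Putting the pieces together, here is a rough sketch (assuming OpenCV 3's findEssentialMat/recoverPose and already-matched points) of recovering R and the unit translation t with the calibrated focal length and principal point, and of scaling t when the real baseline happens to be known:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// pts1, pts2: matched image points (in pixels) from the two views.
// focal and pp: calibrated values in pixels (0-based), not 1.0 and (0,0).
// baselineMeters: the known distance between the two camera centres, if available.
void relativePose(const std::vector<cv::Point2f>& pts1,
                  const std::vector<cv::Point2f>& pts2,
                  double focal, cv::Point2d pp, double baselineMeters)
{
    cv::Mat mask;
    cv::Mat E = cv::findEssentialMat(pts1, pts2, focal, pp,
                                     cv::RANSAC, 0.999, 1.0, mask);

    cv::Mat R, t;                       // t comes back as a unit vector
    cv::recoverPose(E, pts1, pts2, R, t, focal, pp, mask);

    // Only when the real distance between the camera centres is known
    // can the translation be expressed in metric units.
    cv::Mat t_metric = t * baselineMeters;
}
```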
Related
I've got a question related to multiple view geometry.
I'm currently dealing with a problem where I have a number of images collected by a drone flying around an object of interest. This object is planar, and I am hoping to eventually stitch the images together.
Leaving aside the classical way of identifying corresponding feature pairs, computing a homography and warping/blending, I want to see what information related to this task I can infer from previously known data.
Specifically, for each acquired image I know the following two things: I know the correspondence between the central point of my image and a point on the object of interest (on whose plane I would eventually want to warp my image). I also have a normal vector to the plane of each image.
So, knowing the centre point (in object-centric world coordinates) and the normal, I can derive the plane equation of each image.
My question is: knowing the plane equations of the 2 images, is it possible to compute a homography (or part of the transformation matrix, such as the rotation) between the 2?
I get the feeling that this may seem like a very straightforward/obvious answer to someone with deep knowledge of visual geometry but since it's not my strongest point I'd like to double check...
Thanks in advance!
Your "normal" is the direction of the focal axis of the camera.
So, IIUC, you have a 3D point that projects on the image center in both images, which is another way of saying that (absent other information) the motion of the camera consists of the focal axis orbiting about a point on the ground plane, plus an arbitrary rotation about the focal axis, plus an arbitrary translation along the focal axis.
The motion has a non-zero baseline, therefore the transformation between images is generally not a homography. However, the portion of the image occupied by the ground plane does, of course, transform as a homography.
Such a motion is defined by 5 parameters, e.g. the 3 components of the rotation vector for the orbit, plus the angle of rotation about the focal axis, plus the displacement along the focal axis. However, the one point correspondence you have gives you only two equations.
It follows that you don't have enough information to constrain the homography between the images of the ground plane.
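For reference, the homography induced by a world plane with unit normal n at distance d from the first camera, given the relative pose (R, t) and intrinsics K1, K2 (which may be the same camera), is H = K2 (R - t nᵀ / d) K1⁻¹. The sketch below just evaluates that formula; it makes explicit that knowing the plane (n, d) alone is not enough, since R and t are also required, which is exactly the information missing here.

```cpp
#include <opencv2/opencv.hpp>

// Homography induced by the plane n.X = d (expressed in the first camera's frame),
// for relative pose (R, t) and intrinsics K1, K2. All matrices assumed CV_64F;
// names are illustrative.
cv::Mat planeInducedHomography(const cv::Mat& K1, const cv::Mat& K2,
                               const cv::Mat& R, const cv::Mat& t,
                               const cv::Mat& n, double d)
{
    cv::Mat H = K2 * (R - t * n.t() / d) * K1.inv();
    return H / H.at<double>(2, 2);   // normalise so that H(2,2) = 1
}
```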
I have been following this documentation to use OpenCV. Using the standard projection formula s·[u, v, 1]ᵀ = K·[R|t]·[X, Y, Z, 1]ᵀ, I have successfully calculated both the intrinsic and the extrinsic matrices (I made use of the solvePnP() procedure to obtain them). Since the object is lying on the ground, I have substituted Z = 0. Then I removed the third column of the extrinsic matrix and multiplied it with the intrinsic matrix to obtain a 3x3 projection matrix. I took its inverse and multiplied it by the image coordinates, i.e. su, sv and s.
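For reference, here is a minimal sketch of that back-projection (names are illustrative; K, rvec and tvec come from calibration and solvePnP(), all assumed to be CV_64F):

```cpp
#include <opencv2/opencv.hpp>

// Back-project an image point (u, v) onto the Z = 0 world plane.
cv::Point2d imageToGroundPlane(const cv::Mat& K, const cv::Mat& rvec,
                               const cv::Mat& tvec, const cv::Point2d& uv)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R);               // 3x3 rotation from solvePnP's rvec

    // Drop the third column of [R | t] because Z = 0, giving a 3x3 matrix.
    cv::Mat P(3, 3, CV_64F);
    R.col(0).copyTo(P.col(0));
    R.col(1).copyTo(P.col(1));
    tvec.copyTo(P.col(2));                // tvec assumed to be a 3x1 column
    P = K * P;

    // Invert and apply to the homogeneous image point, then dehomogenise.
    cv::Mat uv1 = (cv::Mat_<double>(3, 1) << uv.x, uv.y, 1.0);
    cv::Mat XY1 = P.inv() * uv1;
    return cv::Point2d(XY1.at<double>(0) / XY1.at<double>(2),
                       XY1.at<double>(1) / XY1.at<double>(2));
}
```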
However, all points in world coordinates seem to be off by 1 mm or less, and hence I am not getting very accurate coordinates. Does anyone know where I might be going wrong?
Thanks
The camera calibration will probably always be somewhat inaccurate, because with more than 2 calibration images, instead of getting one true solution to the equation system built from the calibration images, you get the solution with the smallest error.
The same goes for cv::solvePnP(). You use one of three methods of optimising over the many possible solutions for the given equation system.
I do not understand how you got the intrinsic and extrinsic matrices from cv::solvePnP(), which is used to calculate the rotation and translation of the object in the camera coordinate system.
What you can do:
Try to get better intrinsic parameters
Try other methods for solvePnP, like EPNP, or check the RANSAC version (see the sketch below)
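A rough sketch of those two suggestions, using the OpenCV 3 constant names (older releases call the flag CV_EPNP):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

void estimatePose(const std::vector<cv::Point3f>& objectPoints,
                  const std::vector<cv::Point2f>& imagePoints,
                  const cv::Mat& K, const cv::Mat& distCoeffs)
{
    cv::Mat rvec, tvec;

    // EPnP instead of the default iterative method.
    cv::solvePnP(objectPoints, imagePoints, K, distCoeffs, rvec, tvec,
                 false, cv::SOLVEPNP_EPNP);

    // RANSAC variant, useful when some of the correspondences are outliers.
    cv::Mat inliers;
    cv::solvePnPRansac(objectPoints, imagePoints, K, distCoeffs, rvec, tvec,
                       false, 100, 8.0, 0.99, inliers);
}
```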
I have seen many blog entries, videos and source code on the internet about how to carry out camera + projector calibration using OpenCV, in order to produce the camera.yml, projector.yml and projectorExtrinsics.yml files.
I have yet to see anyone discussing what to do with these files afterwards. Indeed, I have done a calibration myself, but I don't know what the next step is in my own application.
Say I write an application that now uses the camera - projector system that I calibrated to track objects and project something on them. I will use findContours() to grab some points of interest from the moving objects, and now I want to project these points (from the projector!) onto the objects!
What I want to do is (for example) track the centre of mass (COM) of an object and show a point on the camera view of the tracked object (at its COM). Then a point should be projected onto the COM of the object in real time.
It seems that projectPoints() is the OpenCV function I should use after loading the yml files, but I am not sure how I will account for all the intrinsic & extrinsic calibration values of both camera and projector. Namely, projectPoints() requires as parameters:
vector of points to re-project (duh!)
rotation + translation matrices. I think I can use the projectorExtrinsics here. Or I can use the composeRT() function to generate a final rotation & a final translation matrix from the projectorExtrinsics (which I have in the yml file) and the cameraExtrinsics (which I don't have. Side question: should I not save them too in a file??).
intrinsics matrix. This is tricky now: should I use the camera or the projector intrinsics matrix here?
distortion coefficients. Again, should I use the projector or the camera coefficients here?
other params...
So if I use either the projector or the camera (which one??) intrinsics + coefficients in projectPoints(), then I will only be 'correcting' for one of the 2 instruments. Where / how will I use the other instrument's intrinsics??
What else do I need to use apart from loading the yml files and projectPoints()? (Perhaps undistortion?)
Any help on the matter is greatly appreciated.
If there is a tutorial or a book (no, O'Reilly's "Learning OpenCV" does not talk about how to use the calibration yml files either! Only about how to do the actual calibration), please point me in that direction. I don't necessarily need an exact answer!
First, you seem to be confused about the general role of a camera/projector model: its role is to map 3D world points to 2D image points. This sounds obvious, but it means that given extrinsics R,t (for orientation and position), distortion function D(.) and intrinsics K, you can infer for this particular camera the 2D projection m of a 3D point M as follows: m = K.D(R.M+t). The projectPoints function does exactly that (i.e. 3D to 2D projection), for each input 3D point, hence you need to give it the input parameters associated with the camera in which you want your 3D points projected (projector K&D if you want projector 2D coordinates, camera K&D if you want camera 2D coordinates).
Second, when you jointly calibrate your camera and projector, you do not estimate a set of extrinsics R,t for the camera and another for the projector, but only one R and one t, which represent the rotation and translation between the camera's and projector's coordinate systems. For instance, this means that your camera is assumed to have rotation = identity and translation = zero, and the projector has rotation = R and translation = t (or the other way around, depending on how you did the calibration).
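As an illustration, a minimal sketch of that mapping for the projector side (names like Kproj and Dproj are placeholders for the matrices loaded from the yml files; R and t are the single camera-to-projector extrinsics, and the 3D points are expressed in the camera's coordinate frame):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Map 3D points, given in the camera's coordinate frame, to projector pixels.
std::vector<cv::Point2f> toProjectorPixels(const std::vector<cv::Point3f>& points3d,
                                           const cv::Mat& R, const cv::Mat& t,
                                           const cv::Mat& Kproj, const cv::Mat& Dproj)
{
    cv::Mat rvec;
    cv::Rodrigues(R, rvec);               // projectPoints expects a rotation vector

    std::vector<cv::Point2f> pixels;
    cv::projectPoints(points3d, rvec, t, Kproj, Dproj, pixels);
    return pixels;
}
```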
Now, concerning the application you mentioned, the real problem is: how do you estimate the 3D coordinates of a given point ?
Using two cameras and one projector, this would be easy: you could track the objects of interest in the two camera images, triangulate their 3D positions from the two 2D projections using the function triangulatePoints, and finally project each 3D point into projector 2D coordinates using projectPoints in order to know where to display things with your projector.
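A rough sketch of that two-camera pipeline for a single tracked point (all names are illustrative; matrices are assumed CV_64F, camera 1 is the reference frame, (R12, t12) is the pose of camera 2 relative to camera 1, (Rp, tp) the pose of the projector relative to camera 1, and the input pixels are assumed already undistorted):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

cv::Point2f trackedPointInProjector(const cv::Point2f& p1, const cv::Point2f& p2,
                                    const cv::Mat& K1, const cv::Mat& K2,
                                    const cv::Mat& R12, const cv::Mat& t12,
                                    const cv::Mat& Rp, const cv::Mat& tp,
                                    const cv::Mat& Kp, const cv::Mat& Dp)
{
    // Projection matrices, taking camera 1 as the world frame.
    cv::Mat P1 = K1 * cv::Mat::eye(3, 4, CV_64F);
    cv::Mat Rt2(3, 4, CV_64F);
    R12.copyTo(Rt2.colRange(0, 3));
    t12.copyTo(Rt2.col(3));               // t12 assumed to be a 3x1 column
    cv::Mat P2 = K2 * Rt2;

    // Triangulate a single correspondence (homogeneous 4x1 output).
    cv::Mat x1 = (cv::Mat_<double>(2, 1) << p1.x, p1.y);
    cv::Mat x2 = (cv::Mat_<double>(2, 1) << p2.x, p2.y);
    cv::Mat X4;
    cv::triangulatePoints(P1, P2, x1, x2, X4);
    cv::Point3f X(X4.at<double>(0) / X4.at<double>(3),
                  X4.at<double>(1) / X4.at<double>(3),
                  X4.at<double>(2) / X4.at<double>(3));

    // Re-project the 3D point into projector pixel coordinates.
    cv::Mat rvec;
    cv::Rodrigues(Rp, rvec);
    std::vector<cv::Point2f> out;
    cv::projectPoints(std::vector<cv::Point3f>(1, X), rvec, tp, Kp, Dp, out);
    return out[0];
}
```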
With only one camera and one projector, this is still possible but more difficult because you cannot triangulate the tracked points from only one observation. The basic idea is to approach the problem like a sparse stereo disparity estimation problem. A possible method is as follows:
project a non-ambiguous image (e.g. black and white noise) using the projector, in order to texture the scene observed by the camera.
as before, track the objects of interest in the camera image
for each object of interest, correlate a small window around its location in the camera image with the projector image, in order to find where it projects in the projector 2D coordinates
Another approach, which unlike the one above would use the calibration parameters, could be to do a dense 3D reconstruction using stereoRectify and StereoBM::operator() (or gpu::StereoBM_GPU::operator() for the GPU implementation), map the tracked 2D positions to 3D using the estimated scene depth, and finally project into the projector using projectPoints.
Anyhow, this is easier, and more accurate, using two cameras.
Hope this helps.
The quality of calibration is measured by the reprojection error (is there an alternative?), which requires knowledge of the world coordinates of some 3D point(s).
Is there a simple way to produce such known points? Is there a way to verify the calibration in some other way (for example, Zhang's calibration method only requires that the calibration object be planar, and the geometry of the system need not be known)?
You can verify the accuracy of the estimated nonlinear lens distortion parameters independently of pose. Capture images of straight edges (e.g. a plumb line, or a laser stripe on a flat surface) spanning the field of view (an easy way to span the FOV is to rotate the camera keeping the plumb line fixed, then add all the images). Pick points on said line images, undistort their coordinates, fit mathematical lines, compute error.
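A minimal sketch of that check, using OpenCV 3 constant names (the points along the imaged straight edge are assumed to have been picked by hand or by an edge detector):

```cpp
#include <cmath>
#include <vector>
#include <opencv2/opencv.hpp>

// linePoints: pixel coordinates picked along the image of a straight edge.
// After undistortion they should be collinear; the RMS point-to-line distance
// measures how good the distortion coefficients are.
double straightnessError(const std::vector<cv::Point2f>& linePoints,
                         const cv::Mat& K, const cv::Mat& distCoeffs)
{
    std::vector<cv::Point2f> undist;
    // Passing K as the new projection matrix keeps the result in pixel units.
    cv::undistortPoints(linePoints, undist, K, distCoeffs, cv::noArray(), K);

    cv::Vec4f line;                       // (vx, vy, x0, y0), direction is unit length
    cv::fitLine(undist, line, cv::DIST_L2, 0, 0.01, 0.01);

    double sumSq = 0.0;
    for (size_t i = 0; i < undist.size(); ++i) {
        // Perpendicular distance from the point to the fitted line.
        double d = (undist[i].x - line[2]) * line[1] - (undist[i].y - line[3]) * line[0];
        sumSq += d * d;
    }
    return std::sqrt(sumSq / undist.size());
}
```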
For the linear part, you can also capture images of multiple planar rigs at a known relative pose, either moving one planar target with a repeatable/accurate rig (e.g. a turntable), or mounting multiple planar targets at known angles from each other (e.g. three planes at 90 deg from each other).
As always, a compromise is in order between accuracy requirements and budget. With enough money and a friendly machine shop nearby you can let your fantasy run wild with rig geometry. I had once a dodecahedron about the size of a grapefruit, machined out of white plastic to 1/20 mm spec. Used it to calibrate the pose of a camera on the end effector of a robotic arm, moving it on a sphere around a fixed point. The dodecahedron has really nice properties in regard to occlusion angles. Needless to say, it's all patented.
The images used in generating the intrinsic calibration can also be used to verify it. A good example of this is the camera-calib tool from the Mobile Robot Programming Toolkit (MRPT).
Per Zhang's method, the MRPT calibration proceeds as follows:
Process the input images:
1a. Locate the calibration target (extract the chessboard corners)
1b. Estimate the camera's pose relative to the target, assuming that the target is a planar chessboard with a known number of intersections.
1c. Assign points on the image to a model of the calibration target in relative 3D coordinates.
Find an intrinsic calibration that best explains all of the models generated in 1b/c.
Once the intrinsic calibration is generated, we can go back to the source images.
For each image, multiply the estimated camera pose with the intrinsic calibration, then apply that to each of the points derived in 1c.
This will map the relative 3D points from the target model back to the 2D calibration source image. The difference between the original image feature (chessboard corner) and the reprojected point is the calibration error.
MRPT performs this test on all input images and will give you an aggregate reprojection error.
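The reprojection error test is easy to reproduce with OpenCV itself. A minimal sketch for one image (objectPoints, imagePoints, rvec and tvec correspond to steps 1a-1c above; names are illustrative):

```cpp
#include <cmath>
#include <vector>
#include <opencv2/opencv.hpp>

// objectPoints: chessboard model points (step 1c), imagePoints: detected corners (1a),
// rvec/tvec: the per-image pose (1b), K/distCoeffs: the intrinsic calibration.
double reprojectionErrorRMS(const std::vector<cv::Point3f>& objectPoints,
                            const std::vector<cv::Point2f>& imagePoints,
                            const cv::Mat& rvec, const cv::Mat& tvec,
                            const cv::Mat& K, const cv::Mat& distCoeffs)
{
    std::vector<cv::Point2f> reprojected;
    cv::projectPoints(objectPoints, rvec, tvec, K, distCoeffs, reprojected);

    double err = cv::norm(imagePoints, reprojected, cv::NORM_L2);
    return std::sqrt(err * err / imagePoints.size());
}
```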
If you want to verify a full system, including both the camera intrinsics and the camera-to-world transform, you will probably need to build a jig that places the camera and target in a known configuration, then test calculated 3D points against real-world measurements.
On Engine's question: the pose matrix is a [R|t] matrix, where R is a pure 3D rotation and t a translation vector. If you have computed a homography from the image, section 3.1 of Zhang's Microsoft Technical Report (http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf) gives a closed-form method to obtain both R and t using the known homography and the intrinsic camera matrix K. (I can't comment, so I added this as a new answer.)
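A rough sketch of that closed-form recovery (H maps plane coordinates to image pixels, and all matrices are assumed CV_64F; the final SVD step, also described by Zhang, snaps the estimated matrix onto a proper rotation):

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

// Closed-form pose from a plane-to-image homography H and intrinsics K
// (Zhang, TR98-71, section 3.1). Illustrative only.
void poseFromHomography(const cv::Mat& H, const cv::Mat& K, cv::Mat& R, cv::Mat& t)
{
    cv::Mat Kinv = K.inv();
    cv::Mat h1 = H.col(0), h2 = H.col(1), h3 = H.col(2);

    double lambda = 1.0 / cv::norm(Kinv * h1);
    cv::Mat r1 = lambda * Kinv * h1;
    cv::Mat r2 = lambda * Kinv * h2;
    cv::Mat r3 = r1.cross(r2);
    t = lambda * Kinv * h3;

    std::vector<cv::Mat> cols;
    cols.push_back(r1); cols.push_back(r2); cols.push_back(r3);
    cv::hconcat(cols, R);

    // The columns are generally not exactly orthonormal; project onto a rotation.
    cv::Mat w, u, vt;
    cv::SVDecomp(R, w, u, vt);
    R = u * vt;
}
```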
There should be just variance and bias in the calibration (pixel re-projection) errors, given enough variability in the calibration rig poses. It is better to visualize these errors rather than look at the raw values. For example, error vectors pointing towards the center would be indicative of a wrong focal length, while observing curved lines can give intuition about the distortion coefficients.
To calibrate the camera, one has to jointly solve for the extrinsics and intrinsics. The latter can be known from the manufacturer; solving for the extrinsics (rotation and translation) involves decomposition of the calculated homography: Decompose Homography matrix in opencv python
Calculate a Homography with only Translation, Rotation and Scale in Opencv
The homography is used here since most calibration targets are flat.
I am currently researching the use of a low resolution camera facing vertically at the ground (fixed height) to measure the speed (speed of the camera passing over the surface). Using OpenCV 2.1 with C++.
Since the entire background will be constantly moving, translating and/or rotating between consecutive frames, what would be the most suitable method for determining the displacement of the frames in a 'usable value' form? (A function that returns frame displacement?) Then, based on the height of the camera and the frame area captured (dimensions of the frame in the real world), I would be able to calculate the displacement in the real world from the frame displacement, and then calculate the speed over a measured time interval.
I am trying to determine my method of approach, or whether any example code is available, for converting a frame displacement (or the displacement of a set of pixels) into a real-world distance based on the height of the camera.
Thanks,
Josh.
It depends on your knowledge of computer vision. To start, I would use what OpenCV can offer; please have a look at the feature2d module.
What you need is to first extract feature points (e.g. SIFT or SURF), then use the built-in matching algorithms to match the points extracted from two frames. Each match will give you some constraints, and you will end up solving an over-determined system Ax = b.
Of course, do your experiments offline, i.e. shoot a video first and then operate on the individual images.
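A minimal sketch of that feature-matching step, written against the OpenCV 2.4.x API (SURF lives in the nonfree module there; newer versions moved it to xfeatures2d):

```cpp
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/nonfree/nonfree.hpp>   // SURF is in the nonfree module in 2.4.x

// Detect and match SURF features between two consecutive frames,
// returning the matched pixel coordinates in p1 (previous) and p2 (current).
void matchFrames(const cv::Mat& prev, const cv::Mat& curr,
                 std::vector<cv::Point2f>& p1, std::vector<cv::Point2f>& p2)
{
    cv::SURF surf(400.0);                         // Hessian threshold
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    surf(prev, cv::Mat(), kp1, desc1);
    surf(curr, cv::Mat(), kp2, desc2);

    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    for (size_t i = 0; i < matches.size(); ++i) {
        p1.push_back(kp1[matches[i].queryIdx].pt);
        p2.push_back(kp2[matches[i].trainIdx].pt);
    }
}
```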
UPDATE:
In the case of multi-camera calibration, your goal is to determine the 3D location of each camera, which is exactly the situation you have. Imagine that instead of moving your single camera around, you have as many cameras as there are images in the video captured by your single camera, and you want to know the 3D location of each of them; each one represents the location from which one image was taken by your single moving camera.
There is a matrix with which you can map any 3D point in the world to a 2D point on your image; see wiki. The camera matrix consists of 2 parts, intrinsic and extrinsic parameters. I (maybe inexactly) referred to the intrinsic parameters as the internal matrix. The intrinsic parameters consist of static parameters of a single camera (e.g. focal length), while the extrinsic ones consist of the location and rotation of your camera.
Now, once you have the intrinsic parameters of your camera and the matched points, you can then stack a lot of those projection equations on top of each other and solve the system for both the actual 3D location of all your matched points and all the extrinsic parameters.
Given interest points as described above, you can find the translational transformation with OpenCV's findHomography.
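Continuing the sketch above: once you have matched points, findHomography gives the inter-frame transform, and for a camera looking straight down the translation part can be converted to world units using the ground footprint of a pixel (metresPerPixel below is a placeholder you would derive from the camera height and field of view):

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

// Estimate the ground displacement between two frames from matched points.
cv::Point2d groundDisplacement(const std::vector<cv::Point2f>& p1,
                               const std::vector<cv::Point2f>& p2,
                               double metresPerPixel)
{
    cv::Mat H = cv::findHomography(p1, p2, cv::RANSAC, 3.0);

    // For a camera looking straight down at a plane and mostly translational
    // motion, the last column of H approximates the inter-frame shift in pixels.
    double tx = H.at<double>(0, 2);
    double ty = H.at<double>(1, 2);
    return cv::Point2d(tx * metresPerPixel, ty * metresPerPixel);
}
```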
Also, if you can assume that transformations will be somewhat small and near-linear, you can just compare image pixels of two consecutive frames to find the best match. With enough downsampling, this doesn't take too long, and from my experience works rather well.
Good luck!