How to calibrate the Kinect camera? - computer-vision

Camera calibration is the process of estimating intrinsic and/or extrinsic parameters. Intrinsic parameters deal with the camera's internal characteristics, such as, its focal length, skew, distortion, and image center. Extrinsic parameters describe its position and orientation in the world. Knowing intrinsic parameters is an essential first step for 3D computer vision, as it allows you to estimate the scene's structure in Euclidean space and removes lens distortion, which degraces accuracy.
I'm using the Kinect for Computer Vision but I need to calibrate it. I've already read some articles about Kinect calibration but I didn't understand very clearly.
I want to start from nothing. Because I need to know how the calibration is done.
How do I do this?
Thanks.

The Kinect is slightly different than your standard camera. There is a customized toolbox here http://www.ee.oulu.fi/~dherrera/kinect/
I'd suggest your read the paper and try to understand what calibration is.
In very simplistic terms you need calibration so that other geometric algorithms can work. The vast majority of geometry-based algorithms in vision assume the pinhole camera model. That is, the center of the camera is a tiny pinhole and rays reflecting off of objects travel in straight lines.
However, a pinhole camera is not practical to manufacture. You can make a pinhole camera at home, but the image quality won't be good.
People use lenses to deal with this. But, lenses are imperfect and they have distortion.
Distortion means that pixel coordinates do not correspond to straight lines anymore. So, many of the algorithms fail to compute the right thing.
Camera intrinsic calibration corrects the distortion in the lens so that the projection is as close to a pinhole as possible.
The Kinect has two cameras. The RGB camera, and the IR camera. Both are factory calibrated, but you can get better results customized for the sensor you use using the toolbox above.
HTH

Thulio, for calibrating the color camera of Kinect have a look at:
camera calibration toolbox. Basically you need to print a chessboard pattern on paper, glue it on a planar surface, take a lot of Kinect color pictures and load it with the toolbox to get your camera parameters. I suspect that somebody else may have done that before you (most Kinects will have the same intrinsic parameters, I guess).

Related

OpenCv Blob tracking of point relative to plane

Am doing an installation that tracks blobs using openCv, and projecting graphics over the blobs. Problem is my camera is off and away from the projector.
I'm thinking to get the point's position in relation to the projection's plane, I would need to calibrate by marking out the plane's corners as seen in the camera view.
My problem is how do i use that 4 points info, and then convert the tracked blob from the camera view to the projection plane, so the projected graphic lines up with the tracked blob? Not sure what i should be searching for.
After you detect the 4 corners points, you can calculate the transformation to the projector plane by using PerspectiveTransform.
Once you have this transformation, you could use warpPerspective, to go from one coordinate system to another.
Unfortunately I'm unable to help with a minimal code example at the moment, but I recommend having a look at ofxCv and it's examples. There is a camera based undistort example, but the wrapper also provides utilities for warping/unwarping perspective via warpPerspective and unwarpPerspective.
Bare in mind ofxCv has handy function to convert to/from ofImage to cv::Mat like toCv() and toOf()
ofxCv may make it easier to use the OpenCV functions Elad Joseph recommends (which sound like exactly what you need)

OpenCV triangulatePoints varying distance

I am using OpenCV's triangulatePoints function to determine 3D coordinates of a point imaged by a stereo camera.
I am experiencing that this function gives me different distance to the same point depending on angle of camera to that point.
Here is a video:
https://www.youtube.com/watch?v=FrYBhLJGiE4
In this video, we are tracking the 'X' mark. In the upper left corner info is displayed about the point that is being tracked. (Youtube dropped the quality, the video is normally much sharper. (2x1280) x 720)
In the video, left camera is the origin of 3D coordinate system and it's looking in positive Z direction. Left camera is undergoing some translation, but not nearly as much as the triangulatePoints function leads to believe. (More info is in the video description.)
Metric unit is mm, so the point is initially triangulated at ~1.94m distance from the left camera.
I am aware that insufficiently precise calibration can cause this behaviour. I have ran three independent calibrations using chessboard pattern. The resulting parameters vary too much for my taste. ( Approx +-10% for focal length estimation).
As you can see, the video is not highly distorted. Straight lines appear pretty straight everywhere. So the optimimum camera parameters must be close to the ones I am already using.
My question is, is there anything else that can cause this?
Can a convergence angle between the two stereo cameras can have this effect? Or wrong baseline length?
Of course, there is always a matter of errors in feature detection. Since I am using optical flow to track the 'X' mark, I get subpixel precision which can be mistaken by... I don't know... +-0.2 px?
I am using the Stereolabs ZED stereo camera. I am not accessing the video frames using directly OpenCV. Instead, I have to use the special SDK I acquired when purchasing the camera. It has occured to me that this SDK I am using might be doing some undistortion of its own.
So, now I wonder... If the SDK undistorts an image using incorrect distortion coefficients, can that create an image that is neither barrel-distorted nor pincushion-distorted but something different altogether?
The SDK provided with the ZED Camera performs undistortion and rectification of images. The geometry model is based on the same as openCV :
intrinsic parameters and distortion parameters for both Left and Right cameras.
extrinsic parameters for rotation/translation between Right and Left.
Through one of the tool of the ZED ( ZED Settings App), you can enter your own intrinsic matrix for Left/Right and distortion coeff, and Baseline/Convergence.
To get a precise 3D triangulation, you may need to adjust those parameters since they have a high impact on the disparity you will estimate before converting to depth.
OpenCV gives a good module to calibrate 3D cameras. It does :
-Mono calibration (calibrateCamera) for Left and Right , followed by a stereo calibration (cv::StereoCalibrate()). It will output Intrinsic parameters (focale, optical center (very important)), and extrinsic (Baseline = T[0], Convergence = R[1] if R is a 3x1 matrix). the RMS (return value of stereoCalibrate()) is a good way to see if the calibration has been done correctly.
The important thing is that you need to do this calibration on raw images, not by using images provided with the ZED SDK. Since the ZED is a standard UVC Camera, you can use opencv to get the side by side raw images (cv::videoCapture with the correct device number) and extract Left and RIght native images.
You can then enter those calibration parameters in the tool. The ZED SDK will then perform the undistortion/rectification and provide the corrected images. The new camera matrix is provided in the getParameters(). You need to take those values when you triangulate, since images are corrected as if they were taken from this "ideal" camera.
hope this helps.
/OB/
There are 3 points I can think of and probably can help you.
Probably the least important, but from your description you have separately calibrated the cameras and then the stereo system. Running an overall optimization should improve the reconstruction accuracy, as some "less accurate" parameters compensate for the other "less accurate" parameters.
If the accuracy of reconstruction is important to you, you need to have a systematic approach to reducing it. Building an uncertainty model, thanks to the mathematical model, is easy and can write a few lines of code to build that for you. Say you want to see if the 3d point is 2 meters away, at a particular angle to the camera system, and you have a specific uncertainty on the 2d projections of the 3d point, it's easy to backproject the uncertainty to the 3d space around your 3d point. By adding uncertainty to the other parameters of the system then you can see which ones are more important and need to have lower uncertainty.
This inaccuracy is inherent in the problem and the method you're using.
First if you model the uncertainty you will see the reconstructed 3d points further away from the center of cameras have a much higher uncertainty. The reason is that the angle <left-camera, 3d-point, right-camera> is narrower. I remember the MVG book had a good description of this with a figure.
Second, if you look at the implementation of triangulatePoints you see that the pseudo-inverse method is implemented using SVD to construct the 3d point. That can lead to many issues, which you probably remember from linear algebra.
Update:
But I consistently get larger distance near edges and several times
the magnitude of the uncertainty caused by the angle.
That's the result of using pseudo-inverse, a numerical method. You can replace that with a geometrical method. One easy method is to back-project the 2d-projections to get 2 rays in 3d space. Then you want to find where the intersect, which doesn't happen due to the inaccuracies. Instead you want to find the point where the 2 rays have the least distance. Without considering the uncertainty you will consistently favor a point from the set of feasible solutions. That's why with pseudo inverse you don't see any fluctuation but a gross error.
Regarding the general optimization, yes, you can run an iterative LM optimization on all the parameters. This is the method used in applications like SLAM for autonomous vehicles where accuracy is very important. You can find some papers by googling bundle adjustment slam.

Camera pose estimation

I am trying to write a program from scratch that can estimate the pose of a camera. I am open to any programming language and using inbuilt functions/methods for feature detection...
I have been exploring different ways of estimating pose like SLAM, PTAM, DTAM etc... but I don't really need need tracking and mapping, I just need the pose.
Can any of you suggest an approach or any resource that can help me ? I know what pose is and a rough idea of how to estimate it but I am unable to find any resources that explain how it can be done.
I was thinking of starting with a video recorded, extracting features from the video and then using these features and geometry to estimate the pose.
(Please forgive my naivety, I am not a computer vision person and am fairly new to all of this)
In order to compute a camera pose, you need to have a reference frame that is given by some known points in the image.
These known points come for example from a calibration pattern, but can also be some known landmarks in your images (for example, the 4 corners of teh base of Gizeh pyramids).
The problem of estimating the pose of the camera given known landmarks seen by the camera (ie, finding 3D position from 2D points) is classically known as PnP.
OpenCV provides you a ready-made solver for this problem.
However, you need first to calibrate your camera, ie, you need to determine what makes it unique.
The parameters that you need to estimate are called intrinsic parameters, because they will depend on the camera focal length, sensor size... but not on the camera location or orientation.
These parameters will mathematically explain how world points are projected onto your camera sensor frame.
You can estimate them from known planar patterns (again, OpenCV has some ready-made functions for that).
Generally, you can extract the pose of a camera only relative to a given reference frame.
It is quite common to estimate the relative pose between one view of a camera to another view.
The most general relationship between two views of the same scene from two different cameras, is given by the fundamental matrix (google it).
You can calculate the fundamental matrix from correspondences between the images. For example look in the Matlab implementation:
http://www.mathworks.com/help/vision/ref/estimatefundamentalmatrix.html
After calculating this, you can use a decomposition of the fundamental matrix in order to get the relative pose between the cameras. (Look here for example: http://www.daesik80.com/matlabfns/function/DecompPMatQR.m).
You can work a similar procedure in case you have a calibrated camera, and then you need the Essential matrix instead of fundamnetal.

Augmented Reality OpenGL+OpenCV

I am very new to OpenCV with a limited experience on OpenGL. I am willing to overlay a 3D object on a calibrated image of a checkerboard. Any tips or guidance?
The basic idea is that you have 2 cameras: one is the physical one (the one where you are retriving the images with opencv) and one is the opengl one. You have to align those two matrices.
To do that, you need to calibrate the physical camera.
First. You need a distortion parameters (because every lens more or less has some optical distortion), and build with those parameters the so called intrinsic parameters. You do this with printing a chessboard in a paper, using it for get some images and calibrate the camera. It's full of nice tutorial about that on the internet, and from your answer it seems you have them. That's nice.
Then. You have to calibrate the position of the camera. And this is done with the so called extrinsic parameters. Those parameters encoded the position and the rotation the the 3D world of those camera.
The intrinsic parameters are needed by the OpenCV methods cv::solvePnP and cv::Rodrigues and that uses the rodrigues method to get the extrinsic parameters. This method get in input 2 set of corresponding points: some 3D knowon points and their 2D projection. That's why all augmented reality applications need some markers: usually the markers are square, so after detecting it you know the 2D projection of the point P1(0,0,0) P2(0,1,0) P3(1,1,0) P4(1,0,0) that forms a square and you can find the plane lying on them.
Once you have the extrinsic parameters all the game is easily solved: you just have to make a perspective projection in OpenGL with the FoV and the aperture angle of the camera from intrinsic parameter and put the camera in the position given by the extrinsic parameters.
Of course, if you want (and you should) understand and handle each step of this process correctly.. there is a lot of math - matrices, angles, quaternion, matrices again, and.. matrices again. You can find a reference in the famous Multiple View Geometry in Computer Vision from R. Hartley and A. Zisserman.
Moreover, to handle correctly the opengl part you have to deal with the so called "Modern OpenGL" (remember that glLoadMatrix is deprecated) and a little bit of shader for loading the matrices of the camera position (for me this was a problem because I didn't knew anything about it).
I have dealt with this some times ago and I have some code so feel free to ask any kind of problems you have. Here some links I found interested:
http://ksimek.github.io/2012/08/14/decompose/ (really good explanation)
Camera position in world coordinate from cv::solvePnP (a question I asked about that)
http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/ (fabulous blog about computer vision)
http://spottrlabs.blogspot.it/2012/07/opencv-and-opengl-not-always-friends.html (nice tricks)
http://strawlab.org/2011/11/05/augmented-reality-with-OpenGL/
http://www.songho.ca/opengl/gl_projectionmatrix.html (very good explanation on opengl camera settings basics)
some other random usefull stuffs:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html (documentation, always look at the docs!!!)
Determine extrinsic camera with opencv to opengl with world space object
Rodrigues into Eulerangles and vice versa
Python Opencv SolvePnP yields wrong translation vector
http://answers.opencv.org/question/23089/opencv-opengl-proper-camera-pose-using-solvepnp/
Please read them before anything else. As usual, once you got the concept it is an easy joke, need to crash your brain a little bit against the wall. Just don't be scared from all those math : )

How to verify that the camera calibration is correct? (or how to estimate the error of reprojection)

The quality of calibration is measured by the reprojection error (is there an alternative?), which requires a knowledge world coordinates of some 3d point(s).
Is there a simple way to produce such known points? Is there a way to verify the calibration in some other way (for example, Zhang's calibration method only requires that the calibration object be planar and the geometry of the system need not to be known)
You can verify the accuracy of the estimated nonlinear lens distortion parameters independently of pose. Capture images of straight edges (e.g. a plumb line, or a laser stripe on a flat surface) spanning the field of view (an easy way to span the FOV is to rotate the camera keeping the plumb line fixed, then add all the images). Pick points on said line images, undistort their coordinates, fit mathematical lines, compute error.
For the linear part, you can also capture images of multiple planar rigs at a known relative pose, either moving one planar target with a repeatable/accurate rig (e.g. a turntable), or mounting multiple planar targets at known angles from each other (e.g. three planes at 90 deg from each other).
As always, a compromise is in order between accuracy requirements and budget. With enough money and a friendly machine shop nearby you can let your fantasy run wild with rig geometry. I had once a dodecahedron about the size of a grapefruit, machined out of white plastic to 1/20 mm spec. Used it to calibrate the pose of a camera on the end effector of a robotic arm, moving it on a sphere around a fixed point. The dodecahedron has really nice properties in regard to occlusion angles. Needless to say, it's all patented.
The images used in generating the intrinsic calibration can also be used to verify it. A good example of this is the camera-calib tool from the Mobile Robot Programming Toolkit (MRPT).
Per Zhang's method, the MRPT calibration proceeds as follows:
Process the input images:
1a. Locate the calibration target (extract the chessboard corners)
1b. Estimate the camera's pose relative to the target, assuming that the target is a planar chessboard with a known number of intersections.
1c. Assign points on the image to a model of the calibration target in relative 3D coordinates.
Find an intrinsic calibration that best explains all of the models generated in 1b/c.
Once the intrinsic calibration is generated, we can go back to the source images.
For each image, multiply the estimated camera pose with the intrinsic calibration, then apply that to each of the points derived in 1c.
This will map the relative 3D points from the target model back to the 2D calibration source image. The difference between the original image feature (chessboard corner) and the reprojected point is the calibration error.
MRPT performs this test on all input images and will give you an aggregate reprojection error.
If you want to verify a full system, including both the camera intrinsics and the camera-to-world transform, you will probably need to build a jig that places the camera and target in a known configuration, then test calculated 3D points against real-world measurements.
On Engine's question: the pose matrix is a [R|t] matrix where R is a pure 3D rotation and t a translation vector. If you have computed a homography from the image, section 3.1 of Zhang's Microsoft Technical Report (http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf) gives a closed form method to obtain both R and t using the known homography and the intrinsic camera matrix K. ( I can't comment, so I added as a new answer)
Should be just variance and bias in calibration (pixel re-projection) errors given enough variability in calibration rig poses. It is better to visualize these errors rather than to look at the values. For example, error vectors pointing to the center would be indicative of wrong focal length. Observing curved lines can give intuition about distortion coefficients.
To calibrate the camera one has to jointly solve for extrinsic and intrinsic. The latter can be known from manufacturer, the solving for extrinsic (rotation and translation) involves decomposition of calculated homography: Decompose Homography matrix in opencv python
Calculate a Homography with only Translation, Rotation and Scale in Opencv
The homography is used here since most calibration targets are flat.