3D reconstruction from 2 images with known baseline and single-camera calibration - C++

My semester project is to calibrate a pair of stereo cameras with a large baseline (~2 m).
My approach is to work without an exactly defined calibration pattern such as a chessboard, because the pattern would have to be huge and hard to handle at that distance.
My problem is similar to this question: 3d reconstruction from 2 images without info about the camera
Program so far (a rough code sketch follows below):
1. Corner detection in the left image (goodFeaturesToTrack)
2. Refine the corners (cornerSubPix)
3. Find the corresponding corner locations in the right image (calcOpticalFlowPyrLK)
4. Calculate the fundamental matrix F (findFundamentalMat)
5. Calculate the rectification homographies H1, H2 (stereoRectifyUncalibrated)
6. Rectify the images (warpPerspective)
7. Calculate the disparity map (StereoSGBM)
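A rough C++ sketch of these steps (variable names and parameter values are only illustrative, not tuned):

// grayscale input images, assumed already loaded
cv::Mat imgL, imgR;

// 1-2: corners in the left image, refined to sub-pixel accuracy
std::vector<cv::Point2f> ptsL;
cv::goodFeaturesToTrack(imgL, ptsL, 500, 0.01, 10);
cv::cornerSubPix(imgL, ptsL, cv::Size(11, 11), cv::Size(-1, -1),
                 cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01));

// 3: track the corners into the right image
std::vector<cv::Point2f> ptsR;
std::vector<uchar> status;
std::vector<float> err;
cv::calcOpticalFlowPyrLK(imgL, imgR, ptsL, ptsR, status, err);
// (omitted: keep only the pairs where status[i] == 1)

// 4-5: fundamental matrix and rectifying homographies
cv::Mat F = cv::findFundamentalMat(ptsL, ptsR, cv::FM_RANSAC, 3.0, 0.99);
cv::Mat H1, H2;
cv::stereoRectifyUncalibrated(ptsL, ptsR, F, imgL.size(), H1, H2);

// 6: rectify both images
cv::Mat rectL, rectR;
cv::warpPerspective(imgL, rectL, H1, imgL.size());
cv::warpPerspective(imgR, rectR, H2, imgR.size());

// 7: disparity map (CV_16S, values scaled by 16)
cv::Ptr<cv::StereoSGBM> sgbm = cv::StereoSGBM::create(0, 128, 5);
cv::Mat disp16;
sgbm->compute(rectL, rectR, disp16);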
So far, so good: it works passably, but the rectified images "jump" in perspective if I change the number of corners.
I don't know whether this comes from imprecision, from mistakes I made, or whether it simply can't be computed reliably without known camera parameters or lens distortion compensation (but it also happens on the Tsukuba images).
Suggestions are welcome :)
That is not my main problem, though. Now I want to reconstruct the 3D points.
But reprojectImageTo3D needs the Q matrix, which I don't have so far, so my question is: how do I calculate it? I have the baseline, i.e. the distance between the two cameras. My feeling is that if I convert the disparity map into a 3D point cloud, the only thing missing is the scale, right? So if I plug in the baseline I get the 3D reconstruction, right? If so, how?
I am also planning to compensate for lens distortion as a first step, for each camera separately, with a chessboard (small and close to one camera at a time, so I don't have to stand 10-15 m away with a big pattern in the overlapping area of both). So if it helps, I could also use those camera parameters.
Is there documentation besides http://docs.opencv.org where I can see and understand what the Q matrix contains and how it is calculated? Or can I look at the source code (probably hard for me to understand ^^)? If I press F2 in Qt I only see the function declaration with the parameter types. (Sorry, I'm really new to all of this.)
Left: input image with detected corners
Top, H1/H2: rectified images (looks good with this corner count ^^)
SGBM: disparity map

So I found out what the Q matrix contains here:
Using OpenCV to generate 3d points (assuming frontal parallel configuration)
All of these parameters are given by the single-camera calibration:
c_x, c_y, f
and the baseline is what I have measured:
T_x
So this works for now; only the units are not entirely clear to me. I used the values from the single-camera calibration, which are in pixels, set the baseline in meters, and divided the disparity map by 16, but the scale still doesn't seem right.
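As a sketch of what this looks like in code (the values below are placeholders, not real calibration numbers, and the sign of Tx depends on which camera is the "left" one):

// intrinsics from the single-camera calibration, in pixels (placeholder values)
double cx = 640.0, cy = 360.0, f = 1200.0;
// signed translation between the cameras, in meters; you may need the negative
// of the measured baseline to get positive depths
double Tx = -2.0;

cv::Mat Q = (cv::Mat_<double>(4, 4) <<
    1, 0, 0, -cx,
    0, 1, 0, -cy,
    0, 0, 0,  f,
    0, 0, -1.0 / Tx, 0);   // last entry would be (cx - cx')/Tx; 0 when cx == cx'

// SGBM outputs CV_16S disparities scaled by 16 -> convert to float pixels first
cv::Mat disp32F;
disp16.convertTo(disp32F, CV_32F, 1.0 / 16.0);

cv::Mat xyz;               // 3-channel float image: (X, Y, Z) per pixel
cv::reprojectImageTo3D(disp32F, xyz, Q, true);
// with f, cx, cy in pixels and Tx in meters, the output should be in meters (Z = f*B/d)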
By the way, the disparity map above was wrong ^^ and it looks better now. You have to apply an anti-shearing transform, because stereoRectifyUncalibrated shears your image (not documented?).
This is described in section "7 Shearing Transform" of this paper by Charles Loop and Zhengyou Zhang:
http://research.microsoft.com/en-us/um/people/Zhang/Papers/TR99-21.pdf
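Here is my reading of section 7 as code, in case it helps anyone (use with care, I may have transcribed the formulas incorrectly; H is a rectification homography from stereoRectifyUncalibrated, w and h the image width and height):

static cv::Mat shearingCorrection(const cv::Mat& H, double w, double h)
{
    // helper: apply H to a pixel and de-homogenize
    auto mapPoint = [&H](double px, double py) {
        cv::Mat v = (cv::Mat_<double>(3, 1) << px, py, 1.0);
        cv::Mat q = H * v;
        return cv::Point2d(q.at<double>(0) / q.at<double>(2),
                           q.at<double>(1) / q.at<double>(2));
    };

    // midpoints of the four image edges, mapped through H (a, b, c, d in the paper)
    cv::Point2d a = mapPoint((w - 1) / 2.0, 0.0);
    cv::Point2d b = mapPoint(w - 1.0, (h - 1) / 2.0);
    cv::Point2d c = mapPoint((w - 1) / 2.0, h - 1.0);
    cv::Point2d d = mapPoint(0.0, (h - 1) / 2.0);

    cv::Point2d x = b - d;   // horizontal mid-line after rectification
    cv::Point2d y = c - a;   // vertical mid-line after rectification

    // closed-form shear that restores perpendicularity and aspect ratio
    double k1 = (h * h * x.y * x.y + w * w * y.y * y.y) /
                (h * w * (x.y * y.x - x.x * y.y));
    double k2 = (h * h * x.x * x.y + w * w * y.x * y.y) /
                (h * w * (x.x * y.y - x.y * y.x));
    if (k1 < 0) { k1 = -k1; k2 = -k2; }

    return (cv::Mat_<double>(3, 3) << k1, k2, 0, 0, 1, 0, 0, 0, 1);
}
// usage: H1 = shearingCorrection(H1, w, h) * H1;  (and the same for H2) before warping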
Result:
http://i.stack.imgur.com/UkuJi.jpg

Related

OpenCV 3.4 camera calibration yields strange principal point

I'm doing camera calibration using the calibration.cpp sample provided in the OpenCV 3.4 release. I'm using a simple 9x6 chessboard, with square length = 3.45 mm.
Command to run the code:
Calib.exe -w=9 -h=6 -s=3.45 -o=camera.yml -oe imgList.xml
imgList.xml
I'm using a batch of 28 images available here
camera.yml (output)
Image outputs from drawChessboardCorners: here
There are 4 images without the chessboard overlay drawn; findChessboardCorners failed for these.
The results look kind of strange (if I understand them correctly). I'm taking the focal length value for granted, but the principal point seems way off at c = (834, 1513). I was expecting a point closer to the image center, (1280, 960), since the camera is oriented very close to 90 degrees to the viewed surface.
Also if I place an object at the principal point and move it in the Z axis I shouldn't see it move along x and y in the image, is this correct?
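(Sanity check with the ideal pinhole model, ignoring distortion: u = f_x * X/Z + c_x and v = f_y * Y/Z + c_y, so a point on the optical axis, X = Y = 0, projects to (c_x, c_y) for every Z.)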
I suspect I should add images with greater tilt of the chessboard with respect to the camera to get better results (z-angle). But the camera has a really narrow depth of field, and this prevents the chessboard corners from being detected.
The main issue is that you don't feed the calibration software enough information to correctly estimate the different parameters.
In all 28 images you changed only the orientation of the chessboard around the Z axis, within the same plane. You don't need to take that many photos; around 15 is usually enough for me. You need to add more degrees of freedom to your images: change the distance of the chessboard from the camera and tilt the chessboard around its X and Y axes. Recalibrate the camera and you should get the right parameters.
It really depends on the camera and lens you use.
More specifically on things like:
precision of the sensor chip placement
mounting of the lens screw thread
manufacturing tolerances of the lens itself
Some cheap webcams with a small chip can even have the principal point outside the image area (which means it can also be a negative number). So in your case C could be either (834, 1513) or (1513, 834).
If you are using an industrial camera or something similar, C should be within a few tens of percent of the image centre, e.g. (1280, 960) +- 25%.
About the problem with narrow DOF (in a nutshell): to make it wider you need to make the aperture as small as possible, lengthen the exposure, and add some extra light behind the camera to compensate for the smaller aperture.
You could also refocus to get sharp shots from different distances, but your accuracy gets lower, since refocusing slightly changes the focal length. In most cases you do not need that extra level of accuracy, so this should not be a problem.

TOF camera calibration - distance to chessboard

For my application, I need to calibrate my TOF camera (Kinect v2).
I have done this with the MATLAB camera calibration tools. After calibration, I noticed that right-angled planes come out oblique.
For example, here is a result with two "right-angled" planes:
I think this result is oblique because of wrong parameters from the calibration. Therefore I want to improve my calibration process for the Kinect.
So I have three major questions:
Is the distance between the TOF camera and the chessboard important for the calibration result? For my application, I need quite high accuracy in the interval 2.5 m - 3 m (Z-distance between camera and object), so I chose this interval to get the best result for that range, especially because it is a TOF camera. Or should I use a rather short distance (1-1.5 m) to capture the chessboard at high resolution?
What kind of images (viewpoint: rotated, oblique / chessboard in the middle or in the corners) are important to get a good result for the tangential distortion (I think this parameter comes out totally wrong)? Any tips to improve my results here?
I fixed my chessboard to a flat wall and mounted my camera on a tripod. For the different calibration images I move the tripod. Is this procedure also OK, or do I have to move the chessboard pattern to get better results?

OpenCV stitch images by warping both

I have already found a lot of questions and answers about image stitching and warping with OpenCV, but I still could not find an answer to my question.
I have two fisheye cameras which I calibrated successfully so the distortion is removed in both images.
Now I want to stitch those rectified images together. So I pretty much follow this example which is also mentioned in a lot of other stitching questions:
Image Stitching Example
So I do the keypoint and descriptor detection, find matches, and get the homography matrix so I can warp one of the images, which gives me a really stretched image as a result. The other image stays untouched. The stretching is something I want to avoid, so I found a nice solution here:
Stretch solution.
On slide 7 you can see that both images are warped. I think this will reduce the stretching of a single image (in my opinion the stretching will be split, for example 50:50). If I am wrong, please tell me.
The problem is that I don't know how to warp two images so that they fit together. Do I have to calculate two homographies? Do I have to define a reference plane, like a Rect() or something? How can I achieve a warping result as shown on slide 7?
To make it clear, I am not studying at TU Dresden so this is just something I found while doing research.
Warping one of the two images into the coordinate frame of the other is more common, because it is easier: one can directly compute the 2D warping transformation from image correspondences.
Warping both images into a new coordinate frame is possible but more complex, because it involves 3D transformations and requires accurately defining a new 3D coordinate frame with respect to the initial two.
The basic idea is (very roughly) represented in the hand drawing on slide #2 of the linked presentation. I made a bigger one:
Basically, the procedure would be as follows:
If your cameras are calibrated, you can estimate the relative 3D pose between the two images exclusively from feature correspondences by computing the fundamental matrix, deducing the essential matrix [HZ03 paragraph 9.6 and equation 9.12], and deducing the relative pose [HZ03 paragraph 9.6.2]. Hence, you can estimate for example the 3D rigid transformation T_2<-1 mapping the coordinate frame of img1 onto the coordinate frame of img2:
T_2<-1 = R_2<-1 * [ I_3 | 0 ]
From this, you can define very accurately the image plane for the new image, with respect to the other two images. For example:
T_n<-1 = sqrt(R_2<-1) * [ I_3 | 0 ]
T_n<-2 = T_n<-1 * T_2<-1^-1
From these two relative poses, you can derive the 2D pixel transformations that warp the two images into the new image plane [HZ03, example 13.2]. Basically, the warping homographies from img1 to the new image and from img2 to the new image are, respectively:
H_n<-1 = K * R_n<-1 * K^-1
H_n<-2 = K * R_n<-2 * K^-1
Then you can also compute the range of valid pixels (i.e. xmin, xmax, ymin, ymax) in the new image plane, to crop it and form a new image.
Note that step #3 assumes that the images are taken from the same point in space (pure camera rotation), otherwise there could be some parallax between the images, which could produce visible stitching imperfections.
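For completeness, a rough OpenCV sketch of this procedure under those assumptions (same intrinsic matrix K for both images, stored as CV_64F; pts1/pts2 are the matched points; findEssentialMat/recoverPose are used as a shortcut for the F -> E -> pose chain of step 1; all names are mine):

// step 1: relative pose from the feature correspondences
cv::Mat E = cv::findEssentialMat(pts1, pts2, K, cv::RANSAC, 0.999, 1.0);
cv::Mat R21, t21;
cv::recoverPose(E, pts1, pts2, K, R21, t21);      // R_2<-1, t_2<-1

// step 2: "square root" of the rotation = half the angle in axis-angle form
cv::Mat r21, rHalf, Rn1, Rn2;
cv::Rodrigues(R21, r21);
rHalf = 0.5 * r21;
cv::Rodrigues(rHalf, Rn1);                        // R_n<-1 = sqrt(R_2<-1)
Rn2 = Rn1 * R21.t();                              // R_n<-2 = R_n<-1 * R_2<-1^-1

// step 3: warping homographies into the new image plane
cv::Mat Hn1 = K * Rn1 * K.inv();
cv::Mat Hn2 = K * Rn2 * K.inv();

cv::Mat warped1, warped2;
cv::warpPerspective(img1, warped1, Hn1, outSize);
cv::warpPerspective(img2, warped2, Hn2, outSize);
// then blend warped1/warped2 and crop to the range of valid pixels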
Hope this helps.
Reference: [HZ03] Hartley, Richard, and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge university press, 2003.

projection matrix from homography

I'm working on stereo-vision with the stereoRectifyUncalibrated() method under OpenCV 3.0.
I calibrate my system with the following steps:
Detect and match SURF feature points between images from 2 cameras
Apply findFundamentalMat() with the matched pairs
Get the rectifying homographies with stereoRectifyUncalibrated().
For each camera, I compute a rotation matrix as follows:
R1 = cameraMatrix[0].inv()*H1*cameraMatrix[0];
To compute 3D points, I need the projection matrices, but I don't know how to estimate the translation vector.
I tried decomposeHomographyMat() and this solution https://stackoverflow.com/a/10781165/3653104 but the rotation matrix is not the same as what I get with R1.
When I check the rectified images with R1/R2 (using initUndistortRectifyMap() followed by remap()), the result seems correct (I checked with epipolar lines).
I am a little lost given my limited knowledge of computer vision, so I would appreciate it if somebody could explain this to me. Thank you :)
The code in the link you provided (https://stackoverflow.com/a/10781165/3653104) computes not the rotation but the 3x4 pose of the camera.
The last column of that pose is your translation vector.
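In OpenCV terms that would be something like this (pose being the 3x4 [R|t] matrix from that answer, assumed CV_64F; K is the camera matrix):

cv::Mat R = pose(cv::Rect(0, 0, 3, 3)).clone();   // left 3x3 block: rotation
cv::Mat t = pose.col(3).clone();                  // last column: translation vector
cv::Mat P = K * pose;                             // 3x4 projection matrix P = K * [R|t]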

How to verify that the camera calibration is correct? (or how to estimate the error of reprojection)

The quality of a calibration is measured by the reprojection error (is there an alternative?), which requires knowledge of the world coordinates of some 3D point(s).
Is there a simple way to produce such known points? Is there a way to verify the calibration in some other way (for example, Zhang's calibration method only requires that the calibration object be planar, and the geometry of the system need not be known)?
You can verify the accuracy of the estimated nonlinear lens distortion parameters independently of pose. Capture images of straight edges (e.g. a plumb line, or a laser stripe on a flat surface) spanning the field of view (an easy way to span the FOV is to rotate the camera keeping the plumb line fixed, then add all the images). Pick points on said line images, undistort their coordinates, fit mathematical lines, compute error.
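One possible way to script that check with OpenCV (just a sketch; linePts holds the points picked along the imaged straight edge, K and distCoeffs are your calibration results):

std::vector<cv::Point2f> linePts;        // points picked along the imaged straight edge
std::vector<cv::Point2f> undist;
// undistort and reproject back to pixel coordinates (passing P = K keeps pixel units)
cv::undistortPoints(linePts, undist, K, distCoeffs, cv::noArray(), K);

cv::Vec4f line;                          // (vx, vy, x0, y0) of the fitted line
cv::fitLine(undist, line, cv::DIST_L2, 0, 0.01, 0.01);

double maxErr = 0.0;
for (const auto& p : undist) {
    // perpendicular distance from the point to the fitted line
    double d = std::abs((p.x - line[2]) * line[1] - (p.y - line[3]) * line[0]);
    maxErr = std::max(maxErr, d);
}
// maxErr should stay within a fraction of a pixel if the distortion model fits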
For the linear part, you can also capture images of multiple planar rigs at a known relative pose, either moving one planar target with a repeatable/accurate rig (e.g. a turntable), or mounting multiple planar targets at known angles from each other (e.g. three planes at 90 deg from each other).
As always, a compromise is in order between accuracy requirements and budget. With enough money and a friendly machine shop nearby you can let your fantasy run wild with rig geometry. I had once a dodecahedron about the size of a grapefruit, machined out of white plastic to 1/20 mm spec. Used it to calibrate the pose of a camera on the end effector of a robotic arm, moving it on a sphere around a fixed point. The dodecahedron has really nice properties in regard to occlusion angles. Needless to say, it's all patented.
The images used in generating the intrinsic calibration can also be used to verify it. A good example of this is the camera-calib tool from the Mobile Robot Programming Toolkit (MRPT).
Per Zhang's method, the MRPT calibration proceeds as follows:
1. Process the input images:
1a. Locate the calibration target (extract the chessboard corners).
1b. Estimate the camera's pose relative to the target, assuming that the target is a planar chessboard with a known number of intersections.
1c. Assign points on the image to a model of the calibration target in relative 3D coordinates.
2. Find an intrinsic calibration that best explains all of the models generated in 1b/1c.
Once the intrinsic calibration is generated, we can go back to the source images.
For each image, multiply the estimated camera pose with the intrinsic calibration, then apply that to each of the points derived in 1c.
This will map the relative 3D points from the target model back to the 2D calibration source image. The difference between the original image feature (chessboard corner) and the reprojected point is the calibration error.
MRPT performs this test on all input images and will give you an aggregate reprojection error.
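If you are not using MRPT, the same aggregate number can be computed with plain OpenCV from the outputs of calibrateCamera (a sketch; objectPoints/imagePoints are the per-view 3D/2D correspondences, rvecs/tvecs the per-view poses):

double totalErr = 0.0;
size_t totalPts = 0;
for (size_t i = 0; i < objectPoints.size(); ++i) {
    std::vector<cv::Point2f> reproj;
    cv::projectPoints(objectPoints[i], rvecs[i], tvecs[i], K, distCoeffs, reproj);
    double err = cv::norm(imagePoints[i], reproj, cv::NORM_L2);  // L2 norm of the residuals
    totalErr += err * err;
    totalPts += objectPoints[i].size();
}
double rmsReprojError = std::sqrt(totalErr / totalPts);   // in pixels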
If you want to verify a full system, including both the camera intrinsics and the camera-to-world transform, you will probably need to build a jig that places the camera and target in a known configuration, then test calculated 3D points against real-world measurements.
On Engine's question: the pose matrix is a [R|t] matrix, where R is a pure 3D rotation and t a translation vector. If you have computed a homography from the image, section 3.1 of Zhang's Microsoft Technical Report (http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf) gives a closed-form method to obtain both R and t using the known homography and the intrinsic camera matrix K. (I can't comment, so I added this as a new answer.)
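For reference, my reading of that closed form as code (a sketch only; H maps the plane's X-Y coordinates to pixels and K is the intrinsic matrix, both CV_64F):

cv::Mat Kinv = K.inv();
cv::Mat h1 = H.col(0), h2 = H.col(1), h3 = H.col(2);

double lambda = 1.0 / cv::norm(Kinv * h1);
cv::Mat r1 = lambda * Kinv * h1;
cv::Mat r2 = lambda * Kinv * h2;
cv::Mat r3 = r1.cross(r2);
cv::Mat t  = lambda * Kinv * h3;

cv::Mat R;
cv::hconcat(std::vector<cv::Mat>{r1, r2, r3}, R);
// because of noise, R is only approximately a rotation; the report suggests
// replacing it with the nearest true rotation via an SVD
cv::SVD svd(R);
R = svd.u * svd.vt;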
Given enough variability in the calibration rig poses, it should come down to just variance and bias in the calibration (pixel reprojection) errors. It is better to visualize these errors than to look only at the numbers: for example, error vectors pointing toward the center indicate a wrong focal length, and observing curved lines gives intuition about the distortion coefficients.
To calibrate the camera one has to jointly solve for the extrinsic and intrinsic parameters. The latter can be known from the manufacturer; solving for the extrinsics (rotation and translation) involves decomposing the calculated homography:
Calculate a Homography with only Translation, Rotation and Scale in Opencv
The homography is used here since most calibration targets are flat.