OpenCV stitching planar images - C++

I'm writing a program that stitches aerial images in real time from a video taken by a drone. In short, what I'm doing is:
get two consecutive frames
find features between these two frames
calculate homography
calculate the image offset and stitch the frames together
The problem is in the 3rd step, and here is why: both my code and OpenCV's stitching function compute the homography assuming the camera rotates about some axis (as in a panorama). In reality the drone acts like a "scanner", so instead of a rotation I have a translation of the camera. This affects the homography and therefore the stitching step, creating artifacts. Is there a way to model a camera translation instead of a rotation (something like an orthographic assumption)?
If it helps, I have the intrinsic camera parameters computed with the calibration tutorial.
PS: I'm programming in C++.
EDIT
I've found this library written by NASA: http://ti.arc.nasa.gov/tech/asr/intelligent-robotics/nasa-vision-workbench/
Could it help? Can I use it together with OpenCV?
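For illustration, here is a minimal sketch of the frame-to-frame pipeline described above, with one hedged suggestion: when the motion is mostly a translation over roughly planar ground, a similarity/affine model (cv::estimateAffinePartial2D, OpenCV 3.2+) is often more stable than a full homography. The detector, matcher and the affine model are my own choices, not something from the original post.

```cpp
#include <opencv2/opencv.hpp>

// Estimate the transform between two consecutive frames (find features,
// match them, fit a motion model). ORB + brute-force matching are used here
// purely as an example; any detector/matcher pair works the same way.
cv::Mat estimateFrameTransform(const cv::Mat& prevFrame, const cv::Mat& currFrame)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(prevFrame, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(currFrame, cv::noArray(), kp2, desc2);

    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    std::vector<cv::Point2f> pts1, pts2;
    for (const auto& m : matches) {
        pts1.push_back(kp1[m.queryIdx].pt);
        pts2.push_back(kp2[m.trainIdx].pt);
    }

    // A full homography assumes a rotating camera or a planar scene; for a
    // translating "scanner" drone over flat ground a 4-DOF similarity model
    // is often better behaved (a suggestion, not the OP's code).
    return cv::estimateAffinePartial2D(pts2, pts1, cv::noArray(), cv::RANSAC);
}
```

The returned 2x3 matrix can then be applied with cv::warpAffine when compositing the mosaic.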

Related

Get 3-D world co-ordinates of specific feature points from stereo camera system

I have two webcams (let's call them the left and right webcams) that I am using as a stereo camera system. I have the following information regarding my setup:
Camera intrinsics and distortion coefficients of both the webcams;
Rotation and translation vectors of one of the webcams with respect to the other;
Fundamental and Essential matrices;
Reprojection matrices for both the webcams, so that the images from the webcams can be reprojected in such a way that the epipolar lines are horizontal.
Disparity map (using StereoSGBM)
Now, let's say that using Harris detection (or some other feature-detection technique) I have acquired a list of feature points from the left webcam image. How do I get the 3-D world co-ordinates of these points? Also, how would the procedure change if I had feature points from the right webcam image instead?
I am using OpenCV 3.4.10 C++ on Ubuntu 18.04
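One common way to use the pieces listed above is via the reprojection matrix Q from stereoRectify: a rectified left-image pixel (x, y) with disparity d maps to camera-frame 3-D coordinates through Q. A minimal sketch (my own, not from the question; it assumes a CV_16S StereoSGBM disparity map whose values are scaled by 16):

```cpp
#include <opencv2/opencv.hpp>

// Map a feature point from the rectified left image to 3-D camera coordinates.
// Q is the 4x4 reprojection matrix from stereoRectify; the caller should check
// that the disparity at the point is valid (invalid values are negative).
cv::Point3f featureTo3D(const cv::Point2f& pt, const cv::Mat& disparity16S, const cv::Mat& Q)
{
    float d = disparity16S.at<short>(cvRound(pt.y), cvRound(pt.x)) / 16.0f;
    std::vector<cv::Point3f> in{ cv::Point3f(pt.x, pt.y, d) }, out;
    cv::perspectiveTransform(in, out, Q);  // applies Q as a 4x4 homogeneous transform
    return out[0];
}
```

Feature points detected in the original (unrectified) left image would first have to be mapped into the rectified image, e.g. with undistortPoints using the left rectification R1/P1.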

Stitching images from 2 overlapping cameras stationary relative to each other

I'm new to CV and trying to stitch together a video from two cameras that are stationary relative to each other. The details:
The cameras are side by side, and I can adjust the rotation angle between them. The cameras will be moving with respect to the world, so the scene will be changing.
The amount of frames to be stitched is roughly 300 (each frame is composed of two pictures, one from each camera).
I don't need to do the stitching in real time, but I want to do it as fast as possible using the fact that I know the relative positions of the cameras. Resolution of each picture is relatively high, around 900x600.
Right now I'm at the stage where I have code to stitch 2 single pictures, courtesy of http://ramsrigoutham.com/2012/11/22/panorama-image-stitching-in-opencv/
The main stages are:
Using the SURF detector to find SURF descriptors in both images
Matching the SURF descriptors using the FLANN matcher
Post-processing the matches to keep only the good ones
Using RANSAC to estimate the homography matrix from the matched SURF descriptors
Warping the images based on the homography matrix (a minimal sketch of this last stage follows the list)
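For reference, the last stage boils down to something like the following (a sketch in the spirit of the linked tutorial; the canvas size and the assumption that H maps the right image into the left image's frame are mine):

```cpp
#include <opencv2/opencv.hpp>

// Warp the right image into the left image's frame and composite the pair.
// H is the homography estimated from the matched SURF descriptors.
cv::Mat stitchPair(const cv::Mat& left, const cv::Mat& right, const cv::Mat& H)
{
    cv::Mat canvas;
    cv::warpPerspective(right, canvas, H,
                        cv::Size(left.cols + right.cols, left.rows));
    left.copyTo(canvas(cv::Rect(0, 0, left.cols, left.rows)));
    return canvas;
}
```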
My question is: How can I optimize the process based on the fact that I already know the camera positions?
Ideally I would like to do some initial calculation once to find the transform between the camera perspectives and then reuse it. But I'm not sure, with my rudimentary CV knowledge, whether this is indeed possible, and which transform I could use if so.
I understand that calculating the homography matrix once and reusing it won't work, since the scene is changing.
Two other possibilities:
I found a similar case (but stationary scene) where the transform is computed once and reused. Which transform is this, and could it work in my case?
The other possibility I found is to use the initial knowledge to find the overlapping region between two pictures, and ignore the rest of the pictures to save time. Relevant thread
Any help would be greatly appreciated!
Ron

How do I re-project points in a camera-projector system (after calibration)?

I have seen many blog entries, videos and source code on the internet about how to carry out camera + projector calibration using OpenCV, in order to produce the camera.yml, projector.yml and projectorExtrinsics.yml files.
I have yet to see anyone discuss what to do with these files afterwards. Indeed, I have done a calibration myself, but I don't know what the next step is in my own application.
Say I write an application that now uses the calibrated camera-projector system to track objects and project something on them. I will use findContours() to grab some points of interest from the moving objects, and now I want to project these points (from the projector!) onto the objects!
What I want to do is (for example) track the centre of mass (COM) of an object and show a point on the camera view of the tracked object (at its COM). Then a point should be projected onto the COM of the object in real time.
It seems that projectPoints() is the OpenCV function I should use after loading the yml files, but I am not sure how I will account for all the intrinsic & extrinsic calibration values of both camera and projector. Namely, projectPoints() requires as parameters:
vector of points to re-project (duh!)
rotation + translation matrices. I think I can use the projectorExtrinsics here. Or I can use the composeRT() function to generate a final rotation & translation matrix from the projectorExtrinsics (which I have in the yml file) and the cameraExtrinsics (which I don't have; side question: shouldn't I save them too in a file?).
intrinsics matrix. This is tricky now. Should I use the camera or the projector intrinsics matrix here?
distortion coefficients. Again, should I use the projector or the camera coefficients here?
other params...
So if I use either the projector or the camera (which one?) intrinsics + coefficients in projectPoints(), then I will only be 'correcting' for one of the two instruments. Where/how will I use the other instrument's intrinsics?
What else do I need to use apart from loading the yml files and projectPoints()? (Perhaps undistortion?)
Any help on the matter is greatly appreciated.
If there is a tutorial or a book (no, O'Reilly's "Learning OpenCV" does not talk about how to use the calibration yml files either, only about how to do the actual calibration), please point me in that direction. I don't necessarily need an exact answer!
First, you seem to be confused about the general role of a camera/projector model: its role is to map 3D world points to 2D image points. This sounds obvious, but it means that, given extrinsics R,t (for orientation and position), distortion function D(.) and intrinsics K, you can infer for this particular camera the 2D projection m of a 3D point M as follows: m = K.D(R.M+t). The projectPoints function does exactly that (i.e. 3D to 2D projection) for each input 3D point, hence you need to give it the parameters associated with the camera in which you want your 3D points projected (projector K & D if you want projector 2D coordinates, camera K & D if you want camera 2D coordinates).
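A minimal illustration of that 3D-to-2D mapping with projectPoints (the variable names here are illustrative, not taken from the calibration files):

```cpp
#include <opencv2/opencv.hpp>

// Project 3-D points, expressed in the camera's coordinate frame, into the
// projector image. R, t are the camera-to-projector extrinsics; Kproj and
// distProj are the projector intrinsics and distortion coefficients.
std::vector<cv::Point2f> projectToProjector(const std::vector<cv::Point3f>& objectPoints,
                                            const cv::Mat& R, const cv::Mat& t,
                                            const cv::Mat& Kproj, const cv::Mat& distProj)
{
    cv::Mat rvec;
    cv::Rodrigues(R, rvec);  // projectPoints expects a rotation vector, not a matrix
    std::vector<cv::Point2f> imagePoints;
    cv::projectPoints(objectPoints, rvec, t, Kproj, distProj, imagePoints);
    return imagePoints;
}
```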
Second, when you jointly calibrate your camera and projector, you do not estimate a set of extrinsics R,t for the camera and another for the projector, but only one R and one t, which represent the rotation and translation between the camera's and projector's coordinate systems. For instance, this means that your camera is assumed to have rotation = identity and translation = zero, and the projector has rotation = R and translation = t (or the other way around, depending on how you did the calibration).
Now, concerning the application you mentioned, the real problem is: how do you estimate the 3D coordinates of a given point?
Using two cameras and one projector, this would be easy: you could track the objects of interest in the two camera images, triangulate their 3D positions from the two 2D projections using the function triangulatePoints, and finally project this 3D point into the projector's 2D coordinates using projectPoints in order to know where to display things with your projector.
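A rough sketch of that triangulation step (P1 and P2 are the 3x4 projection matrices of the two cameras, e.g. K1*[I|0] and K2*[R|t]; all names are mine):

```cpp
#include <opencv2/opencv.hpp>

// Triangulate a single point tracked in two calibrated cameras. The returned
// 3-D point can then be fed to projectPoints as shown above.
cv::Point3f triangulateOne(const cv::Mat& P1, const cv::Mat& P2,
                           const cv::Point2f& x1, const cv::Point2f& x2)
{
    std::vector<cv::Point2f> pts1{ x1 }, pts2{ x2 };
    cv::Mat points4D;
    cv::triangulatePoints(P1, P2, pts1, pts2, points4D);  // 4x1 homogeneous result
    points4D.convertTo(points4D, CV_32F);
    float w = points4D.at<float>(3, 0);                   // dehomogenize
    return cv::Point3f(points4D.at<float>(0, 0) / w,
                       points4D.at<float>(1, 0) / w,
                       points4D.at<float>(2, 0) / w);
}
```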
With only one camera and one projector, this is still possible but more difficult because you cannot triangulate the tracked points from only one observation. The basic idea is to approach the problem like a sparse stereo disparity estimation problem. A possible method is as follows:
project a non-ambiguous image (e.g. black and white noise) using the projector, in order to texture the scene observed by the camera.
as before, track the objects of interest in the camera image
for each object of interest, correlate a small window around its location in the camera image with the projector image, in order to find where it projects in the projector 2D coordinates
Another approach, which unlike the one above would use the calibration parameters, could be to do a dense 3D reconstruction using stereoRectify and StereoBM::operator() (or gpu::StereoBM_GPU::operator() for the GPU implementation), map the tracked 2D positions to 3D using the estimated scene depth, and finally project into the projector using projectPoints.
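A bare-bones version of that block-matching step, written against the OpenCV 3.x interface (StereoBM::create / compute) rather than the older operator() mentioned above; the parameters are placeholders:

```cpp
#include <opencv2/opencv.hpp>

// Dense disparity on a rectified, 8-bit grayscale pair; the tracked 2-D
// positions can then be looked up in this map and re-projected to 3-D.
cv::Mat denseDisparity(const cv::Mat& rectifiedLeft, const cv::Mat& rectifiedRight)
{
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(/*numDisparities=*/64, /*blockSize=*/21);
    cv::Mat disparity16S;
    bm->compute(rectifiedLeft, rectifiedRight, disparity16S);  // CV_16S, scaled by 16
    return disparity16S;
}
```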
Anyhow, this is easier, and more accurate, using two cameras.
Hope this helps.

How to verify that the camera calibration is correct? (or how to estimate the error of reprojection)

The quality of calibration is measured by the reprojection error (is there an alternative?), which requires knowledge of the world coordinates of some 3D point(s).
Is there a simple way to produce such known points? Is there a way to verify the calibration in some other way? (For example, Zhang's calibration method only requires that the calibration object be planar, and the geometry of the system need not be known.)
You can verify the accuracy of the estimated nonlinear lens distortion parameters independently of pose. Capture images of straight edges (e.g. a plumb line, or a laser stripe on a flat surface) spanning the field of view (an easy way to span the FOV is to rotate the camera keeping the plumb line fixed, then add all the images). Pick points on said line images, undistort their coordinates, fit mathematical lines, compute error.
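One way to turn that "straightness" test into a number (a sketch; the point picking on the line images is left out, and passing K as the new camera matrix keeps the undistorted points in pixel units):

```cpp
#include <cmath>
#include <vector>
#include <opencv2/opencv.hpp>

// Undistort the picked points of one straight-edge image, fit a line, and
// return the RMS perpendicular distance to that line (in pixels).
double lineStraightnessError(const std::vector<cv::Point2f>& pickedPts,
                             const cv::Mat& K, const cv::Mat& distCoeffs)
{
    std::vector<cv::Point2f> undist;
    cv::undistortPoints(pickedPts, undist, K, distCoeffs, cv::noArray(), K);

    cv::Vec4f line;  // (vx, vy, x0, y0), with (vx, vy) a unit direction
    cv::fitLine(undist, line, cv::DIST_L2, 0, 0.01, 0.01);

    double sumSq = 0.0;
    for (const auto& p : undist) {
        // Perpendicular distance from p to the fitted line.
        double d = (p.x - line[2]) * line[1] - (p.y - line[3]) * line[0];
        sumSq += d * d;
    }
    return std::sqrt(sumSq / undist.size());
}
```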
For the linear part, you can also capture images of multiple planar rigs at a known relative pose, either moving one planar target with a repeatable/accurate rig (e.g. a turntable), or mounting multiple planar targets at known angles from each other (e.g. three planes at 90 deg from each other).
As always, a compromise is in order between accuracy requirements and budget. With enough money and a friendly machine shop nearby you can let your fantasy run wild with rig geometry. I had once a dodecahedron about the size of a grapefruit, machined out of white plastic to 1/20 mm spec. Used it to calibrate the pose of a camera on the end effector of a robotic arm, moving it on a sphere around a fixed point. The dodecahedron has really nice properties in regard to occlusion angles. Needless to say, it's all patented.
The images used in generating the intrinsic calibration can also be used to verify it. A good example of this is the camera-calib tool from the Mobile Robot Programming Toolkit (MRPT).
Per Zhang's method, the MRPT calibration proceeds as follows:
1. Process the input images:
1a. Locate the calibration target (extract the chessboard corners)
1b. Estimate the camera's pose relative to the target, assuming that the target is a planar chessboard with a known number of intersections.
1c. Assign points on the image to a model of the calibration target in relative 3D coordinates.
2. Find an intrinsic calibration that best explains all of the models generated in 1b/c.
Once the intrinsic calibration is generated, we can go back to the source images.
For each image, multiply the estimated camera pose with the intrinsic calibration, then apply that to each of the points derived in 1c.
This will map the relative 3D points from the target model back to the 2D calibration source image. The difference between the original image feature (chessboard corner) and the reprojected point is the calibration error.
MRPT performs this test on all input images and will give you an aggregate reprojection error.
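The per-image test boils down to something like the following (variable names are illustrative; this mirrors what MRPT and OpenCV's own calibration samples report):

```cpp
#include <cmath>
#include <vector>
#include <opencv2/opencv.hpp>

// Reprojection error for one calibration image: project the 3-D model points
// of the target with the estimated pose and intrinsics, then compare against
// the detected chessboard corners.
double reprojectionError(const std::vector<cv::Point3f>& objectPoints,
                         const std::vector<cv::Point2f>& detectedCorners,
                         const cv::Mat& rvec, const cv::Mat& tvec,
                         const cv::Mat& K, const cv::Mat& distCoeffs)
{
    std::vector<cv::Point2f> projected;
    cv::projectPoints(objectPoints, rvec, tvec, K, distCoeffs, projected);
    double err = cv::norm(detectedCorners, projected, cv::NORM_L2);
    return std::sqrt(err * err / detectedCorners.size());  // RMS error in pixels
}
```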
If you want to verify a full system, including both the camera intrinsics and the camera-to-world transform, you will probably need to build a jig that places the camera and target in a known configuration, then test calculated 3D points against real-world measurements.
On Engine's question: the pose matrix is a [R|t] matrix, where R is a pure 3D rotation and t a translation vector. If you have computed a homography from the image, section 3.1 of Zhang's Microsoft Technical Report (http://research.microsoft.com/en-us/um/people/zhang/Papers/TR98-71.pdf) gives a closed-form method to obtain both R and t using the known homography and the intrinsic camera matrix K. (I can't comment, so I added this as a new answer.)
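A sketch of that closed-form recovery, writing the plane-induced homography as H = K[r1 r2 t] up to scale (H and K are assumed to be CV_64F; this follows the idea in the report rather than any particular library routine):

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

// Recover R and t from a homography H and the intrinsic matrix K.
void poseFromHomography(const cv::Mat& H, const cv::Mat& K, cv::Mat& R, cv::Mat& t)
{
    cv::Mat Kinv = K.inv();
    cv::Mat h1 = H.col(0), h2 = H.col(1), h3 = H.col(2);
    double lambda = 1.0 / cv::norm(Kinv * h1);  // scale factor
    cv::Mat r1 = lambda * (Kinv * h1);
    cv::Mat r2 = lambda * (Kinv * h2);
    cv::Mat r3 = r1.cross(r2);
    t = lambda * (Kinv * h3);
    cv::hconcat(std::vector<cv::Mat>{ r1, r2, r3 }, R);

    // [r1 r2 r3] is only approximately a rotation; project it onto SO(3).
    cv::Mat w, u, vt;
    cv::SVDecomp(R, w, u, vt);
    R = u * vt;
}
```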
Given enough variability in the calibration rig poses, what remains should just be variance and bias in the calibration (pixel re-projection) errors. It is better to visualize these errors than to just look at the values. For example, error vectors pointing towards the center would be indicative of a wrong focal length, while observing curved lines can give intuition about the distortion coefficients.
To calibrate the camera one has to solve jointly for the extrinsics and intrinsics. The latter can be known from the manufacturer; solving for the extrinsics (rotation and translation) involves decomposing the calculated homography: Decompose Homography matrix in opencv python
Calculate a Homography with only Translation, Rotation and Scale in Opencv
The homography is used here since most calibration targets are flat.
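OpenCV 3.0+ also ships a ready-made decomposition that returns up to four candidate (R, t, plane normal) solutions, which then have to be disambiguated, for example by checking that reconstructed points lie in front of the camera:

```cpp
#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>

// Decompose a plane-induced homography H, given the intrinsic matrix K, into
// candidate rotations, translations and plane normals.
void decomposePlaneHomography(const cv::Mat& H, const cv::Mat& K)
{
    std::vector<cv::Mat> Rs, ts, normals;
    int n = cv::decomposeHomographyMat(H, K, Rs, ts, normals);
    for (int i = 0; i < n; ++i)
        std::cout << "candidate " << i << ":\nR =\n" << Rs[i]
                  << "\nt = " << ts[i].t() << "\n";
}
```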

Warping images and problem with the homography matrix (OpenCV)

I have a problem with the results I'm getting from the function cvFindHomography().
The results it gives contain negative values; I will explain now what I'm doing.
I'm working on video stabilization using an optical flow method. I have estimated the locations of the features in the first and second frames. Now my aim is to warp the images in order to stabilize them. Before this step I have to calculate the homography matrix between the frames, for which I used the function mentioned above, but the result doesn't seem realistic because it has negative values, and the values can change to even stranger results:
0.482982 53.5034 -0.100254
-0.000865877 63.6554 -0.000213824
-0.0901095 0.301558 1
After obtaining these results I run into a problem when applying image warping with cvWarpPerspective(). The error shows that there is a problem with the matrices: incorrect transformation from cvarrToMat?
So where is the problem? Can you give me another suggestion if it's available?
Notice: if you can help me about implementing the warping in c++ it would be great.
Thank you
A poor homography estimate can generate warping errors inside cvWarpPerspective().
The homography values you have posted show that you have a full projective transformation that moves points at infinity to points in the 2D Euclidean plane, which could be wrong.
In video stabilization, to compute a good homography model, other features are usually used, such as Harris corners, Hessian-affine regions or SIFT/SURF, combined with a robust model estimator such as RANSAC or LMedS.
Check out this link for a MATLAB example on video stabilization...
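As a rough illustration of that advice, a stabilization sketch that tracks corners with optical flow (as the question does) and then fits the homography robustly with RANSAC; the parameter values are placeholders:

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

// Estimate the inter-frame homography from tracked corners and warp the
// current frame back towards the previous one.
cv::Mat stabilizeFrame(const cv::Mat& prevGray, const cv::Mat& currGray, const cv::Mat& currColor)
{
    std::vector<cv::Point2f> prevPts, currPts;
    // Shi-Tomasi corners; set the useHarrisDetector flag for Harris corners.
    cv::goodFeaturesToTrack(prevGray, prevPts, 500, 0.01, 10);

    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

    std::vector<cv::Point2f> p1, p2;
    for (size_t i = 0; i < status.size(); ++i)
        if (status[i]) { p1.push_back(prevPts[i]); p2.push_back(currPts[i]); }

    // RANSAC rejects the bad tracks that would otherwise skew the homography
    // towards degenerate values like those shown in the question.
    cv::Mat H = cv::findHomography(p2, p1, cv::RANSAC, 3.0);

    cv::Mat stabilized;
    cv::warpPerspective(currColor, stabilized, H, currColor.size());
    return stabilized;
}
```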