how to perform dense matching between two arbitrary angle stereo pairs? - computer-vision

I have stereo pairs from two orbits of a satellite. I am trying to generate dense match points between these pairs so that I can estimate 3D world coordinates using elements of Satellite-Earth Geometry.
Question: Which method/approach can be used to match such arbitrary stereo pairs (the images are not rectified, so the parallax is not confined to the x direction)?
So far:
1) I am successful at using Hierarchical image matching based on normalized cross correlation which gives good results but takes a lot of time.
2) I have tried Semi-Global Matching on images obtained after uncalibrated rectification. This method does not give good correspondences, although the disparity map it generates is nice and smooth. Uncalibrated rectification is also a very sensitive process.
Assume I do not have any camera parameters with me.
Edit:
1) The images are taken by a pushbroom camera with a line sensor.
2) The images are neither georeferenced nor orthorectified, they are just radiometrically corrected.

Related

Stitching images from 2 overlapping cameras stationary relative to each-other

I'm new to CV, and trying to stitch together a video of two cameras which are stationary one relative to the other. The details:
The cameras are one beside the other and I can adjust the rotation angle between them. The cameras will be moving with respect to the world, so the scene will be changing.
The number of frames to be stitched is roughly 300 (each frame is composed of two pictures, one from each camera).
I don't need to do the stitching in real time, but I want to do it as fast as possible using the fact that I know the relative positions of the cameras. Resolution of each picture is relatively high, around 900x600.
Right now I'm at the stage where I have code to stitch 2 single pictures, courtesy of http://ramsrigoutham.com/2012/11/22/panorama-image-stitching-in-opencv/
The main stages are:
Using SURF detector to find SURF descriptor in both images
matching the SURF descriptor using FLANN Matcher
Postprocessing matches to find good matches
Using RANSAC to estimate the Homography matrix using the matched SURF descriptors
Warping the images based on the homography matrix
My question is: How can I optimize the process based on the fact that I already know the camera positions?
Ideally I would like to do some initial calculation once to find the transform between the camera perspectives, and then reuse it. But not sure with my rudimentary CV knowledge if this is indeed possible, and what transform I could use if so.
I understand that calculating the homography matrix once and reusing it won't work, since the scene is changing.
Two other possibilities:
I found a similar case (but stationary scene) where the transform is computed once and reused. Which transform is this, and could it work in my case?
The other possibility I found is to use the initial knowledge to find the overlapping region between two pictures, and ignore the rest of the pictures to save time. Relevant thread
Any help would be greatly appreciated!
Ron

displacement between two images using opencv surf

I am working on image processing with OPENCV.
I want to find the x,y and the rotational displacement between two images in OPENCV.
I have found the features of the images using SURF and the features have been matched.
Now I want to find the displacement between the images. How do I do that? Can RANSAC be useful here?
regards,
shiksha
A rotation and two translations are three unknowns, so the minimum number of matches is two (since each match delivers two equations, or constraints). Imagine a line segment between two points in one image and the corresponding (matched) line segment in the other image. The difference between the segments' orientations gives you the rotation angle. After rotating, use any of the matched points to find the translation. This is therefore a 3DOF problem that requires two points. It is called a Euclidean transformation, a rigid body transformation, or the orthogonal Procrustes problem.
Using a homography (an 8DOF problem) here is a bad idea: fitting it by minimizing geometric error has no closed-form solution and relies on non-linear optimization. It is slow (in the RANSAC case) and inaccurate, since it adds 5 extra DOF. RANSAC is only needed if you have outliers. With pure noise and an overdetermined system (more than 2 points), the optimal solution that minimizes the sum of squared geometric distances between matched points is given in closed form by:
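Problem statement: $\min_{R,t} \sum_i \lVert R P_i + t - Q_i \rVert^2$, where $R$ is the rotation and $t$ the translation
Solution: $R = V U^T$, $t = Q_{mean} - R P_{mean}$
where $X = P - P_{mean}$, $Y = Q - Q_{mean}$, and we take the SVD to get $X Y^T = U L V^T$; all matrices have data points as columns. If $\det(V U^T)$ is negative the result is a reflection, and the sign of the last column of $V$ must be flipped before forming $R$. For a gentle intro into rigid transformations see this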
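As a minimal sketch of the closed-form fit described above, here is a plain Python version for the 2D case. In 2D the SVD solution reduces to a single `atan2` of the summed cross and dot products of the centered points, so no linear algebra library is needed. The function name `fit_rigid_2d` and the convention R*P + t ≈ Q are assumptions made for illustration; outliers are assumed to have been removed already (e.g. by RANSAC).

```python
import math

def fit_rigid_2d(P, Q):
    """Least-squares rotation angle and translation such that R*P + t ~= Q.

    P and Q are equal-length lists of matched (x, y) points.
    Assumes the matches are outlier-free.
    """
    n = len(P)
    # Centroids of both point sets (Pmean and Qmean in the text).
    pmx = sum(p[0] for p in P) / n
    pmy = sum(p[1] for p in P) / n
    qmx = sum(q[0] for q in Q) / n
    qmy = sum(q[1] for q in Q) / n

    # Accumulate dot and cross products of the centered points.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        x0, x1 = px - pmx, py - pmy
        y0, y1 = qx - qmx, qy - qmy
        dot += x0 * y0 + x1 * y1
        cross += x0 * y1 - x1 * y0

    # The optimal angle maximizes cos(theta)*dot + sin(theta)*cross.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # t = Qmean - R * Pmean
    tx = qmx - (c * pmx - s * pmy)
    ty = qmy - (s * pmx + c * pmy)
    return theta, (tx, ty)
```

With exactly two matches this reproduces the "rotate the segment, then translate" construction; with more matches it averages out the noise.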

OpenCV Image stitching when camera parameters are known

We have pictures taken from a plane flying over an area with 50% overlap, and we are using the OpenCV stitching algorithm to stitch them together. This works fine for our version 1. For our next iteration we want to look into a few extra things, and I would appreciate some comments on them.
Currently the stitching algorithm estimates the camera parameters. We do have camera parameters and a lot of information available from the plane about camera angle, position (GPS) etc. Would we benefit from using this information, as opposed to letting the algorithm estimate everything from matched feature points?
These images are high resolution and the algorithm currently uses a large amount of RAM. That is not a big problem, as we just spin up large machines in the cloud. But in our next iteration I would like to compute the homography from downsampled images and apply it to the large images later. This will also give us more options to manipulate and visualize other information on the original images, and to go back and forth between the original and stitched images.
If, for question 1, we take the stitching algorithm apart to put in the known information, do we just use the findHomography method, or are there better alternatives for creating the homography when we actually know the plane's position and angles and the camera parameters?
I have a basic understanding of OpenCV and am fine with C++ programming, so writing our own customized stitcher is not a problem, but my theory is a bit rusty here.
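On the idea of estimating the homography on downsampled images and applying it at full resolution: a minimal sketch of the rescaling, assuming both images were shrunk by the same factor `s` about the pixel origin (half-pixel offset conventions are ignored here). If S = diag(s, s, 1) maps full-resolution to downsampled coordinates, then H_full = S^-1 * H_small * S. The names `upscale_homography` and `apply_h` are hypothetical helpers, not OpenCV functions.

```python
def upscale_homography(H_small, s):
    """Convert a homography estimated on images downsampled by factor s
    (e.g. s = 0.25) into one valid for the full-resolution images.

    Computes S^-1 * H_small * S with S = diag(s, s, 1).
    """
    inv_diag = [1.0 / s, 1.0 / s, 1.0]  # diagonal of S^-1
    diag = [s, s, 1.0]                  # diagonal of S
    return [[inv_diag[i] * H_small[i][j] * diag[j] for j in range(3)]
            for i in range(3)]

def apply_h(H, x, y):
    """Apply a 3x3 homography to a point, including the perspective divide."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v
```

In effect the upper-left 2x2 block is unchanged, the translation terms are divided by `s`, and the two perspective terms are multiplied by `s`.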
Since you are using homographies to warp your imagery, I assume you are capturing areas small enough that you don't have to worry about Earth curvature effects. Also, I assume you don't use an elevation model.
Generally speaking, you will always want to tighten your (homography) model using matched image points, since your final output is a stitched image. If you have the RAM and CPU budget, you could refine your linear model using a maximum-likelihood estimator.
A prior motion model (e.g. from GPS + IMU) could be used to initialize the feature search and matching. With a good enough initial estimate of the apparent feature motion, you could dispense with expensive feature descriptor computation and storage, and just use normalized cross-correlation.
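To illustrate searching with normalized cross-correlation around a predicted position, here is a minimal 1D sketch (a single image row rather than a 2D patch, to keep it short). The helper names `ncc` and `best_offset`, and the `predicted`/`radius` parameters standing in for the GPS + IMU prior, are assumptions for illustration.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    # A flat (zero-variance) window has no defined correlation; score it 0.
    return num / den if den > 0 else 0.0

def best_offset(template, row, predicted, radius):
    """Find the offset in `row` that best matches `template`,
    searching only within `radius` pixels of the predicted offset."""
    lo = max(0, predicted - radius)
    hi = min(len(row) - len(template), predicted + radius)
    best_off, best_score = None, -2.0  # NCC is always >= -1
    for off in range(lo, hi + 1):
        score = ncc(template, row[off:off + len(template)])
        if score > best_score:
            best_off, best_score = off, score
    return best_off, best_score
```

Because NCC subtracts the mean and divides by the standard deviation, it tolerates gain and offset differences in brightness between the two images. The tighter the prior, the smaller `radius` can be.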
If I understand correctly, the images are taken vertically and overlap by a known number of pixels. In that case calculating a homography is a bit overkill: you're just talking about a translation matrix, and using more powerful algorithms can only give you badly conditioned matrices.
In 2D, if H is a general homography matrix representing a perspective transformation,
H=[[a1 a2 a3] [a4 a5 a6] [a7 a8 a9]]
then, if a9==1, the submatrices R and T represent rotation and translation, respectively,
R= [[a1 a2] [a4 a5]], T=[[a3] [a6]]
while [a7 a8] carries the perspective (projective) distortion; any scaling or shear of the axes shows up in R together with the rotation. (All of this is a bit approximate, since when all effects are present they'll influence each other.)
So, if you know the lateral displacement, you can create a 3x3 matrix that is the identity except for a3 and a6 (so a1 = a5 = a9 = 1) and pass it to cv::warpPerspective, or pass its top two rows as a 2x3 matrix to cv::warpAffine.
As a criterion of matching correctness you can, for example, calculate a normalized difference between pixels.
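A minimal sketch of the pure-translation case in plain Python, assuming the displacement (dx, dy) in pixels is known. The helpers `translation_homography` and `apply_homography` are hypothetical names used only for illustration; in OpenCV the resulting 3x3 matrix is the kind of input cv::warpPerspective expects.

```python
def translation_homography(dx, dy):
    """3x3 homography that only translates by (dx, dy):
    identity rotation block, zero perspective terms, a9 = 1."""
    return [[1.0, 0.0, dx],
            [0.0, 1.0, dy],
            [0.0, 0.0, 1.0]]

def apply_homography(H, x, y):
    """Map a point through H, including the perspective divide."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    u = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
    v = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
    return u, v

# Example: images overlapping with a 300-pixel shift along x.
H = translation_homography(300.0, 0.0)
corner = apply_homography(H, 0.0, 0.0)  # where the second image's origin lands
```

Since the perspective row is [0 0 1], the divide is by 1 and every point is simply shifted by (dx, dy).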

aligning 2 face images based on their marker points

I am using OpenCV and C++. I have 2 face images which contain marker points on them. I have already found the coordinates of the marker points. Now I need to align those 2 face images based on those coordinates. The 2 images are not necessarily of the same height, which is why I can't figure out how to start aligning them, what should be done, etc.
In your case, you cannot apply the homography-based alignment procedure. Why not? Because it does not fit this use case: it was designed to align flat surfaces. Faces (3D objects) with markers at different places and depths are clearly not planar surfaces.
Instead, you can:
try to match the markers between images, then interpolate the displacement field for the other pixels. Classical ways of doing this include moving least squares interpolation or RBFs (radial basis functions);
otherwise, a more "Face Processing" way of doing it would be to decompose each face image into a texture and a face model (as an AAM, Active Appearance Model, does) and work on the faces in that decomposed representation.
Define "align".
Or rather, notice that there does not exist a unique warp of the face-side image that matches the overlapping parts of the frontal one, meaning that there are infinitely many such warps.
So you need to better specify what your goal is, and what extra information you have, in addition to the images and a few matched points on them. For example, is your camera setup calibrated? I.e do you know the focal lengths of the cameras and their relative position and poses?
Are you trying to build a texture map (e.g. a projective one) so you can plaster a "merged" face image on top of a 3d model that you already have? Then you may want to look into cylindrical or spherical maps, and build a cylindrical or spherical projection of your images from their calibrated poses.
Or are you trying to reconstruct the whole 3D shape of the head based on those 2 views? Obviously you can do this only over the small strip where the two images overlap, and the quality of the images you posted seems a little too poor for that.
Or...?

3D reconstruction using stereo vision - theory

I am currently reading into the topic of stereo vision, using the book by Hartley & Zisserman alongside some papers, as I am trying to develop an algorithm capable of creating elevation maps from two images.
I am trying to come up with the basic steps for such an algorithm. This is what I think I have to do:
If I have two images, I somehow have to find the fundamental matrix, F, in order to find the actual elevation values at all points by triangulation later on. If the cameras are calibrated this is straightforward; if not, it is slightly more complex (plenty of methods for this can be found in H&Z).
It is necessary to know F in order to obtain the epipolar lines. The epipolar line in the second image is the line on which the match for an image point x from the first image must lie.
Now comes the part where it gets a bit confusing for me:
Now I would take an image point x_i in the first picture and try to find the corresponding point x_i' in the second picture, using some matching algorithm. Using triangulation it is now possible to compute the real-world point X and from that its elevation. This process will be repeated for every pixel in the first image.
In the perfect world (no noise etc) triangulation will be done based on
$x_1 = P_1 X$
$x_2 = P_2 X$
In the real world it is necessary to find a best fit instead.
Doing this for all pixels will lead to the complete elevation map as desired, some pixels will however be impossible to match and therefore can't be triangulated.
What confuses me most is that I have the feeling that Hartley & Zisserman skip the entire discussion of how to obtain the point correspondences (matching?), and that the papers I read in addition to the book talk a lot about disparity maps, which aren't mentioned in H&Z at all. However, I think I understood correctly that the disparity is simply the difference $x_{1,i} - x_{2,i}$?
Is this approach correct, and if not where did I make mistakes?
Your approach is in general correct.
You can think of a stereo camera system as two points in space whose relative orientation is known. These are the optical centers. In front of each optical center you have a coordinate system. These are the image planes. When you have found two corresponding pixels, you can calculate a line for each pixel, which goes through the pixel and the respective optical center. Where the two lines intersect is the object point in 3D. Because the world is not perfect, they will probably not intersect, and one may use the point where the lines are closest to each other.
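The "point where the lines are closest" can be computed directly. Below is a minimal sketch of this midpoint triangulation in plain Python, assuming each ray is given by an optical center `c` and a direction `d` in a common 3D frame (obtaining those from pixels requires the camera parameters, which is outside this sketch). The function name `midpoint_triangulate` is hypothetical.

```python
def midpoint_triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    c1, c2: optical centers; d1, d2: ray directions (3-tuples).
    Returns None if the rays are (nearly) parallel.
    """
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)

    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel rays: no unique closest point

    # Parameters of the closest points on each ray (least-squares solution).
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom

    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))
```

For noise-free correspondences the two closest points coincide and the midpoint is the exact intersection. The midpoint is a simple geometric choice; methods that minimize reprojection error in the images are generally preferred when accuracy matters.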
There exist several algorithms to detect which points correspond.
When using disparities, the two image planes need to be aligned such that the images are parallel and each row in image 1 corresponds to the same row in image 2. Then correspondences only need to be searched for on a per-row basis, and it is enough to know the difference along the x-axis between corresponding points. This difference is the disparity.
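For such a rectified pair, depth follows from disparity by similar triangles: Z = f * B / d. The sketch below assumes two identical pinhole cameras with focal length `focal_px` (in pixels) and baseline `baseline` (in the units wanted for Z); neither quantity is given in the question, so both are illustrative parameters.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline):
    """Depth of a point from its column positions in a rectified stereo pair.

    x_left, x_right: x-coordinates of the same point in the left/right image.
    Returns None when the disparity is not positive (point at infinity or
    an invalid match).
    """
    disparity = x_left - x_right
    if disparity <= 0:
        return None
    return focal_px * baseline / disparity
```

Depth is inversely proportional to disparity, so a one-pixel matching error matters much more for distant points than for near ones.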