I'm working on a Project for my University where we want a Quadrcopter to stabilize himself with his camera. Unfortunately the Fundamental matrix reacts very sensible to little changes within the featurpoints, i'll give you examples later on.
I think my matching already works pretty good thanks to ocv.
I'm using SURF Features and match them with the knn-Method:
SurfFeatureDetector surf_detect;
surf_detect = SurfFeatureDetector(400);
//detect keypoints
surf_detect.detect(fr_one.img, fr_one.kp);
surf_detect.detect(fr_two.img, fr_two.kp);
//extract keypoints
SurfDescriptorExtractor surf_extract;
surf_extract.compute(fr_one.img, fr_one.kp, fr_one.descriptors);
surf_extract.compute(fr_two.img, fr_two.kp, fr_two.descriptors);
//match keypoints
vector<vector<DMatch> > matches1,matches2;
vector<DMatch> symMatches,goodMatches;
FlannBasedMatcher flann_match;
flann_match.knnMatch(fr_one.descriptors, fr_two.descriptors, matches1,2);
flann_match.knnMatch(fr_two.descriptors, fr_one.descriptors, matches2,2);
//test matches in both ways
symmetryTest(matches1,matches2,symMatches);
std::vector<cv::Point2f> points1, points2;
for (std::vector<cv::DMatch>::const_iterator it= symMatches.begin();
it!= symMatches.end(); ++it)
{
//left keypoints
float x= fr_one.kp[it->queryIdx].pt.x;
float y= fr_one.kp[it->queryIdx].pt.y;
points1.push_back(cv::Point2f(x,y));
//right keypoints
x = fr_two.kp[it->trainIdx].pt.x;
y = fr_two.kp[it->trainIdx].pt.y;
points2.push_back(cv::Point2f(x,y));
}
//kill outliers with ransac
vector<uchar> inliers(points1.size(),0);
findFundamentalMat(Mat(points1),Mat(points2),
inliers,CV_FM_RANSAC,3.f,0.99f);
std::vector<uchar>::const_iterator
itIn= inliers.begin();
std::vector<cv::DMatch>::const_iterator
itM= symMatches.begin();
for ( ;itIn!= inliers.end(); ++itIn, ++itM)
{
if (*itIn)
{
goodMatches.push_back(*itM);
}
}
Now i want to compute the Fundamental Matrix with these matches. I'm using the 8POINT method for this example - i already tried it with LMEDS and RANSAC - there it only get's worse because there are more matches which change.
vector<int> pointIndexes1;
vector<int> pointIndexes2;
for (vector<DMatch>::const_iterator it= goodMatches.begin();
it!= goodMatches.end(); ++it) {
pointIndexes1.push_back(it->queryIdx);
pointIndexes2.push_back(it->trainIdx);
}
vector<Point2f> selPoints1, selPoints2;
KeyPoint::convert(fr_one.kp,selPoints1,pointIndexes1);
KeyPoint::convert(fr_two.kp,selPoints2,pointIndexes2);
Mat F = findFundamentalMat(Mat(selPoints1),Mat(selPoints2),CV_FM_8POINT);
When i call these calculations within a loop on the same pair of images the result of F varies very much - theres no way to extract movement from such calculations.
I generated an example where i filtered out some matches so that you can see the effect i mentioned for yourselves.
http://abload.de/img/div_c_01ascel.png
http://abload.de/img/div_c_02zpflj.png
Is there something wrong with my code or do i have to think about other reasons like image-quality and so on ?
Thanks in advance for the Help !
derfreak
To summarize what others have already stated and elaborate in more detail,
As currently implemented in OpenCV, the 8-point algorithm has no outlier rejection. It is a least-squares algorithm and cannot be used with RANSAC or LMEDS because these flags override the 8-point flag. It is recommended that the input points are normalized to improve the condition number of the matrix in the linear equation, as stated in "In Defence of the 8-point Algorithm". However, the OpenCV implementation automatically normalizes the input points, so there is no need to normalize them manually.
The 5-point and 7-point algorithms both have outlier rejection, using RANSAC or LMEDS. If you are using RANSAC, you may need to tune the threshold to get good results. The OpenCV documentation shows that the default threshold for RANSAC is 1.0, which in my opinion is a bit large. I might recommend using something around 0.1 pixels. On the other hand, if you are using LMEDS you won't need to worry about the threshold, because LMEDS minimizes the median error instead of counting inliers. LMEDS and RANSAC both have similar accuracy if the correct threshold is used and both have comparable computation time.
The 5-point algorithm is more robust than the 7-point algorithm because it only has 5 degrees of freedom (3 rotation and 2 for the unit-vector translation) instead of 7 (the additional 2 parameters are for the camera principle points). This minimal parameterization allows the rotation and translation to be easily extracted from the matrix using SVD and avoids the planar structure degeneracy problem.
However, in order to get accurate results with the 5-point algorithm, the focal length must be known. The paper suggests that the focal length should be known within 10%, otherwise the 5-point algorithm is no better than the other uncalibrated algorithms. If you haven't performed camera calibration before, check out the OpenCV camera calibration tutorial. Also, if you are using ROS, there is a nice camera calibration package.
When using the OpenCV findEssentialMat function I recommend first passing the pixel points to undistortPoints. This not only reverses the effect of lens distortion, but also transforms the coordinates to normalized image coordinates. Normalized image coordinates (not to be confused with the normalization done in the 8-point algorithm) are camera agnostic coordinates that do not depend on any of the camera intrinsic parameters. They represent the angle of the bearing vector to the point in the real world. For example, a normalized image coordinate of (1, 0) would correspond to a bearing angle of 45 degrees from the optical axis of the camera in the x direction and 0 degrees in the y direction.
After using RANSAC to obtain a good hypothesis, the best estimate can be improved by using iterative robust non-linear least-squares. This is mentioned in the paper and described in more detail in "Bundle Adjustment - A Modern Synthesis". Unfortunately, it appears that the OpenCV implementation of the 5-point algorithm does not use any iterative refinement methods.
Even if your algorithm is correct, 8 point F matrix computation is very error prone due to image noise. The lesser correspondences you use the better. The best you can do is doing 5 point Essential (E) matrix computation, but that would require you to pre-calibrate the camera and convert the detected pixel image points after SIFT/SURF to normalized pixels (metric pixel locations). Then apply Nister's 5-point algorithm either from the freely available Matlab implementation or from Bundler (c++ implementation by Noah Snavely). In my experience with SfM, 5-point E matrix is much much better/stable than 7 or 8 point F matrix computation. And ofcourse do RANSAC after 5 point to get more robust estimates. Hope this helps.
The 8-point algorithm is the simplest method of computing fundamental matrix, but if care is taken you can perform it well. The key to obtain the good results is proper careful normalization of the input data before constructing the equations to solve. Many of algorithms can do it.
Pixels point coordinate must be changed to camera coordinates, I don't see that you are doing these. As I understand, your
vector<int> pointIndexes1; is expressed in the pixel coordinates.
You must known the intrinsic camera parameters, if you want get more stable results. You may find them by many methods: tutorial openCV. Then you have two options of normalize it. You may apply for your fundamental matrix,
Mat E = K.t() * F * K; where K is Intrinsic Camera Parameters.[see on Wiki]
However this assumption is not accurate. If camera calibration matrix K is known, then you may apply inverse to the point x to obtain the point expressed in camera normalized coordinates.
pointNormalize1= K.inv()*pointIndexes1 where pointIndexes1(2), z is equal 1.
In the case of the 8PA, a simple transformation of points improve and hence in the stability of the results. The suggested normalization is a translation and scaling of each image so that the centroid of the reference points is at origin of the coordinates and the RMS distance of the points from the origin is equal to ![sqrt{2}]. Note that it is recommended that the singularity condition should be enforced before denormalization.
Reference: check it if : you are still interested
Related
Im beginning to learn computer vision and I'm confused on the difference between the two.
I know that the 8 point algorithm is used to compute the fundamental matrix and the 5 point algorithm is used to compute the essential matrix. Both of which can be used to determine the relative camera pose.
I also found that the relative camera pose can be determined using ransac with homography https://inspirit.github.io/jsfeat/#multiview in the ransac method
Is there a difference between using ransac with homography as opposed to using the algorithms?
First of all, note that you still need RANSAC with the 8-point or 5-point algorithms, since in practice outliers are to be expected in the matching process.
I think the main downside of pose from homography is that the point matches you use need to be coplanar. Additionaly, if I'm not mistaken, in a scene with more than one plane, you might get different homographies depending on which planes you select in the scene. That is why applying a homography to correct perspective adds distortion to some other parts of the image (see the example in this video). So in complex scenes (e.g. urban environements) where matching is more difficult, I'd use one of the 8-point or the 5-point algorithms.
Note that you can also recover the relative pose directly (up to scale, obviously), and compute the essential from that (see this paper). It's easier than computing the fundamental/essential and then extracting relative pose.
I use opencv with cpp.
I have std::vector<std::pair<cv::Point2d, cv::Point2d> > wich represent a warp.
For each point of an image A, i associate a point of an image B.
I don't know all association between points of image A and points of image B. The points of image A are on a sparse matrix. These data have also probably epsilon error.
So I would like interpolate.
In opencv I don't found a function which do simply an interpolation.
How do this ?
I found the function cv::warpPoint but I don't know the cv::Mat Camera intrinsic parameters nor cv::Mat Camera rotation matrix.
How compute these matrix from my data ?
I think the best way is piecewise affine warper:
https://code.google.com/p/imgwarp-opencv/
I have my own fast implementation, but comments are in russian, you can find it here.
So there are 2 questions:
how to warp the points from one image to the other.
Try cv::remap to do that, once you have dense (interpolated) description. See http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/remap/remap.html for example.
How to compute non-given point pairs by interpolation.
I don't have a solution for this, but some ideas:
don't use point pairs but displacement vectors. displacement might be easier to interpolate.
use inverse formulation to get a dense description of the second image (otherwise there might be pixel that aren't touched at all
But I guess the "real" method to do this would be some kind of spline interpolation.
I am working on image processing with OPENCV.
I want to find the x,y and the rotational displacement between two images in OPENCV.
I have found the features of the images using SURF and the features have been matched.
Now i want to find the displacement between the images. How do I do that? Can RANSAC be useful here?
regards,
shiksha
Rotation and two translations are three unknowns so your min number of matches is two (since each match delivers two equations or constraints). Indeed imagine a line segment between two points in one image and the corresponding (matched) line segment in another image. The difference between segments' orientations gives you a rotation angle. After you rotated just use any of the matched points to find translation. Thus this is 3DOF problem that requires two points. It is called Euclidean transformation or rigid body transformation or orthogonal Procrustes.
Using Homography (that is 8DOF problem ) that has no close form solution and relies on non-linear optimization is a bad idea. It is slow (in RANSAC case) and inaccurate since it adds 5 extra DOF. RANSAC is only needed if you have outliers. In the case of pure noise and overdetrmined system (more than 2 points) your optimal solution that minimizes the sum of squares of geometric distance between matched points is given in a close form by:
Problem statement: min([R*P+t-Q]2), R-rotation, t-translation
Solution: R = VUT, t = R*Pmean-Qmean
where X=P-Pmean; Y=Q-Qmean and we take SVD to get X*YT=ULVT; all matrices have data points as columns. For a gentle intro into rigid transformations see this
I am trying to find a way to parametrize the precision of my homography calculation. I would like to obtain a value that describes the precision of the homography calculation for a measurement taken at a certain position.
I currently have succesfully calculated the homography (with cv::findHomography) and I can use it to map a point on my camera image onto a 2D map (using cv::perspectiveTransform). Now I want to track these objects on my 2D map and to do this I want to take in account that objects that are in the back of my camera image have a less precise position on my 2D map than the objects that are all the way in the front.
I have looked at the following example on this website that mentions plane fitting but I don't really understand how to fill the matrices correctly using this method. The visualisation of the result does seem to fit my needs. Is there any way to do this with standard OpenCV functions?
EDIT:
Thanks Francesco for your recommendations. But, I think I am looking for something different than your answer. I am not looking to test the precision of the homography itself, but the relation between the density of measurements in one real camera view and the actual size on a map I create. I want to know that when I am 1 pixel off on my detection in the camera image, how many meters this will be on my map at this point.
I can of course calculate by taking some pixels around my measurement on my camera image and then use the homography to see how many meters on my map this represent every time I do a homography, but I don't want to calculate this every time. What I would like is to have a formula that tells me the relation between pixels in my image and pixels on my map so I can take this in account for my tracking on the map.
What you are looking for is called "predictive error bars" or "prediction uncertainty". You should definitely consult a good introductory book on estimation theory for details (e.g. this one). But briefly, the predictive uncertainty is the probability that...
A certain pixel p in image 1 will is the mapping H(p') of a pixel p' in image 2 under the homography H...
Given the uncertainty in H which is due to the errors in the matched pairs (q0, q0'), (q1, q1'), ..., that have been used to estimate H, ...
But assuming the model is correct, that is, that the true map between images 1 and 2 is, in fact, a homography (although the estimated parameters of the homography itself may be affected by errors).
In order to estimate this probability distribution you'll need a model for the errors in the measurements, and a model for how they propagate through the (homography) model.
I have a program in which I am querying all points included in a sphere S of radius R. The points are 3D points actually aligned on the vertices of a 3D regular grid, but I don't think this detail is relevant to the question. The center of the lookup volume (the sphere) can be anywhere in 3D pace.
The points hold some data (say a real). My question, is how can I interpolate/filter the data held by the points which are included in the sphere using a 3D filter (such as gaussian filter for instance). My understanding is that you need to do something like this (pseudo code):
interp_data = 0;
for (each point contained in the lookup sphere S of radius R)
// compute square distance from point location to sphere centre
dist2 = distance2(sphere_center, curr_point_loc);
// compute gaussian weight
w = exp(-100 * dist2);
sumWeight += w;
interp_data += curr_point_data * w;
interp_data /= sumWeight;
Is it correct. I have seen some code using a similar technique. I understand the value 100 on the exp function relates somehow to what seems to be called the standard normal deviation. The value 100 was hard coded in the source code that I have seen, but I assume this should somehow relate to the radius of the sphere? As the weight of the gaussian filter is supposed to drop to 0 when dist2 = R^2.
If someone could shed some light on this it would be great.
Also is it actually the best way of filtering 3D data? Is there a better/faster/more reliable method?
Thanks a lot for your help.
Your proposal is mostly reasonable, though probably not efficient. (Also, why distance squared and not just distance?)
You can perform a 3D Gaussian more efficiently by doing the following things:
1) Separate out the kernel into 3 1-dimensional passes with a 1D Gaussian kernel. This is explained on the Gaussian blur wikipedia page
2) You can approximate a Gaussian kernel by doing a box-blur several times in a row, and which you can implement by using summed area tables
3) You could also use a fast fourier transform and do the convolution by multiplying the image by the kernel in frequency space instead.