I have a program in which I query all the points contained in a sphere S of radius R. The points are 3D points that actually lie on the vertices of a regular 3D grid, but I don't think this detail is relevant to the question. The center of the lookup volume (the sphere) can be anywhere in 3D space.
The points hold some data (say a real number). My question is: how can I interpolate/filter the data held by the points contained in the sphere using a 3D filter (such as a Gaussian filter, for instance)? My understanding is that you need to do something like this (pseudo code):
interp_data = 0;
sumWeight = 0;
for (each point contained in the lookup sphere S of radius R)
{
    // compute square distance from point location to sphere centre
    dist2 = distance2(sphere_center, curr_point_loc);
    // compute gaussian weight: exp(-dist2 / (2 * sigma^2));
    // the hard-coded 100 plays the role of 1 / (2 * sigma^2)
    w = exp(-100 * dist2);
    sumWeight += w;
    interp_data += curr_point_data * w;
}
// normalise once all the weights have been accumulated
interp_data /= sumWeight;
Is this correct? I have seen some code using a similar technique. I understand that the value 100 in the exp function relates somehow to what seems to be called the standard deviation. The value 100 was hard-coded in the source code that I have seen, but I assume it should somehow relate to the radius of the sphere, as the weight of the Gaussian filter is supposed to drop to 0 when dist2 = R^2.
If someone could shed some light on this it would be great.
Also is it actually the best way of filtering 3D data? Is there a better/faster/more reliable method?
Thanks a lot for your help.
Your proposal is mostly reasonable, though probably not efficient. (Also, why distance squared and not just distance?)
You can perform a 3D Gaussian more efficiently by doing the following things:
1) Separate the kernel into three 1-dimensional passes with a 1D Gaussian kernel. This is explained on the Gaussian blur Wikipedia page (see the sketch after this list).
2) You can approximate a Gaussian kernel by doing a box blur several times in a row, which you can implement using summed area tables.
3) You could also use a fast Fourier transform and do the convolution by multiplying the image by the kernel in frequency space instead.
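For option 1), here is a minimal sketch of what a separable pass could look like on a dense nx*ny*nz grid stored as a flat array. The grid layout, the clamped boundary handling, and all names are illustrative assumptions rather than anything from the original post; the point is only that one 3D convolution becomes three cheap 1D ones.
#include <algorithm>
#include <cmath>
#include <vector>
// Build a normalized 1D Gaussian kernel of half-width 'radius' (sigma in grid units).
std::vector<float> gaussianKernel1D(float sigma, int radius)
{
    std::vector<float> k(2 * radius + 1);
    float sum = 0.f;
    for (int i = -radius; i <= radius; ++i) {
        k[i + radius] = std::exp(-(i * i) / (2.f * sigma * sigma));
        sum += k[i + radius];
    }
    for (float& v : k) v /= sum;   // normalize so the weights sum to 1
    return k;
}
// One 1D pass along a chosen axis (0 = x, 1 = y, 2 = z); call it three times.
void blurAxis(std::vector<float>& data, int nx, int ny, int nz,
              const std::vector<float>& k, int axis)
{
    const int radius = (int)k.size() / 2;
    std::vector<float> out(data.size(), 0.f);
    auto idx = [&](int x, int y, int z) { return x + nx * (y + ny * z); };
    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x) {
                float acc = 0.f;
                for (int i = -radius; i <= radius; ++i) {
                    // clamp to the grid edges (simple boundary handling)
                    int xx = (axis == 0) ? std::min(std::max(x + i, 0), nx - 1) : x;
                    int yy = (axis == 1) ? std::min(std::max(y + i, 0), ny - 1) : y;
                    int zz = (axis == 2) ? std::min(std::max(z + i, 0), nz - 1) : z;
                    acc += k[i + radius] * data[idx(xx, yy, zz)];
                }
                out[idx(x, y, z)] = acc;
            }
    data.swap(out);
}
Running blurAxis once per axis with the same kernel is equivalent to the full 3D Gaussian, at a cost of 3*(2*radius+1) multiplies per sample instead of (2*radius+1)^3.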
Related
I'm working on a project for my university where we want a quadrocopter to stabilize itself with its camera. Unfortunately the fundamental matrix reacts very sensitively to small changes in the feature points; I'll give you examples later on.
I think my matching already works pretty well thanks to OpenCV.
I'm using SURF features and match them with the knn method:
SurfFeatureDetector surf_detect;
surf_detect = SurfFeatureDetector(400);
//detect keypoints
surf_detect.detect(fr_one.img, fr_one.kp);
surf_detect.detect(fr_two.img, fr_two.kp);
//extract keypoints
SurfDescriptorExtractor surf_extract;
surf_extract.compute(fr_one.img, fr_one.kp, fr_one.descriptors);
surf_extract.compute(fr_two.img, fr_two.kp, fr_two.descriptors);
//match keypoints
vector<vector<DMatch> > matches1,matches2;
vector<DMatch> symMatches,goodMatches;
FlannBasedMatcher flann_match;
flann_match.knnMatch(fr_one.descriptors, fr_two.descriptors, matches1,2);
flann_match.knnMatch(fr_two.descriptors, fr_one.descriptors, matches2,2);
//test matches in both ways
symmetryTest(matches1,matches2,symMatches);
std::vector<cv::Point2f> points1, points2;
for (std::vector<cv::DMatch>::const_iterator it = symMatches.begin();
     it != symMatches.end(); ++it)
{
    //left keypoints
    float x = fr_one.kp[it->queryIdx].pt.x;
    float y = fr_one.kp[it->queryIdx].pt.y;
    points1.push_back(cv::Point2f(x,y));
    //right keypoints
    x = fr_two.kp[it->trainIdx].pt.x;
    y = fr_two.kp[it->trainIdx].pt.y;
    points2.push_back(cv::Point2f(x,y));
}
//kill outliers with ransac
vector<uchar> inliers(points1.size(),0);
findFundamentalMat(Mat(points1), Mat(points2),
                   inliers, CV_FM_RANSAC, 3.f, 0.99f);
std::vector<uchar>::const_iterator itIn = inliers.begin();
std::vector<cv::DMatch>::const_iterator itM = symMatches.begin();
for ( ; itIn != inliers.end(); ++itIn, ++itM)
{
    if (*itIn)
    {
        goodMatches.push_back(*itM);
    }
}
Now I want to compute the fundamental matrix with these matches. I'm using the 8-point method for this example; I already tried it with LMEDS and RANSAC, where it only gets worse because there are more matches which change.
vector<int> pointIndexes1;
vector<int> pointIndexes2;
for (vector<DMatch>::const_iterator it = goodMatches.begin();
     it != goodMatches.end(); ++it)
{
    pointIndexes1.push_back(it->queryIdx);
    pointIndexes2.push_back(it->trainIdx);
}
vector<Point2f> selPoints1, selPoints2;
KeyPoint::convert(fr_one.kp,selPoints1,pointIndexes1);
KeyPoint::convert(fr_two.kp,selPoints2,pointIndexes2);
Mat F = findFundamentalMat(Mat(selPoints1),Mat(selPoints2),CV_FM_8POINT);
When I call these calculations in a loop on the same pair of images, the result of F varies very much; there's no way to extract movement from such calculations.
I generated an example where I filtered out some matches so that you can see the effect I mentioned for yourselves.
http://abload.de/img/div_c_01ascel.png
http://abload.de/img/div_c_02zpflj.png
Is there something wrong with my code, or do I have to think about other reasons like image quality and so on?
Thanks in advance for the help!
derfreak
To summarize what others have already stated and elaborate in more detail:
As currently implemented in OpenCV, the 8-point algorithm has no outlier rejection. It is a least-squares algorithm and cannot be used with RANSAC or LMEDS because these flags override the 8-point flag. It is recommended that the input points are normalized to improve the condition number of the matrix in the linear equation, as stated in "In Defence of the 8-point Algorithm". However, the OpenCV implementation automatically normalizes the input points, so there is no need to normalize them manually.
The 5-point and 7-point algorithms both have outlier rejection, using RANSAC or LMEDS. If you are using RANSAC, you may need to tune the threshold to get good results. The OpenCV documentation shows that the default threshold for RANSAC is 1.0, which in my opinion is a bit large. I might recommend using something around 0.1 pixels. On the other hand, if you are using LMEDS you won't need to worry about the threshold, because LMEDS minimizes the median error instead of counting inliers. LMEDS and RANSAC both have similar accuracy if the correct threshold is used and both have comparable computation time.
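As a concrete illustration of that threshold tuning (a hedged sketch assuming the OpenCV 3.x C++ API; the 0.1 px value is just the suggestion above, not a universal constant):
cv::Mat mask;
cv::Mat F = cv::findFundamentalMat(points1, points2, cv::FM_RANSAC,
                                   0.1,   // RANSAC reprojection threshold in pixels
                                   0.99,  // confidence
                                   mask); // per-match inlier flags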
The 5-point algorithm is more robust than the 7-point algorithm because it only has 5 degrees of freedom (3 for the rotation and 2 for the unit-vector translation) instead of 7 (the additional 2 parameters are for the camera principal points). This minimal parameterization allows the rotation and translation to be easily extracted from the matrix using SVD and avoids the planar structure degeneracy problem.
However, in order to get accurate results with the 5-point algorithm, the focal length must be known. The paper suggests that the focal length should be known within 10%, otherwise the 5-point algorithm is no better than the other uncalibrated algorithms. If you haven't performed camera calibration before, check out the OpenCV camera calibration tutorial. Also, if you are using ROS, there is a nice camera calibration package.
When using the OpenCV findEssentialMat function I recommend first passing the pixel points to undistortPoints. This not only reverses the effect of lens distortion, but also transforms the coordinates to normalized image coordinates. Normalized image coordinates (not to be confused with the normalization done in the 8-point algorithm) are camera agnostic coordinates that do not depend on any of the camera intrinsic parameters. They represent the angle of the bearing vector to the point in the real world. For example, a normalized image coordinate of (1, 0) would correspond to a bearing angle of 45 degrees from the optical axis of the camera in the x direction and 0 degrees in the y direction.
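A minimal sketch of that pipeline, assuming the OpenCV 3.x findEssentialMat/recoverPose API and an intrinsic matrix K plus distortion coefficients distCoeffs obtained from a prior calibration. Because undistortPoints already returns normalized image coordinates, the intrinsics passed to the estimator reduce to focal = 1 and principal point (0, 0), and the RANSAC threshold must be given in those units (the 1e-3 below is only an illustrative value):
#include <opencv2/opencv.hpp>
// Remove lens distortion and convert pixel points to normalized image coordinates.
std::vector<cv::Point2f> norm1, norm2;
cv::undistortPoints(points1, norm1, K, distCoeffs);
cv::undistortPoints(points2, norm2, K, distCoeffs);
// Estimate E directly on the normalized coordinates.
cv::Mat inlierMask;
cv::Mat E = cv::findEssentialMat(norm1, norm2, 1.0, cv::Point2d(0, 0),
                                 cv::RANSAC, 0.999, 1e-3, inlierMask);
// Recover the relative rotation and (unit-length) translation.
cv::Mat R, t;
cv::recoverPose(E, norm1, norm2, R, t, 1.0, cv::Point2d(0, 0), inlierMask);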
After using RANSAC to obtain a good hypothesis, the best estimate can be improved by using iterative robust non-linear least-squares. This is mentioned in the paper and described in more detail in "Bundle Adjustment - A Modern Synthesis". Unfortunately, it appears that the OpenCV implementation of the 5-point algorithm does not use any iterative refinement methods.
Even if your algorithm is correct, 8-point F matrix computation is very error prone due to image noise. The fewer correspondences you use, the better. The best you can do is 5-point essential (E) matrix computation, but that requires you to pre-calibrate the camera and convert the detected pixel image points after SIFT/SURF to normalized image points (metric pixel locations). Then apply Nistér's 5-point algorithm, either from the freely available MATLAB implementation or from Bundler (a C++ implementation by Noah Snavely). In my experience with SfM, the 5-point E matrix is much more stable than 7- or 8-point F matrix computation. And of course, do RANSAC after the 5-point step to get more robust estimates. Hope this helps.
The 8-point algorithm is the simplest method of computing the fundamental matrix, but if care is taken you can perform it well. The key to obtaining good results is proper, careful normalization of the input data before constructing the equations to solve. Many algorithms can do it.
Pixel point coordinates must be changed to camera coordinates, and I don't see you doing this. As I understand it, your
vector<int> pointIndexes1; is expressed in pixel coordinates.
You must know the intrinsic camera parameters if you want more stable results. You can find them by many methods: see the OpenCV calibration tutorial. Then you have two options for normalizing. You may apply it to your fundamental matrix:
Mat E = K.t() * F * K; where K is the intrinsic camera matrix [see Wiki].
However, this approach is not the most accurate. If the camera calibration matrix K is known, then you may apply its inverse to a point x to obtain the point expressed in normalized camera coordinates:
pointNormalized1 = K.inv() * pointIndexes1, where the homogeneous component pointIndexes1(2), i.e. z, is equal to 1.
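A minimal sketch of that conversion, assuming K is a 3x3 CV_64F cv::Mat from a prior calibration and reusing the selPoints1 vector from the question (the names are only illustrative):
// Convert pixel coordinates to normalized camera coordinates with K^-1.
cv::Mat Kinv = K.inv();
std::vector<cv::Point2f> normalized1;
for (const cv::Point2f& p : selPoints1) {
    cv::Mat ph = (cv::Mat_<double>(3, 1) << p.x, p.y, 1.0);  // homogeneous pixel point
    cv::Mat pn = Kinv * ph;                                   // z component stays 1
    normalized1.push_back(cv::Point2f((float)pn.at<double>(0), (float)pn.at<double>(1)));
}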
In the case of the 8-point algorithm, a simple transformation of the points improves the conditioning and hence the stability of the results. The suggested normalization is a translation and scaling of each image so that the centroid of the reference points is at the origin of the coordinates and the RMS distance of the points from the origin is equal to sqrt(2). Note that it is recommended that the singularity condition be enforced before denormalization.
Reference: check it if you are still interested.
I am trying to find a way to parametrize the precision of my homography calculation. I would like to obtain a value that describes the precision of the homography calculation for a measurement taken at a certain position.
I currently have successfully calculated the homography (with cv::findHomography) and I can use it to map a point on my camera image onto a 2D map (using cv::perspectiveTransform). Now I want to track these objects on my 2D map, and to do this I want to take into account that objects in the back of my camera image have a less precise position on my 2D map than objects that are all the way in the front.
I have looked at the following example on this website that mentions plane fitting but I don't really understand how to fill the matrices correctly using this method. The visualisation of the result does seem to fit my needs. Is there any way to do this with standard OpenCV functions?
EDIT:
Thanks Francesco for your recommendations. But I think I am looking for something different from your answer. I am not looking to test the precision of the homography itself, but the relation between the density of measurements in one real camera view and the actual size on the map I create. I want to know, when I am 1 pixel off in my detection in the camera image, how many meters this corresponds to on my map at that point.
I can of course calculate this by taking some pixels around my measurement on my camera image and then using the homography to see how many meters on my map they represent, but I don't want to compute this every time. What I would like is a formula that tells me the relation between pixels in my image and pixels on my map, so I can take this into account for my tracking on the map.
What you are looking for is called "predictive error bars" or "prediction uncertainty". You should definitely consult a good introductory book on estimation theory for details (e.g. this one). But briefly, the predictive uncertainty is the probability that...
A certain pixel p in image 1 is the mapping H(p') of a pixel p' in image 2 under the homography H, ...
Given the uncertainty in H which is due to the errors in the matched pairs (q0, q0'), (q1, q1'), ..., that have been used to estimate H, ...
But assuming the model is correct, that is, that the true map between images 1 and 2 is, in fact, a homography (although the estimated parameters of the homography itself may be affected by errors).
In order to estimate this probability distribution you'll need a model for the errors in the measurements, and a model for how they propagate through the (homography) model.
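For the narrower question in the EDIT (how many map units one image pixel is worth at a given position), the local scale of the homography is its 2x2 Jacobian at that point. A minimal sketch, assuming H is the 3x3 matrix returned by cv::findHomography and the names are illustrative:
// Local Jacobian of the perspective map at image point (x, y):
//   u = (h00*x + h01*y + h02) / w,  v = (h10*x + h11*y + h12) / w,
//   w =  h20*x + h21*y + h22.
cv::Matx22d homographyJacobian(const cv::Matx33d& H, double x, double y)
{
    double w = H(2,0) * x + H(2,1) * y + H(2,2);
    double u = (H(0,0) * x + H(0,1) * y + H(0,2)) / w;
    double v = (H(1,0) * x + H(1,1) * y + H(1,2)) / w;
    // Partial derivatives by the quotient rule.
    return cv::Matx22d((H(0,0) - u * H(2,0)) / w, (H(0,1) - u * H(2,1)) / w,
                       (H(1,0) - v * H(2,0)) / w, (H(1,1) - v * H(2,1)) / w);
}
The singular values of this 2x2 matrix give the minimum and maximum local "map units per pixel" at (x, y), so a 1-pixel detection error maps to at most the largest singular value in map units. That is exactly the per-position precision factor you want to attach to each measurement.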
I'm currently researching a way to implement smoothing of bone-vertex weights (skin weights for joint deformations) and coming up empty on methods that use geodesic (surface) distances between vertices within a parametric distance set by the user.
So far, someone has mentioned the possible use of Dijkstra's Algorithm for getting approximate geodesic distances - but it has limitations over certain types of mesh topology.
The only paper that I found specifically on this issue (so-called "bone-vertex weight smoothing") uses Laplacian smoothing of weights on a skinned mesh, but it only considers the one-ring neighboring vertices of each vertex, which does not satisfy my need to include vertices up to a set distance (shortest geodesic distance):
L(Wi) = 1/m * Sum(j from 0 to m-1)(Wj - Wi)
where vertex j is a neighbor of vertex i, m is the number of neighboring vertices, and W is the weight on the vertex.
What I am envisioning is a modified Laplacian Smoothing wherein all of the vertices found to be within the parametric distance are used but the distance needs to be a factor also. Maybe just multiply the weight influence by the parametric distance minus the distance between the current vertex and the one being used in the sum. Something like this, maybe:
Wmj = Wj * (maxDistance - Dji)
L(Wi) = 1/m * Sum(j from 0 to m-1)(Wmj - Wi)
so that the influence of the smoothing by Wj is reduced (falloff) by its vertex distance (Dji). Of course, vertices at maxDistance will have no influence and might need to be ignored as part of m.
Would this work?
The first thought that came to my mind was projection. Start by getting the line representing the Euclidean distance between your start point and end point (going through the mesh). Then project that onto the mesh. But I realized that won't work in certain situations. For the benefit of others, one such situation is if the start point is on one side of a deep pit and the target is on the opposite side; the shortest distance would be around the rim, rather than straight through. This still may be adequate for you, depending on the types of meshes you are working with, so I can elaborate a more complete approach along these lines if this is good enough for you.
So then my thoughts were to subdivide and then use search. I would use adaptive subdivision, i.e. split edges until all edges are less than some threshold. From that point you can use Dijkstra's, or A* or any other number of search methods. This gets around the problem of skinny triangles, because edges will be subdivided until they are small, so there will be no long, skinny edges.
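Combining the two ideas (geodesic distances from a graph search over the subdivided mesh, plus the distance-weighted average proposed in the question), a minimal sketch could look like this; geodesicNeighbors is a hypothetical helper that would run Dijkstra/A* from vertex i over the edge graph and return (vertex index, geodesic distance) pairs within maxDist:
#include <cstddef>
#include <utility>
#include <vector>
// Hypothetical: all vertices within geodesic distance maxDist of vertex i.
std::vector<std::pair<size_t, float>> geodesicNeighbors(size_t i, float maxDist);
// One distance-weighted Laplacian smoothing step on skin weights W (one weight
// per vertex for a single joint). lambda in [0, 1] blends between the old weight
// and the locally averaged one; the linear falloff (maxDist - dist) follows the
// question, so vertices at exactly maxDist contribute nothing.
std::vector<float> smoothWeights(const std::vector<float>& W,
                                 float maxDist, float lambda)
{
    std::vector<float> out(W.size());
    for (size_t i = 0; i < W.size(); ++i) {
        float acc = 0.f, norm = 0.f;
        for (const auto& nb : geodesicNeighbors(i, maxDist)) {
            float falloff = maxDist - nb.second;
            acc  += falloff * W[nb.first];
            norm += falloff;
        }
        float target = (norm > 0.f) ? acc / norm : W[i];
        out[i] = W[i] + lambda * (target - W[i]);
    }
    return out;
}
Note that the sum is normalized by the total falloff rather than by m, so the result stays in the original weight range; you would still need to re-normalize the weights across joints afterwards so they sum to 1 per vertex.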
I am attempting to convert an image in polar coordinates (axes are angle x radius) to an image in cartesian coordinates (axes are x and y).
This is simple enough in MATLAB using pcolor(), but the issue is that I must do this in a MEX file (C++ interface to MATLAB). This seems easy enough, except that MATLAB ONLY uses array containers, so I can't think of a clever or elegant way of doing this.
I do have access to the image dimensions, and I can imagine a very messy way of repackaging the input image array as a matrix in C++ and carrying out the conversion, but this would be messy and problematic.
Also, I need to be able to interpolate gaps between points in the xy plane.
Any ideas?
This is reasonably standard in image processing, particularly in registration. However, it takes some thought and isn't "obvious". It wasn't obvious to me the first time either.
I'm assuming you have two images, in different "domains", in your case a source image in polar coordinates and a target image in Cartesian coordinates. I'm assuming you know the region in the target image you want to populate.
The standard thing to do in image processing is to loop over the coordinates in the known area of the target image that you want to populate. For each of these positions (x, y), you have some conversion to polar. It's probably r = sqrt(x*x + y*y) and theta = atan2(y, x) or something like that. Then you sample from that position in the polar image with interpolation.
Among choices of interpolation are:
Nearest neighbor - you just round to the nearest r and theta and choose the value of that.
Bilinear
Bicubic
...
Of course you should take care of boundary conditions and what happens if your r and theta go out of your image.
This procedure is similar (loop over the target image, do lookups based on the reverse transform, and sample from the source image) for all kinds of coordinate transformations. The nice thing is that you don't leave holes where your source image is relevant.
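A minimal sketch of that inverse-mapping loop with bilinear interpolation, assuming a row-major nTheta x nRadius polar source with theta in [0, 2*pi) and r in [0, rMax]; the memory layout, the centering of the disc in the target, and all names are illustrative assumptions:
#include <algorithm>
#include <cmath>
#include <vector>
void polarToCartesian(const std::vector<double>& src, int nTheta, int nRadius,
                      double rMax, std::vector<double>& dst, int width, int height)
{
    const double PI = 3.14159265358979323846;
    dst.assign((size_t)width * height, 0.0);
    const double cx = 0.5 * width, cy = 0.5 * height;
    const double pxPerR = std::min(cx, cy) / rMax;   // scale so the disc fills the target
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Reverse transform: target (x, y) -> source (theta, r).
            double dx = (x - cx) / pxPerR, dy = (y - cy) / pxPerR;
            double r = std::sqrt(dx * dx + dy * dy);
            double theta = std::atan2(dy, dx);
            if (theta < 0) theta += 2.0 * PI;
            if (r > rMax) continue;                  // boundary condition: outside the disc
            // Fractional source coordinates and the four surrounding samples.
            double tf = theta / (2.0 * PI) * (nTheta - 1);
            double rf = r / rMax * (nRadius - 1);
            int t0 = (int)tf, r0 = (int)rf;
            int t1 = std::min(t0 + 1, nTheta - 1), r1 = std::min(r0 + 1, nRadius - 1);
            double ft = tf - t0, fr = rf - r0;
            // Bilinear interpolation in the polar image.
            double v = (1 - ft) * ((1 - fr) * src[t0 * nRadius + r0] + fr * src[t0 * nRadius + r1])
                     +      ft  * ((1 - fr) * src[t1 * nRadius + r0] + fr * src[t1 * nRadius + r1]);
            dst[(size_t)y * width + x] = v;
        }
    }
}
Inside a MEX function you would get the source data from mxGetPr(prhs[0]) and write the result into an output created with mxCreateDoubleMatrix; just remember that MATLAB arrays are column-major, so the index arithmetic above has to match the layout you actually receive.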
Hope this helps with the image part.
As for the MEX part, here are some links:
Mex tutorial
Mex tutorial
Can you be more specific about what you need about the mex part?
I have a video file recorded from the front of a moving vehicle. I am going to use OpenCV for object detection and recognition, but I'm stuck on one aspect: how can I determine the distance to a recognized object?
I can know my current speed and real-world GPS position but that is all. I can't make any assumptions about the object I'm tracking. I am planning to use this to track and follow objects without colliding with them. Ideally I would like to use this data to derive the object's real-world position, which I could do if I could determine the distance from the camera to the object.
Your problem's quite standard in the field.
Firstly,
you need to calibrate your camera. This can be done offline (makes life much simpler) or online through self-calibration.
Calibrate it offline - please.
Secondly,
Once you have the calibration matrix of the camera K, determine the projection matrix of the camera in a successive scene (you need to use parallax as mentioned by others). This is described well in this OpenCV tutorial.
You'll have to use the GPS information to find the relative orientation between the cameras in the successive scenes (that might be problematic due to noise inherent in most GPS units), i.e. the R and t mentioned in the tutorial or the rotation and translation between the two cameras.
Once you've resolved all that, you'll have two projection matrices: representations of the cameras at those successive scenes. Using one of these so-called camera matrices, you can "project" a 3D point M in the scene onto the 2D image of the camera at pixel coordinate m (as in the tutorial).
We will use this to triangulate the real 3D point from 2D points found in your video.
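As a hedged sketch of what those two projection matrices look like once K, R and t are known (standard pinhole composition; the variable names are just for illustration):
// First camera at the world origin, second displaced by (R, t).
cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);   // P1 = K [I | 0]
cv::Mat Rt;
cv::hconcat(R, t, Rt);                         // [R | t], a 3x4 matrix
cv::Mat P2 = K * Rt;                           // P2 = K [R | t]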
Thirdly,
use an interest point detector to track the same point in your video which lies on the object of interest. There are several detectors available; I recommend SURF since you have OpenCV, which also has several other detectors like Shi-Tomasi corners, Harris, etc.
Fourthly,
Once you've tracked the points of your object across the sequence and obtained the corresponding 2D pixel coordinates, you must triangulate for the best-fitting 3D point given your projection matrices and 2D points.
The above image nicely captures the uncertainty and how a best fitting 3D point is computed. Of course in your case, the cameras are probably in front of each other!
Finally,
Once you've obtained the 3D points on the object, you can easily compute the Euclidean distance between the camera center (which is the origin in most cases) and the point.
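A minimal sketch of those last two steps with OpenCV's triangulatePoints, assuming P1 and P2 are the 3x4 projection matrices built above and pts1/pts2 hold the matched pixel coordinates (e.g. as std::vector<cv::Point2f>):
#include <iostream>
#include <opencv2/opencv.hpp>
// Triangulate, then measure the distance from the first camera (at the origin).
cv::Mat pts4D;                                   // 4xN homogeneous 3D points
cv::triangulatePoints(P1, P2, pts1, pts2, pts4D);
pts4D.convertTo(pts4D, CV_64F);                  // work in double regardless of output depth
for (int i = 0; i < pts4D.cols; ++i) {
    cv::Mat X = pts4D.col(i).clone();
    X /= X.at<double>(3);                        // dehomogenize
    double dist = cv::norm(X.rowRange(0, 3));    // Euclidean distance to camera 1
    std::cout << "point " << i << ": " << dist << " (scene units)" << std::endl;
}
Keep in mind that without a metric reference (GPS baseline, known object size, etc.) the reconstruction is only defined up to scale, so the GPS-derived translation is what fixes the units here.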
Note
This is obviously not easy stuff but it's not that hard either. I recommend Hartley and Zisserman's excellent book Multiple View Geometry which has described everything above in explicit detail with MATLAB code to boot.
Have fun and keep asking questions!
When you have moving video, you can use temporal parallax to determine the relative distance of objects. Parallax: (definition).
The effect would be the same we get with our eyes, which can gain depth perception by looking at the same object from slightly different angles. Since you are moving, you can use two successive video frames to get your slightly different angle.
Using parallax calculations, you can determine the relative size and distance of objects (relative to one another). But, if you want the absolute size and distance, you will need a known point of reference.
You will also need to know the speed and direction being traveled (as well as the video frame rate) in order to do the calculations. You might be able to derive the speed of the vehicle using the visual data but that adds another dimension of complexity.
The technology already exists. Satellites determine topographic prominence (height) by comparing multiple images taken over a short period of time. We use parallax to determine the distance of stars by taking photos of the night sky at different points in Earth's orbit around the sun. I was able to create 3D images out of an airplane window by taking two photographs in short succession.
The exact technology and calculations (even if I knew them off the top of my head) are well outside the scope of this discussion. If I can find a decent reference, I will post it here.
You need to identify the same points on the same object in two different frames taken a known distance apart. Since you know the location of the camera in each frame, you have a baseline (the vector between the two camera positions). Construct a triangle from the known baseline and the angles to the identified points. Trigonometry gives you the length of the unknown sides of the triangles for the known length of the baseline and the known angles between the baseline and the unknown sides.
You can use two cameras, or one camera taking successive shots. So, if your vehicle is moving at 1 m/s and you take frames every second, then successive frames will give you a 1 m baseline, which should be good for measuring the distance of objects up to, say, 5 m away. If you need to range objects further away, the frames used need to be further apart; however, more distant objects will remain in view for longer.
Observer at F1 sees the target at T at angle a1 to the velocity vector. The observer moves a distance b to F2 and sees the target at T at angle a2.
Required: find r1, the range to the target from F1.
The trigonometric identity for cosine gives
cos(90 - a1) = x / r1 = c1
cos(90 - a2) = x / r2 = c2
cos(a1) = (b + z) / r1 = c3
cos(a2) = z / r2 = c4
where x is the distance to the target orthogonal to the observer's velocity vector and z is the distance from F2 to the intersection with x.
Solving for r1:
r1 = b / (c3 - c1 * c4 / c2)
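A hedged sketch of that formula as a small helper (angles in radians, measured from the velocity vector, exactly as in the derivation above):
#include <cmath>
// Range r1 to the target from position F1, given baseline b and bearings a1, a2.
double rangeFromParallax(double b, double a1, double a2)
{
    const double halfPi = 1.57079632679489661923;
    double c1 = std::cos(halfPi - a1);   // = sin(a1) = x / r1
    double c2 = std::cos(halfPi - a2);   // = sin(a2) = x / r2
    double c3 = std::cos(a1);            // = (b + z) / r1
    double c4 = std::cos(a2);            // = z / r2
    return b / (c3 - c1 * c4 / c2);      // r1 = b / (c3 - c1*c4/c2)
}
For example, with b = 1 m, a1 = atan(1/2) and a2 = 45 degrees, this returns sqrt(5) ≈ 2.24 m, which matches the geometry of a target 1 m off the track whose perpendicular foot lies 1 m ahead of F2.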
Two cameras so you can detect parallax. It's what humans do.
edit
Please see ravenspoint's answer for more detail. Also, keep in mind that a single camera with a splitter would probably suffice.
Use stereo disparity maps. Lots of implementations are afloat; here are some links:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/OWENS/LECT11/node4.html
http://www.ece.ucsb.edu/~manj/ece181bS04/L14(morestereo).pdf
In your case you don't have a stereo camera, but depth can be evaluated using video:
http://www.springerlink.com/content/g0n11713444148l2/
I think the above will help you the most.
Research has progressed so far that depth can be evaluated (though not to a satisfactory extent) from a single monocular image:
http://www.cs.cornell.edu/~asaxena/learningdepth/
Someone please correct me if I'm wrong, but it seems to me that if you're going to use a single camera and rely solely on a software solution, any processing you might do would be prone to false positives. I highly doubt there is any processing that could tell the difference between objects that really are at the perceived distance and those which only appear to be at that distance (like the "forced perspective" used in movies).
Any chance you could add an ultrasonic sensor?
First, you should calibrate your camera so you can get the relation between object positions in the camera plane and their positions in the real-world plane. If you are using a single camera, you can use the "optical flow" technique.
If you are using two cameras, you can use the triangulation method to find the real position (it will be easy to find the distance of the objects), but the problem with the second method is the matching, which means: how can you find the position of an object 'x' in camera 2 if you already know its position in camera 1? Here you can use the SIFT algorithm.
I just gave you some keywords; I hope they help you.
Put an object of known size in the camera's field of view. That way you can have a more objective metric to measure angular distances. Without a second viewpoint/camera you'll be limited to estimating size/distance, but at least it won't be a complete guess.