I have a vector of 2-D points, I am trying to use the meanshift algorithm to detect multiple modes in the data but am a bit confused by the method signature.
1) Can I pass in my vector (and if so, in what form), or must I convert it to a cv::Mat (and if so, how, given that my points have negative values)?
2) How do I extract the multiple modes? From what I can see, the function only returns an int.
Thanks
OpenCV's implementation of mean shift is for tracking a single object (as part of the CamShift algorithm), and therefore I don't believe it has been extended to handle multi-modal distributions or multiple objects. It will give you a bounding box centered on the mode of a probability image (returned via the cv::Rect window parameter, which is passed by reference).
Is your data represented as a mixture of Gaussians (or some other symmetric distribution)? If so, you might be able to use k-means clustering to find the means of your distributions (which coincide with the modes for symmetric distributions), although choosing k will be problematic.
Alternatively, a hack that might enable tracking of multiple objects (or finding multiple modes) could involve repeatedly calling this function, retrieving the mode, and then zeroing that region of the back-projected histogram before the next call.
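Just to illustrate the idea, a rough sketch (the initial window size, termination criteria, and number of modes are all assumptions; this is not an established OpenCV workflow):

```cpp
#include <opencv2/video/tracking.hpp>
#include <vector>

// Rough sketch of the "repeat and zero out" hack: run meanShift on a
// back-projection image, record the converged window, erase that region,
// and repeat. The back-projection is modified in place.
std::vector<cv::Rect> findModes(cv::Mat& backProj, int maxModes)
{
    std::vector<cv::Rect> modes;
    for (int i = 0; i < maxModes; ++i)
    {
        cv::Rect window(0, 0, 50, 50);  // assumed initial search window
        cv::meanShift(backProj, window,
                      cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1.0));
        modes.push_back(window);
        backProj(window).setTo(0);      // zero out this mode before the next pass
    }
    return modes;
}
```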
As for your data's form, the function takes its input as a cv::Mat, so you will have to convert your data. However, you say you have negative values, and this OpenCV function expects a probability image (typically computed from an image histogram using cv::calcBackProject()), so I expect it will complain if you pass it a cv::Mat containing negative values.
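As an illustration of the k-means alternative above, here is a minimal sketch (the choice of k = 2 is an assumption) showing how a std::vector of 2-D points, negative coordinates included, can be wrapped in a cv::Mat and clustered with cv::kmeans:

```cpp
#include <opencv2/core.hpp>
#include <vector>
#include <iostream>

int main()
{
    // Example 2-D points; negative coordinates are fine for k-means.
    std::vector<cv::Point2f> points = {
        {-1.2f, -0.8f}, {-1.0f, -1.1f}, {2.9f, 3.1f}, {3.2f, 2.8f}
    };

    // Wrap the vector in a cv::Mat: one row per sample, two CV_32F columns.
    cv::Mat samples(static_cast<int>(points.size()), 2, CV_32F, points.data());

    int k = 2;  // assumed number of modes/clusters
    cv::Mat labels, centers;
    cv::kmeans(samples, k, labels,
               cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 100, 0.01),
               5, cv::KMEANS_PP_CENTERS, centers);

    // Each row of 'centers' is the mean of one cluster (the mode for a
    // symmetric distribution).
    std::cout << "cluster centers:\n" << centers << std::endl;
    return 0;
}
```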
I have an application where I have to detect the presence of some items in a scene. The items can be rotated and slightly scaled (bigger or smaller). I've tried using keypoint detectors, but they're not fast and accurate enough. So I've decided to first detect edges in the template and the search area, using Canny (or a faster edge detection algorithm), and then match the edges to find the position, orientation, and size of the match found.
All this needs to be done in less than a second.
I've tried using matchTemplate() and matchShapes(), but the former is NOT scale and rotation invariant, and the latter doesn't work well with the actual images. Rotating the template image in order to match is also time-consuming.
So far I have been able to detect the edges of the template but I don't know how to match them with the scene.
I've already gone through the following but wasn't able to get them to work (they either use an old version of OpenCV, or simply don't work with images other than those in the demo):
https://www.codeproject.com/Articles/99457/Edge-Based-Template-Matching
Angle and Scale Invariant template matching using OpenCV
https://answers.opencv.org/question/69738/object-detection-kinect-depth-images/
Can someone please suggest an approach for this, or a code snippet if possible?
This is my sample input image ( the parts to detect are marked in red )
Here are some software packages that do this, and they also show how I want the result to look:
This topic is what I have actually been dealing with for a year on a project, so I will try to explain my approach and how I am doing it. I assume that you have already done the preprocessing steps (filters, brightness, exposure, calibration, etc.) and made sure the image is free of noise.
Note: In my approach, I am collecting data from contours on a reference image, which shows my desired object. Then I compare this data with the other contours in the big image.
1. Use Canny edge detection and find the contours on the reference image. You need to make sure that no parts of the contours are missed; if parts are missing, the preprocessing stage probably has problems. The other important point is that you need to choose an appropriate mode for findContours, because each mode has different properties, so pick the one that fits your case. At the end, keep only the contours that are acceptable to you.
2. After getting the contours from the reference, you can find the length of every contour using the output array of findContours(). You can compare these values on the big image and eliminate the contours that differ too much.
3. minAreaRect draws a precisely fitted, enclosing (rotated) rectangle for each contour. In my case, this function is very good to use. I extract two parameters with it (a combined sketch of steps 1-3 follows this list):
a) Calculate the short and long edges of the fitted rectangle and compare the values with the other contours on the big image.
b) Calculate the percentage of blackness or whiteness (if your image is grayscale, get the percentage of pixels that are close to white or black) and compare at the end.
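As a rough sketch of steps 1-3 (the Canny thresholds, findContours mode, and file name are assumptions, not values from the answer above):

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    // Assumed file name; replace with your own reference image.
    cv::Mat ref = cv::imread("reference.png", cv::IMREAD_GRAYSCALE);
    if (ref.empty()) return 1;

    // Step 1: Canny edges + contours (thresholds and retrieval mode are assumptions).
    cv::Mat edges;
    cv::Canny(ref, edges, 50, 150);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(edges, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (const auto& c : contours)
    {
        // Step 2: contour length, used later to reject very different contours.
        double length = cv::arcLength(c, true);

        // Step 3: fitted rotated rectangle; compare short and long edges.
        cv::RotatedRect box = cv::minAreaRect(c);
        double shortEdge = std::min(box.size.width, box.size.height);
        double longEdge  = std::max(box.size.width, box.size.height);

        std::cout << "length=" << length
                  << " short=" << shortEdge
                  << " long=" << longEdge << std::endl;
    }
    return 0;
}
```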
matchShapes can be applied at the end to the remaining contours, or you can apply it to all contours (I suggest the first approach). Each contour is just an array, so you can hold the reference contours in an array and compare them with the others at the end. Doing the three steps above and then applying matchShapes works very well for me (see the sketch below).
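For instance, the shape comparison could look like this minimal sketch (the dissimilarity threshold is an assumption, and the contours are assumed to come from the previous sketch):

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Returns the indices of candidate contours whose shape is close to the
// reference contour (lower matchShapes score = more similar).
std::vector<int> findShapeMatches(const std::vector<cv::Point>& refContour,
                                  const std::vector<std::vector<cv::Point>>& contours,
                                  double maxScore = 0.1)  // assumed threshold
{
    std::vector<int> matches;
    for (int i = 0; i < static_cast<int>(contours.size()); ++i)
    {
        double score = cv::matchShapes(refContour, contours[i],
                                       cv::CONTOURS_MATCH_I1, 0.0);
        if (score < maxScore)
            matches.push_back(i);
    }
    return matches;
}
```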
I don't think matchTemplate is good to use directly. I draw every contour onto a separate zero Mat (a blank black surface) as a template image and then compare it with the others. Using the reference template image directly doesn't give good results.
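A small sketch of that rendering step (the image size and single-channel type are assumptions):

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

// Renders a single contour onto a blank (zero) image of the given size,
// so it can be compared against other rendered contours.
cv::Mat renderContour(const std::vector<cv::Point>& contour, cv::Size size)
{
    cv::Mat blank = cv::Mat::zeros(size, CV_8UC1);
    std::vector<std::vector<cv::Point>> wrapper{contour};
    cv::drawContours(blank, wrapper, 0, cv::Scalar(255), cv::FILLED);
    return blank;
}
```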
OpenCV has some good algorithms for finding circles, convexity, etc. If your situation is related to them, you can also use them as a step.
At the end, you just gather all the data and values, and you can make a table in your mind. The rest is a kind of statistical analysis.
Note: I think the most important part is the preprocessing. So make sure you have a clean, almost noiseless image and reference.
Note: Training can be a good solution for your case if you just want to know whether the objects exist or not. But if you are trying to do something for an industrial application, this is totally the wrong way. I tried the YOLO and Haar cascade training algorithms several times and also trained some objects with them. The experience I got is this: they can find objects almost correctly, but the center coordinates, rotation results, etc. will not be totally correct even if your calibration is correct. On the other hand, training time and collecting data are painful.
You have rather bad image quality and very bad lighting conditions, so you have only two ways:
1. Use filters -> binary threshold -> find_contours -> matchShapes. But this is a very unstable algorithm for your object type and image quality; you will get a lot of wrong contours, and it's hard to filter them.
2. Haar cascades -> cut out the bounding box -> check the shape inside it (see the sketch below).
All "special points/edge matching " algorithms will not work in such bad conditions.
I have created a point cloud of an irregular (non-planar) complex object using SfM. Each one of those 3D points was viewed in more than one image, so it has multiple (SIFT) features associated with it.
Now, I want to solve for the pose of this object in a new, different set of images using a PnP algorithm matching the features detected in the new images with the features associated with the 3D points in the point cloud.
So my question is: which descriptor do I associate with the 3D point to get the best results?
So far I've come up with a number of possible solutions...
Average all of the descriptors associated with the 3D point (taken from the SfM pipeline) and use that "mean descriptor" to do the matching in PnP. This approach seems a bit far-fetched to me - I don't know enough about feature descriptors (specifically SIFT) to comment on the merits and downfalls of this approach.
"Pin" all of the descriptors calculated during the SfM pipeline to their associated 3D point. During PnP, you would essentially have duplicate points to match with (one duplicate for each descriptor). This is obviously intensive.
Find the "central" viewpoint that the feature appears in (from the SfM pipeline) and use the descriptor from this view for PnP matching. So if the feature appears in images taken at -30, 10, and 40 degrees ( from surface normal), use the descriptor from the 10 degree image. This, to me, seems like the most promising solution.
Is there a standard way of doing this? I haven't been able to find any research or advice online regarding this question, so I'm really just curious if there is a best solution, or if it is dependent on the object/situation.
The descriptors that are used for matching in most SLAM or SFM systems are rotation and scale invariant (and, to some extent, robust to intensity changes). That is why we are able to match them from different viewpoints in the first place. So, in general, it doesn't make much sense to try to use them all, average them, or use the ones from a particular image. If the matching in your SFM was done correctly, the descriptors of the reprojection of a 3d point from your point cloud in any of its observations should be very close, so you can use any of them [1].
Also, it seems to me that you are trying to match the 2d points directly to the 3d points. From a computational point of view, I think this is not a very good idea, because by matching 2d points with 3d ones, you lose the spatial information of the images and have to search for matches in a brute-force manner, which can introduce noise. But if you do your matching from image to image and then propagate the results to the 3d points, you will be able to enforce priors (if you roughly know where you are, e.g. from an IMU, or if you know that your images are close), you can restrict the neighborhood where you look for matches in your images, etc. Additionally, once you have computed your pose and refined it, you will need to add more points, no? How will you do that if you haven't done any 2d/2d matching, but only 2d/3d matching?
Now, the way to implement this usually depends on your application (how much covisibility or baseline you have between the poses from your SFM, etc.). As an example, let's call your candidate image I_0, and the images from your SFM I_1, ..., I_n. First, match between I_0 and I_1. Now, assume q_0 is a 2d point from I_0 that has successfully been matched to q_1 from I_1, which corresponds to some 3d point Q. To ensure consistency, consider the reprojection of Q in I_2, and call it q_2. Match I_0 and I_2. Does the point to which q_0 is matched in I_2 fall close to q_2? If yes, keep the 2d/3d match between q_0 and Q, and so on.
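As a rough illustration of that consistency check (the function name, the pinhole camera model with known intrinsics, and the pixel threshold are all assumptions made for the sketch):

```cpp
#include <opencv2/calib3d.hpp>
#include <cmath>
#include <vector>

// Hypothetical check for one tentative 2d/3d correspondence: q0 in the
// query image I_0 was matched to q1 in I_1, and q1 is the observation of
// the 3d point Q. Verify that the match of q0 in a second SFM image I_2
// falls near the reprojection of Q into I_2.
bool isConsistent(const cv::Point2f& q0MatchInI2,  // where q0 matches in I_2
                  const cv::Point3f& Q,            // 3d point behind q1
                  const cv::Mat& rvec2,            // pose of I_2 (from the SFM)
                  const cv::Mat& tvec2,
                  const cv::Mat& K,                // camera intrinsics
                  const cv::Mat& dist,             // distortion coefficients
                  double maxPixelError = 3.0)      // assumed threshold
{
    std::vector<cv::Point2f> projected;
    cv::projectPoints(std::vector<cv::Point3f>{Q}, rvec2, tvec2, K, dist, projected);

    cv::Point2f d = projected[0] - q0MatchInI2;
    return std::sqrt(d.x * d.x + d.y * d.y) < maxPixelError;
}
```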
I don't have enough information about your data and your application, but I think that, depending on your constraints (real-time or not, etc.), you could come up with some variation of the above. The key idea, as I said previously, is to try to match from frame to frame and then propagate to the 3d case.
Edit: Thank you for your clarifications in the comments. Here are a few thoughts (feel free to correct me):
Let's consider a SIFT descriptor s_0 from I_0, and let's denote your aggregated descriptor F(s_1, ..., s_n) (it could be an average or a concatenation of the SIFT descriptors s_i from their corresponding images I_i, etc.). When matching s_0 with F, you will only want to use the subset of the s_i that belong to images with viewpoints close to I_0 (because of the 30-degree problem you mention, although I think it should be 50 degrees). That means you have to give each s_i a weight that depends on the pose of your query I_0. You obviously can't do that when constructing F, so you have to do it when matching. However, you don't have a strong prior on the pose (otherwise, I assume you wouldn't need PnP), so you can't really determine this weight. Therefore I think there are two conclusions/options here:
SIFT descriptors are not adapted to the task. You can try coming up with a perspective-invariant descriptor. There is some literature on the subject.
Try to keep some visual information in the form of "Key-frames", as in many SLAM systems. It wouldn't make sense to keep all of your images anyway, just keep a few that are well distributed (pose-wise) in each area, and use those to propagate 2d matches to the 3d case.
If you only match the 2d points of your query against 3d descriptors without any form of consistency check (such as the one I proposed earlier), you will introduce a lot of noise...
tl;dr I would keep some images.
[1] Since you say that you obtain your 3d reconstruction from an SFM pipeline, some of them are probably considered inliers and some outliers (indicated by a boolean flag). If they are outliers, just ignore them; if they are inliers, then they are the result of matching and triangulation, and their position has been refined multiple times, so you can trust any of their descriptors.
I'm looking for a fast way to compare a frame with a running average, and determine the difference between them (in terms of giving a high value if they're very similar, and a lower value if they're not that similar). I need to compare the entire frame, not just a smaller region.
I'm already using Otsu thresholding on the images to filter out the background (not interested in the background, nor the features of the foreground - just need shapes). Is there a nice, fast way to do what I want?
The classic method for this is normalized cross-correlation (try cv::matchTemplate()). You will need to set a threshold to decide whether the images match, and you can also use the resulting score to compare several images.
In OpenCV, this method of matchTemplate is explained here, along with the parameter you need to pass to the function.
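As a rough sketch (assuming both inputs are single-channel Mats of the same size and type, e.g. your Otsu-thresholded frame and the running average converted to 8-bit), the whole-frame comparison collapses to a single normalized score:

```cpp
#include <opencv2/imgproc.hpp>

// Returns a similarity score in roughly [-1, 1]: high when the frame is
// close to the running average, lower when it is not. Because the image
// and the "template" have the same size, the result matrix is 1x1.
double frameSimilarity(const cv::Mat& frame, const cv::Mat& runningAverage)
{
    cv::Mat result;
    cv::matchTemplate(frame, runningAverage, result, cv::TM_CCOEFF_NORMED);
    return result.at<float>(0, 0);
}
```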
Is there a feature extraction method that is scale invariant but not rotation invariant? I would like to match similar images that have been scaled but not rotated...
EDIT: Let me rephrase. How can I check whether an image is a scaled version (or close to) of an original?
Histograms and Gaussian pyramids are used to extract scale-invariant features.
How can I check whether an image is a scaled version (or close to) of an original?
This puzzles me. Do you mean that, given two images, one is the original and the other is a scaled version? Or is one the original and the other a fragment of the original, but scaled, and you want to locate the fragment in the original?
[updated]
Given two images, a and b:
1. Detect their SIFT or SURF feature points and descriptors.
2. Get the corresponding regions between a and b. If there are none, return false. Refer to Pattern Matching - Find reference object in second image and Trying to match two images using sift in OpenCv, but too many matches. Name the region in a as Ra, and the one in b as Rb.
3. Use an algorithm like template matching to determine whether Ra is similar enough to Rb. If yes, calculate the scale ratio (a sketch of the matching step follows).
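A minimal sketch of the matching step, assuming OpenCV >= 4.4 for cv::SIFT; the file names, the 0.75 ratio-test threshold, and the median-of-keypoint-size-ratios scale estimate are my own assumptions, not part of the steps above:

```cpp
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    // Assumed file names.
    cv::Mat a = cv::imread("a.png", cv::IMREAD_GRAYSCALE);
    cv::Mat b = cv::imread("b.png", cv::IMREAD_GRAYSCALE);
    if (a.empty() || b.empty()) return 1;

    // Step 1: SIFT keypoints and descriptors.
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    std::vector<cv::KeyPoint> kpA, kpB;
    cv::Mat descA, descB;
    sift->detectAndCompute(a, cv::noArray(), kpA, descA);
    sift->detectAndCompute(b, cv::noArray(), kpB, descB);

    // Step 2: match with Lowe's ratio test (the 0.75 threshold is an assumption).
    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(descA, descB, knn, 2);

    std::vector<double> sizeRatios;
    for (const auto& m : knn)
    {
        if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance)
            sizeRatios.push_back(kpB[m[0].trainIdx].size / kpA[m[0].queryIdx].size);
    }
    if (sizeRatios.empty()) { std::cout << "no match\n"; return 0; }

    // Step 3: a crude scale estimate, the median ratio of matched keypoint sizes.
    std::nth_element(sizeRatios.begin(),
                     sizeRatios.begin() + sizeRatios.size() / 2, sizeRatios.end());
    std::cout << "estimated scale ratio: "
              << sizeRatios[sizeRatios.size() / 2] << std::endl;
    return 0;
}
```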
What does CV_HAAR_SCALE_IMAGE do in OpenCV's cvHaarDetectObjects function?
It enables more optimization.
The face detection implementation is more optimized for CV_HAAR_SCALE_IMAGE than for CV_HAAR_DO_CANNY_PRUNING, because the CV_HAAR_SCALE_IMAGE method is more DMA (direct memory access) friendly. The default method's (CV_HAAR_DO_CANNY_PRUNING) implementation needs wide random access to main memory.
The flag CV_HAAR_SCALE_IMAGE tells the algorithm to scale the image rather than the detector.
There is an example of its use here: Face detection: How to find faces with openCV
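For reference, a minimal sketch (file names are assumptions) of passing the equivalent C++ flag, cv::CASCADE_SCALE_IMAGE, to CascadeClassifier::detectMultiScale:

```cpp
#include <opencv2/objdetect.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main()
{
    // Assumed file names; the cascade XML ships with OpenCV's data folder.
    cv::CascadeClassifier faceCascade("haarcascade_frontalface_default.xml");
    cv::Mat gray = cv::imread("photo.png", cv::IMREAD_GRAYSCALE);
    if (faceCascade.empty() || gray.empty()) return 1;

    // CASCADE_SCALE_IMAGE is the C++ counterpart of CV_HAAR_SCALE_IMAGE:
    // the image pyramid is scaled instead of the detector window.
    std::vector<cv::Rect> faces;
    faceCascade.detectMultiScale(gray, faces, 1.1, 3,
                                 cv::CASCADE_SCALE_IMAGE, cv::Size(30, 30));
    return 0;
}
```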
According to EMGU, which is a .NET wrapper for OpenCV and sometimes has far better documentation than OpenCV itself:

DO_CANNY_PRUNING
If it is set, the function uses a Canny edge detector to reject image regions that contain too few or too many edges and thus cannot contain the searched object. The particular threshold values are tuned for face detection, and in this case the pruning speeds up the processing.

SCALE_IMAGE
For each scale factor used, the function downscales the image rather than "zooming" the feature coordinates in the classifier cascade. Currently, the option can only be used alone, i.e. the flag cannot be set together with the others.

FIND_BIGGEST_OBJECT
If it is set, the function finds the largest object (if any) in the image. That is, the output sequence will contain one (or zero) element(s).

DO_ROUGH_SEARCH
It should be used only when CV_HAAR_FIND_BIGGEST_OBJECT is set and min_neighbors > 0. If the flag is set, the function does not look for candidates of a smaller size as soon as it has found the object (with enough neighbor candidates) at the current scale. Typically, when min_neighbors is fixed, this mode yields a less accurate (slightly larger) object rectangle than the regular single-object mode (flags=CV_HAAR_FIND_BIGGEST_OBJECT), but it is much faster, up to an order of magnitude. A greater value of min_neighbors may be specified to improve the accuracy.
Source
CV_HAAR_DO_CANNY_PRUNING causes flat regions that contain no edges to be skipped by the classifier.