I didn't find anything on the internet. There have been some recent papers about clustering feature-space descriptors (such as those from SIFT/SURF) using the Mean Shift algorithm.
Does anybody have any links or any code/library/tip to actually cluster SURF descriptors? (Matlab/C++)
I've already tried to use the 1D Mean-Shift (which works perfectly on the locations of the points) and also some other mean-shift implementations that were available... though all seem to have problems with higher dimensions.
Thanks in advance!
Why are you using a 1D classification algorithm with a high-dimensional dataset? Mean-shift segmentation is an unsupervised classification task, while SIFT and SURF are used to find keypoints in an image. There is only one mean-shift. There are other alternatives such as CAMShift, but they are mostly independent of mean-shift. SURF and mean-shift are independent algorithms, so you will find no implementation combining them unless it is tailored for a specific application.
Furthermore, SIFT commonly employs a 128-dimensional EoH-based descriptor (similar in dimensionality to the extended SURF descriptor) for a given keypoint. If you also account for the (x, y) position of each keypoint, you have a 130-dimensional feature space, not a 1D one.
If you wish to categorise the edge information in an image, you should first localise the keypoints using SIFT or SURF, then use a concatenated vector of the EoH descriptor and the keypoint position as the input to the clustering algorithm. Had you searched Google or the MathWorks File Exchange for an N-dimensional mean-shift algorithm, you would have found one. The process is the same as for a 1D dataset, so there is nothing gained by hard-coding it for the 1D case. You would also have found that MATLAB's Computer Vision toolbox already contains a SURF implementation.
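As an illustration, here is a minimal Python sketch of that pipeline (the question asks for MATLAB/C++, but the process is identical): extract descriptors with OpenCV, append the keypoint positions, and feed the result to an N-dimensional mean-shift. SIFT stands in for SURF because it ships with stock OpenCV, and the bandwidth value is a guess you would need to tune.

```python
import cv2
import numpy as np
from sklearn.cluster import MeanShift

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute 128-D SIFT descriptors.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Append the (x, y) position to each descriptor -> 130-D feature space.
positions = np.array([kp.pt for kp in keypoints], dtype=np.float32)
features = np.hstack([descriptors, positions])

# Mean-shift works in any dimensionality; the bandwidth sets the cluster scale.
ms = MeanShift(bandwidth=300.0)  # bandwidth is a guess; tune it for your data
labels = ms.fit_predict(features)
print("found", labels.max() + 1, "clusters")
```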
Mean-Shift: http://www.mathworks.co.uk/matlabcentral/fileexchange/10161-mean-shift-clustering
SURF: http://www.mathworks.co.uk/help/vision/examples/object-detection-in-a-cluttered-scene-using-point-feature-matching.html
The C++ and MATLAB SIFT implementations are referenced in the original paper and on its site (A. Vedaldi, "An implementation of SIFT detector and descriptor", 2004).
SIFT: http://www.robots.ox.ac.uk/~vedaldi/code/sift.html
Original SURF paper: http://www.vision.ee.ethz.ch/~surf/eccv06.pdf
Original SIFT paper: http://www.robots.ox.ac.uk/~vedaldi/assets/sift/sift.pdf
I want to ask two questions about dense SIFT (dsift) and vlfeat:
Is there any material that details dsift? I have seen many sources saying that "dense SIFT is SIFT applied to a dense grid", but what does that mean? Can it be described in more detail? I read the source code (dsift.c and dsift.h in vlfeat) and the technical notes about dsift, but there are many things I cannot understand, and existing papers usually focus on applications of dsift.
I use vlfeat in my C program and it works fine, but when I customize the parameters with vl_dsift_set_geometry, it goes wrong. Because I do not know how dsift works, I do not know how to set binSizeX/Y and numBinX/Y properly. I read "patch size 76" in a paper; does "patch" refer to the 4*4 grid? I am somewhat confused by the terms bin, patch and grid. So my question is: with a patch size of 76, how should I set binSizeX/Y and numBinX/Y (image size 256*256)?
In SIFT, the first step is to detect keypoints; keypoint detection is performed at multiple scales.
The next step is to describe each keypoint, generating its descriptor.
The distribution of the keypoints over the image is not uniform; it depends on which keypoints are detected.
In dense SIFT there is no keypoint detection: SIFT descriptors are computed on a grid of pre-specified points at one specific scale. This is not useful if you are matching objects that may appear at different scales.
There is also the PHOW variant, which is a combination of dense SIFT and SIFT. Instead of computing SIFT at pre-specified locations and a single pre-specified scale, features are computed at pre-specified locations but at several scales. In PHOW, all SIFT features computed at the same point (at different scales) are combined to construct a single feature at that location.
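To make the distinction concrete, here is a hedged Python/OpenCV sketch (the grid step, border margin, descriptor sizes and scales are all assumptions; VLFeat's vl_dsift/vl_phow implement this far more efficiently):

```python
import cv2
import numpy as np

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
step, margin = 8, 32  # grid spacing and border margin (both guesses)

# A regular grid of locations: no keypoint detection at all.
grid = [(float(x), float(y))
        for y in range(margin, img.shape[0] - margin, step)
        for x in range(margin, img.shape[1] - margin, step)]

# Dense SIFT: one descriptor per grid point at a single fixed scale.
kps = [cv2.KeyPoint(x, y, 16.0) for x, y in grid]
_, dense = sift.compute(img, kps)

# PHOW-style: same grid, several scales, descriptors stacked per location.
scales = [8.0, 16.0, 24.0, 32.0]
phow = np.hstack([sift.compute(img, [cv2.KeyPoint(x, y, s)
                                     for x, y in grid])[1]
                  for s in scales])
```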
I need to implement a simple image search in my app using TensorFlow.
The requirements are these:
The dataset contains around a million images, all of the same size, each containing one unique object and only that object.
The search parameter is an image taken with a phone camera of some object that is potentially in the dataset.
I've managed to extract the image from the camera picture and straighten it to rectangular form and as a result, a reverse-search image indexer like TinEye was able to find a match.
Now I want to reproduce that indexer by using TensorFlow to create a model based on my dataset (making each image's file name a unique index).
Could anyone point me to tutorials/code that explain how to achieve such a thing without diving too deep into computer vision terminology?
Much appreciated!
The Wikipedia article on TinEye says that Perceptual Hashing will yield results similar to TinEye's. They reference this detailed description of the algorithm. But TinEye refuses to comment.
The biggest issue with the Perceptual Hashing approach is that while it's efficient for identifying the same image (subject to skews, contrast changes, etc.), it's not great at identifying a completely different image of the same object (e.g. the front of a car vs. the side of a car).
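For a quick experiment with that approach, the Python imagehash package (an assumption on my part; pHash.org's C++ library works the same way) implements DCT-based perceptual hashing:

```python
from PIL import Image
import imagehash

# 64-bit DCT-based perceptual hashes; subtraction gives the Hamming distance.
h1 = imagehash.phash(Image.open("query.jpg"))
h2 = imagehash.phash(Image.open("candidate.jpg"))
print("hamming distance:", h1 - h2)  # a small distance (e.g. < 10) suggests the same image
```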
TensorFlow has great support for deep neural nets which might give you better results. Here's a high level description of how you might use a deep neural net in TensorFlow to solve this problem:
Start with a pre-trained NN (such as GoogLeNet) or train one yourself on a dataset like ImageNet. Now we're given a new picture we're trying to identify. Feed that into the NN. Look at the activations of a fairly deep layer in the NN. This vector of activations is like a 'fingerprint' for the image. Find the picture in your database with the closest fingerprint. If it's sufficiently close, it's probably the same object.
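Here is a minimal sketch of that fingerprint idea with TensorFlow/Keras. MobileNetV2 stands in for GoogLeNet, and the 224x224 input size and cosine similarity are assumptions:

```python
import numpy as np
import tensorflow as tf

# Pre-trained backbone; pooling='avg' yields one 1280-D embedding per image.
model = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet", pooling="avg")

def fingerprint(path):
    # The activations of the pooled deep layer act as the image's fingerprint.
    img = tf.keras.utils.load_img(path, target_size=(224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.array(img, dtype=np.float32))
    return model.predict(x[None, ...], verbose=0)[0]

# Index the dataset once, then compare the query against every fingerprint.
db = {name: fingerprint(name) for name in ["img1.jpg", "img2.jpg"]}  # placeholder files
q = fingerprint("query.jpg")
best = max(db, key=lambda n: np.dot(db[n], q) /
           (np.linalg.norm(db[n]) * np.linalg.norm(q) + 1e-9))
print("closest match:", best)
```

For a million images you would replace the linear scan with an approximate nearest-neighbour index, but the fingerprinting step stays the same.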
The intuition behind this approach is that unlike Perceptual Hashing, the NN is building up a high-level representation of the image including identifying edges, shapes, and important colors. For example, the fingerprint of an apple might include information about its circular shape, red color, and even its small stem.
You could also try something like this 2012 paper on image retrieval which uses a slew of hand-picked features such as SIFT, regional color moments and object contour fragments. This is probably a lot more work and it's not what TensorFlow is best at.
UPDATE
OP has provided an example pair of images from his application:
Here are the results of using the demo on the pHash.org website on that pair of similar images as well as on a pair of completely dissimilar images.
Comparing the two images provided by the OP:
RADISH (radial hash): pHash determined your images are not similar with PCC = 0.518013.
DCT hash: pHash determined your images are not similar with hamming distance = 32.000000.
Marr/Mexican hat wavelet: pHash determined your images are not similar with normalized hamming distance = 0.480903.
Comparing one of his images with a random image from my machine:
RADISH (radial hash): pHash determined your images are not similar with PCC = 0.690619.
DCT hash: pHash determined your images are not similar with hamming distance = 27.000000.
Marr/Mexican hat wavelet: pHash determined your images are not similar with normalized hamming distance = 0.519097.
Conclusion
We'll have to test more images to really know. But so far pHash does not seem to be doing very well. With the default thresholds it doesn't consider the similar images to be similar. And for one algorithm, it actually considers a completely random image to be more similar.
https://github.com/wuzhenyusjtu/VisualSearchServer
It is a simple implementation of similar-image search using TensorFlow and the InceptionV3 model. The code implements two components: a server that handles image search, and a simple indexer that does nearest-neighbor matching on the extracted pool3 features.
I'm trying to detect a shape (a cross) in my input video stream with the help of OpenCV. Currently I'm thresholding to get a binary image of my cross, which works pretty well. Unfortunately, my algorithm to decide whether the extracted blob is a cross or not doesn't perform very well. As you can see in the image below, not all corners are detected under certain perspectives.
I'm using findContours() and approxPolyDP() to get an approximation of my contour. If I'm detecting 12 corners / vertices in this approximated curve, the blob is assumed to be a cross.
Is there any better way to solve this problem? I thought about SIFT, but the algorithm has to perform in real-time and I read that SIFT is not really suitable for real-time.
I have a couple of suggestions that might provide some interesting results although I am not certain about either.
If the cross is always near the center of your image and always lies on a planar surface, you could try to find a homography between the camera and the plane upon which the cross lies. This would enable you to transform a sample image of the cross (at a selection of different in-plane rotations) into the coordinate system of the visualized cross. You could then generate templates to match against the image, using some simple pixel-agreement tests to determine whether you have a match.
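A rough Python/OpenCV sketch of that template idea (the point correspondences, template size and agreement threshold below are all placeholders; in practice the plane-to-image homography would come from known reference points):

```python
import cv2
import numpy as np

# A 100x100 reference image of the cross and a thresholded camera frame.
template = cv2.imread("cross_template.png", cv2.IMREAD_GRAYSCALE)
frame_bin = cv2.imread("frame_binary.png", cv2.IMREAD_GRAYSCALE)

# Placeholder correspondences between the template plane and the image.
plane_pts = np.float32([[0, 0], [100, 0], [100, 100], [0, 100]])
image_pts = np.float32([[210, 120], [330, 135], [320, 260], [200, 240]])
H = cv2.getPerspectiveTransform(plane_pts, image_pts)

# Warp the template into the camera view and score pixel agreement.
warped = cv2.warpPerspective(template, H,
                             (frame_bin.shape[1], frame_bin.shape[0]))
agreement = np.mean((warped > 127) == (frame_bin > 127))
print("match" if agreement > 0.9 else "no match")  # threshold is a guess
```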
Alternatively you could try to train a Haar-based classifier to recognize the cross. This type of classifier is often used in face detection and detects oriented edges in images, classifying faces by the relative positions of several oriented edges. It has good classification accuracy on faces and is extremely fast. Although I cannot vouch for its accuracy in this particular situation it might provide some good results for simple shapes such as a cross.
Computing the convex hull and then taking advantage of the convexity defects might work.
All crosses should have four convexity defects, making up four sets of two points, or four vectors. Furthermore, if your shape is a cross, these four vectors will have two pairs of supplementary angles.
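A minimal OpenCV sketch of the defect-counting part (the depth threshold is a guess to tune):

```python
import cv2

def looks_like_cross(binary):
    # binary: single-channel 0/255 mask containing the candidate blob.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return False
    cnt = max(contours, key=cv2.contourArea)
    hull = cv2.convexHull(cnt, returnPoints=False)
    defects = cv2.convexityDefects(cnt, hull)
    if defects is None:
        return False
    # Keep only deep defects; a cross should have exactly four of them.
    deep = [d for d in defects[:, 0] if d[3] / 256.0 > 10.0]  # depth in px; threshold a guess
    return len(deep) == 4
```

The supplementary-angle check would go on top of this, using the start/end points stored in each defect.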
In Matlab, there is a function "contour" (Matlab contour). If I use it on my image, I get what I want, but my goal is to implement such a function in my image editor myself. I read Matlab's documentation for the "contour" function and, based on that, used the Marching Squares algorithm. However, my result looks "ugly": contours cross each other, and I get a very high number of nested contours, which Matlab eliminates.
Does anyone know of a solution for generating contours from a grey-scale image at, let's say, every 10th brightness value?
The OpenCV source for its contouring algorithm is available.
One of the simplest serious algorithms is Paul Bourke's conrec (with source available), or there is a simple discussion of popular approaches at imageprocessingplace.
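For comparison, scikit-image's marching-squares implementation (an assumption, not something the question requires) extracts iso-contours at every 10th brightness value in a few lines:

```python
from skimage import io, measure

# Grey-scale image scaled to the 0..255 range.
img = io.imread("input.png", as_gray=True) * 255.0

# One marching-squares pass per iso-level, every 10th brightness value.
for level in range(10, 250, 10):
    contours = measure.find_contours(img, level)
    print(f"level {level}: {len(contours)} contour(s)")
```

Comparing its output against your own implementation is a quick way to spot where the crossing and nesting artifacts come from.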
I am trying to stitch 2 aerial images together with very little overlap, probably <500 px of overlap. These images have 3600x2100 resolution. I am using the OpenCV library to complete this task.
Here is my approach:
1. Find feature points and match points between the two images.
2. Find homography between two images
3. Warp one of the images using the homography
4. Stitch the two images
Right now I am trying to get this to work with two images. I am having trouble with step 3 and possibly step 2. I used findHomography() from the OpenCV library to grab my homography between the two images. Then I called warpPerspective() on one of my images using the homography.
The problem with the approach is that the transformed image is all distorted. Also it seems to only transform a certain part of the image. I have no idea why it is not transforming the whole image.
Can someone give me some advice on how I should approach this problem? Thanks
In the results that you have posted, I can see that you have at least one keypoint mismatch. If you use findHomography(src, dst, 0), it will mess up your homography. You should use findHomography(src, dst, CV_RANSAC) instead.
You can also try to use warpAffine instead of warpPerspective.
Edit: From the results that you posted in the comments to your question, I had the impression that the matching was quite stable. That means you should be able to get good results with this example as well. Since you mostly seem to be dealing with translation, you could try to filter out the outliers with the following sketched algorithm (a code sketch follows the list):
calculate the average (or median) motion vector x_avg
calculate the normalized dot product <x_avg, x_match>
discard x_match if the dot product is smaller than a threshold
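A minimal NumPy sketch of that filter (the cosine threshold is a guess):

```python
import numpy as np

def filter_matches_by_motion(src_pts, dst_pts, cos_thresh=0.9):
    """Keep matches whose motion vector agrees with the average motion.

    src_pts, dst_pts: (N, 2) arrays of matched keypoint coordinates.
    cos_thresh: minimum normalized dot product with x_avg (a guess).
    """
    motion = dst_pts - src_pts                      # per-match motion vectors
    avg = motion.mean(axis=0)                       # average motion vector x_avg
    avg /= np.linalg.norm(avg) + 1e-9
    unit = motion / (np.linalg.norm(motion, axis=1, keepdims=True) + 1e-9)
    keep = unit @ avg > cos_thresh                  # discard disagreeing matches
    return src_pts[keep], dst_pts[keep]
```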
To make it work for images with smaller overlap, you would have to look at the detector, the descriptors and the matches. You do not specify which descriptors you work with, but I would suggest using SIFT or SURF descriptors and the corresponding detectors. You should also set the detector parameters to sample densely (i.e., try to detect more features).
You can refer to this answer which is slightly related: OpenCV - Image Stitching
To stitch images using a homography, the most important thing to take care of is finding correspondence points in both images. The fewer the outliers among the correspondence points, the better the generated homography.
Using a robust technique such as RANSAC along with OpenCV's findHomography() function (use CV_RANSAC as the option) will still generate a reasonable homography, provided the percentage of inliers is higher than the percentage of outliers. Also make sure that there are at least 4 inliers among the correspondence points passed to findHomography().
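A short Python/OpenCV sketch of that advice (the detector choice, ratio-test value and reprojection threshold are assumptions; cv2.RANSAC is the Python-side name of CV_RANSAC):

```python
import cv2
import numpy as np

img1 = cv2.imread("left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.jpg", cv2.IMREAD_GRAYSCALE)

# Detect many features and apply Lowe's ratio test to limit outliers.
sift = cv2.SIFT_create(nfeatures=5000)
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
good = [m for m, n in cv2.BFMatcher().knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC rejects the remaining outliers; at least 4 inliers are required.
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("inliers:", int(mask.sum()), "of", len(good))
```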