Locality Sensitive Hashing in OpenCV for image processing - C++

This is my first image processing application, so please be kind to this filthy peasant.
THE APPLICATION:
I want to implement a fast application (performance is crucial, even over accuracy) where, given a photo (taken with a mobile phone) containing a movie poster, it finds the most similar photo in a given dataset and returns a similarity score. The dataset is composed of similar pictures (taken with a mobile phone, each containing a movie poster). The images can have different sizes and resolutions and can be taken from different viewpoints (but there is no rotation, since the posters are assumed to always be upright).
Any suggestion on how to implement such an application is well accepted.
FEATURE DESCRIPTIONS IN OPENCV:
I've never used OpenCV and I've read this tutorial about Feature Detection and Description by OpenCV.
From what I've understood, these algorithms are supposed to find keypoints (usually corners) and possibly compute descriptors (which describe each keypoint and are used for matching two different images). I say "possibly" since some of them (e.g. FAST) provide only keypoints.
MOST SIMILAR IMAGE PROBLEM AND LSH:
The algorithms above don't by themselves solve the problem "given an image, how do we find the most similar one in a dataset quickly". To do that, we can use the keypoints and descriptors obtained by any of the previous algorithms. The problem stated above looks like a nearest neighbor problem, and Locality Sensitive Hashing is a fast and popular way of finding an approximate solution to this problem in high-dimensional spaces.
THE QUESTION:
What I don't understand is how to use the result of any of the previous algorithms (i.e. keypoints and descriptors) in LSH.
Is there any implementation for this problem?

I will provide a general answer, going beyond the scope of the OpenCV library.
Quoting this answer:
descriptors: they are the way to compare the keypoints. They summarize, in vector format (of constant length), some characteristics about the keypoints.
With that said, we can imagine/treat (geometrically) a descriptor as a point in a D-dimensional space. So, in total, all the descriptors are points in a D-dimensional space. For example, for GIST, D = 960.
So actually descriptors describe the image using less information than the whole image (because when you have 1 billion images, size matters). They serve as the image's representatives, so we process them on behalf of the image (since they are easier/smaller to handle).
The problem you are mentioning is the Nearest Neighbor problem. Notice that an approximate version of this problem can lead to significant speed-ups when D is big (since the curse of dimensionality makes traditional approaches, such as a k-d tree, very slow, almost linear in N, the number of points).
Algorithms that solve the NN problem, which is an optimization problem, are usually generic. They may not care whether the data are images, molecules, etc.; I, for example, have used my kd-GeRaF for both. As a result, the algorithms expect N points in a D-dimensional space, i.e. your N descriptors, you might say.
Check my answer for LSH here (which points to a nice implementation).
Edit:
LSH expects as input N vectors of dimension D and, given a query vector (also of dimension D) and a range R, will find the vectors that lie within this range of the query vector.
As a result, we can say that every image is represented by just one vector, in SIFT format for example.
You see, LSH doesn't actually solve the k-NN problem directly; it searches within a range (and can give you the k-NNs, if they are within that range). Read more about R in the Experiments section of "High-dimensional approximate nearest neighbor". kd-GeRaF and FLANN solve the k-NN problem directly.
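As a concrete (hedged) illustration of how the descriptors feed into LSH in OpenCV: the FLANN-based matcher can be backed by an LSH index, which works on binary descriptors such as ORB. The sketch below shows one possible wiring; the file names and the LSH parameters (6, 12, 1) are placeholders, not tuned values, and the "good match count" is only a crude similarity score.

```cpp
// Sketch: extract ORB descriptors and match them through OpenCV's FLANN LSH index.
// Assumes OpenCV >= 3.x; parameters and file names are illustrative only.
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/flann/miniflann.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::Mat query = cv::imread("query.jpg", cv::IMREAD_GRAYSCALE);
    cv::Mat reference = cv::imread("poster.jpg", cv::IMREAD_GRAYSCALE);

    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kpQ, kpR;
    cv::Mat descQ, descR;
    orb->detectAndCompute(query, cv::noArray(), kpQ, descQ);
    orb->detectAndCompute(reference, cv::noArray(), kpR, descR);
    if (descQ.empty() || descR.empty()) return 1;

    // LSH index over the binary ORB descriptors (table_number, key_size, multi_probe_level).
    cv::FlannBasedMatcher matcher(cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1));
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(descQ, descR, knn, 2);

    // Lowe's ratio test keeps distinctive matches; their count is a rough similarity score.
    int good = 0;
    for (const auto& m : knn)
        if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance)
            ++good;
    return good > 0 ? 0 : 1;
}
```

For a whole dataset you would add every image's descriptors to the matcher (DescriptorMatcher::add/train) instead of matching pair by pair, and rank the dataset images by how many good matches each one receives.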

Related

Fisher Vector with LSH?

I want to implement a system where, given an input image, it returns a reasonably similar one (approximation is acceptable) from a dataset of (about) 50K images. Time performance is crucial.
I'll use a parallel version of SIFT to obtain a matrix of descriptors D. I've read about the Fisher Vector (FV) (VLFeat and Yael implementations) as a learning-based and much more precise alternative to Bag of Features (BoF) for representing D as a single vector v.
My questions are:
What distance is used for FVs? Is it the Euclidean one? In that case I would use LSH with the Euclidean distance to quickly find approximate nearest neighbors of FVs.
Is there any other time-efficient C++ FV implementation?
Another method you could take into consideration is VLAD encoding (basically a non-probabilistic version of FV, replacing the GMM by k-means clustering).
The implementation differs only slightly from standard vector quantisation, but in my experiments it showed much better performance with a significantly smaller codebook size.
It uses the Euclidean distance to find the nearest codebook vector, but instead of just counting elements, it accumulates every element's residual.
An example for image search: Link
FV / VLAD paper: Paper
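To make the encoding step concrete, here is a minimal VLAD sketch; it assumes float descriptors (e.g. SIFT, one row per descriptor) and a codebook of K centres obtained beforehand, for instance with cv::kmeans. The power-law and L2 normalisation follow the FV/VLAD paper, but everything else (names, types) is just illustrative.

```cpp
// Minimal VLAD encoding sketch: accumulate residuals to the nearest codebook centre.
// Assumes 'descriptors' and 'codebook' are CV_32F matrices with the same column count.
#include <opencv2/core.hpp>
#include <cmath>

cv::Mat encodeVLAD(const cv::Mat& descriptors, const cv::Mat& codebook) {
    const int K = codebook.rows, D = codebook.cols;
    cv::Mat vlad = cv::Mat::zeros(1, K * D, CV_32F);

    for (int i = 0; i < descriptors.rows; ++i) {
        // Find the nearest codebook centre (Euclidean distance).
        int best = 0;
        double bestDist = cv::norm(descriptors.row(i), codebook.row(0), cv::NORM_L2);
        for (int k = 1; k < K; ++k) {
            double d = cv::norm(descriptors.row(i), codebook.row(k), cv::NORM_L2);
            if (d < bestDist) { bestDist = d; best = k; }
        }
        // Accumulate the residual (descriptor - centre) into that centre's slot.
        for (int j = 0; j < D; ++j)
            vlad.at<float>(0, best * D + j) +=
                descriptors.at<float>(i, j) - codebook.at<float>(best, j);
    }

    // Signed square root (power-law) followed by L2 normalisation.
    for (int j = 0; j < K * D; ++j) {
        float v = vlad.at<float>(0, j);
        vlad.at<float>(0, j) = (v >= 0.f ? 1.f : -1.f) * std::sqrt(std::fabs(v));
    }
    cv::normalize(vlad, vlad, 1.0, 0.0, cv::NORM_L2);
    return vlad;
}
```

The resulting K*D-dimensional vectors can then be compared with the Euclidean distance (and indexed with LSH), which is exactly the setup asked about above.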

Query re. how to set up an SVM, which SVM variation … and how to define a metric

I'd like to learn how best to set up an SVM in OpenCV (or another C++ library) for my particular problem (or whether there is a more appropriate algorithm).
My goal is to receive a weighting of how well an input set of labeled points on a 2D plane compares or fits with each of a number of 'ideal' sets of labeled 2D points.
I hope my illustrations make this clear: the first three boxes, labeled A through C, indicate different ideal placements of 3 points; in my illustrations the labelling is managed by colour:
The second graphic gives examples of possible inputs:
If I then pass for instance example input set 1 to the algorithm it will compare that input set with each ideal set, illustrated here:
I would suggest that most observers would agree that example input 1 is most similar to ideal set A, then B, then C.
My problem is to get not only this ordering out of an algorithm, but ideally also a weighting of how much the input is like A relative to B and C.
For the example given it might be something like:
A:60%, B:30%, C:10%
Example input 3 might yield something such as:
A:33%, B:32%, C:35% (i.e. different order, and a less 'determined' result)
My end goal is to interpolate between the ideal settings using these weights.
To get the ordering, I'm guessing the 'cost' involved in fitting the input to each set may simply have been compared anyway (?). If so, could this cost be used to find the weighting? Or maybe it is non-linear and some kind of transformation needs to happen? (But still, obviously, relative comparisons were enough to determine the order.)
Am I on track?
Direct question>> Is the OpenCV SVM appropriate? Or, more specifically:
A series of separate binary SVM classifiers, one for each ideal state, and then a final ordering somehow? (i.e. what is the metric?)
A version of an SVM, such as multiclass or structured, from another library? (...which I still find hard to grasp conceptually, as the examples seem so unrelated)
Also, another critical component I'm not fully grasping yet is how to define what determines a good fit between any example input set and an ideal set. I was thinking Euclidean distance, and I simply sum the distances? What about outliers? My vector calc needs a brush-up, but maybe dot products could nose in there somewhere?
Direct question>> How best to define a metric that describes a fit in this case?
The real case would have 10-20 points per set and, time permitting, as many 'ideal' sets of points as possible; let's go with 30 for now. Could I expect to get away with ~2 ms per iteration on a reasonable machine (MacBook Pro), or does this kind of thing blow up?
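For concreteness, here is roughly what I mean by summing Euclidean distances and turning the resulting costs into weights. This is only a sketch of my current thinking, not something I've settled on; all names and the softmax-style conversion are placeholders.

```cpp
// Placeholder sketch: sum-of-distances cost per ideal set, then costs -> weights.
// Assumes input[i] and ideal[i] carry the same label (points are matched by index).
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Cost of fitting an input set to one ideal set: sum of distances between
// correspondingly labelled points (no outlier handling yet).
double fitCost(const std::vector<Pt>& input, const std::vector<Pt>& ideal) {
    double cost = 0.0;
    for (std::size_t i = 0; i < input.size(); ++i)
        cost += std::hypot(input[i].x - ideal[i].x, input[i].y - ideal[i].y);
    return cost;
}

// Turn costs into weights that sum to 1 (lower cost -> higher weight).
// 'beta' would control how "determined" the result is; its value is a guess.
std::vector<double> costsToWeights(const std::vector<double>& costs, double beta = 1.0) {
    std::vector<double> w(costs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < costs.size(); ++i) {
        w[i] = std::exp(-beta * costs[i]);
        sum += w[i];
    }
    for (double& v : w) v /= sum;
    return w;
}
```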
(Disclaimer: I have asked this question more generally on Cross Validated, but there isn't much activity there.)

Polynomial Least Squares for Image Curve Fitting

I am trying to fit a curve to a number of pixels in an image so I can do further processing regarding its shape. Does anyone know how to implement a least squares method in C/C++, preferably using the following parameters: an x array, a y array, and an answers array (the length of the answers array should tell how many coefficients need to be calculated)?
If this is not an exercise in implementing it yourself, I would suggest you use a ready-made library like GNU GSL. Have a look at the functions whose names start with gsl_multifit_; see e.g. the second example here.
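To make that suggestion concrete, here is a hedged sketch of a polynomial fit built around gsl_multifit_linear, matching the question's x/y/answers interface. The function name polyFit and the column-per-power design matrix are my own choices, not part of GSL; link with -lgsl -lgslcblas.

```cpp
// Polynomial least-squares fit via GSL: coeffs[] receives the coefficients,
// and its length (nCoeffs = degree + 1) decides the degree of the polynomial.
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_multifit.h>
#include <gsl/gsl_vector.h>
#include <cmath>
#include <cstddef>

void polyFit(const double* x, const double* y, std::size_t n,
             double* coeffs, std::size_t nCoeffs) {
    gsl_matrix* X   = gsl_matrix_alloc(n, nCoeffs);      // Vandermonde design matrix
    gsl_vector* Y   = gsl_vector_alloc(n);
    gsl_vector* c   = gsl_vector_alloc(nCoeffs);
    gsl_matrix* cov = gsl_matrix_alloc(nCoeffs, nCoeffs);

    for (std::size_t i = 0; i < n; ++i) {
        gsl_vector_set(Y, i, y[i]);
        for (std::size_t j = 0; j < nCoeffs; ++j)
            gsl_matrix_set(X, i, j, std::pow(x[i], static_cast<double>(j)));
    }

    double chisq;
    gsl_multifit_linear_workspace* work = gsl_multifit_linear_alloc(n, nCoeffs);
    gsl_multifit_linear(X, Y, c, cov, &chisq, work);      // solves min ||Y - X c||^2

    for (std::size_t j = 0; j < nCoeffs; ++j)
        coeffs[j] = gsl_vector_get(c, j);                 // coeffs[j] multiplies x^j

    gsl_multifit_linear_free(work);
    gsl_matrix_free(X); gsl_vector_free(Y); gsl_vector_free(c); gsl_matrix_free(cov);
}
```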
If you are trying to fit ordered points (x, y), as in a graph, you can use linear least squares methods, but with such methods you will always need to specify the degree of the polynomial you approximate with (presumably the length of your answers array). If your points are general ordered points in the plane that can form a closed loop or the outline of a structure (for example, points that describe an ellipse, a circle, or some other closed or more complex geometry), then you are going to need something more sophisticated. You can still use least squares, but you will need a parametric curve such as a spline. Take a look at the PDF at this link, which may give you what you need (or at the very least illustrate what I am saying): http://folk.uio.no/in329/nchap6.pdf
Without seeing an image of exactly what you are trying to fit, it is hard to say; it is quite possible that your data can be fit in a non-parametric way with linear least squares polynomials. If so, all you will need is a linear algebra library, and you can code the approximations yourself like so: http://en.wikipedia.org/wiki/Ordinary_least_squares
Even so, all forms of approximation require you to decide on your form (function basis, degree, etc.) before you fit it. For example, if you want to decide whether a 4th-, 5th-, 6th- or 7th-degree polynomial fits your data best, you would need to fit each one and assess its suitability yourself. There is no generic way (at least none that I know of) to tell you the degree of approximation you need to fit your data.

Parallelization of neighborhood point deletion

I am implementing the Good Features To Track/Shi-Tomasi corner detection algorithm on CUDA and need to find a way to parallelize the following part of the algorithm:
I start with an array of points obtained from an image sorted according to a certain intensity value (an eigenvalue of a previous calculation).
Starting with the first point of the array, I remove any point in the array that is within a certain physical distance of the first point. (This distance is calculated on the image plane, not on the array).
On the resulting array, we repeat step two for the remaining points.
Is this somehow parallelizable, specifically on CUDA? I suspect not, since there are obviously dependencies across the image.
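For reference, the sequential version of the suppression step looks roughly like this on the CPU; the names (Corner, minDist, suppress) are placeholders for my actual data, not working code from my project.

```cpp
// CPU reference of the sequential suppression step described above.
#include <vector>

struct Corner { float x, y, eigenvalue; };   // input already sorted by eigenvalue, descending

std::vector<Corner> suppress(const std::vector<Corner>& sorted, float minDist) {
    std::vector<Corner> kept;
    for (const Corner& c : sorted) {
        bool tooClose = false;
        for (const Corner& k : kept) {
            float dx = c.x - k.x, dy = c.y - k.y;
            if (dx * dx + dy * dy < minDist * minDist) { tooClose = true; break; }
        }
        if (!tooClose) kept.push_back(c);    // survives only if far from every kept point
    }
    return kept;
}
```

Each point's fate depends on which earlier points survived, which is exactly the dependency that worries me.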
I think the article Accelerated Corner-Detector Algorithms describes the way to solve this problem.

All k nearest neighbors in 2D, C++

I need to find, for each point of the data set, all its nearest neighbors. The data set contains approx. 10 million 2D points. The data are close to a grid, but do not form a precise grid...
This, in my opinion, excludes the use of k-d trees, where the basic assumption is that no points have the same x or y coordinate.
I need a fast algorithm, O(n) or better (but not too difficult to implement :-)), to solve this problem... Since Boost is not standardized, I do not want to use it...
Thanks for your answers or code samples...
I would do the following:
Create a larger grid on top of the points.
Go through the points linearly, and for each one of them, figure out which large "cell" it belongs to (and add the points to a list associated with that cell).
(This can be done in constant time for each point, just do an integer division of the coordinates of the points.)
Now go through the points linearly again. To find the 10 nearest neighbors of a point, you only need to look at the points in its own cell and the adjacent (larger) cells.
Since your points are fairly evenly scattered, you can do this in time proportional to the number of points in each (large) cell.
Here is an (ugly) pic describing the situation:
The cells must be large enough for the center cell and the adjacent cells to contain the closest 10 points, but small enough to speed up the computation. You could see it as a "hash function" where you'll find the closest points in the same bucket.
(Note that strictly speaking it's not O(n), but by tweaking the size of the larger cells you should get close enough. :-)
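Here is a rough C++ sketch of that idea; the grid is built once and reused for every query, cellSize is the tuning knob mentioned above, and all names are of course just placeholders.

```cpp
// Grid-bucket nearest-neighbour sketch: bin points into coarse cells, then search
// only the 3x3 block of cells around each query point.
#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Point { double x, y; };
using Grid = std::map<std::pair<long long, long long>, std::vector<int>>;

Grid buildGrid(const std::vector<Point>& pts, double cellSize) {
    Grid grid;
    for (int i = 0; i < (int)pts.size(); ++i) {
        long long cx = (long long)std::floor(pts[i].x / cellSize);
        long long cy = (long long)std::floor(pts[i].y / cellSize);
        grid[{cx, cy}].push_back(i);               // one division per coordinate
    }
    return grid;
}

// k nearest neighbours of pts[queryIdx]; assumes cellSize was chosen large enough
// that the 3x3 block of cells around the query contains at least k other points.
std::vector<int> kNearest(const std::vector<Point>& pts, const Grid& grid,
                          int queryIdx, int k, double cellSize) {
    const Point& q = pts[queryIdx];
    long long cx = (long long)std::floor(q.x / cellSize);
    long long cy = (long long)std::floor(q.y / cellSize);

    std::vector<std::pair<double, int>> cand;      // (distance, point index)
    for (long long dx = -1; dx <= 1; ++dx)
        for (long long dy = -1; dy <= 1; ++dy) {
            auto it = grid.find({cx + dx, cy + dy});
            if (it == grid.end()) continue;
            for (int j : it->second)
                if (j != queryIdx)
                    cand.push_back({std::hypot(pts[j].x - q.x, pts[j].y - q.y), j});
        }

    std::sort(cand.begin(), cand.end());           // smallest distances first
    std::vector<int> result;
    for (int i = 0; i < k && i < (int)cand.size(); ++i)
        result.push_back(cand[i].second);
    return result;
}
```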
I have used a library called ANN (Approximate Nearest Neighbour) with great success. It uses a k-d tree approach, although there is more than one algorithm to try. I used it for point location on a triangulated surface. You might have some luck with it. It is minimal and was easy to include in my library just by dropping in its source.
Good luck with this interesting task!