Fisher Vector with LSH? - c++

I want to implement a system where, given an input image, it returns a reasonably similar one (approximation is acceptable) from a dataset of about 50K images. Time performance is crucial.
I'll use a parallel version of SIFT to obtain a matrix of descriptors D. I've read about the Fisher Vector (FV) (VLFeat and Yael implementations) as a learned and much more accurate alternative to Bag of Features (BoF) for representing D as a single vector v.
My questions are:
What distance is used for FVs? Is it the Euclidean one? In that case I would use LSH with Euclidean distance to quickly find approximate nearest neighbors of FVs.
Is there any other time-efficient FV implementation in C++?

Another method you could take into consideration is VLAD encoding (basically a non-probabilistic version of FV, replacing GMMs with k-means clustering).
The implementation differs only slightly from standard vector quantisation, but in my experiments it showed much better performance with a significantly smaller codebook.
It uses Euclidean distance to find the nearest codebook vector, but instead of just counting assigned elements, it accumulates each element's residual.
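If it helps, here is a rough sketch of the encoding step, assuming a k-means codebook has already been trained (the function name and the plain std::vector layout are just for illustration):

    // Minimal VLAD encoding sketch (illustrative, not a library API).
    #include <cmath>
    #include <cstddef>
    #include <limits>
    #include <vector>

    // descriptors: N local descriptors (e.g. SIFT), each of dimension d
    // codebook:    K cluster centres from k-means, each of dimension d
    // returns:     a single K*d VLAD vector, L2-normalized
    std::vector<float> encodeVLAD(const std::vector<std::vector<float>>& descriptors,
                                  const std::vector<std::vector<float>>& codebook)
    {
        const std::size_t K = codebook.size();
        const std::size_t d = codebook[0].size();
        std::vector<float> vlad(K * d, 0.0f);

        for (const auto& x : descriptors) {
            // find the nearest codebook vector (Euclidean distance)
            std::size_t best = 0;
            float bestDist = std::numeric_limits<float>::max();
            for (std::size_t k = 0; k < K; ++k) {
                float dist = 0.0f;
                for (std::size_t j = 0; j < d; ++j) {
                    const float diff = x[j] - codebook[k][j];
                    dist += diff * diff;
                }
                if (dist < bestDist) { bestDist = dist; best = k; }
            }
            // accumulate the residual instead of just incrementing a count
            for (std::size_t j = 0; j < d; ++j)
                vlad[best * d + j] += x[j] - codebook[best][j];
        }

        // L2-normalize the final vector
        float norm = 0.0f;
        for (float v : vlad) norm += v * v;
        norm = std::sqrt(norm);
        if (norm > 0.0f)
            for (float& v : vlad) v /= norm;

        return vlad;
    }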
An example for image search: Link
FV / VLAD paper: Paper

Related

Viola-Jones algorithm complexity

What is the complexity of the Viola-Jones algorithm, in a form like O(log(N))?
Even though it's a pretty simple algorithm, there is no concrete information about its complexity.
When we talk about the complexity of the Viola-Jones algorithm, we need to remember the steps of the algorithm.
According to the original article of Paul Viola and Michael Jones, the algorithm contains 4 main steps:
Haar Feature Selection
Creating an Integral Image
Adaboost Training
Cascading Classifiers
The complexity of the first step is O(1) because the decision of which Haar Feature to choose is not related to the input.
The complexity of the second step is O(N) because in this step we go over the image matrix. As you know, an Integral Image lets us compute the sum of all pixels inside a particular feature in O(1). However, creating the Integral Image costs O(N), because we go over each pixel of the original matrix and write a new value into the new matrix. The value of each point in the new matrix is the sum of all pixels above and to the left of it in the old matrix, including the target pixel itself.
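To make the O(N) claim concrete, here is a minimal sketch of that single pass in plain C++ (illustrative types; OpenCV's cv::integral does the same thing, with an extra row and column of zeros):

    // Integral-image construction: one O(N) pass over a single-channel,
    // row-major image. Types and names are illustrative.
    #include <cstddef>
    #include <vector>

    std::vector<long long> buildIntegralImage(const std::vector<unsigned char>& img,
                                              std::size_t rows, std::size_t cols)
    {
        std::vector<long long> integral(rows * cols, 0);
        for (std::size_t y = 0; y < rows; ++y) {
            long long rowSum = 0;
            for (std::size_t x = 0; x < cols; ++x) {
                rowSum += img[y * cols + x];
                // sum of all pixels above and to the left, including (x, y)
                integral[y * cols + x] = rowSum + (y > 0 ? integral[(y - 1) * cols + x] : 0);
            }
        }
        return integral;
    }

    // Any rectangular sum (and hence any Haar feature) is then O(1):
    // sum(x0..x1, y0..y1) = I(x1,y1) - I(x0-1,y1) - I(x1,y0-1) + I(x0-1,y0-1),
    // where terms with a negative index are taken as 0.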
The complexity of the third step is O(N D^2), where D is the number of features (look here to see why).
The complexity of the fourth step is less than O(N) (look here to see why).
To sum up, combining the stages above, the complexity of the Viola-Jones algorithm is O(N).
It's linear (O(N)) in the number (N) of pixels of the input image. All Haar image features are computed in constant time upon the integral image, and computing the latter requires one pass over the input image.

Locality Sensitive Hashing in OpenCV for image processing

This is my first image processing application, so please be kind with this filthy peasant.
THE APPLICATION:
I want to implement a fast application (performance is crucial, even over accuracy) where, given a photo (taken by a mobile phone) containing a movie poster, it finds the most similar photo in a given dataset and returns a similarity score. The dataset is composed of similar pictures (taken by mobile phones, each containing a movie poster). The images can have different sizes and resolutions and can be taken from different viewpoints (but there is no rotation, since the posters are supposed to always be upright).
Any suggestion on how to implement such an application is welcome.
FEATURE DESCRIPTIONS IN OPENCV:
I've never used OpenCV, and I've read this tutorial about Feature Detection and Description in OpenCV.
From what I've understood, these algorithms are supposed to find keypoints (usually corners) and possibly compute descriptors (which describe each keypoint and are used for matching two different images). I say "possibly" since some of them (e.g. FAST) provide only keypoints.
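If I've understood the tutorial correctly, the usage looks roughly like this (untested, just to illustrate the keypoint/descriptor distinction):

    // FAST detects keypoints only; ORB detects keypoints and computes descriptors.
    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <vector>

    int main()
    {
        cv::Mat img = cv::imread("poster.jpg", cv::IMREAD_GRAYSCALE);

        // FAST: keypoints only, no descriptors.
        std::vector<cv::KeyPoint> fastKeypoints;
        cv::FAST(img, fastKeypoints, /*threshold=*/20);

        // ORB: keypoints plus a binary descriptor per keypoint.
        cv::Ptr<cv::ORB> orb = cv::ORB::create();
        std::vector<cv::KeyPoint> orbKeypoints;
        cv::Mat orbDescriptors;
        orb->detectAndCompute(img, cv::noArray(), orbKeypoints, orbDescriptors);

        return 0;
    }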
MOST SIMILAR IMAGE PROBLEM AND LSH:
The algorithms above don't by themselves solve the problem "given an image, how do we find the most similar one in a dataset in a fast way?". In order to do that, we can use both the keypoints and the descriptors obtained by any of the previous algorithms. The problem stated above seems like a nearest neighbor problem, and Locality Sensitive Hashing is a fast and popular solution for finding an approximate solution to this problem in high-dimensional spaces.
THE QUESTION:
What I don't understand is how to use the result of any of the previous algorithms (i.e. keypoints and descriptors) in LSH.
Is there any implementation for this problem?
I will provide a general answer, going beyond the scope of OpenCV library.
Quoting this answer:
descriptors: they are the way to compare the keypoints. They
summarize, in vector format (of constant length) some characteristics
about the keypoints.
With that said, we can imagine/treat (geometrically) a descriptor as a point in a D-dimensional space. So in total, all the descriptors are points in a D-dimensional space. For example, for GIST, D = 960.
So actually descriptors describe the image using less information than the whole image (because when you have 1 billion images, size matters). They serve as the image's representatives, so we are processing them on behalf of the image (since they are easier/smaller to handle).
The problem you are mentioning is the Nearest Neighbor problem. Notice that an approximate version of this problem can lead to significant speed-ups when D is big (since the curse of dimensionality makes traditional approaches, such as a k-d tree, very slow, almost linear in N, the number of points).
Algorithms that solve the NN problem, which is an optimization problem, are usually generic. They may not care whether the data are images, molecules, etc.; I, for example, have used my kd-GeRaF for both. As a result, the algorithms expect N points in a D-dimensional space, or N descriptors, you might say.
Check my answer for LSH here (which points to a nice implementation).
Edit:
LSH expects as input N vectors of D dimension and given a query vector (in D) and a range R, will find the vectors that lie within this range from the query vector.
As a result, we can say that every image is represented by just one vector, in SIFT format for example.
You see, LSH doesn't actually solve the k-NN problem directly, but it searches within a range (and can give you the k-NNs, if they are within the range). Read more about R in the Experiments section of High-dimensional approximate nearest neighbor. kd-GeRaF and FLANN solve the k-NN problem directly.
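As a concrete OpenCV-side illustration (not the only way to do it): OpenCV's FLANN module ships an LSH index for binary descriptors, so with ORB descriptors a sketch could look like this (file names and parameter values are only indicative):

    // Hedged sketch: LSH over binary ORB descriptors via OpenCV's FLANN module.
    #include <opencv2/core.hpp>
    #include <opencv2/features2d.hpp>
    #include <opencv2/flann.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <vector>

    int main()
    {
        cv::Mat dbImage    = cv::imread("poster_db.jpg", cv::IMREAD_GRAYSCALE);
        cv::Mat queryImage = cv::imread("poster_query.jpg", cv::IMREAD_GRAYSCALE);

        cv::Ptr<cv::ORB> orb = cv::ORB::create();
        std::vector<cv::KeyPoint> kpDb, kpQuery;
        cv::Mat descDb, descQuery;                       // CV_8U binary descriptors
        orb->detectAndCompute(dbImage, cv::noArray(), kpDb, descDb);
        orb->detectAndCompute(queryImage, cv::noArray(), kpQuery, descQuery);

        // In a real system descDb would stack the vectors for the whole dataset
        // (one row per descriptor, or one global vector per image).
        cv::flann::Index lsh(descDb,
                             cv::flann::LshIndexParams(12 /*tables*/, 20 /*key bits*/, 2 /*probe level*/),
                             cvflann::FLANN_DIST_HAMMING);

        // Approximate nearest neighbors for every query descriptor.
        cv::Mat indices, dists;
        lsh.knnSearch(descQuery, indices, dists, /*knn=*/2, cv::flann::SearchParams());

        return 0;
    }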

What does cv::TermCriteria() exactly do in OpenCV?

The official documentation says that TermCriteria(int type, int maxCount, double epsilon) is for defining the termination criteria for iterative algorithms, and that the criteria type can be one of: COUNT, EPS or COUNT + EPS.
However, I can't quite understand what SVM does different in each iteration when I use svm->setTermCriteria(const cv::TermCriteria & val).
SVM training can be considered a large-scale quadratic programming (QP) problem, and unfortunately it cannot be easily solved with standard QP techniques. That's why numerous decomposition algorithms have been introduced in the literature over the past several years.
The basic idea of these algorithms is to decompose the large QP problem into a series of smaller QP sub-problems. That is, in each iteration the algorithm keeps most dimensions of the optimization variable vector fixed and varies a small subset of dimensions (the working set) to obtain the maximal reduction of the objective function.
As far as I know, OpenCV uses the SVMlight or generalized SMO algorithm, and the TermCriteria parameter is the termination criterion for the iterative SVM training procedure that solves the above-mentioned partial case of the constrained QP problem. You can specify a tolerance and/or the maximum number of iterations.
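For example (toy data, illustrative parameter values), the criteria are set before training and bound the iterative solver like this:

    // Hedged example of plugging termination criteria into OpenCV's SVM training.
    #include <opencv2/core.hpp>
    #include <opencv2/ml.hpp>

    int main()
    {
        // Toy training data: 4 samples with 2 features each, binary labels.
        float samplesData[] = { 0.f, 0.f,  1.f, 1.f,  0.f, 1.f,  1.f, 0.f };
        int   labelsData[]  = { 1, 1, -1, -1 };
        cv::Mat samples(4, 2, CV_32F, samplesData);
        cv::Mat labels(4, 1, CV_32S, labelsData);

        cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
        svm->setType(cv::ml::SVM::C_SVC);
        svm->setKernel(cv::ml::SVM::RBF);

        // Stop the iterative solver after 1000 iterations OR when the improvement
        // between iterations drops below 1e-6, whichever happens first.
        svm->setTermCriteria(cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS,
                                              1000, 1e-6));

        svm->train(samples, cv::ml::ROW_SAMPLE, labels);
        return 0;
    }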

Compare similarity algorithms

I want to use string similarity functions to find corrupted data in my database.
I came upon several of them:
Jaro,
Jaro-Winkler,
Levenshtein,
Euclidean and
Q-gram.
I wanted to know: what is the difference between them, and in what situations does each work best?
Expanding on my wiki-walk comment in the errata and noting some of the ground-floor literature on the comparability of algorithms that apply to similar problem spaces, let's explore the applicability of these algorithms before we determine if they're numerically comparable.
From Wikipedia, Jaro-Winkler:
In computer science and statistics, the Jaro–Winkler distance
(Winkler, 1990) is a measure of similarity between two strings. It is
a variant of the Jaro distance metric (Jaro, 1989, 1995) and
mainly[citation needed] used in the area of record linkage (duplicate
detection). The higher the Jaro–Winkler distance for two strings is,
the more similar the strings are. The Jaro–Winkler distance metric is
designed and best suited for short strings such as person names. The
score is normalized such that 0 equates to no similarity and 1 is an
exact match.
Levenshtein distance:
In information theory and computer science, the Levenshtein distance
is a string metric for measuring the amount of difference between two
sequences. The term edit distance is often used to refer specifically
to Levenshtein distance.
The Levenshtein distance between two strings is defined as the minimum
number of edits needed to transform one string into the other, with
the allowable edit operations being insertion, deletion, or
substitution of a single character. It is named after Vladimir
Levenshtein, who considered this distance in 1965.
Euclidean distance:
In mathematics, the Euclidean distance or Euclidean metric is the
"ordinary" distance between two points that one would measure with a
ruler, and is given by the Pythagorean formula. By using this formula
as distance, Euclidean space (or even any inner product space) becomes
a metric space. The associated norm is called the Euclidean norm.
Older literature refers to the metric as Pythagorean metric.
And Q- or n-gram encoding:
In the fields of computational linguistics and probability, an n-gram
is a contiguous sequence of n items from a given sequence of text or
speech. The items in question can be phonemes, syllables, letters,
words or base pairs according to the application. n-grams are
collected from a text or speech corpus.
The two core
advantages of n-gram models (and algorithms that use
them) are relative simplicity and the ability to scale up – by simply
increasing n a model can be used to store more context with a
well-understood space–time tradeoff, enabling small experiments to
scale up very efficiently.
The trouble is these algorithms solve different problems that have different applicability within the space of all possible algorithms to solve the longest common subsequence problem, in your data or in grafting a usable metric thereof. In fact, not all of these are even metrics, as some of them don't satisfy the triangle inequality.
Instead of going out of your way to define a dubious scheme to detect data corruption, do this properly: by using checksums and parity bits for your data. Don't try to solve a much harder problem when a simpler solution will do.
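For instance, a rough sketch of the checksum idea (a Fletcher-16-style sum here just for illustration; in practice you would likely use CRC32 or a cryptographic hash):

    // Store a checksum alongside each record and recompute it on read;
    // a mismatch flags corruption without any string-similarity heuristics.
    #include <cstdint>
    #include <iostream>
    #include <string>

    std::uint16_t checksum(const std::string& record)
    {
        std::uint16_t sum1 = 0, sum2 = 0;
        for (unsigned char c : record) {
            sum1 = (sum1 + c) % 255;
            sum2 = (sum2 + sum1) % 255;
        }
        return static_cast<std::uint16_t>((sum2 << 8) | sum1);
    }

    int main()
    {
        const std::string stored = "John Smith,1980-04-02";
        const std::uint16_t storedChecksum = checksum(stored);

        const std::string loaded = "John Smith,198O-04-02"; // '0' corrupted to 'O'
        if (checksum(loaded) != storedChecksum)
            std::cout << "record is corrupted\n";
    }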
String similarity helps in a lot of different ways. For example:
Google's "Did you mean" results are calculated using string similarity.
String similarity is used to correct OCR errors.
String similarity is used to correct keyboard entry errors.
String similarity is used to find the best-matching sequence of two DNA strands in bioinformatics.
But one size does not fit all. Every string similarity algorithm is designed for a specific use, though most of them are similar. For example, Levenshtein_distance is about how many characters you have to change to make two strings equal:
kitten → sitten
Here the distance is 1 character change. You may give different weights to deletion, insertion and substitution. For example, metrics tuned for OCR errors or keyboard errors give less weight to certain changes: in OCR some characters look very similar to others, and on a keyboard some characters sit very close to each other. Bioinformatic string similarity allows a lot of insertions.
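For reference, a minimal sketch of the classic dynamic-programming Levenshtein distance with unit edit costs:

    // Levenshtein distance: minimum number of single-character insertions,
    // deletions, or substitutions to turn one string into the other.
    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    std::size_t levenshtein(const std::string& a, const std::string& b)
    {
        std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
        for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

        for (std::size_t i = 1; i <= a.size(); ++i) {
            curr[0] = i;
            for (std::size_t j = 1; j <= b.size(); ++j) {
                const std::size_t subCost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                curr[j] = std::min({ prev[j] + 1,             // deletion
                                     curr[j - 1] + 1,         // insertion
                                     prev[j - 1] + subCost }); // substitution
            }
            std::swap(prev, curr);
        }
        return prev[b.size()];
    }

    int main()
    {
        std::cout << levenshtein("kitten", "sitten") << "\n"; // prints 1
    }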
Your second quoted example already notes that the "Jaro–Winkler distance metric is designed and best suited for short strings such as person names".
Therefore you should keep your particular problem in mind:
I want to use string similarity functions to find corrupted data in my database.
How is your data corrupted? Is it a user error, similar to a keyboard input error? Or is it similar to OCR errors? Or something else entirely?

Parallelization of neighborhood point deletion

I am implementing the Good Features To Track/Shi-Tomasi corner detection algorithm on CUDA and need to find a way to parallelize the following part of the algorithm:
I start with an array of points obtained from an image sorted according to a certain intensity value (an eigenvalue of a previous calculation).
Starting with the first point of the array, I remove any point in the array that is within a certain physical distance of the first point. (This distance is calculated on the image plane, not on the array).
On the resulting array, we repeat step two for the remaining points.
Is this somehow parallelizable, specifically on CUDA? I suspect not, since there will obviously be dependencies across the image.
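For clarity, here is a minimal sequential sketch of the loop I am trying to parallelize (types and names are illustrative only):

    // Greedy suppression: a point survives only if no stronger (earlier) surviving
    // point lies within minDistance of it on the image plane.
    #include <vector>

    struct Corner { float x, y, eigenvalue; };

    // points must already be sorted by decreasing eigenvalue.
    std::vector<Corner> suppress(const std::vector<Corner>& points, float minDistance)
    {
        const float minDist2 = minDistance * minDistance;
        std::vector<Corner> kept;
        for (const Corner& p : points) {
            bool tooClose = false;
            for (const Corner& q : kept) {
                const float dx = p.x - q.x, dy = p.y - q.y;
                if (dx * dx + dy * dy < minDist2) { tooClose = true; break; }
            }
            if (!tooClose) kept.push_back(p); // all stronger corners are far enough away
        }
        return kept;
    }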
I think the article Accelerated Corner-Detector Algorithms describes the way to solve this problem.