What does cv::TermCriteria() exactly do in OpenCV? - c++

The official documentation says that TermCriteria(int type, int maxCount, double epsilon) defines the termination criteria for iterative algorithms, and that the criteria type can be one of: COUNT, EPS or COUNT + EPS.
However, I can't quite understand what the SVM does differently in each iteration when I use svm->setTermCriteria(const cv::TermCriteria& val).

SVM training can be considered a large-scale quadratic programming (QP) problem, and unfortunately it cannot be easily solved with standard QP techniques. That is why numerous decomposition algorithms have been introduced in the literature over the past several years.
The basic idea of these algorithms is to decompose the large QP problem into a series of smaller QP sub-problems. That is, in each iteration the algorithm keeps most dimensions of the optimization variable vector fixed and varies a small subset of dimensions (the working set) to get the maximal reduction of the objective function.
As far as I know, OpenCV uses the SVMlight or generalized SMO algorithm, and the TermCriteria parameter is the termination criterion of the iterative SVM training procedure that solves the above-mentioned partial case of the constrained QP problem. You can specify a tolerance and/or the maximum number of iterations.
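
To make the two knobs concrete, here is a minimal sketch (mine, not from the answer) of setting the criteria on an OpenCV SVM; the specific values 1000 and 1e-6 and the training data are placeholders you would tune:

```cpp
// Minimal sketch (not from the answer): configuring an OpenCV SVM so training
// stops after 1000 solver iterations or once the solver tolerance drops
// below 1e-6, whichever comes first.
#include <opencv2/ml.hpp>

int main() {
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);

    // COUNT/MAX_ITER bounds the iterations; EPS is the solver tolerance.
    svm->setTermCriteria(cv::TermCriteria(
        cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 1000, 1e-6));

    // svm->train(trainData);  // trainData: a cv::Ptr<cv::ml::TrainData> you prepare
    return 0;
}
```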

How should I compute the null space of a rectangular sparse matrix over GF(2) in C/C++?

UPDATE: I ended up not using Eigen and implementing my own GF(2) matrix representation, where each row is an array of integers and each bit of an integer represents a single entry. I then use a modified Gaussian elimination with bit operations to obtain the desired vectors.
I currently have a (large) rectangular sparse matrix, stored using Eigen3, whose (right) null space over GF(2) I want to find. I researched around and found some possible approaches:
(Modified) Gaussian Elimination
This means simply using some form of Gaussian elimination to find a reduced form of the matrix that preserves the null space, then extracting the null space from that. Though I know how I would do this by hand, I'm quite clueless as to how I would actually implement it.
SVD Decomposition
QR Decomposition
I'm not familiar with these, but from my understanding the (orthonormal) basis vectors of the null space can be extracted from the decomposed form of the matrix.
Now my question is: which approach should I use in my case (i.e. a rectangular sparse matrix over GF(2)) that doesn't involve converting it into a dense matrix? And if there are several approaches, what would be recommended in terms of performance and ease of implementation?
I'm also open to using other libraries besides Eigen as well.
For context, I'm trying to combine equivalence relations for factoring algorithms (e.g. as in the Quadratic Sieve). Also, if possible, I would like to look into parallelising these algorithms in the future, so if there is an approach that would allow this, that would be great!
Let's call the matrix in question M. Then (please correct me if I'm wrong):
GF(2) implies that M is equivalent to a matrix of bits: each element can have one of two values.
Arithmetic over GF(2) is just like integer arithmetic on non-negative numbers, but done modulo 2, so addition is a bitwise XOR and multiplication is a bitwise AND. It doesn't matter what the exact elements of GF(2) are: they are all equivalent to bits.
Two nonzero vectors over GF(2) are linearly independent as long as they are not equal, i.e. as long as they differ in at least one bit, or equivalently v_1 + v_2 ≠ 0 (since addition in GF(2) is a bitwise XOR).
By definition, the (right) null space is spanned by the vectors that the matrix maps to 0. A vector v is in the null space if, when one multiplies the j-th column of M by the j-th bit of v and sums the results, the sum is zero.
I see at least two ways of going about it.
Do dense Gaussian elimination in terms of bit operations, and organize the data and write the loops so that the compiler vectorizes everything and operates on 512-bit data types. You could use Compiler Explorer on godbolt.org to easily check that the vectorization takes place and that e.g. AVX512 instructions are used. Linear gains will eventually lose out to the squared scaling of the problem, of course, but the performance increase over a naive bool-based implementation will be massive and may be sufficient for your needs. The sparsity adds a possible complication: if the matrix won't comfortably fit in memory in a dense representation, then a suitable representation has to be devised that makes Gaussian elimination perform well. More would need to be known about the matrices you work on. Generally speaking, row operations will be performed at memory bandwidth if the implementation is correct, on the order of 1E10 elements/s, so a 1E3x1E3 M should process in about a second at most.
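
As a rough illustration of this first approach (my own sketch, not part of the original answer), bit-packed GF(2) row reduction can look like the following; the Row layout and sizes are illustrative:

```cpp
// Rough sketch of bit-packed GF(2) row reduction: each row is a vector of
// 64-bit words, and adding one row to another is a word-wise XOR, so one XOR
// processes 64 matrix entries at once (and vectorizes further with AVX).
#include <cstdint>
#include <utility>
#include <vector>

using Row = std::vector<std::uint64_t>;   // packed bits of one matrix row

inline bool getBit(const Row& r, std::size_t col) {
    return (r[col / 64] >> (col % 64)) & 1u;
}

// Bring 'rows' to reduced row echelon form over GF(2); 'ncols' is the bit width.
void gf2RowReduce(std::vector<Row>& rows, std::size_t ncols) {
    std::size_t pivotRow = 0;
    for (std::size_t col = 0; col < ncols && pivotRow < rows.size(); ++col) {
        // Find a row at or below the pivot with a 1 in the current column.
        std::size_t sel = pivotRow;
        while (sel < rows.size() && !getBit(rows[sel], col)) ++sel;
        if (sel == rows.size()) continue;         // column already eliminated
        std::swap(rows[sel], rows[pivotRow]);
        // Clear the column in every other row with a single XOR pass per row.
        for (std::size_t r = 0; r < rows.size(); ++r) {
            if (r != pivotRow && getBit(rows[r], col)) {
                for (std::size_t w = 0; w < rows[r].size(); ++w)
                    rows[r][w] ^= rows[pivotRow][w];
            }
        }
        ++pivotRow;
    }
}
```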
Since the problem is equivalent to a set of boolean equations, use a SAT solver (a Boolean satisfiability problem solver) to incrementally generate the null space. The initial equation set is M × v = 0 and v ≠ 0, where v is a bit vector. Run the SAT solver until it finds some v, let's call it v_i. Then add a constraint v ≠ v_i and run the solver again, adding a new constraint in each iteration. That is, the k-th iteration has the constraints v ≠ 0, v ≠ v_1, …, v ≠ v_(k-1).
Since all bit vectors that are different are also linearly independent, the inequality constraints will force incremental generation of nullspace basis vectors.
Modern SAT solvers excel at sparse problems with more boolean equations than variables, so I imagine this would work very well - the sparser the matrix, the better. The problem should be pre-processed to remove all zero columns in M to minimize the combinatorial explosion. Open-source SAT solvers can easily deal with problems of 1M variables, so for a sparse problem you could realistically be solving with 100k-1M columns in M and about 10 "ones" in each row. A 1Mx1M sparse matrix with 10 "ones" per row on average would be a reasonable task for common SAT solvers, and I imagine that the state of the art could deal with 10Mx10M matrices and beyond.
Furthermore, your application is ideal for incremental solvers: you find one solution, stop, add a constraint, resume, and so on. So I imagine you may get very good results, and there are several good open source solvers to choose from.
Since you use Eigen already, the problem would at least fit into the SparseMatrix representation with byte-sized elements, so it's not a very big problem as far as SAT is concerned.
I wonder whether this nullspace basis finding is a case of a cover problem, possibly relaxed. There are some nice algorithms for those, but it's always a question of whether the specialized algorithm will work better than just throwing SAT at it and waiting it out, so to speak.
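For what it's worth, here is a hedged sketch of the incremental loop described above, assuming CryptoMiniSat's C++ interface (SATSolver, add_clause, add_xor_clause, solve, get_model); the example matrix, its sizes and the cut-off of 16 vectors are placeholders:

```cpp
// Hedged sketch of the incremental SAT loop, assuming CryptoMiniSat's C++ API.
// Each row of M becomes an XOR constraint over the columns where it has a 1,
// and each found solution is blocked by a clause requiring one bit to differ.
#include <cryptominisat5/cryptominisat.h>
#include <cstdint>
#include <vector>

using namespace CMSat;

int main() {
    const std::uint32_t ncols = 8;                      // columns of M (placeholder)
    std::vector<std::vector<std::uint32_t>> rows = {    // column indices of the 1s per row
        {0, 1, 3}, {1, 2}, {3, 4, 5}};

    SATSolver solver;
    solver.new_vars(ncols);

    // M * v = 0: each row is an XOR over its non-zero columns equal to 0 (false).
    for (const auto& row : rows)
        solver.add_xor_clause(row, false);

    // v != 0: at least one bit of v must be set.
    std::vector<Lit> nonzero;
    for (std::uint32_t j = 0; j < ncols; ++j) nonzero.push_back(Lit(j, false));
    solver.add_clause(nonzero);

    std::vector<std::vector<lbool>> found;
    while (solver.solve() == l_True) {
        const std::vector<lbool>& model = solver.get_model();
        found.push_back(model);

        // Block this v: require at least one bit to differ in the next solution.
        std::vector<Lit> block;
        for (std::uint32_t j = 0; j < ncols; ++j)
            block.push_back(Lit(j, model[j] == l_True));   // literal with opposite sign
        solver.add_clause(block);

        if (found.size() >= 16) break;                     // demo limit
    }
    return 0;
}
```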
Updated answer - thanks to harold: QR decomposition is not applicable in general for your case.
See for instance
https://math.stackexchange.com/questions/1346664/how-to-find-orthogonal-vectors-in-gf2
I wrongly assumed QR would be applicable here, but in theory it is not.
If you are still interested in details about QR-algorithms, please open a new thread.

How to solve a linear system of a square matrix in C++ when the matrix has no LU-Decomposition?

I have been trying to develop a program to solve a system Ax = b for a square matrix A using LU decomposition. However, I realized that this decomposition does not always exist (one way to tell: if no row exchange is required during elimination, then it exists). However, I see from many sources that this is an excellent method for computing the solutions to Ax = b.
My question is: how often does one come across a matrix that does not have an LU decomposition? If one does encounter such a matrix, how should it be handled? Should one implement a separate method, such as Gaussian elimination, just in case?
Please provide me with some insight on this. Thanks in advance.
Note: I am trying to use this information to solve A^T A x = A^T b, i.e. finding a mathematical model using least squares.
Taken from Wikipedia, in its most concise form:
Any square matrix $A$ admits an LUP factorization. If $A$ is invertible, then it admits an LU (or LDU) factorization if and only if all its leading principal minors are non-zero. If $A$ is a singular matrix of rank $k$, then it admits an LU factorization if the first $k$ leading principal minors are non-zero, although the converse is not true.
I don't have the implementation fully written out, but this looks involved. I would think that, depending on your matrix, there exist simpler numerical schemes that reduce the problem down.
As for how often one comes across such a matrix: nobody knows what kind of matrices you work with, so that is impossible to answer. If you encounter one, switch to another scheme.
One scheme that I have used often in practice is Gauss-Seidel. Wikipedia actually has a fully worked-out algorithm.
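For reference, here is a compact sketch of the Gauss-Seidel iteration (my own, not the Wikipedia version); it assumes a square A with a non-zero diagonal and only converges for suitable matrices (e.g. diagonally dominant or symmetric positive definite):

```cpp
// Compact sketch of the Gauss-Seidel iteration for Ax = b.
#include <algorithm>
#include <cmath>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vector = std::vector<double>;

Vector gaussSeidel(const Matrix& A, const Vector& b,
                   int maxIter = 1000, double tol = 1e-10) {
    const std::size_t n = b.size();
    Vector x(n, 0.0);
    for (int it = 0; it < maxIter; ++it) {
        double maxChange = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double sum = b[i];
            for (std::size_t j = 0; j < n; ++j)
                if (j != i) sum -= A[i][j] * x[j];   // x[j] is already updated for j < i
            const double xNew = sum / A[i][i];
            maxChange = std::max(maxChange, std::abs(xNew - x[i]));
            x[i] = xNew;
        }
        if (maxChange < tol) break;                  // converged
    }
    return x;
}
```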
The LU decomposition exists if and only if all leading principal minors of the matrix are non-zero.
From your actual question, you are solving:
A^T A x = A^T b
A^T A is a square symmetric matrix. We can diagonalize it as A^T A = R^-1 D R, and you can always rearrange that to find x. You need non-zero eigenvalues for this to work.
A (square) matrix is invertible if and only if it does not have a zero eigenvalue.
I think inverting it via Gaussian elimination might be the best solution.
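As a small illustration (not from the answers), the normal equations A^T A x = A^T b can be solved in Eigen without any LU decomposition; the matrix and vector values below are made up, and solving the least-squares problem directly on A via QR is usually more numerically robust than forming A^T A explicitly:

```cpp
// Small sketch of solving A^T A x = A^T b (least squares) with Eigen.
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::MatrixXd A(4, 2);          // illustrative over-determined system
    A << 1, 1,
         1, 2,
         1, 3,
         1, 4;
    Eigen::VectorXd b(4);
    b << 6, 5, 7, 10;

    // Option 1: normal equations via an LDL^T factorization.
    Eigen::VectorXd x1 = (A.transpose() * A).ldlt().solve(A.transpose() * b);

    // Option 2: least squares directly on A with a rank-revealing QR.
    Eigen::VectorXd x2 = A.colPivHouseholderQr().solve(b);

    std::cout << x1.transpose() << "\n" << x2.transpose() << "\n";
    return 0;
}
```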

Fisher Vector with LSH?

I want to implement a system where, given an input image, it returns a reasonably similar one (approximation is acceptable) from a dataset of (about) 50K images. Time performance is crucial.
I'll use a parallel version of SIFT for obtaining a matrix of descriptors D. I've read about the Fisher Vector (FV) (VLfeat and Yael implementations) as a learning-based and much more precise alternative to Bag of Features (BoF) for representing D as a single vector v.
My questions are:
What distance is used for FVs? Is it the Euclidean one? In that case I would use LSH with the Euclidean distance to quickly find approximate nearest neighbors of FVs.
Is there any other efficient (in terms of time) C++ implementation of FV?
Another method you could take into consideration is VLAD encoding. (Basically a non-probabilistic version of FV, replacing GMMs by k-Means clustering)
The implementation differs only slightly from standard vector quantisation, but in my experiments it showed much better performance with a significantly smaller codebook size.
It uses the Euclidean distance to find the nearest codebook vector, but instead of just counting assigned elements, it accumulates each element's residual.
An example for image search: Link
FV / VLAD paper: Paper
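To make the VLAD accumulation concrete, here is a bare-bones sketch (mine, not from the answer); the descriptor and centroid containers are illustrative, and the usual normalisation of the final vector is omitted:

```cpp
// Bare-bones sketch of VLAD encoding: assign each local descriptor to its
// nearest codebook centre (Euclidean distance) and accumulate the residual
// into that centre's block of the output vector.
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

std::vector<float> vladEncode(const std::vector<Vec>& descriptors,
                              const std::vector<Vec>& centroids) {
    const std::size_t d = centroids.front().size();
    std::vector<float> vlad(centroids.size() * d, 0.0f);

    for (const Vec& x : descriptors) {
        // Nearest centroid by (squared) Euclidean distance.
        std::size_t best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t k = 0; k < centroids.size(); ++k) {
            float dist = 0.0f;
            for (std::size_t j = 0; j < d; ++j) {
                const float diff = x[j] - centroids[k][j];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = k; }
        }
        // Accumulate the residual x - c_best into the block for centre 'best'.
        for (std::size_t j = 0; j < d; ++j)
            vlad[best * d + j] += x[j] - centroids[best][j];
    }
    return vlad;
}
```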

Locality Sensitivy Hashing in OpenCV for image processing

This is my first image processing application, so please be kind with this filthy peasant.
THE APPLICATION:
I want to implement a fast application (performance is crucial, even over accuracy) where, given a photo (taken by a mobile phone) containing a movie poster, it finds the most similar photo in a given dataset and returns a similarity score. The dataset is composed of similar pictures (taken by mobile phone, containing a movie poster). The images can be of different sizes and resolutions and can be taken from different viewpoints (but there is no rotation, since the posters are supposed to always be right-oriented).
Any suggestion on how to implement such an application is welcome.
FEATURE DESCRIPTIONS IN OPENCV:
I've never used OpenCV and I've read this tutorial about Feature Detection and Description by OpenCV.
From what I've understood, these algorithms are supposed to find keypoints (usually corners) and possibly compute descriptors (which describe each keypoint and are used for matching two different images). I say "possibly" since some of them (e.g. FAST) provide only keypoints.
MOST SIMILAR IMAGE PROBLEM AND LSH:
The algorithms above don't solve the problem "given an image, how do I find the most similar one in a dataset in a fast way?". In order to do that, we can use the keypoints and descriptors obtained by any of the previous algorithms. The problem stated above looks like a nearest neighbor problem, and Locality Sensitive Hashing is a fast and popular solution for finding an approximate solution to this problem in high-dimensional spaces.
THE QUESTION:
What I don't understand is how to use the result of any of the previous algorithms (i.e. keypoints and descriptors) in LSH.
Is there any implementation for this problem?
I will provide a general answer, going beyond the scope of OpenCV library.
Quoting this answer:
descriptors: they are the way to compare the keypoints. They summarize, in vector format (of constant length), some characteristics about the keypoints.
With that said, we can imagine/treat (geometrically) a descriptor as a point in a D-dimensional space. So, in total, all the descriptors are points in a D-dimensional space. For example, for GIST, D = 960.
So descriptors actually describe the image using less information than the whole image (because when you have 1 billion images, size matters). They serve as the image's representatives, so we process them on behalf of the image (since they are easier/smaller to handle).
The problem you are mentioning is the Nearest Neighbor problem. Notice that an approximate version of this problem can lead to significant speed-ups when D is big (since the curse of dimensionality will make traditional approaches, such as a kd-tree, very slow, almost linear in N, the number of points).
Algorithms that solve the NN problem, which is an optimization problem, are usually generic. They may not care whether the data are images, molecules, etc.; I, for example, have used my kd-GeRaF for both. As a result, the algorithms expect N points in a D-dimensional space, or N descriptors if you like.
Check my answer for LSH here (which points to a nice implementation).
Edit:
LSH expects as input N vectors of dimension D and, given a query vector (also D-dimensional) and a range R, will find the vectors that lie within this range of the query vector.
As a result, we can say that every image is represented by just one vector, in SIFT format for example.
You see, LSH doesn't actually solve the k-NN problem directly, but it searches within a range (and can give you the k-NNs, if they are within the range). Read more about R in the Experiments section of High-dimensional approximate nearest neighbor. kd-GeRaF and FLANN solve the k-NN problem directly.
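As one concrete (hedged) way to wire this up in OpenCV: binary ORB descriptors can be indexed with FLANN's LSH index. This is my own sketch, not from the answer; the file name and the LSH parameters (number of hash tables, key size, multi-probe level) are placeholders to tune:

```cpp
// Hedged sketch: ORB binary descriptors indexed with FLANN's LSH index.
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/flann.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("poster.jpg", cv::IMREAD_GRAYSCALE);  // placeholder path
    cv::Ptr<cv::ORB> orb = cv::ORB::create();

    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;                        // one binary descriptor per keypoint
    orb->detectAndCompute(img, cv::noArray(), keypoints, descriptors);

    // Build an LSH index over the descriptors (Hamming distance for ORB).
    cv::flann::Index index(descriptors,
                           cv::flann::LshIndexParams(12, 20, 2),
                           cvflann::FLANN_DIST_HAMMING);

    // Query: 2 approximate nearest stored descriptors for each query descriptor.
    cv::Mat indices, dists;
    index.knnSearch(descriptors, indices, dists, 2, cv::flann::SearchParams());
    return 0;
}
```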

OpenCV kmean: how to choose decent values for COUNT and EPS?

I am trying to use the kmeans function in OpenCV to pre-classify 36000 sample images into 100+ classes (to reduce my work preparing training data for supervised learning). In this function there are two parameters which I do not really understand: cv::TermCriteria::EPS and cv::TermCriteria::COUNT.
cv::kmeans(dataset.t(), K, kmean_labels, cv::TermCriteria( cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 10, 1.0),
3, cv::KMEANS_PP_CENTERS, kmean_centers);
In OpenCV documents, it explains that:
cv::TermCriteria::EPS: the desired accuracy or change in parameters at which the iterative algorithm stops.
cv::TermCriteria::COUNT: the maximum number of iterations or elements to compute.
The explanation above is not quite clear to me. Can anyone help explain this further and show how to find good values for COUNT and EPS?
Thank you very much.
There are no magical numbers that will fit all applications (otherwise they wouldn't be parameters).
K-means is an iterative algorithm which moves towards an optimum; each iteration should get better, but you need to tell the algorithm when to stop.
Using cv::TermCriteria::COUNT, you tell the algorithm: you can perform x iterations, then stop. But this doesn't guarantee you any precision.
Using cv::TermCriteria::EPS, you tell the algorithm to continue its iterations until the difference between two successive iterations becomes sufficiently small. The parameter EPS tells the algorithm how small this difference should become. This depends, of course, on the dataset that you are feeding to the algorithm. Suppose you multiply all your data points by 10; then EPS should vary accordingly (quadratically, I suppose, but I'm not sure about that).
When you use both parameters, you tell the algorithm to stop when either of the two conditions is fulfilled; for example: stop iterating when the difference between two successive runs is smaller than 0.1, OR when 10 iterations have been done.
In conclusion: only analysis of your dataset, and trial and error, can give you decent values...
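
For illustration (my own sketch, not from the answer), here is a kmeans call on synthetic data where the run stops after 50 iterations or once the centre movement drops below 0.5, whichever comes first; K, the data and both thresholds are placeholders:

```cpp
// Small illustration: k-means with a combined COUNT + EPS stopping rule.
#include <opencv2/core.hpp>

int main() {
    cv::Mat data(36000, 2, CV_32F);
    cv::randu(data, cv::Scalar(0), cv::Scalar(100));   // stand-in for real samples

    cv::Mat labels, centers;
    double compactness = cv::kmeans(
        data, /*K=*/100, labels,
        cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT,
                         /*maxCount=*/50, /*epsilon=*/0.5),
        /*attempts=*/3, cv::KMEANS_PP_CENTERS, centers);

    // A tighter epsilon or a larger maxCount yields a lower (better) compactness
    // at the cost of more iterations; compare a few runs to pick decent values.
    (void)compactness;
    return 0;
}
```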