Dimensionality Reduction - PCA

I am trying to understand the different methods for dimensionality reduction in data analysis. In particular I am interested in Singular Value Decomposition (SVD) and Principal Component Analysis (PCA).
Can anyone please explain these terms to a layperson? - I understand the general premise of dimensionality reduction as bringing data to a lower dimension - But
a) how do SVD and PCA do this, and
b) how do they differ in their approach
OR maybe if you can explain what the results of each technique are telling me, so for
a) SVD - what are singular values
b) PCA - "proportion of variance"
Any example would be brilliant. I am not very good at maths!!
Thanks

You probably already figured this out, but I'll post a short description anyway.
First, let me describe the two techniques speaking generally.
PCA basically takes a dataset and figures out how to "transform" it (i.e. project it into a new space, usually of lower dimension). It essentially gives you a new representation of the same data. This new representation has some useful properties. For instance, each dimension of the new space is associated with the amount of variance it explains, i.e. you can essentially order the variables output by PCA by how important they are in terms of the original representation. Another property is the fact that linear correlation is removed from the PCA representation.
SVD is a way to factorize a matrix. Given a matrix M (e.g. for data, it could be an n by m matrix, for n datapoints, each of dimension m), you get U, S, V = SVD(M), where M = U S V^T, S is a diagonal matrix, and both U and V are orthogonal matrices (meaning the columns & rows are orthonormal; or equivalently U U^T = I & V V^T = I).
The diagonal entries of S are called the singular values of M. You can think of SVD as dimensionality reduction for matrices, since you can cut off the smaller singular values (i.e. set them to zero), which wipes out the corresponding parts of U and V when you multiply the factors back together, and get an approximation to M. In other words, just keep the top k singular values (and the top k vectors in U and V), and you have a "dimensionally reduced" version (representation) of the matrix.
Mathematically, this gives you the best rank k approximation to M, essentially like a reduction to k dimensions. (see this answer for more).
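To make the truncation concrete, here is a minimal sketch in Python/NumPy (my choice of tooling, not something the question specifies): it keeps the top k singular values of a small random data matrix and forms both the rank-k approximation and the reduced k-dimensional representation.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(100, 20))        # hypothetical data: 100 points, 20 dimensions

# Thin SVD: M = U @ diag(s) @ Vt, with s sorted largest first.
U, s, Vt = np.linalg.svd(M, full_matrices=False)

k = 5
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of M
M_reduced = U[:, :k] * s[:k]                  # each point described by k numbers

print(M_k.shape, M_reduced.shape)             # (100, 20) (100, 5)
print("approximation error:", np.linalg.norm(M - M_k))
```

For random data the error stays large; on real data with correlated columns, a small k already captures most of the matrix.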
So Question 1
I understand the general premise of dimensionality reduction as bringing data to a lower dimension - But
a) how do SVD and PCA do this, and b) how do they differ in their approach
The answer is that they are the same.
To see this, I suggest reading the following posts on the CV and math stack exchange sites:
What is the intuitive relationship between SVD and PCA?
Relationship between SVD and PCA. How to use SVD to perform PCA?
How to use SVD for dimensionality reduction to reduce the number of columns (features) of the data matrix?
How to use SVD for dimensionality reduction (in R)
Let me summarize the answer:
essentially, SVD can be used to compute PCA.
PCA is closely related to the eigenvectors and eigenvalues of the covariance matrix of the data. Essentially, by taking the data matrix, computing its SVD, and then squaring the singular values (and doing a little scaling), you end up getting the eigendecomposition of the covariance matrix of the data.
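As a small illustration of that relationship (a sketch assuming NumPy, which is my choice here for readability), center the data, take its SVD, and compare the squared and scaled singular values to the eigenvalues of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))              # n = 200 datapoints, m = 4 dimensions

Xc = X - X.mean(axis=0)                    # 1. center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # 2. SVD of the centered data

eigenvalues = s**2 / (X.shape[0] - 1)      # 3. squared singular values, scaled

# The eigenvalues of the covariance matrix match (sorted largest first),
# and the rows of Vt are the corresponding eigenvectors / principal components.
cov_eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
print(np.allclose(eigenvalues, cov_eigenvalues))    # True

scores = Xc @ Vt.T                         # the data expressed in the PCA axes
```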
Question 2
maybe if you can explain what the results of each technique is telling me, so for a) SVD - what are singular values b) PCA - "proportion of variance"
These eigenvectors (the singular vectors of the SVD, or the principal components of the PCA) form the axes of the new space into which one transforms the data.
The eigenvalues (closely related to the squares of the data matrix's SVD singular values) hold the variance explained by each component. Often, people want to retain, say, 95% of the variance of the original data: if they originally had n-dimensional data, they reduce it to d-dimensional data that keeps that much of the original variance, by choosing the d largest eigenvalues such that 95% of the variance is kept. This keeps as much information as possible while using as few dimensions as possible.
In other words, these values (variance explained) essentially tell us the importance of each principal component (PC), in terms of its usefulness in reconstructing the original (high-dimensional) data. Since each PC forms an axis in the new space (constructed via linear combinations of the old axes in the original space), it tells us the relative importance of each of the new dimensions.
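Here is a small sketch of how that selection is typically done (again assuming NumPy; the 95% threshold is just the example figure from above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Columns with very unequal variances, so a few components dominate.
X = rng.normal(size=(200, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])
Xc = X - X.mean(axis=0)

_, s, _ = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = s**2 / (X.shape[0] - 1)       # variance explained by each PC

explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Smallest d such that the first d components keep at least 95% of the variance.
d = int(np.searchsorted(cumulative, 0.95) + 1)
print(explained_ratio.round(3))
print(cumulative.round(3), "->", d, "components")
```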
As a bonus, note that SVD can also be used to compute eigendecompositions, so it can also be used to compute PCA in a different way, namely by decomposing the covariance matrix directly. See this post for details.

From your question, I only understood the topic of Principal Component Analysis, so I will share a few points about PCA below; I hope they help.
PCA:
1. PCA is a linear-transformation-based dimensionality reduction technique.
2. It is used for operations such as noise filtering, feature extraction and data visualization.
3. The goal of PCA is to identify patterns and detect the correlations between variables.
4. If there is a strong correlation between variables, then we can reduce the dimensionality, which is what PCA is intended for.
5. An eigenvector is a vector whose direction is unchanged by the linear transformation; it is only scaled.
Here is a sample URL to help understand PCA: https://www.solver.com/xlminer/help/principal-components-analysis-example
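For completeness, here is a minimal usage sketch with scikit-learn (my assumption; the post above does not name a library), illustrating points 3 and 4: two strongly correlated features collapse onto essentially one principal component.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
x = rng.normal(size=500)
# Feature 2 is strongly correlated with feature 1; feature 3 is independent noise.
X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=500), rng.normal(size=500)])

pca = PCA(n_components=2)           # keep the two strongest directions
X_reduced = pca.fit_transform(X)    # data projected onto the principal components

print(pca.explained_variance_ratio_)  # proportion of variance per component
print(X_reduced.shape)                # (500, 2)
```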


How should I compute the null space of a rectangular sparse matrix over GF(2) in C/C++?

UPDATE: I ended up not using Eigen and implementing my own GF(2) matrix representation, where each row is an array of integers and each bit of an integer represents a single entry. I then use a modified Gaussian elimination with bit operations to obtain the desired vectors.
I currently have a (large) rectangular sparse matrix, stored using Eigen3, for which I want to find the (right) null space over GF(2). I researched around and found some possible approaches to this:
(Modified) Gaussian Elimination
This means simply using some form of Gaussian elimination to find a reduced form of the matrix that preserves the nullspace, and then extracting the nullspace from that. Though I know how I would do this by hand, I'm quite clueless as to how I would actually implement it.
SVD Decomposition
QR Decomposition
I'm not familiar with these, but from my understanding the (orthonormal) basis vectors of the nullspace can be extracted from the decomposed form of the matrix.
Now my question is: which approach should I use in my case (i.e. a rectangular sparse matrix over GF(2)) that doesn't involve converting into a dense matrix? And if there are several approaches, what would be recommended in terms of performance and ease of implementation?
I'm also open to using other libraries besides Eigen as well.
For context, I'm trying to combine relations for factoring algorithms (e.g. as in the Quadratic Sieve). Also, if possible, I would like to look into parallelising these algorithms in the future, so if there is an approach that would allow this, that would be great!
Let's call the matrix in question M. Then (please correct me if I'm wrong):
GF(2) implies that M is equivalent to a matrix of bits - each element can have one of two values.
Arithmetic on GF(2) is just like integer arithmetic on non-negative numbers, but done modulo 2, so addition is a bitwise XOR, and multiplication is a bitwise AND. It won't matter what exact elements the GF(2) has - they are all equivalent to bits.
Two nonzero vectors over GF(2) are linearly independent as long as they are not equal, i.e. as long as they differ in at least one bit, or equivalently v_1 + v_2 ≠ 0 (since addition in GF(2) is a boolean XOR).
By definition, the (right) nullspace is spanned by the vectors that the matrix transforms to 0. A vector v is in the nullspace if multiplying each j-th column of M by the j-th bit of v and summing the results gives zero.
I see at least two ways of going about it.
Do dense Gaussian elimination in terms of bit operations, and organize the data and write the loops so that the compiler vectorizes everything and operates on 512-bit data types. You could use Compiler Explorer on godbolt.org to easily check that the vectorization takes place and that e.g. AVX512 instructions are used. Linear gains will eventually lose out against the squared scaling of the problem, of course, but the performance increase over a naive bool-based implementation will be massive and may be sufficient for your needs. The sparsity adds a possible complication: if the matrix won't comfortably fit in memory in a dense representation, then a suitable representation has to be devised that makes Gaussian elimination perform well. More would need to be known about the matrices you work with. Generally speaking, row operations will be performed at memory bandwidth if the implementation is correct, on the order of 1E10 elements/s, so a 1E3x1E3 M should process in about a second at most.
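To illustrate this first approach before moving on to the SAT idea, here is a small Python sketch of bit-packed Gaussian elimination over GF(2) (Python is just for readability; a C++ version would use arrays of fixed-width words, as the update above describes):

```python
def gf2_nullspace(rows, ncols):
    """Basis of the right nullspace of a GF(2) matrix whose rows are
    bit-packed ints (bit j of a row = entry in column j)."""
    rows = list(rows)
    reduced, pivot_cols = [], []
    for col in range(ncols):
        pivot = next((i for i, r in enumerate(rows) if (r >> col) & 1), None)
        if pivot is None:
            continue                              # no pivot -> free column
        row = rows.pop(pivot)
        # XOR is GF(2) addition: clear this column in every other row.
        rows = [r ^ row if (r >> col) & 1 else r for r in rows]
        reduced = [r ^ row if (r >> col) & 1 else r for r in reduced]
        reduced.append(row)
        pivot_cols.append(col)

    basis = []
    for f in (c for c in range(ncols) if c not in pivot_cols):
        v = 1 << f                                # set the free variable to 1
        for r, p in zip(reduced, pivot_cols):
            if (r >> f) & 1:
                v |= 1 << p                       # pivot variable forced to 1
        basis.append(v)
    return basis

# Usage: a 3 x 4 matrix with rows packed by column index (bit 0 = column 0).
M = [0b0111, 0b1010, 0b1101]
for v in gf2_nullspace(M, 4):
    # Verify M v = 0 over GF(2): each row.v is the parity of (row AND v).
    assert all(bin(r & v).count("1") % 2 == 0 for r in M)
    print([(v >> j) & 1 for j in range(4)])
```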
Since the problem is equivalent to a set of boolean equations, use a SAT solver (Boolean satisfiability problem solver) to incrementally generate the nullspace. The initial equation set is M × v = 0 and v ≠ 0, where v is a bit vector. Run the SAT until it finds some v, let's call it v_i. Then add a constraint v ≠ v_i, and run SAT again - adding the constraints in each iteration. That is, k-th iteration has constraints v ≠ 0, v ≠ v1, ... v ≠ v(k-1).
Since two distinct nonzero bit vectors are linearly independent, the inequality constraints force each iteration to produce a new nullspace vector.
Modern SAT solvers excel at sparse problems with more boolean equations than variables, so I imagine this would work very well - the sparser the matrix, the better. The problem should be pre-processed to remove all zero columns in M to minimize the combinatorial explosion. Open-source SAT solvers can easily deal with problems of 1M variables, so for a sparse problem you could realistically be solving with 100k-1M columns in M and about 10 "ones" in each row. A 1Mx1M sparse matrix with 10 "ones" per row on average would be a reasonable task for common SAT solvers, and I imagine that the state of the art could deal with 10Mx10M matrices and beyond.
Furthermore, your application is ideal for incremental solvers: you find one solution, stop, add a constraint, resume, and so on. So I imagine you may get very good results, and there are several good open source solvers to choose from.
Since you use Eigen already, the problem would at least fit into the SparseMatrix representation with byte-sized elements, so it's not a very big problem as far as SAT is concerned.
I wonder whether this nullspace basis finding is a case of a cover problem, possibly relaxed. There are some nice algorithms for those, but it's always a question of whether the specialized algorithm will work better than just throwing SAT at it and waiting it out, so to speak.
Updated answer - thanks to harold: QR decomposition is not applicable in general for your case.
See for instance
https://math.stackexchange.com/questions/1346664/how-to-find-orthogonal-vectors-in-gf2
I wrongly assumed that QR would be applicable here, but in theory it is not.
If you are still interested in details about QR algorithms, please open a new thread.

library for full SVD of sparse matrices

I want to do a singular value decomposition for large matrices containing a lot of zeros. In particular I need U and S, obtained from the diagonalization of a symmetric matrix A. This means that A = U * S * transpose(U^*), where S is a diagonal matrix and U contains all eigenvectors as columns.
I searched the web for C++ libraries that combine SVD and sparse matrices, but could only find libraries that compute a few, but not all, eigenvectors. Does anyone know if there is such a library?
Also, after obtaining U and S, I need to multiply them with some dense vector.
For this problem, I am using a combination of different techniques:
Arpack can compute a set of eigenvalues and associated eigenvectors; unfortunately it is fast only for the high frequencies and slow for the low frequencies,
but since the eigenvectors of the inverse of a matrix are the same as the eigenvectors of the matrix itself, one can factor the matrix (using a sparse matrix factorization routine such as SuperLU, or CHOLMOD if the matrix is symmetric). The "communication protocol" with Arpack only expects you to compute a matrix-vector product, so if you do a linear system solve with the factored matrix instead, this makes Arpack fast for the low frequencies of the spectrum (do not forget then to replace each eigenvalue lambda by 1/lambda!).
This trick can be used to explore the entire spectrum with a generalized transform (the transform in the previous point is referred to as the "invert" transform). There is also a "shift-invert" transform that lets one explore an arbitrary portion of the spectrum and gives fast convergence of Arpack. You then recover the original eigenvalue as 1/lambda + sigma, where lambda is the eigenvalue returned for the transformed operator and sigma is the "shift" (the transform is slightly more complicated than the "invert" transform; see the references below for a full explanation).
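For reference, SciPy exposes exactly this ARPACK machinery, including the shift-invert bookkeeping, so a quick sketch of the idea looks like this (SciPy rather than C++ is my choice here, purely for illustration):

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small symmetric sparse test matrix (tridiagonal, positive definite).
n = 2000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

# High end of the spectrum: plain Arpack iteration converges quickly.
vals_hi, vecs_hi = spla.eigsh(A, k=6, which="LM")

# Low end via shift-invert: a sparse factorization of (A - sigma*I) is built
# internally and Arpack iterates on the transformed operator, as described above.
vals_lo, vecs_lo = spla.eigsh(A, k=6, sigma=0.0, which="LM")

print(vals_hi)   # 6 largest eigenvalues
print(vals_lo)   # 6 eigenvalues closest to sigma = 0
```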
ARPACK: http://www.caam.rice.edu/software/ARPACK/
SUPERLU: http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
The numerical algorithm is explained in my article that can be downloaded here:
http://alice.loria.fr/index.php/publications.html?redirect=0&Paper=ManifoldHarmonics#2008
Sourcecode is available there:
https://gforge.inria.fr/frs/download.php/file/27277/manifold_harmonics-a4-src.tar.gz
See also my answer to this question:
https://scicomp.stackexchange.com/questions/20243/sparse-generalized-eigensolver-using-opencl/20255#20255

What exactly happens after running PCA (Principal Component Analysis)

At the moment I'm working on an image processing project, but I have a conceptual question regarding PCA.
What exactly happens to the matrix of an image after applying PCA to it?
I have not been able to understand this from reading the literature on the subject.
Given an M x N matrix, is the result a matrix M' x N', where M' < M and N' < N, and M' x N' is proportional to M x N?
I'm no expert in PCA but I'll try to explain what I understand.
After applying PCA to the matrix of an image, you get the eigenvectors of that matrix, which represent a set of invariant axes of the matrix. These vectors are all orthogonal to each other.
By measuring how dispersed your original data are along these vectors, you can tell how they are distributed. This can be useful, for example, if you wish to perform pattern categorization based on how the data are distributed along these "axes".
Although not entirely accurate, you can imagine that PCA helps you to draw "axes" through the blob of data present in your matrix, where the new "origin" of the axes is the center of your data.
The best part is that the data are distributed the most along the first eigenvector, followed by the second eigenvector, and so on.
I hope I did not confuse you.
There are a number of good references about PCA on Quora in addition to Stack Overflow.
Here are a few examples:
https://www.quora.com/What-is-an-intuitive-explanation-for-PCA
http://www.quora.com/How-to-explain-PCA-in-laymans-terms
Again, I'm no expert and welcome others to correct/educate both rwvaldivia and me.
The concept of PCA is closely related to linear algebra, which is the domain of mathematics to which matrices belong. A common way to view a matrix is as a set of vectors: an MxN matrix is just M vectors in an N-dimensional space.
Now a general concept in linear algebra is that the choice of basis vectors is pretty arbitrary. If you choose another basis, you convert your matrix by multiplying it with the old basis expressed in the new basis (an NxN dimensional matrix itself).
PCA is a method to find a basis which isn't arbitrary, but specific to your matrix. In particular, it orders the basis vectors by the amount in which they're present in your set of vectors. If all your vectors point roughly in the same direction, that direction will be the first basis vector. If they're all roughly in the same plane, the major basis vectors for that plane will be your first two vectors. But remember: you'll generally get a full basis (unless your matrix is degenerate); it's up to you to decide how many of the Components are Principal.
Now here's the real question: what exactly is "the matrix of the image"? You generally can't treat a 1024 x 768 image as a set of 1024 vectors in 768-dimensional space. Sure, you can perform the PCA operation, and you will get a 1024x768 result matrix, but what does that even mean? Those are the basis vectors of your input matrix, but that output doesn't have a meaning as an image, exactly because your input is not really a set of vectors.
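To make that concrete, here is a small NumPy sketch (NumPy and a random stand-in image are my assumptions) that treats the 1024 rows as vectors in 768-dimensional space, as discussed above: PCA does not shrink the image to a smaller M' x N' matrix; it produces a new basis plus coordinates, from which an approximation of the original matrix can be rebuilt.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((1024, 768))          # stand-in for a 1024 x 768 greyscale image

mean_row = img.mean(axis=0)
U, s, Vt = np.linalg.svd(img - mean_row, full_matrices=False)

# Rows of Vt are the principal components (the new axes); U * s holds the
# coordinates of each image row in that basis.
k = 50
approx = (U[:, :k] * s[:k]) @ Vt[:k, :] + mean_row   # still 1024 x 768

print(approx.shape)
print("relative error:", np.linalg.norm(img - approx) / np.linalg.norm(img))
```

On a real photograph (with correlated rows) the error for modest k is far smaller than on this random stand-in.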

sparse or dense storage of a matrix

I'm working with large sparse matrices that are not actually very sparse, and I'm always wondering how much sparsity is required before storing a matrix as sparse becomes beneficial. We know that the sparse representation of a reasonably dense matrix can be larger than the original one. So is there a threshold on the density of a matrix below which it is better to store it as sparse? I know that the answer usually depends on the structure of the sparsity, etc., but I was wondering if there are some general guidelines. For example, I have a very large matrix with density around 42%; should I store this matrix as dense or sparse?
The scipy.sparse.coo_matrix format stores the matrix as 3 np.arrays: row and col are integer indices, and data has the same data type as the equivalent dense matrix. So it should be straightforward to calculate the memory it will take as a function of overall shape and sparsity (as well as the data type).
csr_matrix may be more compact. data and indices are the same as with coo, but indptr has a value for each row plus 1. I was thinking that indptr would be shorter than the others, but I just constructed a small matrix where it was longer. An empty row, for example, requires a value in indptr, but none in data or indices. The emphasis with this format is computational efficiency.
csc is similar, but working with columns. Again, you should be able to do the math to calculate this size.
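A quick way to settle the 42% question is simply to measure it; the sketch below (scipy/numpy assumed, with a random stand-in matrix) adds up the arrays each format keeps:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
# Stand-in for the matrix in question: roughly 42% density, float64 values.
dense = rng.random((2000, 2000)) * (rng.random((2000, 2000)) < 0.42)

coo = sp.coo_matrix(dense)
csr = sp.csr_matrix(dense)

print("dense:", dense.nbytes)                                   # rows*cols*8
print("coo:  ", coo.data.nbytes + coo.row.nbytes + coo.col.nbytes)
print("csr:  ", csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes)
```

With float64 data and int32 indices, CSR costs roughly nnz*(8+4) bytes plus (rows+1)*4, against rows*cols*8 for dense, so at ~42% density the sparse copy saves only about a third of the memory while most operations on it will be slower than on the dense array.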
Brief discussion of memory advantages from MATLAB (using similar storage options)
http://www.mathworks.com/help/matlab/math/computational-advantages.html#brbrfxy
background paper from MATLAB designers
http://www.mathworks.com/help/pdf_doc/otherdocs/simax.pdf
SPARSE MATRICES IN MATLAB: DESIGN AND IMPLEMENTATION

Determinant Value For Very Large Matrix

I have a very large square matrix of order around 100000, and I want to know whether the determinant of that matrix is zero or not.
What is the fastest way to find that out?
I have to implement it in C++.
Assuming you are trying to determine if the matrix is non-singular, you may want to look here:
https://math.stackexchange.com/questions/595/what-is-the-most-efficient-way-to-determine-if-a-matrix-is-invertible
As mentioned in the comments, it's best to use some sort of BLAS library that will do this for you, such as Boost::uBLAS.
Usually, matrices of that size are extremely sparse. Use row and column reordering algorithms to concentrate the entries near the diagonal, and then use a QR or LU decomposition. The product of the diagonal entries of the second factor is - up to a sign - the determinant. This may still be too ill-conditioned; the most reliable result for the rank is obtained by performing a singular value decomposition. However, the SVD is more expensive.
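As an illustration of the LU route (using SciPy's SuperLU wrapper rather than C++; the same pattern applies with, e.g., Eigen's SparseLU), one can factor the matrix with a fill-reducing ordering and inspect the diagonal of U in log-space, since the determinant itself would overflow at this size:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 5000                                  # the real matrix would be ~100000 x 100000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")

try:
    lu = spla.splu(A)                     # sparse LU with fill-reducing reordering
    log_abs_det = np.sum(np.log(np.abs(lu.U.diagonal())))
    print("non-singular, log|det| =", log_abs_det)
except RuntimeError:
    # SuperLU raises "Factor is exactly singular" when a zero pivot is hit.
    print("matrix is singular (determinant is zero)")
```

Keep in mind the conditioning caveat above: in floating point an exactly zero pivot rarely appears, so a very small log-magnitude (or an explicit rank/SVD check) is usually a more meaningful signal than waiting for an exception.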
There is a property that if any two rows are equal, or one row is a constant multiple of another row, then the determinant of that matrix is zero. The same applies to columns.
From my knowledge, your application doesn't need to calculate the determinant; the rank of the matrix is sufficient to check whether the system of equations has a non-trivial solution:
Rank of Matrix