Deciding about dimensionality reduction with PCA - pca

I have 2D data (I have a zero mean normalized data). I know the covariance matrix, eigenvalues and eigenvectors of it. I want to decide whether to reduce the dimension to 1 or not (I use principal component analysis, PCA). How can I decide? Is there any methodology for it?
I am looking sth. like if you look at this ratio and if this ratio is high than it is logical to go on with dimensionality reduction.
PS 1: Does PoV (Proportion of variation) stands for it?
PS 2: Here is an answer: https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained does it a criteria to test it?

PoV (Proportion of variation) represents how much information of data will remain relatively to using all of them. It may be used for that purpose. If POV is high than less information will be lose.

You want to sort your eigenvalues by magnitude then pick the highest 1 or 2 values. Eigenvalues with a very small relative value can be considered for exclusion. You can then translate data values and using only the top 1 or 2 eigenvectors you'll get dimensions for plotting results. This will give a visual representation of the PCA split. Also check out scikit-learn for more on PCA. Precisions, recalls, F1-scores will tell you how well it works
from http://sebastianraschka.com/Articles/2014_pca_step_by_step.html...
Step 1: 3D Example
"For our simple example, where we are reducing a 3-dimensional feature space to a 2-dimensional feature subspace, we are combining the two eigenvectors with the highest eigenvalues to construct our d×kd×k-dimensional eigenvector matrix WW.
matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1),
eig_pairs[1][1].reshape(3,1)))
print('Matrix W:\n', matrix_w)
>>>Matrix W:
[[-0.49210223 -0.64670286]
[-0.47927902 -0.35756937]
[-0.72672348 0.67373552]]"
Step 2: 3D Example
"
In the last step, we use the 2×32×3-dimensional matrix WW that we just computed to transform our samples onto the new subspace via the equation
y=W^T×x
transformed = matrix_w.T.dot(all_samples)
assert transformed.shape == (2,40), "The matrix is not 2x40 dimensional."

Related

Principal component analysis on proportional data

Is it valid to run a PCA on data that is comprised of proportions? For example, I have data on the proportion of various food items in the diet of different species. Can I run a PCA on this type of data or should I transform the data or do something else beforehand?
I had a similar question. You should search for "compositional data analysis". There are transformation to apply to proportions in order to analyze them with multivariate tecniques such as PCA. You can find also "robust" PCA algorithms to run your analysis in R. Let us know if you find an appropriate solution to your specific problem.
I don't think so.
PCA will give you "impossible" answers. You might get principal components with values that proportions can't have, like negative values or values greater than 1. How would you interpret this component?
In technical terms, the support of your data is a subset of the support of PCA. Say you have $k$ classes. Then:
the support for PCA vectors is $\R^k$
the support for your proportion vectors is the $k$- dimensional simplex. By simplex I mean the set of $p$ vectors of length $k$ such that:
$0 \le p_i \le 1$ where $i = 1, ..., k$
$\sum_{i=1}^k{p_i} = 1$
One way around this is if there's a one to one mapping between the $k$-simplex to all of $\R^k$. If so, you could map from your proportions to $\R^k$, do PCA there, then map the PCA vectors to the simplex.
But I'm not sure the simplex is a self-contained linear space. If you add two elements of the simplex, you don't get an element of the simplex :/
A better approach, I think, is clustering, eg with Gaussian mixtures, or spectral clustering. This is related to PCA. But a nice property of clustering is you can express any element of your data as a "convex combination" of the clusters. If you analyze your proportion data and find clusters, they (unlike PCA vectors) will be within the simplex space, and any mixture of them will be, too.
I also recommend looking into nonnegative matrix factorization. This is like PCA but, as the name suggests, avoids negative components and also negative eigenvectors. It's very useful for inferring structure in strictly positive data, like proportions. But nmf does not give you a basis for simplex space.

PCA in thousands of dimensions

I have vectors in 4096 VLAD codes, each one of them representing an image.
I have to run PCA on them without reducing their dimension on learning datasets with less than 4096 images (e.g., the Holiday dataset with <2k images.) to obtain the rotation matrix A.
In this paper (which also explains why I need to run this PCA without dimensionality reduction) they solved the problem with this approach:
For efficient computation of A, we compute at most the first 1,024 eigenvectors by eigendecomposition of the covariance matrix, and
the remaining orthogonal complements up to D-dimensional are filled using Gram-Schmidt orthogonalization.
Now, how do I implement this using C++ libraries? I'm using OpenCV, but cv::PCA seems not to offer such a strategy. Is there any way to do this?

What does eigen value of structure tensor matrix denote?

It is known that good feature point across two images can be determined properly, if
the two eigen value of above matrix, are greater than 0. Can someone explain, what does it mean to have both eigen value greater than 0 and why the feature point is not good if either of them is approx. equal to 0.
Note that this matrix always has nonnegative eigenvalues. Basically this rule says that one should favor rapid change in all directions, that is corners are better features than edges or flat surfaces.
The biggest eigenvalue corresponds to the eigenvector pointing towards the direction of the most significant change in the image at the point u.
If the two eigenvalues are small the image at point u does not change much.
If one of the eigenvectors is large and the other is small this point might lie on an edge in the image but it will be difficult to figure out where exactly on that edge.
If both are large, the point is like a corner.
There is a nice presentation with examples in the panoramic stitching slide deck from a course taught by Rajesh Rao at the University of Washington.
Here E(u,v) denotes the Eucledian distance between the two areas in the vicinities of pixels shifted by the vector (u,v) from each other. This distance tells how easy it is to distinguish the two pixels from one another.
Edit The matrix of image derivatives is denoted H in this illustration probably because of its relation to Harris corner detection algorithm.
That is related with the concept of Texturedness in the paper of Thomasi-Shi "Good features to track".
The idea of Textureness is to provide a rating of texture to make features (within a window) identifiable and unique. For instance, lines are not good features since are not unique (see Figure 3.9a)
To solve equation an optical flow equation, it must be possible to invert J (Hessian matrix). In practice next conditions must be satisfied:
Eigenvalues of J cannot differ by several orders of magnitude.
Eigenvalues of Hessian overcome image noise levels λnoise: implies that both eigenvalues of J must be large.
For the first condition we know that the greatest eigenvalue cannot be arbitrarily large because intensity variations in a window are bounded by the maximum allowable pixel value.
Regarding to second condition, being λ1 and λ2 two eigenvalues of J, following situations may rise (See Figure 3.10):
• Two small eigenvalues λ1 and λ2: means a roughly constant intensity profile within a window (Pink region). Problem of figure 3.9-b.
• A large and a small eigenvalue: means unidirectional texture patter (Violet or gray region). Problem of figure 3.9-a.
• λ1 and λ2 are both large: can represent a corner, salt and pepper textures or any other pattern that can be tracked reliably (Green region).
Some references:
1 - ORTIZ CAYON, R. J. (2013). Online video stabilization for UAV. Motion estimation and compensation for unnamed aerial vehicles.
2 - Shi, J., & Tomasi, C. (1994, June). Good features to track. In Computer Vision and Pattern Recognition, 1994. Proceedings CVPR'94., 1994 IEEE Computer Society Conference on (pp. 593-600). IEEE.
3 - Richard Szeliski. Image alignment and stitching: a tutorial. Found.
Trends. Comput. Graph. Vis., 2(1):1–104, January 2006.

Gaussian Blur with FFT Questions

I have a current implementation of Gaussian Blur using regular convolution. It is efficient enough for small kernels, but once the kernels size gets a little bigger, the performance takes a hit. So, I am thinking to implement the convolution using FFT. I've never had any experience with FFT related image processing so I have a few questions.
Is a 2D FFT based convolution also separable into two 1D convolutions ?
If true, does it go like this - 1D FFT on every row, and then 1D FFT on every column, then multiply with the 2D kernel and then inverse transform of every column and the inverse transform of every row? Or do I have to multiply with a 1D kernel after each 1D FFT Transform?
Now I understand that the kernel size should be the same size as the image (row in case of 1D). But how will it affect the edges? Do I have to pad the image edges with zeros? If so the kernel size should be equal to the image size before or after padding?
Also, this is a C++ project, and I plan on using kissFFT, since this is a commercial project. You are welcome to suggest any better alternatives. Thank you.
EDIT: Thanks for the responses, but I have a few more questions.
I see that the imaginary part of the input image will be all zeros. But will the output imaginary part will also be zeros? Do I have to multiply the Gaussian kernel to both real and imaginary parts?
I have instances of the same image to be blurred at different scales, i.e. the same image is scaled to different sizes and blurred at different kernel sizes. Do I have to perform a FFT every time I scale the image or can I use the same FFT?
Lastly, If I wanted to visualize the FFT, I understand that a log filter has to be applied to the FFT. But I am really lost on which part should be used to visualize FFT? The real part or the imaginary part.
Also for an image of size 512x512, what will be the size of real and imaginary parts. Will they be the same length?
Thank you again for your detailed replies.
The 2-D FFT is seperable and you are correct in how to perform it except that you must multiply by the 2-D FFT of the 2D kernel. If you are using kissfft, an easier way to perform the 2-D FFT is to just use kiss_fftnd in the tools directory of the kissfft package. This will do multi-dimensional FFTs.
The kernel size does not have to be any particular size. If the kernel is smaller than the image, you just need to zero-pad up to the image size before performing the 2-D FFT. You should also zero pad the image edges since the convoulution being performed by multiplication in the frequency domain is actually circular convolution and results wrap around at the edges.
So to summarize (given that the image size is M x N):
come up with a 2-D kernel of any size (U x V)
zero-pad the kernel up to (M+U-1) x (N+V-1)
take the 2-D fft of the kernel
zero-pad the image up to (M+U-1) x (N+V-1)
take the 2-D FFT of the image
multiply FFT of kernel by FFT of image
take inverse 2-D FFT of result
trim off garbage at edges
If you are performing the same filter multiple times on different images, you don't have to perform 1-3 every time.
Note: The kernel size will have to be rather large for this to be faster than direct computation of convolution. Also, did you implement your direct convolution taking advantage of the fact that a 2-D gaussian filter is separable (see this a few paragraphs into the "Mechanics" section)? That is, you can perform the 2-D convolution as 1-D convolutions on the rows and then the columns. I have found this to be faster than most FFT-based approaches unless the kernels are quite large.
Response to Edit
If the input is real, the output will still be complex except for rare circumstances. The FFT of your gaussian kernel will also be complex, so the multiply must be a complex multiplication. When you perform the inverse FFT, the output should be real since your input image and kernel are real. The output will be returned in a complex array, but the imaginary components should be zero or very small (floating point error) and can be discarded.
If you are using the same image, you can reuse the image FFT, but you will need to zero pad based on your biggest kernel size. You will have to compute the FFTs of all of the different kernels.
For visualization, the magnitude of the complex output should be used. The log scale just helps to visualize smaller components of the output when larger components would drown them out in a linear scale. The Decibel scale is often used and is given by either 20*log10(abs(x)) or 10*log10(x*x') which are equivalent. (x is the complex fft output and x' is the complex conjugate of x).
The input and output of the FFT will be the same size. Also the real and imaginary parts will be the same size since one real and one imaginary value form a single sample.
Remember that convolution in space is equivalent to multiplication in frequency domain. This means that once you perform FFT of both image and mask (kernel), you only have to do point-by-point multiplication, and then IFFT of the result. Having said that, here are a few words of caution.
You probably know that in digital signal processing, we often use circular convolution, not linear convolution. This happens because of curious periodicity. What this means in simple terms is that DFT (and FFT which is its computationally efficient variant) assumes that you signal is periodic, and when you filter your signal in such manner -- suppose your image is N x M pixels -- that it takes pixel at (1,m) to the the neighbor or pixel at (N, m) for some m<M. You signal virtually wraps around onto itself. This means that your Gaussian mask will be averaging pixels on the far right with pixels on the far left, and same goes for top and bottom. This might or might not be desired, but in general one has to deal with edging artifacts anyway. It is however much easier to forget about this issue when dealing with FFT multiplication because the problem stops being apparent. There are many ways to take care of this problem. The best way is to simply pad your image with zeros and remove the extra pixels later.
A very neat thing about using a Gaussian filter in frequency domain is that you never really have to take its FFT. It si a well-know fact that Fourier transform of a Gaussian is a Gaussian (some technical details here). All you would have to do then is pad you image with zeros (both top and bottom), generate a Gaussian in the frequency domain, multiply them together and take IFFT. Then you're done.
Hope this helps.

CUDA cufftPlan2d plan size question

I'm studying the code behind the convolutionFFT2D example of the Nvidia CUDA sdk, but I don't get the point of this line:
cufftPlan2d(&fftPlan, fftH, fftW/2, CUFFT_C2C);
Apparently this initializes a complex plane for the FFT to be running in, but I don't see the point of dividing the plan width by 2.
Just to be precise: the fftH and fftW are rounded values for imageX+kernelX+1 and imageY+kernelY+1 dimensions (just for speed reasons). I know that in the frequency domain you usually have a positive component and a symmetric negative component of the same frequency.. but this sounds like cutting half of my image data away..
Can someone explain this to me a little better? I've never used a FFT (I just know the theory behind a fourier transformation)
When you perform a real to complex FFT half the frequency domain data is redundant due to symmetry. This is only the case in one axis of a 2D FFT though. You can think of a 2D FFT as two 1D FFT operations, the first operates on all the rows, and for a real valued image this will give you complex row values. In the second stage you apply a 1D FFT to every column, but since the row values are now complex this will be a complex to complex FFT with no redundancy in the output. Hence you only need width / 2 points in the horizontal axis, but you still need height pointe in the vertical axis.