I have a data set of EEG recordings with 5000 rows and 59 columns. The columns are the channels of the EEG headset, and each row holds the signal amplitude at every channel. I used princomp to reduce the dimensions, but I am confused about variables and observations. I have a label vector of 5000 elements to classify the data. If I run princomp on the 5000x59 data I get a 59x59 matrix, which can't be classified with the given labels, and if I run it on the 59x5000 data I get a 5000x5000 matrix, which means PCA increases the dimension instead of decreasing it. Please help me understand which are the variables and which are the observations in my data.
Thanks.
The Matlab command princomp can have multiple return values. If we denote your original 5000x59 data matrix as D, then
[C, S] = princomp(D);
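gives you the principal component coefficients C (59x59) and the principal component scores S (5000x59). C is the projection from the original space to the principal component space, and S is your data expressed in that space: one row per original observation, one column per component. Your 5000 labels therefore still apply to the rows of S, and to actually reduce the dimension you keep only the first k columns of S. Because princomp centers each column of D before projecting, the relationship of the 3 matrices is
(D - repmat(mean(D), size(D,1), 1)) * C = S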
Btw, if you only care about the principal components S and don't need the coefficients C, you can do
[~, S] = princomp(D);
For more details, check out the official Matlab princomp doc.
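The shapes involved can be checked with a small NumPy sketch. Python is used here only for illustration; the random matrix standing in for the EEG data, the variable names, and the choice k = 10 are assumptions, and the SVD route is one common way to compute PCA rather than a description of how princomp is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the EEG matrix: rows = observations (samples), columns = variables (channels)
D = rng.standard_normal((5000, 59))

# Centre each channel, as PCA implementations usually do
Dc = D - D.mean(axis=0)

# SVD of the centred data: the rows of Vt are the principal directions
U, s, Vt = np.linalg.svd(Dc, full_matrices=False)
C = Vt.T                          # 59 x 59 coefficients (loadings)
S = Dc @ C                        # 5000 x 59 scores, one row per observation
latent = s**2 / (D.shape[0] - 1)  # variance of each component, in decreasing order

k = 10                            # hypothetical number of components to keep
S_reduced = S[:, :k]              # 5000 x k: this is the reduced data to classify

print(C.shape, S.shape, S_reduced.shape)
```

The reduced matrix still has 5000 rows, so the 5000-element label vector lines up with it row by row.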
I am using the coefplot command in Stata to plot coefficients and confidence intervals from multiple regression models. I am plotting the same coefficient (X) from 4 different model specifications.
There is one model specification (alternative standard errors) that I cannot figure out how to estimate in Stata, but am able to estimate using R. That means that for one model specification, I have the standard error in R, but not in Stata.
Is there an easy way to manually alter the standard errors in coefplot?
My code is:
coefplot A B C D, drop(_cons) xline(0) keep(X)
How can I add to this code that the standard errors for coefficient X in model D should be Z?
You can manually edit e(V) (the variance-covariance matrix) and e(b) (the coefficient vector). For this, define a program:
est restore estimates1
capture program drop changeeV
program changeeV, eclass
tempname b V
matrix `b' = e(b)
matrix `V' = e(V)
matrix `V'[1,1] = 1.1 // New variance of the first coefficient (the squared standard error)
matrix `b'[1,1] = 10 // New value of the first coefficient
ereturn post `b' `V' // Repost the modified b and V
ereturn local cmd "reg outcome treatment covariates" // Repost initial command (required)
end
changeeV // Execute program
est store eaX // Store the newly generated estimates
Note that e(V) holds variances and covariances, so for the diagonal entry you need to square the standard error from your R output. Good luck!
After running glm I can type matrix list r(table) and see a table of all of my results. If I wish, I can write slopes and SEs to variables, e.g., gen B=_b[x1] or gen se=_se[x1]. However, this does not work with the confidence limits, ll and ul. How can I access them in a similar manner?
I am not sure that the _b[] and _se[] results come from r(table); I believe they are derived from e(b) and e(V).
Anyway, since you have r(table), you can just save the results into another matrix, and then use the regular matrix operations to put the lower bounds and upper bounds into new matrices. If for some reason transformation into variables is desired (for example, plotting), there's always -svmat-.
sysuse auto,clear
glm price mpg foreign, f(gaussian)
mat r=r(table)
matrix ll=r["ll",1...]' // see -help matrix extraction-; transposed for svmat
svmat ll,names(ll) // lower bounds are in variable ll1
From time to time I have to port some Matlab Code to OpenCV.
Almost always there is a way to do it and an appropriate function in OpenCV. Nevertheless, it's not always easy to find.
Therefore I would like to start this summary to find and gather some equivalents between Matlab and OpenCV.
I use the Matlab function as heading and append its description from the Matlab help. After each one, an OpenCV example or links to solutions are appreciated.
Repmat
Replicate and tile an array. B = repmat(A,M,N) creates a large matrix B consisting of an M-by-N tiling of copies of A. The size of B is [size(A,1)*M, size(A,2)*N]. The statement repmat(A,N) creates an N-by-N tiling.
cv::Mat B = cv::repeat(A, M, N);
OpenCV Docs
Find
Find indices of nonzero elements. I = find(X) returns the linear indices corresponding to the nonzero entries of the array X. X may be a logical expression. Use IND2SUB(SIZE(X),I) to calculate multiple subscripts from the linear indices I.
Similar to Matlab's find
Conv2
Two dimensional convolution. C = conv2(A, B) performs the 2-D convolution of matrices A and B. If [ma,na] = size(A), [mb,nb] = size(B), and [mc,nc] = size(C), then mc = max([ma+mb-1,ma,mb]) and nc = max([na+nb-1,na,nb]).
Similar to Conv2
Imagesc
Scale data and display as image. imagesc(...) is the same as IMAGE(...) except the data is scaled to use the full colormap.
SO Imagesc
Imfilter
N-D filtering of multidimensional images. B = imfilter(A,H) filters the multidimensional array A with the multidimensional filter H. A can be logical or it can be a nonsparse numeric array of any class and dimension. The result, B, has the same size and class as A.
SO Imfilter
Imregionalmax
Regional maxima. BW = imregionalmax(I) computes the regional maxima of I. imregionalmax returns a binary image, BW, the same size as I, that identifies the locations of the regional maxima in I. In BW, pixels that are set to 1 identify regional maxima; all other pixels are set to 0.
SO Imregionalmax
Ordfilt2
2-D order-statistic filtering. B=ordfilt2(A,ORDER,DOMAIN) replaces each element in A by the ORDER-th element in the sorted set of neighbors specified by the nonzero elements in DOMAIN.
SO Ordfilt2
Roipoly
Select polygonal region of interest. Use roipoly to select a polygonal region of interest within an image. roipoly returns a binary image that you can use as a mask for masked filtering.
SO Roipoly
Gradient
Approximate gradient. [FX,FY] = gradient(F) returns the numerical gradient of the matrix F. FX corresponds to dF/dx, the differences in x (horizontal) direction. FY corresponds to dF/dy, the differences in y (vertical) direction. The spacing between points in each direction is assumed to be one. When F is a vector, DF = gradient(F) is the 1-D gradient.
SO Gradient
Sub2Ind
Linear index from multiple subscripts. sub2ind is used to determine the equivalent single index corresponding to a given set of subscript values.
SO sub2ind
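When porting, the index arithmetic itself is often all that is needed. The sketch below (in Python, purely for illustration; both helper names are hypothetical) contrasts Matlab's 1-based, column-major linear index with the 0-based, row-major element offset of a continuous single-channel cv::Mat.

```python
def sub2ind_matlab(size, row, col):
    """1-based, column-major linear index, as Matlab's sub2ind gives for a 2-D array."""
    nrows, ncols = size
    if not (1 <= row <= nrows and 1 <= col <= ncols):
        raise IndexError("subscript out of range")
    return row + (col - 1) * nrows


def offset_row_major(size, row, col):
    """0-based, row-major element offset (the layout of a continuous cv::Mat)."""
    nrows, ncols = size
    if not (0 <= row < nrows and 0 <= col < ncols):
        raise IndexError("subscript out of range")
    return row * ncols + col


# Same element of a 3 x 4 matrix: Matlab (2, 3) is (1, 2) in 0-based terms
print(sub2ind_matlab((3, 4), 2, 3))    # 2 + (3 - 1) * 3 = 8
print(offset_row_major((3, 4), 1, 2))  # 1 * 4 + 2 = 6
```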
backslash operator or mldivide
solves the system of linear equations A*x = B. The matrices A and B must have the same number of rows.
cv::solve
I am trying to do a PCA on some volatility data, and let's just say I can propose a model as the following:
volatility = beta0 + beta1*x + beta2*x^2
where x are some observations, say for example, moneyness and so on.
So in Matlab, what I did was to set Y = [ones x x^2] (a column of ones, x, and x squared) and then run pca(Y),
and for some reason the first row of my coefficient matrix is always something like 0 0 1, i.e., 0 everywhere except the last column, and the output latent always shows the highest value in the first row as well, no matter how I change the model.
Obviously, it can't be the case that the last term in every single model is explained well by the last term in the equation. And if I remove the constant term in Y (i.e., Y = [x x^2]), then the first row of the coefficient matrix becomes something more normal (i.e., non-zero values everywhere).
So my questions are:
Is my way of doing PCA right?
Does PCA automatically rearrange the principal components, so that a first row of the coefficient matrix with all zeros except a 1 in the last column does not necessarily represent the last term in the equation?
If it is wrong, what is the correct way of doing it?
From Matlab's documentation for princomp:
COEFF = princomp(X) performs principal components analysis (PCA) on
the n-by-p data matrix X, and returns the principal component
coefficients, also known as loadings. Rows of X correspond to
observations, columns to variables. COEFF is a p-by-p matrix, each
column containing coefficients for one principal component. The
columns are in order of decreasing component variance.
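The 0 0 1 row can be reproduced with a small NumPy sketch (Python only for illustration; the simulated x values and variable names are assumptions). PCA centers each column, so a column of ones becomes all zeros and carries no variance. That variable then loads only on the last, zero-variance component, and because rows of the coefficient matrix correspond to variables, the first row comes out as (0, 0, ±1).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.8, 1.2, size=200)              # hypothetical moneyness values
Y = np.column_stack([np.ones_like(x), x, x**2])  # [ones x x^2]

Yc = Y - Y.mean(axis=0)                          # centring turns the ones column into zeros
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
coeff = Vt.T                                     # rows = variables, columns = components
latent = s**2 / (len(x) - 1)                     # component variances, largest first

print(np.round(coeff, 3))  # first row: zero except in the last column
print(latent)              # last entry is (numerically) zero
```

So the columns are ordered by decreasing variance, not by the order of the terms in the model, and the constant column adds nothing to the analysis; dropping it, as in Y = [x x^2], gives the same non-trivial components.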
I am using PCA on binary attributes to reduce the dimensions (attributes) of my problem. The initial dimensions were 592 and after PCA the dimensions are 497. I have used PCA before, on numeric attributes in another problem, and it managed to reduce the dimensions to a greater extent (to half of the initial dimensions). I believe that binary attributes decrease the power of PCA, but I do not know why. Could you please explain to me why PCA does not work as well as on numeric data?
Thank you.
The principal components of 0/1 data can fall off slowly or rapidly,
and so can the PCs of continuous data:
it depends on the data. Can you describe your data?
The following picture is intended to compare the PCs of continuous image data
vs. the PCs of the same data quantized to 0/1: in this case, inconclusive.
Look at PCA as a way of getting an approximation to a big matrix,
first with one term: approximate A ~ c U V^T, i.e. A[i,j] ~ c U[i] V[j].
Consider this a bit, with A say 10k x 500: U 10k long, V 500 long.
The top row is c U[1] V^T, the second row is c U[2] V^T ...
all the rows are proportional to V.
Similarly the leftmost column is c U V[1] ...
all the columns are proportional to U.
But if all rows are similar (proportional to each other),
they can't get near an A matrix with rows or columns 0100010101 ...
With more terms, A ~ c1 U1 V1^T + c2 U2 V2^T + ...,
we can get nearer to A: the smaller the higher-order ci are, the faster the approximation converges.
(Of course, all 500 terms recreate A exactly, to within roundoff error.)
The top row is "lena", a well-known 512 x 512 matrix,
with 1-term and 10-term SVD approximations.
The bottom row is lena discretized to 0/1, again with 1 term and 10 terms.
I thought that the 0/1 lena would be much worse -- comments, anyone ?
(U V^T is also written U ⊗ V, called a "dyad" or "outer product".)
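The sum-of-dyads view can be tried on a small example. This is a minimal NumPy sketch (Python for illustration only; the random 0/1 matrix, its size, and the helper name rank_k are assumptions, and it is not the lena experiment from the picture). It builds 1-term, 10-term and full approximations from the SVD and measures how far each is from A.

```python
import numpy as np

rng = np.random.default_rng(2)
A = (rng.random((200, 50)) < 0.5).astype(float)  # hypothetical 0/1 matrix

# A = sum_i c[i] * outer(U[:, i], Vt[i, :]), with c sorted in decreasing order
U, c, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k(k):
    """Sum of the first k dyads c_i U_i V_i^T."""
    return (U[:, :k] * c[:k]) @ Vt[:k, :]

# Frobenius-norm error of the 1-term, 10-term and all-terms approximations
errors = [np.linalg.norm(A - rank_k(k)) for k in (1, 10, 50)]
print(errors)
```

The k-term error equals the square root of the sum of the squared remaining ci, which is why quickly shrinking ci mean few terms are needed, and slowly shrinking ci (as random 0/1 data tends to give) mean many are.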
(The Wikipedia articles Singular value decomposition and Low-rank approximation are a bit math-heavy.
An AMS column by David Austin, We Recommend a Singular Value Decomposition, gives some intuition on SVD / PCA; highly recommended.)