I have a question regarding a PCA variant.
Let $X \in \mathbb{R}^{D \times n}$ be a data matrix and let $\{u_i\}_{i=1}^{d}$ be the first $d$ principal components of $X$, where $\mu \in \mathbb{R}^{D}$ is the sample mean vector and $1_n \in \mathbb{R}^{n}$ is the $n$-dimensional vector of ones.
We can define the new PCA-based coordinates as $\alpha_i = u_i^T (X - \mu 1_n^T)$, $i = 1, \ldots, d$.
Can you explain why the new PCA features $\alpha_i$, $\alpha_j$ have zero mean and are uncorrelated?
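One way to see this is the short derivation below. It is only a sketch and assumes that the $u_i$ are orthonormal eigenvectors of the sample covariance matrix $S$ with eigenvalues $\lambda_i$, and that $\mu = \tfrac{1}{n} X 1_n$. The symbols $\bar{X}$, $S$ and $\lambda_j$ are introduced here for illustration and do not appear in the question.

```latex
\begin{aligned}
\bar{X} &= X - \mu 1_n^T, \qquad S = \tfrac{1}{n}\bar{X}\bar{X}^T, \qquad S u_j = \lambda_j u_j, \qquad u_i^T u_j = \delta_{ij} \\
\operatorname{mean}(\alpha_i) &= \tfrac{1}{n}\,\alpha_i 1_n = \tfrac{1}{n}\,u_i^T\,(X 1_n - \mu\, 1_n^T 1_n) = \tfrac{1}{n}\,u_i^T\,(n\mu - n\mu) = 0 \\
\operatorname{cov}(\alpha_i,\alpha_j) &= \tfrac{1}{n}\,\alpha_i \alpha_j^T = u_i^T S u_j = \lambda_j\, u_i^T u_j = 0 \quad (i \neq j)
\end{aligned}
```

Each $\alpha_i$ is a $1 \times n$ row vector, so $\alpha_i 1_n$ sums its entries and $\alpha_i \alpha_j^T$ is the inner product of two centered features.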
I am new to OpenVX and have learned from the documentation that OpenVX uses row-major storage. The matrix access example below illustrates it; it is just the ordinary row-major access pattern we use in plain C code.
Then I went to the vx_matrix and vxCreateMatrix documentation pages. The former has these statements:
VX_MATRIX_ROWS - The M dimension of the matrix [REQ-1131]. Read-only [REQ-1132]. Use a vx_size parameter.
VX_MATRIX_COLUMNS - The N dimension of the matrix [REQ-1133]. Read-only [REQ-1134]. Use a vx_size parameter.
While the latter says:
vx_matrix vxCreateMatrix(
vx_context c,
vx_enum data_type,
vx_size columns,
vx_size rows);
So, as I understand it, in the OpenVX world an MxN matrix has M rows and N columns. And the vxCreateMatrix declaration just follows what row-major storage suggests: the columns parameter comes first, then rows.
However, it really confuses me when I reach the Warp Affine page, which says:
This kernel performs an affine transform with a 2x3 Matrix M with this method of pixel coordinate translation [REQ-0498]:
And the C declaration:
// x0 = a x + b y + c;
// y0 = d x + e y + f;
vx_float32 mat[3][2] = {
{a, d}, // 'x' coefficients
{b, e}, // 'y' coefficients
{c, f}, // 'offsets'
};
vx_matrix matrix = vxCreateMatrix(context, VX_TYPE_FLOAT32, 2, 3);
vxCopyMatrix(matrix, mat, VX_WRITE_ONLY, VX_MEMORY_TYPE_HOST);
If M is a 2x3 matrix, then according to the previous section it should have 2 rows and 3 columns. Then why is it declared as mat[3][2], and why does vxCreateMatrix receive columns=2 and rows=3 as arguments? Is my understanding totally wrong?
This would be a good start and help for your implementation
https://software.intel.com/content/www/us/en/develop/documentation/sample-color-copy/top/color-copy-pipeline/color-copy-pipeline-the-scan-pre-process-openvx-graph.html
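To make the layout from the question concrete, here is a small plain-Python sketch (it does not call OpenVX, and the sample values for a..f are made up). It only shows that the mat[3][2] array from the Warp Affine example, read in row-major order, has the same memory order as the mathematical 2x3 matrix M read column by column, i.e. the array holds the transpose of M. It does not claim to explain why the specification chose this.

```python
def warp_affine_point(mat, x, y):
    """Map (x, y) with the 3x2 array from the question.

    mat[0] = (a, d) holds the x coefficients, mat[1] = (b, e) the y
    coefficients, and mat[2] = (c, f) the offsets.
    """
    (a, d), (b, e), (c, f) = mat
    return (a * x + b * y + c, d * x + e * y + f)


# The mathematical 2x3 matrix M = [[a, b, c], [d, e, f]] with sample values.
M = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# The C array mat[3][2] from the question: one row per column of M.
mat = [[M[0][0], M[1][0]],
       [M[0][1], M[1][1]],
       [M[0][2], M[1][2]]]

# Row-major memory order of mat[3][2] ...
flat_mat = [value for row in mat for value in row]
# ... equals the column-major memory order of the 2x3 matrix M.
column_major_M = [M[r][c] for c in range(3) for r in range(2)]

x0, y0 = warp_affine_point(mat, 1.0, 1.0)
```

With these sample values, x0 = a + b + c and y0 = d + e + f, which matches the two commented equations in the C snippet.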
I want to use LAPACK to calculate Q * x and Q^T * x, where Q comes from the reduced QR factorization of an m by n matrix A (m > n), stored in the form of Householder reflectors and a vector tau, as obtained from DGEQRF, and x is a vector of length n in the case of Q * x and of length m in the case of Q^T * x.
The documentation of DORMQR states that x is overwritten with the result, which already confuses me, since x and Q * x obviously have different dimensions if the original matrix A, and subsequently its reduced Q, are not square. Furthermore, it states that
"Q is of order M if SIDE = 'L' and of order N if SIDE = 'R'."
In my case, only the first half applies and M refers to the length of x. What do they mean by order? I have rarely ever heard the term "order" in the context of non-square matrices, and if so, it would be something like m by n, and not just a single number. Do they mean rank?
Can I even use DORMQR to calculate both Q * x and Q^T * x for a non-square Q, or is it not designed for this? Do I need to pad x with zeros?
DORMQR applies only to a square matrix Q. Although the input A to the routine holds elementary reflectors, such as the output of DGEQRF, which can come from a more general (non-square) matrix, the documentation adds the restriction that Q "is a real orthogonal matrix".
Of course, to be orthogonal, Q must be square.
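As for the zero-padding part of the question, the following plain-Python sketch may help. It is not LAPACK: householder_qr, apply_q, reduced_q_times and reduced_qt_times are hypothetical helpers written for this illustration, and the reflectors are stored in a simpler form than DGEQRF uses. The point it illustrates is that the square m x m matrix Q, applied to x padded with m - n zeros, gives the same vector as the reduced m x n factor times x, and that the first n entries of Q^T * y are the reduced Q^T * y.

```python
import math


def householder_qr(A):
    """Factor an m x n matrix (list of rows, m >= n) with Householder reflectors.

    Returns (R, reflectors). Each reflector (k, v, tau) stands for
    H = I - tau * v v^T acting on rows k..m-1, and Q = H_0 H_1 ... H_{n-1}.
    """
    m, n = len(A), len(A[0])
    R = [[float(value) for value in row] for row in A]
    reflectors = []
    for k in range(n):
        x = [R[i][k] for i in range(k, m)]
        norm = math.sqrt(sum(t * t for t in x))
        if norm == 0.0:
            continue  # column already zero, H is the identity
        v = x[:]
        v[0] += math.copysign(norm, x[0])
        tau = 2.0 / sum(t * t for t in v)
        for j in range(k, n):
            dot = sum(v[i] * R[k + i][j] for i in range(m - k))
            for i in range(m - k):
                R[k + i][j] -= tau * dot * v[i]
        reflectors.append((k, v, tau))
    return R, reflectors


def apply_q(reflectors, y, transpose=False):
    """Return Q * y (or Q^T * y) for a vector y of length m."""
    y = [float(value) for value in y]
    # Q^T applies H_0 first; Q applies the last reflector first.
    order = reflectors if transpose else list(reversed(reflectors))
    for k, v, tau in order:
        dot = sum(v[i] * y[k + i] for i in range(len(v)))
        for i in range(len(v)):
            y[k + i] -= tau * dot * v[i]
    return y


def reduced_q_times(reflectors, x, m):
    """Reduced Q (m x n) times x (length n): pad x with zeros, apply square Q."""
    return apply_q(reflectors, list(x) + [0.0] * (m - len(x)))


def reduced_qt_times(reflectors, y, n):
    """Reduced Q^T (n x m) times y (length m): apply Q^T, keep first n entries."""
    return apply_q(reflectors, y, transpose=True)[:n]


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
R, reflectors = householder_qr(A)
```

Since A = Q_reduced * R, applying reduced_q_times to a column of the top n x n block of R should reproduce the matching column of A, which is a convenient sanity check.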
I'm trying to estimate a 3D rotation matrix between two sets of points, and I want to do that by computing the SVD of the covariance matrix, say C, as follows:
U,S,V = svd(C)
R = V * U^T
C in my case is 3x3. I am using Eigen's JacobiSVD module for this, and I only recently found out that it stores matrices in column-major format. That has me confused.
So, when using Eigen, should I do:
V*U.transpose() or V.transpose()*U ?
Additionally, the rotation is accurate up to changing the sign of the column of U corresponding to the smallest singular value, such that the determinant of R is positive. Let's say the index of the smallest singular value is minIndex.
So when the determinant is negative, given the column-major confusion, should I do:
U.col(minIndex) *= -1 or U.row(minIndex) *= -1 ?
Thanks!
This has nothing to do with matrices being stored row-major or column major. svd(C) gives you:
U * S.asDiagonal() * V.transpose() == C
so the closest rotation R to C is:
R = U * V.transpose();
If you want to apply R to a point p (stored as column-vector), then you do:
q = R * p;
Now whether you are interested in R or its inverse R.transpose()==V*U.transpose() is up to you.
The singular values scale the columns of U, so it is a column of U (not a row) that you should negate to make the determinant of R positive. Again, this has nothing to do with storage layout.
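A small plain-Python sketch of the sign fix (not Eigen code; matmul, transpose, det3 and proper_rotation are hypothetical helpers, and the sample U and V are made up rather than computed by an SVD). It forms R = U * V^T and, if the determinant is negative, negates the column of U at min_index before recomputing R.

```python
def matmul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def proper_rotation(U, V, min_index):
    """Return R = U * V^T, negating column min_index of U if det(R) < 0."""
    R = matmul(U, transpose(V))
    if det3(R) < 0:
        U = [row[:] for row in U]          # work on a copy
        for row in U:
            row[min_index] = -row[min_index]
        R = matmul(U, transpose(V))
    return R


# Sample orthogonal factors whose product is a reflection (determinant -1).
U = [[0, -1, 0], [1, 0, 0], [0, 0, -1]]
V = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
R = proper_rotation(U, V, min_index=2)
```

Negating a column of U is the same as flipping the sign of one diagonal entry between U and V^T, so the result stays orthogonal and only the determinant changes sign.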
boost::numeric::ublas contains the M::size_type lu_factorize(M& m) function. Its name suggests that it performs the LU decomposition of a given matrix m, i.e. it should produce two matrices such that m = L*U. There seems to be no documentation for this function.
It is easy to deduce that it returns 0 to indicate a successful decomposition and a non-zero value when the matrix is singular. However, it is completely unclear where the result is. Taking the matrix by reference suggests that it works in place, but it should produce two matrices (L and U), not one. So what does it do?
There is no documentation in boost, but the documentation of SciPy's lu_factor shows that it's not uncommon to return a single result for the LU decomposition.
This is enough because, in a typical approach to LU decomposition, L's diagonal consists of ones only, as presented in this answer from Mathematics, for example.
So it is possible to fit both L and U into one matrix, putting L in the result's lower part, omitting the diagonal (which is assumed to contain only ones), and U in the upper part. For example, for a 3x3 problem the result is:
    u11 u12 u13
m = l21 u22 u23
    l31 l32 u33
which implies:
      1   0   0
L = l21   1   0
    l31 l32   1
and
    u11 u12 u13
U =   0 u22 u23
      0   0 u33
Inspecting boost's void lu_substitute(const M& m, vector_expression<E>& e) function from the same namespace seems to confirm this. It solves the equation LUx = e in two steps, where both L and U are contained in its m argument.
First solve Lz = e for z, where z = Ux, using the lower part of m:
inplace_solve(m, e, unit_lower_tag ());
then, with z now stored in e (e is modified in place), Ux = z can be solved using the upper part of m:
inplace_solve(m, e, upper_tag ());
inplace_solve is mentioned in the documentation, and it:
Solves a system of linear equations with triangular form, i.e. A is triangular.
So everything seems to make sense.
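The packed layout and the two-step solve can be sketched in plain Python as follows. This is not boost's implementation: lu_factorize_inplace and lu_substitute here are illustrative re-creations that assume no pivoting, and returning k + 1 for a zero pivot is an assumption of the sketch that only mirrors the "0 means success" observation from the question. Fractions are used so the arithmetic is exact.

```python
from fractions import Fraction


def lu_factorize_inplace(m):
    """Overwrite the square matrix m with its packed LU factors (no pivoting).

    After the call, the strict lower part of m holds L (unit diagonal implied)
    and the upper part including the diagonal holds U.
    Returns 0 on success, or k + 1 if the k-th pivot is zero.
    """
    n = len(m)
    for k in range(n):
        if m[k][k] == 0:
            return k + 1
        for i in range(k + 1, n):
            m[i][k] /= m[k][k]                 # multiplier l_ik
            for j in range(k + 1, n):
                m[i][j] -= m[i][k] * m[k][j]   # update the trailing block
    return 0


def lu_substitute(m, e):
    """Solve L U x = e in place, with L and U packed in m."""
    n = len(m)
    # Step 1: forward substitution L z = e (unit lower triangular part).
    for i in range(n):
        for j in range(i):
            e[i] -= m[i][j] * e[j]
    # Step 2: back substitution U x = z (upper triangular part).
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            e[i] -= m[i][j] * e[j]
        e[i] /= m[i][i]


m = [[Fraction(v) for v in row] for row in [[2, 1, 1], [4, 3, 3], [8, 7, 9]]]
status = lu_factorize_inplace(m)

e = [Fraction(v) for v in [4, 10, 24]]   # right-hand side for x = (1, 1, 1)
lu_substitute(m, e)
```

For this sample matrix the packed result reads L = [[1,0,0],[2,1,0],[4,3,1]] and U = [[2,1,1],[0,1,1],[0,0,2]] once the implied unit diagonal is put back.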
Boost doesn't document its LU factorization (into a lower triangular matrix L and an upper triangular matrix U), but the source code is publicly available.
If the code is hard to follow, please check the web page by Nick Higham, which has a detailed explanation. Here is an example from the link:
Let's say we need to solve Ax = b.
(1) Make LU from input matrix, A
[3 -1 1 1]
[-1 3 1 -1] ->
[-1 -1 3 1]
[1 1 1 3]
Lower
[1 0 0 0]
[-1/3 1 0 0]
[-1/3 -1/2 1 0]
[1/3 1/2 0 1]
Upper
[3 -1 1 1]
[0 8/3 4/3 -2/3]
[0 0 4 1]
[0 0 0 3]
This example looks straightforward to a human, but algorithmically it can take numerous steps, which is why LU factorization exists as a systematic method. Its relation to Gaussian elimination, Schur complements, and block implementations are some of the ways to approach it methodically.
(2) Solve the triangular systems Ly = b and Ux = y, since then b = L(Ux).
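The factors listed above can be checked with a few lines of plain Python. This is only a verification sketch: matmul, forward_substitute and back_substitute are hypothetical helpers, and the right-hand side b is a made-up example chosen so that the solution is all ones.

```python
from fractions import Fraction as F

A = [[3, -1, 1, 1], [-1, 3, 1, -1], [-1, -1, 3, 1], [1, 1, 1, 3]]
L = [[1, 0, 0, 0],
     [F(-1, 3), 1, 0, 0],
     [F(-1, 3), F(-1, 2), 1, 0],
     [F(1, 3), F(1, 2), 0, 1]]
U = [[3, -1, 1, 1],
     [0, F(8, 3), F(4, 3), F(-2, 3)],
     [0, 0, 4, 1],
     [0, 0, 0, 3]]


def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def forward_substitute(L, b):
    """Solve L y = b for lower triangular L."""
    y = []
    for i in range(len(b)):
        partial = sum(L[i][j] * y[j] for j in range(i))
        y.append((F(b[i]) - partial) / L[i][i])
    return y


def back_substitute(U, y):
    """Solve U x = y for upper triangular U."""
    n = len(y)
    x = [F(0)] * n
    for i in reversed(range(n)):
        partial = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - partial) / U[i][i]
    return x


b = [4, 2, 2, 6]                    # equals A times (1, 1, 1, 1)
y = forward_substitute(L, b)        # step (2a): L y = b
x = back_substitute(U, y)           # step (2b): U x = y
```

The product matmul(L, U) reproduces A exactly, and the two triangular solves return x = (1, 1, 1, 1).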
I know perspective division is done by dividing x,y, and z by w, to get normalized device coordinates. But I am not able to understand the purpose of doing that. Also, does it have anything to do with clipping?
Some details that complement the general answers:
The idea is to project a point (x,y,z) onto the screen to obtain (xs,ys,d).
The next figure shows this for the y coordinate.
We know from school that
tan(alpha) = ys / d = y / z
This means that the projection is computed as
ys = d*y/z = y /w
w = z / d
This is enough to apply a projection.
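As a minimal sketch of the formulas above (the function name project and the sample points are made up for illustration), dividing by w = z / d looks like this:

```python
def project(point, d):
    """Project (x, y, z) onto a screen at distance d using w = z / d."""
    x, y, z = point
    w = z / d
    return (x / w, y / w, d)


near = project((2.0, 4.0, 8.0), 2.0)    # w = 4
far = project((2.0, 4.0, 16.0), 2.0)    # w = 8, twice as far away
```

The second point is twice as far from the viewer, so its projected x and y come out exactly half as large.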
However, in OpenGL you want (xs,ys,zs) to be normalized device coordinates in [-1,1], and yes, this has something to do with clipping.
The extreme values for (xs,ys,zs) define a cube, and everything outside it is clipped.
So a projection matrix usually takes the clipping limits (the frustum) into account to form a single transformation that, together with the perspective division, simultaneously applies the projection and transforms the projected coordinates, along with z, to normalized device coordinates.
I mean why do we need that?
In layman's terms: to make perspective distortion work. In a perspective projection matrix, the Z coordinate gets "mixed" into the W output component. So the smaller the value of the Z coordinate, i.e. the closer to the origin, the more things get scaled up, i.e. the bigger they appear on screen.
To really distill it to the basic concept, and why the op is division (instead of e.g. square root or some such), consider that an object twice as far should appear with dimensions exactly one half as large. Obtain 1/2 from 2 by... division.
There are many geometric ways to arrive at the same conclusion. A diagram serves as visual proof for this, really.
Dividing x, y, z by w is a "trick" you do with "homogeneous coordinates": you convert an R⁴ vector back to R³ by dividing by the 4th component (the w component, as you said). This process is called dehomogenizing.
Why use homogeneous coordinates at all? That topic is a little more involved, but I will try to explain and hope I do it justice.
I will use x1, x2, x3, x4 as the components of a vector instead of x, y, z, w:
Consider a 3x3 matrix M and column vectors x, a, b, c of R³, with x = (x1, x2, x3), where x1, x2, x3 are the scalar components of x.
With the 3x3 matrix you can do every linear transformation of a vector x that you could do with the linear combination:
x' = x1*a + x2*b + x3*c (1).
(x' is the transformed vector that holds the result of transforming x).
Khan Academy's Linear Algebra course has a section explaining that every linear transformation can be written as a matrix-vector product.
You can try this out, for example, by putting the column vectors a, b, c into the columns of the matrix M = [ a b c ].
So with the matrix product you essentially get the linear combination above:
x' = M * x = [a b c] * x = a*x1 + b*x2 + c*x3 (2).
However, this operation only accounts for rotation, scaling and shearing transformations. The origin (0, 0, 0) will always stay at (0, 0, 0).
To move the origin you need another kind of transformation named "translation" (moving a vector, i.e. adding a vector to it).
Consider the translation column vector t = (t1, t2, t3) and the linear combination
x' = x1*a + x2*b + x3*c + t (3).
With this combination you can translate, rotate, scale and shear a vector. As you can see, it does move the origin (0, 0, 0) to (0+t1, 0+t2, 0+t3).
However, you can't put this translation into a 3x3 matrix.
So what graphics programmers and mathematicians came up with is adding another dimension to the matrix and vectors, like this:
M is a 4x4 matrix and x~ is a vector in R⁴ with x~ = (x1, x2, x3, x4). a, b, c, t are now column vectors of R⁴ as well (the last component of a, b, c is 0 and the last component of t is 1; I keep the names the same to show the similarity between the homogeneous linear combination and (3) later). x~ is the homogeneous coordinate of x.
Now watch what happens if we take a vector x of R³ and put it into x~ of R⁴.
In homogeneous coordinates this vector is x~ = (x1, x2, x3, 1). The last component is 1 if x is a point and 0 if it is just a direction (which couldn't be translated anyway).
So you have the linear combination:
x~' = M * x~ = [a b c t] * x~ = x1*a + x2*b + x3*c + x4*t (4).
(x~' is the result vector when transforming the homogeneous vector x~)
Since we took a vector from R³ and put it into R⁴, our x4 component is 1, so we have:
x~' = x1*a + x2*b + x3*c + 1*t
<=> x~' = x1*a + x2*b + x3*c + t (5).
which is exactly the transformation (3) above with the translation t. This is called an affine transformation (linear transformation + translation).
So with a 3x3 matrix and a vector of R³ you couldn't do translations, but by adding another dimension, with a vector in R⁴ and a 4x4 matrix, you can.
However, when you want to return to R³ you have to divide the first three components by the last one, which is the x4 component, or in your naming the w component. This is called "dehomogenizing". So let x be the original coordinate in R³, x~ in R⁴ the homogenized vector of x, and x' in R³ the vector recovered from x~:
x' = (x1/x4, x2/x4, x3/x4) (6).
Then x' is the dehomogenized vector of the vector x~.
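Equations (4) to (6) can be tried out with a short plain-Python sketch. The helper names (mat_vec, affine_matrix, homogenize, dehomogenize) and the sample numbers are made up for illustration; the matrix simply has a, b, c, t as its columns, with last components 0, 0, 0, 1 as described above.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def affine_matrix(a, b, c, t):
    """Build the 4x4 matrix [a b c t] from column vectors of R^3."""
    return [[a[0], b[0], c[0], t[0]],
            [a[1], b[1], c[1], t[1]],
            [a[2], b[2], c[2], t[2]],
            [0, 0, 0, 1]]


def homogenize(x, x4=1):
    """Append x4 = 1 for a point, x4 = 0 for a direction."""
    return list(x) + [x4]


def dehomogenize(xh):
    """Equation (6): divide the first three components by the fourth."""
    return [xh[0] / xh[3], xh[1] / xh[3], xh[2] / xh[3]]


# Identity linear part plus a translation by t = (10, 20, 30).
M = affine_matrix((1, 0, 0), (0, 1, 0), (0, 0, 1), (10, 20, 30))

moved_point = dehomogenize(mat_vec(M, homogenize((1, 2, 3))))
direction = mat_vec(M, homogenize((1, 2, 3), 0))
```

The point ends up translated, while the direction (x4 = 0) is left unchanged because the t column is multiplied by zero.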
Coming back to perspective division:
(I will leave out the details, because many here have already explained the divide by z. It follows from similar right triangles: with a given focal length f, a point at depth z with coordinate y lands at y' = f*y/z. Also, since you stated [I hope I didn't misread] that you already know why this is done, I simply leave a link to a YT video here; I find it very well explained in the course lecture CMU 15-462/662.)
The division by the w component when dehomogenizing is a pretty handy property when returning to R³. When you apply a homogeneous 4x4 perspective matrix to a vector, you simply put the z component into the w component and let the dehomogenizing step (as in (6)) perform the perspective divide. So you can set up the w component in such a way that the division by w divides by z and also maps the values to the range from 0 to 1 (basically you put the range of z-near to z-far values into a range that floating-point numbers are precise at).
This is also described by Ravi Ramamoorthi in his Course CSE167 when he explains how to set up the perspective projection matrix.
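A stripped-down sketch of "putting z into w" in plain Python follows. The matrix below is a deliberately simplified assumption: its last row copies z / d into the w component and it ignores the near/far depth mapping entirely, so it is not the full OpenGL projection matrix. The helper names and sample numbers are made up.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def perspective_matrix(d):
    """Simplified 4x4 matrix: keeps x, y, z and writes z / d into w."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0 / d, 0.0]]


def dehomogenize(v):
    """Perspective divide: divide x, y, z by w."""
    return [v[0] / v[3], v[1] / v[3], v[2] / v[3]]


d = 2.0
homogeneous = mat_vec(perspective_matrix(d), [2.0, 4.0, 8.0, 1.0])
projected = dehomogenize(homogeneous)
```

The matrix multiplication itself is linear and does no division; the divide by z only happens in dehomogenize, which is why the projection can be combined with other 4x4 transforms first.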
I hope this helped you understand the rationale for putting z into the w component. Sorry for my horrible formatting and lengthy text. Still, I hope it helped more than it confused.
Best of luck!
Actually, via standard notational convention from a 4x4 perspective matrix with sightline along a 'z' direction, 'w' differs by 1 from the distance ratio. Also that ratio, though interpreted correctly, is normally expressed as -z/d where 'z' is negative (therefore producing the correct ratio) because, again, in common notational convention, the camera is looking in the negative 'z' direction.
The reason for the offset by 1 needs to be explained. Many references put the origin at the image plane rather than the center of projection. With that convention (again with the camera looking along the negative 'z' direction) the distance labeled 'z' in the similar triangles diagram is thereby replaced by (d-z). Then substituting that for 'z' the expression for 'w' becomes, instead of 'z/d', (d-z)/d = [1-z/d]. To some these conventions may seem unorthodox but they are quite popular among analysts.
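The two conventions can be compared with a tiny sketch (the function names are made up; d is the distance from the center of projection to the image plane, and in the second function the camera is assumed to look along negative z with the origin on the image plane):

```python
def w_origin_at_center(distance, d):
    """w when the origin is the center of projection: distance ratio."""
    return distance / d


def w_origin_at_image_plane(z, d):
    """w when the origin is on the image plane and the camera looks along -z."""
    return 1.0 - z / d


d = 2.0
z = -2.0                       # a point 2 units behind the image plane
distance = d - z               # its distance from the center of projection
```

Both functions give the same w for the same physical point, since (d - z)/d = 1 - z/d; a point on the image plane itself (z = 0) gets w = 1 and is left unscaled.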