Why use a Matrix for 3D Projection? - opengl

After searching up the calculations for a projection matrix (at least in OpenGL),
why bother using a matrix when we have so many empty values? I count 9 entries marked as 0, and only 7 containing useful data. Why not just use a similar 1D array and store the data in a list-like shape? Wouldn't this save memory, and the time spent creating functions which can manipulate matrices? I'm sure this entire argument can be applied to other topics as well, which makes me think:
What is the specific reason for using Matrices in projecting 3D environments?

The projection of a 3D point (x,y,z) to the 2D image coordinates (X,Y) can be calculated as a vector-matrix multiplication in homogeneous coordinates:
[ a_00 a_01 a_02 a_03 ]   [ x ]   [ X W ]
[ a_10 a_11 a_12 a_13 ] * [ y ] = [ Y W ]
[ a_20 a_21 a_22 a_23 ]   [ z ]   [ Z W ]
[ a_30 a_31 a_32 a_33 ]   [ 1 ]   [ W ]
with
[ X W ]   [ x a_00 + y a_01 + z a_02 + a_03 ]
[ Y W ]   [ x a_10 + y a_11 + z a_12 + a_13 ]
[ Z W ] = [ x a_20 + y a_21 + z a_22 + a_23 ]
[ W ]     [ x a_30 + y a_31 + z a_32 + a_33 ]
And the pixel coordinates (X,Y) are obtained by dividing the first and second rows by the fourth row. This step is the conversion from homogeneous to cartesian coordinates.
The third row of the OpenGL projection matrix is set up so that Z becomes the projected depth, such that z values between n and f (the near and far planes) are mapped to -1...1. It is then used for depth testing and clipping. Because the fourth row is [0 0 -1 0], the conversion from homogeneous to cartesian coordinates corresponds to a division by -z, which results in the perspective transformation (with inverted depth).
Any other way of expressing the projection would involve the same steps, namely the linear transformation followed by the division by Z for the perspective foreshortening. Matrices are the usual representation in linear algebra for these operations.
This is not specific to perspective projections: many 3D transformations can be expressed as a 4x4 matrix, including rotations, translations, scalings, shearings, reflections, perspective projection, orthogonal projection, and others.
Multiple transformations that should be applied one after another can also be combined into a single 4x4 matrix by matrix multiplication, for example rotations around the X, Y and Z axes, or the MVP matrix. This is the model-view-projection matrix, which takes a 3D point in the local coordinate system of one object in the 3D scene to its final pixel coordinate on the screen. In these combined matrices all components can be non-zero.
So the advantage is that a single operation, the vector-matrix multiplication, is usable for all these cases instead of several different operations, and it is performed very efficiently on GPU hardware.
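As a rough illustration of that single-operation advantage, here is a minimal GLM sketch (the field of view, camera position and rotation angle are arbitrary made-up values): the model, view and projection matrices collapse into one MVP matrix, and transforming a vertex is then a single matrix-vector multiplication followed by the divide by w.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main() {
    // Arbitrary camera and object parameters, for illustration only.
    glm::mat4 projection = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, 0.1f, 100.0f);
    glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 0.0f, 5.0f),   // eye
                                 glm::vec3(0.0f, 0.0f, 0.0f),   // target
                                 glm::vec3(0.0f, 1.0f, 0.0f));  // up
    glm::mat4 model = glm::rotate(glm::mat4(1.0f), glm::radians(45.0f), glm::vec3(0.0f, 1.0f, 0.0f));

    // All transformations combined into a single 4x4 matrix.
    glm::mat4 mvp = projection * view * model;

    // One vector-matrix multiplication per vertex, then the perspective divide.
    glm::vec4 clip = mvp * glm::vec4(1.0f, 0.0f, 0.0f, 1.0f);
    glm::vec3 ndc  = glm::vec3(clip) / clip.w;
}
In a real renderer the combined matrix would typically be uploaded once as a uniform and the per-vertex multiplication done in the vertex shader.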

It's not just about the single values, it's also about the mathematical properties of a matrix. And the zeros are just as important as the nonzero values! The very layout of the values has meaning!
Specifically, the first three columns of a homogeneous transformation matrix (like a 3D projection matrix) form the basis vectors of the local coordinate space, and the 4th column defines a translation (which, in the case of a perspective projection, moves the basis away from the singularity at the origin).
So in 3D space you have 3 values per position: you have to map these three values to 3 values on your screen (the third becomes the value used for depth comparison), and the 4th value (of both the position and the result) is used for perspective distortion. So for each of the 4 values in the original position you must know how much it contributes to each of the 4 values in the output. If it doesn't contribute (and that's just as important), the entry is 0. So you need 4 · 4 = 16 values in total, hence a 4×4 matrix.
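To make the "layout has meaning" point concrete, here is a small GLM sketch with made-up values: the columns of a 4x4 transform can be read directly as three basis vectors plus a translation.
#include <glm/glm.hpp>

// Hypothetical local-to-world transform, built column by column: the first
// three columns are the basis vectors of the local coordinate space, the
// fourth column is the translation (position of the local origin).
const glm::mat4 localToWorld(
    glm::vec4(1.0f, 0.0f, 0.0f, 0.0f),    // local X axis
    glm::vec4(0.0f, 1.0f, 0.0f, 0.0f),    // local Y axis
    glm::vec4(0.0f, 0.0f, 1.0f, 0.0f),    // local Z axis
    glm::vec4(2.0f, 0.0f, -5.0f, 1.0f));  // translation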

It's probably quite rare that the projection matrix would get used as-is. Typically, you're more likely to concatenate the projection matrix with the world and view matrices and multiply by the world-view-proj matrix all in one go.
Also, GPUs are powerful and flexible, but if there's one thing they're best at doing, it's a series of multiply-adds on vectors (although newer hardware is just as efficient with scalar multiply-adds as vector multiply-adds). Matrix-vector multiplies are just a series of vector multiply-adds, and a more compact structure might be less efficient.
That said, your point is not without merit: I am aware of one successful fixed-function games console which had limited hardware registers for the projection matrix, taking advantage of your exact point that most of the entries in the projection matrix are typically unused.

Related

Place a billboard at a given distance so that it occupies a certain size on screen

I have a rectangle that I place on the screen using a simple scale matrix (S). Now I would like to place this rectangle into "3D space", but have it appear just like before on the screen. I found that I can do so by applying the view and projection matrices in inverse. Something like:
S' = (V⁻¹ P⁻¹ S)
Matrix = P V (V⁻¹ P⁻¹ S)
This works fine so far. My rectangle is like a billboard now and I can treat it like any other object, apply P and V and it will show up correctly. However, there is a degeneracy here: I don't specify at which depth the object is placed. It could be twice as far away but x times bigger!
The reason that this is important is that I want to animate the rectangle, say rotate it around the Z axis or move in 3D space. Then I want it to come to a stop and be positioned pixel-perfect on the screen.
How can I place a flat object at a given z distance, such that it appears on screen in a certain way? I already have the scale matrix needed to display it in OpenGL without any 3D transformation, that is, the matrix for displaying it in NDC or screen coordinates. I also have the projection and view matrices I want to use. How can I go from this to the desired model matrix?
How can I place a flat object at a given z distance, such that it appears on screen in a certain way [...]
Actually you want to draw the object in view space. Define a model matrix for the object that contains only one translation component (0, 0, -z) and skip the view matrix when drawing the object.
Usually the order to transform a 3D vertex v to a 2D point p is written as a matrix multiplication. [Depending on the API you are using, the order might be reversed. The notation used here is GLSL-friendly.]
p = P * V * M * S * v
v = vertex (usually 3D of the form x,y,z,1)
P = projection matrix
V = view (camera) matrix
M = model matrix (or world transformation)
S is a 4x4 object-scaling matrix
Matrices are usually 4x4, with the last row being 0,0,0,1 for affine transformations.
The model matrix M can be decomposed into a number of sub components such as T translation, S scale and R rotation. Of course here the order matters.
To rotate an object or vertex around itself, first rotate it (as if it is at the origin already) and then translate it to the position it needs to be, for example using vector v with coordinates (x,y,z).
R is a typical 3x3 rotation matrix embedded in a 4x4, with 0,0,0,1 on the last row
v is a 3D vector with coordinates (x,y,z)
T is a 4x4 translation matrix: an identity matrix with x,y,z,1 as its last column
M = T * R (first R, then T)
To rotate an object around an arbitrary point q, first translate it so that it is relative to that point (i.e. q is at the origin, so you subtract q from v). Then rotate around q (now at the origin) by applying R. Lastly, place the object back where it should be (so add q again).
R is your typical 3x3 rotation matrix embedded in a 4x4, with 0,0,0,1 on the last row
v is a 3D vector with coordinates (x,y,z)
T is a 4x4 translation matrix: an identity matrix with x,y,z,1 as its last column
L is another 4x4 translation matrix, but now with q instead of v
L' is the inverse transformation of L
M = T * L * R * L'
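A small GLM sketch of that composition, M = T * L * R * L' (all names and parameters here are placeholders):
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Rotate an object around an arbitrary pivot point q, then place it at 'position'.
glm::mat4 rotateAroundPoint(const glm::vec3 &q, float angleRad,
                            const glm::vec3 &axis, const glm::vec3 &position) {
    glm::mat4 T      = glm::translate(glm::mat4(1.0f), position); // final placement
    glm::mat4 L      = glm::translate(glm::mat4(1.0f), q);        // move pivot back
    glm::mat4 R      = glm::rotate(glm::mat4(1.0f), angleRad, axis);
    glm::mat4 Lprime = glm::translate(glm::mat4(1.0f), -q);       // move pivot to the origin
    return T * L * R * Lprime;                                    // applied right to left
}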
Also, scaling your object first (i.e. S is at the right end of the multiplications, applied before the translations) keeps the translations in world units. Scaling after all the other transformations, in contrast, scales all translations too, and the object will move over scaled distances.
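Coming back to the suggestion at the top of this answer (translate by (0, 0, -z) in view space and skip the view matrix), here is a minimal GLM sketch of that idea; the names projection, screenSpaceScale and z are placeholders for whatever you already have.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Model matrix with a single translation along -z in view space; the view
// matrix is deliberately left out, so the full transform is P * T(0,0,-z) * S.
glm::mat4 billboardTransform(const glm::mat4 &projection,
                             const glm::mat4 &screenSpaceScale, float z) {
    glm::mat4 model = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, -z));
    return projection * model * screenSpaceScale;
}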

Understand Translation Matrix in OpenGL

Assume we want to translate a point p(1, 2, 3, w=1) with a vector v(a, b, c, w=0) to a new point p'
Note: w=0 represents a vector and w=1 represents a point in OpenGL; please correct me if I'm wrong.
In Affine transformation definition, we have:
p + v = p'
=> p(1, 2, 3, 1) + v(a, b, c, 0) = p(1 + a, 2 + b, 3 + c, 1)
=> point + vector = point (everything works as expected)
In OpenGL, the translation matrix is as follows:
1 0 0 a
0 1 0 b
0 0 1 c
0 0 0 1
I assume (a, b, c, 1) is the vector from the affine transformation definition.
Why do we have w=1, and not w=0, such as
1 0 0 a
0 1 0 b
0 0 1 c
0 0 0 0
Note: w=0 represents a vector and w=1 represents a point in OpenGL; please correct me if I'm wrong.
You are wrong. First of all, this hasn't really anything to do with OpenGL. This is about homogeneous coordinates, which are a purely mathematical concept. They work by embedding an n-dimensional vector space into an (n+1)-dimensional vector space. In the 3D case, we use 4D homogeneous coordinates, with the definition that the homogeneous vector (x, y, z, w) represents the 3D point (x/w, y/w, z/w) in cartesian coordinates.
As a result, for any w != 0 you get a certain finite point, and for w = 0 you are describing an infinitely far away point in a specific direction. This means that homogeneous coordinates are more powerful in the regard that they can actually describe infinitely far away points with finite coordinates (which comes in very handy for perspective transformations, where infinitely far away points are mapped to finite points, and vice versa).
You can, as a shortcut, imagine (x,y,z,0) as some direction vector. But a point is not just w=1; any w value unequal to 0 will do. Conceptually, this means that any cartesian 3D point is represented by a line in homogeneous space (we did go up one dimension, so this actually makes sense).
I assume (a, b, c, 1) is the vector from the affine transformation definition. Why do we have w=1, and not w=0?
Your assumption is wrong. One thing about homogeneous coordinates is that we do not apply a translation in the 4D space. We get the effect of the translation in the 3D space by actually doing a shearing operation in 4D space.
So what we really want to do in homogenous space is
(x + w*a, y + w*b, z + w*c, w)
since the 3D interpretation of the resulting vector will then be
(x + w*a) / w == x/w + a
(y + w*b) / w == y/w + b
(z + w*c) / w == z/w + c
which will represent the translation that we were after.
So to try to make this even more clear:
What you wrote in your question:
p(1, 2, 3, 1) + v(a, b, c, 0) = p(1 + a, 2 + b, 3 + c, 1)
is explicitly not what we want to do. What you describe is an affine translation with respect to the 4D vector space.
But what we actually want is a translation in the 3D cartesian coordinates, so
(1, 2, 3) + (a, b, c) = (1 + a, 2 + b, 3 + c)
Applying your formula would actually mean doing a translation in homogeneous space, which would have the effect of a translation scaled by the w coordinate, while the formula I gave will always translate the point by (a,b,c), no matter what w we chose for the point.
This is of course not true if we choose w=0. Then we get no change at all, which is also correct, because a translation never changes directions (your formula would change the direction). Your formula is correct only for w=1, which is only a special case. But the key point here is that we are not doing a vector addition after all, but a matrix * vector multiplication. And homogeneous coordinates just allow us (among other, more powerful things) to represent a translation via matrix multiplication. But this does not mean that we can simply interpret the last column as a translation vector as if we were doing vector addition.
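To see this concretely, here is a small GLM sketch with made-up numbers: the same translation matrix moves a point (w = 1) but leaves a direction (w = 0) untouched.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main() {
    glm::mat4 T = glm::translate(glm::mat4(1.0f), glm::vec3(10.0f, 20.0f, 30.0f));

    glm::vec4 point(1.0f, 2.0f, 3.0f, 1.0f);      // w = 1: a point
    glm::vec4 direction(1.0f, 2.0f, 3.0f, 0.0f);  // w = 0: a direction

    glm::vec4 p = T * point;      // (11, 22, 33, 1) -- translated
    glm::vec4 d = T * direction;  // ( 1,  2,  3, 0) -- unchanged
}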
Simple Answer
The reason is the way matrix multiplications work. If you multiply a matrix by a vector, the w-component of the result is the inner product of the 4th row of the matrix with the vector. After applying the transformation, a point should still be a point and a direction should still be a direction. If you set that row to a zero vector, the resulting w will always be 0, and thus the resulting vector will have changed from a position (w=1) to a direction (w=0).
More detailed answer
The definition of an affine transformation is:
x' = A * x + t,
where A is a linear map and t a translation. Traditionally, linear maps are written by mathematicians in matrix form. Note that t is here, similar to x, a 3-dimensional vector. It would be cumbersome (and less general, thinking of projective mappings) if we always had to handle the linear mapping matrix and the translation vector separately. This can be solved by introducing an additional dimension to the mapping, the so-called homogeneous coordinate, which allows us to store the linear mapping as well as the translation vector in a combined 4x4 matrix. This is called an augmented matrix, and by definition,
[ x' ]   [ A | t ]   [ x ]
[    ] = [-------] * [   ]
[ 1  ]   [ 0 | 1 ]   [ 1 ]
It should also be noted that affine transformations can now be combined very easily by just multiplying their augmented matrices, which would be cumbersome to do in matrix-plus-vector notation.
One should also note that the bottom-right 1 is not part of the translation vector, which is still 3-dimensional, but part of the matrix augmentation.
You might also want to read the section about "Augmented matrix" here: https://en.wikipedia.org/wiki/Affine_transformation#Augmented_matrix

Get the angle from a rotated matrix using glm?

I want to know how to get the angle back from a rotated matrix; how do I calculate that? I am using GLM with C++.
For example
how do I get the angle from this matrix using c++?
[-0.01458][-2.26652][0][0]
[-1.27492][-0.02596][0][0]
[ 0 ][ 0 ][1][0]
[ x ][ y ][z][1]
This looks like an identity matrix that was rotated around the Z axis. If that's always the case, you can get the angle back by applying the glm::atan function to the first two elements of the first column:
float get_angle_in_rad(const glm::mat4 &matrix) {
    return glm::atan(matrix[0][1], matrix[0][0]);
}
See Rotation matrix for explanation.
Note that if the matrix represents a more complicated transformation than just a rotation around the Z axis, the value returned by this function will be bogus. Depending on your use case, you may want to keep the Euler rotation angles separately, in addition to the transformation matrix.
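A quick way to sanity-check the function (a sketch with an arbitrary test angle): build a pure Z rotation with GLM and feed it back through it.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <cstdio>

float get_angle_in_rad(const glm::mat4 &matrix) {
    return glm::atan(matrix[0][1], matrix[0][0]);
}

int main() {
    // Arbitrary test angle; rotation around the Z axis only.
    glm::mat4 m = glm::rotate(glm::mat4(1.0f), glm::radians(30.0f), glm::vec3(0.0f, 0.0f, 1.0f));
    std::printf("%f\n", get_angle_in_rad(m)); // ~0.5236 rad, i.e. 30 degrees
}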

Why do we need perspective division?

I know perspective division is done by dividing x,y, and z by w, to get normalized device coordinates. But I am not able to understand the purpose of doing that. Also, does it have anything to do with clipping?
Some details that complement the general answers:
The idea is to project a point (x,y,z) on screen to have (xs,ys,d).
Consider the similar triangles for the y coordinate.
We know from school that
tan(alpha) = ys / d = y / z
This means that the projection is computed as
ys = d*y/z = y /w
w = z / d
This is enough to apply a projection.
However, in OpenGL you want (xs,ys,zs) to be normalized device coordinates in [-1,1], and yes, this has something to do with clipping.
The extreme values of (xs,ys,zs) define the unit cube, and everything outside it will be clipped.
So a projection matrix usually takes the clipping limits (the frustum) into consideration to build a single transformation that, together with the perspective division, simultaneously applies the projection and maps the projected coordinates, along with z, to normalized device coordinates.
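A hedged GLM sketch of those two steps (the frustum parameters and the input point are made-up values): multiply by the projection matrix to get clip coordinates, then divide by w to land in the [-1, 1] NDC cube used for clipping.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main() {
    // Assumed frustum parameters, for illustration only.
    glm::mat4 projection = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, 0.1f, 100.0f);

    // A view-space point in front of the camera (the camera looks down -z).
    glm::vec4 eye(0.5f, 0.25f, -10.0f, 1.0f);

    glm::vec4 clip = projection * eye;          // clip coordinates (x_c, y_c, z_c, w_c)
    glm::vec3 ndc  = glm::vec3(clip) / clip.w;  // perspective division -> NDC in [-1, 1]

    // Anything with |ndc.x|, |ndc.y| or |ndc.z| greater than 1 (for positive w_c,
    // equivalently |x_c| > w_c and so on) lies outside the frustum and is clipped.
}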
I mean why do we need that?
In layman's terms: to make perspective distortion work. In a perspective projection matrix, the Z coordinate gets "mixed" into the W output component. So the smaller the value of the Z coordinate, i.e. the closer to the origin, the more things get scaled up, i.e. appear bigger on screen.
To really distill it to the basic concept, and to see why the operation is a division (instead of e.g. a square root or some such), consider that an object twice as far away should appear with dimensions exactly one half as large. You obtain 1/2 from 2 by... division.
There are many geometric ways to arrive at the same conclusion. A diagram serves as visual proof for this, really.
Dividing x, y, z by w is a "trick" you do with "homogeneous coordinates": it converts an R⁴ vector back to R³ by dividing by the 4th component (the w component, as you said), a process called dehomogenizing.
Why do you use homogeneous coordinates? That topic is a little bit more involved; I'll try to explain, and I hope I do it justice.
However, I will use x1, x2, x3, x4 as the components of a vector instead of x, y, z, w.
Consider a 3x3 Matrix M and column vectors x, a, b, c of R³. x=(x1, x2, x3) and x1,x2,x3 being scalars or components of x.
With the 3x3 matrix you can do all the linear transformations on a vector x that you could do with the linear combination:
x' = x1*a + x2*b + x3*c (1).
(x' is the transformed vector that holds the result of transforming x).
The Khan Academy Linear Algebra course has a section explaining the fact that every linear transformation can be written as a matrix product with a vector.
You can try this out for example by putting the column vectors a, b, c in the columns of the Matrix M = [ a b c ].
So with the matrix product you essentially get the upper linear combination:
x' = M * x = [a b c] * x = a*x1 + b*x2 + c*x3 (2).
However this operation only accounts for rotation, scaling and shearing transformations. The origin (0, 0, 0) will always stay at (0, 0, 0).
For this you need another kind of transformation named "translation" (moving a vector or adding a vector to the vector).
Consider the translation column vector t = (t1, t2, t3) and the linear combination
x' = x1*a + x2*b + x3*c + t (3).
With this linear combination you can translate, rotate, scale and shear a vector. As you can see this Linear Combination does actually move the origin vector (0, 0, 0) to (0+t1, 0+t2, 0+t3).
However you can't put this translation into a 3x3 Matrix.
So what Graphics Programmers or Mathematicians came up with is adding another dimension to the Matrix and Vectors like this:
M is a 4x4 matrix, x~ a vector in R⁴ with x~ = (x1, x2, x3, x4), and a, b, c, t also column vectors of R⁴ (the last components of a, b, c being 0 and the last component of t being 1; I keep the names the same to later show the similarity between the homogeneous linear combination and (3)). x~ is the homogeneous coordinate of x.
Now watch what happens if we take a vector x of R³ and put it into x~ of R⁴.
This vector will be in homogeneous coordinates in R⁴: x~ = (x1, x2, x3, 1). The last component is simply 1 if it is a point and 0 if it is just a direction (which couldn't be translated anyway).
So you have the linear combination:
x~' = M * x~ = [a b c t] * x~ = x1*a + x2*b + x3*c + x4*t (4).
(x~' is the result vector when transforming the homogeneous vector x~)
Since we took a vector from R³ and put it into R⁴, our x4 component is 1, so we have:
x~' = x1*a + x2*b + x3*c + 1*t
<=> x~' = x1*a + x2*b + x3*c + t (5).
which is exactly the upper linear transformation (3) with the translation t. This is called an affine transformation (linear transf. + translation).
So with a 3x3 matrix and a vector of R³ you couldn't do translations. However, by adding another dimension, having a vector in R⁴ and a matrix in R^(4x4), you actually can.
However, when you want to return to R³ you have to divide the first components by the last one, the x4 component (or in your variable naming, the w component). This is called "dehomogenizing". So let x be the original coordinate in R³ and x~ its homogenized vector in R⁴:
x' = (x1/x4, x2/x4, x3/x4) (6).
Then x' is the dehomogenized vector of the vector x~.
Coming back to perspective division:
(I will leave this part out, because many here have already explained the divide by z. It comes from similar right triangles, which lets you show that with a given focal length f, a point at depth z with y coordinate y lands at y' = f*y/z. Also, since you stated (I hope I didn't misread) that you already know why this is done, I simply point to the course lecture CMU 15-462/662, which has a YouTube recording and which I find explains it very well.)
When dehomogenizing, the division by the w component is a pretty handy property for returning to R³. When you apply a homogeneous 4x4 perspective matrix to a vector, you simply put the z component into the w component and let the dehomogenizing process (as in (6)) perform the perspective divide. So you can set up the w component in such a way that the division by w divides by z, and also maps the values from 0 to 1 (basically you put the range of z-near to z-far values into a range where floating point numbers are precise).
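For instance, a quick GLM check (arbitrary frustum parameters, assuming GLM's default right-handed convention): the matrix produced by glm::perspective has (0, 0, -1, 0) as its fourth row, so w_clip is simply -z_eye, and the later division by w is exactly the divide by depth.
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <cassert>

int main() {
    // Arbitrary frustum parameters, just to get a concrete matrix.
    glm::mat4 proj = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, 0.1f, 100.0f);

    // GLM is column-major (proj[column][row]); the fourth row is (0, 0, -1, 0).
    assert(proj[2][3] == -1.0f && proj[3][3] == 0.0f);

    glm::vec4 eye(1.0f, 2.0f, -5.0f, 1.0f);
    glm::vec4 clip = proj * eye;
    assert(clip.w == 5.0f); // w_clip == -z_eye
}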
This is also described by Ravi Ramamoorthi in his course CSE167, where he explains how to set up the perspective projection matrix.
I hope this helped you understand the rationale of putting z into the w component. Sorry for my horrible formatting and lengthy text; I hope it helped more than it confused.
Best of luck!
Actually, by standard notational convention, for a 4x4 perspective matrix with the sightline along the z direction, w differs by 1 from the distance ratio. Also, that ratio, though interpreted correctly above, is normally expressed as -z/d, where z is negative (therefore producing the correct ratio), because, again by common notational convention, the camera looks in the negative z direction.
The reason for the offset by 1 needs to be explained. Many references put the origin at the image plane rather than at the center of projection. With that convention (again with the camera looking along the negative z direction), the distance labeled z in the similar-triangles diagram is replaced by (d - z). Substituting that for z, the expression for w becomes, instead of z/d, (d - z)/d = 1 - z/d. To some these conventions may seem unorthodox, but they are quite popular among analysts.

Why does sign matter in opengl projection matrix

I'm working on a computer vision problem which requires rendering a 3d model using a calibrated camera. I'm writing a function that breaks the calibrated camera matrix into a modelview matrix and a projection matrix, but I've run into an interesting phenomenon in opengl that defies explanation (at least by me).
The short description is that negating the projection matrix results in nothing being rendered (at least in my experience). I would expect that multiplying the projection matrix by any scalar would have no effect, because it transforms homogeneous coordinates, which are unaffected by scaling.
Below is my reasoning why I find this to be unexpected; maybe someone can point out where my reasoning is flawed.
Imagine the following perspective projection matrix, which gives correct results:
    [ a  b  c  0 ]
P = [ 0  d  e  0 ]
    [ 0  0  f  g ]
    [ 0  0  h  0 ]
Multiplying this by camera coordinates gives homogeneous clip coordinates:
[x_c]   [ a  b  c  0 ]   [X_e]
[y_c] = [ 0  d  e  0 ] * [Y_e]
[z_c]   [ 0  0  f  g ]   [Z_e]
[w_c]   [ 0  0  h  0 ]   [W_e]
Finally, to get normalized device coordinates, we divide x_c, y_c, and z_c by w_c:
[x_n]   [x_c/w_c]
[y_n] = [y_c/w_c]
[z_n]   [z_c/w_c]
Now, if we negate P, the resulting clip coordinates should be negated, but since they are homogeneous coordinates, multiplying by any scalar (e.g. -1) shouldn't have any effect on the resulting normalized device coordinates. However, in OpenGL, negating P results in nothing being rendered. I can multiply P by any non-negative scalar and get the exact same rendered results, but as soon as I multiply by a negative scalar, nothing renders. What is going on here??
Thanks!
Well, the gist of it is that clipping testing is done through:
-w_c < x_c < w_c
-w_c < y_c < w_c
-w_c < z_c < w_c
Multiplying by a negative value breaks this test.
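A tiny numeric sketch of that (made-up values): negating the projection matrix negates both x_c and w_c, and although the ratio x_c / w_c is unchanged, the clip test above no longer passes.
#include <cstdio>

// The standard clip test for one coordinate: -w_c <= x_c <= w_c.
static bool insideClip(float x, float w) {
    return -w <= x && x <= w;
}

int main() {
    float x = 1.0f, w = 2.0f;
    std::printf("original: %d\n", insideClip(x, w));   // 1 -> inside, kept
    std::printf("negated:  %d\n", insideClip(-x, -w)); // 0 -> fails, clipped
    // (-x)/(-w) still equals x/w, but the vertex never survives clipping.
}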
I just found this tidbit, which makes progress toward an answer:
From Red book, appendix G:
Avoid using negative w vertex coordinates and negative q texture coordinates. OpenGL might not clip such coordinates correctly and might make interpolation errors when shading primitives defined by such coordinates.
Negating the projection matrix will result in negative W clip coordinates, and apparently OpenGL doesn't like this. But can anyone explain WHY OpenGL doesn't handle this case?
reference: http://glprogramming.com/red/appendixg.html
Reasons I can think of:
By negating the projection matrix, the coordinates will no longer be within your zNear and zFar planes of the view frustum (which are necessarily greater than 0).
To create window coordinates, the normalized device coordinates are translated/scaled by the viewport. So, if you've used a negative scalar for the clip coordinates, the (now inverted) normalized device coordinates map through the viewport to window coordinates that are... off of your window (to the left and below, if you will).
Also, since you mentioned using a camera matrix and that you have negated the projection matrix, I have to ask... to which matrices are you applying what from the camera matrix? Operating on the projection matrix other than through near/far/fovy/aspect causes all sorts of problems in the depth buffer, including anything that uses z (depth testing, face culling, etc.).
The OpenGL FAQ section on transformations has some more details.