in opengl why do we have to do gluPerspective before gluLookAt?

so under GL_PROJECTION I did
glu.gluPerspective(90,aspect,1,10);
glu.gluLookAt(0,0,3,0,0,0,0,1,0);
This works fine, but when I switch the order I don't get any object on my screen; I rotated my camera and there's nothing.
I know that switching the two changes the order of matrix multiplication but I want to know why the first case works but the second doesn't. Thanks

To see an object on screen you need it to fall within the canonical view volume, which is, for OpenGL, [−1, 1] in all three dimensions. To transform an object, you roughly do
P' = Projection × View × Model × P
where P' is the final point which needs to be in the canonical view volume and P is the initial point in model space. P is transformed by the model matrix followed by view and then projection.
The order I've followed is column-vector based, where each further transform is pre-multiplied (left-multiplied). Another way to read the same formula is left to right, where instead of transforming the point, the coordinate system is transformed, and interpreting P in the transformed system spatially represents P' in the original system. This is just another way to see it; the result is the same both numerically and spatially.
why do we have to do gluPerspective before gluLookAt?
The older, fixed-function pipeline OpenGL post-multiplies (right-multiplies) every new matrix onto the current one, so the call order has to be the reverse of the order in which the transforms apply to a point. Since a point needs the View transform applied first and Projection next, we issue Perspective first and LookAt next to get the expected result.
Giving the two in the wrong (swapped) call order leads to
P' = View × Projection × Model × P
Since matrix multiplication is not commutative, you don't get the right P', one which falls within the canonical view volume, and hence a black screen.
See Chapter 3 of the Red Book, under the section General-Purpose Transformation Commands, which explains the order followed by OpenGL. Excerpt:
Note: All matrix multiplication with OpenGL occurs as follows: Suppose the current matrix is C and the matrix specified with glMultMatrix*() or any of the transformation commands is M. After multiplication, the final matrix is always CM. Since matrix multiplication isn't generally commutative, the order makes a difference.
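To make the quoted rule concrete, here is a minimal standalone sketch (plain C with 2x2 stand-in matrices, nothing OpenGL-specific, names illustrative) of the C = C*M composition:

#include <stdio.h>

/* C = C * M, mirroring glMultMatrix* (row-major 2x2 for brevity) */
static void mult(double C[4], const double M[4]) {
    double r0 = C[0]*M[0] + C[1]*M[2], r1 = C[0]*M[1] + C[1]*M[3];
    double r2 = C[2]*M[0] + C[3]*M[2], r3 = C[2]*M[1] + C[3]*M[3];
    C[0] = r0; C[1] = r1; C[2] = r2; C[3] = r3;
}

int main(void) {
    double C[4] = { 1, 0, 0, 1 };         /* glLoadIdentity */
    const double A[4] = { 2, 0, 0, 1 };   /* "projection-like", issued first */
    const double B[4] = { 1, 1, 0, 1 };   /* "view-like", issued second */
    mult(C, A);
    mult(C, B);
    /* C is now A*B: the matrix issued last sits nearest the vertex and is
       applied first -- exactly the P' = Projection × View × P order. */
    printf("C = [%g %g; %g %g]\n", C[0], C[1], C[2], C[3]);   /* [2 2; 0 1] */
    return 0;
}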
I want to know why the first case works but the second doesn't.
To know what really happens with a matrix formed in the incorrect order, let's do a small workout in 2D. Say the canonical view region is [−100, 100] in both X and Y; anything outside this is clipped out. The origin of this imaginary square screen is at its centre; X goes right, Y goes up. When no transform is applied, calling DrawImage draws the image at the origin. You have an image which is 1 × 1; its model matrix is a scaling by 200, so that it becomes a 200 × 200 image, one that fills the entire screen. Since the origin is at the centre of the screen, to draw the image so that it fills the screen, we need a view matrix that translates (moves) the image by (−100, −100). Formulating this:
P' = View × Model = Translate(−100, −100) × Scale(200, 200) =

[ 200    0  −100 ]
[   0  200  −100 ]
[   0    0     1 ]
However, the result of

Model × View = S(200, 200) × T(−100, −100) =

[ 200    0  −20000 ]
[   0  200  −20000 ]
[   0    0       1 ]
Multiplying the former matrix with the points (0, 0) and (1, 1) gives (−100, −100) and (100, 100) as expected: the image corners are aligned to the screen corners. Multiplying the latter matrix with them, however, gives (−20000, −20000) and (−19800, −19800), well outside the viewable region. This is because, geometrically, the latter matrix translates first and then scales, instead of scaling and then translating; the translation itself gets scaled, putting the points completely off-screen.
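The same workout as a small standalone C sketch, which reproduces the numbers above by multiplying the two orders out:

#include <stdio.h>

/* out = A * B for 3x3 matrices (row-major) */
static void mul3(const double A[9], const double B[9], double out[9]) {
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) {
            out[3*r + c] = 0.0;
            for (int k = 0; k < 3; ++k)
                out[3*r + c] += A[3*r + k] * B[3*k + c];
        }
}

/* apply M to the homogeneous 2D point (x, y, 1) */
static void apply(const double M[9], double x, double y, double *ox, double *oy) {
    *ox = M[0]*x + M[1]*y + M[2];
    *oy = M[3]*x + M[4]*y + M[5];
}

int main(void) {
    const double T[9] = { 1, 0, -100,   0, 1, -100,   0, 0, 1 };  /* View: translate */
    const double S[9] = { 200, 0, 0,    0, 200, 0,    0, 0, 1 };  /* Model: scale */
    double good[9], bad[9], x, y;

    mul3(T, S, good);   /* View * Model: scale first, then translate */
    mul3(S, T, bad);    /* Model * View: translate first, then scale */

    apply(good, 1, 1, &x, &y); printf("View*Model: (1,1) -> (%g, %g)\n", x, y);  /* (100, 100) */
    apply(bad,  1, 1, &x, &y); printf("Model*View: (1,1) -> (%g, %g)\n", x, y);  /* (-19800, -19800) */
    return 0;
}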

In the
glu.gluPerspective(90,aspect,1,10);
glu.gluLookAt(0,0,3,0,0,0,0,1,0);
case, first the model/world coordinates (in R^3) are transformed into view coordinates (also R^3). Then the projection maps the view coordinates into homogeneous clip space (projective space, with 4-component coordinates), which is then reduced by the perspective divide to NDC coordinates. This is in general how it should work.
Now have a look at:
glu.gluLookAt(0,0,3,0,0,0,0,1,0);
glu.gluPerspective(90,aspect,1,10);
Here, world coordinates are projected directly into projective (clip) space. Since the lookAt matrix is a mapping from R^3 -> R^3 and we are already in projective space, this is not going to work. Even if it were possible to rotate within that space, the parameters of gluLookAt would have to be adapted to fit the ranges of the projective space.
Note: In general one should never put gluLookAt on the GL_PROJECTION stack. Since it describes the view matrix, it fits better on the GL_MODELVIEW stack.
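A minimal sketch of the conventional split between the two stacks (C-style fixed-function GL; aspect is assumed to be computed from the window size elsewhere):

#include <GL/glu.h>

void setup_camera(double aspect) {
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluPerspective(90.0, aspect, 1.0, 10.0);   /* projection only */

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    gluLookAt(0.0, 0.0, 3.0,    /* eye */
              0.0, 0.0, 0.0,    /* center */
              0.0, 1.0, 0.0);   /* up */
    /* model transforms (glTranslate, glRotate) and draw calls follow here */
}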

Related

What is the role of gl_Position.w in Vulkan?

Variable gl_Position output from a GLSL vertex shader must have 4 coordinates. In OpenGL, it seems the w coordinate is used to scale the vector, by dividing the other coordinates by it. What is the purpose of w in Vulkan?
Shaders and projections in Vulkan behave exactly the same as in OpenGL. There are small differences in depth ranges ([-1, 1] in OpenGL, [0, 1] in Vulkan) or in the origin of the coordinate system (lower-left in OpenGL, upper-left in Vulkan), but the principles are exactly the same. The hardware is still the same and it performs calculations in the same way both in OpenGL and in Vulkan.
4-component vectors serve multiple purposes:
Different transformations (translation, rotation, scaling) can be represented in the same way, with 4x4 matrices.
Projection can also be represented with a 4x4 matrix.
Multiple transformations can be combined into one 4x4 matrix.
The .w component you mention is used during perspective projection.
All of this we can do with 4x4 matrices, and thus we need 4-component vectors (so they can be multiplied by 4x4 matrices). Again, I write about this because the above rules apply both to OpenGL and to Vulkan.
So as for the purpose of the .w component of the gl_Position variable: it is exactly the same in Vulkan. It is used to scale the position vector: during perspective calculations (projection matrix multiplication) the original depth is modified by the original .w component and stored in the .z component of the gl_Position variable, and the original depth is additionally stored in the .w component. After that (as a fixed-function step) the hardware performs the perspective division and divides the position stored in the gl_Position variable by its .w component.
In orthographic projection the steps performed by the hardware are exactly the same, but the values used for the calculations are different. So the perspective division step is still performed by the hardware, but it does nothing (the position is divided by 1.0).
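To illustrate, a small standalone sketch (plain C, the standard symmetric perspective matrix, illustrative values) of the "projection multiply, then fixed-function divide by w" sequence for a single point:

#include <stdio.h>

int main(void) {
    const double n = 0.1, f = 100.0, c = 1.0;   /* c = cot(fovy/2); 90 deg fovy, aspect 1 */
    const double eye[4] = { 1.0, 1.0, -5.0, 1.0 };   /* view-space point, 5 units ahead */

    /* clip = P * eye for the standard perspective matrix */
    double clip[4];
    clip[0] = c * eye[0];
    clip[1] = c * eye[1];
    clip[2] = (-(f + n)/(f - n)) * eye[2] + (-2.0*f*n/(f - n)) * eye[3];
    clip[3] = -eye[2];   /* .w receives the (negated) eye-space depth */

    /* fixed-function step: divide by w -> normalized device coordinates */
    printf("NDC: %g %g %g\n", clip[0]/clip[3], clip[1]/clip[3], clip[2]/clip[3]);
    return 0;
}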
gl_Position is a homogeneous coordinate. The w component plays a role in perspective projection.
The projection matrix describes the mapping from the 3D points of a scene to 2D points on the viewport. It transforms from eye space to clip space, and the coordinates in clip space are transformed to normalized device coordinates (NDC) by dividing by the w component of the clip coordinates (the perspective divide).
With perspective projection, the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points on the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
Perspective Projection Matrix:
r = right, l = left, b = bottom, t = top, n = near, f = far

2*n/(r-l)     0             0              0
0             2*n/(t-b)     0              0
(r+l)/(r-l)   (t+b)/(t-b)   -(f+n)/(f-n)  -1
0             0             -2*f*n/(f-n)   0
When a Cartesian coordinate in view space is transformed by the perspective projection matrix, the result is a homogeneous coordinate. The w component grows with the distance to the point of view. This causes objects that are further away to become smaller after the perspective divide.
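A tiny sketch (plain C, illustrative values) of that growth: with the matrix above, clip.w = -z_eye, so the same lateral offset is divided by a larger w the farther away the point is:

#include <stdio.h>

int main(void) {
    const double x = 1.0;                        /* same lateral offset */
    const double depths[] = { -2.0, -5.0, -20.0 };
    for (int i = 0; i < 3; ++i) {
        double w = -depths[i];                   /* w row of the matrix: (0 0 -1 0) */
        printf("z_eye = %6.1f -> x_ndc = x/w = %g\n", depths[i], x / w);
    }
    return 0;   /* prints 0.5, 0.2, 0.05: farther points shrink toward the center */
}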
In computer graphics, transformations are represented with matrices. If you want something to rotate, you multiply all its vertices (as vectors) by a rotation matrix. Want it to move? Multiply them by a translation matrix, etc.
tl;dr: You can't describe translation along the z-axis with 3D matrices and vectors. You need at least 1 more dimension, so they just added a dummy dimension w. But things break if it's not 1, so keep it at 1 :P.
Anyway, now we begin with a quick review of matrix multiplication:

[ a b c ]   [ x ]   [ a*x + b*y + c*z ]
[ d e f ] × [ y ] = [ d*x + e*y + f*z ]
[ g h i ]   [ z ]   [ g*x + h*y + i*z ]

You basically put x above a, y above b, z above c. Multiply the whole column by the variable you just moved, and sum up everything in the row.
So if you were to translate a vector, you'd want something like:

[ 1 0 a ]   [ x ]   [ x + a*z ]
[ 0 1 b ] × [ y ] = [ y + b*z ]
[ 0 0 1 ]   [ z ]   [ z ]

See how x and y are now translated by a*z and b*z? That's pretty awkward though:
You'd have to account for how big z is whenever you move things (what if z was negative? You'd have to move in opposite directions. That's cumbersome as hell if you just want to move something an inch over...)
You can't move along the z axis. You'll never be able to fly or go underground
But, if you can make sure z = 1 at all times:

[ 1 0 a ]   [ x ]   [ x + a ]
[ 0 1 b ] × [ y ] = [ y + b ]
[ 0 0 1 ]   [ 1 ]   [ 1 ]

Now it's much clearer that this matrix allows you to move in the x-y plane by amounts a and b. The only problem is that you're conceptually levitating all the time, and you still can't go up or down. You can only move in 2D.
But do you see a pattern here? With 3D matrices and 3D vectors, you can describe all the fundamental movements in 2D. So what if we added a 4th dimension?

[ 1 0 0 a ]   [ x ]   [ x + a*w ]
[ 0 1 0 b ] × [ y ] = [ y + b*w ]
[ 0 0 1 c ]   [ z ]   [ z + c*w ]
[ 0 0 0 1 ]   [ w ]   [ w ]

Looks familiar. If we keep w = 1 at all times:

[ 1 0 0 a ]   [ x ]   [ x + a ]
[ 0 1 0 b ] × [ y ] = [ y + b ]
[ 0 0 1 c ]   [ z ]   [ z + c ]
[ 0 0 0 1 ]   [ 1 ]   [ 1 ]

There we go: now you get translation along all 3 axes. This is what's called homogeneous coordinates.
But what if you were doing some big and complicated transformation, resulting in w != 1, and there's no way around it? OpenGL (and basically any other CG system, I think) will do what's called normalization: divide the resulting vector by the w component. I don't know enough to say exactly why ('cause scaling is a linear transformation?), but it has favorable implications (it can be used in perspective transforms). Anyway, the translation would then effectively compute:

[ 1 0 0 a ]   [ x ]   [ x + a*w ]                      [ x/w + a ]
[ 0 1 0 b ] × [ y ] = [ y + b*w ]  --(divide by w)-->  [ y/w + b ]
[ 0 0 1 c ]   [ z ]   [ z + c*w ]                      [ z/w + c ]
[ 0 0 0 1 ]   [ w ]   [ w ]                            [ 1 ]

And there you go: see how each component is shrunk by w, and then translated? That's why w controls scaling.
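A small standalone sketch (plain C, hypothetical helper name) of that last point: multiply by the translation matrix, then normalize by w:

#include <stdio.h>

/* translate (a, b, c) applied to homogeneous p, followed by the divide by w */
static void translate(double a, double b, double c, const double p[4], double out[3]) {
    double x = p[0] + a*p[3];
    double y = p[1] + b*p[3];
    double z = p[2] + c*p[3];
    out[0] = x/p[3]; out[1] = y/p[3]; out[2] = z/p[3];   /* normalization */
}

int main(void) {
    double p1[4] = { 2, 4, 6, 1 }, p2[4] = { 2, 4, 6, 2 }, r[3];
    translate(10, 10, 10, p1, r); printf("w=1: (%g, %g, %g)\n", r[0], r[1], r[2]); /* (12, 14, 16) */
    translate(10, 10, 10, p2, r); printf("w=2: (%g, %g, %g)\n", r[0], r[1], r[2]); /* (11, 12, 13) */
    return 0;
}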

Why is the OpenGL projection matrix needlessly complicated?

The following image shows the main values used in calculating the perspective projection matrix in OpenGL. They are labelled "HALFFOV", "RIGHT", "LEFT", "NEAR" and "NEAR x 2":
Now, to figure out the x value after projection, it supposedly does 2 x NEAR divided by RIGHT - LEFT. The fact is that 2 x NEAR divided by RIGHT - LEFT is the same as simply doing NEAR / RIGHT: in both cases you're doubling both the NEAR and the RIGHT, so the fraction is the same.
Also, in the 3rd column there are operations where there should be zeroes; for example, RIGHT + LEFT divided by RIGHT - LEFT always ends up being 0 / (RIGHT - LEFT), which is always zero.
When the GLM math library makes a perspective projection matrix for me those two that always end up zero are always zero.
Why is it that the matrix is written like this? Are there certain cases for which my assumptions are wrong?
Why is it that the matrix is written like this?
Because a symmetrical, view centered projection is just one of many possibilities. Sometimes you want to skew and/or shift the planes for certain effects or rendering techniques.
Are there certain cases for which my assumptions are wrong?
For example, plane-parallel shifting of the view frustum is required for tiled rendering (not to be confused with a tiled rasterizer), where the image to be rendered is split up into a grid of tiles, each one rendered individually and merged later. This is needed if the desired output image resolution exceeds the maximum viewport/renderbuffer size limits of the OpenGL implementation used.
Other cases are if you want to simulate tilt-shift photography.
And last but not least, a shifted projection matrix is required for stereoscopic rendering targeting a fixed-position screen display device that's viewed using 3D glasses.
(Rendering for headmounted displays requires a slightly different projection setup).
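As an illustration of the off-center case, a sketch (function and parameter names are illustrative) of how tiled rendering shifts the frustum planes per tile with glFrustum; this is exactly where the (r+l)/(r-l) and (t+b)/(t-b) terms become nonzero:

#include <GL/gl.h>

/* Carve tile (tx, ty) of a tilesX x tilesY grid out of a symmetric frustum
   whose near-plane rectangle is [-halfW, halfW] x [-halfH, halfH]. */
void set_tile_frustum(double n, double f, double halfW, double halfH,
                      int tx, int ty, int tilesX, int tilesY) {
    double l = -halfW + 2.0*halfW *  tx      / tilesX;
    double r = -halfW + 2.0*halfW * (tx + 1) / tilesX;
    double b = -halfH + 2.0*halfH *  ty      / tilesY;
    double t = -halfH + 2.0*halfH * (ty + 1) / tilesY;
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glFrustum(l, r, b, t, n, f);   /* asymmetric: l != -r, b != -t in general */
}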

Calculating the perspective projection matrix according to the view plane

I'm working with openGL but this is basically a math question.
I'm trying to calculate the projection matrix, I have a point on the view plane R(x,y,z) and the Normal vector of that plane N(n1,n2,n3).
I also know that the eye is at (0,0,0), which I guess in technical terms is the Perspective Reference Point.
How can I arrive at the perspective projection from this data? I know how to do it the regular way, where you get the FOV, aspect ratio, and near and far planes.
I think you created a bit of confusion by putting this question under the "opengl" tag. The problem is that in computer graphics, the term projection is not understood in a strictly mathematical sense.
In maths, a projection is defined (and the following is not the exact mathematical definition, but just my own paraphrasing) as something which doesn't further change the result when applied twice. Think about it. When you project a point in 3d space onto a 2d plane (which is still in that 3d space), each point's projection will end up on that plane. But points which are already on this plane don't move at all any more, so you can apply the projection as many times as you want without changing the outcome any further.
The classic "projection" matrices in computer graphics don't do this. They transfrom the space in a way that a general frustum is mapped to a cube (or cuboid). For that, you basically need all the parameters to describe the frustum, which typically is aspect ratio, field of view angle, and distances to near and far plane, as well as the projection direction and the center point (the latter two are typically implicitely defined by convention). For the general case, there are also the horizontal and vertical asymmetries components (think of it like "lens shift" with projectors). And all of that is what the typical projection matrix in computer graphics represents.
To construct such a matrix from the parameters you have given is not really possible, because you are lacking lots of parameters. Also - and I think this is kind of revealing - you have given a view plane. But the projection matrices discussed so far do not define a view plane - any plane parallel to the near or far plane and in front of the camera can be imagined as the viewing plane (behind the camera would also work, but the image would be mirrored), if you should need one. But in the strict sense, it would only be a "view plane" if all of the projected points also ended up on that plane - which the computer graphics perspective matrix explicitly doesn't do. It instead keeps their 3d distance information - which also means that the operation is invertible, while a classical mathematical projection typically isn't.
From all of that, I simply guess that what you are looking for is a perspective projection from 3D space onto a 2D plane, as opposed to the perspective transformation used for computer graphics. And all the parameters you need for that are just the view point and a plane. Note that this is exactly what you have given: the projection center shall be the origin, and R and N define the plane.
Such a projection can also be expressed in terms of a 4x4 homogeneous matrix. There is one thing that is not defined in your question: the orientation of the normal. I'm assuming standard maths convention again and assume that the view plane is defined as <N,x> + d = 0. Using R in that equation gives d = -N_x*R_x - N_y*R_y - N_z*R_z. So the projection matrix is just
(    1       0       0     0 )
(    0       1       0     0 )
(    0       0       1     0 )
( -N_x/d  -N_y/d  -N_z/d   0 )
There are a few properties of this matrix. It has a zero column, so it is not invertible. Also note that for every point (s*x, s*y, s*z, 1) you apply it to, the result (after division by the resulting w, of course) is the same no matter what s is - so every point on a line between the origin and (x, y, z) will result in the same projected point, which is what a perspective projection is supposed to do. And finally note that w = (N_x*x + N_y*y + N_z*z)/-d, so for every point fulfilling the above plane equation, w = -d/-d = 1 results; in combination with the identity transform for the other dimensions, this means that such a point is unchanged.
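A small standalone sketch (plain C, example plane values) of this matrix in action; since only the w row is nontrivial, it is applied directly:

#include <stdio.h>

int main(void) {
    const double N[3] = { 0.0, 0.0, 1.0 };   /* example: plane z = 2 ... */
    const double R[3] = { 0.0, 0.0, 2.0 };   /* ... through R with normal N */
    const double d = -(N[0]*R[0] + N[1]*R[1] + N[2]*R[2]);   /* d = -<N,R> */

    /* project p: w = <N,p> / -d (the matrix's last row), then divide by w */
    const double p[3] = { 3.0, 4.0, 4.0 };
    double w = (N[0]*p[0] + N[1]*p[1] + N[2]*p[2]) / -d;
    printf("projected: (%g, %g, %g)\n", p[0]/w, p[1]/w, p[2]/w);  /* (1.5, 2, 2): on the plane */
    return 0;
}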
Projection matrix must be at (0,0,0) and viewing in Z+ or Z- direction
This is a must because many things in OpenGL depend on it, like fog, lighting, ... So if your direction or position is different, then you need to move that difference into the camera matrix. Let's assume your focal point is (0,0,0) as you stated, and the normal vector is (0,0,+/-1).
Z near
is the distance between the focal point and the projection plane, so znear is the perpendicular distance between the plane and (0,0,0). If the assumption above holds then
znear = R.z
otherwise you need to compute it. I think you have everything you need for that (a sketch of this computation follows below):
cast a line from R with direction N
find the point on it closest to the focal point (0,0,0)
then znear is the distance from that point to R
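Equivalently, the recipe boils down to the perpendicular distance |<R,N>| / |N|; a minimal sketch (plain C, hypothetical helper name):

#include <math.h>

/* znear as the perpendicular distance from the focal point (0,0,0)
   to the view plane through R with normal N */
double compute_znear(const double R[3], const double N[3]) {
    double len = sqrt(N[0]*N[0] + N[1]*N[1] + N[2]*N[2]);
    return fabs(R[0]*N[0] + R[1]*N[1] + R[2]*N[2]) / len;
}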
Z far
is determined by the depth buffer bit width and znear
zfar = znear * (1 << (cDepthBits - 1))
This is the maximal usable zfar (for my purposes); if you need more precision then lower it a bit. Do not forget that precision is higher near znear and much, much worse near zfar. The zfar is usually set to the max view distance, with znear computed from it or set to the min focus range.
view angle
I mostly use a 60 degree view: zang = 60.0 [deg]
Common males in my region can see up to 90 degrees, but that includes peripheral vision; the 60 degree view is more comfortable to look at.
Females have a bit wider view ... but I have not heard any complaints from them about 60 degree views, so let's assume it's comfortable for them too ...
Aspect
aspect ratio is determined by your OpenGL window dimensions xs,ys
aspect=(xs/ys)
This is how I set the projection matrix:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
gluPerspective(zang/aspect,aspect,znear,zfar);
// gluPerspective has an inaccurate tangent, so correct the perspective matrix like this:
double perspective[16];
glGetDoublev(GL_PROJECTION_MATRIX,perspective);
perspective[ 0]= 1.0/tan(0.5*zang*deg);
perspective[ 5]=aspect/tan(0.5*zang*deg);
glLoadMatrixd(perspective);
where deg = M_PI/180.0 converts degrees to radians.
perspective[] is a copy of the projection matrix; I use it for mouse position conversions etc.
If you do not correct the matrix, then you will be off when using advanced techniques like overlapping multiple frustums to get a high precision depth range. I use this to obtain a <0.1m, 1000AU> frustum with a 24-bit depth buffer, and the inaccuracy would cause the images not to fit together perfectly ...
[Notes]
If the focal point is not really (0,0,0), or you are not viewing along the Z axis (i.e. you do not have a camera matrix but instead use the projection matrix for that), then on basic scenes/techniques you will see no problem. The problems start with the use of advanced graphics. If you use GLSL then you can handle this without problems, but the fixed-function OpenGL pipeline cannot handle it properly. This is also called PROJECTION_MATRIX abuse.
[edit1] a few links
If your view is a standard frustum then write the matrix yourself, following gluPerspective; otherwise see the Projections link for some ideas on how to construct it.
[edit2]
From your comment I see it like this:
f is your viewing point (the axes are the global world axes)
f' is the viewing point if R were the center of the screen
So create the projection matrix for the f' position (as explained above), and create a transform matrix that transforms f' to f. The transformed f must have the same Z axis as f'; the other axes can be obtained by cross products. Use that as the camera matrix, or multiply both together and use the product as an abused projection matrix.
How to construct the matrix is explained in the Understanding transform matrices link from my earlier comments

why the camera backward is equivalent to moving the whole world forward?

OpenGL SuperBible, 4th Edition, page 164:
To apply a camera transformation, we take the camera’s actor transform and flip it so that
moving the camera backward is equivalent to moving the whole world forward. Similarly,
turning to the left is equivalent to rotating the whole world to the right.
I can't understand why?
Imagine yourself placed within a universe that also contains all other things. In order for your viewpoint to appear to move in a forward direction, you have two options...
You move yourself forward.
You move everything else in the universe in the opposite direction to 1.
Because you're defining everything in OpenGL in terms of the viewer (you're ultimately rendering a 2D image of a particular viewpoint of the 3D world), it can often make more sense, both mathematically and programmatically, to take the 2nd approach.
Mathematically there is only one correct answer. It is defined that after transforming to eye-space, by multiplying a world-space position by the view matrix, the resulting vector is interpreted relative to the origin, which is where the camera is conceptually located.
What the SuperBible states is mathematically just a negation of the translation in some direction, which is what you will automatically get when using functions that compute a view matrix, like gluLookAt() or glm::lookAt() (although GLU is a library layered on legacy GL stuff, mathematically the two are identical).
Have a look at the API ref for gluLookAt(). You'll see that the first step is setting up an ortho-normal basis of the eye-space, which first results in a 4x4 matrix basically only encoding the upper 3x3 rotation matrix. The second step is multiplying the former matrix by a translation matrix. In terms of legacy functions, this can be expressed as
glMultMatrixf(M); // where M encodes the eye-space basis
glTranslated(-eyex, -eyey, -eyez);
You can see that the vector (eyex, eyey, eyez), which specifies where the camera is located in world-space, is simply multiplied by -1. Now assume we don't rotate the camera at all, but place it at the world-space position (5, 5, 5). The appropriate view matrix View would be
[1 0 0 -5
0 1 0 -5
0 0 1 -5
0 0 0 1]
Now take a world-space vertex position P = (0, 0, 0, 1) transformed by that matrix: P' = View * P. P' will then simply be P'=(-5, -5, -5, 1).
When thinking in world-space, the camera is at (5, 5, 5) and the vertex is at (0, 0, 0). When thinking in eye-space, the camera is at (0, 0, 0) and the vertex is at (-5, -5, -5).
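A tiny standalone sketch (plain C) of exactly this case, an unrotated camera at (5, 5, 5), where the view matrix reduces to a translation by the negated eye position:

#include <stdio.h>

int main(void) {
    const double eye[3] = { 5.0, 5.0, 5.0 };      /* camera in world-space */
    const double P[4] = { 0.0, 0.0, 0.0, 1.0 };   /* world-space vertex */
    double Pv[3];
    for (int i = 0; i < 3; ++i)
        Pv[i] = P[i] - eye[i] * P[3];   /* View * P with View = T(-eye) */
    printf("eye-space: (%g, %g, %g)\n", Pv[0], Pv[1], Pv[2]);   /* (-5, -5, -5) */
    return 0;
}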
So in conclusion: conceptually, it's a matter of how you're looking at things. You can either think of it as transforming the camera relative to the world, or as transforming the world relative to the camera.
Mathematically, and in terms of the OpenGL transformation pipeline, there is only one answer, and that is: the camera in eye-space (or view-space or camera-space) is always at the origin and world-space positions transformed to eye-space will always be relative to the coordinate system of the camera.
EDIT: Just to clarify: although the transformation pipeline and the involved vector spaces are well defined, you can still use the world-space positions of everything, even the camera, for instance in a fragment shader for lighting computations. The important thing is to never mix entities from different spaces, e.g. don't compute stuff based on a world-space and an eye-space position, and so on.
EDIT2: Nowadays, in a time when we all use shaders *cough and roll-eyes*, you're pretty flexible, and theoretically you can pass any position you like to gl_Position in a vertex shader (or the geometry shader or tessellation stages). However, since the subsequent computations, i.e. clipping, perspective division and the viewport transformation, are fixed, the resulting position will simply be clipped if it's not inside [-gl_Position.w, gl_Position.w] in x, y and z.
There is a lot to this to really get it down. I suggest you read the entire article on the rendering pipeline in the official GL wiki.

How perspective matrix works?

I started to read lesson 1 in learningwebgl blog, and I noticed this part:
var pMatrix = mat4.create();
mat4.perspective(45, gl.viewportWidth / gl.viewportHeight, 0.1, 100.0, pMatrix);
I roughly understand how matrices (translation/rotation/multiplication) work, but I have no idea what mat4.perspective(...) means. What is it used for? What is the result if I multiply a vector with this matrix?
The perspective matrix is used to scale, and possibly translate or flip, the coordinate system in preparation for the perspective divide. Since the perspective projection operation involves a divide, it cannot be represented by a linear matrix transformation alone.
In a programmable graphics pipeline (see pixel shaders) you cannot see the divide operation; it is still one of the fixed-function parts. The programmer controls it by tweaking the variables involved in the operation. In the case of the perspective divide, it is the projection matrix that gives you this control.
The projection matrix is used to convert world coordinates to screen coordinates.
The positions in your three-dimensional virtual world are triplets of x, y and z coordinates. When you want to draw something (or rather, tell OpenGL to draw something), it needs to calculate where these coordinates are on the user's screen.
This calculation is implemented with matrix multiplication.
A vector consisting of x, y and z (and a fourth value of 1, which is necessary to allow the matrix to perform operations like translation) is multiplied with a matrix to receive a new set of x, y and z coordinates (the 4th value is discarded) which represent where this point is on the user's screen (the z-coordinate is required to determine which objects are in front of others).
The function mat4.perspective generates a projection matrix which does exactly that. The arguments are:
The field of view in degrees (45)
the aspect ratio of the field of view (the aspect ratio of the viewport)
the minimal distance from the viewer which is still drawn (0.1 world units)
the maximum distance from the viewer which is still drawn (100.0 world units)
the array in which the generated matrix is stored (pMatrix)
When a point is multiplied with this matrix, the result are the screen coordinates where this point has to be drawn.
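To tie it together, a standalone sketch (plain C, illustrative values) that mirrors what mat4.perspective builds from fovy, aspect, near and far, taking one point through the matrix multiply, the perspective divide and a viewport transform to an assumed 800x600 window:

#include <math.h>
#include <stdio.h>

int main(void) {
    const double PI = 3.14159265358979;
    const double fovy = 45.0 * PI / 180.0, aspect = 4.0 / 3.0;
    const double n = 0.1, f = 100.0, c = 1.0 / tan(0.5 * fovy);

    const double p[4] = { 1.0, 1.0, -10.0, 1.0 };   /* a point in front of the viewer */

    /* multiply by the perspective projection matrix */
    double clip[4];
    clip[0] = (c / aspect) * p[0];
    clip[1] = c * p[1];
    clip[2] = (-(f + n)/(f - n)) * p[2] + (-2.0*f*n/(f - n)) * p[3];
    clip[3] = -p[2];

    /* perspective divide, then map NDC [-1,1] to window pixels */
    double ndcx = clip[0]/clip[3], ndcy = clip[1]/clip[3];
    printf("screen: (%g, %g)\n", (ndcx*0.5 + 0.5)*800.0, (ndcy*0.5 + 0.5)*600.0);
    return 0;
}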