Matrix multiplication for autorotation and different screen sizes - OpenGL

Whether you use fixed or programmable shader pipeline, a common vertex pipeline consists of this matrix multiplication (either custom coded or behind the scenes):
Projection * Modelview * Position
Lots of tutorials note that items such as an object's rotation should go into the Modelview matrix.
I created a standard rotation matrix function that takes degrees, and then I add the proper multiple of 90 to the degrees parameter to account for the screen's autorotation orientation. This works.
For different screen sizes (different pixel widths and heights), I could also factor a scale multiplier in there, so that a single Modelview matrix might incorporate a lot of these concerns.
But what I've settled on is much more verbose matrix math, and since I'm new to this stuff, I'd appreciate feedback on whether this is smart.
I simply add independent matrices for the screen-size scaling and the screen orientation, in addition to object manipulations such as scale and rotation. I end up with this:
Projection * ScreenRotation * ScreenScale * Translate * Rotate * Scale * Position
Some of these can be applied in either order; for example, I find that Rotate and Scale can be switched.
This gives me more fine-grained control and separation of code, so I can concentrate on just an object's rotation without also having to think about the screen's orientation, for example.
Is this a common or acceptable strategy for organizing matrix math? It seems to work fine, but are there any pitfalls to such verbosity?

The main issue with such verbosity is that it wastes precious computation cycles if the multiplication is performed on the GPU. Each matrix would be supplied as a separate uniform, forcing the GPU to redo the whole matrix product for each and every vertex, even though that product is constant across the entire draw call. The nice thing about matrices is that a single matrix can hold a whole chain of transformations, and the transformation can then be applied with a single matrix-vector multiplication.
The typical stanza
Projection · Modelview · Position
of using two matrices comes from the fact that one usually needs the intermediate result Modelview · Position for some calculations (lighting, for example). In theory you could contract the whole thing down to
ProjectionViewModel · Position
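For illustration, here is a minimal sketch (not part of the original answer) of contracting the chain on the CPU, given already-computed projection and modelview matrices, assuming linmath.h's mat4x4_mul and a hypothetical uniform location u_pvm:
mat4x4 pvm;
mat4x4_mul(pvm, projection, modelview);               /* Projection · Modelview, computed once per draw */
glUniformMatrix4fv(u_pvm, 1, GL_FALSE, &pvm[0][0]);   /* the shader then does a single pvm · position   */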
Now you're proposing this matrix expression
Projection * ScreenRotation * ScreenScale * Translate * Rotate * Scale * Position
Ugh… this whole thing is the pinnacle of inflexibility. You want flexibility? This thing is rigid: what if you want to apply some nonuniform scaling to already rotated geometry? The order of operations in matrix math matters, and you cannot freely mix them. Assume you're drawing a sphere:
Rotate(45, 0, 0, 1) · Scale(1,2,1) · SphereVertex
looks totally different than
Scale(1,2,1) · Rotate(45, 0, 0, 1) · SphereVertex
Screen scale and rotation can, and should, be applied directly in the Projection matrix; there is no need for extra matrices. The key insight is that you can compose every linear transformation chain into a single matrix. For practical reasons you want to apply the screen pixel-aspect scaling as the last step in the chain, and the screen rotation as the second-to-last step.
So you build your projection matrix not in the shader, but in your display routine's frame setup code. Assuming you're using my linmath.h, it would look like the following:
mat4x4 projection;
mat4x4_set_identity(projection);
mat4x4_mul_scale_aniso(projection, …);  /* screen pixel-aspect scaling, last step in the chain     */
mat4x4_mul_rotate_Z(projection, …);     /* screen orientation (autorotation), second-to-last step  */
if(using_perspective)
    mat4x4_mul_frustum(projection, …);  /* perspective projection   */
else
    mat4x4_mul_ortho(projection, …);    /* orthographic projection  */
You'd then set the resulting projection matrix as the projection matrix uniform.

Related

Asymmetric Projection Matrix

I have a little "2 1/2-D" Engine I'm working on that targets multiple platforms, but currently I am upgrading my DirectX 11 version.
One of the things that is really important for my engine to do is to be able to adjust the horizon point so that the perspective below the horizon moves into the distance at a different angle than the perspective above the horizon.
In a normal 3D environment this would typically be accomplished by tilting the camera up above the horizon; however, in my engine, which makes heavy use of 2D sprites, tilting the camera in the traditional sense would also tilt the sprites... something I don't want to do (it ruins the 16-bit arcade style of the effect).
I had this working at one point by manually doing the perspective divide in the CPU using a center-point that was off-center, but I'd like to do this with a special projection matrix if possible. Right now I'm using a stock matrix that uses FOV, Near-Plane, and Far-Plane arguments.
Any ideas? Is this even possible with a matrix? Isn't the perspective divide automatic in the DX11 pipeline? How do I control when or how the perspective divide is performed? Am I correct in assuming that the perspective divide cannot be accomplished with a matrix alone, and that it requires each vertex to be manually divided by Z?
What you are looking for is an off-center perspective projection matrix: instead of a fov and aspect ratio, you provide left/right/top/bottom as tan(angle). The result is more or less the same as with a symmetric projection matrix, with the addition of two extra non-zero values.
You are also right that the GPU is hard-wired to perform the w divide, and it is not a good idea to do it in the vertex shader: it will mess with perspective correction for the texture coordinates and with clipping (though neither is a big deal in the sprite special case).
You can find an example of such a matrix here : https://msdn.microsoft.com/en-us/library/windows/desktop/bb205353(v=vs.85).aspx
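For reference, a sketch (not from the original answer) of what such a matrix can look like, written in C with the OpenGL glFrustum convention (column-major, right-handed); the DirectX variant on the linked page differs mainly in handedness and layout. The function name is illustrative:
#include <string.h>

/* Off-center perspective projection, glFrustum-style, column-major.
 * The (r+l)/(r-l) and (t+b)/(t-b) terms are the two extra non-zero
 * values that appear when the frustum window is not centered. */
static void frustum_off_center(float m[16],
                               float l, float r, float b, float t,
                               float n, float f)
{
    memset(m, 0, 16 * sizeof(float));
    m[0]  =  2.0f * n / (r - l);
    m[5]  =  2.0f * n / (t - b);
    m[8]  =  (r + l) / (r - l);   /* horizontal off-center term                  */
    m[9]  =  (t + b) / (t - b);   /* vertical off-center term: moves the horizon */
    m[10] = -(f + n) / (f - n);
    m[11] = -1.0f;
    m[14] = -2.0f * f * n / (f - n);
}
With t0 = near * tan(fovy / 2), passing bottom = -t0 - s and top = t0 - s shifts the vanishing point of the view direction (the horizon for a level camera) up the screen by s/t0 in NDC, without tilting any geometry.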

Why do modelview and camera matrices use RUB orientation?

I usually find matrix libraries building both modelview and camera matrices from the RUB (right-up-back) vectors, as depicted on these pages:
http://3dengine.org/Right-up-back_from_modelview
http://3dengine.org/Modelview_matrix
Is the RUB tuple just a common standard?
Otherwise, is there a reason the RUB vectors are preferred over any other orientation (such as forward-up-right)?
Particularly if you're using the programmable pipeline, you have almost complete freedom about the coordinate system you work in, and how you transform your geometry. But once all your transformations are applied in the vertex shader (resulting in the vector assigned to gl_Position), there is still a fixed function block in the pipeline between the vertex shader and fragment shader. That fixed function block relies on the transformed vertices being in a well defined coordinate system.
gl_Position is in a coordinate system called "clip coordinates", which then turns into "normalized device coordinates" (NDC) after dividing by the w coordinate of the vector.
Based on the vector in NDC, the fixed function rasterization block generates pixels. It will use the first coordinate to map to the horizontal window direction, and the second coordinate to map to the vertical window direction. The third coordinate will be used to calculate the depth, which can be used for depth testing.
This means that after all transformations are applied, the first coordinate has to be left-right, the second coordinate has to be bottom-up, and the third coordinate has to be front-back (well, it could be back-front if you change the depth test).
If you use a classic setup with modelview and projection matrix, it makes sense to use the modelview matrix to transform the original geometry into this orientation, and then use the projection matrix to apply e.g. a perspective.
I don't think there's anything stopping you from using a different orientation as the result of the modelview transformation, and then include a rotation in the projection matrix to transform the whole thing into the correct clip coordinate space. But I don't see a benefit, and it looks like it would just add unnecessary confusion.
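As a purely illustrative sketch of that last option: a constant change-of-basis matrix, folded into the projection once per frame, could remap a hypothetical forward-up-right eye frame into the right-up-back frame that clip space expects (C, column-major):
/* Each group of four floats is one column, i.e. the image of one source
 * basis vector: x=forward maps to -z, y=up stays +y, z=right maps to +x. */
static const float fur_to_rub[16] = {
    0.0f, 0.0f, -1.0f, 0.0f,   /* forward -> -z (back is +z) */
    0.0f, 1.0f,  0.0f, 0.0f,   /* up      -> +y              */
    1.0f, 0.0f,  0.0f, 0.0f,   /* right   -> +x              */
    0.0f, 0.0f,  0.0f, 1.0f
};
/* effective_projection = projection * fur_to_rub */
As the answer says, though, this only adds a layer of potential confusion for no real benefit.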

Confused about OpenGL transformations

In OpenGL there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving the objects or moving the camera.
I am guessing that glTranslate and glRotate change objects, and gluLookAt changes the camera?
In OpenGL there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives (points, lines or triangles), per-vertex attributes, normalized device coordinates (NDC) and a viewport to which the NDC are mapped.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of those attributes, usually a vector with 1 to 4 scalar elements in a local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way, but the usual approach is to apply a chain of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogeneous transformation matrices. For a 3-dimensional vector, the homogeneous representation is a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become more or less the standard way of doing things. First the object-local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformation. With the vertex positions in eye space, several calculations, such as illumination, can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye-space coordinates are transformed into so-called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL it used to be called the projection transformation.
After clip space, the positions get "normalized" by their homogeneous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: matrix · column-vector product (each result component is the inner product of a matrix row with the column vector)
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]):
vpos_viewport.xy = (vpos_ndc.xy + (1,1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: component-wise vector multiplication
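A small sketch of those last two steps in code (the struct types and names here are illustrative, not an OpenGL API; the default depth range [0, 1] is assumed):
typedef struct { float x, y, z, w; } vec4f;
typedef struct { float x, y, width, height; } viewport_t;

/* clip space -> NDC (divide by w) -> window coordinates */
static vec4f clip_to_window(vec4f clip, viewport_t vp)
{
    vec4f ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w, 1.0f };
    vec4f win;
    win.x = (ndc.x + 1.0f) * 0.5f * vp.width  + vp.x;
    win.y = (ndc.y + 1.0f) * 0.5f * vp.height + vp.y;
    win.z = (ndc.z + 1.0f) * 0.5f;   /* depth, mapped into [0, 1] */
    win.w = 1.0f;
    return win;
}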
The OpenGL functions glRotate, glTranslate, glScale, glMatrixMode merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
Which of them the matrix manipulation functions act on can be selected using glMatrixMode. Each of the matrix-manipulating functions composes a new matrix by multiplying the transformation it describes onto the currently selected matrix, thereby replacing it. The function glLoadIdentity replaces the current matrix with the identity, glLoadMatrix replaces it with a user-defined matrix, and glMultMatrix multiplies a user-defined matrix onto it.
So how does the modelview matrix then emulate both object placement and a camera? Well, as you already stated:
As you know, the same movement can be achieved by either moving objects or camera.
You cannot really distinguish between them. The usual approach is to split the object-local-to-eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view, described in fixed-function OpenGL by the modelview matrix. Now since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M in the compound matrix MV is determined only by the order of operations in which you multiply onto the modelview matrix, and by the step at which you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations, and everything after that is model.
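A minimal fixed-function sketch of that split (the argument values are arbitrary and draw_object() is a hypothetical helper):
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0.0, 2.0, 5.0,    /* eye position                                 */
          0.0, 0.0, 0.0,    /* look-at point                                */
          0.0, 1.0, 0.0);   /* up vector; everything up to here is "view"   */
glPushMatrix();
    glTranslatef(1.0f, 0.0f, 0.0f);       /* from here on it's "model":        */
    glRotatef(45.0f, 0.0f, 0.0f, 1.0f);   /* per-object placement in the world */
    draw_object();
glPopMatrix();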
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack never was feature-complete and no serious application actually used it. Most programs just glLoadMatrix-ed their self-calculated matrices and didn't bother with OpenGL's built-in matrix manipulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack became awkward to use, to put it nicely.
The verdict: if you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed-function pipeline did.
OpenGL is a low-level API; there are no higher-level concepts like an "object" or a "camera" in a "scene". So there are only two matrix modes that matter here: MODELVIEW (the product of the "camera" matrix and the "object" transformation) and PROJECTION (the projective transformation from eye space to clip space).
The distinction between "Model" and "View" (object and camera) matrices is up to you. The glRotate/glTranslate functions just multiply the currently selected matrix by the one they describe (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode(), so the effect depends on which matrix you're working on. OpenGL has 4 different types of matrices: GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR; any of those functions can change any of those matrices, depending on the current mode. So, basically, you don't transform objects; you just manipulate different matrices to "fake" that effect.
Note that gluLookAt() is just a convenience function equivalent to a translation followed by some rotations; there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation that transforms the objects as if the camera were where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true: glTranslate and glRotate change the object coordinates before rendering, and gluLookAt changes the camera coordinates.

Does the OpenGL fixed function pipeline compute lighting in view-space?

Does the OpenGL fixed function pipeline compute lighting in view-space?
If the answer is yes, then how does it cope with view transformations with non-uniform scale? Actually, how does it cope with view transformations incorporating any scale at all?
If this is true then scaling the view space will result in different light-to-vertex distances, meaning the lighting intensity for point-lights will change as the view matrix is scaled.
Lighting in world-space would make the computed point-light intensity independent of view space scaling, but would require:
That an object-to-world matrix is supplied to the API (as in DirectX, where the light positions are specified in world space).
That the API transform all geometry twice when drawing: once by world*view*proj into clip space, and again by world alone, in order to compute lighting at the vertices in world space.
Points awarded for a good answer with any additional background info you can dig up on fixed-function lighting pipelines in general.
I think your question comes from a confusion of "view space" with "post-projection space". They are not the same.
View space, or camera space, is the space of the scene relative to the camera. Thus, the camera is sitting at the origin, looking down the -Z axis, with +Y being up. In terms of OpenGL fixed-function, camera space is the space after multiplying positions and normals by the GL_MODELVIEW matrix.
Post-projection space is what you get after multiplying camera space values by the GL_PROJECTION matrix. This is in fact why there are two separate matrices. You do lighting in camera space, and you send the post-projection positions off for rasterization.
OpenGL does not do lighting in post-projection space. So the aspect ratio, camera zoom, and so forth does not affect lighting. Nor does the perspective divide.
Does the OpenGL fixed function pipeline compute lighting in view-space?
Yes, and so should you.
If the answer is yes, then how does it cope with view transformations with non-uniform scale? Actually, how does it cope with view transformations incorporating any scale at all?
The exact same way that it copes with the model-to-world transform incorporating scale.
It's just a matrix. The math neither knows nor cares where a particular scale transform happens to be, whether it is in the model-to-world part or the world-to-camera part. All that matters is that a scale is present. Or a skew or any other form of transform.
And remember: it is far more likely that the model-to-world transform uses scales than the world-to-camera transform does. You are more likely to need to rescale geometry to fit into the world than you are to need to rescale geometry for the camera matrix. The scaling for camera zooms, aspect ratio, and the like is a part of the perspective matrix, not the camera matrix.
It "copes" with this in the usual way: normals are transformed by the inverse-transpose of the model-to-view matrix. This alters the normals (full disclosure: that's my eBook tutorial) so that they still fit the model after the scaling. This is necessary regardless of what space you're in.
If this is true then scaling the view space will result in different light-to-vertex distances, meaning the lighting intensity for point-lights will change as the view matrix is scaled.
... and? Since all of the objects are transformed by the same camera matrix (within a single scene), all of the objects will have the same scale applied. Therefore, if they were all in the same scale in world-space, they will all be in the same scale in camera-space.
So what's the problem? Yes, the attenuation changes, but it changes equally for all objects. Thus, there isn't a problem, so long as your attenuation factors are designed for this camera space.

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest to think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imagine that the 3D scene is filmed by a camera and that what is rendered on the screen are the images captured by this camera, then the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There is rarely a reason to change the view matrix while rendering a single frame, but such reasons do in fact exist (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. whether a perspective is applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed, or if you are rendering full screen and the resolution has changed, but only if the new window size/screen resolution has a different aspect ratio than before. There are some fancy effects for which you may want to change this matrix, but in most cases it's pretty much constant for the whole life of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This all makes a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
Now if you look at classical OpenGL (OpenGL 1.x/2.x), OpenGL offers a projection matrix, yet it does not offer a model or a view matrix, only a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix", since it will get "destroyed" by the model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), no matrices are available any longer at all; you have to define your own matrices. Still, most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not use either three matrices, which means you don't have to permanently save and restore the model-view matrix, or a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for every single vertex rendered (after all, such a multiplication doesn't come for free either)?
So to summarize my question: what advantage does a combined model-view matrix together with a separate projection matrix have over three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply positions/normals/etc. with, and therefore the faster your vertex shaders run.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
In most cases your shader will need the geometry in world or eye coordinates for shading, so you have to separate the projection matrix from the model and view matrices.
Making your shader multiply the geometry by two matrices hurts performance. Given that each model has thousands (or more) of vertices, it is more efficient to compute a model-view matrix on the CPU once and let the shader do one less matrix-vector multiplication per vertex.
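A sketch of that split, assuming linmath.h-style helpers and hypothetical uniform locations u_modelview and u_projection:
mat4x4 modelview;
mat4x4_mul(modelview, view, model);                                /* combined once per object, on the CPU */
glUniformMatrix4fv(u_modelview,  1, GL_FALSE, &modelview[0][0]);   /* used for eye-space lighting          */
glUniformMatrix4fv(u_projection, 1, GL_FALSE, &projection[0][0]);  /* kept separate, applied last          */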
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase in GPU load. The two following screenshots show the two results; pay attention to the green and white layers fighting.