What do we use matrices for in OpenGL?

I'm new to OpenGL and was wondering what we use matrices for. Could someone explain it to me in an abstract, intuitive way? Every reference and tutorial I read treats matrices as a known mechanism. I've learned about matrices in maths, but since English is not my native language, it's hard for me to figure out what some of the terminology means.
I found a good example at www.learnopengl.com which says:
The model matrix. This matrix is used to place a model somewhere in the “world”. For example, if you have a model of a car and you want it located 1000 meters to the east, you will use the model matrix to do this.
The view matrix. This matrix represents the camera. If we want to view our car which is 1000 meters to the east, we’ll have to move ourselves 1000 meters to the east as well (another way of thinking about it is that we remain stationary, and the rest of the world moves 1000 meters to the west). We use the view matrix to do this.
The projection matrix. Since our screens are flat, we need to do a final transformation to “project” our view onto our screen and get that nice 3D perspective. This is what the projection matrix is used for.
This explains it pretty well. But how do we build these matrices? How large are they?
Also, I've read in this question:
What does glLoadIdentity() do in OpenGL?
that:
glMatrixMode(GL_PROJECTION) deals with the matrices used by perspective transformation or orthogonal transformation.
glMatrixMode(GL_MODELVIEW) deals with matrices used by model-view transformation. That is, to transform your object (aka model) to the view coordinate space (or camera space).
What do those transformations mean, and how are they calculated?
I know these are many questions at once, but I'm trying to build a better conception of all of this so I get a better view of OpenGL. That's why I need an abstract explanation first, so I can then dive into the details with an understanding of the concepts behind them.

Translation, rotation, and scaling are all affine transforms, which can be implemented using matrix multiplication and addition. Actually, by augmenting the vector with a w element that's always one, a 3-dimensional affine transform becomes a linear transformation in 4 dimensions, and all you need is a single matrix multiply.
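To make the w trick concrete: a translation by (t_x, t_y, t_z), which is not a linear map in 3 dimensions, becomes a single 4×4 multiply in homogeneous coordinates:

\[
\begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
=
\begin{bmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{bmatrix}
\]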
Doing it with a matrix multiply is very nice because (1) it's fast and (2) you don't need special logic for any of the operations -- you can even compose as many of these affine operations as you want and still represent it with a single matrix.
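As a sketch of that composition using GLM (the values here are made up for illustration):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // Compose three affine operations into one matrix. The product reads
    // right to left: scale first, then rotate, then translate.
    glm::vec3 position(1.0f, 2.0f, 3.0f);  // an example point
    glm::mat4 M = glm::translate(glm::mat4(1.0f), glm::vec3(10.0f, 0.0f, 0.0f))
                * glm::rotate(glm::mat4(1.0f), glm::radians(45.0f), glm::vec3(0.0f, 1.0f, 0.0f))
                * glm::scale(glm::mat4(1.0f), glm::vec3(2.0f));
    glm::vec4 transformed = M * glm::vec4(position, 1.0f);  // w = 1 marks a point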
Having multiple matrix modes is useful when composing. If you had only one matrix, you could add new operations at either end but not in the middle. By having 3 matrices multiplied together later, you can insert new operations at four different points in the order.
The matrix stack is also very useful because it allows you to do an operation for a few primitives and then remove it. If you tried to undo the operation by doing the inverse operation, eventually rounding errors would get out of control. But by remembering the previous matrix, you can just get it back as if the rotation or whatever never happened.
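In the classic fixed-function API that looks roughly like this (drawMyPrimitives is a hypothetical drawing function):

    // Inside your render function:
    glMatrixMode(GL_MODELVIEW);
    glPushMatrix();                       // remember the current matrix
    glRotatef(45.0f, 0.0f, 0.0f, 1.0f);   // affects only what follows
    drawMyPrimitives();                   // hypothetical draw call
    glPopMatrix();                        // exact previous matrix restored, no rounding drift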

OpenGL is nice in that rather than working with matrices directly, you can call functions that will manipulate them.
So under the hood (what really happens) is that there are several matrices that transform your objects: a model-view matrix that transforms objects to camera space, and a projection matrix for perspective/orthogonal transformations.
glMatrixMode is like a switch that allows you to choose which matrix to manipulate; you specify which one with the argument. So glMatrixMode(GL_PROJECTION) means that you will be manipulating the projection matrix.
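For example, a typical fixed-function setup might look like this (the numeric values are just placeholders):

    glMatrixMode(GL_PROJECTION);                  // now editing the projection matrix
    glLoadIdentity();
    gluPerspective(60.0, 4.0 / 3.0, 0.1, 100.0);  // fov, aspect, near, far (GLU helper)

    glMatrixMode(GL_MODELVIEW);                   // now editing the model-view matrix
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -5.0f);              // move the world 5 units in front of the camera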

Asymmetric Projection Matrix

I have a little "2 1/2-D" engine I'm working on that targets multiple platforms, but currently I am upgrading my DirectX 11 version.
One of the things that is really important for my engine to do is to be able to adjust the horizon point so that the perspective below the horizon moves into the distance at a different angle than the perspective above the horizon.
In a normal 3D environment this would typically be accomplished by tilting the camera up above the horizon. However, my engine makes heavy use of 2D sprites, and tilting the camera in the traditional sense would also tilt the sprites... something I don't want to do (it ruins the 16-bit-arcade style of the effect).
I had this working at one point by manually doing the perspective divide on the CPU using a center point that was off-center, but I'd like to do this with a special projection matrix if possible. Right now I'm using a stock matrix built from FOV, near-plane, and far-plane arguments.
Any ideas? Is this even possible with a matrix? Isn't the perspective divide automatic in the DX11 pipeline? How do I control when or how the perspective divide is performed? Am I correct in assuming that the perspective divide cannot be accomplished with a matrix alone, and that it requires each vertex to be manually divided by z?
What you are looking for is an off-center perspective projection matrix: instead of a FOV and aspect ratio, you provide the left/right/top/bottom planes as tan(angle). The result is more or less the same as with a symmetric projection matrix, with the addition of two extra non-zero values.
You are also right that the GPU is hard-wired to perform the w divide, and it is not a good idea to do it in the vertex shader: it will mess with perspective correction for the texture coordinates and with clipping (though neither is a big deal in the sprite special case).
You can find an example of such a matrix here: https://msdn.microsoft.com/en-us/library/windows/desktop/bb205353(v=vs.85).aspx
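On the OpenGL side, an equivalent off-center matrix can be built with glm::frustum; here is a minimal sketch under the tan(angle) convention described above (the helper name and its parameters are mine, not from a library):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    // Half-angle tangents above/below and left/right of the center line.
    // Making tanUp != tanDown shifts the horizon without tilting anything.
    glm::mat4 offCenterPerspective(float tanLeft, float tanRight,
                                   float tanDown, float tanUp,
                                   float zNear, float zFar)
    {
        // glm::frustum takes the view-volume extents on the near plane,
        // so scale each tangent by the near distance.
        return glm::frustum(-tanLeft * zNear, tanRight * zNear,
                            -tanDown * zNear, tanUp * zNear,
                            zNear, zFar);
    }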

How to scale a matrix obtained with glm::lookAt()?

I am trying to create an orthographic view (separate from the final perspective view). I know the view's scale, position, its up vector, and the point it's looking at. So it seems easiest to simply use glm::lookAt() to create the view matrix, except there's no way to specify scale.
Is there a way to introduce scaling with glm::lookAt()? I don't think it's possible to scale the matrix made by lookAt, since scaling needs to happen first to get the 'expected' result (objects appearing larger or smaller, like zooming a camera in or out). I may be wrong there though.
If it's not possible while using lookAt, is there a series of glm functions that would be equivalent?
In terms of matrix transformations, you can start with a scale and apply the glm::lookAt() matrix afterwards. I'm not familiar with glm, but I have used GLUT before. You should be able to do a matrix multiplication like this (I'm assuming that you've already transformed everything into scene space, defined all the variables, etc.):
    TransformedVector = glm::lookAt(cameraPosition,
                                    cameraTarget,
                                    upVector) *
                        glm::scale(glm::mat4(1.0f),            // scale relative to identity
                                   glm::vec3(scalingFactor)) * // uniform scale on all axes
                        OriginalVector;
If this behaves peculiarly, try swapping the order of glm::lookAt() and glm::scale().
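For completeness, here is a self-contained version of the snippet above (all values are made up for illustration):

    #include <glm/glm.hpp>
    #include <glm/gtc/matrix_transform.hpp>

    glm::vec3 cameraPosition(0.0f, 5.0f, 10.0f);
    glm::vec3 cameraTarget(0.0f, 0.0f, 0.0f);
    glm::vec3 upVector(0.0f, 1.0f, 0.0f);
    float scalingFactor = 2.0f;

    glm::mat4 view  = glm::lookAt(cameraPosition, cameraTarget, upVector);
    glm::mat4 scale = glm::scale(glm::mat4(1.0f), glm::vec3(scalingFactor));

    glm::vec4 OriginalVector(1.0f, 0.0f, 0.0f, 1.0f);  // w = 1 for a point
    glm::vec4 TransformedVector = view * scale * OriginalVector;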

Why are model and view matrices specifically often combined together?

I'm having trouble understanding why the model and view matrices are traditionally combined together. I know that the fewer matrix multiplications you do in the vertex shader the better, but it makes much more sense to me to combine the projection and view matrices.
This is because they are both intrinsically camera properties. It makes sense to me to first transform vertices into world space with the model matrix, perform lighting etc., and then use your combined camera matrix to transform to normalised clip space.
I know that I can do it that way if I want in a programmable pipeline, but I want to know why historically people combined the model and view matrices.
In graphics programming, the camera doesn't exist. It's always fixed at (0,0,0) and looking towards (0,0,-1). The camera as everyone knows it is totally artificial and mimics the way we are used to observing objects as humans, and by that I mean: moving around, pivoting our head and such. To mimic that, CG introduces the concept of a camera. It is interesting and well known that it is the same thing whether you move the camera to the right or move all the other objects in the scene to the left. That invariance is then transferred onto the model matrix by combining all the transformations on an object into one matrix: the model-view matrix.
The view and projection matrices are separated because those matrices do very different transformations. One is very similar to the model matrix and represents 3D, in-space transformations, while the other is used for calculating the angles from which the objects are viewed.
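That "move the camera right = move the world left" invariance also means the view matrix is just the inverse of the camera's own placement transform. A GLM sketch (names and values are illustrative):

    // Place a hypothetical camera in the world like any other object...
    glm::mat4 cameraToWorld = glm::translate(glm::mat4(1.0f),
                                             glm::vec3(0.0f, 0.0f, 5.0f));
    // ...then the view matrix moves the whole world the opposite way instead.
    glm::mat4 view = glm::inverse(cameraToWorld);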

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest to think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imagine that the 3D scene is filmed by a camera and what is rendered on the screen are the images that were captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There are few reasons for changing the view matrix while rendering a single frame, but those do in fact exist (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. whether a perspective is applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed, or if you are rendering full screen and the resolution has changed, but only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for which you may want to change this matrix, but in most cases it's pretty much constant for the whole life of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This all makes a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
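In GLM terms that combination would look something like this (the values are illustrative, not from the question):

    glm::mat4 model      = glm::translate(glm::mat4(1.0f), glm::vec3(1000.0f, 0.0f, 0.0f));
    glm::mat4 view       = glm::lookAt(glm::vec3(0.0f, 2.0f, 8.0f),
                                       glm::vec3(0.0f), glm::vec3(0.0f, 1.0f, 0.0f));
    glm::mat4 projection = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, 0.1f, 1000.0f);
    // C = B * A: matrices compose right to left.
    glm::mat4 mvp = projection * view * model;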
Now if you look at classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet OpenGL does not offer a model or a view matrix; it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix", since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), there are no built-in matrices available any longer; you have to define your own matrices. Still, most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not use either three matrices, which means you don't have to permanently save and restore the model-view matrix, or a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for every single vertex rendered (after all, such a multiplication doesn't come for free either)?
So to summarize my question: What advantage does a combined model-view matrix together with a separate projection matrix have over three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply with positions/normals/etc., and therefore the faster your vertex shaders run.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
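A sketch of that intermediate stop on the CPU side with GLM (model, view, position and normal are assumed to exist already; names are illustrative):

    // Compose model and view once per object; lighting then happens in eye space.
    glm::mat4 modelView = view * model;
    // Normals need the inverse-transpose so non-uniform scales don't skew them.
    glm::mat3 normalMatrix = glm::transpose(glm::inverse(glm::mat3(modelView)));

    glm::vec4 eyePosition = modelView * glm::vec4(position, 1.0f);
    glm::vec3 eyeNormal   = glm::normalize(normalMatrix * normal);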
In most cases your shader will need the geometry in world or eye coordinates for shading, so you have to separate the projection matrix from the model and view matrices.
Making your shader multiply the geometry by two matrices hurts performance. Assuming each model has thousands (or more) of vertices, it is more efficient to compute a model-view matrix on the CPU once and let the shader do one less matrix-vector multiplication.
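A minimal sketch of that, assuming GLM on the CPU side and uniform locations (modelViewLoc, projectionLoc) fetched earlier with glGetUniformLocation:

    #include <glm/gtc/type_ptr.hpp>  // for glm::value_ptr

    // Computed once per model on the CPU instead of once per vertex on the GPU:
    glm::mat4 modelView = view * model;
    glUniformMatrix4fv(modelViewLoc,  1, GL_FALSE, glm::value_ptr(modelView));
    glUniformMatrix4fv(projectionLoc, 1, GL_FALSE, glm::value_ptr(projection));
    // The vertex shader then only computes projection * (modelView * position).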
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase in GPU load. The two following screenshots show the two results; pay attention to the green and white layers fighting.

The purpose of Model View Projection Matrix

For what purposes are we using Model View Projection Matrix?
Why do shaders require Model View Projection Matrix?
The model, view and projection matrices are three separate matrices. Model maps from an object's local coordinate space into world space, view from world space to camera space, and projection from camera to screen.
If you compose all three, you can use the one result to map all the way from object space to screen space, letting you work out what you need to pass on to the next stage of a programmable pipeline from the incoming vertex positions.
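As a sketch of that full chain with GLM (assuming model, view, projection and localPosition are already built; names are illustrative):

    // Object space -> world -> eye -> clip space, in one composed multiply:
    glm::vec4 clip = projection * view * model * glm::vec4(localPosition, 1.0f);
    // The GPU then divides by w (the perspective divide) to reach
    // normalized device coordinates in [-1, 1]:
    glm::vec3 ndc = glm::vec3(clip) / clip.w;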
In the fixed functionality pipelines of old, you'd apply model and view together, then work out lighting using another result derived from them (with some fixes so that e.g. normals are still unit length even if you've applied some scaling to the object), then apply projection. You can see that reflected in OpenGL, which never separates the model and view matrices — keeping them as a single modelview matrix stack. You therefore also sometimes see that reflected in shaders.
So: the composed model-view-projection matrix is often used by shaders to map from the vertices you loaded for each model to the screen. It's not required; there are lots of ways of achieving the same thing. It's just usual because it allows all possible linear transforms. Because of that, a lesser-composed version of it was also the norm in ye olde fixed-pipeline world.
Because matrices are convenient. Matrices help to convert locations/directions between different spaces (a space can be defined by 3 perpendicular axes and an origin).
Here is an example from a book mentioned by @legends2k in the comments.
The residents of Cartesia use a map of their city with the origin centered quite sensibly at the center of town and axes directed along the cardinal points of the compass. The residents of Dyslexia use a map of their city with the coordinates centered at an arbitrary point and the axes running in some arbitrary directions that probably seemed a good idea at the time. The citizens of both cities are quite happy with their respective maps, but the State Transportation Engineer assigned a task of running up a budget for the first highway between Cartesia and Dyslexia needs a map showing the details of both cities, which therefore introduces a third coordinate system that is superior to him, though not necessarily to anybody else.
Here is another example:
Assume that you have created a car object in a game, with its vertex positions defined using the world's coordinates. Suppose you have to use this same car in some other game, in an entirely different world: you would have to define the positions again, and the calculations would get complex. This is because you would again have to calculate the positions of the window, hood, headlights, wheels, etc. of the car with respect to the new world.
See this video to understand the concepts of model, view and projection. (highly recommended)
Then see this to understand how the vertices in the world are represented as matrices and how they are transformed.