Why are model and view matrices specifically often combined together? - opengl

I'm having trouble understanding why the model and view matrices are traditionally combined together. I know that the fewer matrix multiplications you do in the vertex shader the better, but it makes much more sense to me to combine the projection and view matrices.
This is because they are both intrinsically camera properties. It makes sense to me to first transform vertices into world space with the model matrix, perform lighting etc., then use your combined camera matrix to transform into normalised clip space.
I know that I can do it that way if I want in a programmable pipeline, but I want to know why historically people combined the model and view matrices.

In graphics programming, the camera doesn't exist. It's always fixed at (0,0,0) and looking towards (0,0,-1). The camera as everyone knows it is entirely artificial and mimics the way we, as humans, are used to observing objects, and by that I mean: moving around, turning our head and such. To mimic that, computer graphics introduces the concept of a camera. It is interesting and well known that it makes no difference whether you move the camera to the right or move all the other objects in the scene to the left. That invariance is what lets all the transformations applied to an object be folded, together with the camera transformation, into one matrix - the model-view (MV) matrix.
The view and projection matrices are kept separate because they perform very different transformations. The view matrix is very similar to the model matrix and represents an ordinary in-space 3D transformation, while the projection matrix determines how the viewed 3D volume is projected onto the 2D screen.
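As a minimal sketch of that "move the camera right = move the world left" equivalence, here is how the combined MV matrix could be built on the CPU. GLM and the function/variable names are my own assumptions for illustration, not part of the original answer:

#include <glm/glm.hpp>

// cameraToWorld describes where the artificial "camera" sits in the world.
// Inverting it gives the view matrix: instead of moving the camera to the
// right, every object in the scene is moved to the left by the same amount.
glm::mat4 makeModelView(const glm::mat4& cameraToWorld, const glm::mat4& model)
{
    glm::mat4 view = glm::inverse(cameraToWorld);
    return view * model;   // the combined MV matrix the answer refers to
}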

Related

OpenGL spectrum between perspective and orthogonal projection

In OpenGL (all versions, though I happen to be working in OpenGL ES 2.0) there is the option of using a perspective projection versus an orthogonal one. Is there a way to control the degree of orthogonality?
For the sake of picturing the issue (and please don't take this as the actual question, I am well aware there is no camera in OpenGL), assume that a scene is rendered with the viewport "looking" down the -z axis. Two parallel lines extending a finite distance down the -z axis at (x,y) = (1,1) and (x,y) = (-1,1) will appear as points in orthogonal projection, or as two lines that eventually converge towards a single pixel in perspective projection. Is there a way to keep the x and y values represented by the outer edges of the screen the same as in projection space - I assume this requires not changing the frustum - but have the lines only converge part of the way towards a single pixel?
Is there a way to control the degree of orthogonality?
Either something is orthogonal, or it is not. There's no such thing as "just a little orthogonal".
Anyway, from a mathematical point of view, a perspective projection with an infinitely narrow field of view is orthogonal. So you can use glFrustum with very large near and far plane distances, together with a countering translation in the modelview matrix to bring the far-away viewing volume back to the origin.
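A rough sketch of that trick in classic fixed-function OpenGL; the function name and the way the parameters are chosen are my own illustrative assumptions:

/* halfW/halfH: desired half-extents of the view at the object,
   depth: how much scene depth is needed, dist: how far the scene is pushed away.
   The larger dist is relative to depth, the narrower the field of view and
   the closer the result gets to an orthographic look. */
void setupAlmostOrtho(float halfW, float halfH, float depth, float dist)
{
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    /* The frustum extents are given at the near plane, placed at dist. */
    glFrustum(-halfW, halfW, -halfH, halfH, dist, dist + depth);

    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -dist);   /* countering translation into the far-away volume */
}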

What do we use matrices for in openGL?

I'm new to OpenGL and was wondering what we use matrices for, if someone could explain it to me in an abstract, intuitive way, because when reading references or any tutorial, all of them take matrices as a known mechanism. I've learned about matrices in maths, but as English is not my native language, it's hard to figure out what some of the terms mean.
I found a good example at www.learnopengl.com which says:
The model matrix. This matrix is used to place a model somewhere in the “world”. For example, if you have a model of a car and you want it located 1000 meters to the east, you will use the model matrix to do this.
The view matrix. This matrix represents the camera. If we want to view our car which is 1000 meters to the east, we’ll have to move ourselves 1000 meters to the east as well (another way of thinking about it is that we remain stationary, and the rest of the world moves 1000 meters to the west). We use the view matrix to do this.
The projection matrix. Since our screens are flat, we need to do a final transformation to “project” our view onto our screen and get that nice 3D perspective. This is what the projection matrix is used for.
This explains it pretty well. But how do we build them? How large are they?
Also, I've read in this question:
What does glLoadIdentity() do in OpenGL?
that:
glMatrixMode(GL_PROJECTION) deals with the matrices used by perspective transformation or orthogonal transformation.
glMatrixMode(GL_MODELVIEW) deals with matrices used by model-view transformation. That is, to transform your object (aka model) to the view coordinate space (or camera space).
What do those transformations mean and how do they get calculated?
I know there are many questions here, but I'm trying to form a better conception of all of this to get a better view of OpenGL. That's why I need an abstract explanation to dive into all the details with an understanding of the concepts behind them.
Translation, rotation, and scaling are all affine transforms, which can be implemented using matrix multiplication and addition. Actually, by augmenting the vector with a w element that's always one, the affine transform in 3 dimensions becomes a linear transformation in 4 dimensions and all you need is a matrix multiply.
Doing it with a matrix multiply is very nice because (1) it's fast and (2) you don't need special logic for any of the operations -- you can even compose as many of these affine operations as you want and still represent it with a single matrix.
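As a concrete illustration of that composition, here is a hedged sketch of how the three matrices from the learnopengl quote above might be built; they are all plain 4x4 float matrices. GLM and the specific numbers used here are my own assumptions:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 buildMvp()
{
    // Model: put the car 1000 units "east" (+x here, by assumption).
    glm::mat4 model = glm::translate(glm::mat4(1.0f), glm::vec3(1000.0f, 0.0f, 0.0f));

    // View: a camera standing a little away from the car, looking at it.
    glm::mat4 view = glm::lookAt(glm::vec3(1010.0f, 2.0f, 10.0f),   // eye position
                                 glm::vec3(1000.0f, 0.0f,  0.0f),   // point looked at
                                 glm::vec3(0.0f, 1.0f, 0.0f));      // up direction

    // Projection: perspective with a 45 degree vertical field of view.
    glm::mat4 projection = glm::perspective(glm::radians(45.0f), 16.0f / 9.0f, 0.1f, 500.0f);

    // Composing them gives a single 4x4 matrix that takes a vertex from
    // model space all the way to clip space.
    return projection * view * model;
}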
Having multiple matrix modes is useful when composing. If you had only one matrix, you could add new operations at either end but not in the middle. By having 3 matrices multiplied together later, you can insert new operations at four different points in the order.
The matrix stack is also very useful because it allows you to do an operation for a few primitives and then remove it. If you tried to undo the operation by doing the inverse operation, eventually rounding errors would get out of control. But by remembering the previous matrix, you can just get it back as if the rotation or whatever never happened.
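A small sketch of that stack usage in the fixed-function API; drawWheel and wheelAngle are hypothetical placeholders:

glPushMatrix();                               // remember the current matrix
glRotatef(wheelAngle, 0.0f, 0.0f, 1.0f);      // extra transform for just a few primitives
drawWheel();
glPopMatrix();                                // restore the old matrix exactly, no inverse needed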
OpenGL is nice in that rather than working with matrices directly, you can call functions that will manipulate them.
So under the hood (what really happens) there are several matrices that transform your objects (a model-view matrix that transforms objects to camera space, and a projection matrix for perspective/orthogonal transformations).
glMatrixMode is like a switch that allows you to choose which type of matrix to use and manipulate, and you specify using the arguments. So glMatrixMode(GL_PROJECTION) means that you will be manipulating the projection matrix.
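For example, the switching described above typically looks something like this in legacy OpenGL; the specific values, and width/height/angle, are just placeholders:

glMatrixMode(GL_PROJECTION);                        // now manipulating the projection matrix
glLoadIdentity();
gluPerspective(45.0, (double)width / height, 0.1, 100.0);

glMatrixMode(GL_MODELVIEW);                         // now manipulating the model-view matrix
glLoadIdentity();
glTranslatef(0.0f, 0.0f, -5.0f);                    // "move the camera" back (view part)
glRotatef(angle, 0.0f, 1.0f, 0.0f);                 // spin the object (model part)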

Why do you use camera space instead of model space for normals?

I am learning OpenGL graphics, and am getting into shadows. The tutorials that I am reading are telling me to transform my normals and light vector to camera space. Why is this? Why can't you just keep the coords in model space?
A follow up question to this is how to handle model transformations. I am unable to find a definitive answer. I currently have this code:
vec3 normCamSpace = normalize(mat3(V) * normal);
vec3 dirToLight = (V * vec4(lightPos, 0.0)).xyz;
float cosTheta = clamp(dot(normCamSpace, dirToLight), 0, 1);
V is the view matrix, or the camera matrix. I am unsure how to move or edit the light when the model changes in position, rotation, and scale.
The main reason is that usually your light positions will not be given in model space, but in world space. However, for illumination to work efficiently all calculations must happen in a common space. In your usual transformation chain, model-local coordinates are transformed by the modelview matrix directly into view space:
p_view = MV · p_local
Since you normally have only one modelview matrix, it would be cumbersome to separate this step into something like
p_world = M · p_local
p_view = V · p_world
For that you would need MV to be separated into M and V.
Since the projection transformation traditionally happens as a separate step, view space is the natural "common lower ground" on which to base the illumination calculations. It just involves transforming your light positions from world to view space, and since light positions are not very complex, this is done on the CPU and the pretransformed light positions are passed to the shader as uniforms.
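A hedged sketch of that CPU-side step, assuming GLM and a made-up uniform location name (uLightPosViewLoc); V is the same view matrix as in the question's shader code:

#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>

void uploadLightPosition(const glm::mat4& V, GLint uLightPosViewLoc)
{
    glm::vec4 lightPosWorld(10.0f, 20.0f, 5.0f, 1.0f);      // w = 1: a point, not a direction
    glm::vec3 lightPosView = glm::vec3(V * lightPosWorld);  // now in the same space as normCamSpace
    glUniform3fv(uLightPosViewLoc, 1, glm::value_ptr(lightPosView));
}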
Note that nothing is stopping you from performing illumination calculations in world space, or model local space. It just takes transforming the light positions correctly.
I am learning OpenGL graphics, and am getting into shadows. The tutorials that I am reading are telling me to transform my normals and light vector to camera space. Why is this? Why can't you just keep the coords in model space?
Actually, if you're the one writing the shader, you can use whatever coordinate space you want. IMO calculating lighting in world space feels more "natural", but that's a matter of taste.
However, there are two small details:
You cannot "naturally" calculate lighting in object space, if your object is a skinned mesh (character model animated by bones). Such model will require world space or view space. If your object can be only translated and rotated (affine transforms only), then lighting can be easily calculated in model/object space. I think some game engines actualy worked this way.
If you use camera space, you can drop one subtraction when calculating specular highlights. Blinn/Phong specular models require a vector to (or from) the eye to calculate the specular factor. In camera space, the vector from the eye to a point is equal to the point's position. This is a very small optimization and it probably isn't worth the effort.
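That shortcut, sketched with GLM-style math for illustration (the same expression works in GLSL); the function name is my own:

#include <glm/glm.hpp>

// In camera/view space the eye sits at the origin, so the unit vector from a
// surface point back toward the eye is just the negated, normalized point
// position: no "eyePos - posView" subtraction is needed.
glm::vec3 viewDirTowardEye(const glm::vec3& posView)
{
    return glm::normalize(-posView);
}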

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest to think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imagine that the 3D scene is filmed by a camera and what is rendered on the screen are the images captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There are few reasons for changing the view matrix while rendering a single frame, but they do in fact exist (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. whether a perspective is applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed, or if you are rendering full screen and the resolution has changed, but only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for which you may want to change this matrix, but in most cases it's pretty much constant for the whole life of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This all makes a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
Now if you look at the classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet OpenGL does not offer a model or a view matrix, it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix" since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), there are no matrices available any longer at all; you have to define your own matrices. Still most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not use either three matrices, which means you don't have to permanently save and restore the model-view matrix, or a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for every single vertex rendered (after all, such a multiplication doesn't come for free either)?
So to summarize my question: What advantage does a combined model-view matrix together with a separate projection matrix have over three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply with positions/normals/etc. And therefore, the faster your vertex shaders.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
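Sketched per vertex, using GLM notation purely for illustration (a GLSL vertex shader would perform the same math with the same operators); the function and parameter names are my own:

#include <glm/glm.hpp>

// modelView and projection mirror the two uniforms a typical shader would take.
glm::vec4 toClipSpace(const glm::mat4& modelView, const glm::mat4& projection,
                      const glm::vec3& position)
{
    glm::vec4 posCam = modelView * glm::vec4(position, 1.0f); // model -> camera: the lighting "stop"
    return projection * posCam;                               // camera -> clip: only for rasterization
}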
In most cases your shader will need the geometry in world or eye coordinates for shading, so you have to separate the projection matrix from the model and view matrices.
Making your shader multiply the geometry by two matrices hurts performance. Assuming each model has thousands (or more) of vertices, it is more efficient to compute a model-view matrix on the CPU once, and let the shader do one fewer matrix-vector multiplication.
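A sketch of that CPU-side composition, assuming GLM and made-up uniform location names:

#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>

void uploadMatrices(const glm::mat4& model, const glm::mat4& view,
                    const glm::mat4& projection,
                    GLint uModelViewLoc, GLint uProjectionLoc)
{
    glm::mat4 modelView = view * model;  // one matrix-matrix multiply per object, done once on the CPU
    glUniformMatrix4fv(uModelViewLoc,  1, GL_FALSE, glm::value_ptr(modelView));
    glUniformMatrix4fv(uProjectionLoc, 1, GL_FALSE, glm::value_ptr(projection));
    // Each of the model's many vertices now pays for two matrix-vector
    // products (model-view, then projection) instead of three.
}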
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase in GPU load. The two following screenshots show the two results - pay attention to the green and white layers fighting.

The purpose of Model View Projection Matrix

For what purposes are we using Model View Projection Matrix?
Why do shaders require Model View Projection Matrix?
The model, view and projection matrices are three separate matrices. Model maps from an object's local coordinate space into world space, view from world space to camera space, projection from camera to screen.
If you compose all three, you can use the one result to map all the way from object space to screen space, making you able to work out what you need to pass on to the next stage of a programmable pipeline from the incoming vertex positions.
In the fixed functionality pipelines of old, you'd apply model and view together, then work out lighting using another result derived from them (with some fixes so that e.g. normals are still unit length even if you've applied some scaling to the object), then apply projection. You can see that reflected in OpenGL, which never separates the model and view matrices — keeping them as a single modelview matrix stack. You therefore also sometimes see that reflected in shaders.
So: the composed model view projection matrix is often used by shaders to map from the vertices you loaded for each model to the screen. It's not required, there are lots of ways of achieving the same thing, it's just usual because it allows all possible linear transforms. Because of that, a lesser composed version of it was also the norm in ye olde fixed pipeline world.
Because matrices are convenient. Matrices help to convert locations/directions between different spaces (a space can be defined by 3 perpendicular axes and an origin).
Here is an example from a book mentioned by @legends2k in the comments.
The residents of Cartesia use a map of their city with the origin centered quite sensibly at the center of town and axes directed along the cardinal points of the compass. The residents of Dyslexia use a map of their city with the coordinates centered at an arbitrary point and the axes running in some arbitrary directions that probably seemed a good idea at the time. The citizens of both cities are quite happy with their respective maps, but the State Transportation Engineer assigned the task of running up a budget for the first highway between Cartesia and Dyslexia needs a map showing the details of both cities, which therefore introduces a third coordinate system that is superior to him, though not necessarily to anybody else.
Here is another example,
Assume that you have created a car object in a game, with its vertex positions specified using world coordinates. Suppose you have to use this same car in some other game in an entirely different world; you would have to define the positions again, and the calculations would become complex. This is because you again have to calculate the positions of the window, hood, headlights, wheels, etc., of the car with respect to the new world.
See this video to understand the concepts of model, view and projection. (highly recommended)
Then see this to understand how the vertices in the world are represented as matrices and how they are transformed.