Why is there a glMatrixMode in OpenGL? - opengl

I just don't understand what OpenGL's glMatrixMode is for.
As far as I can see, when glMatrixMode(GL_MODELVIEW) is called, it
is followed by glVertex, glTranslate, glRotate and the like,
that is, OpenGL commands that place some objects somewhere in
the space. On the other hand, if glOrtho or glFrustum or gluPerspective
is called (i.e. how the placed objects are rendered), it has a preceding call of glMatrixMode(GL_PROJECTION).
I guess what I have written so far is an assumption on which someone will prove
me wrong, but is not the point of using different Matrix Modes exactly
because there are different kinds of gl-functions: those concerned with
placing objects and those with how the objects are rendered?

This is simple and can be answered very briefly:
Rendering vertices (as in glVertex ) depends on the current state of matrices called "model-view matrix" and "projection matrix";
The commands glTranslatef, glPushMatrix, glLoadIdentity, glLoadMatrix, glOrtho, gluPerspective and the whole family affect the current matrix (which is either of the above);
The command glMatrixMode selects the matrix (model-view or projection) which is affected by the aforementioned commands.
(There's also the texture matrix used for texture coordinates, but it's seldom used.)
So the common use case is:
Have the model-view matrix active most of the time;
Whenever you have to initialize the projection matrix (usually at the beginning or when the window is resized, perhaps), switch the active matrix to projection, set up a perspective, and switch back to model-view.
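The selection behaviour described above can be sketched without any OpenGL context. This is only a toy model in plain C: the toy_* names and the struct are made up for illustration and are not part of the OpenGL API. It shows that the mode call changes nothing but a selector, while the other calls edit whichever matrix is selected.

```c
#include <assert.h>

/* Toy model of the legacy matrix state: two 4x4 column-major matrices
   and a selector. All names here are hypothetical, not OpenGL API. */
enum toy_mode { TOY_MODELVIEW = 0, TOY_PROJECTION = 1 };

struct toy_state {
    float matrix[2][16];   /* [TOY_MODELVIEW] and [TOY_PROJECTION] */
    enum toy_mode current; /* what glMatrixMode would select */
};

/* Counterpart of glMatrixMode: changes the selector, nothing else. */
static void toy_matrix_mode(struct toy_state *s, enum toy_mode mode)
{
    assert(mode == TOY_MODELVIEW || mode == TOY_PROJECTION);
    s->current = mode;
}

/* Counterpart of glLoadIdentity: resets only the selected matrix. */
static void toy_load_identity(struct toy_state *s)
{
    float *m = s->matrix[s->current];
    for (int i = 0; i < 16; ++i)
        m[i] = (i % 5 == 0) ? 1.0f : 0.0f; /* diagonal indices 0, 5, 10, 15 */
}

/* Counterpart of glTranslatef: M = M * T, on the selected matrix only.
   Only the last column of M changes under this product. */
static void toy_translate(struct toy_state *s, float x, float y, float z)
{
    float *m = s->matrix[s->current];
    for (int row = 0; row < 4; ++row)
        m[12 + row] += m[row] * x + m[4 + row] * y + m[8 + row] * z;
}
```

Selecting TOY_PROJECTION, loading identity, switching back to TOY_MODELVIEW and translating leaves the projection matrix untouched, which mirrors the "set up the projection, then return to model-view" pattern.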

You can use glRotate and glTranslate for projection matrices as well.
Also: OpenGL supports transforms of textures and colors. If you activate this feature you can, for example, modify the texture coordinates of an object without rewriting the texture coordinates each frame (slow).
This is a very useful feature if you want to scroll a texture across an object. All you have to do for this is to set the matrix mode to GL_TEXTURE, call glTranslate to set the offset into the texture, and then draw the textured object.

As Nils pointed out, you do have more to matrices than just what you mentioned.
I'll add a couple thoughts:
OpenGL core (from 3.1 onwards) does away with all the matrix stuff completely, so does GL ES 2.0. This is simply due to the fact that shader programs removed much of the requirement of having them exposed at the GL level (it's still a convenience, though). You then only have uniforms, and you have to compute their value completely on the client side.
There are more matrix manipulation entrypoints than the ones you mention. Some of them apply equally well to projection/modelview (glLoadIdentity/glLoadMatrix/glMultMatrix, Push/Pop). They are very useful if you want to perform the matrix computation yourself (say because you need the matrices somewhere else in your application).

All geometry coordinates undergo several linear transformations in sequence. While any linear transformation can be expressed by a single matrix, often you want to think of a sequence of transformations and edit the sequence, and if you have only a single matrix you could only change the ends of that sequence. By providing several transformation steps, OpenGL gives you several places in the middle where you can change the transformation as well.
Calling glMatrixMode before emitting geometry has no effect at all. You call glMatrixMode before editing the transform matrix, to determine where in the overall sequence those edits appear.
(NB: Looking at the sequence makes a lot more sense if you remember that translation and rotation are not commutative, because translation changes the center of rotation. Similarly translation and scaling are not commutative.)
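To make the non-commutativity concrete, here is a small sketch in plain C (no OpenGL calls; the helper names mat4_mul and mat4_apply are my own). It uses the column-major layout OpenGL uses, a translation by (2, 0, 0) and a 90 degree rotation about z, and applies both orders to the point (1, 0, 0).

```c
#include <assert.h>

/* out = a * b for 4x4 column-major matrices (element (row, col) is at
   index col * 4 + row, the layout OpenGL uses). */
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    assert(out != a && out != b); /* no in-place multiplication */
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
}

/* out = m * (x, y, z, 1) */
static void mat4_apply(float out[4], const float m[16],
                       float x, float y, float z)
{
    for (int row = 0; row < 4; ++row)
        out[row] = m[row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row];
}

/* Translation by (2, 0, 0). */
static const float T[16] = { 1,0,0,0,  0,1,0,0,  0,0,1,0,  2,0,0,1 };
/* Rotation by 90 degrees about the z axis (exact entries, no trig). */
static const float R[16] = { 0,1,0,0,  -1,0,0,0,  0,0,1,0,  0,0,0,1 };
```

With these, T * R sends (1, 0, 0) to (2, 1, 0) (rotate first, then translate), while R * T sends it to (0, 3, 0) (translate first, then rotate about the origin).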

Related

Does OpenGL change vertices in memory when you apply transformations?

So let's say I have a single vertex (to make things easy) in my program, (0, 0, 0). Right at the origin. I render a single frame with a simple translation matrix, moving the vertex two units down the x-axis. The vertex is rendered accordingly. Does the same vertex now show up in the VRAM as
(2, 0, 0)? I've read that it's important to load all the respective identity matrices in OpenGL every time a frame is rendered--and I assume that's because everything would continually move, rotate, etc. further and further, implying that applying transformations DOES modify actual data, not just the appearance onscreen.
Strictly speaking, OpenGL is just an API definition. An implementation can do whatever it wants as long as it meets the specifications.
That being said, the answer to your question is generally: NO. It's hard to picture how storing transformed vertices back into the memory that also contained the original vertices would ever make sense.
The original vertex positions are passed into the vertex shader, where they are processed, which can include transformations. Once they exit the vertex shader, the transformed positions will most likely be stored in some kind of cache or dedicated on-chip GPU memory until they are processed by the next steps of the pipeline, which includes perspective division, application of the viewport transform, and rasterization. Once those vertex processing steps are completed, the transformed vertices can be discarded. They may stay in a cache for a little longer, for possible reuse of the processed vertex in case the same original vertex is used again. But they are not stored in any persistent way.
The way I interpret it, what you heard about having to reset the matrices for each frame was probably a misunderstanding. If you want to apply the same matrices in the next frame, you don't have to do anything at all.
What they were most likely talking about is related to how the matrix stack in legacy OpenGL works. Most calls that modify the current matrix, like glTranslatef(), glRotatef(), etc., are applied incrementally to the current matrix. For example, if you call glRotatef(), the rotation is combined with the transformation that was already on the matrix stack. The result is that your newly specified rotation is applied to the vertices first, followed by the transformations that were already on the matrix stack.
Based on this, if you want to specify transformations from scratch at the start of each frame, you will call glLoadIdentity() to reset the current transformation on the matrix stack before you start specifying your new transformations. Or you can use glPushMatrix()/glPopMatrix() to save and restore the desired state of the matrix stack.
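The difference between resetting and not resetting can be simulated in plain C. This is a deliberately reduced sketch: the toy_* names are hypothetical, and only the translation part of the current matrix is tracked. It also shows that the stored vertex is only ever read, never written.

```c
#include <assert.h>

/* Hypothetical stand-in for the current modelview matrix; only the
   translation column of the 4x4 matrix is tracked here. */
struct toy_modelview { float tx, ty, tz; };

/* Like glLoadIdentity: forget everything accumulated so far. */
static void toy_load_identity(struct toy_modelview *m)
{
    m->tx = m->ty = m->tz = 0.0f;
}

/* Like glTranslatef: combines with whatever is already in the matrix. */
static void toy_translate(struct toy_modelview *m, float x, float y, float z)
{
    m->tx += x; m->ty += y; m->tz += z;
}

/* Transforms a vertex for drawing; the stored vertex is only read. */
static float toy_transformed_x(const struct toy_modelview *m,
                               const float vertex[3])
{
    return vertex[0] + m->tx;
}

/* Renders `frames` frames, each translating by 2 along x, and returns
   the x position at which the vertex is drawn in the last frame. */
static float toy_render(const float vertex[3], int frames, int reset_each_frame)
{
    struct toy_modelview m;
    float drawn_x = vertex[0];
    assert(frames > 0);
    toy_load_identity(&m);
    for (int i = 0; i < frames; ++i) {
        if (reset_each_frame)
            toy_load_identity(&m);
        toy_translate(&m, 2.0f, 0.0f, 0.0f);
        drawn_x = toy_transformed_x(&m, vertex);
    }
    return drawn_x;
}
```

After three frames the vertex is drawn at x = 2 with the reset and at x = 6 without it, while the vertex array itself still holds (0, 0, 0) in both cases.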
If you use what many people call "modern OpenGL", meaning that you don't use the legacy fixed pipeline functionality, you don't have to worry about any of that. The matrix stack is gone for good, and you get to calculate your own transformation matrices, and pass them to your shader code.
Here is a Wikipedia article on the mathematics of transformation matrices: https://en.wikipedia.org/wiki/Transformation_matrix. It will give you an understanding of the math behind the scenes. Another way to look at this is along the lines of linear or vector algebra.
What happens under the hood when you render a scene is that all of the vertex data is sent from the CPU to the GPU to be rasterized and drawn to the screen. This is your batch process or render call. You also have a frame function that runs some number of times per second, which gives you your frames per second. So if you are rendering at, say, 60 FPS, then these pixels, vertices, triangles, etc. will be drawn 60 times each second.
When you apply a transformation to this set of vertices, a transformation matrix T is multiplied with your model-view-projection matrix: MVP * T. The result is saved back into your existing MVP matrix, if that is how you have your calculations set up.
There are some differences depending on which version of OpenGL you are using, from OpenGL v1.0 (pure CPU calls) up to v4.5. As far as I know, after version 3.2 or 3.3 (I don't remember which offhand) you have to implement the MVP yourself, whereas in versions after v1.5, where shaders were first introduced, this was handled for you already.
The OpenGL documentation is at https://www.opengl.org/. The main page has a Documentation topic, from which you can select either the OpenGL Registry or whichever specific version you want to look at. This site covers everything that is available in the API.
So as you begin to understand this process, yes the actual coordinate data for these vertices does change; however, it will not continuously change unless you are incrementing a static variable by a factor of time, giving you some kind of simulation of movement or animation. If you apply only a single transformation, then these pixels, vertices, triangles, etc. will either rotate, translate, scale, or shear depending on which transformation you are applying. I will tell you that the order of these operations does matter, but I will not tell you which order they go in; that is for you to read up on and figure out. The reason this matters is that not every matrix multiplication has a valid inverse matrix. The identity is used for reasons such as round-off errors and floating point precision, so that if you happen to apply, say, 1,000 transformations in about 10 seconds, you do not accumulate astronomical errors. This should be enough to point you in the right direction and also serve as a guide as to how the OpenGL API works.

How do I access a transformed OpenGL modelview matrix [tried glGetFloatv()]?

I am trying to rotate over the 'x' axis and save the transformed matrix so that I can use it to rotate further later; or over another axis from the already rotated perspective.
//rotate
glRotatef(yROT,model[0],model[4],model[8]);//front over right axis
//save model
glGetFloatv(GL_MODELVIEW_MATRIX, model);
Unfortunately I noticed that openGL must buffer the transformations because the identity matrix is loaded to model. Is there a work-around?
Why, oh God, would you do this?
I have been toying around with attempting to understand quaternions, euler, or axis rotation. The concepts are not difficult but I have been having trouble with the math even after looking at examples *edit[and most of the open classes I have found either are not well documented for simpleton users or have restrictions on movement].
I decided to find a way to cheat.
edit*
By 'further later' I mean in the next loop of code. In other words, yRot is the number of degrees I want my view to rotate from the saved perspective.
My suggestion: Don't bother with glRotate at all. These functions were never very pleasant to work with in the first place, and no serious program ever used them.
If you want to use the fixed function pipeline (= no shaders), use glLoadMatrix to load whatever transformation you currently need. With shaders you have to do the conceptually same with glUniform anyway.
Use an existing matrix math library, like GLM, Eigen or linmath.h, to construct the transformation matrices. The nice benefit is that you can make copies of a matrix at any point, so instead of fiddling with glLoadIdentity, glPushMatrix and glPopMatrix you just make copies where you need them and work from them.
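As a rough sketch of keeping the matrix on the client side, here is a hand-rolled version in plain C instead of one of the libraries named above (the helper names are my own, and a real program would more likely use the library's functions). The matrix is column-major, which is the layout glLoadMatrixf expects, and the rotation takes the cosine and sine directly so the sketch needs no math library.

```c
#include <assert.h>
#include <string.h>

/* m = m * Rx, where Rx rotates about the x axis and c, s are the cosine
   and sine of the angle. Post-multiplying means the new rotation happens
   about the x axis of the already rotated frame. */
static void rotate_x_in_place(float m[16], float c, float s)
{
    assert(m != NULL);
    for (int row = 0; row < 4; ++row) {
        float y = m[4 + row]; /* old column 1 */
        float z = m[8 + row]; /* old column 2 */
        m[4 + row] = y * c + z * s;
        m[8 + row] = z * c - y * s;
    }
}

/* Snapshot a matrix; this takes the place of glPushMatrix or a
   glGetFloatv round trip. */
static void mat4_copy(float dst[16], const float src[16])
{
    memcpy(dst, src, 16 * sizeof(float));
}
```

Each pass through the loop you would call rotate_x_in_place on your own array, keep a copy with mat4_copy wherever you need to come back to that state, and hand the array to glLoadMatrixf just before drawing.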
BTW: There is no such thing as "models" in OpenGL. That's not how OpenGL works. OpenGL draws points, lines or triangles, one at a time, where each such called primitive is transformed individually to a position on the (screen) framebuffer and turned into pixels. Once a primitive has been processed OpenGL already forgot about it.

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest to think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imagine that the 3D scene is filmed by a camera and what is rendered on the screen are the images that were captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There are few reasons for changing the view matrix while rendering a single frame, but they do in fact exist (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. if there is a perspective applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed or if you are rendering full screen and the resolution has changed, however only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for which you may want to change this matrix, but in most cases it's pretty much constant for the whole life of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This all makes a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
Now if you look at the classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet OpenGL does not offer a model or a view matrix, it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix" since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), there are no matrices available any longer at all; you have to define your own matrices. Still, most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not use either three matrices, which means you don't have to permanently save and restore the model-view matrix, or a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for every single vertex rendered (after all, such a multiplication doesn't come for free either)?
So to summarize my question: Which advantage has a combined model-view matrix together with a separate projection matrix over having three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply with positions/normals/etc. And therefore, the faster your vertex shaders.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
In most cases your shader will need the geometry in world or eye coordinates for shading, so you have to separate the projection matrix from the model and view matrices.
Making your shader multiply the geometry with two matrices hurts performance. Assuming each model has thousands (or more) vertices, it is more efficient to compute a model-view matrix on the CPU once and let the shader do one less matrix-vector multiplication.
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase in GPU load. The two following screenshots show the two results; pay attention to the green and white layers fighting.
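The saving can be illustrated on the CPU side with a short sketch in plain C. The types and function names are made up for this example, and the per-vertex loop merely stands in for what a vertex shader would do: view * model is combined once per object, so each vertex costs one matrix-vector product instead of two.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float m[16]; } mat4; /* column-major: m[col * 4 + row] */
typedef struct { float v[4]; } vec4;

/* Returns a * b. */
static mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 out;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a->m[k * 4 + row] * b->m[col * 4 + k];
            out.m[col * 4 + row] = sum;
        }
    return out;
}

/* Returns m * p. */
static vec4 mat4_mul_vec(const mat4 *m, vec4 p)
{
    vec4 out;
    for (int row = 0; row < 4; ++row) {
        out.v[row] = 0.0f;
        for (int k = 0; k < 4; ++k)
            out.v[row] += m->m[k * 4 + row] * p.v[k];
    }
    return out;
}

/* Transforms `count` model-space positions into eye space. */
static void to_eye_space(vec4 *out, const vec4 *in, size_t count,
                         const mat4 *view, const mat4 *model)
{
    assert(out != NULL && in != NULL && view != NULL && model != NULL);
    const mat4 modelview = mat4_mul(view, model); /* once per object */
    for (size_t i = 0; i < count; ++i)
        out[i] = mat4_mul_vec(&modelview, in[i]); /* once per vertex */
}
```

The result is the same as applying the model matrix and then the view matrix to every vertex, which is just the C = B * A identity from the question.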

OpenGL define vertex position in pixels

I've been writing a 2D basic game engine in OpenGL/C++ and learning everything as I go along. I'm still rather confused about defining vertices and their "position". That is, I'm still trying to understand the vertex-to-pixels conversion mechanism of OpenGL. Can it be explained briefly or can someone point to an article or something that'll explain this. Thanks!
This is rather basic knowledge that your favourite OpenGL learning resource should teach you as one of the first things. But anyway the standard OpenGL pipeline is as follows:
The vertex position is transformed from object-space (local to some object) into world-space (in respect to some global coordinate system). This transformation specifies where your object (to which the vertices belong) is located in the world
Now the world-space position is transformed into camera/view-space. This transformation is determined by the position and orientation of the virtual camera by which you see the scene. In OpenGL these two transformations are actually combined into one, the modelview matrix, which directly transforms your vertices from object-space to view-space.
Next the projection transformation is applied. Whereas the modelview transformation should consist only of affine transformations (rotation, translation, scaling), the projection transformation can be a perspective one, which basically distorts the objects to realize a real perspective view (with farther away objects being smaller). But in your case of a 2D view it will probably be an orthographic projection, that does nothing more than a translation and scaling. This transformation is represented in OpenGL by the projection matrix.
After these 3 (or 2) transformations (and then following perspective division by the w component, which actually realizes the perspective distortion, if any) what you have are normalized device coordinates. This means after these transformations the coordinates of the visible objects should be in the range [-1,1]. Everything outside this range is clipped away.
In a final step the viewport transformation is applied and the coordinates are transformed from the [-1,1] range into the [0,w]x[0,h]x[0,1] cube (assuming a glViewport(0, 0, w, h) call), which are the vertex' final positions in the framebuffer and therefore its pixel coordinates.
When using a vertex shader, steps 1 to 3 are actually done in the shader and can therefore be done in any way you like, but usually one conforms to this standard modelview -> projection pipeline, too.
The main thing to keep in mind is, that after the modelview and projection transforms every vertex with coordinates outside the [-1,1] range will be clipped away. So the [-1,1]-box determines your visible scene after these two transformations.
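The last two steps (perspective division and viewport transformation) are simple enough to write out. The following is a sketch in plain C with names of my own choosing; it assumes the default depth range of [0, 1] and follows the simplified description above, in which visibility is checked after the division (real implementations clip in clip space, before dividing).

```c
#include <assert.h>

typedef struct { float x, y, z, w; } clip_coord;
typedef struct { float x, y, z; } vec3;

/* Perspective division: clip coordinates to normalized device coordinates. */
static vec3 clip_to_ndc(clip_coord c)
{
    assert(c.w != 0.0f);
    vec3 ndc = { c.x / c.w, c.y / c.w, c.z / c.w };
    return ndc;
}

/* A point is inside the visible box when all components are in [-1, 1]. */
static int ndc_visible(vec3 n)
{
    return n.x >= -1.0f && n.x <= 1.0f &&
           n.y >= -1.0f && n.y <= 1.0f &&
           n.z >= -1.0f && n.z <= 1.0f;
}

/* Viewport transform for glViewport(vx, vy, width, height), assuming
   the default depth range [0, 1]. */
static vec3 ndc_to_window(vec3 n, float vx, float vy, float width, float height)
{
    vec3 win = {
        vx + (n.x + 1.0f) * 0.5f * width,
        vy + (n.y + 1.0f) * 0.5f * height,
        (n.z + 1.0f) * 0.5f
    };
    return win;
}
```

For example, the clip coordinate (2, -2, 0, 4) divides to (0.5, -0.5, 0) and lands at window position (600, 150) with depth 0.5 in an 800x600 viewport at the origin.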
So from your question I assume you want to use a 2D coordinate system with units of pixels for your vertex coordinates and transformations? In this case this is best done by using glOrtho(0.0, w, 0.0, h, -1.0, 1.0) with w and h being the dimensions of your viewport. This basically counters the viewport transformation and therefore transforms your vertices from the [0,w]x[0,h]x[-1,1]-box into the [-1,1]-box, which the viewport transformation then transforms back to the [0,w]x[0,h]x[0,1]-box.
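The "counters the viewport transformation" remark can be checked per axis. This is a sketch in plain C with hypothetical helper names: ortho_axis reproduces what the x and y rows of the glOrtho matrix do to a coordinate, and viewport_axis is the viewport mapping for a viewport that starts at 0.

```c
#include <assert.h>

/* Maps a coordinate from [lo, hi] to [-1, 1], as the x and y rows of
   the glOrtho matrix do: ndc = (2 * v - (hi + lo)) / (hi - lo). */
static float ortho_axis(float v, float lo, float hi)
{
    assert(hi != lo);
    return (2.0f * v - (hi + lo)) / (hi - lo);
}

/* Maps NDC back to window coordinates for a viewport that starts at 0
   and is `size` pixels wide (or high). */
static float viewport_axis(float ndc, float size)
{
    return (ndc + 1.0f) * 0.5f * size;
}
```

With w = 800, x = 200 maps to -0.5 in normalized device coordinates and back to 200 in the window, so vertex coordinates can be given directly in pixels. Passing the range the other way round, e.g. ortho_axis(y, 600, 0), flips the axis so that y = 0 ends up at the top of the window.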
These have been quite general explanations without mentioning that the actual transformations are done by matrix-vector multiplications and without talking about homogeneous coordinates, but they should have explained the essentials. This documentation of gluProject might also give you some insight, as it actually models the transformation pipeline for a single vertex. But in this documentation they actually forgot to mention the division by the w component (v" = v' / v'(3)) after the v' = P x M x v step.
EDIT: Don't forget to look at the first link in epatel's answer, which explains the transformation pipeline a bit more practical and detailed.
It is called transformation.
Vertices are set in 3D coordinates, which are transformed into viewport coordinates (into your window view). This transformation can be set in various ways. An orthographic transformation is probably the easiest to understand as a starter.
http://www.songho.ca/opengl/gl_transform.html
http://www.opengl.org/wiki/Vertex_Transformation
http://www.falloutsoftware.com/tutorials/gl/gl5.htm
First, be aware that OpenGL does not use standard pixel coordinates. By that I mean that for a particular resolution, e.g. 800x600, you don't have horizontal coordinates in the range 0-799 or 1-800 in steps of one. Rather, you have coordinates ranging from -1 to 1, which are later sent to the graphics card's rasterizing unit and after that matched to the particular resolution.
I omitted one step here: before all that you have a ModelViewProjection matrix (or ViewProjection matrix in some simple cases), which casts the coordinates you use onto a projection plane. The default use of that is to implement a camera that converts the 3D space of the world (View for placing the camera in the right position and Projection for casting 3D coordinates onto the screen plane; ModelViewProjection also includes the step of placing a model in the right place in the world).
Another case (and you can use the Projection matrix this way to achieve what you want) is to use these matrices to convert one range of resolutions to another.
And there's a trick you will need. You should read about the ModelViewProjection matrix and the camera in OpenGL if you want to get serious. But for now I will tell you that with a proper matrix you can just cast your own coordinate system (e.g. ranges 0-799 horizontally and 0-599 vertically) to the standardized -1 to 1 range. That way you will not see that the underlying OpenGL API uses its own -1 to 1 system.
The easiest way to achieve this is glOrtho function. Here's the link to documentation:
http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml
This is example of proper usage:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, 800, 600, 0, 0, 1);
glMatrixMode(GL_MODELVIEW);
Now you can use your own modelview matrix, e.g. for translating (moving) objects, but don't touch the projection matrix set up in this example. This code should be executed before any drawing commands. (In fact it can run right after initializing OpenGL if you won't use 3D graphics.)
And here's working example: http://nehe.gamedev.net/tutorial/2d_texture_font/18002/
Just draw your figures instead of drawing text. There is another thing: glPushMatrix and glPopMatrix for the chosen matrix (in this example the projection matrix). You won't need those unless you are combining 3D with 2D rendering.
And you can still use a model matrix (e.g. for placing tiles somewhere in the world) and a view matrix (for example for zooming the view, or scrolling through the world; in this case your world can be larger than the resolution and you could crop the view by simple translations).
Looking at my answer I see it's a little chaotic, but if you are confused, just read about the Model, View, and Projection matrices and try the example with glOrtho. If you're still confused, feel free to ask.
MSDN has a great explanation. It may be in terms of DirectX but OpenGL is more-or-less the same.
Google for "opengl rendering pipeline". The first five articles all provide good expositions.
The key transition from vertices to pixels (actually, fragments, but you won't be too far off if you think "pixels") is in the rasterization stage, which occurs after all vertices have been transformed from world-coordinates to screen coordinates and clipped.

OpenGL glMatrixMode help

I'm starting to work a little on OpenGL stuff, and I'm seeing a lot of examples that make calls to the glMatrixMode function.
From what I've gathered, setting this to either GL_MODELVIEW or GL_PROJECTION (etc) will activate that specific transformation matrix and all subsequent calls to matrix transformation functions (glTranslatef, glPushMatrix, glLoadIdentity, glLoadMatrix etc) will affect the active matrix only.
What I don't get is why are there 3 (4 in some cases) different matrices? Which one should I use? (I'm probably going to get a lot of "Use Shaders", but I can't. Limited by school...) When should I switch and activate a different matrix? What is the benefit of utilizing all of them as opposed to only using one?
Thanks for any help :)
glMatrixMode doesn't "activate" matrices. The OpenGL fixed pipeline uses 3 matrices (sometimes 4): two are responsible for transforming the geometry, one for transforming texture space (and in some implementations one for color adjustments). Those matrices are used all the time.
The modelview matrix is used to move geometry around. Since OpenGL doesn't have a "camera", the viewer is positioned by moving all the geometry in the opposite (= inverse) of the movements of the "camera".
The projection matrix is used to transform the geometry from modelview space into clip space, i.e. it projects the transformed geometry into the viewport.
The texture matrix transforms the texture coordinates. In case (s,t,r,q) are directly given the benefit of this matrix isn't clear at first. But OpenGL also allows to generate texture coordinates from the vertex positions. This together with the texture matrix allows to implement projection textures.
The color matrix is seldom used and not even available in all implementations (it's part of an extension). If available, it transforms the incoming vertex colours. Since there are no implicit color generators, what use is it then? Well, it can be used to transform between linear colourspaces, e.g. RGB->XYZ or any other colour space conversion that can be expressed as a matrix of scalars. Nobody uses the color matrix these days; shaders do the job much better.
glMatrixMode is there so that there's no bloat of functions. Otherwise you'd need:
glModelviewLoadIdentity
glModelviewLoadMatrix
glModelviewMultMatrix
glModelviewRotate
glModelviewTranslate
glModelviewScale
glModelviewPushMatrix
glModelviewPopMatrix
glProjectionLoadIdentity
glProjectionLoadMatrix
glProjectionMultMatrix
glProjectionRotate
glProjectionTranslate
glProjectionScale
glProjectionPushMatrix
glProjectionPopMatrix
and so on. Also you couldn't use functions like glFrustum in both projection and texture matrices. You'd need two of those, too.
And last but not least, one important hint. Setting the viewport and the projection matrix belongs in the rendering function. Most tutorials you'll see out there place them in the window resizing handler, which is the totally wrong place for that. Don't imitate this bad habit.
You will be using all of these if you use the fixed function pipeline (i.e. "no shaders"). In fact, you'll use them with shaders too, but you'll implement them yourself in that case.
This part of OpenGL can be hard to grasp at first, although it is actually quite simple. When you select a particular matrix, then this does not turn on or off anything. All it does is that it makes functions like glTranslatef work with the one specific matrix that you've selected.
OpenGL works like this everywhere (except for direct state access), for example with textures and buffers, in the same way.
EDIT: As for why there are several matrices, they all do different things. Your models are usually in their "own space", which means they need to be somehow transferred to the "world", by scaling them appropriately and translating them to the right location. Then everything (the whole "world") needs to be transformed in a way according to your "eye position", and it must be transformed into a "normalized" clip space, because that is how the hardware can be implemented in the most efficient manner, etc, etc, etc.
All those matrices are usually multiplied together (without you knowing that this happens). If you google for opengl transform pipeline, you will get a lot of good resources that explain what happens when in detail, for example this.
Or, read the specification (freely available at opengl.org), it contains very explicit (and in my opinion easy) information on how all the seemingly complicated matrix stuff is intended.
You will probably just use glMatrixMode(GL_PROJECTION) once, followed by gluPerspective() (or maybe glOrtho()), to set how the projection will be done. It is not very common to change the projection once it is set.
Then you change to GL_MODELVIEW and just use it to rotate/translate/scale stuff around. Both matrices are used when rendering, the final position on screen being GL_PROJECTION * GL_MODELVIEW * your vertex. But since GL_MODELVIEW should change much more often than GL_PROJECTION, they are separated by the specification.
It's because matrix multiplies are not commutative. If you only had one matrix mode, the new transformation would always occur after all existing ones, and that's not always desirable.
By keeping several matrices which effectively get multiplied together for each piece of geometry, it's possible to e.g. have both translation to render the different bits of an object and also a point-of-view transformation, and be able to adjust these independently of each other.