Derive Euler rotation from model view matrix - OpenGL

I'm implementing a widget toolkit which requires some symbols to be aligned with the Y axis.
These symbols are drawn using the model view matrix of the parent objects, causing them to be rotated as well.
The solution would be to apply a counter-rotation before rendering those symbols, but then I would have to keep track of every rotation (in my case only around the Z axis) in order to apply the correct counter-rotation. Sadly, any rotation applied outside of my control causes a mismatch between the actual model view matrix rotation and the "global" rotation I keep.
How would it be possible to derive Euler rotation angles directly from the model view matrix?

Just clear the rotational part (the upper left 3x3) of the modelview matrix to identity. This removes any rotation, yet retains the translation.
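A minimal sketch of both routes, assuming a column-major matrix as returned by glGetFloatv, rotation only around Z and no scaling (as in the question):

#include <cmath>

GLfloat m[16];
glGetFloatv(GL_MODELVIEW_MATRIX, m);

// For a pure Z rotation the upper left 2x2 block is [cos -sin; sin cos],
// stored column-major in m[0], m[1], m[4], m[5], so the accumulated
// Z angle can be recovered directly:
float angleZ = std::atan2(m[1], m[0]); // radians

// Alternatively, clear the rotational part (the upper left 3x3) to identity;
// this removes the rotation but keeps the translation in m[12..14]:
m[0] = 1; m[1] = 0; m[2]  = 0;
m[4] = 0; m[5] = 1; m[6]  = 0;
m[8] = 0; m[9] = 0; m[10] = 1;
glLoadMatrixf(m);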

Related

Camera to world transformation and world to camera transformation

I am kind of confused by the camera to world vs. world to camera transformation.
In the OpenGL rendering pipeline, the transformation is from world to camera, right?
Basically, a view coordinate frame is constructed at the camera, then objects in the world are first translated relative to the camera and then rotated into the camera coordinate frame. Is this what gluLookAt performs?
If I want to go from camera view back to world view, how should it be done?
Mathematically, I am thinking of finding the inverses of the translation and rotation matrices, then applying the rotation before the translation, right?
Usually there are several transformations that map world positions to the screen. The following ones are the most common ones:
World transformation: Can be applied to objects in order to realign them relative to other objects.
View transformation: After this transformation the camera is at the origin and looks in the z direction.
Projection transformation: Performs e.g. perspective transformations to simulate a real camera.
Viewport adaptation: This is basically a scaling and translation that maps the positions from the range [-1, 1] to the viewport. The viewport is usually the screen width and height.
gluLookAt is used to create a view transformation. You can imagine it as follows: Place the camera somewhere in your scene. Now transform the whole scene (with the camera) so that the camera is at the origin, faces in the z direction, and the y axis represents the up direction. This is a simple rigid body transformation that can be represented as an arbitrary rotation (with three degrees of freedom) and an arbitrary translation (with another three degrees of freedom). Every rigid body transformation can be split into separate rotations and translations. Even the sequence of evaluation can vary, if you choose the correct values. Transformations can be interpreted in different ways. I wrote a blog entry on that topic a while ago. If you're interested, take a look at it. Although it is for DirectX, the maths is pretty much the same for OpenGL. You just have to watch out for transposed matrices.
For the second question: Yes, you are right. You need to find the inverse transformation. This can be easily achieved with the inverse matrix. If you specified the view matrix V as follows:
V = R_xyz * T_xyz
then the inverse transformation V^-1 is
V^-1 = T_xyz^-1 * R_xyz^-1
However, this does not map screen positions to world positions, because there are further transformations going on (projection and viewport mapping). I hope that answers your questions.
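Because the view transform is rigid, that inverse does not need a general matrix inversion: transpose the rotation block and back-rotate the negated translation. A sketch with plain column-major 4x4 arrays (the function name is illustrative, not an OpenGL call):

// V = [R | t]  =>  V^-1 = [R^T | -R^T t], column-major OpenGL layout.
void invertRigid(const float v[16], float out[16])
{
    // Transpose the upper left 3x3 rotation block.
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            out[c * 4 + r] = v[r * 4 + c];

    // New translation: -R^T * t, with t stored in v[12], v[13], v[14].
    for (int r = 0; r < 3; ++r)
        out[12 + r] = -(out[r] * v[12] + out[4 + r] * v[13] + out[8 + r] * v[14]);

    out[3] = out[7] = out[11] = 0.0f;
    out[15] = 1.0f;
}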
Here is another interesting point. The view matrix is the inverse of the transformation that would place a camera model (initially at the origin, facing in the z direction) at the specified position. This relation is called system transformation vs. model transformation.

Confused about OpenGL transformations

In OpenGL there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving the objects or moving the camera.
I am guessing that glTranslate, glRotate, change objects, and gluLookAt changes the camera?
In OpenGL there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives (points, lines or triangles), per-vertex attributes, normalized device coordinates (NDC) and a viewport, to which the NDC are mapped.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of the attributes and usually a vector with 1 to 4 scalar elements within a local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program, running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way. But the usual approach is to apply a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogeneous transformation matrices. For a 3-dimensional vector, the homogeneous representation is a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become sort of the standard way of doing things. First the object-local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformation. With the vertex positions in eye space, several calculations, like illumination, can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye space coordinates are transformed into the so-called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL it used to be called the projection transformation.
After clip space the positions get "normalized" by their homogeneous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: matrix times column vector (each result component is the inner product of a matrix row with the vector)
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]):
vpos_viewport = (vpos_ndc.xy + (1, 1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: vector component wise multiplication
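Written out as a sketch in C++ (column-major matrices; the function names are illustrative), the recap above becomes:

// out = m · v for a column-major 4x4 matrix and a 4-component vector.
static void mul(const float m[16], const float v[4], float out[4])
{
    for (int r = 0; r < 4; ++r)
        out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
}

// Local space to viewport, following the steps above.
void localToViewport(const float MV[16], const float P[16], const float vpos_local[4],
                     float vpX, float vpY, float vpW, float vpH, float out[2])
{
    float vpos_eye[4], vpos_clip[4];
    mul(MV, vpos_local, vpos_eye); // vpos_eye = MV · vpos_local
    // ... eye space calculations (e.g. illumination) would go here ...
    mul(P, vpos_eye, vpos_clip);   // vpos_clip = P · vpos_eye

    // Perspective divide yields NDC in [-1, 1].
    float ndc_x = vpos_clip[0] / vpos_clip[3];
    float ndc_y = vpos_clip[1] / vpos_clip[3];

    // Viewport mapping.
    out[0] = (ndc_x + 1.0f) * vpW / 2.0f + vpX;
    out[1] = (ndc_y + 1.0f) * vpH / 2.0f + vpY;
}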
The OpenGL functions glRotate, glTranslate and glScale merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
Which of them the matrix manipulation functions act on can be set using glMatrixMode. Each of the matrix manipulating functions composes a new matrix by multiplying the transformation it describes on top of the selected matrix, thereby replacing it. The function glLoadIdentity replaces the current matrix with identity, glLoadMatrix replaces it with a user-defined matrix, and glMultMatrix multiplies a user-defined matrix on top of it.
So how does the modelview matrix then emulate both object placement and a camera? Well, as you already stated:
As you know, the same movement can be achieved by either moving the objects or moving the camera.
You cannot really distinguish between them. The usual approach is to split the object-local-to-eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view, described in fixed-function OpenGL by the modelview matrix. Now, since the order of transformations is
local to world (model matrix): vpos_world = M · vpos_local
world to eye (view matrix): vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M of the compound matrix MV is determined only by the order of operations in which you multiply onto the modelview matrix, and by the step at which you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations, and everything after that is model.
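In code the split typically looks like this (the object transform values and drawObject are placeholders):

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
// Everything multiplied on first acts as the view transform:
gluLookAt(0.0, 2.0, 5.0,  // eye position (example values)
          0.0, 0.0, 0.0,  // look-at point
          0.0, 1.0, 0.0); // up direction
// From here on everything is model transform:
glTranslatef(objX, objY, objZ);
glRotatef(objAngle, 0.0f, 0.0f, 1.0f);
drawObject();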
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack never was feature complete and no serious application actually used it. Most programs just glLoadMatrix-ed their self-calculated matrices and didn't bother with the OpenGL built-in matrix manipulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack got awkward to use, to say it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
OpenGL is a low-level API; there are no higher-level concepts like an "object" and a "camera" in the "scene", so there are only two matrix modes: MODELVIEW (a multiplication of the "camera" matrix by the "object" transformation) and PROJECTION (the projective transformation from eye space to post-perspective space).
Distinction between "Model" and "View" (object and camera) matrices is up to you. glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode(), so it depends on the matrix you're working on. OpenGL has 4 different types of matrices: GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR; any one of those functions can change any of those matrices. So, basically, you don't transform objects; you just manipulate different matrices to "fake" that effect.
Note that gluLookAt() is just a convenience function equivalent to a translation followed by some rotations; there's nothing special about it.
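To make that concrete, here is a sketch of the matrix gluLookAt effectively multiplies onto the stack: a rotation built from the orthonormal camera basis, followed by a translation by the negated eye position (plain C++; the small vector helpers are local to this sketch, not OpenGL API):

#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) { return { a.y * b.z - a.z * b.y,
                                              a.z * b.x - a.x * b.z,
                                              a.x * b.y - a.y * b.x }; }
static Vec3  normalize(Vec3 v)     { float l = std::sqrt(dot(v, v));
                                     return { v.x / l, v.y / l, v.z / l }; }

// Build the column-major matrix gluLookAt(eye, center, up) would apply.
void lookAt(Vec3 eye, Vec3 center, Vec3 up, float m[16])
{
    Vec3 f = normalize(sub(center, eye)); // forward
    Vec3 s = normalize(cross(f, up));     // right ("side")
    Vec3 u = cross(s, f);                 // recomputed up

    // The rotation rows are the camera basis; the translation moves the eye to the origin.
    m[0] =  s.x; m[4] =  s.y; m[8]  =  s.z; m[12] = -dot(s, eye);
    m[1] =  u.x; m[5] =  u.y; m[9]  =  u.z; m[13] = -dot(u, eye);
    m[2] = -f.x; m[6] = -f.y; m[10] = -f.z; m[14] =  dot(f, eye);
    m[3] = 0;    m[7] = 0;    m[11] = 0;    m[15] = 1;
}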
All transformations are transformations on objects. Even gluLookAt is just a transformation to transform the objects as if the camera was where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true: glTranslate and glRotate change the object coordinates before rendering, and gluLookAt changes the camera coordinates.

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest to think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imagine that the 3D scene is filmed by a camera and what is rendered on the screen are the images that were captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear on the captured image. There is little reason to change the view matrix while rendering a single frame, but such reasons do in fact exist (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. whether a perspective is applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed, or if you are rendering full screen and the resolution has changed, however only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for which you may want to change this matrix, but in most cases it's pretty much constant for the whole life of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This all makes a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
Now if you look at classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet it does not offer a model or a view matrix; it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix", since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and you don't use the classical render pipeline but customize everything with shaders (GLSL), there are no matrices available any longer at all; you have to define your own matrices. Still most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not use either three matrices, which means you don't have to permanently save and restore the model-view matrix, or a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for every single vertex rendered (after all, such a multiplication doesn't come for free either)?
So to summarize my question: What advantage does a combined model-view matrix together with a separate projection matrix have over three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply with positions/normals/etc. And therefore, the faster your vertex shaders.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
In most cases your shader will need the geometry in world or eye coordinates for shading, so you have to separate the projection matrix from the model and view matrices.
Making your shader multiply the geometry by two matrices hurts performance. Assuming each model has thousands (or more) of vertices, it is more efficient to compute the model view matrix on the CPU once and let the shader do one less matrix-vector multiplication.
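A sketch of that idea: compute MV = V · M once per object on the CPU and upload the product, so the shader performs a single matrix-vector multiply per vertex (uniform locations and the draw function are illustrative):

// out = a · b for column-major 4x4 matrices.
void mul4x4(const float a[16], const float b[16], float out[16])
{
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out[c * 4 + r] = a[r]     * b[c * 4]     + a[4 + r]  * b[c * 4 + 1]
                           + a[8 + r] * b[c * 4 + 2] + a[12 + r] * b[c * 4 + 3];
}

void drawObject(const float view[16], const float model[16], const float projection[16],
                int locModelView, int locProjection)
{
    float MV[16];
    mul4x4(view, model, MV); // one CPU multiply instead of one extra multiply per vertex
    glUniformMatrix4fv(locModelView,  1, GL_FALSE, MV);         // for eye space work (lighting)
    glUniformMatrix4fv(locProjection, 1, GL_FALSE, projection); // applied afterwards in the shader
    // ... issue the draw call for this object ...
}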
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase in GPU load. The two following screenshots show the two results; pay attention to the green and white layers fighting.

Preserving the original axis system while rotating with OpenGL

I'm implementing an arcball with OpenGL (in C++).
Say I have an object in the center of the axis system and I want to rotate it several times according to the original (world) axes.
But after the first rotation, the axes have changed and all further rotations go wrong.
Any ideas?
Thanks.
Supply the object with its own orientation axes (modelview matrix), and then multiply that by the rotation matrices. Check Wikipedia for info on how to construct rotation matrices.
I had to do the same thing myself in an OpenGL ES application, which I describe in a writeup here. The original crude approach read the current model view matrix and manipulated it to produce the desired effect:
GLfloat currentModelViewMatrix[16];
// Read the current modelview matrix back from OpenGL (stalls the pipeline).
glGetFloatv(GL_MODELVIEW_MATRIX, currentModelViewMatrix);
// Rotate about the world y axis expressed in the model's local frame
// (a row of the column-major rotation block).
glRotatef(xRotation, currentModelViewMatrix[1], currentModelViewMatrix[5], currentModelViewMatrix[9]);
glGetFloatv(GL_MODELVIEW_MATRIX, currentModelViewMatrix);
// Same for the world x axis.
glRotatef(yRotation, currentModelViewMatrix[0], currentModelViewMatrix[4], currentModelViewMatrix[8]);
This will work, but be aware that the two glGetFloatv() calls will slow your rendering by halting the pipeline. I've since replaced this code with calculations that I perform on my own internal copy of the model view matrix, then I simply write the internally manipulated model view matrix after each rotation. This removes the need to do the expensive matrix read operations.
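A sketch of that replacement, assuming you keep a persistent copy of the model view matrix and a rotation helper of your own (rotateAboutAxis is a placeholder for your matrix math, not an OpenGL call):

// Keep your own copy instead of reading it back from OpenGL each time.
static GLfloat currentModelViewMatrix[16];

// Manipulate the internal copy using your own math...
rotateAboutAxis(currentModelViewMatrix, xRotation,
                currentModelViewMatrix[1],
                currentModelViewMatrix[5],
                currentModelViewMatrix[9]);

// ...then simply write the manipulated matrix; no glGetFloatv, no pipeline stall.
glLoadMatrixf(currentModelViewMatrix);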
Add xAngle and yAngle to the current matrix.
Matrix.rotateM(matrix, 0, xAngleADD, matrix[1], matrix[5], matrix[9]);
Matrix.rotateM(matrix, 0, yAngleADD, matrix[0], matrix[4], matrix[8]);
gl.glMultMatrixf(matrix, 0);

Orientation in OpenGL

Could someone explain to me what the up, front and right vectors of an object are, and how they are used?
Are you referring to how vectors in object or model space are used? Each object or model has its own coordinate space. This is necessary since the points in the model will be relative to the model's origin. This makes it possible to work with arbitrary models in larger worlds. You would perform certain operations on the model (like rotation) before moving the model in the world (translation). If I understand your question correctly, you are referring to a set of vectors that define the model's position in the world. These up, front and right vectors would be what you would use to determine which way the model is facing or moving.
I hope this helps, if anything, to formulate your question a bit more.
This Gamedev question might be of help: glMultMatrix, how does it work?
Those vectors usually refer to world-space transformations of the local body axes of the model in question.
Usually a model is defined with respect to some local coordinate system whose origin is at the center of mass, centroid, or some other convenient location from which to construct the object's geometry. This local coordinate system has its own x, y, and z axes with x = [1, 0, 0]', y = [0, 1, 0]', and z = [0, 0, 1]'. The coordinates of each vertex in the model are then defined with respect to this local frame. Usually the origin is chosen so that the "forward" direction of the model is aligned with the local x, the "left" direction is aligned with the local y, and "up" is aligned with the local z (though any right-handed system will do).
The model is placed into the world via the modelview matrix in OpenGL. When the model's vertices are sent to the GPU, they are transformed from their local space (aka "object", "model" or "body" space) to world space by multiplying them by the modelview matrix. Ignoring scaling, the upper left 3x3 block of the modelview matrix is an orthonormal rotation matrix that defines the projection of the body axes into the world frame, assuming the model is placed at the world origin. The matrix is augmented into a 4x4 by adding the translation between the model and world origins in the upper right 3x1 block.
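Given that layout, the world-space right, up and front vectors can be read straight out of the matrix columns. A sketch assuming a column-major modelview with no scaling, following the axis convention above:

GLfloat mv[16];
glGetFloatv(GL_MODELVIEW_MATRIX, mv);

// Column-major: the first three columns are the transformed local x, y and z
// axes; the fourth column is the translation.
GLfloat front[3]    = { mv[0],  mv[1],  mv[2]  }; // local x ("forward")
GLfloat left[3]     = { mv[4],  mv[5],  mv[6]  }; // local y ("left")
GLfloat up[3]       = { mv[8],  mv[9],  mv[10] }; // local z ("up")
GLfloat position[3] = { mv[12], mv[13], mv[14] };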