There are two viewpoints for understanding model transformations, which I read about in the Red Book (7th edition): [Grand, Fixed Coordinate System] and [Moving a Local Coordinate System].
My question is:
What is the difference between the two viewpoints, and when should each be used in a given situation?
Additional context:
I'd like to give some context that may help; feel free to skip the details below.
I understood these two viewpoints in the following way. Suppose I have the following code (functions like glTranslatef are deprecated, replaced by math libraries, but the theory should still be helpful):
//render the scene, using an orthogonal projection
void display( void )
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glLoadIdentity();
    drawAxis(4.8f);                  //draw the x, y, z axes; 4.8 is the axis length
    glRotatef(45.0, 0.0, 0.0, 1.0);
    glTranslatef(3.0, 0.0, 0.0);
    glutSolidCube(2.0);
    glutSwapBuffers();
}
From the local coordinate view:
In this viewpoint, we read the transformations top-down, in code order: first the local frame is rotated 45° about its z axis, then the frame is translated 3 units along its own (already rotated) x axis, and finally the cube is drawn at the origin of the moved local frame.
And the current transformation matrix (CTM) is:
CTM = R(45°, z) · T(3, 0, 0)
From the global fixed coordinate view:
In this viewpoint, we read the same transformations bottom-up, in reverse code order: the cube is first translated 3 units along the global x axis, and the result is then rotated 45° about the global z axis. Both readings produce the same CTM and therefore the same image.
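A quick worked check that the two readings agree (applying the CTM to the cube's center, which starts at the local origin):

CTM · (0, 0, 0, 1) = R(45°, z) · (3, 0, 0, 1) = (3·cos 45°, 3·sin 45°, 0, 1) ≈ (2.12, 2.12, 0, 1)

Read left to right, this is "rotate the frame, then translate along the rotated local x"; read right to left, it is "translate along the global x, then rotate about the global z". The product, and hence the rendered cube, is identical.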
The two viewpoints are just conventions. There is always one global coordinate system, but there may be many local systems, and the notion of a local system is itself only a convention. A general transformation matrix M transforms vertices from one space to another:
v' = M * v
So that v is in one coordinate space and v' is in another. If M describes the position of an object, we say that it takes vertices from the local object coordinate frame to the global world coordinate frame. On the other hand, if it describes the camera position and projection, it takes vertices from the world coordinate frame to a local eye coordinate frame.
A complex object with joints (such as a humanoid character, or a mechanism with hinges) may actually have several intermediate local spaces, with the transformations chained according to the structure of the object's skeleton.
Usually one uses object space, where the vertices of the models are defined; world space, a global space in which coordinates can be related to each other; and camera space or eye space, which is the space relative to the viewer from which screen coordinates are derived. But with the introduction of OpenGL 3 shaders, these are completely arbitrary: you can write your shader so that it uses a single matrix that transforms the vertices from object space directly to screen space, as sketched below. So do not worry about coordinate frames; just focus on the task at hand: what you want to display and how the objects should move (relative to each other or relative to some common reference).
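A hedged sketch of that last point (the uMVP uniform name and the mat4/mat4_mul helpers are illustrative, not from any particular library): compute one object-to-clip matrix on the CPU and let the vertex shader apply it in a single multiplication.

/* vertex shader: one matrix straight from object space to clip space */
const char *vs_src =
    "#version 330 core\n"
    "layout(location = 0) in vec4 aPos;\n"
    "uniform mat4 uMVP;\n"
    "void main() { gl_Position = uMVP * aPos; }\n";

/* CPU side: mat4 and mat4_mul are assumed to come from your math library
 * (e.g. cglm or GLM); only the gl* calls below are core OpenGL */
void upload_mvp(GLuint program, mat4 projection, mat4 view, mat4 model)
{
    mat4 mvp = mat4_mul(projection, mat4_mul(view, model));
    glUniformMatrix4fv(glGetUniformLocation(program, "uMVP"),
                       1, GL_FALSE, (const GLfloat *)&mvp);
}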
Related
Quoting "https://learnopengl.com/Getting-started/Coordinate-Systems"
Next we need to create a view matrix. We want to move slightly backwards in the scene so the object becomes visible (when in world space we're located at the origin (0,0,0)). To move around the scene, think about the following:
This confused me, because before I knew any of these space concepts, I could simply draw a triangle whose vertices all have a z coordinate of 0, and I could still see it (if I were located at (0,0,0), I would not expect to be able to see it).
One of the aspects that seems to confuse a lot of beginners in OpenGL (or computer graphics in general) is that there really is no such thing as a camera.
The effects of a camera are implemented by simple geometric transformations. The effect of perspective, i.e. lines vanishing in the distance, is implemented through what's called the "homogeneous division": the vectors used in 3D graphics actually contain 4 values, the 4th usually being 1. This allows translations and rotations to be coalesced into a single 4×4 transformation matrix.
What it also enables is to have the transformed Z value mapped into the 4th coordinate, W.
As a last step, which is in fact hardwired, all the coordinates of the transformed vector are divided by the W element. Hence, if you have W values other than 1, this creates a scaling on the X and Y components that looks like vanishing lines.
To enable this, the transformation matrix must have a nonzero element in the W-row, Z-column, in order to get values other than 1 in the W element.
The default identity transformation keeps the transformed W at 1, hence no homogeneous divide happens.
Last but not least, what's visible is everything inside the clip-space volume, and with the identity transformation that's the volume [-1;1]³.
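A minimal worked example of that divide (the matrix P is a generic perspective matrix; the -1 in its W-row, Z-column is the only assumption here):

(x', y', z', w') = P · (x, y, z, 1) = (x, y, a·z + b, -z)

After the hardwired divide by w' = -z:

(x'/w', y'/w', z'/w') = (x/-z, y/-z, (a·z + b)/-z)

A point twice as far away (z twice as negative, with the camera looking down -Z) ends up with x and y halved on screen, which is exactly the vanishing-lines effect.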
I am reading a book about 3D concepts and OpenGL. The book always talks about world space, eye space, and so on.
What exactly is a world inside the computer monitor screen?
What is the world space?
What is eye space? Is it synonymous to projection?
World space
World space is the (arbitrarily chosen) frame of reference in which everything within the world is located in absolute coordinates.
Local space
Local space is space relative to some other frame of reference, with coordinates given relative to that local frame.
For example, the mesh of a model will be constructed in relation to a coordinate system local to the model. When you move the model around the world, the positions of the model's points relative to each other don't change, but their positions within the world do.
Hence there exists a model-to-world transformation from local to world space.
Eye (or view) space
Eye (or view) space is the world as seen by the viewer, i.e. all the positions of things in the world are no longer in relation to the (arbitrary) world coordinate system, but in relation to the viewer.
View space is somewhat special, because it is not arbitrarily chosen. The coordinate (0, 0, 0) in view space is the position of the viewer, and a certain direction (usually parallel to Z) is the direction of viewing.
So there exists a transformation world-to-view. Now because the viewer is always at the origin of the view space, setting the viewpoint is done by defining the world-to-view transformation.
Since world space is of little use for the purposes of rendering, you normally coalesce the model-to-world and world-to-view transformations into a single model-to-view transformation.
Note that eye (or view) space is not the projection. Projection happens by a separate projection transform that transforms view-to-clip space.
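A minimal fixed-function sketch of that coalescing (the specific camera and object values are made up for illustration): gluLookAt defines the world-to-view part, and everything multiplied on afterwards is model-to-world, so the modelview matrix ends up holding the combined model-to-view transformation.

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
/* world-to-view: viewer at (0, 2, 10), looking at the origin, y is up */
gluLookAt(0.0, 2.0, 10.0,  0.0, 0.0, 0.0,  0.0, 1.0, 0.0);
/* model-to-world: place this particular object in the world */
glTranslatef(3.0f, 0.0f, 0.0f);
glRotatef(45.0f, 0.0f, 0.0f, 1.0f);
/* the modelview matrix now holds view * model, i.e. model-to-view */
glutSolidCube(2.0);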
You should read this: http://www.opengl-tutorial.org/beginners-tutorials/tutorial-3-matrices/
That tutorial uses the term "camera space" instead of "eye space", but they are the same.
In OpenGL there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving the objects or moving the camera.
I am guessing that glTranslate and glRotate change objects, and gluLookAt changes the camera?
In OpenGL there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives (points, lines, or triangles), per-vertex attributes, normalized device coordinates (NDC), and a viewport to which the NDC are mapped.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of those attributes, usually a vector with 1 to 4 scalar elements in a local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens in a small program running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way, but the usual approach is to apply a number of nonsingular linear transformations.
Such transformations can be expressed in terms of homogeneous transformation matrices. For a 3-dimensional vector, the homogeneous representation is a vector with 4 elements, where the 4th element is 1.
In computer graphics, a 3-fold transformation pipeline has become the standard way of doing things. First the object-local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformation. With the vertex positions in eye space, several calculations, such as illumination, can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye-space coordinates are transformed into so-called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL it used to be called the projection transformation.
After clip space, the positions get "normalized" by their homogeneous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: matrix-vector product (matrix applied to a column vector)
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]):
vpos_viewport.xy = (vpos_ndc.xy + (1, 1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: component-wise vector multiplication
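These last two steps are simple enough to write out; here is a runnable sketch in plain C (the small vec4 type is made up for this example, and the matrix products above are left to the caller):

typedef struct { float x, y, z, w; } vec4;

/* clip space -> NDC: the hardwired homogeneous divide */
static vec4 clip_to_ndc(vec4 clip)
{
    vec4 ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w, 1.0f };
    return ndc;
}

/* NDC -> viewport: map [-1, 1] to pixel coordinates */
static void ndc_to_viewport(vec4 ndc,
                            float vx, float vy, float vw, float vh,
                            float *px, float *py)
{
    *px = (ndc.x + 1.0f) * vw / 2.0f + vx;
    *py = (ndc.y + 1.0f) * vh / 2.0f + vy;
}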
The OpenGL functions glRotate, glTranslate, and glScale merely manipulate the transformation matrices, while glMatrixMode selects which matrix they act on. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
Each of the matrix-manipulating functions composes a new matrix by multiplying the transformation it describes onto the currently selected matrix, thereby replacing it. glLoadIdentity replaces the current matrix with the identity, glLoadMatrix replaces it with a user-defined matrix, and glMultMatrix multiplies a user-defined matrix onto it.
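For instance, glTranslatef is equivalent to multiplying a hand-built translation matrix onto the current matrix with glMultMatrixf (OpenGL stores matrices column-major, so the translation sits in the last column, elements 12 to 14 of the array):

/* two equivalent ways to compose a translation by (3, 0, 0)
   onto the currently selected matrix */
glTranslatef(3.0f, 0.0f, 0.0f);

GLfloat T[16] = {            /* column-major 4x4 translation matrix */
    1.0f, 0.0f, 0.0f, 0.0f, /* column 0 */
    0.0f, 1.0f, 0.0f, 0.0f, /* column 1 */
    0.0f, 0.0f, 1.0f, 0.0f, /* column 2 */
    3.0f, 0.0f, 0.0f, 1.0f  /* column 3: the translation part */
};
glMultMatrixf(T);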
So how does the modelview matrix then emulate both object placement and a camera? Well, as you already stated:
As you know, the same movement can be achieved by either moving the objects or moving the camera.
You cannot really distinguish between them. The usual approach is to split the object-local-to-eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view transformation, described in fixed-function OpenGL by the modelview matrix. Now, since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M of the compound matrix MV is only determined by the order of operations in which you multiply onto the modelview matrix, and at which step you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined (at this point it is the identity, i.e. the viewer sits at the world origin). But at some point you'll start applying model transformations, and everything after that is model.
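A minimal sketch of that split (the transform values are made up for illustration): the first two calls play the role of V, and each glPushMatrix/glPopMatrix pair then scopes one object's M on top of it.

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.0f, 0.0f, -10.0f);   /* "view": back the camera off     */
glRotatef(20.0f, 1.0f, 0.0f, 0.0f); /* still "view": tilt the camera   */

glPushMatrix();                     /* from here on: "model" of cube A */
glTranslatef(-2.0f, 0.0f, 0.0f);
glutSolidCube(1.0);
glPopMatrix();

glPushMatrix();                     /* "model" of cube B, same view    */
glTranslatef(2.0f, 0.0f, 0.0f);
glutSolidCube(1.0);
glPopMatrix();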
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack never was feature-complete, and no serious application actually used it. Most programs just glLoadMatrix-ed their self-calculated matrices and didn't bother with OpenGL's built-in matrix manipulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack became awkward to use, to put it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
OpenGL is a low-level API; there are no higher-level concepts like an "object" or a "camera" in a "scene", so there are only two matrix modes: MODELVIEW (a multiplication of the "camera" matrix by the "object" transformation) and PROJECTION (the projective transformation from eye space to post-perspective clip space).
The distinction between "model" and "view" (object and camera) matrices is up to you. The glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between modelview and projection).
Those functions multiply onto the current matrix selected by glMatrixMode(), so the effect depends on which matrix you're working on. OpenGL has 4 different types of matrices: GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR; any of those functions can change any of those matrices. So, basically, you don't transform objects; you just manipulate the different matrices to "fake" that effect.
Note that gluLookAt() is just a convenience function equivalent to a translation combined with some rotations; there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation that transforms the objects as if the camera were where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true: glTranslate and glRotate change the object's coordinates before rendering, and gluLookAt changes the camera's.
I've been writing a basic 2D game engine in OpenGL/C++ and learning everything as I go along. I'm still rather confused about defining vertices and their "position"; that is, I'm still trying to understand OpenGL's vertex-to-pixels conversion mechanism. Can it be explained briefly, or can someone point to an article or something that will explain it? Thanks!
This is rather basic knowledge that your favourite OpenGL learning resource should teach you as one of the first things. But anyway the standard OpenGL pipeline is as follows:
1. The vertex position is transformed from object space (local to some object) into world space (relative to some global coordinate system). This transformation specifies where your object (to which the vertices belong) is located in the world.
2. Now the world-space position is transformed into camera/view space. This transformation is determined by the position and orientation of the virtual camera through which you see the scene. In OpenGL these two transformations are actually combined into one, the modelview matrix, which directly transforms your vertices from object space to view space.
3. Next the projection transformation is applied. Whereas the modelview transformation should consist only of affine transformations (rotation, translation, scaling), the projection transformation can be a perspective one, which basically distorts the objects to produce a real perspective view (with farther-away objects appearing smaller). In your case of a 2D view, however, it will probably be an orthographic projection, which does nothing more than a translation and scaling. This transformation is represented in OpenGL by the projection matrix.
4. After these 3 (or 2) transformations (and the following perspective division by the w component, which actually produces the perspective distortion, if any), what you have are normalized device coordinates. This means after these transformations the coordinates of the visible objects should be in the range [-1,1]. Everything outside this range is clipped away.
5. In a final step the viewport transformation is applied, and the coordinates are transformed from the [-1,1] range into the [0,w]x[0,h]x[0,1] cube (assuming a glViewport(0, 0, w, h) call), which are the vertex's final positions in the framebuffer and therefore its pixel coordinates.
When using a vertex shader, steps 1 to 3 are actually done in the shader and can therefore be done in any way you like, but usually one conforms to this standard modelview -> projection pipeline, too.
The main thing to keep in mind is that after the modelview and projection transforms, every vertex with coordinates outside the [-1,1] range will be clipped away. So the [-1,1] box determines your visible scene after these two transformations.
So, from your question, I assume you want to use a 2D coordinate system with units of pixels for your vertex coordinates and transformations? In this case this is best done by using glOrtho(0.0, w, 0.0, h, -1.0, 1.0), with w and h being the dimensions of your viewport. This basically counters the viewport transformation and therefore transforms your vertices from the [0,w]x[0,h]x[-1,1] box into the [-1,1] box, which the viewport transformation then transforms back to the [0,w]x[0,h]x[0,1] box.
These have been quite general explanations without mentioning that the actual transformations are done by matrix-vector multiplications and without talking about homogeneous coordinates, but they should have explained the essentials. This documentation of gluProject might also give you some insight, as it actually models the transformation pipeline for a single vertex. But in that documentation they actually forgot to mention the division by the w component (v" = v' / v'(3)) after the v' = P x M x v step.
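Here is a self-contained sketch of that per-vertex pipeline, including the w division that the documentation omits (the vec4 type and the mul helper are made up for illustration; matrices are column-major, as OpenGL stores them):

typedef struct { float x, y, z, w; } vec4;

/* column-major 4x4 matrix times a column vector */
static vec4 mul(const float m[16], vec4 v)
{
    vec4 r;
    r.x = m[0]*v.x + m[4]*v.y + m[8]*v.z  + m[12]*v.w;
    r.y = m[1]*v.x + m[5]*v.y + m[9]*v.z  + m[13]*v.w;
    r.z = m[2]*v.x + m[6]*v.y + m[10]*v.z + m[14]*v.w;
    r.w = m[3]*v.x + m[7]*v.y + m[11]*v.z + m[15]*v.w;
    return r;
}

/* what gluProject does for one vertex; vp = {x, y, width, height} */
static vec4 project(const float P[16], const float MV[16], vec4 v,
                    const float vp[4])
{
    vec4 c = mul(P, mul(MV, v));                        /* v'  = P x M x v */
    vec4 n = { c.x / c.w, c.y / c.w, c.z / c.w, 1.0f }; /* v'' = v' / v'.w */
    vec4 w = { (n.x + 1.0f) * vp[2] / 2.0f + vp[0],     /* viewport map    */
               (n.y + 1.0f) * vp[3] / 2.0f + vp[1],
               (n.z + 1.0f) / 2.0f, 1.0f };
    return w;
}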
EDIT: Don't forget to look at the first link in epatel's answer, which explains the transformation pipeline a bit more practically and in more detail.
It is called transformation.
Vertices are specified in 3D coordinates, which are transformed into viewport coordinates (your window's view). This transformation can be set up in various ways. An orthogonal transformation may be the easiest to understand as a starter.
http://www.songho.ca/opengl/gl_transform.html
http://www.opengl.org/wiki/Vertex_Transformation
http://www.falloutsoftware.com/tutorials/gl/gl5.htm
Firstly, be aware that OpenGL does not use standard pixel coordinates. I mean that for a particular resolution, e.g. 800x600, you don't have horizontal coordinates in the range 0-799 or 1-800 stepped by one. Instead you have coordinates ranging from -1 to 1, which are later sent to the graphics card's rasterizing unit and then mapped to the particular resolution.
I omitted one step here: before all that, you have a ModelViewProjection matrix (or a ViewProjection matrix in some simple cases), which casts the coordinates you use onto a projection plane. Its default use is to implement a camera, which converts the 3D space of the world (View places the camera in the right position, and Projection casts 3D coordinates onto the screen plane; in ModelViewProjection there is also the step of placing a model in the right place in the world).
Another use (and you can use the Projection matrix this way to achieve what you want) is to use these matrices to convert one range of resolutions to another.
And there's a trick you will need. You should read about the ModelViewProjection matrix and the camera in OpenGL if you want to get serious. But for now, I'll tell you that with the proper matrix you can map your own coordinate system (e.g. using ranges 0-799 horizontally and 0-599 vertically) to the standardized -1 to 1 range. That way you will not see that the underlying OpenGL API uses its own -1 to 1 system.
The easiest way to achieve this is the glOrtho function. Here's a link to the documentation:
http://www.opengl.org/sdk/docs/man/xhtml/glOrtho.xml
This is an example of proper usage:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrtho(0, 800, 600, 0, 0, 1);
glMatrixMode(GL_MODELVIEW);
Now you can use your own modelview matrix, e.g. for translating (moving) objects, but don't touch your projection matrix. This code should be executed before any drawing commands. (In fact, it can run right after initializing OpenGL if you won't use 3D graphics.)
And here's a working example: http://nehe.gamedev.net/tutorial/2d_texture_font/18002/
Just draw your figures instead of drawing text. And there is another thing: glPushMatrix and glPopMatrix for the chosen matrix (in this example the projection matrix). You won't need those until you combine 3D with 2D rendering.
And you can still use the model matrix (e.g. for placing tiles somewhere in the world) and the view matrix (e.g. for zooming the view, or scrolling through the world; in this case your world can be larger than the resolution, and you can crop the view with simple translations, as sketched below).
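A minimal sketch of that kind of 2D "camera" (camX, camY, tiles, and drawTile are hypothetical names, not from the tutorial): the view is just a translation applied before the per-tile model transforms.

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(-camX, -camY, 0.0f);      /* view: scroll the whole world */

for (int i = 0; i < tileCount; ++i) {  /* model: place each tile       */
    glPushMatrix();
    glTranslatef(tiles[i].x, tiles[i].y, 0.0f);
    drawTile(&tiles[i]);               /* hypothetical drawing helper  */
    glPopMatrix();
}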
After looking at my answer I see it's a little chaotic, but if you're confused, just read about the Model, View, and Projection matrices and try the glOrtho example. If you're still confused, feel free to ask.
MSDN has a great explanation. It may be in terms of DirectX but OpenGL is more-or-less the same.
Google for "opengl rendering pipeline". The first five articles all provide good expositions.
The key transition from vertices to pixels (actually fragments, but you won't be too far off if you think "pixels") is the rasterization stage, which occurs after all vertices have been transformed from world coordinates to screen coordinates and clipped.
Could someone explain to me what the up, front, and right vectors of an object are, and how they are used?
Are you referring to how vectors in object or model space are used? Each object or model has its own coordinate space. This is necessary since the points in the model are defined relative to the model's origin, which makes it possible to work with arbitrary models in larger worlds. You would perform certain operations on the model (like rotation) before moving the model in the world (translation). If I understand your question correctly, you are referring to the set of vectors that define the model's orientation in the world. These up, front, and right vectors would be what you use to determine which way the model is facing or moving.
I hope this helps, if only to help you formulate your question a bit more precisely.
This Gamedev question might be of help glMultMatrix, how does it work?
Those vectors usually refer to world-space transformations of the local body axes of the model in question.
Usually a model is defined with respect to some local coordinate system whose origin is at the center of mass, the centroid, or some other convenient location from which to construct the object's geometry. This local coordinate system has its own x, y, and z axes with x = [1, 0, 0]', y = [0, 1, 0]', and z = [0, 0, 1]'. The coordinates of each vertex in the model are then defined with respect to this local frame. Usually the origin is chosen so that the "forward" direction of the model is aligned with local x, the "left" direction with local y, and "up" with local z (though any right-handed system will do).
The model is placed into the world via the modelview matrix in OpenGL. When the model's vertices are sent to the GPU, they are transformed from their local space (aka "object" space or "model space" or "body space") to world space by multiplying them by the modelview matrix. Ignoring scaling, the upper left 3x3 block in the modelview matrix is an orthonormal rotation matrix that defines the projection of the body axes into the world frame, assuming the model is placed at the world origin. The modelview matrix is augmented into a 4x4 by adding the translation between the model and world origins in the upper right 3x1 block of the modelview matrix.
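A minimal sketch of reading those vectors back out of a matrix (assuming the column-major layout OpenGL uses and no scaling; the vec3 type is made up for illustration). Each of the first three columns of the model-to-world matrix is one local body axis expressed in world coordinates, following the forward = x, left = y, up = z convention above:

typedef struct { float x, y, z; } vec3;

/* m: column-major 4x4 model-to-world matrix with no scaling */
void body_axes(const float m[16], vec3 *forward, vec3 *left, vec3 *up)
{
    *forward = (vec3){ m[0], m[1], m[2]  }; /* local x in world space */
    *left    = (vec3){ m[4], m[5], m[6]  }; /* local y in world space */
    *up      = (vec3){ m[8], m[9], m[10] }; /* local z in world space */
}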