Orientation in OpenGL - opengl

Could someone explain to me what the up, front and right vectors of an object are, and how they are used?

Are you referring to how vectors in object (or model) space are used? Each object or model has its own coordinate space. This is necessary since the points in the model are defined relative to the model's origin, which makes it possible to work with arbitrary models in larger worlds. You would perform certain operations on the model (like rotation) before moving the model in the world (translation). If I understand your question correctly, you are referring to a set of vectors that define the model's orientation in the world. These up, front and right vectors are what you would use to determine, for example, which way the model is facing or moving.
I hope this helps, if anything, to formulate your question a bit more.
This Gamedev question might be of help: glMultMatrix, how does it work?

Those vectors usually refer to world-space transformations of the local body axes of the model in question.
Usually a model is defined with respect to some local coordinate system whose origin is at the center of mass, centroid, or some other convenient location from which to construct the object's geometry. This local coordinate system has its own x, y, and z axes with x = [1, 0, 0]', y = [0, 1, 0]', and z = [0, 0, 1]'. The coordinates of each vertex in the model are then defined with respect to this local frame. Usually the axes are chosen so that the "forward" direction of the model is aligned with local x, the "left" direction is aligned with local y, and "up" is aligned with local z (though any right-handed system will do).
The model is placed into the world via the modelview matrix in OpenGL. When the model's vertices are sent to the GPU, they are transformed from their local space (aka "object" space or "model space" or "body space") to world space by multiplying them by the modelview matrix. Ignoring scaling, the upper left 3x3 block in the modelview matrix is an orthonormal rotation matrix that defines the projection of the body axes into the world frame, assuming the model is placed at the world origin. The modelview matrix is augmented into a 4x4 by adding the translation between the model and world origins in the upper right 3x1 block of the modelview matrix.
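For illustration, here is a minimal sketch of reading those world-space axes back out of a model matrix. It assumes GLM for the types (the answer does not prescribe a library) and assumes the matrix contains no scaling:

#include <glm/glm.hpp>

struct BodyAxes { glm::vec3 forward, left, up; };

// Each column of the upper-left 3x3 block is the world-space image of one
// local axis (GLM matrices are column-major, like OpenGL).
BodyAxes worldAxes(const glm::mat4& model)
{
    BodyAxes axes;
    axes.forward = glm::normalize(glm::vec3(model[0])); // local x -> "forward"
    axes.left    = glm::normalize(glm::vec3(model[1])); // local y -> "left"
    axes.up      = glm::normalize(glm::vec3(model[2])); // local z -> "up"
    return axes;
}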

Related

How do I maintain the relative transformation between 2 objects after changing the transformation of one of them, without a scenegraph?

Say I have 2 objects, a camera and a cube, both on the XZ plane; the cube has some arbitrary rotation and the camera is facing the cube.
Now suppose a transformation R is applied to the camera so that it has a new rotation and position.
I want to move the cube in front of the camera using a transformation R1, such that in the view it looks exactly as it did before R was applied, meaning the relative distance, rotation and scale between the 2 objects remain the same after both R and R1.
The following image gives the gist of the problem.
Assume that there's no scenegraph that we can use.
I've posed the problem mainly in 2D but I'm trying to solve it in 3D, so rotations can have all yaw, pitch and roll, and translations can be anywhere in 3D space.
EDIT:
I forgot to add what I have done so far.
I figured out how to maintain the relative distance between the camera and the cube: I can project the cube's position to get a screen point, then unproject that screen point with the new camera position to get the new world position.
However, for rotation I have tried the following:
I thought I could apply the same rotation as R in R1. This didn't work; it appears to work only when the rotation is about a single axis, not when it involves more than one axis.
I thought I could take the delta rotation between the camera and the cube, apply the camera's rotation to the cube and then multiply by the delta rotation. This also didn't work.
Let M and V be the model and view matrices before you move the camera, and M2 and V2 the matrices after you move the camera. To be clear: the model matrix transforms coordinates from object-local coordinates into world coordinates, and the view matrix transforms from world coordinates into camera (eye) coordinates. Consequently V*M*p transforms the position p into camera space.
For the position on the screen to stay constant, we need V*M*p = V2*M2*p to hold for all p (assuming the FOV doesn't change). Therefore V*M = V2*M2, or
M2 = inverse(V2)*V*M
If you apply the camera transformation on the right (V2 = V*R) then the above expression for M2 can be simplified:
M2 = inverse(R)*M
(that is, you apply inverse(R) on the left of the model matrix to compensate).
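Here is a minimal sketch of that compensation, assuming GLM for the matrix math (the question itself names no particular library):

#include <glm/glm.hpp>

// Recompute the cube's model matrix so that V2 * M2 == V * M, i.e. the cube
// looks exactly the same on screen after the camera has been moved.
glm::mat4 compensateModel(const glm::mat4& V,   // view matrix before the move
                          const glm::mat4& V2,  // view matrix after the move
                          const glm::mat4& M)   // cube's model matrix before the move
{
    return glm::inverse(V2) * V * M; // M2 = inverse(V2) * V * M
}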
Alternatively, ask yourself if you really need to keep the object's coordinates in the world reference frame. It may be easier not to apply the view matrix at all when rendering that object; that would effectively keep it relative to the camera at all times without any additional tweaks, and it would have better numerical stability too.

OpenGL: two viewpoints to understand model transformations

There are two viewpoints for understanding model transformations that I read about in the Red Book, 7th edition: [Grand, Fixed Coordinate System] and [Moving a Local Coordinate System].
My Question is:
What is the difference between the two viewpoints, and when should each of them be used in a given situation?
Additional context:
I'd like to give some context to help you, or you can just ignore the details below.
I understand these two viewpoints in the following way. Suppose I have the following code
(functions like glTranslatef are deprecated, replaced by a math library, but the theory should still be helpful):
//render the scene, using an orthogonal projection
void display( void )
{
glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT);
glLoadIdentity();
drawAixs(4.8f);//draw the x, y, z axes; 4.8 is the axis length
glRotatef(45.0,0.0,0.0,1.0);
glTranslatef(3.0,0.0,0.0);
glutSolidCube(2.0);
glutSwapBuffers();
}
From the local coordinate viewpoint, we read the calls from top to bottom: the cube's local frame is first rotated 45 degrees about its own z axis and then translated 3 units along its own (now rotated) x axis, so the current transformation matrix (CTM) is R·T.
From the global fixed coordinate viewpoint, we read the calls from bottom to top: the cube is first translated 3 units along the fixed world x axis and then rotated 45 degrees about the fixed world z axis, which yields the same CTM.
The two coordinate systems are just a convention. There is always one global coordinate system, but there may be many local systems. The notion of a local system is only a convention. A general transformation matrix M transforms vertices from one space to another:
v' = M * v
So that v is in one coordinate space and v' is in another. In case M describes position of an object, we say that it is putting vertices from local object coordinate frame to the global world coordinate frame. On the other hand, if it describes camera position and projection, it puts vertices from the world coordinate frame to another local eye coordinate frame.
A complex object with joints (such as a humanoid character, or a mechanism with hinges) may actually have several intermediate local spaces, where the transformations are chained, depending on the structure of the object's skeleton.
Usually one uses object space, where the vertices of the models are defined; world space, a global space in which coordinates of different objects can be related to each other; and camera (eye) space, from which the screen coordinates are ultimately derived. But with the introduction of OpenGL 3 shaders these spaces are completely arbitrary: you can write your shader so that it uses a single matrix that transforms the vertices from object space directly to screen space. So do not worry about coordinate frames; just focus on the task at hand - what you want to display and how the objects should move (relative to each other or relative to some common reference).
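As a sketch of that last point (assuming GLM for the matrix types, which the answer does not prescribe), you can collapse the whole chain into a single object-to-clip matrix and upload only that to your shader:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// One matrix taking object-space vertices straight to clip space.
// The near/far planes below are arbitrary example values.
glm::mat4 buildObjectToClip(const glm::mat4& model, const glm::mat4& view,
                            float fovYRadians, float aspect)
{
    glm::mat4 projection = glm::perspective(fovYRadians, aspect, 0.1f, 100.0f);
    return projection * view * model;
}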

Computer Vision: labelling camera pose

I am trying to create a dataset of images of objects at different poses, where each image is annotated with camera pose (or object pose).
If, for example, I have a world coordinate system and I place the object of interest at the origin, and I place the camera at a known position (x,y,z) and make it face the origin: given this information, how can I calculate the pose (rotation matrix) of the camera or of the object?
I had one idea, which was to have a reference coordinate, i.e. (0,0,z'), at which I can define the rotation of the object, i.e. its tilt, pitch and yaw. Then I can calculate the rotation from (0,0,z') to (x,y,z), which gives me a rotation matrix. The problem is how to combine the two rotation matrices.
BTW, I know the world position of the camera as I am rendering these with OpenGL from a CAD model as opposed to physically moving a camera around.
A homography matrix maps between homogeneous screen coordinates (i,j) and homogeneous world coordinates on a plane.
Homogeneous coordinates are normal coordinates with a 1 appended, so (3,4) in screen coordinates becomes (3,4,1) in homogeneous screen coordinates.
If you have a set of homogeneous screen coordinates S and their associated homogeneous world locations W, the 3x3 homography matrix H satisfies
W ~ H * S (up to a scale factor)
So it boils down to finding several features whose world coordinates you know and whose i,j positions you can identify in screen coordinates, then doing a "best fit" homography (OpenCV has a function findHomography for this).
While knowing the camera's xyz provides helpful information, it's not enough to fully constrain the equation, and you will have to generate more screen-world pairs anyway. Thus I don't think it's worth your time integrating the camera's position into the mix.
I have done a similar experiment here: http://edinburghhacklab.com/2012/05/optical-localization-to-0-1mm-no-problemo/
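For what it's worth, here is a minimal sketch of that fitting step using OpenCV's findHomography (the answer names the function; the wrapper around it is just an assumed setup):

#include <opencv2/calib3d.hpp>
#include <vector>

// Fit the 3x3 homography H such that worldPlanePts ~ H * screenPts
// (in homogeneous coordinates, up to scale), using RANSAC to reject outliers.
cv::Mat fitHomography(const std::vector<cv::Point2f>& screenPts,
                      const std::vector<cv::Point2f>& worldPlanePts)
{
    return cv::findHomography(screenPts, worldPlanePts, cv::RANSAC);
}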

Confused about OpenGL transformations

In OpenGL there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving the objects or moving the camera.
I am guessing that glTranslate and glRotate change objects, and gluLookAt changes the camera?
In OpenGL there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives (points, lines or triangles), per-vertex attributes, normalized device coordinates (NDC) and a viewport to which the NDC are mapped.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of those attributes, usually a vector with 1 to 4 scalar elements in a local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program, running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way, but the usual approach is to apply a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogeneous transformation matrices. For a 3-dimensional vector, the homogeneous representation is a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become sort of the standard way of doing things. First the object-local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformation. With the vertex positions in eye space, several calculations, like illumination, can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye space coordinates are transformed into the so-called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL it used to be called the projection transformation.
After clip space the positions get "normalized" by their homogeneous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: product of a matrix with a column vector
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC x and y coordinates are in the range [-1, 1]):
vpos_viewport = (vpos_ndc + (1,1,1,1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: component-wise vector multiplication
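The same chain written out as code, as a sketch (assuming GLM for the vector and matrix types; nothing here is specific to the question):

#include <glm/glm.hpp>

// Take a local-space vertex position all the way to viewport coordinates.
// viewport = (x, y, width, height).
glm::vec2 localToViewport(const glm::vec4& posLocal,
                          const glm::mat4& modelview,
                          const glm::mat4& projection,
                          const glm::vec4& viewport)
{
    glm::vec4 posEye  = modelview  * posLocal;           // local -> eye space
    glm::vec4 posClip = projection * posEye;             // eye   -> clip space
    glm::vec3 posNdc  = glm::vec3(posClip) / posClip.w;  // perspective divide -> NDC
    float vx = (posNdc.x + 1.0f) * 0.5f * viewport.z + viewport.x;
    float vy = (posNdc.y + 1.0f) * 0.5f * viewport.w + viewport.y;
    return glm::vec2(vx, vy);
}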
The OpenGL functions glRotate, glTranslate, glScale, glMatrixMode merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
Which of them the matrix manipulation functions act on can be set using glMatrixMode. Each of the matrix-manipulating functions composes a new matrix by multiplying the transformation it describes onto the selected matrix, thereby replacing it. The function glLoadIdentity replaces the current matrix with identity, glLoadMatrix replaces it with a user-defined matrix, and glMultMatrix multiplies a user-defined matrix on top of it.
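A small sketch of those calls in use (the matrix contents and angles are arbitrary example values):

GLfloat myMatrix[16] = {
    1, 0, 0, 0,
    0, 1, 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1 }; // column-major identity, stand-in for your own computed matrix

glMatrixMode(GL_PROJECTION);        // select which matrix the following calls act on
glLoadIdentity();                   // replace it with identity
glMatrixMode(GL_MODELVIEW);
glLoadMatrixf(myMatrix);            // replace the modelview with a user-defined matrix
glRotatef(30.0f, 0.0f, 1.0f, 0.0f); // compose a rotation on top of it
glMultMatrixf(myMatrix);            // compose a user-defined matrix on top of it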
So how does the modelview matrix then emulate both object placement and a camera? Well, as you already stated:
As you know, the same movement can be achieved by either moving objects or camera.
You cannot really discern between them. The usual approach is to split the object-local-to-eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the modelview transform, described in fixed-function OpenGL by the modelview matrix. Now, since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M in the compound matrix MV is determined only by the order of operations in which you multiply onto the modelview matrix, and by the step at which you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations, and everything after that is model.
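A small fixed-function sketch of that split (the camera placement and the object transform below are arbitrary example values):

void display(void)
{
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // "View" part: place the camera. gluLookAt is just a translation plus rotations.
    gluLookAt(0.0, 2.0, 5.0,   // eye
              0.0, 0.0, 0.0,   // center
              0.0, 1.0, 0.0);  // up

    // "Model" part: from here on, the transforms place this particular object.
    glPushMatrix();
    glTranslatef(1.0f, 0.0f, 0.0f);
    glRotatef(45.0f, 0.0f, 0.0f, 1.0f);
    glutSolidCube(1.0);
    glPopMatrix();

    glutSwapBuffers();
}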
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack never was feature complete and no serious application actually used it. Most programs just glLoadMatrix-ed their self-calculated matrices and didn't bother with the OpenGL built-in matrix manipulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack got awkward to use, to say it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
OpenGL is a low-level API; there are no higher-level concepts like an "object" or a "camera" in a "scene", so there are only two relevant matrix modes: MODELVIEW (a multiplication of the "camera" matrix by the "object" transformation) and PROJECTION (the projective transformation from eye space to post-perspective space).
Distinction between "Model" and "View" (object and camera) matrices is up to you. glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode(), so it depends on the matrix you're working on. OpenGL has 4 different types of matrices: GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR; any one of those functions can change any of those matrices. So, basically, you don't transform objects, you just manipulate different matrices to "fake" that effect.
Note that gluLookAt() is just a convenience function equivalent to a translation followed by some rotations; there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation that transforms the objects as if the camera were where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true: glTranslate and glRotate change the object coordinates before rendering, and gluLookAt changes the camera coordinates.

Derive euler rotation from model view matrix

I'm implementing a widget toolkit which requires some symbols to be aligned with the Y axis.
These symbols are drawn using the model view matrix of the parent objects, causing them to be rotated as well.
The solution would be to apply a counter-rotation before rendering those symbols, but then I have to keep track of every rotation (in my case only about the Z axis) in order to apply the correct counter-rotation. Sadly, any rotation "out of my control" causes a mismatch between the real model view matrix rotation and the "global" rotation I keep track of.
How would it be possible to derive Euler rotation angles directly from the model view matrix?
Just clear the rotational part (the upper left 3x3) of the modelview matrix to identity. This removes any rotation, yet retains the translation.
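A minimal sketch of that, assuming GLM and the usual OpenGL column-major layout (the helper name is just illustrative):

#include <glm/glm.hpp>

// Reset the upper-left 3x3 block of the modelview matrix to identity so the
// symbol keeps its translation but loses any inherited rotation (and scale).
glm::mat4 clearRotation(glm::mat4 modelview)
{
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 3; ++row)
            modelview[col][row] = (col == row) ? 1.0f : 0.0f;
    return modelview;
}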