OpenGL ModelView Confusion - opengl

I am using OpenGL 2.0 with the fixed-function pipeline. It seems that in OpenGL 2.0
vertices are pushed through the modelview stack, which is basically (view matrix * model matrix). If the modelview matrix has an identity matrix loaded, the model matrix doesn't really provide any transformation: an object, say a cube, simply ends up centered at (0,0,0). The camera itself would also be located at (0,0,0), looking down the negative z axis.
So if I use a translate call on the cube, am I really moving the cube in eye space?
From what I learned the generalized viewing pipeline is
Vertices -> Modelling Matrix -> World Space, Objects in World Space -> Viewing Matrix -> Eye Space, Eye Space Objects -> Projection Matrix -> Clip Space, then normalization etc.
So if I do

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0, 0, -5);   // 5 units down the negative z axis
drawCube();

would it move the cube from the center of eye space according to the translation?
I think my confusion is that I don't know what is loaded onto the modelview matrix stack when the program starts. I assume it is an identity matrix, which puts everything at the centre of eye space.

In a newly created OpenGL context all matrices are identity, i.e. vectors go through untransformed. In fixed-function OpenGL the vertex transformation skips the "world" step, collapsing object→world and world→eye into a single transformation. This is no big deal, however: lighting calculations are easiest in eye space anyway, and since fixed-function OpenGL doesn't know shaders (except as an extension), there's no need to do things in world space.
glTranslate, glRotate, glScale don't transform objects. They manipulate the matrix on top of the stack currently active for manipulation. So ultimately they do contribute to the transformation, but at the vertex (position) level rather than the object level.
it would move the cube from center of the eye space according to the translation?
Indeed, but what's "moved" (actually transformed) are the cube's vertices; and it may not be just a translation.
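Put concretely, a minimal fixed-function sketch of the sequence from the question (drawCube() is a hypothetical helper that issues the cube's vertices):

#include <GL/gl.h>

void drawCube(); // hypothetical helper that issues the cube's vertices

void drawScene()
{
    glMatrixMode(GL_MODELVIEW);      // select the modelview stack for manipulation
    glLoadIdentity();                // identity: the "camera" sits at the origin, looking down -Z
    glTranslatef(0.0f, 0.0f, -5.0f); // multiplies a translation onto the current matrix
    drawCube();                      // every vertex v is transformed as MV * v,
                                     // so the cube ends up 5 units down the -Z axis
}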
EDIT due to comment
The key thing to understand is transformation composition. First and foremost a transformation is a mapping
T: R^4 -> R^4, v |-> v' = T(v)
There's a subset of transformations, namely the linear transformations, which can be represented by matrix multiplication:
v' = T * v
One can compose transformations, i.e. v |-> v'' = (T'∘T)(v). Again, for the subset of linear transformations written in matrix form, you can expand this to
v' = T * v
v'' = T' * v'
=>
v'' = T' * T * v
Now let V denote the viewing transform and W the world transform. So the total transform is
M = V * W
The order of matrix multiplication matters (i.e. matrix multiplication is not commutative):
∃ M, N ∊ {Matrices}: M * N ≠ N * M
The view transform V transforms the whole world in such a way that the camera in the world ends up at the origin, looking down the negative Z axis. So let V' be the transform that moves "the camera" from the origin to its place in the world; the inverse of that movement moves the world so that the camera comes to rest at the origin. So
V = inv(V')
Last but not least given some matrices A, B, C for which holds
A = B * C
then
inv(A) = inv(C) * inv(B)
i.e. the order of operations is reversed. So if you "position" your "camera" using inverted OpenGL matrix operations, the order of those operations must be reversed as well. And since the overall order of operations matters, the viewing transformations must happen before the model transformations.
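For illustration, a sketch of that ordering in fixed-function calls (all numbers are hypothetical, and drawCube() is a placeholder for whatever issues the geometry): first the inverted camera placement, in reversed order, then the model transforms.

#include <GL/gl.h>

void drawCube(); // hypothetical helper issuing the object's vertices

void drawFrame()
{
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // Viewing transform first. Suppose the camera was placed in the world by
    // V' = translate(0, 1.5, 10) * rotateY(30 deg); then the view transform is
    // V = inv(V') = rotateY(-30 deg) * translate(0, -1.5, -10), i.e. the inverted
    // operations in reversed order.
    glRotatef(-30.0f, 0.0f, 1.0f, 0.0f);
    glTranslatef(0.0f, -1.5f, -10.0f);

    // Model transforms afterwards: place the object in the world.
    glTranslatef(2.0f, 0.0f, 0.0f);
    glRotatef(45.0f, 0.0f, 0.0f, 1.0f);
    drawCube();
}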

Related

How do I maintain the relative transformation between 2 objects after changing the transformation of one of them, without a scene graph?

Say I have 2 objects, a camera and a cube, both on the XZ plane; the cube has some arbitrary rotation, and the camera is facing the cube.
Now a transformation R is applied to the camera so that it has a new rotation and position.
I want to move the cube in front of the camera using a transformation R1, such that in the view it looks exactly as it did before R was applied, meaning the relative distance, rotation and scale between the 2 objects remain the same after both R and R1.
Assume that there's no scenegraph that we can use.
I've posed the problem mainly in 2D but I'm trying to solve it in 3D, so rotations can have all yaw, pitch and roll, and translations can be anywhere in 3D space.
EDIT:
I forgot to add what I have done so far.
I figured out how to maintain the relative distance between the camera and the cube: I can project the cube's position to get a world-to-screen point, then unproject that screen point with the new camera position to get the new world position.
For rotation, however, I have tried the following:
I thought I could apply the same rotation as R in R1. This didn't work; it appears to work if the rotation happens around only one axis, but not if it happens around more than one axis.
I thought I could take the delta rotation between the camera and the cube, apply the camera's rotation to the cube and then multiply by the delta rotation. This also didn't work.
Let M and V be the model and view matrices before you move the camera, and M2 and V2 the matrices after you move the camera. To be clear: the model matrix transforms coordinates from object-local coordinates into world coordinates; the view matrix transforms from world coordinates into camera (eye) coordinates. Consequently V*M*p transforms the position p into camera space.
For the position on the screen to stay constant, we need V*M*p = V2*M2*p to be true for all p (that's assuming that the FOV doesn't change). Therefore V*M = V2*M2, or
M2 = inverse(V2)*V*M
If you apply the camera transformation on the right (V2 = V*R) then the above expression for M2 can be simplified:
M2 = inverse(R)*M
(that is you apply inverse(R) on the left of the model matrix to compensate).
Alternatively, ask yourself if you really need to keep the object coordinates in the world reference frame. It may be easier not to apply the view matrix when rendering that object at all; that would effectively keep it relative to the camera at all times without any additional tweaks. That would have better numerical stability too.
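A minimal sketch of that bookkeeping, assuming GLM for the matrix math (V, M, V2 and R are whatever matrices your application already tracks):

#include <glm/glm.hpp>

// General case: choose M2 so that V2 * M2 * p == V * M * p for every p.
glm::mat4 keepRelative(const glm::mat4& V, const glm::mat4& M, const glm::mat4& V2)
{
    return glm::inverse(V2) * V * M;
}

// Special case from above: if the camera moved by R on the right (V2 = V * R),
// then inverse(V2) * V = inverse(R), so M2 = inverse(R) * M.
glm::mat4 keepRelativeGivenR(const glm::mat4& R, const glm::mat4& M)
{
    return glm::inverse(R) * M;
}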

What is the role of gl_Position.w in Vulkan?

The gl_Position variable output from a GLSL vertex shader must have 4 coordinates. In OpenGL, the w coordinate seems to be used to scale the vector, by dividing the other coordinates by it. What is the purpose of w in Vulkan?
Shaders and projections in Vulkan behave exactly the same as in OpenGL. There are small differences in depth ranges ([-1, 1] in OpenGL, [0, 1] in Vulkan) or in the origin of the coordinate system (lower-left in OpenGL, upper-left in Vulkan), but the principles are exactly the same. The hardware is still the same and it performs calculations in the same way both in OpenGL and in Vulkan.
4-component vectors serve multiple purposes:
Different transformations (translation, rotation, scaling) can be represented in the same way, with 4x4 matrices.
Projection can also be represented with a 4x4 matrix.
Multiple transformations can be combined into one 4x4 matrix.
The .w component you mention is used during perspective projection.
All of this can be done with 4x4 matrices, and thus we need 4-component vectors (so they can be multiplied by 4x4 matrices). Again, I write about this because the above rules apply both to OpenGL and to Vulkan.
So as for the purpose of the .w component of the gl_Position variable - it is exactly the same in Vulkan. It is used to scale the position vector - during perspective calculations (projection matrix multiplication) the original depth is modified by the original .w component and stored in the .z component of the gl_Position variable. Additionally, the original depth is also stored in the .w component. After that (as a fixed-function step) the hardware performs the perspective division and divides the position stored in the gl_Position variable by its .w component.
With an orthographic projection the steps performed by the hardware are exactly the same, but the values used for the calculations are different. So the perspective division step is still performed by the hardware, but it does nothing (the position is divided by 1.0).
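A small CPU-side sketch of both cases, assuming GLM with its default OpenGL conventions: the perspective matrix copies the (negated) eye-space z into .w, while the orthographic matrix leaves .w at 1.0, so the hardware's divide becomes a no-op.

#include <cstdio>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main()
{
    glm::vec4 eyePos(1.0f, 2.0f, -10.0f, 1.0f); // a point 10 units in front of the camera

    glm::mat4 persp = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, 0.1f, 100.0f);
    glm::mat4 ortho = glm::ortho(-8.0f, 8.0f, -4.5f, 4.5f, 0.1f, 100.0f);

    glm::vec4 clipP = persp * eyePos; // clipP.w == 10: the original depth ends up in .w
    glm::vec4 clipO = ortho * eyePos; // clipO.w == 1: the later divide changes nothing

    glm::vec3 ndcP = glm::vec3(clipP) / clipP.w; // the fixed-function perspective divide
    glm::vec3 ndcO = glm::vec3(clipO) / clipO.w;

    std::printf("perspective : w = %.1f  ndc = (%.3f, %.3f, %.3f)\n", clipP.w, ndcP.x, ndcP.y, ndcP.z);
    std::printf("orthographic: w = %.1f  ndc = (%.3f, %.3f, %.3f)\n", clipO.w, ndcO.x, ndcO.y, ndcO.z);
}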
gl_Position is a homogeneous coordinate. The w component plays a role in perspective projection.
The projection matrix describes the mapping from the 3D points of a scene to 2D points on the viewport. It transforms from eye space to clip space, and the coordinates in clip space are transformed to normalized device coordinates (NDC) by dividing by the w component of the clip coordinates (the perspective divide).
At perspective projection the projection matrix describes the mapping from 3D points in the world, as they are seen from a pinhole camera, to 2D points of the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
Perspective Projection Matrix:
r = right, l = left, b = bottom, t = top, n = near, f = far
2*n/(r-l)     0             0              0
0             2*n/(t-b)     0              0
(r+l)/(r-l)   (t+b)/(t-b)   -(f+n)/(f-n)  -1
0             0             -2*f*n/(f-n)   0
When a Cartesian coordinate in view space is transformed by the perspective projection matrix, the result is a homogeneous coordinate. The w component grows with the distance from the point of view. This causes objects to become smaller after the perspective divide the further away they are.
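To see that numerically, a small sketch (again assuming GLM with its default OpenGL conventions): two points at the same eye-space height, one near and one far, end up at very different NDC heights once each is divided by its own w.

#include <cstdio>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main()
{
    glm::mat4 proj = glm::perspective(glm::radians(60.0f), 1.0f, 0.1f, 100.0f);

    glm::vec4 nearPt(0.0f, 1.0f, -2.0f, 1.0f);  // height 1, two units away
    glm::vec4 farPt (0.0f, 1.0f, -20.0f, 1.0f); // same height, twenty units away

    glm::vec4 nearClip = proj * nearPt; // w grows with distance: 2 here ...
    glm::vec4 farClip  = proj * farPt;  // ... and 20 here

    std::printf("near: w = %5.1f  ndc.y = %.3f\n", nearClip.w, nearClip.y / nearClip.w);
    std::printf("far : w = %5.1f  ndc.y = %.3f\n", farClip.w,  farClip.y  / farClip.w);
}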
In computer graphics, transformations are represented with matrices. If you want something to rotate, you multiply all its vertices (a vector) by a rotation matrix. Want it to move? Multiply by translation matrix, etc.
tl;dr: You can't describe translation along the z-axis with 3D matrices and vectors. You need at least 1 more dimension, so they just added a dummy dimension w. But things break if it's not 1, so keep it at 1 :P.
Anyway, now we begin with a quick review of matrix multiplication:

[a b c]   [x]   [a*x + b*y + c*z]
[d e f] * [y] = [d*x + e*y + f*z]
[g h i]   [z]   [g*x + h*y + i*z]

You basically line x up with a, y with b, z with c, multiply each matrix entry by the vector component it lines up with, and sum up everything in the row.
So if you were to translate a vector with a 3x3 matrix, you'd want something like:

[1 0 a]   [x]   [x + a*z]
[0 1 b] * [y] = [y + b*z]
[0 0 1]   [z]   [   z   ]

See how x and y are now translated by a*z and b*z? That's pretty awkward though:
You'd have to account for how big z is whenever you move things (what if z was negative? You'd have to move in opposite directions. That's cumbersome as hell if you just want to move something an inch over...)
You can't move along the z axis. You'll never be able to fly or go underground
But, if you can make sure z = 1 at all times:

[1 0 a]   [x]   [x + a]
[0 1 b] * [y] = [y + b]
[0 0 1]   [1]   [  1  ]

Now it's much clearer that this matrix allows you to move in the x-y plane by amounts a and b. The only problem is that you're conceptually levitating all the time, and you still can't go up or down. You can only move in 2D.
But you see a pattern here? With 3D matrices and 3D vectors, you can describe all the fundamental movements in 2D. So what if we added a 4th dimension?
[1 0 0 a]   [x]   [x + a*w]
[0 1 0 b]   [y]   [y + b*w]
[0 0 1 c] * [z] = [z + c*w]
[0 0 0 1]   [w]   [   w   ]

Looks familiar. If we keep w = 1 at all times:

[1 0 0 a]   [x]   [x + a]
[0 1 0 b]   [y]   [y + b]
[0 0 1 c] * [z] = [z + c]
[0 0 0 1]   [1]   [  1  ]

There we go, now you get translation along all 3 axes. This is what's called homogeneous coordinates.
But what if you were doing some big & complicated transformation, resulting in w != 1, and there's no way around it? OpenGL (and basically any other CG system, I think) will do what's called normalization: divide the resulting vector by the w component. I don't know enough to say exactly why, but it has favorable implications (it's what makes perspective transforms work). Anyway, after the multiplication and the divide by w, the translation effectively gives you:

[1 0 0 a]   [x]   [x + a*w]            [x/w + a]
[0 1 0 b] * [y] = [y + b*w]  -- /w -->  [y/w + b]
[0 0 1 c]   [z]   [z + c*w]            [z/w + c]
[0 0 0 1]   [w]   [   w   ]            [   1   ]

And there you go, see how each component is shrunk by w, then translated? That's why w controls scaling.
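A tiny sketch of the same thing with GLM (hypothetical numbers): with w = 1 the last column acts as a plain translation; with w != 1 the normalization shrinks the point by w before the translation shows up.

#include <cstdio>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

int main()
{
    glm::mat4 T = glm::translate(glm::mat4(1.0f), glm::vec3(3.0f, 0.0f, -2.0f));

    glm::vec4 p(1.0f, 1.0f, 1.0f, 1.0f); // w = 1: a plain translation
    glm::vec4 q(1.0f, 1.0f, 1.0f, 2.0f); // w = 2: the translation gets scaled by w

    glm::vec4 tp = T * p;                    // (4, 1, -1, 1)
    glm::vec4 tq = T * q;                    // (7, 1, -3, 2)
    glm::vec3 tqNorm = glm::vec3(tq) / tq.w; // (3.5, 0.5, -1.5): shrunk by w, then translated

    std::printf("w=1: (%.1f, %.1f, %.1f)\n", tp.x, tp.y, tp.z);
    std::printf("w=2, normalized: (%.1f, %.1f, %.1f)\n", tqNorm.x, tqNorm.y, tqNorm.z);
}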

Confused about OpenGL transformations

In OpenGL there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved either by moving objects or by moving the camera.
I am guessing that glTranslate and glRotate change objects, and gluLookAt changes the camera?
In OpenGL there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives, points, lines or triangles, per vertex attributes, normalized device coordinates (NDC) and a viewport, to which the NDC are mapped to.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of the attributes and is usually a vector with 1 to 4 scalar elements in a local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program, running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way. But the usual approach is to apply a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogeneous transformation matrices. For a 3-dimensional vector, the homogeneous representation is a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become sort of the standard way of doing things. First the object-local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformation. With the vertex positions in eye space, several calculations, like illumination, can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye space coordinates are transformed into the so-called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL it used to be called the projection transformation.
After clip space the positions get "normalized" by their homogeneous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: matrix-vector product (the matrix applied to the column vector)
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]):
vpos_viewport = (vpos_ndc + (1,1,1,1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: vector component wise multiplication
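The same recap in code form: a minimal sketch assuming GLM, with a hypothetical Viewport struct standing in for the glViewport parameters (the depth-range mapping of z is left out for brevity).

#include <glm/glm.hpp>

struct Viewport { float x, y, width, height; }; // hypothetical holder for the glViewport values

// Clip space -> NDC -> window coordinates, following the formulas above.
glm::vec2 clipToViewport(const glm::vec4& vpos_clip, const Viewport& vp)
{
    glm::vec3 ndc = glm::vec3(vpos_clip) / vpos_clip.w;   // perspective divide
    float wx = (ndc.x + 1.0f) * vp.width  / 2.0f + vp.x;  // [-1, 1] -> [x, x + width]
    float wy = (ndc.y + 1.0f) * vp.height / 2.0f + vp.y;  // [-1, 1] -> [y, y + height]
    return glm::vec2(wx, wy);
}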
The OpenGL functions glRotate, glTranslate, glScale, glMatrixMode merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
Which of them the matrix manipulation functions act on can be set using glMatrixMode. Each of the matrix-manipulating functions composes a new matrix by multiplying the transformation it describes onto the selected matrix, thereby replacing it. The function glLoadIdentity replaces the current matrix with identity, glLoadMatrix replaces it with a user-defined matrix, and glMultMatrix multiplies a user-defined matrix onto it.
So how does the modelview matrix then emulate both object placement and a camera? Well, as you already stated:
As you know, the same movement can be achieved by either moving objects or camera.
You cannot really distinguish between them. The usual approach is to split the object-local-to-eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view, in fixed function OpenGL described by the modelview matrix. Now since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M in the compound matrix MV is determined only by the order of operations in which you multiply onto the modelview matrix, and at which step you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations and everything after is model.
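A sketch of that convention in fixed-function code (drawObject() is a hypothetical helper and the numbers are arbitrary): everything multiplied onto the modelview stack before the first per-object transform is, by convention, the view.

#include <GL/gl.h>
#include <GL/glu.h>

void drawObject(); // hypothetical helper issuing the object's vertices

void drawFrame()
{
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // "View" part: gluLookAt multiplies the inverse camera placement onto the stack.
    gluLookAt(0.0, 2.0, 8.0,   // eye position in the world
              0.0, 0.0, 0.0,   // point looked at
              0.0, 1.0, 0.0);  // up direction

    // "Model" part: from here on the transformations place objects in the world.
    glPushMatrix();
    glTranslatef(-1.5f, 0.0f, 0.0f);
    glRotatef(30.0f, 0.0f, 1.0f, 0.0f);
    drawObject();
    glPopMatrix();
}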
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack was never feature-complete and no serious application actually used it. Most programs just glLoadMatrix-ed their self-calculated matrices and didn't bother with the OpenGL built-in matrix manipulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack got awkward to use, to say it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
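For completeness, a sketch of that "modern way": the matrices are built on the CPU (here with GLM) and handed to a shader as a uniform. This assumes an extension loader such as glad exposes the GL 2.0+ entry points and that the shader declares uniform mat4 u_mvp; both the loader and the uniform name are assumptions, not part of the answer.

#include <glad/glad.h> // assumed loader for the GL function pointers
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>

void uploadMvp(GLuint program, const glm::mat4& model)
{
    glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 2.0f, 8.0f),  // eye
                                 glm::vec3(0.0f),              // center
                                 glm::vec3(0.0f, 1.0f, 0.0f)); // up
    glm::mat4 proj = glm::perspective(glm::radians(60.0f), 16.0f / 9.0f, 0.1f, 100.0f);

    glm::mat4 mvp = proj * view * model; // same composition the fixed pipeline did

    glUniformMatrix4fv(glGetUniformLocation(program, "u_mvp"),
                       1, GL_FALSE, glm::value_ptr(mvp));
}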
OpenGL is a low-level API; there are no higher-level concepts like an "object" and a "camera" in a "scene", so there are only two matrix modes: MODELVIEW (a multiplication of the "camera" matrix by the "object" transformation) and PROJECTION (the projective transformation from eye space to post-perspective space).
The distinction between "Model" and "View" (object and camera) matrices is up to you. The glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode(), so the effect depends on which matrix you're working on. OpenGL has 4 different matrix types: GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR; any of those functions can change whichever of those matrices is current. So, basically, you don't transform objects, you just manipulate different matrices to "fake" that effect.
Note that gluLookAt() is just a convenience function equivalent to a translation followed by some rotations; there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation to transform the objects as if the camera was where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true: glTranslate and glRotate change the object coordinates before rendering, and gluLookAt changes the camera coordinates.

Find final world coordinates from model matrix or quaternion

I am displaying an object in OpenGL using a model matrix for the object, which I build from my pre-stored object location AND a quaternion for the rotation. I need to find the final Cartesian coordinates in 3D of my object AFTER the rotations and transformations are applied (the coordinates the object appears at on the screen). How can I get the plain coordinates?
If I understand correctly, you have an object; if you rendered it without applying any transformation, its center would be at [0,0,0].
You have a point, [a,b,c], in 3D space. You apply a translation to the modelview matrix. Now, if you rendered the object, its center would be at [a,b,c] in world space coordinates.
You have a quaternion, [qw,qx,qy,qz]. You create a rotation matrix, M, from this and apply it to the modelview matrix. Now you want to know the new coordinates, [a',b',c'], of the object's center in world space.
If this is true, then the easiest way is to just do the matrix multiplication yourself:
a' = m11*a + m12*b + m13*c
b' = m21*a + m22*b + m23*c
c' = m31*a + m32*b + m33*c
where
    [m11 m12 m13]
M = [m21 m22 m23]
    [m31 m32 m33]
But perhaps you're not actually building M. Another way would be to use the quaternion directly, although that essentially involves building the rotation matrix and then using it.
There should be no need to actually use gluProject. When you apply the rotation to the modelview matrix, the matrix multiply is done there. So you could just get the values from the matrix itself:
double mv[16];
glGetDoublev(GL_MODELVIEW_MATRIX, mv);
// OpenGL stores matrices in column-major order; the translation is in elements 12-14.
a' = mv[12];
b' = mv[13];
c' = mv[14];
This tells you where the modelview matrix is moving the model's origin.
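If the matrix is built (or mirrored) on the CPU anyway, the same values can be computed without reading anything back from OpenGL; a small sketch assuming GLM, with made-up function names:

#include <glm/glm.hpp>

// Given the final model (or modelview) matrix, however it was composed, the
// transformed position of a local point p is just the matrix applied to p.
glm::vec3 transformedPoint(const glm::mat4& model, const glm::vec3& p)
{
    return glm::vec3(model * glm::vec4(p, 1.0f));
}

// The object's local origin (its centre, if modelled around (0,0,0)) ends up at
// the matrix's translation column, the same values mv[12..14] read back above.
glm::vec3 transformedOrigin(const glm::mat4& model)
{
    return glm::vec3(model[3]);
}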
Re-implement gluProject() and apply everything but the viewport transform.

Translating a Quaternion

(perhaps this is better for a math Stack Exchange?)
I have a chain composed of bones. Each bone has a tip and a tail. The following code computes where its tip will be, given a rotation, and sets the next link in the chain's position appropriately:
// Quaternion is a hand-rolled class that works correctly (as far as I can tell.)
Quaternion quat = new Quaternion(getRotationAngleDegrees(), getRotation());
// figure out where the tip will be after applying the rotation
Vector3f rotatedTip = quat.applyRotationTo(tip);
// set the next bone's tail to be at this one's tip
updateNextPosFrom(rotatedTip);
This works if the rotation is supposed to occur around the origin of the object's coordinate system. But what if I want the rotation to occur around some other arbitrary point in the object? I'm not sure how to translate the quaternion. What is the best way to do it?
(I'm using JOGL / OpenGL.)
Dual quaternions are useful for expressing rigid spatial transformations (combined rotations and translations.)
Based on dual numbers (one of the Clifford algebras, d = a + e b where a, b are real and e is unequal to zero but e^2 = 0), dual quaternions, U + e V, can represent lines in space with U the unit direction quaternion and V the moment about a reference point. In this way, dual quaternion lines are very much like Pluecker lines.
While the quaternion transform Q V Q* (Q* is the quaternion conjugate of Q) is used to rotate a unit vector quaternion V about a point, a similar dual quaternion form can be used to apply a screw transform to a line (the rigid rotation about an axis combined with a translation along the axis).
Just as any rigid 2D transform can be resolved to a rotation about a point, any rigid 3D transform can be resolved to a screw.
For such power and expressiveness, dual quaternion references are thin, and the Wikipedia article is as good a place as any to start.
A quaternion is used specifically to handle a rotation factor, but does not include a translation at all.
Typically, in this situation, you'll want to apply a rotation to a point based on the "bone's" length, but centered at the origin. You can then translate post-rotation to the proper location in space.
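In code, that amounts to conjugating the rotation with a translation: move the pivot to the origin, rotate, move back. A minimal sketch assuming GLM quaternions (the question's hand-rolled Quaternion class would need the equivalent operations):

#include <glm/glm.hpp>
#include <glm/gtc/quaternion.hpp>

// Rotate point p by quaternion q about an arbitrary pivot point.
glm::vec3 rotateAboutPoint(const glm::quat& q, const glm::vec3& pivot, const glm::vec3& p)
{
    return pivot + q * (p - pivot); // glm::quat * glm::vec3 applies the rotation q v q*
}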
Quaternions are generally used to represent rotations only; they cannot represent translations as well.
You need to convert your quaternion into a rotation matrix, insert it into the appropriate part of your standard OpenGL 4x4 matrix, and combine it with a translation in order to rotate about an arbitrary point.
4x4 rotation matrix:
[ r r r 0 ]
[ r r r 0 ] <- the r's are the 3x3 rotation matrix from the wiki article
[ r r r 0 ]
[ 0 0 0 1 ]
The Wikipedia page on forward kinematics points to this paper: Introduction to Homogeneous Transformations & Robot Kinematics.
Edit: This answer is wrong. It argues from the properties of 4x4 transformation matrices, which are not quaternions...
I might have got it wrong but to me (unlike some answers) a quaternion is indeed a tool to handle rotations and translations (and more). It is a 4x4 matrix where the last column represents the translation. Using matrix algebra, replace the 3-vector (x, y, z) by the 4-vector (x, y, z, 1) and compute the transformed vector by the matrix. You will find that values of the last column of the matrix will be added to the coordinates x, y, z of the original vector, as in a translation.
A 3x3 matrix for a 3D space represents a linear transformation (like rotation around the origin). You cannot use a 3x3 matrix for an affine transformation like a translation. So I simply understand quaternions as a little "trick" to represent more kinds of transformations using matrix algebra. The trick is to add a fourth coordinate equal to 1 and to use 4x4 matrices. Because matrix algebra remains valid, you can combine space transformations by multiplying the matrices, which is indeed powerful.