I am creating 3D/2D graphics engine, I wrote some fancy Vector2, Vector3 classes, Window wrapper and OpenGL context creation framework and since a while I was thinking how can I switch Coordinate System axes. By default in OpenGL it goes like that (as far as I know):
+X axis stands for Right | -X for Left
+Y axis stands for Up | -Y stands for Down
-Z stands for Forward | -Z stands for Backward
I really, really do not want to have coords like that, it just makes for me coords unreadable. So I thought about that UE4-style coordinates:
+X axis stands for Forward
+Y axis stands for Right
+Z axis stands for Up
How can I switch these axes?
I've read about "tweaking" perspective but there was only inverting axis not switching them.
And my second question is where I can learn some matrix operations (straightly for computer graphics)? I do not want to download ready-to-use source code or extensions. The main purpose i write this engine is to learn maths and the most important thing for me now are matrixes - projection matrices, rotation matrices, translation matrices, scale etc.
You are describing OpenGL's clip space coordinates and comparing it to UE4's world space coordinates. In general, clip space and world space do not need to have any relationship to each other whatsoever, so it does not really make sense to compare them.
All you have to do is create a view matrix which converts your coordinates from world space to camera space. This matrix is often combined with the conversion from model space to world space, and the matrix which converts from camera space to clip space. Combining all three matrices gives you the "modelviewprojection" matrix.
Your vertex shader will end up looking something like this:
// In model space
in vec3 Coords;
// Conversion from model space to clip space
uniform mat4 MVP;
void main() {
gl_Position = MVP * vec4(Coords, 1.0);
}
MVP will be made out of a combination of rotation, translation, and scale matrices that work together. The resulting matrix will probably look something like this:
MVP = projection matrix * rotation matrix * translation matrix * scale matrix
The rotation matrix is the sauce that gets your axes right. It will look something like this:
rotation matrix = rotate around Z axis by (-roll) * rotate around X axis by (pitch + pi/2) * rotate around Z axis by (yaw - pi/2)
I suggest that you use the GLM library, which provides the base vector and matrix types and operations in C++ that are provided for you in GLSL. This lets you do linear algebra fairly easily.
I do not recommend trying to write these functions yourself. It is mostly just a boring and repetitive programming task with lots of opportunities to make typos. Speaking from experience. Or, let me put it this way. Making your own pencils does not make you a better writer.
Related
I am building a camera class to look arround a scene. At the moment I have 3 cubes just spread arround to have a good impression of what is going on. I have set my scroll button on a mouse to give me translation along z-axis and when I move my mouse left or right I detect this movement and rotate arround y-axis. This is just to see what happens and play arround a bit. So I succeeded in making the camera rotate by rotating the cubes arround the origin but after I rotate by some angle, lets say 90 degrees, and try to translate along z axis to my surprise I find out that my cubes are now going from left to right and not towards me or away from me. So what is going on here? It seems that z axis is rotated also. I guess the same goes for x axis. So it seems that nothing actually moved in regard to the origin, but the whole coordinate system with all the objects was just rotated. Can anyone help me here, what is going on? How coordinate system works in opengl?
You are most likely confusing local and global rotations. Usual cheap remedy is to change(reverse) order of some of your transformation. However doing this blindly is trial&error and can be frustrating. Its better to understand the math first...
Old API OpeGL uses MVP matrix which is:
MVP = Model * View * Projection
Where Model and View are already multiplied together. What you have is most likely the same. Now the problem is that Model is direct matrix, but View is Inverse.
So if you have some transform matrix representing your camera in oder to use it to transform back you need to use its inverse...
MVP = Model * Inverse(Camera) * Projection
Then you can use the same order of transformations for both Model and Camera and also use their geometric properties like basis vectors etc ... then stuff like camera local movements or camera follow are easy. Beware some tutorials use glTranspose instead of real matrix Inverse. That is correct only if the Matrix contains only unit (or equal sized) orthogonal basis vectors without any offset so no scale,skew,offset or projections just rotation and equal scale along all axises !!!
That means when you rotate Model and View in the same way the result is opposite. So in old code there is usual to have something like this:
// view part of matrix
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glRotate3f(view_c,0,0,1); // ugly euler angles
glRotate3f(view_b,0,1,0); // ugly euler angles
glRotate3f(view_a,1,0,0); // ugly euler angles
glTranslatef(view_pos); // set camera position
// model part of matrix
for (i=0;i<objs;i++)
{
glMatrixMode(GL_MODELVIEW);
glPushMatrix();
glTranslatef(obj_pos[i]); // set camera position
glRotate3f(obj_a[i],1,0,0); // ugly euler angles
glRotate3f(obj_b[i],0,1,0); // ugly euler angles
glRotate3f(obj_c[i],0,0,1); // ugly euler angles
//here render obj[i]
glMatrixMode(GL_MODELVIEW);
glPopMatrix();
}
note the order of transforms is opposite (I just wrote it here in editor so its not tested and can be opposite to native GL notation ... I do not use Euler angles) ... The order must match your convention... To know more about these (including examples) not using useless Euler angles see:
Understanding 4x4 homogenous transform matrices
Here is 4D version of what your 3D camera class should look like (just shrink the matrices to 4x4 and have just 3 rotations instead of 6):
reper4D
pay attention to difference between local lrot_?? and global grot_?? functions. Also note rotations are defined by plane not axis vector as axis vector is just human abstraction that does not really work except 2D and 3D ... planes work from 2D to ND
PS. its a good idea to have the distortions (scale,skew) separated from model and keep transform matrices representing coordinate systems orthonormal. It will ease up a lot of things latter on once you got to do advanced math on them. Resulting in:
MVP = Model * Model_distortion * Inverse(Camera) * Projection
Variable gl_Position output from a GLSL vertex shader must have 4 coordinates. In OpenGL, it seems w coordinate is used to scale the vector, by dividing the other coordinates by it. What is the purpose of w in Vulkan?
Shaders and projections in Vulkan behave exactly the same as in OpenGL. There are small differences in depth ranges ([-1, 1] in OpenGL, [0, 1] in Vulkan) or in the origin of the coordinate system (lower-left in OpenGL, upper-left in Vulkan), but the principles are exactly the same. The hardware is still the same and it performs calculations in the same way both in OpenGL and in Vulkan.
4-component vectors serve multiple purposes:
Different transformations (translation, rotation, scaling) can be
represented in the same way, with 4x4 matrices.
Projection can also be represented with a 4x4 matrix.
Multiple transformations can be combined into one 4x4 matrix.
The .w component You mention is used during perspective projection.
All this we can do with 4x4 matrices and thus we need 4-component vectors (so they can be multiplied by 4x4 matrices). Again, I write about this because the above rules apply both to OpenGL and to Vulkan.
So for purpose of the .w component of the gl_Position variable - it is exactly the same in Vulkan. It is used to scale the position vector - during perspective calculations (projection matrix multiplication) original depth is modified by the original .w component and stored in the .z component of the gl_Position variable. And additionally, original depth is also stored in the .w component. After that (as a fixed-function step) hardware performs perspective division and divides position stored in the gl_Position variable by its .w component.
In orthographic projection steps performed by the hardware are exactly the same, but values used for calculations are different. So the perspective division step is still performed by the hardware but it does nothing (position is dived by 1.0).
gl_Position is a Homogeneous coordinates. The w component plays a role at perspective projection.
The projection matrix describes the mapping from 3D points of the view on a scene, to 2D points on the viewport. It transforms from eye space to the clip space, and the coordinates in the clip space are transformed to the normalized device coordinates (NDC) by dividing with the w component of the clip coordinates (Perspective divide).
At Perspective Projection the projection matrix describes the mapping from 3D points in the world as they are seen from of a pinhole camera, to 2D points of the viewport. The eye space coordinates in the camera frustum (a truncated pyramid) are mapped to a cube (the normalized device coordinates).
Perspective Projection Matrix:
r = right, l = left, b = bottom, t = top, n = near, f = far
2*n/(r-l) 0 0 0
0 2*n/(t-b) 0 0
(r+l)/(r-l) (t+b)/(t-b) -(f+n)/(f-n) -1
0 0 -2*f*n/(f-n) 0
When a Cartesian coordinate in view space is transformed by the perspective projection matrix, then the the result is a Homogeneous coordinates. The w component grows with the distance to the point of view. This cause that the objects become smaller after the Perspective divide, if they are further away.
In computer graphics, transformations are represented with matrices. If you want something to rotate, you multiply all its vertices (a vector) by a rotation matrix. Want it to move? Multiply by translation matrix, etc.
tl;dr: You can't describe translation along the z-axis with 3D matrices and vectors. You need at least 1 more dimension, so they just added a dummy dimension w. But things break if it's not 1, so keep it at 1 :P.
Anyway, now we begin with a quick review on matrix multiplication:
You basically put x above a, y above b, z above c. Multiply the whole column by the variable you just moved, and sum up everything in the row.
So if you were to translate a vector, you'd want something like:
See how x and y is now translated by az and bz? That's pretty awkward though:
You'd have to account for how big z is whenever you move things (what if z was negative? You'd have to move in opposite directions. That's cumbersome as hell if you just want to move something an inch over...)
You can't move along the z axis. You'll never be able to fly or go underground
But, if you can make sure z = 1 at all times:
Now it's much clearer that this matrix allows you to move in the x-y plane by a, and b amounts. Only problem is that you're conceptually levitating all the time, and you still can't go up or down. You can only move in 2D.
But you see a pattern here? With 3D matrices and 3D vectors, you can describe all the fundamental movements in 2D. So what if we added a 4th dimension?
Looks familiar. If we keep w = 1 at all times:
There we go, now you get translation along all 3 axis. This is what's called homogeneous coordinates.
But what if you were doing some big & complicated transformation, resulting in w != 1, and there's no way around it? OpenGL (and basically any other CG system I think) will do what's called normalization: divide the resultant vector by the w component. I don't know enough to say exactly why ('cause scaling is a linear transformation?), but it has favorable implications (can be used in perspective transforms). Anyway, the translation matrix would actually look like:
And there you go, see how each component is shrunken by w, then it's translated? That's why w controls scaling.
I have a basic cube generated in my 3D world. I can rotate correctly around the camera, but when I translate after rotating, the translations are not correct.
For example, if I rotate 90 degrees and translate into the Z axis, it would move as if translating in the X axis.
glLoadIdentity();
glRotatef(angle,0,1,0); //Rotate around the camera.
glTranslatef(movX,movY,movZ); //Translate after rotating around the camera.
glCallList(cubes[0]);
I need some help with this. Also, I tried translating before rotating, but the rotation is not at the camera. It is at the edge of the cube.
Keep in mind that in OpenGL that the transformation is applied to the camera, not the objects rendered; so you observe the inverse of the transformation you expected.
Also in OpenGL the Y and Z axes are flipped (Y is vertical), so you observe a horizontal translation instead of a vertical one.
Also, because the object is rotated though 90 degrees about Y, the X and Z axes replace each other (one of them is reversed).
willywonkadailyblah's answer is half correct. Because you are using the old OpenGL, you're using the old matrix stack. You are modifying the modelview matrix when you're doing your glRotatef and glTranslatef calls. The modelview matrix is actually the model's matrix and the camera's view matrix precombined (already multiplied together). These matrices are what determine where your object is in 3D space and where your viewing position/direction of the world is. So you can think of your calls as moving the camera, but it's probably easier to think of them as moving and rotating the world.
These rotate and translation calls are linear transformations. This has a precise definition, but for our purposes it means that you can represent the transformation as a matrix and you multiply it with the point's coordinates to apply the transformation to a point. Now matrix multiplication is not commutative, meaning AB != BA. All this to say that when you rotate, then translate it is different than translating and rotating, which I think you know. But then when you translate, rotate, and translate again, it might be a little more difficult to follow what you're actually doing. Worse even if you throw in some scaling in there. So I would suggest learning how linear transformations work and maintaining your own matrices for the objects and camera if you're serious about learning OpenGL.
learnopengl.org is an excellent website, but it teaches you Modern OpenGL, not what you're currently using. But the lesson on transformations and on coordinate systems are probably generally helpful, even without exact code for you to follow
I am learning OpenGL graphics, and am getting into shadows. The tutorials that I am reading are telling me to transform my normals and light vector to camera space. Why is this? Why can't you just keep the coords in model space?
A follow up question to this is how to handle model transformations. I am unable to find a definitive answer. I currently have this code:
vec3 normCamSpace = normalize(mat3(V) * normal);"
vec3 dirToLight = (V*vec4(lightPos, 0.0)).xyz;"
float cosTheta = clamp(dot(normCamSpace, dirToLight),0,1);"
V is the view matrix, or the camera matrix. I am unsure how to move or edit the light when the model changes in position, rotation, and scale.
The main reason is, that usually your light positions will not be given in model space, but world space. However for illumination to work efficiently all calculations must happen in a common space. In your usual transformation chain, model local coordinates are transformed by the modelview matrix directly into view space
p_view = MV · p_local
Since you normally have only one modelview matrix it would be cumbersome to separate this steap into something like
p_world = M · p_local
p_view = V · p_world
For that you required MV to be separated.
Since the projection transformation traditionally happens as a separate step, view space is the natural "common lower ground" on which illumination calculation to base on. It just involves transforming transforming your light positions from world to view space, and since light positions are not very complex, this is done on the CPU and the pretransformed light positions given as shader.
Note that nothing is stopping you from performing illumination calculations in world space, or model local space. It just takes transforming the light positions correctly.
I am learning OpenGL graphics, and am getting into shadows. The tutorials that I am reading are telling me to transform my normals and light vector to camera space. Why is this? Why can't you just keep the coords in model space?
Actually if you're the one writing the shader, you can use whatever coordinate space you want. IMO calculating lighting in world space feels more "natural", but that's matter of taste.
However, there are two small details:
You cannot "naturally" calculate lighting in object space, if your object is a skinned mesh (character model animated by bones). Such model will require world space or view space. If your object can be only translated and rotated (affine transforms only), then lighting can be easily calculated in model/object space. I think some game engines actualy worked this way.
If you use camera space, you can drop one subtraction when calculating specular highlights. Blinn/phong specular models require vector to(or from) eye to calculate specular factor. In camera space vector from eye to point is equal to point position. This is a very small optimization and it probably isn't worth the effort.
In opengl there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move
objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving objects or camera.
I am guessing that glTranslate, glRotate, change objects, and gluLookAt changes the camera?
In opengl there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives, points, lines or triangles, per vertex attributes, normalized device coordinates (NDC) and a viewport, to which the NDC are mapped to.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of the attributes and usually a vector with 1 to 4 scalar elements within local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens within a small program, running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way. But the usual approach is by applying a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogenous transformation matrices. For a 3 dimensional vector, the homogenous representation in a vector with 4 elements, where the 4th element is 1.
In computer graphics a 3-fold transformation pipeline has become sort of the standard way of doing things. First the object local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformaion. With the vertex positions in eye space several calculations, like illumination can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye space coordinates are tranformed into the so called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL this used to be called the projection transformation.
After clip space the positions get "normalized" by their homogenous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: inner product column on row vector
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]
vpos_viewport = (vpos_ndc + (1,1,1,1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: vector component wise multiplication
The OpenGL functions glRotate, glTranslate, glScale, glMatrixMode merely manipulate the transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
On which of them the matrix manipulation functions act on can be set using glMatrixMode. Each of the matrix manipulating functions composes a new matrix by multiplying the transformation matrix they describe on top of the select matrix thereby replacing it. The functions glLoadIdentity replace the current matrix with identity, glLoadMatrix replaces it with a user defined matrix, and glMultMatrix multiplies a user defined matrix on top of it.
So how does the modelview matrix then emulate both object placement and a camera. Well, as you already stated
As you know, the same movement can be achieved by either moving objects or camera.
You can not really discern between them. The usual approach is by splitting the object local to eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view, in fixed function OpenGL described by the modelview matrix. Now since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M of the compund matrix M is only determined by the order of operations in which you multiply onto the modelview matrix, and at which step you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations and everything after is model.
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack never was feature complete and no serious application did actually use it. Most programs just glLoadMatrix-ed their self calculated matrices and didn't bother with the OpenGL built-in matrix maniupulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack got awkward to use, to say it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
OpenGL is a low-level API, there is no higher-level concepts like an "object" and a "camera" in the "scene", so there are only two matrix modes: MODELVIEW (a multiplication of "camera" matrix by the "object" transformation) and PROJECTION (the projective transformation from world-space to post-perspective space).
Distinction between "Model" and "View" (object and camera) matrices is up to you. glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode() so it depends on the matrix you're working on. OpenGL has 4 different types of matrices; GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR, any one of those functions can change any of those matrices. So, basically, you don't transform objects you just manipulate different matrices to "fake" that effect.
Note that glulookat() is just a convenient function equivalent to a translation followed by some rotations, there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation to transform the objects as if the camera was where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true, glTranslate, glRotate change the object coordinates before rendering and gluLookAt changes the camera coordinate.