I have read multiple articles and posts about pre/post multiplications, column/row major, DirectX vs OpenGL, and I might be more confused than at the beginning.
So, let's say we write those instructions (pseudocode) in OpenGL:
rotate(...)
translate(...)
From what I understand, OpenGL will do v' = R * T * v, effectively transforming the local coordinate frame of the vector.
But in DirectX, if we do the transformations (pseudocode) in the same order,
rotate(...)
translate(...)
the result will not be the same, right? Since DirectX pre-multiplies, the result will be v' = v * R * T, thus transforming the vector using global coordinates.
So, am I correct when I say that OpenGL being post-multiplication and DirectX being pre-multiplication is like saying that OpenGL moves in local coordinates while DirectX moves in global coordinates?
Thank you.
Your best bet is to read Matrices, Handedness, Pre and Post Multiplication, Row vs Column Major, and Notations.
OpenGL code often uses a right-handed coordinate system, column-major matrices, column vectors, and post-multiplication.
Direct3D code often uses a left-handed coordinate system, row-major matrices, row vectors, and pre-multiplication.
XNA Game Studio's math library (and therefore MonoGame, Unity, etc.) uses a right-handed coordinate system, row-major matrices, row vectors, and pre-multiplication.
The DirectXMath library uses row-major matrices, row vectors, and pre-multiplication, but leaves it up to you to choose to use a left-handed or right-handed coordinate system where it matters. This was even true of the older now deprecated D3DXMath.
In either system, you are still doing the same object coordinate -> world coordinate -> eye coordinate -> clip coordinate transformations.
So with OpenGL GLM you might do:
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
using namespace glm;

// Modern GLM wants angles in radians and takes a matrix to build on.
mat4 myTranslationMatrix = translate(mat4(1.0f), vec3(10.0f, 0.0f, 0.0f));
mat4 myRotationMatrix = rotate(mat4(1.0f), radians(90.0f), vec3(0.0f, 1.0f, 0.0f));
mat4 myScaleMatrix = scale(mat4(1.0f), vec3(2.0f, 2.0f, 2.0f));

mat4 myModelMatrix = myTranslationMatrix * myRotationMatrix * myScaleMatrix;
vec4 myTransformedVector = myModelMatrix * myOriginalVector;
In DirectXMath you'd do:
#include <DirectXMath.h>
using namespace DirectX;

XMMATRIX myTranslationMatrix = XMMatrixTranslation(10.0f, 0.0f, 0.0f);
XMMATRIX myRotationMatrix = XMMatrixRotationY(XMConvertToRadians(90.0f));
XMMATRIX myScaleMatrix = XMMatrixScaling(2.0f, 2.0f, 2.0f);

XMMATRIX myModelMatrix = myScaleMatrix * myRotationMatrix * myTranslationMatrix;
XMVECTOR myTransformedVector = XMVector4Transform(myOriginalVector, myModelMatrix);
And you will get the same transformation as a result.
If you are new to DirectXMath, then you should take a look at the SimpleMath wrapper in the DirectX Tool Kit, which hides some of the strict SIMD-friendly alignment requirements behind C++ constructors and operators. Because SimpleMath is based on the XNA Game Studio C# math design, it assumes right-handed view coordinates, but you can easily mix it with 'native' DirectXMath to use left-handed view coordinates if desired.
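For instance, the DirectXMath model matrix above could be written with SimpleMath roughly like this (a sketch, assuming myOriginalVector exists as before):

#include "SimpleMath.h"
using namespace DirectX;
using namespace DirectX::SimpleMath;

// Plain C++ value types hide XMMATRIX's SIMD alignment requirements.
Matrix myModelMatrix = Matrix::CreateScale(2.0f)
                     * Matrix::CreateRotationY(XMConvertToRadians(90.0f))
                     * Matrix::CreateTranslation(10.0f, 0.0f, 0.0f);
Vector4 myTransformedVector = Vector4::Transform(myOriginalVector, myModelMatrix);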
Most of these decisions were arbitrary, but there was sound design reasoning behind them. OpenGL's math conventions were chosen to match the standard mathematical convention of post-multiplication, which led to the adoption of column-major matrices. In the early days of Direct3D, the team felt that the resulting reversal of concatenation order was confusing, so they flipped all the conventions.
Many years later, the XNA Game Studio team felt that the traditional Direct3D concatenation order was intuitive, but that having 'forward' be negative z was confusing, so they switched to right-handed coordinates. Many of the more modern Direct3D samples therefore use right-handed view systems, but you'll still see a mix of both left- and right-handed viewing setups in Direct3D samples. So at this point, we really have "OpenGL style", "classic Direct3D style", and "modern Direct3D style".
Note that these conventions really mattered back when things were done with fixed-function hardware, but with programmable shader pipelines what matters is that you are consistent. In fact, HLSL defaults to expecting matrices in column-major form, so you'll often see DirectXMath matrices transposed as they are copied into constant buffers.
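For example, a common pattern when filling a constant buffer from DirectXMath looks roughly like this (a sketch; the variable names are illustrative):

#include <DirectXMath.h>
using namespace DirectX;

// Concatenate in DirectXMath's row-major, row-vector convention...
XMMATRIX mvp = world * view * projection;
// ...then transpose while storing, so that HLSL's default column-major
// packing reads the matrix back correctly in the shader.
XMFLOAT4X4 cbMVP;
XMStoreFloat4x4(&cbMVP, XMMatrixTranspose(mvp));
// cbMVP is then copied into the mapped constant buffer.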
There is no such thing as premultiplication in OpenGL.
The reason why you think OpenGL reverses the multiplications is that it stores matrices in column-major layout, i.e. the matrix
a c
b d
is stored in memory as a, b, c, d.
The mathematical rule for multiplying transposed matrices is:
A^T * B^T = (B*A)^T
So if you want to compute v' = v * R * T (all row-major matrices) you will have to write
(v')^T = T^T * R^T * v^T
or in your pseudo-code:
translate(...)
rotate(...)
Alternatively you could store your rotation and translation matrices column-major too.
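You can check the transpose rule with a few lines of GLM (a throwaway sketch):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Verify A^T * B^T == (B * A)^T with two arbitrary transforms.
glm::mat4 A = glm::rotate(glm::mat4(1.0f), glm::radians(30.0f), glm::vec3(0.0f, 1.0f, 0.0f));
glm::mat4 B = glm::translate(glm::mat4(1.0f), glm::vec3(1.0f, 2.0f, 3.0f));
glm::mat4 lhs = glm::transpose(A) * glm::transpose(B);
glm::mat4 rhs = glm::transpose(B * A);
// lhs and rhs are equal up to floating-point rounding.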
Related
I found this in our internal code as well and I'm trying to understand what is happening.
In the following code: https://github.com/microsoft/DirectX-Graphics-Samples/tree/master/Samples/Desktop/D3D12MeshShaders/src/MeshletRender
They do Transpose(M * V * P) before sending it to the shader. In the shader it's treated as a row-major matrix and they do pos * MVP. Why is this? I have similar code where we multiply the MVP outside in row-major order and then assign it to the shader's row-major matrix, and then we do mul(pos, transpose(mvp)).
We have similar code for PSSL where we do the M * V * P and send it to the shader, where we have specified the matrix as row_major float4x4, but then we don't have to do the transpose.
Hopefully someone can help me out here because it's very confusing. Does it have to do with how the memory is handled?
I got confirmation that DX11 (HLSL) defaults to column-major.
On line 32, the combined model-view-projection matrix is computed by multiplying the projection, view, and world matrix together. You will notice that we are post-multiplying the world matrix by the view matrix and the model-view matrix by the projection matrix. If you have done some programming with DirectX in the past, you may have used row-major matrix order, in which case you would have swapped the order of multiplications. Since DirectX 10, the default order for matrices in HLSL is column-major, so we will stick to this convention in this demo and future DirectX demos.

Using column-major matrices means that we have to post-multiply the vertex position by the model-view-projection matrix to correctly transform the vertex position from object-space to homogeneous clip-space.
From https://www.3dgep.com/introduction-to-directx-11/
And https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-per-component-math#matrix-ordering
Matrix packing order for uniform parameters is set to column-major by default.
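So yes, it really is about how the memory is handled: a row-major store of M is byte-for-byte identical to a column-major store of transpose(M). A minimal sketch of the relayout (plain C++; names illustrative):

// Element (r, c) lives at index r*4 + c in row-major storage and at
// index c*4 + r in column-major storage, so transposing on the CPU
// makes the bytes match HLSL's default column-major packing.
float rowMajor[16];  // filled from your math library
float colMajor[16];  // what the shader's default packing expects
for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c)
        colMajor[c * 4 + r] = rowMajor[r * 4 + c];

This is also why declaring the matrix as row_major in the shader (as in the PSSL case above) removes the need for the transpose.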
Hope this saves someone from going insane.
I am creating a 3D/2D graphics engine. I wrote some fancy Vector2 and Vector3 classes, a Window wrapper, and an OpenGL context creation framework, and for a while I have been wondering how to switch the coordinate system axes. By default in OpenGL it goes like this (as far as I know):
+X axis stands for Right | -X for Left
+Y axis stands for Up | -Y stands for Down
-Z stands for Forward | +Z stands for Backward
I really, really do not want coordinates like that; it just makes them unreadable to me. So I thought about UE4-style coordinates:
+X axis stands for Forward
+Y axis stands for Right
+Z axis stands for Up
How can I switch these axes?
I've read about "tweaking" the perspective matrix, but that only covered inverting an axis, not switching them.
And my second question is: where can I learn some matrix operations (specifically for computer graphics)? I do not want to download ready-to-use source code or extensions. The main purpose of writing this engine is to learn the maths, and the most important thing for me now is matrices: projection matrices, rotation matrices, translation matrices, scaling, etc.
You are describing OpenGL's clip space coordinates and comparing it to UE4's world space coordinates. In general, clip space and world space do not need to have any relationship to each other whatsoever, so it does not really make sense to compare them.
All you have to do is create a view matrix which converts your coordinates from world space to camera space. This matrix is often combined with the conversion from model space to world space, and the matrix which converts from camera space to clip space. Combining all three matrices gives you the "modelviewprojection" matrix.
Your vertex shader will end up looking something like this:
// In model space
in vec3 Coords;
// Conversion from model space to clip space
uniform mat4 MVP;
void main() {
    gl_Position = MVP * vec4(Coords, 1.0);
}
MVP will be made out of a combination of rotation, translation, and scale matrices that work together. The resulting matrix will probably look something like this:
MVP = projection matrix * rotation matrix * translation matrix * scale matrix
The rotation matrix is the sauce that gets your axes right. It will look something like this:
rotation matrix = rotate around Z axis by (-roll) * rotate around X axis by (pitch + pi/2) * rotate around Z axis by (yaw - pi/2)
I suggest that you use the GLM library, which provides the base vector and matrix types and operations in C++ that are provided for you in GLSL. This lets you do linear algebra fairly easily.
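For example, the axis swap itself can be written as a simple change-of-basis matrix in GLM (a sketch; axisSwap is an illustrative name):

#include <glm/glm.hpp>

// Columns are the images of the world basis vectors in eye space:
// world +X (forward) -> eye -Z, world +Y (right) -> eye +X,
// world +Z (up) -> eye +Y.
glm::mat4 axisSwap(
    glm::vec4(0.0f, 0.0f, -1.0f, 0.0f),  // where world +X ends up
    glm::vec4(1.0f, 0.0f,  0.0f, 0.0f),  // where world +Y ends up
    glm::vec4(0.0f, 1.0f,  0.0f, 0.0f),  // where world +Z ends up
    glm::vec4(0.0f, 0.0f,  0.0f, 1.0f));
// For a moving camera, compose: view = axisSwap * inverse(cameraToWorld).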
I do not recommend trying to write these functions yourself. It is mostly just a boring and repetitive programming task with lots of opportunities to make typos. Speaking from experience. Or, let me put it this way. Making your own pencils does not make you a better writer.
In opengl there is one world coordinate system with origin (0,0,0).
What confuses me is what all the transformations like glTranslate, glRotate, etc. do. Do they move objects in world coordinates, or do they move the camera? As you know, the same movement can be achieved by either moving the objects or moving the camera.
I am guessing that glTranslate, glRotate, change objects, and gluLookAt changes the camera?
In opengl there is one world coordinate system with origin (0,0,0).
Well, technically no.
What confuses me is what all the transformations like glTranslate, glRotate, etc. do? Do they move objects in world coordinates, or do they move the camera?
Neither. OpenGL doesn't know objects, OpenGL doesn't know a camera, OpenGL doesn't know a world. All that OpenGL cares about are primitives (points, lines, or triangles), per-vertex attributes, normalized device coordinates (NDC), and a viewport to which the NDC are mapped.
When you tell OpenGL to draw a primitive, each vertex is processed according to its attributes. The position is one of those attributes, usually a vector with 1 to 4 scalar elements in a local "object" coordinate system. The task at hand is to somehow transform the local vertex position attribute into a position on the viewport. In modern OpenGL this happens in a small program running on the GPU, called a vertex shader. The vertex shader may process the position in an arbitrary way, but the usual approach is to apply a number of nonsingular, linear transformations.
Such transformations can be expressed in terms of homogeneous transformation matrices. For a 3-dimensional vector, the homogeneous representation is a vector with 4 elements, where the 4th element is 1.
In computer graphics, a 3-fold transformation pipeline has become more or less the standard way of doing things. First the object-local coordinates are transformed into coordinates relative to the virtual "eye", hence into eye space. In OpenGL this transformation used to be called the modelview transformation. With the vertex positions in eye space, several calculations, like illumination, can be expressed in a generalized way, hence those calculations happen in eye space. Next the eye-space coordinates are transformed into so-called clip space. This transformation maps some volume in eye space to a specific volume with certain boundaries, to which the geometry is clipped. Since this transformation effectively applies a projection, in OpenGL it used to be called the projection transformation.
After clip space, the positions get "normalized" by their homogeneous component, yielding normalized device coordinates, which are then plainly mapped to the viewport.
To recapitulate:
A vertex position is transformed from local to clip space by
vpos_eye = MV · vpos_local
eyespace_calculations(vpos_eye);
vpos_clip = P · vpos_eye
·: product of a matrix with a column vector (each matrix row dotted with the vector)
Then to reach NDC
vpos_ndc = vpos_clip / vpos_clip.w
and finally to the viewport (NDC coordinates are in the range [-1, 1]):
vpos_viewport = (vpos_ndc.xy + (1, 1)) * (viewport.width, viewport.height) / 2 + (viewport.x, viewport.y)
*: component-wise vector multiplication
The OpenGL functions glRotate, glTranslate, and glScale merely manipulate transformation matrices. OpenGL used to have four transformation matrices:
modelview
projection
texture
color
Which of them the matrix manipulation functions act on is set using glMatrixMode. Each of the matrix-manipulating functions composes a new matrix by multiplying the transformation it describes onto the selected matrix, thereby replacing it. The function glLoadIdentity replaces the current matrix with identity, glLoadMatrix replaces it with a user-defined matrix, and glMultMatrix multiplies a user-defined matrix onto it.
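In code, the composition rule looks like this (a fixed-function sketch; C stands for the notional current matrix, not a real variable):

glMatrixMode(GL_MODELVIEW);         // select which matrix the calls act on
glLoadIdentity();                   // C = I
glTranslatef(10.0f, 0.0f, 0.0f);    // C = C * T
glRotatef(90.0f, 0.0f, 1.0f, 0.0f); // C = C * R
// Vertices are then transformed as C * v = T * R * v, i.e. the matrix
// multiplied on last is the first one applied to the vertex.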
So how does the modelview matrix then emulate both object placement and a camera? Well, as you already stated:
As you know, the same movement can be achieved by either moving objects or camera.
You cannot really discern between them. The usual approach is to split the object-local-to-eye transformation into two steps:
Object to world – OpenGL calls this the "model transform"
World to eye – OpenGL calls this the "view transform"
Together they form the model-view, in fixed function OpenGL described by the modelview matrix. Now since the order of transformations is
local to world, Model matrix vpos_world = M · vpos_local
world to eye, View matrix vpos_eye = V · vpos_world
we can substitute by
vpos_eye = V · ( M · vpos_local ) = V · M · vpos_local
replacing V · M by the ModelView matrix =: MV
vpos_eye = MV · vpos_local
Thus you can see that what's V and what's M of the compound matrix MV is determined only by the order of operations in which you multiply onto the modelview matrix, and at which step you decide to "call it the model transform from here on".
I.e. right after a
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
the view is defined. But at some point you'll start applying model transformations, and everything after that is model.
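A fixed-function sketch of that split (the eye/center/up values and drawObject are placeholders):

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(eyeX, eyeY, eyeZ,           // view transform: world -> eye
          centerX, centerY, centerZ,
          upX, upY, upZ);
glTranslatef(objX, objY, objZ);       // from here on: model transform
glRotatef(angle, 0.0f, 1.0f, 0.0f);
drawObject();                         // placeholder for your draw calls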
Note that in modern OpenGL all the matrix manipulation functions have been removed. OpenGL's matrix stack was never feature-complete, and no serious application actually used it. Most programs just glLoadMatrix-ed their self-calculated matrices and didn't bother with OpenGL's built-in matrix manipulation routines.
And ever since shaders were introduced, the whole OpenGL matrix stack has been awkward to use, to put it nicely.
The verdict: If you plan on using OpenGL the modern way, don't bother with the built-in functions. But keep in mind what I wrote, because what your shaders do will be very similar to what OpenGL's fixed function pipeline did.
OpenGL is a low-level API; there are no higher-level concepts like an "object" and a "camera" in a "scene", so there are only two matrix modes that matter here: MODELVIEW (a multiplication of the "camera" matrix by the "object" transformation) and PROJECTION (the projective transformation from eye space to post-perspective clip space).
Distinction between "Model" and "View" (object and camera) matrices is up to you. glRotate/glTranslate functions just multiply the currently selected matrix by the given one (without even distinguishing between ModelView and Projection).
Those functions multiply (transform) the current matrix set by glMatrixMode(), so it depends on which matrix you're working on. OpenGL has 4 different matrices: GL_MODELVIEW, GL_PROJECTION, GL_TEXTURE, and GL_COLOR, and any of those functions can change any of them. So, basically, you don't transform objects; you just manipulate different matrices to "fake" that effect.
Note that gluLookAt() is just a convenience function equivalent to a translation followed by some rotations; there's nothing special about it.
All transformations are transformations on objects. Even gluLookAt is just a transformation to transform the objects as if the camera was where you tell it to be. Technically they are transformations on the vertices, but that's just semantics.
That's true, glTranslate, glRotate change the object coordinates before rendering and gluLookAt changes the camera coordinate.
I am converting some old software to support OpenGL. DirectX and OpenGL have different coordinate systems (OpenGL is right-handed, DirectX is left-handed). I know that in the old fixed-function pipeline, I would use:
glScalef(1.0f, 1.0f, -1.0f);
This time around, I am working with GLM and shaders and need a compatible solution. I have tried multiplying my camera matrix by a scaling vector with no luck.
Here is my camera set up:
// Calculate the direction, right and up vectors
direction = glm::vec3(cos(anglePitch) * sin(angleYaw), sin(anglePitch), cos(anglePitch) * cos(angleYaw));
right = glm::vec3(sin(angleYaw - 3.14f/2.0f), 0, cos(angleYaw - 3.14f/2.0f));
up = glm::cross(right, direction);
// Update our camera matrix, projection matrix and combine them into my view matrix
cameraMatrix = glm::lookAt(position, position+direction, up);
projectionMatrix = glm::perspective(50.0f, 4.0f / 3.0f, 0.1f, 1000.f);
viewMatrix = projectionMatrix * cameraMatrix;
I have tried a number of things including reversing the vectors and reversing the z coordinate in the shader. I have also tried multiplying by the inverse of the various matrices and vectors and multiplying the camera matrix by a scaling vector.
Don't think about the handedness that much. It's true that they use different conventions, but you can just choose not to use them, and then it boils down to almost the same thing in both APIs. My advice is to use the exact same matrices and setups in both APIs. All you should need to do to port from DX to GL is:
Reverse the cull-face winding: DX culls counter-clockwise faces by default (i.e. clockwise is front-facing), while GL treats counter-clockwise as front-facing.
Adjust for the different depth range: DX uses a depth range of 0 (near) to 1 (far), while GL uses a signed range from -1 (near) to 1 (far). You can just do this as a last step in the projection matrix, as in the sketch after this list.
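A GLM sketch of both steps (dxProjection is an assumed name for the projection matrix you were using with DirectX):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Remap clip-space depth from D3D's [0, 1] to GL's [-1, 1]:
// z_gl = 2 * z_dx - 1 (also valid in homogeneous coordinates).
glm::mat4 depthRemap = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, -1.0f))
                     * glm::scale(glm::mat4(1.0f), glm::vec3(1.0f, 1.0f, 2.0f));
glm::mat4 glProjection = depthRemap * dxProjection;

// And flip what GL considers front-facing to match D3D's default:
glFrontFace(GL_CW);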
DX9 also has issues with pixel-coordinate offsets, but that's something else entirely and it's no longer an issue with DX10 onward.
From what you describe, the winding is probably your problem, since you are using the GLM functions to generate matrices that should be alright for OpenGL.
I have a bit of experience writing OpenGL 2 applications and want to learn OpenGL 3. For this I've bought the Addison-Wesley "Red Book" and "Orange Book" (GLSL), which describe the deprecation of the fixed functionality and the new programmable pipeline (shaders). But what I can't grasp is how to construct a scene with multiple objects without using the deprecated translate*, rotate*, and scale* functions.
What I used to do in OGL2 was to "move about" in 3D space using the translate and rotate functions, and create the objects in local coordinates where I wanted them using glBegin ... glEnd. In OGL3 these functions are all deprecated and, as I understand it, replaced by shaders. But I can't call a shader program for each and every object I make, can I? Wouldn't that affect all the other objects too?
I'm not sure if I've explained my problem satisfactorily, but the core of it is how to program a scene with multiple objects defined in local coordinates in OpenGL 3.1. All the beginner tutorials I've found use only a single object and don't have/solve this problem.
Edit: Imagine you want two spinning cubes. It would be a pain to manually modify each vertex coordinate, and you can't simply modify the modelview matrix, because that would rather spin the camera around two static cubes...
Let's start with the basics.
Usually, you want to transform your local triangle vertices through the following steps:
local-space coords-> world-space coords -> view-space coords -> clip-space coords
In standard GL, the first 2 transforms are done through GL_MODELVIEW_MATRIX, the 3rd is done through GL_PROJECTION_MATRIX
These model-view transformations, for the many interesting transforms that we usually want to apply (say, translate, scale and rotate, for example), happen to be expressible as vector-matrix multiplication when we represent vertices in homogeneous coordinates. Typically, the vertex V = (x, y, z) is represented in this system as (x, y, z, 1).
Ok. Say we want to transform a vertex V_local through a translation, then a rotation, then a translation. Each transform can be represented as a matrix*; let's call them T1, R1, T2.
We want to apply the transform to each vertex: V_view = V_local * T1 * R1 * T2. Matrix multiplication being associative, we can compute once and for all M = T1 * R1 * T2.
That way, we only need to pass down M to the vertex program, and compute V_view = V_local * M. In the end, a typical vertex shader multiplies the vertex position by a single matrix. All the work to compute that one matrix is how you move your object from local space to the clip space.
Ok... I glossed over a number of important details.
First, what I described so far only really covers the transformation we usually want to do up to the view space, not the clip space. However, the hardware expects the output position of the vertex shader to be represented in that special clip-space. It's hard to explain clip-space coordinates without significant math, so I will leave that out, but the important bit is that the transformation that brings the vertices to that clip-space can usually be expressed as the same type of matrix multiplication. This is what the old gluPerspective, glFrustum and glOrtho compute.
Second, this is what you apply to vertex positions. The math to transform normals is somewhat different. That's because you want the normal to stay perpendicular to the surface after transformation (for reference, it requires a multiplication by the inverse-transpose of the model-view in the general case, but that can be simplified in many cases)
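For reference, with GLM that inverse-transpose looks like this (a one-line sketch; modelView is an assumed variable name):

#include <glm/glm.hpp>

// Normal matrix: inverse-transpose of the model-view's upper-left 3x3.
// With no non-uniform scaling this reduces to the plain 3x3 of the
// model-view, so the inverse can often be skipped.
glm::mat3 normalMatrix = glm::transpose(glm::inverse(glm::mat3(modelView)));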
Third, you never send 4-D coordinates to the vertex shader. In general you pass 3-D ones. OpenGL will transform those 3-D coordinates (or 2-D, for that matter) into 4-D ones so that the vertex shader does not have to add the extra coordinate: it expands each vertex, adding 1 as the w coordinate.
So... to put all that back together: for each object, you need to compute one of those magic M matrices based on all the transforms that you want to apply to the object. Inside the shader, you then multiply each vertex position by that matrix and write the result to the vertex shader's position output. Typical code is more or less (this is using old nomenclature):
uniform mat4 MVP;
gl_Position = MVP * gl_Vertex;
* the actual matrices can be found on the web, notably on the man pages for each of those functions: rotate, translate, scale, perspective, ortho
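Regarding the edit about the two spinning cubes: you keep a single cube mesh and simply upload a different M (hence a different MVP) per draw call. A rough GLM sketch (mvpLocation, cubePos, angle, and vertexCount are illustrative names):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>

glm::mat4 VP = projection * view;       // shared for the whole frame
for (int i = 0; i < 2; ++i) {
    glm::mat4 model = glm::translate(glm::mat4(1.0f), cubePos[i])
                    * glm::rotate(glm::mat4(1.0f), angle, glm::vec3(0.0f, 1.0f, 0.0f));
    glm::mat4 MVP = VP * model;
    glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, glm::value_ptr(MVP));
    glDrawArrays(GL_TRIANGLES, 0, vertexCount); // same cube data, two places
}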
Those functions are apparently deprecated, but are technically still perfectly functional and will indeed compile. So you can certainly still use glTranslatef(...) and friends.
HOWEVER, this tutorial has a good explanation of how the new shaders and so on work, AND it covers multiple objects in space.
You can create x arrays of vertices and bind them into x VAO objects, and you render the scene from there with shaders, etc. ...meh, it's easier for you to just read it; it is a really good read to grasp the new concepts.
Also, the OpenGL 'Red Book' as it is called has a new release - The Official Guide to Learning OpenGL, Versions 3.0 and 3.1. It includes 'Discussion of OpenGL’s deprecation mechanism and how to verify your programs for future versions of OpenGL'.
I hope that's of some assistance!