Matrix transposition before passing to the vertex shader - C++

I'm very confused about passing matrices to a vertex shader. As far as I know, you have to transpose matrices before passing them to a vertex shader.
But when I passed my world matrix to the vertex shader it did not work correctly: scaling and rotation were fine, but translation caused weird visual glitches. Through trial and error I found that the problem could be solved by not transposing the world matrix before passing it to the vertex shader, but when I tried the same with the view and projection matrices nothing worked.
I don't understand why, and I'm very confused. Do I have to transpose all matrices except world matrices?

It depends on the code of your shaders.
Without any of the /Zpr or /Zpc HLSL compiler options, when your HLSL code says pos = mul( matrix, vector ) the matrix is expected to be column major. When HLSL code says pos = mul( vector, matrix ), the matrix is expected to be row major.
Column major matrices are slightly faster to handle on GPUs, for the following reasons.
The HLSL multiplication compiles into four dp4 instructions. Dot products are fast on GPUs and are used heavily, especially in pixel shaders.
The VRAM access pattern is slightly better. If you want to know more, the keyword is “memory coalescing”; most sources are about CUDA, but the idea applies equally to graphics.
That’s why Direct3D defaults to column major layout.
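To make that concrete on the CPU side, here is a minimal sketch assuming DirectXMath (whose XMMATRIX is stored row-major and composed for row vectors); the struct and function names are illustrative, not from any particular codebase:

#include <DirectXMath.h>
using namespace DirectX;

// Mirrors a hypothetical HLSL cbuffer containing a single float4x4.
struct PerObjectConstants
{
    XMFLOAT4X4 worldViewProj;
};

void FillConstants(PerObjectConstants& cb,
                   FXMMATRIX world, CXMMATRIX view, CXMMATRIX proj)
{
    // DirectXMath composes for row vectors: v' = v * (W * V * P).
    const XMMATRIX wvp = world * view * proj;

    // Option A: transpose on the CPU; with HLSL's default column-major
    // packing the shader then writes  mul(input.pos, worldViewProj).
    XMStoreFloat4x4(&cb.worldViewProj, XMMatrixTranspose(wvp));

    // Option B: store without transposing and flip the mul order instead:
    //   output.pos = mul(worldViewProj, input.pos);
    // XMStoreFloat4x4(&cb.worldViewProj, wvp);
}

Both options yield the same clip-space position; what matters is that the CPU-side layout and the mul() argument order agree.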

Related

Why is the MVP being transposed in DirectX example

I found this in our internal code as well and I'm trying to understand what is happening.
In the following code: https://github.com/microsoft/DirectX-Graphics-Samples/tree/master/Samples/Desktop/D3D12MeshShaders/src/MeshletRender
They do Transpose(M * V * P) before sending it to the shader. In the shader it's treated as a row-major matrix and they do pos * MVP. Why is this? I have similar code where we multiply the MVP outside as a row-major matrix and then insert it into the shader's row-major matrix, and then we do mul(pos, transpose(mvp)).
We have similar code for PSSL where we do the M * V * P and send it to the shader, where we have specified that the matrix is row_major float4x4, but then we don't have to do the transpose.
Hopefully someone can help me out here, because it's very confusing. Does it have to do with how the memory is handled?
I have since had it confirmed that DX11 (HLSL) defaults to column-major.
On line 32, the combined model-view-projection matrix is computed by multiplying the projection, view, and world matrix together. You will notice that we are post-multiplying the world matrix by the view matrix and the model-view matrix by the projection matrix. If you have done some programming with DirectX in the past, you may have used row-major matrix order in which case you would have swapped the order of multiplications. Since DirectX 10, the default order for matrices in HLSL is column-major so we will stick to this convention in this demo and future DirectX demos.
Using column-major matrices means that we have to post-multiply the vertex position by the model-view-projection matrix to correctly transform the vertex position from object-space to homogeneous clip-space.
From https://www.3dgep.com/introduction-to-directx-11/
And https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-per-component-math#matrix-ordering
Matrix packing order for uniform parameters is set to column-major by default.
Hope this saves someone from going insane.
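The underlying reason is the identity transpose(A*B) == transpose(B)*transpose(A): reversing the multiplication order and transposing the operands are two views of the same thing, which is why one codebase transposes the combined MVP while another simply composes it the other way round. A small sketch that checks the identity numerically, assuming DirectXMath (the function name is illustrative):

#include <cmath>
#include <DirectXMath.h>
using namespace DirectX;

// Verifies that Transpose(M * V * P) == Transpose(P) * Transpose(V) * Transpose(M).
bool TransposeIdentityHolds(FXMMATRIX m, CXMMATRIX v, CXMMATRIX p)
{
    XMFLOAT4X4 a, b;
    XMStoreFloat4x4(&a, XMMatrixTranspose(m * v * p));
    XMStoreFloat4x4(&b, XMMatrixTranspose(p) * XMMatrixTranspose(v) * XMMatrixTranspose(m));

    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            if (std::fabs(a.m[r][c] - b.m[r][c]) > 1e-5f)
                return false;
    return true;
}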

How to use OpenGL Column_Major MVP Matrix in Direct3D 11

I am trying to use a single matrix stack for both the OpenGL and Direct3D APIs. From all my research on this site and other articles (Article 1 and Article 2, among others), it is my understanding that this should be fairly easy as long as certain nuances are handled and consistency is maintained.
Also, this MSDN Article suggests that HLSL by default uses column_major format.
So I have a working right-handed, column-major MVP matrix for OpenGL.
I am trying to use this same matrix in DirectX. Since it is a Right-Handed System, I made sure that I do set rasterizerDesc.FrontCounterClockwise = true;
I have the following in my HLSL shader :
output.position = mul(u_mvpMatrix, input.position);
Note that in the above code I am using post-multiplication, as my MVP matrix is column major and I send it to the shader without transposing it. So it should arrive as column major, hence the post-multiplication.
Note: I did try pre-multiplication as well as post and pre-multiplication with the transpose of my mvp.
When calculating the projection matrix, as suggested in Article 2 above, I make sure that the depth range for DX is 0 to 1, while for OpenGL it is -1 to 1!
I even tested my resulting matrix against the XMMatrixPerspectiveFovRH function from the DirectXMath library and my matrix matches the one produced by that function.
Now, with the rasterizer flag FrontCounterClockwise = true, a right-handed, column-major MVP matrix, and correct depth scaling, I would have expected this to be all that is needed to get D3D to work with my matrix.
What is it that I am missing here?
P.S: I believe I provided all the information here, but please let me know if any more information is needed.
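One point worth double-checking is the depth-range fix-up mentioned above: the remap from OpenGL's [-1, 1] clip-space z to Direct3D's [0, 1] can be expressed as one extra matrix applied after the GL-style projection. A minimal GLM sketch under that assumption (column-major storage, column-vector convention; the function name is illustrative):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Remaps clip-space z from OpenGL's [-1, 1] to Direct3D's [0, 1]:
// z_d3d = 0.5 * z_gl + 0.5 * w.
glm::mat4 MakeD3DProjection(float fovY, float aspect, float zNear, float zFar)
{
    const glm::mat4 glProj = glm::perspective(fovY, aspect, zNear, zFar);

    glm::mat4 fixDepth(1.0f);   // identity
    fixDepth[2][2] = 0.5f;      // scale z by 0.5   (column 2, row 2)
    fixDepth[3][2] = 0.5f;      // add 0.5 * w      (column 3, row 2)

    return fixDepth * glProj;   // remap applied after the GL projection
}

Recent GLM versions can also produce the 0-to-1 range directly (glm::perspectiveZO, or the GLM_FORCE_DEPTH_ZERO_TO_ONE define), which avoids the extra multiply.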

How to multiply vertices with model matrix outside the vertex shader

I am using OpenGL ES2 to render a fairly large number of mostly 2D items, and so far I have gotten away with sending a premultiplied model/view/projection matrix to the vertex shader as a uniform and then multiplying my vertices with the resulting MVP in there.
All items are batched using texture atlases and I use one MVP per batch. So all my vertices are relative to the translation of that MVP.
Now I want to have rotation and scaling for each of the separate items, which means I need a different model matrix for each of them. So I modified my vertex format to include the model matrix (16 floats!) and added a mat4 attribute in my shader, and it all works well. But I'm kind of disappointed with this solution, since it dramatically increased the vertex size.
So as I was staring at my screen trying to think of a different solution, I thought about transforming my vertices to world space before I send them over to the shader, or even to screen space if that's possible. The vertices I use are unnormalized coordinates in pixels.
So the question is: is such a thing possible? And if yes, how do you do it? I can't think why it shouldn't be, since it's just maths, but after a fairly long search on Google it doesn't look like a lot of people are actually doing this...
Strange, because if it is indeed possible it would be quite a major optimization in cases like this one.
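Concretely, the pre-transform I have in mind would look roughly like this (a GLM sketch with illustrative names, applied while filling the batch's vertex buffer):

#include <vector>
#include <glm/glm.hpp>

struct Vertex { glm::vec2 pos; glm::vec2 uv; };

// Bakes one item's model transform into its vertices so the whole batch can
// keep sharing a single view-projection matrix in the shader.
void AppendTransformed(std::vector<Vertex>& batch,
                       const std::vector<Vertex>& item,
                       const glm::mat4& model)
{
    for (const Vertex& v : item)
    {
        const glm::vec4 world = model * glm::vec4(v.pos, 0.0f, 1.0f);
        batch.push_back({ glm::vec2(world), v.uv });
    }
}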
If the number of matrices per batch is limited, then you can pass all those matrices as uniforms (preferably in a UBO) and expand the vertex data with an index which specifies which matrix to use.
This is similar to GPU skinning used for skeletal animation.
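A rough sketch of the CPU side of that idea, assuming plain OpenGL ES2 (which has no UBOs, so an ordinary uniform array stands in); all names are illustrative:

#include <cstddef>
#include <vector>
#include <GLES2/gl2.h>
#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>

// One vertex: 2D position plus the index of the model matrix it should use.
struct Vertex { float x, y; float matrixIndex; };

// The matching vertex shader would declare
//   uniform mat4 u_models[16];
//   attribute float a_matrixIndex;
// and transform with u_models[int(a_matrixIndex)].
void DrawBatch(GLuint program, GLuint vbo,
               const std::vector<glm::mat4>& models, GLsizei vertexCount)
{
    glUseProgram(program);

    // Upload one model matrix per item in the batch.
    GLint matLoc = glGetUniformLocation(program, "u_models");
    glUniformMatrix4fv(matLoc, (GLsizei)models.size(), GL_FALSE,
                       glm::value_ptr(models[0]));

    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    GLint posLoc = glGetAttribLocation(program, "a_position");
    GLint idxLoc = glGetAttribLocation(program, "a_matrixIndex");
    glEnableVertexAttribArray(posLoc);
    glEnableVertexAttribArray(idxLoc);
    glVertexAttribPointer(posLoc, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)0);
    glVertexAttribPointer(idxLoc, 1, GL_FLOAT, GL_FALSE, sizeof(Vertex),
                          (void*)offsetof(Vertex, matrixIndex));

    glDrawArrays(GL_TRIANGLES, 0, vertexCount);
}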

Inconsistencies with matrix maths between GLSL and GLM, or is there such a thing as a "bad" view matrix?

So, I've come across some oddities between GLSL and GLM.
If I generate the following view matrix (C++):
vec3 pos(4, 1, 1);
vec3 dir(1, 0, 0);
mat4 viewMat = glm::lookAt(pos, pos+dir, vec3(0,0,1));
And then, in glsl, do:
fragColour.rgb = vec3(inverse(viewMat) * vec4(0,0,0,1)) / 4.f;
Then I expect the screen to become pinkish-red, or (1.0, 0.25, 0.25). Instead, I get black.
If I do this in GLM, however:
vec3 colour = vec3(glm::inverse(viewMat) * vec4(0,0,0,1)) / 4.f;
cout << glm::to_string(colour) << endl;
I get the expected result of (1.0, 0.25, 0.25).
Now, if I change the viewMat to instead be (C++):
vec3 pos(4, 1, 1);
vec3 dir(1, 0.000001, 0);
mat4 viewMat = glm::lookAt(pos, pos+dir, vec3(0,0,1));
Then bam! I get (1.0,0.25,0.25) in both GLSL and GLM.
This makes no sense to me. Why does it do this? This view matrix works fine everywhere else in GLSL - I just can't invert it. This happens whenever dirY == 0.f.
Also, please suggest improvements for the question title, I'm not sure what it should be.
Edit: Also, it doesn't seem to have anything to do with lookAt's up vector (which I set to Z anyway). Even if I set up to (0,1,0), the same thing happens. Everything turns sideways, but I still can't invert the view matrix in GLSL.
Edit: Ok, so at derhass' suggestion, I tried sending the view matrix in already inverted. Bam, works perfectly. So it seems that my GL implementation really is somehow incapable of inverting that matrix. This would easily have to be the weirdest GL bug I've ever come across. Some kind of explanation of why it's a bad idea to invert matrices in shaders would be appreciated, though. Edit again: Sending in inverted matrices throughout my engine resulted in a huge framerate boost. DEFINITELY DO THAT.
Arbitrary 4x4 matrix inversion is not a fast or safe task.
There are many reasons: lower FPU accuracy on the GPU side, the many divisions needed during inversion (depending on the method of computation), the fact that not all matrices have an inverse, etc. (I think that is also why GL has no such built-in implementation) ... For a better picture of this, see Understanding 4x4 homogeneous transform matrices and look at the matrix_inv function there to see how complex the computation really is (it uses determinants). There is also GEM (Gaussian elimination), but that is not used because of its quirks and the need to sort rows ...
If the matrices are static per rendered frame, which is usually the case, it is a waste of GPU power to compute the inverse in vertex/fragment/geometry shaders again and again for each vertex or fragment (that is where the speed boost comes from).
One could object that inverting an orthogonal homogeneous matrix is just a matter of transposing it, but how would GL/GLSL know that it is dealing with such a matrix (the check is not that simple either)? In that case you can use transpose(), which is implemented in GLSL and should be fast (it is just a reordering of elements).
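To illustrate the cheap special case: a view matrix is usually a rigid transform (orthonormal rotation plus translation), and its inverse can be built on the CPU once per frame and sent in as its own uniform. A minimal GLM sketch, assuming the matrix really is rigid (no scale or shear); the function name is illustrative:

#include <glm/glm.hpp>

// Inverse of a rigid transform [R | t]: R is orthonormal, so R^-1 = R^T,
// and the inverse translation is -R^T * t. Far cheaper than a general
// 4x4 inverse, and done once per frame instead of per vertex/fragment.
glm::mat4 InverseRigid(const glm::mat4& m)
{
    const glm::mat3 rt = glm::transpose(glm::mat3(m)); // R^T
    const glm::vec3 t(m[3]);                           // translation column
    glm::mat4 inv(rt);                                 // upper-left 3x3 = R^T
    inv[3] = glm::vec4(-(rt * t), 1.0f);               // -R^T * t
    return inv;
}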

Create view matrices in GLSL shader

I have many positions and directions stored in 1D textures on the GPU. I want to use those as render sources in a GLSL geometry shader. To do this, I need to create corresponding view matrices from those textures.
My first thought is to take a detour to the CPU, read the textures into memory and create a bunch of view matrices from there, with something like glm::lookAt(), then send the matrices as uniform variables to the shader.
My question is whether it is possible to skip this detour and instead create the view matrices directly in the GLSL geometry shader. Also, is this feasible performance-wise?
Nobody says (or nobody should say) that your view matrix has to come from the CPU through a uniform. You can just generate the view matrix from the vectors in your texture right inside the shader. Maybe the implementation of the good old gluLookAt is of help to you there.
Whether this approach is a good idea performance-wise is another question, but if this texture is quite large or changes frequently, it might be better than reading it back to the CPU.
But maybe you can pre-generate the matrices into another texture/buffer using a simple GPGPU-like shader that does nothing more than generate a matrix for each position/direction in the textures and store it in another texture (using FBOs) or buffer (using transform feedback). This way you don't need to make a roundtrip to the CPU and you don't need to generate the matrices anew for each vertex/primitive/whatever. On the other hand this will increase the required memory, as a 4x4 matrix is a bit heavier than a position and a direction.
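For reference, the lookAt construction alluded to above boils down to building an orthonormal basis from the direction and packing it, together with the translation terms, into the matrix. Here is a compact GLM sketch of that maths (equivalent to glm::lookAt for a right-handed setup), which a GLSL port can mirror almost line for line; the function name is illustrative:

#include <glm/glm.hpp>

// View matrix from a position and a viewing direction, the same maths that
// gluLookAt / glm::lookAt perform (right-handed, column-vector convention).
glm::mat4 ViewFromPosDir(const glm::vec3& pos, const glm::vec3& dir,
                         const glm::vec3& up = glm::vec3(0.0f, 1.0f, 0.0f))
{
    const glm::vec3 f = glm::normalize(dir);                // forward
    const glm::vec3 s = glm::normalize(glm::cross(f, up));  // right
    const glm::vec3 u = glm::cross(s, f);                   // corrected up

    // The world-to-view rotation has s, u and -f as its rows; with GLM's
    // column-major storage each stored column holds one component of each.
    glm::mat4 view(1.0f);
    view[0] = glm::vec4(s.x, u.x, -f.x, 0.0f);
    view[1] = glm::vec4(s.y, u.y, -f.y, 0.0f);
    view[2] = glm::vec4(s.z, u.z, -f.z, 0.0f);
    view[3] = glm::vec4(-glm::dot(s, pos), -glm::dot(u, pos),
                         glm::dot(f, pos), 1.0f);
    return view;
}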
Sure. Read the texture, and build the matrices from the values...
vec4 x = texture(YourSampler, WhateverCoords1);
vec4 y = texture(YourSampler, WhateverCoords2);
vec4 z = texture(YourSampler, WhateverCoords3);
vec4 w = texture(YourSampler, WhateverCoords4);
// mat4(c0, c1, c2, c3) takes its arguments as the columns of the matrix.
mat4 matrix = mat4(x, y, z, w);
Any problem with this ? Or did I miss something ?
The view matrix is a uniform, and uniforms don't change in the middle of a render batch, nor can they be written to from a shader (directly). So I don't see how generating it in the shader could be possible, at least not directly.
Also note that the geometry shader runs after the vertices have been transformed with the modelview matrix, so it does not make much sense (at least during the same pass) to re-generate that matrix or part of it.
You could of course probably still do some hack with transform feedback, writing some values to a buffer and either copying/binding it as a uniform buffer later, or just reading the values from within a shader and multiplying them as a matrix. That would at least avoid a roundtrip to the CPU; the question is whether such an approach makes sense and whether you really want to do such an obscure thing. It is hard to tell what's best without knowing exactly what you want to achieve, but quite probably just transforming things in the vertex shader (read those textures, build a matrix, multiply) will work better and be easier.