How to compute vector transformations on the CPU? - c++

I've been trying to transform my vertices outside the main graphics pipeline, because I need the results on the CPU. As simple as it seemed at first, I have spent a lot of time trying to implement this and simply failed. I have been trying to find the error in my method, but it looks correct to me.
My world, camera and projection matrices (the ones I use in the main graphics pipeline to render objects) work fine. I use the same matrices to transform the vectors with XMVector4Transform(). I set the w component of each vector to 1, but when I transform my vertices I get values that lie outside the screen instead of the normalized (-1 to 1) outputs I expect while the 3D model is inside screen space, even though the same matrix transformations in the shader render it inside screen space.
After some digging I found that I supposedly need to use XMVector4Normalize() to normalize the coordinates. The results were normalized after that, but there is still a major offset between the CPU-computed vertices and the ones computed in the shader, and the offset grows as I move the objects toward the edges of the screen.
https://i.stack.imgur.com/bhnpB.png
In the above screenshot, the wireframe is rendered using the CPU-computed vertices, while the solid-shaded version is rendered in the main pipeline. The offset I mentioned can be clearly observed.
PS: I am rendering the CPU-computed verts just to test...
DirectX::XMVECTOR v1;
v1.m128_f32[0] = pMesh[i].GetVertices()[j].x;
v1.m128_f32[1] = pMesh[i].GetVertices()[j].y;
v1.m128_f32[2] = pMesh[i].GetVertices()[j].z;
v1.m128_f32[3] = 1.f;
projectedVectors[i].verts.emplace_back(XMVECTOR());
v1 = XMVector4Transform(XMVector4Transform(v1, *mView), *mProj);
v1 = XMVector4Normalize(v1);

The result you get from this transform is in so-called clip space. This is an artificial 4D-space, where the w-component denotes the common denominator for all other components. Therefore, instead of normalizing, you want to divide by w (so-called perspective divide).
Btw, instead of assigning the components of the vector to the register fields yourself, you could also use XMLoadFloat4(), which takes care of loading the values with whatever instruction set is available.
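For example, a minimal DirectXMath sketch of that divide (the function and matrix names are just placeholders mirroring the snippet above, not your actual code):
#include <DirectXMath.h>
using namespace DirectX;

// Transform one object-space position the same way the shader does, then
// apply the perspective divide to get normalized device coordinates.
// 'view' and 'proj' stand in for the matrices the question calls *mView / *mProj.
XMVECTOR ToNdc(const XMFLOAT3& objectPos, FXMMATRIX view, CXMMATRIX proj)
{
    // Load x, y, z and set w = 1 instead of poking m128_f32 directly.
    XMVECTOR v = XMVectorSetW(XMLoadFloat3(&objectPos), 1.0f);

    // Clip space: the same view * projection chain as in the vertex shader.
    XMVECTOR clip = XMVector4Transform(XMVector4Transform(v, view), proj);

    // Perspective divide: divide by w, NOT XMVector4Normalize().
    return XMVectorDivide(clip, XMVectorSplatW(clip));
}
DirectXMath also offers XMVector3TransformCoord(), which transforms a 3D point by a matrix and performs the w-divide for you in one call.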

Related

Does OpenGL change vertices in memory when you apply transformations?

So let's say I have a single vertex (to make things easy) in my program, (0, 0, 0). Right at the origin. I render a single frame with a simple translation matrix, moving the vertex two units down the x-axis. The vertex is rendered accordingly. Does the same vertex now show up in the VRAM as
(2, 0, 0)? I've read that it's important to load all the respective identity matrices in OpenGL every time a frame is rendered--and I assume that's because everything would continually move, rotate, etc. further and further, implying that applying transformations DOES modify actual data, not just the appearance onscreen.
Strictly speaking, OpenGL is just an API definition. An implementation can do whatever it wants as long as it meets the specifications.
That being said, the answer to your question is generally: NO. It's hard to picture how storing transformed vertices back into the memory that also contained the original vertices would ever make sense.
The original vertex positions are passed into the vertex shader, where they are processed, which can include transformations. Once they exit the vertex shader, the transformed positions will most likely be stored in some kind of cache or dedicated on-chip GPU memory until they are processed by the next steps of the pipeline, which includes perspective division, application of the viewport transform, and rasterization. Once those vertex processing steps are completed, the transformed vertices can be discarded. They may stay in a cache for a little longer, for possible reuse of the processed vertex in case the same original vertex is used again. But they are not stored in any persistent way.
The way I interpret it, what you heard about having to reset the matrices for each frame was probably a misunderstanding. If you want to apply the same matrices in the next frame, you don't have to do anything at all.
What they were most likely talking about is related to how the matrix stack in legacy OpenGL works. Most calls that modify the current matrix, like glTranslatef(), glRotatef(), etc., are applied incrementally to the current matrix. For example, if you call glRotatef(), the rotation is combined with the transformation that was already on the matrix stack. The result is that your newly specified rotation is applied to the vertices first, followed by the transformations that were already on the matrix stack.
Based on this, if you want to specify transformations from scratch at the start of each frame, you will call glLoadIdentity() to reset the current transformation on the matrix stack before you start specifying your new transformations. Or you can use glPushMatrix()/glPopMatrix() to save and restore the desired state of the matrix stack.
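For illustration, a minimal fixed-function sketch of that pattern (angle and drawObject() are made up):
#include <GL/gl.h>
#include <GL/glu.h>

void drawObject();   // hypothetical draw call

// Inside the per-frame render function.
void renderFrame(float angle)
{
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();                    // start fresh so nothing accumulates frame to frame
    gluLookAt(0.0, 0.0, 5.0,             // the "view" part of the modelview matrix
              0.0, 0.0, 0.0,
              0.0, 1.0, 0.0);

    glPushMatrix();                      // save the camera transform
    glTranslatef(2.0f, 0.0f, 0.0f);      // per-object model transform,
    glRotatef(angle, 0.0f, 1.0f, 0.0f);  // combined with what is already on the stack
    drawObject();
    glPopMatrix();                       // restore the camera transform for the next object
}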
If you use what many people call "modern OpenGL", meaning that you don't use the legacy fixed pipeline functionality, you don't have to worry about any of that. The matrix stack is gone for good, and you get to calculate your own transformation matrices, and pass them to your shader code.
Here is the Wikipedia article on the mathematics involved with transformation matrices: https://en.wikipedia.org/wiki/Transformation_matrix. It will give you an understanding of the math behind the scenes; another way to look at it is through linear (vector) algebra.
What happens under the hood when you render a scene is that all of the vertex data is sent from the CPU to the GPU to be rasterized and drawn to the screen. That is your batch process or render call. You also have a frame function that runs x times per second, which gives you your frames per second; if you are rendering at, say, 60 FPS, then those vertices, triangles, etc. are drawn 60 times each second.
When you apply a transformation to a set of vertices, a transformation matrix is multiplied with your model-view-projection matrix (MVP * T), and the result may be saved back into your existing MVP matrix, if that is how you have your calculations set up.
There are some differences depending on which version of OpenGL you are using, from v1.0 (pure CPU calls) up to v4.5. As far as I know, from around version 3.2 or 3.3 (I don't remember which off hand) you have to implement the MVP matrices yourself, whereas older versions (shaders were first introduced after v1.5) handled this for you. The OpenGL documentation is at https://www.opengl.org/; from the main page, the documentation topic links to the OpenGL Registry and to each specific version, which covers everything available in the API.
So, yes, the actual coordinate data for these vertices does change; however, it will not continuously change unless you keep incrementing some variable with a factor of time, which is what gives you a simulation of movement or animation. If you apply only a single transformation, the vertices will either rotate, translate, scale, or shear, depending on which transformation you apply. The order of these operations does matter, but I will not tell you which order; that is for you to read up on and figure out. One reason it matters is that not every matrix multiplication has a valid inverse matrix. The identity matrix is used for reasons such as round-off errors and floating-point precision, so that if you happen to apply, say, 1,000 transformations in a matter of about 10 seconds, you do not accumulate astronomical errors. This should be enough to point you in the right direction and also serve as a guide to how the OpenGL API works.
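A rough sketch of that difference, assuming GLM for the matrix math (names are illustrative):
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Rebuilding the model matrix from scratch each frame: the object stays put
// unless 'angle' itself changes over time.
glm::mat4 BuildModel(float angle)   // angle in radians
{
    glm::mat4 model(1.0f);          // start from identity
    model = glm::translate(model, glm::vec3(2.0f, 0.0f, 0.0f));
    model = glm::rotate(model, angle, glm::vec3(0.0f, 1.0f, 0.0f));
    return model;
}

// Accumulating into a persistent matrix instead: the object keeps rotating a
// little more every frame, because each call multiplies onto the previous result.
void AccumulateSpin(glm::mat4& model, float step)
{
    model = glm::rotate(model, step, glm::vec3(0.0f, 1.0f, 0.0f));
}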

How do I get started with a GPU voxelizer?

I've been reading various articles about how to write a GPU voxelizer. From my understanding the process goes like this:
Inspect the triangles individually and decide the axis that displays the triangle in the largest way. Call this the dominant axis.
Render the triangle on its dominant axis and sample the texels that come out.
Write that texel data onto a 3D texture and then do what you will with the data
Disregarding conservative rasterization, I have a lot of questions regarding this process.
I've gotten as far as rendering each triangle, choosing a dominant axis and orthogonally projecting it. What should the values of the orthogonal projection be? Should it be some value based around the size of the voxels or how large of an area the map should cover?
What am I supposed to do in the fragment shader? How do I write to my 3D texture such that it stores the voxel data? From my understanding, due to choosing the dominant axis we can't have more than a depth of 1 voxel for each fragment. However, since we projected orthogonally I don't see how that would reflect onto the 3D texture.
Finally, I am wondering where to store the texture data. I know it's a bad idea to store data on the CPU side, since you have to pass it all in to use it on the GPU; however, the source code I am loosely following chooses to store all its textures on the CPU side, such as those for a light map. My assumption is that data that will only be used on the GPU should be stored there, and data used by both should be stored on the CPU side. Based on this, I store my data on the CPU side. Is that correct?
My main sources have been: https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf OpenGL Insights
https://github.com/otaku690/sparsevoxeloctree A SVO using a voxelizer. The issue is that the shader code is not in the github.
In my own implementation, the whole scene is positioned and scaled into one unit cube centered on the world origin. The model-view-projection matrices are then straightforward, and the viewport is simply the desired voxel resolution.
I use a 2-pass approach to output the voxel fragments: the 1st pass calculates the number of output voxel fragments by accumulating a single variable with an atomic counter. I then use that count to allocate a linear buffer.
In the 2nd pass the rasterized voxel fragments are stored into the allocated linear buffer, using an atomic counter to avoid write conflicts.
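As a rough host-side sketch of that two-pass pattern (OpenGL 4.2+ assumed; drawVoxelizationPass() and the 16-byte fragment size are made up):
#include <GL/glew.h>

// Hypothetical: issues the voxelization draw; in pass 1 the shader only counts
// fragments with atomicCounterIncrement(), in pass 2 it writes them out.
void drawVoxelizationPass(bool countOnly);

void voxelizeTwoPass()
{
    // Pass 1: count voxel fragments with an atomic counter.
    GLuint counterBuf = 0;
    GLuint zero = 0;
    glGenBuffers(1, &counterBuf);
    glBindBuffer(GL_ATOMIC_COUNTER_BUFFER, counterBuf);
    glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), &zero, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, counterBuf);

    drawVoxelizationPass(true);
    glMemoryBarrier(GL_ATOMIC_COUNTER_BARRIER_BIT | GL_BUFFER_UPDATE_BARRIER_BIT);

    GLuint fragmentCount = 0;
    glGetBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &fragmentCount);

    // Allocate the linear buffer for exactly that many voxel fragments
    // (16 bytes per fragment is an assumption).
    GLuint fragmentBuf = 0;
    glGenBuffers(1, &fragmentBuf);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, fragmentBuf);
    glBufferData(GL_SHADER_STORAGE_BUFFER, fragmentCount * 16, nullptr, GL_DYNAMIC_COPY);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, fragmentBuf);

    // Pass 2: reset the counter and rasterize again; this time the shader uses
    // the counter value as a write index into the storage buffer.
    glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &zero);
    drawVoxelizationPass(false);
}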

Why would it be beneficial to have a separate projection matrix, yet combine model and view matrix?

When you are learning 3D programming, you are taught that it's easiest to think in terms of 3 transformation matrices:
The Model Matrix. This matrix is individual to every single model and it rotates and scales the object as desired and finally moves it to its final position within your 3D world. "The Model Matrix transforms model coordinates to world coordinates".
The View Matrix. This matrix is usually the same for a large number of objects (if not for all of them) and it rotates and moves all objects according to the current "camera position". If you imagine that the 3D scene is filmed by a camera and what is rendered on the screen are the images captured by this camera, the location of the camera and its viewing direction define which parts of the scene are visible and how the objects appear in the captured image. There are few reasons for changing the view matrix while rendering a single frame, but they do in fact exist (e.g. by rendering the scene twice and changing the view matrix in between, you can create a very simple, yet impressive mirror within your scene). Usually the view matrix changes only once between two frames being drawn. "The View Matrix transforms world coordinates to eye coordinates".
The Projection Matrix. The projection matrix decides how those 3D coordinates are mapped to 2D coordinates, e.g. whether a perspective is applied to them (objects get smaller the farther they are away from the viewer) or not (orthogonal projection). The projection matrix hardly ever changes at all. It may have to change if you are rendering into a window and the window size has changed, or if you are rendering full screen and the resolution has changed, but only if the new window size/screen resolution has a different display aspect ratio than before. There are some crazy effects for which you may want to change this matrix, but in most cases it's pretty much constant for the whole life of your program. "The Projection Matrix transforms eye coordinates to screen coordinates".
This all makes a lot of sense to me. Of course one could always combine all three matrices into a single one, since multiplying a vector first by matrix A and then by matrix B is the same as multiplying the vector by matrix C, where C = B * A.
Now if you look at the classical OpenGL (OpenGL 1.x/2.x), OpenGL knows a projection matrix. Yet OpenGL does not offer a model or a view matrix, it only offers a combined model-view matrix. Why? This design forces you to permanently save and restore the "view matrix" since it will get "destroyed" by model transformations applied to it. Why aren't there three separate matrices?
If you look at the new OpenGL versions (OpenGL 3.x/4.x) and don't use the classical render pipeline but customize everything with shaders (GLSL), there are no built-in matrices available any longer; you have to define your own matrices. Still, most people keep the old concept of a projection matrix and a model-view matrix. Why would you do that? Why not use either three matrices, which means you don't have to permanently save and restore the model-view matrix, or a single combined model-view-projection (MVP) matrix, which saves you a matrix multiplication in your vertex shader for every single vertex rendered (after all, such a multiplication doesn't come for free either)?
So to summarize my question: what advantage does a combined model-view matrix together with a separate projection matrix have over three separate matrices or a single MVP matrix?
Look at it practically. First, the fewer matrices you send, the fewer matrices you have to multiply with positions/normals/etc. And therefore, the faster your vertex shaders.
So point 1: fewer matrices is better.
However, there are certain things you probably need to do. Unless you're doing 2D rendering or some simple 3D demo-applications, you are going to need to do lighting. This typically means that you're going to need to transform positions and normals into either world or camera (view) space, then do some lighting operations on them (either in the vertex shader or the fragment shader).
You can't do that if you only go from model space to projection space. You cannot do lighting in post-projection space, because that space is non-linear. The math becomes much more complicated.
So, point 2: You need at least one stop between model and projection.
So we need at least 2 matrices. Why model-to-camera rather than model-to-world? Because working in world space in shaders is a bad idea. You can encounter numerical precision problems related to translations that are distant from the origin. Whereas, if you worked in camera space, you wouldn't encounter those problems, because nothing is too far from the camera (and if it is, it should probably be outside the far depth plane).
Therefore: we use camera space as the intermediate space for lighting.
In most cases your shader will need the geometry in world or eye coordinates for shading, so you have to separate the projection matrix from the model and view matrices.
Making your shader multiply the geometry by two matrices hurts performance. Assuming each model has thousands (or more) of vertices, it is more efficient to compute a model-view matrix on the CPU once and let the shader do one less matrix-vector multiplication.
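A sketch of that CPU-side composition with DirectXMath (the struct and names are placeholders, not a specific engine's API):
#include <DirectXMath.h>
using namespace DirectX;

// Compose model and view once per object on the CPU; keep the projection
// separate so the shader can still work in eye space for lighting.
struct PerObjectConstants
{
    XMFLOAT4X4 modelView;   // uploaded per object
    XMFLOAT4X4 projection;  // uploaded once per frame / on resize
};

void FillConstants(PerObjectConstants& out, FXMMATRIX model, CXMMATRIX view, CXMMATRIX proj)
{
    // Row-vector convention: position * model * view, composed once here.
    XMStoreFloat4x4(&out.modelView, XMMatrixMultiply(model, view));
    XMStoreFloat4x4(&out.projection, proj);
}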
I have just solved a z-buffer fighting problem by separating the projection matrix. There is no visible increase in GPU load. The two following screenshots show the two results - pay attention to the green and white layers fighting.

COLLADA: Inverse bind pose in the wrong space?

I'm working on writing my own COLLADA importer. I've gotten pretty far, loading meshes and materials and such. But I've hit a snag on animation, specifically: joint rotations.
The formula I'm using for skinning my meshes is straight-forward:
weighted;
for (i = 0; i < joint_influences; i++)
{
    weighted +=
        joint[joint_index[i]]->parent->local_matrix *
        joint[joint_index[i]]->local_matrix *
        skin->inverse_bind_pose[joint_index[i]] *
        position *
        skin->weight[j];
}
position = weighted;
And as far as the literature is concerned, this is the correct formula. Now, COLLADA specifies two types of rotations for the joints: local and global. You have to concatenate the rotations together to get the local transformation for the joint.
What the COLLADA documentation does not differentiate between is the joint's local rotation and the joint's global rotation. But in most of the models I've seen, rotations can have an id of either rotate (global) or jointOrient (local).
When I disregard the global rotations and only use the local ones, I get the bind pose for the model. But when I add the global rotations to the joint's local transformation, strange things start to happen.
This is without using global rotations:
And this is with global rotations:
In both screenshots I'm drawing the skeleton using lines, but in the first it's invisible because the joints are inside the mesh. In the second screenshot the vertices are all over the place!
For comparison, this is what the second screenshot should look like:
It's hard to see, but you can see that the joints are in the correct position in the second screenshot.
But now the weird thing. If I disregard the inverse bind pose as specified by COLLADA and instead take the inverse of the joint's parent local transform times the joint's local transform, I get the following:
In this screenshot I'm drawing a line from each vertex to the joints that have influence. The fact that I get the bind pose is not so strange, because the formula now becomes:
world_matrix * inverse_world_matrix * position * weight
But it leads me to suspect that COLLADA's inverse bind pose is in the wrong space.
So my question is: in what space does COLLADA specifies its inverse bind pose? And how can I transform the inverse bind pose to the space I need?
I started by comparing my values to the ones I read from Assimp (an open source model loader). Stepping through the code I looked at where they built their bind matrices and their inverse bind matrices.
Eventually I ended up in SceneAnimator::GetBoneMatrices, which contains the following:
// Bone matrices transform from mesh coordinates in bind pose to mesh coordinates in skinned pose
// Therefore the formula is offsetMatrix * currentGlobalTransform * inverseCurrentMeshTransform
for( size_t a = 0; a < mesh->mNumBones; ++a)
{
    const aiBone* bone = mesh->mBones[a];
    const aiMatrix4x4& currentGlobalTransform
        = GetGlobalTransform( mBoneNodesByName[ bone->mName.data ]);
    mTransforms[a] = globalInverseMeshTransform * currentGlobalTransform * bone->mOffsetMatrix;
}
globalInverseMeshTransform is always identity, because the mesh doesn't transform anything. currentGlobalTransform is the bind matrix, the joint's parent's local matrices concatenated with the joint's local matrix. And mOffsetMatrix is the inverse bind matrix, which comes directly from the skin.
I checked the values of these matrices against my own (oh yes, I compared them in a watch window) and they were exactly the same, off by maybe 0.0001%, which is insignificant. So why does Assimp's version work and mine doesn't, even though the formula is the same?
Here's what I got:
When Assimp finally uploads the matrices to the skinning shader, they do the following:
helper->piEffect->SetMatrixTransposeArray( "gBoneMatrix", (D3DXMATRIX*)matrices, 60);
Waaaaait a second. They upload them transposed? It couldn't be that easy. No way.
Yup.
Something else I was doing wrong: I was converting the coordinates to the right coordinate system (centimeters to meters) before applying the skinning matrices. That results in completely distorted models, because the matrices are designed for the original coordinate system.
FUTURE GOOGLERS
Read all the node transforms (rotate, translation, scale, etc.) in the order you receive them.
Concatenate them to a joint's local matrix.
Take the joint's parent and multiply it with the local matrix.
Store that as the bind matrix.
Read the skin information.
Store the joint's inverse bind pose matrix.
Store the joint weights for each vertex.
Multiply the bind matrix with the inverse bind pose matrix and transpose it, call it the skinning matrix.
Multiply the skinning matrix with the position times the joint weight and add it to the weighted position.
Use the weighted position to render.
Done!
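If it helps, here is a minimal CPU-side sketch of the last few steps with DirectXMath (the Influence struct and parameter names are made up; the transpose is skipped here because nothing is uploaded to a shader):
#include <DirectXMath.h>
#include <vector>
using namespace DirectX;

// Hypothetical per-vertex influence: which joint and how strongly it pulls.
struct Influence { int jointIndex; float weight; };

// skinMatrices[j] = inverseBindPose[j] * bindMatrix[j] in row-vector convention,
// i.e. the "skinning matrix" from the steps above, left untransposed on the CPU.
XMVECTOR SkinPosition(FXMVECTOR position,
                      const std::vector<Influence>& influences,
                      const XMMATRIX* skinMatrices)
{
    XMVECTOR weighted = XMVectorZero();
    for (const Influence& inf : influences)
    {
        XMVECTOR transformed = XMVector3Transform(position, skinMatrices[inf.jointIndex]);
        weighted = XMVectorAdd(weighted, XMVectorScale(transformed, inf.weight));
    }
    return weighted;   // the weighted position used for rendering
}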
BTW, if you transpose the matrices upon loading them rather than transposing the matrix at the end (which can be problematic when animating) you want to perform your multiplication differently (the method you use above appears to be for using skinning in DirectX when using OpenGL friendly matrices - ergo the transpose.)
In DirectX I transpose matrices when they are loaded from the file and then I use (in the example below I am simply applying the bind pose for the sake of simplicity):
XMMATRIX l_oWorldMatrix = XMMatrixMultiply( l_oBindPose, in_oParentWorldMatrix );
XMMATRIX l_oMatrixPallette = XMMatrixMultiply( l_oInverseBindPose, l_oWorldMatrix );
XMMATRIX l_oFinalMatrix = XMMatrixMultiply( l_oBindShapeMatrix, l_oMatrixPallette );

Mapping from 2D projection back to 3D point cloud

I have a 3D model consisting of point vertices (XYZ) and eventually triangular faces.
Using OpenGL or a camera/view/projection matrix, I can project the 3D model onto a 2D plane, i.e. a view window or an image with m*n resolution.
The question is how I can determine the correspondence between a pixel in the 2D projection plane and its corresponding vertex (or face) in the original 3D model.
Namely,
What is the closest vertex in the 3D model for a given pixel in the 2D projection?
It sounds like picking in OpenGL or a ray-tracing problem. Is there, however, any easy solution?
With ray tracing, it is essentially about finding the first vertex/face intersected by the ray from the viewpoint. Can someone show me some tutorials or examples? I would like an algorithm that is independent of OpenGL.
Hit testing in OpenGL usually is done without raytracing. Instead, as each primitive is rendered, a plane in the output is used to store the unique ID of the primitive. Hit testing is then as simple as reading the ID plane at the cursor location.
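A rough sketch of reading such an ID plane back in modern OpenGL, assuming the scene has already been rendered into an FBO whose color attachment is an integer (GL_R32UI) texture of per-primitive IDs:
#include <GL/glew.h>

// 'fbo', cursor coordinates and viewport height are assumed to come from the app.
GLuint readPickedId(GLuint fbo, int cursorX, int cursorY, int viewportHeight)
{
    GLuint id = 0;
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
    glReadBuffer(GL_COLOR_ATTACHMENT0);
    // OpenGL window coordinates start at the bottom-left corner.
    glReadPixels(cursorX, viewportHeight - 1 - cursorY, 1, 1,
                 GL_RED_INTEGER, GL_UNSIGNED_INT, &id);
    glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
    return id;
}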
My (possibly naive) thought would be to create an array of the vertices and then sort them by their distance (or distance squared, for speed) to your screen point once projected. The first item in the list will be the closest; the sort makes this O(n log n) for n vertices.
Edit: Better for speed and memory: simply loop through all vertices and keep track of the vertex whose projection is closest (by squared distance) to your viewport pixel, which is O(n). This assumes that you are able to perform the projection yourself, without relying on OpenGL.
For example, in pseudo-code:
function findPointFromViewPortXY( pointOnViewport )
    closestPoint = false
    bestDistance = false
    for (each point in points)
        projectedXY = projectOntoViewport(point)
        distanceSquared = distanceBetween(projectedXY, pointOnViewport)
        if bestDistance == false or distanceSquared < bestDistance
            closestPoint = point
            bestDistance = distanceSquared
    return closestPoint
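And the same idea in C++, with the projection done by hand via a combined MVP matrix and a perspective divide (GLM assumed; all names are illustrative):
#include <glm/glm.hpp>
#include <limits>
#include <vector>

// Project a point with a combined model-view-projection matrix and map the
// result from normalized device coordinates to viewport pixels.
glm::vec2 projectOntoViewport(const glm::vec3& p, const glm::mat4& mvp,
                              float viewportW, float viewportH)
{
    glm::vec4 clip = mvp * glm::vec4(p, 1.0f);
    glm::vec3 ndc  = glm::vec3(clip) / clip.w;              // perspective divide
    return glm::vec2((ndc.x * 0.5f + 0.5f) * viewportW,     // bottom-left origin,
                     (ndc.y * 0.5f + 0.5f) * viewportH);    // as with glViewport
}

// Brute-force O(n) search for the vertex whose projection is nearest the pixel.
const glm::vec3* findPointFromViewportXY(const std::vector<glm::vec3>& points,
                                         const glm::mat4& mvp, glm::vec2 pixel,
                                         float viewportW, float viewportH)
{
    const glm::vec3* closest = nullptr;
    float bestDistanceSq = std::numeric_limits<float>::max();
    for (const glm::vec3& point : points)
    {
        glm::vec2 delta = projectOntoViewport(point, mvp, viewportW, viewportH) - pixel;
        float distanceSq = glm::dot(delta, delta);
        if (distanceSq < bestDistanceSq)
        {
            bestDistanceSq = distanceSq;
            closest = &point;
        }
    }
    return closest;
}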
In addition to Ben Voigt's answer:
If you do a separate pass over pickable objects, then you can set the viewport to contain only a single pixel that you will read.
You can also encode triangle ID by using geometry shader (gl_PrimitiveID).