Skeletal animation with ASSIMP - c++

I have been trying to implement skeletal animation in my own 3D OpenGL/C++ game engine, using ASSIMP to import the models.
I think the problem is caused by the calculation of the bone matrices that are loaded as uniform variables into the shader, because if I replace them with identity matrices, the mesh is rendered in its bind pose.
These are the functions I use to initialize and calculate the final bone matrices:
void Joint::AddToSceneGraph(GameObject* root)
{
    GameObject* parent = root->FindChild(m_name);
    //m_globalInverseMatrix = inverse(root->GetTransform().GetMatrix());
    parent->AddComponent(this);
    while (parent->GetParent() != root)
        parent = parent->GetParent();
    parent->GetTransform().SetParent(NULL); // (1)
}
mat4 Joint::GetMatrix()
{
    mat4 ans = /*m_globalInverseMatrix*/GetTransform().GetMatrix() * m_offsetMatrix;
    return ans;
}
Since I am only trying to render the model in its bind pose, I won't include the code that calculates the animation matrices.
Some clarifications: I have a Transform class which has a Transform* parent, and whose GetMatrix() method calculates the matrix from the parent's matrix together with the scale and translation vec3s and the rotation quaternion, so the relation between parent and child is taken into consideration. I assume that the parent transform of the root joint has to be NULL, and thus its matrix the identity, which is the purpose of (1). Although I am not sure about this assumption, I am sure that the node containing the root of the skeleton and the node containing the meshes are siblings.
Also I am not sure whether I should use m_globalInverseMatrix, and, if so, what exactly its purpose is.
In general, I think my main issue is a misunderstanding of how ASSIMP calculates the offset matrix (the so-called inverse bind pose matrix) and how to invert its effect. The result is that the model looks "packed" into itself:

From the docs in Assimp, the offset matrix of the bones transforms from the mesh space to the bone space.
I managed to render the bind pose with a hierarchy similar to yours (if I have not misunderstood, you have a root node with two children, one spawning the skeleton hierarchy and the other spawning the mesh). What you have to do is set m_globalInverseMatrix to the inverse transform of the node containing the mesh. Then each bone's transform is:
boneTransform = m_globalInverseMatrix * boneHierarchyTransforms * boneOffsetMatrix
where boneHierarchyTransforms comes from traversing the tree up to each bone node, and boneOffsetMatrix is the one you mention and I referred to in the first paragraph.

How do I calculate the start matrix for each bone (T-pose) (using COLLADA and OpenGL)

I have already loaded vertices, materials, normals, weights, joint IDs, the joints themselves and the parent-child info (hierarchy). I have also managed to render it all, and when I rotate or translate one of the joints, the children rotate with the parent.
My problem is that the parent rotates around a wrong point or offset (hopefully you understand what I mean). This means I've gotten the initial offsets wrong, right? To get the starting T-pose, I'm guessing I don't need rotation or translation, only the offset of the position of the joint, but I have no clue how to get it; I've been stuck for ages. In the COLLADA file there is a transform for each joint, and I've loaded that one too, but I don't know how to apply it correctly: my 3D model gets deformed and looks wrong.
If you answer this question, please explain it as if you were explaining it to a monkey (me), and step by step if possible. I'm unfamiliar with these bind and inverse-bind terms, and very confused. I think that if I manage to get this, I'll eventually figure out the rest of skeletal animation myself, so it's just this little thing.
I've recently gotten bones, joints and nodes working, so I'll try to explain exactly how I achieved it. Do note, I am using Assimp to import my DAE files, but as far as I know, Assimp doesn't do any processing on the data, so this explanation should directly relate to the data in the Collada file.
I'm just learning all this myself, so I may get things wrong. If I do, anyone, please tell me and I will update this answer accordingly.
Semantics
A mesh is a set of vertices, normals, texture coordinates and faces. The points stored in a mesh are in a bind pose, or a rest pose. This is often, but not always, a T-pose.
A skin is a controller. It refers to a single mesh, and contains the list of bones that will modify that mesh (this is where the bones are stored). You can think of the skin element as the actual model (or part of the model) that will be rendered.
A bone is a flat list of names and associated matrices. There is no hierarchical data here, it is simply a flat list. The hierarchy is provided by the nodes that refer to the bones.
A node, or joint, is a hierarchical data element. Nodes are stored in a hierarchy, with a parent node having zero or more child nodes. A node may be linked to zero or more bones, and to zero or more skins. There should only be one root node. A joint is the same as a node, so I will refer to joints as nodes.
Do note that nodes and bones are separate. You do not modify a bone to animate your model. Instead, you modify a node, which gets applied to the bone when the model is rendered.
Skin
A skin is the thing you will render. A skin always refers to one single mesh. You can have multiple skins in a DAE file, as part of the same model (or, scene). Sometimes, a model will reuse meshes by transforming them. For instance, you may have a mesh for a single arm, and reuse that arm, mirrored, for the other side of the body. I believe that is what the bind_shape_matrix value of a skin is used for. So far, I haven't used this, and my matrices are always identity, so I cannot speak as to its usage.
Bone
A bone is what applies transformations to your model. You do not modify bones. Instead, you modify the nodes that control the bones. More on this later.
A bone consists of the following:
A name, used to find the node that controls this bone (Name_array)
A bind pose matrix, sometimes called an "inverse bind matrix" or an "offset matrix" (bind_poses array)
A list of vertex indices that the bone will affect (vertex_weights element)
A list of weights of the same length above, that tell how much the bone will affect that vertex. (weights array)
Node
A node is a hierarchical data element, describing how the model gets transformed when rendered. You will always start with one root node, and travel up the node tree, applying transforms in sequence. I use a depth-first algorithm for this.
The node tells how the model, skins, and bones should be transformed when rendering or animating.
A node may refer to a skin. This means that skin will be used as part of the render for this model. If you see a node refer to a skin, it gets included when rendering.
A node consists of the following:
A name (sid attribute)
A transform matrix (transform element)
Child nodes (node elements)
GlobalInverseTransform Matrix
The GlobalInverseTransform matrix is calculated by taking the Transform matrix of the first node, and inverting it. Simple as that.
The Algorithm
Now we can get to the good bits - the actual skinning and rendering.
Calculating a node's LocalTransform
Each node should have a matrix, called the LocalTransform matrix. This matrix isn't in the DAE file, but is calculated by your software. It is basically the accumulation of the Transform matrices of the node, and all its parents.
First step is to traverse the node hierarchy.
Start at the first node, and calculate the LocalTransform for the node, using the Transform matrix of the node, and the LocalTransform of the parent. If the node has no parent, use an identity matrix as the parent's LocalTransform matrix.
Node.LocalTransform = ParentNode.LocalTransform * Node.Transform
Repeat this process recursively for every child node in this node.
Calculating a bone's FinalTransform matrix
Just like a node, a bone should have a FinalTransform matrix. Again, this is not stored in the DAE file, it is calculated by your software as part of the render process.
For each mesh used, for each bone in that mesh, apply the following algorithm:
For each mesh used:
    For each bone in mesh:
        If a node with the same name exists:
            Bone.FinalTransform = Bone.InverseBind * Node.LocalTransform * GlobalInverseTransform
        Otherwise:
            Bone.FinalTransform = Bone.InverseBind * GlobalInverseTransform
We now have the FinalTransform matrix for each bone in the model.
Calculating a vertex's position
Once we have all the bones calculated, we can then transform the mesh's points into their final render locations. This is the algorithm I use. This is not the "correct" way to do this, as it should be calculated by a vertex shader on-the-fly, but it works to demonstrate what's happening.
From the root node:
For each mesh referred to by the node:
    Create an array to hold the transformed vertices, the same size as your source vertices array.
    Create an array to hold the transformed normals, the same size as your source vertices array (the normals and vertices arrays should be the same length to begin with).
    If the mesh has no bones:
        Copy source vertices and source normals to the output arrays - the mesh is not skinned.
    Otherwise:
        For every bone in the mesh:
            For every weight in the bone:
                OutputVertexArray(Weight.VertexIndex) = Mesh.InputVertexArray(Weight.VertexIndex) * Bone.FinalTransform * Weight.TransformWeight
                OutputNormalArray(Weight.VertexIndex) = Normalize(Mesh.InputNormalArray(Weight.VertexIndex) * Bone.FinalTransform * Weight.TransformWeight)
    Render the mesh, using OutputVertexArray, OutputNormalArray, Mesh.InputTexCoordsArray and the mesh's face indices.
Recursively call this process for each child node.
This should get you a correctly rendered output.
Note that with this system, it is possible to re-use a mesh more than once.
Animating
Just a quick note on animating. I haven't done much with this, and Assimp hides much of the gory details of Collada (and introduces its own form of gore), but to use predefined animations from your file, you interpolate translations, rotations and scales to come up with a matrix that represents a node's animated state at a single point in time.
Remember, matrix construction follows the TRS (translate, rotate, scale) convention: the matrices are concatenated as translation, then rotation, then scale, so a vertex is effectively scaled first, then rotated, then translated.
AnimatedNodeTransform = TranslationMatrix * RotationMatrix * ScaleMatrix
The generated matrix completely replaces the node's Transform matrix - it is not combined with the matrix.
I am still trying to work out how to perform on-the-fly animation (think Inverse Kinematics) correctly. For some models I try, it works great. I can apply a quaternion to the node’s Transform matrix and it will work. However, some other models will do strange things, like rotate the node around the origin, so I think I’m still missing something there. If I finally solve this, I will update this section to reflect what I discover.
Hope this helps. If I've missed anything, or gotten anything wrong, anyone please feel free to correct me. I am only learning this stuff myself. If I notice any mistakes, I will edit the answer.
Also, be aware that I use Direct3D, so my matrix multiplication order is probably reversed from yours. You will likely need to flip the multiplication order of some of the operations in my answer.

Matrix calculations for gpu skinning

I'm trying to do skeletal animation in OpenGL using Assimp as my model import library.
What exactly do I need to do with the bones' offsetMatrix variable? What do I need to multiply it by?
Let's take for instance this code, which I used to animate characters in a game I worked on. I used Assimp too, to load the bone information, and I read the OGL tutorial already pointed out by Nico.
glm::mat4 getParentTransform()
{
    if (this->parent)
        return parent->nodeTransform;
    else
        return glm::mat4(1.0f);
}
void updateSkeleton(Bone* bone = NULL)
{
    bone->nodeTransform = bone->getParentTransform() // this retrieves the transformation one level above in the tree
        * bone->transform        // bone->transform is the assimp matrix assimp_node->mTransformation
        * bone->localTransform;  // this is your T * R matrix
    bone->finalTransform = inverseGlobal    // which is scene->mRootNode->mTransformation from assimp
        * bone->nodeTransform               // defined above
        * bone->boneOffset;                 // which is ai_mesh->mBones[i]->mOffsetMatrix
    for (int i = 0; i < bone->children.size(); i++) {
        updateSkeleton(&bone->children[i]);
    }
}
Essentially, the GlobalTransform as it is referred to in the tutorial Skeletal Animation with Assimp, or more precisely the transform of the root node scene->mRootNode->mTransformation, is the transformation from local space to global space. To give you an example: when you create your mesh or load your character in a 3D modeler (let's pick Blender for instance), it is usually positioned (by default) at the origin of the Cartesian plane, and its rotation is set to the identity quaternion.
However, you can translate/rotate your mesh/character from the origin (0,0,0) to somewhere else, and a single scene can even contain multiple meshes with different positions. When you load them, especially if you do skeletal animation, it is mandatory to translate them back into local space (i.e. back to the origin (0,0,0)), and this is the reason why you have to multiply everything by the InverseGlobal (which brings your mesh back to local space).
After that you need to multiply by the node transform, which is the product of the parentTransform (the transformation one level up in the tree, i.e. the overall transform so far), the transform (formerly assimp_node->mTransformation, which is just the transformation of the bone relative to the node's parent), and any local transformation (a T * R matrix) you want to apply for forward kinematics, inverse kinematics or key-frame interpolation.
Eventually there is the boneOffset (ai_mesh->mBones[i]->mOffsetMatrix) that transforms from mesh space to bone space in bind pose as stated in the documentation.
Here is a link to GitHub if you want to look at the whole code for my Skeleton class.
Hope it helps.
The offset matrix defines the transform (translation, scale, rotation) that takes a vertex in mesh space and converts it to "bone" space. As an example, consider a vertex and a bone with the following properties:
Vertex Position<0, 1, 2>
Bone Position<10, 2, 4>
Bone Rotation<0,0,0,1> // Note - no rotation
Bone Scale<1, 1, 1>
If we multiply the vertex by the offset matrix in this case, we would get a vertex position of <-10, -1, -2>.
How do we use this? You have two options, which come down to how we store the vertex data in the vertex buffers. The options are:
1) Store the mesh vertices in mesh space
2) Store the mesh vertices in bone space
In the case of #1, we would take the offsetMatrix and apply it to the vertices that are influenced by the bone as we build the vertex buffer. And then when we animate the mesh, we later apply the animated matrix for that bone.
In the case of #2, we would use the offsetMatrix in combination with the animation matrix for that bone when transforming the vertices stored in the vertex buffer. So it would be something like (note: you may have to switch the matrix concatenation order around here):
anim_vertex = (offset_matrix * anim_matrix) * mesh_vertex
Does this help?
As I already assumed, the mOffsetMatrix is the inverse bind pose matrix. This tutorial states the correct transformations that you need for linear blend skinning:
You first need to evaluate your animation state. This will give you a system transform from animated bone space to world space for every bone (GlobalTransformation in the tutorial). The mOffsetMatrix is the system transform from world space to bind pose bone space. Therefore, what you do for skinning is the following (assuming that a specific vertex is influenced by a single bone): Transform the vertex to bone space with mOffsetMatrix. Now assume an animated bone and transform the intermediate result back from animated bone space to world space. So:
boneMatrix[i] = animationMatrix[i] * mOffsetMatrix[i]
If the vertex is influenced by multiple bones, LBS simply averages the results. That's where the weights come into play. Skinning is usually implemented in a vertex shader:
vec4 result = vec4(0);
for each influencing bone i
    result += weight[i] * boneMatrix[i] * vertexPos;
Usually, the maximum number of influencing bones is fixed and you can unroll the for loop.
The tutorial uses an additional m_GlobalInverseTransform for the boneMatrix. However, I have no clue why they do that. Basically, this undoes the overall transformation of the entire scene. Probably it is used to center the model in the view.

How to implement joints and bones in openGL?

I am in the process of rolling my own OpenGL framework, and know how to draw 3D objects, etc.
But how do you define relationships between 3d objects that may have a joint?
Or how do you define the 3d object as being a "bone"?
Are there any good resources?
As OpenGL is only a graphics library and not a 3D modeling framework the task of defining and using "bones" falls onto you.
There are different ways of actually implementing it, but the general idea is:
You treat each part of your model as a bone (e.g. head, torso, lower legs, upper legs, etc).
Each bone has a parent which it is connected to (e.g. the parent of the lower left leg is the upper left leg).
Thus each bone has a number of children.
Now you define each bone's position as a position relative to its parent bone. When displaying a bone, you multiply its relative position by the parent bone's absolute position to get the bone's absolute position.
To visualize:
Think of it as a doll. When you grab the doll's arm and move it around, the relative position (and rotation) of the hand won't change. Its absolute position WILL change because you've moved one of its parents around.
When I tried skeletal animation, I learnt most of it from this link:
http://content.gpwiki.org/index.php/OpenGL:Tutorials:Basic_Bones_System
But how do you define relationships between 3d objects that may have a joint?
OpenGL does not care about these things. It's a pure drawing API. So it's up to you to unleash your creativity and define such structures yourself. The usual approach to skeletal animation is having a bone/rig system, where each bone has an orientation (represented by a quaternion or a 3×3 matrix), a length, and a list of bones attached further along, i.e. some kind of tree.
I'd define this structure as
typedef float quaternion[4];
struct Bone {
    quaternion orientation;
    float length;
    int n_subbones;
    Bone *subbones;
};
In addition to that you need a pivot from where the rig starts. I'd do it like this
typedef float vec3[3];
struct GeomObjectBase {
    vec3 position;
    quaternion orientation;
};
struct BoneRig {
    struct GeomObjectBase gob;
    struct Bone pivot_bone;
};
Next you need some functions that iterate through this structure, generate the matrix palette out of it, so that it can be applied to the model mesh.
note: I'm using freeglut (totally irrelevant here, though - none of the above depends on the windowing library)

COLLADA: Inverse bind pose in the wrong space?

I'm working on writing my own COLLADA importer. I've gotten pretty far, loading meshes and materials and such. But I've hit a snag on animation, specifically: joint rotations.
The formula I'm using for skinning my meshes is straightforward:
vec4 weighted(0.0f);
for (int i = 0; i < joint_influences; i++)
{
    weighted += joint[joint_index[i]]->parent->local_matrix *
                joint[joint_index[i]]->local_matrix *
                skin->inverse_bind_pose[joint_index[i]] *
                position *
                skin->weight[i];
}
position = weighted;
And as far as the literature is concerned, this is the correct formula. Now, COLLADA specifies two types of rotations for the joints: local and global. You have to concatenate the rotations together to get the local transformation for the joint.
What the COLLADA documentation does not make clear is the difference between the joint's local rotation and its global rotation. But in most of the models I've seen, rotations can have an id of either rotate (global) or jointOrient (local).
When I disregard the global rotations and only use the local ones, I get the bind pose for the model. But when I add the global rotations to the joint's local transformation, strange things start to happen.
This is without using global rotations:
And this is with global rotations:
In both screenshots I'm drawing the skeleton using lines, but in the first it's invisible because the joints are inside the mesh. In the second screenshot the vertices are all over the place!
For comparison, this is what the second screenshot should look like:
It's hard to see, but you can see that the joints are in the correct position in the second screenshot.
But now the weird thing. If I disregard the inverse bind pose as specified by COLLADA and instead take the inverse of the joint's parent local transform times the joint's local transform, I get the following:
In this screenshot I'm drawing a line from each vertex to the joints that have influence. The fact that I get the bind pose is not so strange, because the formula now becomes:
world_matrix * inverse_world_matrix * position * weight
But it leads me to suspect that COLLADA's inverse bind pose is in the wrong space.
So my question is: in what space does COLLADA specifies its inverse bind pose? And how can I transform the inverse bind pose to the space I need?
I started by comparing my values to the ones I read from Assimp (an open source model loader). Stepping through the code I looked at where they built their bind matrices and their inverse bind matrices.
Eventually I ended up in SceneAnimator::GetBoneMatrices, which contains the following:
// Bone matrices transform from mesh coordinates in bind pose to mesh coordinates in skinned pose
// Therefore the formula is offsetMatrix * currentGlobalTransform * inverseCurrentMeshTransform
for( size_t a = 0; a < mesh->mNumBones; ++a)
{
    const aiBone* bone = mesh->mBones[a];
    const aiMatrix4x4& currentGlobalTransform
        = GetGlobalTransform( mBoneNodesByName[ bone->mName.data ]);
    mTransforms[a] = globalInverseMeshTransform * currentGlobalTransform * bone->mOffsetMatrix;
}
globalInverseMeshTransform is always identity, because the mesh doesn't transform anything. currentGlobalTransform is the bind matrix, the joint's parent's local matrices concatenated with the joint's local matrix. And mOffsetMatrix is the inverse bind matrix, which comes directly from the skin.
I checked the values of these matrices to my own (oh yes I compared them in a watch window) and they were exactly the same, off by maybe 0.0001% but that's insignificant. So why does Assimp's version work and mine doesn't even though the formula is the same?
Here's what I got:
When Assimp finally uploads the matrices to the skinning shader, they do the following:
helper->piEffect->SetMatrixTransposeArray( "gBoneMatrix", (D3DXMATRIX*)matrices, 60);
Waaaaait a second. They upload them transposed? It couldn't be that easy. No way.
Yup.
Something else I was doing wrong: I was converting the coordinates to the right system (centimeters to meters) before applying the skinning matrices. That results in completely distorted models, because the matrices are designed for the original coordinate system.
FUTURE GOOGLERS
Read all the node transforms (rotate, translation, scale, etc.) in the order you receive them.
Concatenate them to a joint's local matrix.
Take the joint's parent and multiply it with the local matrix.
Store that as the bind matrix.
Read the skin information.
Store the joint's inverse bind pose matrix.
Store the joint weights for each vertex.
Multiply the bind matrix with the inverse bind pose matrix and transpose it, call it the skinning matrix.
Multiply the skinning matrix with the position times the joint weight and add it to the weighted position.
Use the weighted position to render.
Done!
BTW, if you transpose the matrices upon loading them rather than transposing the matrix at the end (which can be problematic when animating) you want to perform your multiplication differently (the method you use above appears to be for using skinning in DirectX when using OpenGL friendly matrices - ergo the transpose.)
In DirectX I transpose matrices when they are loaded from the file and then I use (in the example below I am simply applying the bind pose for the sake of simplicity):
XMMATRIX l_oWorldMatrix = XMMatrixMultiply( l_oBindPose, in_oParentWorldMatrix );
XMMATRIX l_oMatrixPallette = XMMatrixMultiply( l_oInverseBindPose, l_oWorldMatrix );
XMMATRIX l_oFinalMatrix = XMMatrixMultiply( l_oBindShapeMatrix, l_oMatrixPallette );

3D Scene graph traversing problem

I have implemented a small scene graph to be rendered by OpenGL, all the objects derive from a common Node class and during OpenGL frame rendering I just call the visit method of the root node and it traverses the graph recursively. The first matrix I pass when beginning traversal is the camera matrix.
The visit method looks like this:
void Node::visit(const QMatrix4x4 &mv) {
    QMatrix4x4 m = mv;
    m.rotate(m_rot);
    m.translate(m_pos);
    m.scale(m_scale);
    m_effectiveMV = m;
    for (int i = 0; i < m_children.size(); i++) {
        m_children[i]->visit(m_effectiveMV);
    }
    draw(); // draws if this node has anything to draw,
            // otherwise just transformation.
}
The problem I experience is, when I set rotation for a child node, the rotation happens relative to the parent node, not around the node itself. Can anyone spot what I'm doing wrong here?
Assuming your matrix methods are doing the right thing, translation should be the first one in the list:
m.translate(m_pos);
m.rotate(m_rot);
m.scale(m_scale);
This will first scale and rotate the vertex, then translate it into the parents system and so on.
Matrix operations are not commutative, i.e. the order in which matrix multiplication happens matters. Rotating something first and then translating it is different from first translating it and then rotating/orbiting it around the original center.
Instead of creating the transformation by successive application of various transformations, I recommend building it directly. The upper-left 3×3 is the rotation part, which you can copy directly from the rotation matrix. Scaling multiplies the x, y, z factors into the 1st, 2nd and 3rd columns of the matrix. The translation is the 4th column. The rotation is either stored as a 3×3 matrix or a quaternion. Don't use Euler angles - they are numerically unstable.