Why is DirectX skipping every second shader call? - c++

I get a bit frustrating trying to figure out why I have to call a DirectX 11 shader twice to see the desired result.
Here's my current state:
I have a 3d object built from vertex and index buffer. This object is then instanced several times. Later, there will be a lot more than just one object, but for now, I'm testing it with this one only. In my render routine, I iterate over all instances, change the world matrices (so that all object instances are put together and form "one big, whole object") and call the shader method to render the data to the screen.
Here's the code so far, that doesn't work:
m_pLevel->Simulate(0.1f);
std::list<CLevelElementInstance*>& lst = m_pLevel->GetInstances();
float x = -(*lst.begin())->GetPosition().x, y = -(*lst.begin())->GetPosition().y, z = -(*lst.begin())->GetPosition().z;
int i = 0;
for (std::list<CLevelElementInstance*>::iterator it = lst.begin(); it != lst.end(); it++)
{
// Extract base element from current instance
CLevelElement* elem = (*it)->GetBaseElement();
// Write vertex and index buffer to video memory
elem->Render(m_pDirect3D->GetDeviceContext());
// Call shader
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
++i;
}
My std::list consists of 4 3d objects, which are all the same. They only differ in their position in 3d space. All of the objects are 8.0f x 8.0f x 8.0f, so for simplicity I just line them up. (As can be seen on the shader render line, where I just add 8 units to the Z dimension)
The result is the following: I only see two elements rendered on the screen. And between them, there's an empty space the size of an element.
At first, I thought I did some errors with the 3d math, but after a lot of time spent debugging my code, I could not find any errors.
And here is now the confusing part:
If I change the content of the for-loop and add another call to the shader, I suddenly see all four elements; and they're all on their correct positions in 3d space:
m_pLevel->Simulate(0.1f);
std::list<CLevelElementInstance*>& lst = m_pLevel->GetInstances();
float x = -(*lst.begin())->GetPosition().x, y = -(*lst.begin())->GetPosition().y, z = -(*lst.begin())->GetPosition().z;
int i = 0;
for (std::list<CLevelElementInstance*>::iterator it = lst.begin(); it != lst.end(); it++)
{
// Extract base element from current instance
CLevelElement* elem = (*it)->GetBaseElement();
// Write vertex and index buffer to video memory
elem->Render(m_pDirect3D->GetDeviceContext());
// Call shader
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
// Call shader a second time - this seems to have no effect but to allow the next iteration to perform it's shader rendering...
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
++i;
}
Does anybody has an idea of what is going on here?
If it helps, here's the code of the shader:
bool CTextureShader::Render(ID3D11DeviceContext* _pDeviceContext, const int _IndexCount, XMMATRIX& _pWorldMatrix, XMMATRIX& _pViewMatrix, XMMATRIX& _pProjectionMatrix, ID3D11ShaderResourceView* _pTexture)
{
bool result = SetShaderParameters(_pDeviceContext, _pWorldMatrix, _pViewMatrix, _pProjectionMatrix, _pTexture);
if (!result)
return false;
RenderShader(_pDeviceContext, _IndexCount);
return true;
}
bool CTextureShader::SetShaderParameters(ID3D11DeviceContext* _pDeviceContext, XMMATRIX& _WorldMatrix, XMMATRIX& _ViewMatrix, XMMATRIX& _ProjectionMatrix, ID3D11ShaderResourceView* _pTexture)
{
HRESULT result;
D3D11_MAPPED_SUBRESOURCE mappedResource;
MatrixBufferType* dataPtr;
unsigned int bufferNumber;
_WorldMatrix = XMMatrixTranspose(_WorldMatrix);
_ViewMatrix = XMMatrixTranspose(_ViewMatrix);
_ProjectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
result = _pDeviceContext->Map(m_pMatrixBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(result))
return false;
dataPtr = (MatrixBufferType*)mappedResource.pData;
dataPtr->world = _WorldMatrix;
dataPtr->view = _ViewMatrix;
dataPtr->projection = _ProjectionMatrix;
_pDeviceContext->Unmap(m_pMatrixBuffer, 0);
bufferNumber = 0;
_pDeviceContext->VSSetConstantBuffers(bufferNumber, 1, &m_pMatrixBuffer);
_pDeviceContext->PSSetShaderResources(0, 1, &_pTexture);
return true;
}
void CTextureShader::RenderShader(ID3D11DeviceContext* _pDeviceContext, const int _IndexCount)
{
_pDeviceContext->IASetInputLayout(m_pLayout);
_pDeviceContext->VSSetShader(m_pVertexShader, NULL, 0);
_pDeviceContext->PSSetShader(m_pPixelShader, NULL, 0);
_pDeviceContext->PSSetSamplers(0, 1, &m_pSampleState);
_pDeviceContext->DrawIndexed(_IndexCount, 0, 0);
}
If it helps, I can also post the code from the shaders here.
Any help would be appreciated - I'm totally stuck here :-(

The problem is that you are transposing your data every frame, so it's only "right" every other frame:
_WorldMatrix = XMMatrixTranspose(_WorldMatrix);
_ViewMatrix = XMMatrixTranspose(_ViewMatrix);
_ProjectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
Instead, you should be doing something like:
XMMATRIX worldMatrix = XMMatrixTranspose(_WorldMatrix);
XMMATRIX viewMatrix = XMMatrixTranspose(_ViewMatrix);
XMMATRIX projectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
result = _pDeviceContext->Map(m_pMatrixBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(result))
return false;
dataPtr = (MatrixBufferType*)mappedResource.pData;
dataPtr->world = worldMatrix;
dataPtr->view = viewMatrix;
dataPtr->projection = projectionMatrix;
_pDeviceContext->Unmap(m_pMatrixBuffer, 0);

I wonder if the problem is that the temporary created by the call to XMMatrixTranslation(x, y, z + (i * 8)) is being passed into a function by reference, and then passed into another function by reference where it is modified.
My knowledge of the c++ spec isn't complete enough for me to tell whether that's undefined behaviour or not (but I know that there are tricky rules in this area - assigning a temporary to a non-const temporary is not supported, for example). Regardless, it's close enough to being suspicious that even if it is well defined c++ it still might be a dusty corner that could trip up a non-compliant compiler.
To rule out this possibility, try doing it like:
XMMatrix worldMatrix = XMMatrixTranslation(x, y, z + (i * 8));
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), worldMatrix, viewMatrix, projectionMatrix, elem->GetTexture());

Related

trying to add a wave effect to a vertex plane

so what this is supposed to do it create a plane that moves in a sine wave pattern (code that has nothing to do with making it a wave is not shown as it all works fine) and while the plane renders fine it just stays as a flat plane. I'm assuming the problem is that the frame time isn't being called by the vertex shader so I'd like to know how to do that because I'm confused as to why this isn't working
the plane I'm trying to make into a wave ^
manipulation_shader_vs.hlsl
cbuffer TimeBufferType : register(b1)
{
float time;
};
OutputType main(InputType input)
{
OutputType output;
input.position.y = sin(input.position.x + time);
input.normal = float3(-cos(input.position.x + time), 1, 0);
...
timer.cpp
void Timer::frame()
{
INT64 currentTime;
float timeDifference;
// Query the current time.
QueryPerformanceCounter((LARGE_INTEGER*)&currentTime);
timeDifference = (float)(currentTime - startTime);
frameTime = timeDifference / ticksPerS;
// Calc FPS
frames += 1.f;
elapsedTime += frameTime;
if (elapsedTime > 1.0f)
{
fps = frames / elapsedTime;
frames = 0.0f;
elapsedTime = 0.0f;
}
// Restart the timer.
startTime = currentTime;
return;
}
float Timer::getTime()
{
return frameTime;
}
manipulationshader.cpp
void ManipulationShader::setShaderParameters(ID3D11DeviceContext* deviceContext, const XMMATRIX& worldMatrix, const XMMATRIX& viewMatrix, const XMMATRIX& projectionMatrix, ID3D11ShaderResourceView* texture, Light* light, Timer* time)
{
TimeBufferType* timePtr;
deviceContext->Map(TimeBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
timePtr = (TimeBufferType*)mappedResource.pData;
timePtr->time = time->getTime();
deviceContext->Unmap(TimeBuffer, 0);
deviceContext->PSSetConstantBuffers(0, 1, &TimeBuffer);
}
This is a classic "0 vs. 1 base" bug.
In your shader you have the constant buffer bound to slot 1.
cbuffer TimeBufferType : register(b1)
In your code, you are binding the constant buffer to slot 0.
deviceContext->PSSetConstantBuffers(0, 1, &TimeBuffer);
and as noted in the comments by another StackOverflow user, you are binding it only for the pixel shader, not the vertex shader. You are therefore missing this method call:
deviceContext->VSSetConstantBuffers(0, 1, &TimeBuffer);
It's also important in Direct3D coding to always check functions that return HRESULT for success or failure. It's very hard to debug complex programs otherwise. For many cases, a "fast-fatal" solution like ThrowIfFailed works well. You can also use a conditional like the following:
D3D11_MAPPED_SUBRESOURCE mappedResource = {};
if (SUCCEEDED(deviceContext->Map(TimeBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0,
&mappedResource)))
{
auto timePtr = reinterpret_cast<TimeBufferType*>(mappedResource.pData);
timePtr->time = time->getTime();
deviceContext->Unmap(TimeBuffer, 0);
}
See Microsoft Docs.
One additional recommendation: Take a look at StepTimer for your 'animation time' computation. Time computation is quite tricky, and there are quite a number of edge-cases you are not dealing with. For more, see this blog post.

DirectX - Writing to 3D Texture Causing Display Driver Failure

I'm testing writing to 2D and 3D textures in compute shaders, outputting a gradient noise texture consisting of 32 bit floats. Writing to a 2D texture works fine, but writing to a 3D texture isn't. Are there additional considerations that need to be made when creating a 3D texture when compared to a 2D texture?
Code of how I'm defining the 3D texture below:
HRESULT BaseComputeShader::CreateTexture3D(UINT width, UINT height, UINT depth, DXGI_FORMAT format, ID3D11Texture3D** texture)
{
D3D11_TEXTURE3D_DESC textureDesc;
ZeroMemory(&textureDesc, sizeof(textureDesc));
textureDesc.Width = width;
textureDesc.Height = height;
textureDesc.Depth = depth;
textureDesc.MipLevels = 1;
textureDesc.Format = format;
textureDesc.Usage = D3D11_USAGE_DEFAULT;
textureDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
textureDesc.CPUAccessFlags = 0;
textureDesc.MiscFlags = 0;
return renderer->CreateTexture3D(&textureDesc, 0, texture);
}
HRESULT BaseComputeShader::CreateTexture3DUAV(UINT depth, DXGI_FORMAT format, ID3D11Texture3D** texture, ID3D11UnorderedAccessView** unorderedAccessView)
{
D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc;
ZeroMemory(&uavDesc, sizeof(uavDesc));
uavDesc.Format = format;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE3D;
uavDesc.Texture3D.MipSlice = 0;
uavDesc.Texture3D.FirstWSlice = 0;
uavDesc.Texture3D.WSize = depth;
return renderer->CreateUnorderedAccessView(*texture, &uavDesc, unorderedAccessView);
}
HRESULT BaseComputeShader::CreateTexture3DSRV(DXGI_FORMAT format, ID3D11Texture3D** texture, ID3D11ShaderResourceView** shaderResourceView)
{
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc;
ZeroMemory(&srvDesc, sizeof(srvDesc));
srvDesc.Format = format;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE3D;
srvDesc.Texture3D.MostDetailedMip = 0;
srvDesc.Texture3D.MipLevels = 1;
return renderer->CreateShaderResourceView(*texture, &srvDesc, shaderResourceView);
}
And how I'm writing to it in the compute shader:
// The texture we're writing to
RWTexture3D<float> outputTexture : register(u0);
[numthreads(8, 8, 8)]
void main(uint3 DTid : SV_DispatchThreadID)
{
float noiseValue = 0.0f;
float value = 0.0f;
float localAmplitude = amplitude;
float localFrequency = frequency;
// Loop for the number of octaves, running the noise function as many times as desired (8 is usually sufficient)
for (int k = 0; k < octaves; k++)
{
noiseValue = noise(float3(DTid.x * localFrequency, DTid.y * localFrequency, DTid.z * localFrequency)) * localAmplitude;
value += noiseValue;
// Calculate a new amplitude based on the input persistence/gain value
// amplitudeLoop will get smaller as the number of layers (i.e. k) increases
localAmplitude *= persistence;
// Calculate a new frequency based on a lacunarity value of 2.0
// This gives us 2^k as the frequency
// i.e. Frequency at k = 4 will be f * 2^4 as we have looped 4 times
localFrequency *= 2.0f;
}
// Output value to 2D index in the texture provided by thread indexing
outputTexture[DTid.xyz] = value;
}
And finally, how I'm running the shader:
// Set the shader
deviceContext->CSSetShader(computeShader, nullptr, 0);
// Set the shader's buffers and views
deviceContext->CSSetConstantBuffers(0, 1, &cBuffer);
deviceContext->CSSetUnorderedAccessViews(0, 1, &textureUAV, nullptr);
// Launch the shader
deviceContext->Dispatch(512, 512, 512);
// Reset the shader now we're done
deviceContext->CSSetShader(nullptr, nullptr, 0);
// Reset the shader views
ID3D11UnorderedAccessView* ppUAViewnullptr[1] = { nullptr };
deviceContext->CSSetUnorderedAccessViews(0, 1, ppUAViewnullptr, nullptr);
// Create the shader resource view for access in other shaders
HRESULT result = CreateTexture3DSRV(DXGI_FORMAT_R32_FLOAT, &texture, &textureSRV);
if (result != S_OK)
{
MessageBox(NULL, L"Failed to create texture SRV after compute shader execution", L"Failed", MB_OK);
exit(0);
}
My bad, simple mistake. Compute shader threads are limited in number. In the compute shader you're limited to a total of 1024 threads, and the dispatch call cannot dispatch more than 65535 thread groups. The HLSL compiler will catch the former issue, but the Visual C++ compiler will not catch the latter issue.
If you create a texture of 512 * 512 * 512 (which seems what you are trying to achieve), your dispatch needs to be divided by groups:
deviceContext->Dispatch(512 / 8, 512 / 8, 512 / 8);
In your previous case, the dispatch was :
512*8 * 512*8 * 512*8 = 68719476736 units
Which very likely triggered the time out detection and crashes the driver
Also the limit of 65535 is per dimension, so in your case you are completely safe to run this.
And last one, you can create both shader resource view and unordered view right after creating your 3d texture (before the dispatch call).
This is generally recommended to avoid mixing context code and resource creation code.
On resource creation, your check is not valid either :
if (result != S_OK)
HRESULT success condition is >= 0
you can use the built in macro instead eg :
if (SUCCEEDED(result))

DirectX 11 - Compute Shader, copy data from the GPU to the CPU

I've just started up using Direct compute in an attempt to move a fluid simulation I have been working on, onto the GPU. I have found a very similar (if not identical) question here however seems the resolution to my problem is not the same as theirs; I do have my CopyResource the right way round for sure! As with the pasted question, I only get a buffer filled with 0's when copy back from the GPU. I really can't see the error as I don't understand how I can be reaching out of bounds limits. I'm going to apologise for the mass amount of code pasting about to occur but I want be sure I've not got any of the setup wrong.
Output Buffer, UAV and System Buffer set up
outputDesc.Usage = D3D11_USAGE_DEFAULT;
outputDesc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
outputDesc.ByteWidth = sizeof(BoundaryConditions) * numElements;
outputDesc.CPUAccessFlags = 0;
outputDesc.StructureByteStride = sizeof(BoundaryConditions);
outputDesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
result =_device->CreateBuffer(&outputDesc, 0, &m_outputBuffer);
outputDesc.Usage = D3D11_USAGE_STAGING;
outputDesc.BindFlags = 0;
outputDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
result = _device->CreateBuffer(&outputDesc, 0, &m_outputresult);
D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc;
uavDesc.Format = DXGI_FORMAT_UNKNOWN;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.Flags = 0;
uavDesc.Buffer.NumElements = numElements;
result =_device->CreateUnorderedAccessView(m_outputBuffer, &uavDesc, &m_BoundaryConditionsUAV);
Running the Shader in my frame loop
HRESULT result;
D3D11_MAPPED_SUBRESOURCE mappedResource;
_deviceContext->CSSetShader(m_BoundaryConditionsCS, nullptr, 0);
_deviceContext->CSSetUnorderedAccessViews(0, 1, &m_BoundaryConditionsUAV, 0);
_deviceContext->Dispatch(1, 1, 1);
// Unbind output from compute shader
ID3D11UnorderedAccessView* nullUAV[] = { NULL };
_deviceContext->CSSetUnorderedAccessViews(0, 1, nullUAV, 0);
// Disable Compute Shader
_deviceContext->CSSetShader(nullptr, nullptr, 0);
_deviceContext->CopyResource(m_outputresult, m_outputBuffer);
D3D11_MAPPED_SUBRESOURCE mappedData;
result = _deviceContext->Map(m_outputresult, 0, D3D11_MAP_READ, 0, &mappedData);
BoundaryConditions* newbc = reinterpret_cast<BoundaryConditions*>(mappedData.pData);
for (int i = 0; i < 4; i++)
{
Debug::Instance()->Log(newbc[i].x.x);
}
_deviceContext->Unmap(m_outputresult, 0);
HLSL
struct BoundaryConditions
{
float3 x;
float3 y;
};
RWStructuredBuffer<BoundaryConditions> _boundaryConditions;
[numthreads(4, 1, 1)]
void ComputeBoundaryConditions(int3 id : SV_DispatchThreadID)
{
_boundaryConditions[id.x].x = float3(id.x,id.y,id.z);
}
I dispatch the Compute shader after I begin a frame and before I end the frame. I have played around with moving the shaders dispatch call outside of the end scene and before the present ect but nothing seems to effect the process. Can't seem to figure this one out!
Holy Smokes I fixed the error! I was creating the compute shader to a different ID3D11ComputeShader pointer! D: Works like a charm! Pheew Sorry and thanks Adam!

Draw multiple objects in D3D12

I am tinkering with DirectX 12 and have hit a wall while trying to draw a "checker board." I have search the net quite a bit, so any help/pointers will be appreciated.
In D3D11 the working code is as follows.
auto context = m_deviceResources->GetD3DDeviceContext();
for (int i = -10; i < 10; i++)
{
for (int j = -10; j < 10; j++)
{
// perform translation
XMStoreFloat4x4(&m_constantBufferData.model, XMMatrixTranspose(XMMatrixTranslation(i, j, 0.0f)));
context->UpdateSubresource(
m_constantBuffer.Get(),
0,
NULL,
&m_constantBufferData,
0,
0
);
// shaders, etc...
// draw the square
context->DrawIndexed(
m_indexCount,
0,
0
);
}
}
In D3D12, I have tried doing the same thing, but it appears to be performing the translation globally, as all square are in the same location.
bool Sample3DSceneRenderer::Render()
{
if (!m_loadingComplete)
{
return false;
}
DX::ThrowIfFailed(m_deviceResources->GetCommandAllocator()->Reset());
DX::ThrowIfFailed(m_commandList->Reset(m_deviceResources->GetCommandAllocator(), m_pipelineState.Get()));
PIXBeginEvent(m_commandList.Get(), 0, L"Draw the objects");
{
m_commandList->SetGraphicsRootSignature(m_rootSignature.Get());
ID3D12DescriptorHeap* ppHeaps[] = { m_cbvHeap.Get() };
m_commandList->SetDescriptorHeaps(_countof(ppHeaps), ppHeaps);
CD3DX12_GPU_DESCRIPTOR_HANDLE gpuHandle(m_cbvHeap->GetGPUDescriptorHandleForHeapStart(), m_deviceResources->GetCurrentFrameIndex(), m_cbvDescriptorSize);
m_commandList->SetGraphicsRootDescriptorTable(0, gpuHandle);
D3D12_VIEWPORT viewport = m_deviceResources->GetScreenViewport();
m_commandList->RSSetViewports(1, &viewport);
m_commandList->RSSetScissorRects(1, &m_scissorRect);
CD3DX12_RESOURCE_BARRIER renderTargetResourceBarrier =
CD3DX12_RESOURCE_BARRIER::Transition(m_deviceResources->GetRenderTarget(), D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET);
m_commandList->ResourceBarrier(1, &renderTargetResourceBarrier);
D3D12_CPU_DESCRIPTOR_HANDLE renderTargetView = m_deviceResources->GetRenderTargetView();
D3D12_CPU_DESCRIPTOR_HANDLE depthStencilView = m_deviceResources->GetDepthStencilView();
m_commandList->ClearRenderTargetView(renderTargetView, m_colors.Get_background(), 0, nullptr);
m_commandList->ClearDepthStencilView(depthStencilView, D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, nullptr);
m_commandList->OMSetRenderTargets(1, &renderTargetView, false, &depthStencilView);
m_commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
m_commandList->IASetVertexBuffers(0, 1, &m_vertexBufferView);
m_commandList->IASetIndexBuffer(&m_indexBufferView);
for (float i = -10.0f; i < 10.0f; i++)
{
for (float j = -10.0f; j < 10.0f; j++)
{
// as far as I know, this is how I should perform the translation
XMStoreFloat4x4(&m_constantBufferData.model, XMMatrixTranspose(XMMatrixTranslation(i, j, 0.0f)));
UINT8* destination = m_mappedConstantBuffer + (m_deviceResources->GetCurrentFrameIndex() * c_alignedConstantBufferSize);
memcpy(destination, &m_constantBufferData, sizeof(m_constantBufferData));
m_commandList->DrawIndexedInstanced(6, 1, 0, 0, 0);
}
}
CD3DX12_RESOURCE_BARRIER presentResourceBarrier =
CD3DX12_RESOURCE_BARRIER::Transition(m_deviceResources->GetRenderTarget(), D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT);
m_commandList->ResourceBarrier(1, &presentResourceBarrier);
}
PIXEndEvent(m_commandList.Get());
DX::ThrowIfFailed(m_commandList->Close());
ID3D12CommandList* ppCommandLists[] = { m_commandList.Get() };
m_deviceResources->GetCommandQueue()->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);
return true;
}
Thank you,
Chelsey
You're just writing your translation matrix to the same piece of memory for every copy of the model. Since the GPU hasn't even begun drawing the first model by the time you've finished writing the translation matrix for the last one, the only place any of these models are going to draw is at the place of the last translation matrix written.
You need to write each matrix to a separate, distinct location in memory and ensure they're not overwritten by anything else until the GPU has finished drawing the models.
The act of calling DrawIndexedInstanced does not immediately instruct the GPU to draw anything, it merely adds a command to a command list to draw the object at some time in the future. If you're not familiar with the asynchronous nature of Graphics APIs and GPU execution you should probably read up a bit more on how it works.

OpenGL skeleton animation

I am trying to add animation to my program.
I have human model created in Blender with skeletal animation, and I can skip through the keyframes to see the model walking.
Now I've exported the model to an XML (Ogre3D) format, and in this XML file I can see the rotation, translation and scale assigned to each bone at a specific time (t=0.00000, t=0.00040, ... etc.)
What I've done is found which vertices are assigned each bone. Now I'm assuming all I need to do is apply the transformations defined for the bone to each one of these vertices. Is this the correct approach?
In my OpenGL draw() function (rough pseudo-code):
for (Bone b : bones){
gl.glLoadIdentity();
List<Vertex> v= b.getVertices();
rotation = b.getRotation();
translation = b.getTranslation();
scale = b.getScale();
gl.glTranslatef(translation);
gl.glRotatef(rotation);
gl.glScalef(scale);
gl.glDrawElements(v);
}
Vertices are usually affected by more than one bone -- it sounds like you're after linear blend skinning. My code's in C++ unfortunately, but hopefully it'll give you the idea:
void Submesh::skin(const Skeleton_CPtr& skeleton)
{
/*
Linear Blend Skinning Algorithm:
P = (\sum_i w_i * M_i * M_{0,i}^{-1}) * P_0 / (sum i w_i)
Each M_{0,i}^{-1} matrix gets P_0 (the rest vertex) into its corresponding bone's coordinate frame.
We construct matrices M_n * M_{0,n}^-1 for each n in advance to avoid repeating calculations.
I refer to these in the code as the 'skinning matrices'.
*/
BoneHierarchy_CPtr boneHierarchy = skeleton->bone_hierarchy();
ConfiguredPose_CPtr pose = skeleton->get_pose();
int boneCount = boneHierarchy->bone_count();
// Construct the skinning matrices.
std::vector<RBTMatrix_CPtr> skinningMatrices(boneCount);
for(int i=0; i<boneCount; ++i)
{
skinningMatrices[i] = pose->bones(i)->absolute_matrix() * skeleton->to_bone_matrix(i);
}
// Build the vertex array.
RBTMatrix_Ptr m = RBTMatrix::zeros(); // used as an accumulator for \sum_i w_i * M_i * M_{0,i}^{-1}
int vertCount = static_cast<int>(m_vertices.size());
for(int i=0, offset=0; i<vertCount; ++i, offset+=3)
{
const Vector3d& p0 = m_vertices[i].position();
const std::vector<BoneWeight>& boneWeights = m_vertices[i].bone_weights();
int boneWeightCount = static_cast<int>(boneWeights.size());
Vector3d p;
if(boneWeightCount != 0)
{
double boneWeightSum = 0;
for(int j=0; j<boneWeightCount; ++j)
{
int boneIndex = boneWeights[j].bone_index();
double boneWeight = boneWeights[j].weight();
boneWeightSum += boneWeight;
m->add_scaled(skinningMatrices[boneIndex], boneWeight);
}
// Note: This is effectively p = m*p0 (if we think of p0 as (p0.x, p0.y, p0.z, 1)).
p = m->apply_to_point(p0);
p /= boneWeightSum;
// Reset the accumulator matrix ready for the next vertex.
m->reset_to_zeros();
}
else
{
// If this vertex is unaffected by the armature (i.e. no bone weights have been assigned to it),
// use its rest position as its real position (it's the best we can do).
p = p0;
}
m_vertArray[offset] = p.x;
m_vertArray[offset+1] = p.y;
m_vertArray[offset+2] = p.z;
}
}
void Submesh::render() const
{
glPushClientAttrib(GL_CLIENT_VERTEX_ARRAY_BIT);
glPushAttrib(GL_ENABLE_BIT | GL_POLYGON_BIT);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_DOUBLE, 0, &m_vertArray[0]);
if(m_material->uses_texcoords())
{
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glTexCoordPointer(2, GL_DOUBLE, 0, &m_texCoordArray[0]);
}
m_material->apply();
glDrawElements(GL_TRIANGLES, static_cast<GLsizei>(m_vertIndices.size()), GL_UNSIGNED_INT, &m_vertIndices[0]);
glPopAttrib();
glPopClientAttrib();
}
Note in passing that real-world implementations usually do this sort of thing on the GPU to the best of my knowledge.
Your code assumes that each bone has an independent transformation matrix (you reset your matrix at the start of each loop iteration). But in reality, bones form a hierarchical structure that you must preserve when you do your rendering. Consider that when your upper arm rotates your forearm rotates along, because it is attached. The forearm may have its own rotation, but that is applied after it is rotated with the upper arm.
The rendering of the skeleton then is done recursively. Here is some pseudo-code:
function renderBone(Bone b) {
setupTransformMatrix(b);
draw(b);
foreach c in b.getChildren()
renderBone(c);
}
main() {
gl.glLoadIdentity();
renderBone(rootBone);
}
I hope this helps.