I'm testing writing to 2D and 3D textures in compute shaders, outputting a gradient noise texture consisting of 32 bit floats. Writing to a 2D texture works fine, but writing to a 3D texture isn't. Are there additional considerations that need to be made when creating a 3D texture when compared to a 2D texture?
Code of how I'm defining the 3D texture below:
HRESULT BaseComputeShader::CreateTexture3D(UINT width, UINT height, UINT depth, DXGI_FORMAT format, ID3D11Texture3D** texture)
{
D3D11_TEXTURE3D_DESC textureDesc;
ZeroMemory(&textureDesc, sizeof(textureDesc));
textureDesc.Width = width;
textureDesc.Height = height;
textureDesc.Depth = depth;
textureDesc.MipLevels = 1;
textureDesc.Format = format;
textureDesc.Usage = D3D11_USAGE_DEFAULT;
textureDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
textureDesc.CPUAccessFlags = 0;
textureDesc.MiscFlags = 0;
return renderer->CreateTexture3D(&textureDesc, 0, texture);
}
HRESULT BaseComputeShader::CreateTexture3DUAV(UINT depth, DXGI_FORMAT format, ID3D11Texture3D** texture, ID3D11UnorderedAccessView** unorderedAccessView)
{
D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc;
ZeroMemory(&uavDesc, sizeof(uavDesc));
uavDesc.Format = format;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE3D;
uavDesc.Texture3D.MipSlice = 0;
uavDesc.Texture3D.FirstWSlice = 0;
uavDesc.Texture3D.WSize = depth;
return renderer->CreateUnorderedAccessView(*texture, &uavDesc, unorderedAccessView);
}
HRESULT BaseComputeShader::CreateTexture3DSRV(DXGI_FORMAT format, ID3D11Texture3D** texture, ID3D11ShaderResourceView** shaderResourceView)
{
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc;
ZeroMemory(&srvDesc, sizeof(srvDesc));
srvDesc.Format = format;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE3D;
srvDesc.Texture3D.MostDetailedMip = 0;
srvDesc.Texture3D.MipLevels = 1;
return renderer->CreateShaderResourceView(*texture, &srvDesc, shaderResourceView);
}
And how I'm writing to it in the compute shader:
// The texture we're writing to
RWTexture3D<float> outputTexture : register(u0);
[numthreads(8, 8, 8)]
void main(uint3 DTid : SV_DispatchThreadID)
{
float noiseValue = 0.0f;
float value = 0.0f;
float localAmplitude = amplitude;
float localFrequency = frequency;
// Loop for the number of octaves, running the noise function as many times as desired (8 is usually sufficient)
for (int k = 0; k < octaves; k++)
{
noiseValue = noise(float3(DTid.x * localFrequency, DTid.y * localFrequency, DTid.z * localFrequency)) * localAmplitude;
value += noiseValue;
// Calculate a new amplitude based on the input persistence/gain value
// amplitudeLoop will get smaller as the number of layers (i.e. k) increases
localAmplitude *= persistence;
// Calculate a new frequency based on a lacunarity value of 2.0
// This gives us 2^k as the frequency
// i.e. Frequency at k = 4 will be f * 2^4 as we have looped 4 times
localFrequency *= 2.0f;
}
// Output value to 2D index in the texture provided by thread indexing
outputTexture[DTid.xyz] = value;
}
And finally, how I'm running the shader:
// Set the shader
deviceContext->CSSetShader(computeShader, nullptr, 0);
// Set the shader's buffers and views
deviceContext->CSSetConstantBuffers(0, 1, &cBuffer);
deviceContext->CSSetUnorderedAccessViews(0, 1, &textureUAV, nullptr);
// Launch the shader
deviceContext->Dispatch(512, 512, 512);
// Reset the shader now we're done
deviceContext->CSSetShader(nullptr, nullptr, 0);
// Reset the shader views
ID3D11UnorderedAccessView* ppUAViewnullptr[1] = { nullptr };
deviceContext->CSSetUnorderedAccessViews(0, 1, ppUAViewnullptr, nullptr);
// Create the shader resource view for access in other shaders
HRESULT result = CreateTexture3DSRV(DXGI_FORMAT_R32_FLOAT, &texture, &textureSRV);
if (result != S_OK)
{
MessageBox(NULL, L"Failed to create texture SRV after compute shader execution", L"Failed", MB_OK);
exit(0);
}
My bad, simple mistake. Compute shader threads are limited in number. In the compute shader you're limited to a total of 1024 threads, and the dispatch call cannot dispatch more than 65535 thread groups. The HLSL compiler will catch the former issue, but the Visual C++ compiler will not catch the latter issue.
If you create a texture of 512 * 512 * 512 (which seems what you are trying to achieve), your dispatch needs to be divided by groups:
deviceContext->Dispatch(512 / 8, 512 / 8, 512 / 8);
In your previous case, the dispatch was :
512*8 * 512*8 * 512*8 = 68719476736 units
Which very likely triggered the time out detection and crashes the driver
Also the limit of 65535 is per dimension, so in your case you are completely safe to run this.
And last one, you can create both shader resource view and unordered view right after creating your 3d texture (before the dispatch call).
This is generally recommended to avoid mixing context code and resource creation code.
On resource creation, your check is not valid either :
if (result != S_OK)
HRESULT success condition is >= 0
you can use the built in macro instead eg :
if (SUCCEEDED(result))
Related
I'm trying to make frustrum culling via compute shader. For that I have a pair of buffers for instanced vertex attributes, and a pair of buffers for indirect draw commands. My compute shader checks if instance coordinates from first buffer are within bounding volume, referencing first draw buffer for counts, subgroupBallot and bitCount to see offset within subgroup, then add results from other subgroups and a global offset, and finally stores the result in second buffer. The global offset is stored in second indirect draw buffer.
The problem is that, when under load, frustum may be few(>1) frames late to the moving camera, with wide lines of disappeared objects on edge. It seems weird to me because culling and rendering are done within same command buffer.
When taking capture in renderdoc, taking a screenshot alt+printScreen, or pausing the render-present thread, things snap back to as they should be.
My only guess is that compute shader from past frame continues to execute even when new frame starts to be drawn, though this should not be happening due to pipeline barriers.
Shader code:
#version 460
#extension GL_KHR_shader_subgroup_ballot : require
struct drawData{
uint indexCount;
uint instanceCount;
uint firstIndex;
uint vertexOffset;
uint firstInstance;
};
struct instanceData{
float x, y, z;
float a, b, c, d;
};
layout(local_size_x = 128, local_size_y = 1, local_size_z = 1) in;
layout(set = 0, binding = 0) uniform A
{
mat4 cam;
vec4 camPos;
vec4 l;
vec4 t;
vec4 r;
vec4 b;
};
layout(set = 0, binding = 1) buffer B
{
uint count;
drawData data[];
} Draw[2];
layout(set = 0, binding = 2) buffer C
{
instanceData data[];
} Instance[2];
shared uint offsetsM[32];
void main()
{
const uint gID = gl_LocalInvocationID.x;
const uint lID = gl_SubgroupInvocationID;
const uint patchSize = gl_WorkGroupSize.x;
Draw[1].data[0] = Draw[0].data[0];//copy data like index count
Draw[1].count = Draw[0].count;
uint offsetG = 0;//accumulating offset within end buffer
uint loops = Draw[0].data[0].instanceCount/patchSize;//constant loop count
for(uint i = 0; i<loops;++i){
uint posa = i*patchSize+gID;//runs better this way for some reason
vec3 pos = camPos.xyz-vec3(Instance[0].data[posa].x, Instance[0].data[posa].y, Instance[0].data[posa].z);//position relative to camera
mat4x3 lrtb = mat4x3(l.xyz, r.xyz, t.xyz, b.xyz);
vec4 dist = pos*lrtb+Model.data[0].rad;//dot products and radius tolerance
bool Pass = posa<Draw[0].data[0].instanceCount&&//is real
(dot(pos, pos)<l.w*l.w) &&//not too far
all(greaterThan(dist, vec4(0))); //within view frustum
subgroupBarrier();//no idea what is the best, put what works
uvec4 actives = subgroupBallot(Pass);//count passed instances
if(subgroupElect())
offsetsM[gl_SubgroupID] = bitCount(actives).x+bitCount(actives).y;
barrier();
uint offsetL = bitCount(actives&gl_SubgroupLtMask).x+bitCount(actives&gl_SubgroupLtMask).y;//offset withing subgroup
uint ii = 0;
if(Pass){
for(; ii<gl_SubgroupID; ++ii)
offsetG+= offsetsM[ii];//offsets before subgroup
Instance[1].data[offsetG+offsetL] = Instance[0].data[posa];
for(; ii<gl_NumSubgroups; ++ii)
offsetG+= offsetsM[ii];}//offsets after subgroup
else for(; ii<gl_NumSubgroups; ++ii)
offsetG+= offsetsM[ii];//same but no data copying
}
if(gID == 0)
Draw[1].data[0].instanceCount = offsetG;
}
For renderpass after the compute I have dependencies:
{//1
deps[1].srcSubpass = VK_SUBPASS_EXTERNAL;
deps[1].dstSubpass = 0;
deps[1].srcStageMask = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
deps[1].dstStageMask = VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT;
deps[1].srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
deps[1].dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;
deps[1].dependencyFlags = 0;
}
{//2
deps[2].srcSubpass = VK_SUBPASS_EXTERNAL;
deps[2].dstSubpass = 0;
deps[2].srcStageMask = VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT;
deps[2].dstStageMask = VK_PIPELINE_STAGE_VERTEX_INPUT_BIT;
deps[2].srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
deps[2].dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
deps[2].dependencyFlags = 0;
}
The command buffer is(fully reused as is, one for each image in swapchain):
vkBeginCommandBuffer(cmd, &begInfo);
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, layoutsPipe[1],
0, 1, &descs[1], 0, 0);
vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipes[1]);
vkCmdDispatch(cmd, 1, 1, 1);
VkBufferMemoryBarrier bufMemBar[2];
{//mem bars
{//0 indirect
bufMemBar[0].srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
bufMemBar[0].dstAccessMask = VK_ACCESS_INDIRECT_COMMAND_READ_BIT;
bufMemBar[0].buffer = bufferIndirect;
bufMemBar[0].offset = 0;
bufMemBar[0].size = -1;
}
{//1 vertex instance
bufMemBar[1].srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT;
bufMemBar[1].dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
bufMemBar[1].buffer = bufferInstance;
bufMemBar[1].offset = 0;
bufMemBar[1].size = -1;
}
}
vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT, 0, 0, 0, 1, &bufMemBar[0], 0, 0);
vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
VK_PIPELINE_STAGE_VERTEX_INPUT_BIT , 0, 0, 0, 1, &bufMemBar[1], 0, 0);
VkRenderPassBeginInfo passBegInfo;
passBegInfo.renderPass = pass;
passBegInfo.framebuffer = chain.frames[i];
passBegInfo.renderArea = {{0, 0}, chain.dim};
VkClearValue clears[2]{{0},{0}};
passBegInfo.clearValueCount = 2;
passBegInfo.pClearValues = clears;
vkCmdBeginRenderPass(cmd, &passBegInfo, VK_SUBPASS_CONTENTS_INLINE);
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layoutsPipe[0], 0, 1, &descs[0], 0, 0);
vkCmdBindPipeline (cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipes[0]);
VkBuffer buffersVertex[2]{bufferVertexProto, bufferInstance};
VkDeviceSize offsetsVertex[2]{0, 0};
vkCmdBindVertexBuffers(cmd, 0, 2, buffersVertex, offsetsVertex);
vkCmdBindIndexBuffer (cmd, bufferIndex, 0, VK_INDEX_TYPE_UINT32);
vkCmdDrawIndexedIndirectCount(cmd, bufferIndirect, 0+4,
bufferIndirect, 0,
count.maxDraws, sizeof(VkDrawIndexedIndirectCommand));
vkCmdEndRenderPass(cmd);
vkEndCommandBuffer(cmd);
Rendering and presentation are synchronised with two semaphores - imageAvailable, and renderFinished. Frustum calculation is in right order on CPU. Validation layers are enabled.
The problem was that I lacked host synchronisation. Indeed, even within same command buffer, there are no host synchronisation guarantees (and that makes sense, since it enables us to use events).
I've just started up using Direct compute in an attempt to move a fluid simulation I have been working on, onto the GPU. I have found a very similar (if not identical) question here however seems the resolution to my problem is not the same as theirs; I do have my CopyResource the right way round for sure! As with the pasted question, I only get a buffer filled with 0's when copy back from the GPU. I really can't see the error as I don't understand how I can be reaching out of bounds limits. I'm going to apologise for the mass amount of code pasting about to occur but I want be sure I've not got any of the setup wrong.
Output Buffer, UAV and System Buffer set up
outputDesc.Usage = D3D11_USAGE_DEFAULT;
outputDesc.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
outputDesc.ByteWidth = sizeof(BoundaryConditions) * numElements;
outputDesc.CPUAccessFlags = 0;
outputDesc.StructureByteStride = sizeof(BoundaryConditions);
outputDesc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
result =_device->CreateBuffer(&outputDesc, 0, &m_outputBuffer);
outputDesc.Usage = D3D11_USAGE_STAGING;
outputDesc.BindFlags = 0;
outputDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
result = _device->CreateBuffer(&outputDesc, 0, &m_outputresult);
D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc;
uavDesc.Format = DXGI_FORMAT_UNKNOWN;
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.Flags = 0;
uavDesc.Buffer.NumElements = numElements;
result =_device->CreateUnorderedAccessView(m_outputBuffer, &uavDesc, &m_BoundaryConditionsUAV);
Running the Shader in my frame loop
HRESULT result;
D3D11_MAPPED_SUBRESOURCE mappedResource;
_deviceContext->CSSetShader(m_BoundaryConditionsCS, nullptr, 0);
_deviceContext->CSSetUnorderedAccessViews(0, 1, &m_BoundaryConditionsUAV, 0);
_deviceContext->Dispatch(1, 1, 1);
// Unbind output from compute shader
ID3D11UnorderedAccessView* nullUAV[] = { NULL };
_deviceContext->CSSetUnorderedAccessViews(0, 1, nullUAV, 0);
// Disable Compute Shader
_deviceContext->CSSetShader(nullptr, nullptr, 0);
_deviceContext->CopyResource(m_outputresult, m_outputBuffer);
D3D11_MAPPED_SUBRESOURCE mappedData;
result = _deviceContext->Map(m_outputresult, 0, D3D11_MAP_READ, 0, &mappedData);
BoundaryConditions* newbc = reinterpret_cast<BoundaryConditions*>(mappedData.pData);
for (int i = 0; i < 4; i++)
{
Debug::Instance()->Log(newbc[i].x.x);
}
_deviceContext->Unmap(m_outputresult, 0);
HLSL
struct BoundaryConditions
{
float3 x;
float3 y;
};
RWStructuredBuffer<BoundaryConditions> _boundaryConditions;
[numthreads(4, 1, 1)]
void ComputeBoundaryConditions(int3 id : SV_DispatchThreadID)
{
_boundaryConditions[id.x].x = float3(id.x,id.y,id.z);
}
I dispatch the Compute shader after I begin a frame and before I end the frame. I have played around with moving the shaders dispatch call outside of the end scene and before the present ect but nothing seems to effect the process. Can't seem to figure this one out!
Holy Smokes I fixed the error! I was creating the compute shader to a different ID3D11ComputeShader pointer! D: Works like a charm! Pheew Sorry and thanks Adam!
I get a bit frustrating trying to figure out why I have to call a DirectX 11 shader twice to see the desired result.
Here's my current state:
I have a 3d object built from vertex and index buffer. This object is then instanced several times. Later, there will be a lot more than just one object, but for now, I'm testing it with this one only. In my render routine, I iterate over all instances, change the world matrices (so that all object instances are put together and form "one big, whole object") and call the shader method to render the data to the screen.
Here's the code so far, that doesn't work:
m_pLevel->Simulate(0.1f);
std::list<CLevelElementInstance*>& lst = m_pLevel->GetInstances();
float x = -(*lst.begin())->GetPosition().x, y = -(*lst.begin())->GetPosition().y, z = -(*lst.begin())->GetPosition().z;
int i = 0;
for (std::list<CLevelElementInstance*>::iterator it = lst.begin(); it != lst.end(); it++)
{
// Extract base element from current instance
CLevelElement* elem = (*it)->GetBaseElement();
// Write vertex and index buffer to video memory
elem->Render(m_pDirect3D->GetDeviceContext());
// Call shader
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
++i;
}
My std::list consists of 4 3d objects, which are all the same. They only differ in their position in 3d space. All of the objects are 8.0f x 8.0f x 8.0f, so for simplicity I just line them up. (As can be seen on the shader render line, where I just add 8 units to the Z dimension)
The result is the following: I only see two elements rendered on the screen. And between them, there's an empty space the size of an element.
At first, I thought I did some errors with the 3d math, but after a lot of time spent debugging my code, I could not find any errors.
And here is now the confusing part:
If I change the content of the for-loop and add another call to the shader, I suddenly see all four elements; and they're all on their correct positions in 3d space:
m_pLevel->Simulate(0.1f);
std::list<CLevelElementInstance*>& lst = m_pLevel->GetInstances();
float x = -(*lst.begin())->GetPosition().x, y = -(*lst.begin())->GetPosition().y, z = -(*lst.begin())->GetPosition().z;
int i = 0;
for (std::list<CLevelElementInstance*>::iterator it = lst.begin(); it != lst.end(); it++)
{
// Extract base element from current instance
CLevelElement* elem = (*it)->GetBaseElement();
// Write vertex and index buffer to video memory
elem->Render(m_pDirect3D->GetDeviceContext());
// Call shader
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
// Call shader a second time - this seems to have no effect but to allow the next iteration to perform it's shader rendering...
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
++i;
}
Does anybody has an idea of what is going on here?
If it helps, here's the code of the shader:
bool CTextureShader::Render(ID3D11DeviceContext* _pDeviceContext, const int _IndexCount, XMMATRIX& _pWorldMatrix, XMMATRIX& _pViewMatrix, XMMATRIX& _pProjectionMatrix, ID3D11ShaderResourceView* _pTexture)
{
bool result = SetShaderParameters(_pDeviceContext, _pWorldMatrix, _pViewMatrix, _pProjectionMatrix, _pTexture);
if (!result)
return false;
RenderShader(_pDeviceContext, _IndexCount);
return true;
}
bool CTextureShader::SetShaderParameters(ID3D11DeviceContext* _pDeviceContext, XMMATRIX& _WorldMatrix, XMMATRIX& _ViewMatrix, XMMATRIX& _ProjectionMatrix, ID3D11ShaderResourceView* _pTexture)
{
HRESULT result;
D3D11_MAPPED_SUBRESOURCE mappedResource;
MatrixBufferType* dataPtr;
unsigned int bufferNumber;
_WorldMatrix = XMMatrixTranspose(_WorldMatrix);
_ViewMatrix = XMMatrixTranspose(_ViewMatrix);
_ProjectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
result = _pDeviceContext->Map(m_pMatrixBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(result))
return false;
dataPtr = (MatrixBufferType*)mappedResource.pData;
dataPtr->world = _WorldMatrix;
dataPtr->view = _ViewMatrix;
dataPtr->projection = _ProjectionMatrix;
_pDeviceContext->Unmap(m_pMatrixBuffer, 0);
bufferNumber = 0;
_pDeviceContext->VSSetConstantBuffers(bufferNumber, 1, &m_pMatrixBuffer);
_pDeviceContext->PSSetShaderResources(0, 1, &_pTexture);
return true;
}
void CTextureShader::RenderShader(ID3D11DeviceContext* _pDeviceContext, const int _IndexCount)
{
_pDeviceContext->IASetInputLayout(m_pLayout);
_pDeviceContext->VSSetShader(m_pVertexShader, NULL, 0);
_pDeviceContext->PSSetShader(m_pPixelShader, NULL, 0);
_pDeviceContext->PSSetSamplers(0, 1, &m_pSampleState);
_pDeviceContext->DrawIndexed(_IndexCount, 0, 0);
}
If it helps, I can also post the code from the shaders here.
Any help would be appreciated - I'm totally stuck here :-(
The problem is that you are transposing your data every frame, so it's only "right" every other frame:
_WorldMatrix = XMMatrixTranspose(_WorldMatrix);
_ViewMatrix = XMMatrixTranspose(_ViewMatrix);
_ProjectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
Instead, you should be doing something like:
XMMATRIX worldMatrix = XMMatrixTranspose(_WorldMatrix);
XMMATRIX viewMatrix = XMMatrixTranspose(_ViewMatrix);
XMMATRIX projectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
result = _pDeviceContext->Map(m_pMatrixBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(result))
return false;
dataPtr = (MatrixBufferType*)mappedResource.pData;
dataPtr->world = worldMatrix;
dataPtr->view = viewMatrix;
dataPtr->projection = projectionMatrix;
_pDeviceContext->Unmap(m_pMatrixBuffer, 0);
I wonder if the problem is that the temporary created by the call to XMMatrixTranslation(x, y, z + (i * 8)) is being passed into a function by reference, and then passed into another function by reference where it is modified.
My knowledge of the c++ spec isn't complete enough for me to tell whether that's undefined behaviour or not (but I know that there are tricky rules in this area - assigning a temporary to a non-const temporary is not supported, for example). Regardless, it's close enough to being suspicious that even if it is well defined c++ it still might be a dusty corner that could trip up a non-compliant compiler.
To rule out this possibility, try doing it like:
XMMatrix worldMatrix = XMMatrixTranslation(x, y, z + (i * 8));
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), worldMatrix, viewMatrix, projectionMatrix, elem->GetTexture());
For some reason on machines that only conditionally (D3DPTEXTURECAPS_NONPOW2CONDITIONAL) supports the "non power of two" (NONPOW2) textures I'm getting this kind of image distortion:
(On machines that fully supports NONPOW2 textures everything works fine.)
This is a 1280x720 resolution RGB24 (D3DFMT_X8R8G8B8).
I know there are four rules when dealing with conditionally supported textures:
1.The texture addressing mode for the texture stage is set to D3DTADDRESS_CLAMP.
mpDevice->SetSamplerState(0, D3DSAMP_ADDRESSU, D3DTADDRESS_CLAMP);
mpDevice->SetSamplerState(0, D3DSAMP_ADDRESSV, D3DTADDRESS_CLAMP);
mpDevice->SetSamplerState(0, D3DSAMP_ADDRESSW, D3DTADDRESS_CLAMP);
2.Texture wrapping for the texture stage is disabled (D3DRS_WRAP n set to 0).
mpDevice->SetRenderState(D3DRS_WRAP0, 0);
3.Mipmapping is not in use (use magnification filter only).
mpDevice->SetSamplerState(0, D3DSAMP_MIPFILTER, D3DTEXF_NONE);
4.Texture formats must not be D3DFMT_DXT1 through D3DFMT_DXT5.
I'm using D3DFMT_X8R8G8B8...
Texture is created like this:
mpDevice->CreateTexture(width, height, 1, D3DUSAGE_DYNAMIC, D3DFMT_X8R8G8B8, D3DPOOL_DEFAULT, &pTexture, nullptr);
And updated like this:
D3DLOCKED_RECT lockedRect;
pTexture->LockRect(0, &lockedRect, nullptr, D3DLOCK_DISCARD);
// const IppiSize roiSize = { width, height };
// ippiCopy_8u_C3AC4R((Ipp8u*) pPixels, width * 3, (Ipp8u*) lockedRect.pBits, width * 4, roiSize); // RGB to RGBA
// ippiCopy_8u_AC4C3R((Ipp8u*) pPixels, width * 4, (Ipp8u*) lockedRect.pBits, width * 3, roiSize); // RGBA to RGB
// TEMP.
unsigned char* pDst = (unsigned char*) lockedRect.pBits;
const unsigned char* pSrc = pPixels;
for (int k = 0; k < width; k++)
{
for (int j = 0; j < height; j++)
{
pDst[0] = pSrc[0];
pDst[1] = pSrc[1];
pDst[2] = pSrc[2];
pDst[3] = 255;
pDst += 4;
pSrc += 3;
}
}
pTexture->UnlockRect(0);
Does anyone had any similar problem? Or have any idea on what is going on?
Maybe there's something wrong with the RGB data itself?
Thanks.
We are trying to use librocket http://librocket.com/ together with Ogre http://www.ogre3d.org/. They're both part of gamekit http://code.google.com/p/gamekit/ which I use for this project.
This all works fine together as long as I don't load an image with librocket. As soon as I do that the viewport on the iPad is not fullscreen anymore but small in the lower corner. Like this: http://uploads.undef.ch/machine/ipad.png
I can't make a connection between loading/rendering a texture and resizing of the viewport. And I can't find anything wrong with the RenderInterface. http://uploads.undef.ch/machine/RenderInterfaceOgre3D.cpp
Is there any OpenGLES command that could have an affect on the active viewport size?
This is the relevant code that loads an image and displays it:
// Called by Rocket when a texture is required by the library.
bool RenderInterfaceOgre3D::LoadTexture(Rocket::Core::TextureHandle& texture_handle, Rocket::Core::Vector2i& texture_dimensions, const Rocket::Core::String& source)
{
Ogre::TextureManager* texture_manager = Ogre::TextureManager::getSingletonPtr();
Ogre::TexturePtr ogre_texture = texture_manager->getByName(Ogre::String(source.CString()));
if (ogre_texture.isNull())
{
ogre_texture = texture_manager->load(Ogre::String(source.CString()),
DEFAULT_ROCKET_RESOURCE_GROUP,
Ogre::TEX_TYPE_2D,
0);
}
if (ogre_texture.isNull())
return false;
texture_dimensions.x = ogre_texture->getWidth();
texture_dimensions.y = ogre_texture->getHeight();
texture_handle = reinterpret_cast<Rocket::Core::TextureHandle>(new RocketOgre3DTexture(ogre_texture));
return true;
}
// Called by Rocket when it wants to render geometry that it does not wish to optimise.
void RenderInterfaceOgre3D::RenderGeometry(Rocket::Core::Vertex* vertices, int num_vertices, int* indices, int num_indices, Rocket::Core::TextureHandle texture, const Rocket::Core::Vector2f& translation)
{
// We've chosen to not support non-compiled geometry in the Ogre3D renderer.
// But if you want, you can uncomment this code, so borders will be shown.
/*
Rocket::Core::CompiledGeometryHandle gh = CompileGeometry(vertices, num_vertices, indices, num_indices, texture);
RenderCompiledGeometry(gh, translation);
ReleaseCompiledGeometry(gh);
*/
}
// Called by Rocket when it wants to compile geometry it believes will be static for the forseeable future.
Rocket::Core::CompiledGeometryHandle RenderInterfaceOgre3D::CompileGeometry(Rocket::Core::Vertex* vertices, int num_vertices, int* indices, int num_indices, Rocket::Core::TextureHandle texture)
{
RocketOgre3DCompiledGeometry* geometry = new RocketOgre3DCompiledGeometry();
geometry->texture = texture == NULL ? NULL : (RocketOgre3DTexture*) texture;
geometry->render_operation.vertexData = new Ogre::VertexData();
geometry->render_operation.vertexData->vertexStart = 0;
geometry->render_operation.vertexData->vertexCount = num_vertices;
geometry->render_operation.indexData = new Ogre::IndexData();
geometry->render_operation.indexData->indexStart = 0;
geometry->render_operation.indexData->indexCount = num_indices;
geometry->render_operation.operationType = Ogre::RenderOperation::OT_TRIANGLE_LIST;
// Set up the vertex declaration.
Ogre::VertexDeclaration* vertex_declaration = geometry->render_operation.vertexData->vertexDeclaration;
size_t element_offset = 0;
vertex_declaration->addElement(0, element_offset, Ogre::VET_FLOAT3, Ogre::VES_POSITION);
element_offset += Ogre::VertexElement::getTypeSize(Ogre::VET_FLOAT3);
vertex_declaration->addElement(0, element_offset, Ogre::VET_COLOUR, Ogre::VES_DIFFUSE);
element_offset += Ogre::VertexElement::getTypeSize(Ogre::VET_COLOUR);
vertex_declaration->addElement(0, element_offset, Ogre::VET_FLOAT2, Ogre::VES_TEXTURE_COORDINATES);
#if GK_PLATFORM == GK_PLATFORM_APPLE_IOS
// Create the vertex buffer.
Ogre::HardwareVertexBufferSharedPtr vertex_buffer = Ogre::HardwareBufferManager::getSingleton().createVertexBuffer(vertex_declaration->getVertexSize(0), num_vertices, Ogre::HardwareBuffer::HBU_DYNAMIC_WRITE_ONLY_DISCARDABLE,true);
geometry->render_operation.vertexData->vertexBufferBinding->setBinding(0, vertex_buffer);
// Fill the vertex buffer.
RocketOgre3DVertex* ogre_vertices = (RocketOgre3DVertex*) vertex_buffer->lock(0, vertex_buffer->getSizeInBytes(), Ogre::HardwareBuffer::HBL_DISCARD);
#else
Ogre::HardwareVertexBufferSharedPtr vertex_buffer = Ogre::HardwareBufferManager::getSingleton().createVertexBuffer(vertex_declaration->getVertexSize(0), num_vertices, Ogre::HardwareBuffer::HBU_STATIC_WRITE_ONLY);
geometry->render_operation.vertexData->vertexBufferBinding->setBinding(0, vertex_buffer);
// Fill the vertex buffer.
RocketOgre3DVertex* ogre_vertices = (RocketOgre3DVertex*) vertex_buffer->lock(0, vertex_buffer->getSizeInBytes(), Ogre::HardwareBuffer::HBL_NORMAL);
#endif
for (int i = 0; i < num_vertices; ++i)
{
ogre_vertices[i].x = vertices[i].position.x;
ogre_vertices[i].y = vertices[i].position.y;
ogre_vertices[i].z = 0;
Ogre::ColourValue diffuse(vertices[i].colour.red / 255.0f, vertices[i].colour.green / 255.0f, vertices[i].colour.blue / 255.0f, vertices[i].colour.alpha / 255.0f);
render_system->convertColourValue(diffuse, &ogre_vertices[i].diffuse);
ogre_vertices[i].u = vertices[i].tex_coord[0];
ogre_vertices[i].v = vertices[i].tex_coord[1];
}
vertex_buffer->unlock();
#if GK_PLATFORM == GK_PLATFORM_APPLE_IOS
// Create the index buffer.
Ogre::HardwareIndexBufferSharedPtr index_buffer = Ogre::HardwareBufferManager::getSingleton().createIndexBuffer(Ogre::HardwareIndexBuffer::IT_16BIT, num_indices, Ogre::HardwareBuffer::HBU_STATIC_WRITE_ONLY);
geometry->render_operation.indexData->indexBuffer = index_buffer;
geometry->render_operation.useIndexes = true;
#else
Ogre::HardwareIndexBufferSharedPtr index_buffer = Ogre::HardwareBufferManager::getSingleton().createIndexBuffer(Ogre::HardwareIndexBuffer::IT_32BIT, num_indices, Ogre::HardwareBuffer::HBU_STATIC_WRITE_ONLY);
geometry->render_operation.indexData->indexBuffer = index_buffer;
geometry->render_operation.useIndexes = true;
#endif
// Fill the index buffer.
unsigned short * ogre_indices = (unsigned short*)index_buffer->lock(0, index_buffer->getSizeInBytes(), Ogre::HardwareBuffer::HBL_NORMAL);
#if GK_PLATFORM == GK_PLATFORM_APPLE_IOS
//unsigned short short_indices[num_indices];
for(int i=0;i<num_indices;i++)
ogre_indices[i] = indices[i];
//memcpy(ogre_indices, short_indices, sizeof(unsigned short) * num_indices);
#else
memcpy(ogre_indices, indices, sizeof(unsigned int) * num_indices);
#endif
index_buffer->unlock();
return reinterpret_cast<Rocket::Core::CompiledGeometryHandle>(geometry);
}
// Called by Rocket when it wants to render application-compiled geometry.
void RenderInterfaceOgre3D::RenderCompiledGeometry(Rocket::Core::CompiledGeometryHandle geometry, const Rocket::Core::Vector2f& translation)
{
Ogre::Matrix4 transform;
transform.makeTrans(translation.x, translation.y, 0);
render_system->_setWorldMatrix(transform);
render_system = Ogre::Root::getSingleton().getRenderSystem();
RocketOgre3DCompiledGeometry* ogre3d_geometry = (RocketOgre3DCompiledGeometry*) geometry;
if (ogre3d_geometry->texture != NULL)
{
render_system->_setTexture(0, true, ogre3d_geometry->texture->texture);
// Ogre can change the blending modes when textures are disabled - so in case the last render had no texture,
// we need to re-specify them.
render_system->_setTextureBlendMode(0, colour_blend_mode);
render_system->_setTextureBlendMode(0, alpha_blend_mode);
}
else
render_system->_disableTextureUnit(0);
render_system->_render(ogre3d_geometry->render_operation);
}