loading from RWTexture2D<float4> in a compute shader - hlsl

I understand there's a limitation in HLSL shader model 5.0 where one cannot load data from a non-scalar typed RWTexture2D resource. That is to say, the following is illegal:
RWTexture2D<float4> __color;
float4 c = __color[PixelCoord]; // error here
So what exactly is the workaround? I'm trying to accumulate into a float4 buffer in a compute shader, like so:
c = computeColor( ... );
__color[PixelCoord] += c;

Try doing:
float4 c = __color.Load( int3( UV, 0 ) );
Where UV is the xy coordinate in screen space (0 -> Resolution) of the texel you want to sample.
If you need to write to it, make sure it is bound as a UAV and not as a shader resource view.


How to "fully bind" a constant buffer view to a descriptor range?

I am currently learning DirectX 12 and trying to get a demo application running. I am currently stuck at creating a pipeline state object using a root signature. I am using dxc to compile my vertex shader:
./dxc -T vs_6_3 -E main -Fo "basic.vert.dxi" -D DXIL "basic.vert"
My shader looks like this:
#pragma pack_matrix(row_major)
struct VertexData
{
    float4 Position : SV_POSITION;
    float4 Color : COLOR;
};
struct VertexInput
{
    float3 Position : POSITION;
    float4 Color : COLOR;
};
struct CameraData
{
    float4x4 ViewProjection;
};
ConstantBuffer<CameraData> camera : register(b0, space0);
VertexData main(in VertexInput input)
{
    VertexData vertex;
    vertex.Position = mul(float4(input.Position, 1.0), camera.ViewProjection);
    vertex.Color = input.Color;
    return vertex;
}
Now I want to define a root signature for my shader. The definition looks something like this:
CD3DX12_DESCRIPTOR_RANGE1 descriptorRange;
CD3DX12_ROOT_PARAMETER1 rootParameter;
rootParameter.InitAsDescriptorTable(1, &descriptorRange, D3D12_SHADER_VISIBILITY_VERTEX);
ComPtr<ID3DBlob> signature, error;
rootSignatureDesc.Init_1_1(1, &rootParameter, 0, nullptr, D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT);
::D3D12SerializeVersionedRootSignature(&rootSignatureDesc, &signature, &error);
ComPtr<ID3D12RootSignature> rootSignature;
device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&rootSignature));
Finally, I pass the root signature along other state variables to the pipeline state object:
D3D12_GRAPHICS_PIPELINE_STATE_DESC pipelineStateDescription = {};
// ...
pipelineStateDescription.pRootSignature = rootSignature.Get();
ComPtr<ID3D12PipelineState> pipelineState;
device->CreateGraphicsPipelineState(&pipelineStateDescription, IID_PPV_ARGS(&pipelineState));
However, no matter what I do, the device keeps on complaining about the root signature not matching the vertex shader:
D3D12 ERROR: ID3D12Device::CreateGraphicsPipelineState: Root Signature doesn't match Vertex Shader: Shader CBV descriptor range (BaseShaderRegister=0, NumDescriptors=1, RegisterSpace=0) is not fully bound in root signature
I am confused about what this error is trying to tell me, since I clearly have a constant buffer bound to register(b0, space0). Or does it mean that I have to allocate a descriptor from a heap before creating the pipeline state object?
I also tried defining a root signature within the shader itself:
#define ShaderRootSignature \
"DescriptorTable( CBV(b0, space = 0, numDescriptors = 1, flags = DATA_STATIC ) ), "
... and compiling it using [RootSignature(ShaderRootSignature)], or specifying -rootsig-define "ShaderRootSignature" for dxc. Then I tried loading the signature as suggested here, however both approaches fail, since the root signature could not be read from the shader bytecode.
Any clarification on how to interpret the error message would be much appreciated, since I really do not know what the binding in a root signature means in this context. Thanks in advance! 🙂
Long story short: unlike in Vulkan, shader visibility in DX12 is not a bit field, so setting the visibility to D3D12_SHADER_VISIBILITY_VERTEX | D3D12_SHADER_VISIBILITY_PIXEL results in the parameter only being visible to the pixel shader. Setting it to D3D12_SHADER_VISIBILITY_ALL solved my problem.
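To make the arithmetic behind this concrete, here is a minimal sketch. It mirrors the numeric values of D3D12_SHADER_VISIBILITY in a local enum so that it builds without the Windows SDK; the names ShaderVisibility, orVisibility and pickVisibility are made up for illustration and are not part of the D3D12 API.

```cpp
#include <cassert>

// Local mirror of the D3D12_SHADER_VISIBILITY values from d3d12.h.
// Assumption: copied here only so the sketch builds without the Windows SDK;
// real code should include d3d12.h and use the D3D12_* names.
enum ShaderVisibility : int {
    VISIBILITY_ALL      = 0,
    VISIBILITY_VERTEX   = 1,
    VISIBILITY_HULL     = 2,
    VISIBILITY_DOMAIN   = 3,
    VISIBILITY_GEOMETRY = 4,
    VISIBILITY_PIXEL    = 5,
};

// What "VERTEX | PIXEL" actually evaluates to: a plain integer OR.
constexpr int orVisibility(ShaderVisibility a, ShaderVisibility b) {
    return static_cast<int>(a) | static_cast<int>(b);
}

// 1 | 5 == 5, so the OR is indistinguishable from PIXEL alone.
static_assert(orVisibility(VISIBILITY_VERTEX, VISIBILITY_PIXEL) == VISIBILITY_PIXEL,
              "OR-ing VERTEX and PIXEL collapses to PIXEL");

// Hypothetical helper: choose a visibility for a parameter used by the given stages.
// One stage gets its own value; more than one stage has to fall back to ALL.
inline ShaderVisibility pickVisibility(bool usedByVertex, bool usedByPixel) {
    assert(usedByVertex || usedByPixel);  // at least one stage must use the parameter
    if (usedByVertex && usedByPixel) return VISIBILITY_ALL;
    return usedByVertex ? VISIBILITY_VERTEX : VISIBILITY_PIXEL;
}
```

The alternative to VISIBILITY_ALL would be one root parameter per stage, each with its own single-stage visibility; the sketch only covers the simpler fallback.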

Dynamic indexing into uniform array of sampler2D doesn't work

I need to index into an array of 2 uniform sampler2D. The index is dynamic per frame. That is, I have a dynamic uniform buffer which provides that index to a fragment shader. I use Vulkan API 1.2. In the device feature listing I have:
shaderSampledImageArrayDynamicIndexing = 1
I am not 100% sure, but it looks like this feature is core in 1.2. Nevertheless, I did try to enable it during device creation like this:
VkPhysicalDeviceFeatures features = {};
features.shaderSampledImageArrayDynamicIndexing = VK_TRUE;
Then plugging into device creation:
VkDeviceCreateInfo deviceCreateInfo = {};
deviceCreateInfo.pQueueCreateInfos = queueCreateInfos;
deviceCreateInfo.queueCreateInfoCount = 1;
deviceCreateInfo.pEnabledFeatures = &features ;
deviceCreateInfo.enabledExtensionCount = NUM_DEVICE_EXTENSIONS;
deviceCreateInfo.ppEnabledExtensionNames = deviceExtensionNames;
In the shader it looks like this:
layout(std140, set = 0, binding = 1) uniform Material
{
    vec4 fparams0;
    vec4 fparams1;
    uvec4 iparams; // .z - array texture idx
    uvec4 iparams1;
} material;
layout(set = 1, binding = 0) uniform sampler2D u_ColorMaps[2];
layout(location = 0) in vec2 texCoord;
layout(location = 0) out vec4 outColor;
void main()
{
    outColor = texture(u_ColorMaps[material.iparams.z], texCoord);
}
What I get is a combination of image pixels with some weird color. If I change to fixed indices, it works correctly. The material.iparams.z param has been verified: it provides the correct index number every frame (0 or 1). No idea what else is missing. Validation layers say nothing.
My setup: Windows, RTX3000, NVIDIA beta driver 443.41 (Vulkan 1.2)
I also found that the dynamically indexed sampler returns a value in the red channel (r) which is close to one, and zeros in GB. I don't set red color anywhere, and the textures I fetch don't contain red. Here are two screenshots: the upper is the correct result, which I get when indexing with a constant value. The second is what happens when I index with a dynamic uint which comes from the dynamic UBO:
The problem was due to the usage of Y′CBCR samplers. It appears that Vulkan disallows indexing dynamically into an array of such uniforms.
Here is what the Vulkan spec says:
If the combined image sampler enables sampler Y′CBCR conversion or
samples a subsampled image, it must be indexed only by constant
integral expressions when aggregated into arrays in shader code,
irrespective of the shaderSampledImageArrayDynamicIndexing feature.
So, the solution for me was to provide two separately bound samplers and use the dynamic index with an if()..else condition to decide which sampler to use. Push constants would also work, but in this case I have to re-record command buffers all the time. Hopefully this info will be helpful to other people working with video formats in the Vulkan API.

DirectX11 pixel shader in pipeline is missing

I'm writing a program which displays an MS3D model using DirectX, and unfortunately, the result shows nothing on the screen.
When I use the Graphics Debugger from Visual Studio 13, I notice that the pixel shader is missing from the pipeline, as shown in the picture below.
This is my pixel shader source code:
cbuffer SkinningTransforms
{
    matrix WorldMatrix;
    matrix ViewProjMatrix;
};
// Inter-stage structures
struct VS_INPUT
{
    float3 position : POSITION;
    int4 bone : BONEID;
    float4 weights : BONEWEIGHT;
    float3 normal : NORMAL;
    float3 tangent : TANGENT;
    float2 tex : TEXCOORD;
};
struct VS_OUTPUT
{
    float4 position : SV_Position;
    float3 normal : NORMAL;
    float3 light : LIGHT;
    float2 tex : TEXCOORDS;
};
Texture2D ColorTexture : register( t0 );
SamplerState LinearSampler : register( s0 );
VS_OUTPUT output;
//Transform vertex and pass them to the pixel shader
return output;
float4 PSMAIN( in VS_OUTPUT input ) : SV_Target
{
    // Calculate the lighting
    float3 n = normalize( input.normal );
    float3 l = normalize( input.light );
    float4 texColor = ColorTexture.Sample( LinearSampler, input.tex );
    float4 color = texColor * (max(dot(n,l),0) + 0.05f );
    return( color );
}
As I learned from the Graphics Debugger, all of the graphics events are correct. Below I list the important events that might relate to the pixel shader:
106:(obj:4) ID3D11Device::CreateDepthStencilView(obj:24,NULL,obj:25)*
108:(obj:5) ID3D11DeviceContext::OMSetRenderTargets(8,{obj:1,NULL,NULL,NULL,NULL,NULL,NULL,NULL},obj:25)*
109:(obj:5) ID3D11DeviceContext::ClearRenderTargetView(obj:1,addr:21)*
111:(obj:5) ID3D11DeviceContext::ClearDepthStencilView(obj:25,1,1.000f,0)*
119:(obj:4) ID3D11Device::CreateSamplerState(addr:24,obj:27)*
134:(obj:4) ID3D11Device::CreatePixelShader(addr:27,21056,NULL,obj:30)*
135:CreateObject(D3D11 Pixel Shader,obj:30)
136:(obj:5) ID3D11DeviceContext::PSSetShader(obj:30,NULL,0)*
137:(obj:5) ID3D11DeviceContext::PSSetSamplers(0,1,{obj:27})*
139:(obj:4) ID3D11Device::CreateTexture2D(addr:28,addr:5,obj:31)*
140:CreateObject(D3D11 Texture2D,obj:31)
142:(obj:4) ID3D11Device::CreateShaderResourceView(obj:31,NULL,obj:32)*
143:CreateObject(D3D11 Shader Resource View,obj:32)
144:(obj:5) ID3D11DeviceContext::PSSetShaderResources(0,1,{obj:32})*
146:(obj:4) ID3D11Device::CreateRasterizerState(addr:29,obj:33)*
147:CreateObject(D3D11 Rasterizer State,obj:33)
152:(obj:5) ID3D11DeviceContext::RSSetState(obj:33)*
154:(obj:5) ID3D11DeviceContext::RSSetViewports(1,addr:30)*
156:(obj:4) ID3D11Device::CreateBlendState(addr:11,obj:34)*
157:CreateObject(D3D11 Blend State,obj:34)
159:(obj:5) ID3D11DeviceContext::OMSetBlendState(obj:34,addr:31,-1)*
162:(obj:4) ID3D11Device::CreateDepthStencilState(addr:32,obj:35)*
163:CreateObject(D3D11 Depth-Stencil State,obj:35)
165:(obj:5) ID3D11DeviceContext::OMSetDepthStencilState(obj:35,0)*
I debugged all of the functions in the above list, and all of them return OK. Nothing wrong.
My question is: why is the pixel shader missing from the pipeline, which in turn may result in the empty screen?
Adding to the other answers, constant buffer organization can be the cause of this problem. In my case, the pixel shader was missing from the pipeline but also the vertex shader wasn't transforming the vertices correctly. Upon inspection it was revealed that the world matrix had incorrect values because the boolean value at the top of the constant buffer was causing data misalignment. HLSL packs data into 16-byte boundaries, the so-called vectors, which have 4 components. A boolean is 4 bytes, the same as a float.
cbuffer cbPerObject : register( b1 )
{
bool gUseTexture ;
row_major float4x4 gWorld ;
row_major float4x4 gWorldInvTranspose ;
row_major float4x4 gWorldViewProj ;
row_major float4x4 gTexTransform ;
Material gMaterial ;
} ;
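So with the constant buffer above, HLSL expects the boolean alone in the first vector and the world matrix starting at the next 16-byte boundary. If the C++ structure is tightly packed, the boolean plus the first 3 components of the first row of the world matrix get mapped to that first vector, and this causes everything to be shifted by 3 components, misaligning the world matrix (and the other matrices following and possibly other data).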
Two possible solutions :
Move the boolean to the end of the structure. I did this and it worked.
Add a 3-component sized padding variable between the world matrix and the boolean.
I tried this by adding an XMFLOAT3 in the c++ structure and a float3 in HLSL. This worked too.
Long story short, pay attention to HLSL packing.
EDIT : At the time I thought these methods worked, as all variables except the boolean had correct values. I didn't use the bool at the time so I assumed that was fine too. Turns out it's not.
HLSL bools and c++ bools have different sizes. HLSL bools are 4 bytes, whereas c++ bools are implementation defined (1 byte on my machine for example). Anyways, they will most likely be different and it causes problems.
Either use Windows BOOL type or another appropriately-sized value like an int or a uint.
Take a look at https://gamedev.stackexchange.com/a/22605.
Also the second to last post here explains the situation clearly (this link is referenced in the answer in the gamedev link above also).
Beware though because packing is still an issue. Even if you use a BOOL or a uint or whatever, if you place it in the beginning in the above structure as before, you will get incorrect values in your constant buffer. So take both of these issues (data alignment and the boolean problem) into consideration when working with constant buffers.
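Both issues can be checked at compile time on the C++ side. The following is only a sketch under assumptions: Float4x4 stands in for DirectX::XMFLOAT4X4 so that it builds without DirectXMath, the struct names are made up, and only the first two members of cbPerObject are mirrored.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a row-major float4x4 (assumption: replaces DirectX::XMFLOAT4X4).
struct Float4x4 { float m[4][4]; };

// Naive mirror: a C++ bool followed directly by the matrix.
struct NaivePerObject {
    bool     useTexture;  // usually 1 byte in C++, but 4 bytes in HLSL
    Float4x4 world;       // does not land on the 16-byte boundary HLSL expects
};

// Padded mirror: a 4-byte flag plus explicit padding fills the first vector.
struct PaddedPerObject {
    std::int32_t useTexture;  // same size as an HLSL bool
    float        pad[3];      // fills the rest of the first 16-byte vector
    Float4x4     world;       // now starts at byte 16, like gWorld in the cbuffer
};

// The checks fail to compile if the layout drifts from what HLSL expects.
static_assert(sizeof(std::int32_t) == 4, "flag must be 4 bytes like an HLSL bool");
static_assert(offsetof(PaddedPerObject, world) == 16, "matrix must start a new vector");
static_assert(sizeof(PaddedPerObject) % 16 == 0, "buffer size must be a multiple of 16");
static_assert(offsetof(NaivePerObject, world) != 16, "naive layout is misaligned");
```

Keeping checks like these next to every structure that is uploaded into a constant buffer catches the shift before it shows up as garbage matrices.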
As I wrote in my comment, I had a similar problem.
In my case, the pixel shader was correctly bound (see http://msdn.microsoft.com/en-us/library/jj191650.aspx). Furthermore, I ensured by debugging the vertex shader that the result of the Transformation should be visible and hence should generate visible fragments.
In this case (which seems to be the same as the one you describe), make sure that your rasteriser state is correct. You might want to check that it is actually set (using the graphics object view of the immediate context) and that it lets your geometry through. For debugging purposes, I found it helpful to disable backface culling. I use
D3D11_RASTERIZER_DESC rasterDesc;
ZeroMemory(&rasterDesc, sizeof(rasterDesc));
rasterDesc.CullMode = D3D11_CULL_MODE::D3D11_CULL_NONE;
rasterDesc.FillMode = D3D11_FILL_MODE::D3D11_FILL_SOLID;

How can I feed compute shader results into vertex shader w/o using a vertex buffer?

Before I go into details I want outline the problem:
I use RWStructuredBuffers to store the output of my compute shaders (CS). Since vertex and pixel shaders can’t read from RWStructuredBuffers, I map a StructuredBuffer onto the same slot (u0/t0) and (u4/t4):
cbuffer cbWorld : register (b1)
{
    float4x4 worldViewProj;
    int dummy;
};
struct VS_IN
{
    float4 pos : POSITION;
    float4 col : COLOR;
};
struct PS_IN
{
    float4 pos : SV_POSITION;
    float4 col : COLOR;
};
RWStructuredBuffer<float4> colorOutputTable : register (u0); // 2D color data
StructuredBuffer<float4> output2 : register (t0); // same as u0
RWStructuredBuffer<int> counterTable : register (u1); // depth data for z values
RWStructuredBuffer<VS_IN>vertexTable : register (u4); // triangle list
StructuredBuffer<VS_IN>vertexTable2 : register (t4); // same as u4
I use a ShaderResourceView to grant the pixel and/or vertex shader access to the buffers. This concept works fine for my pixel shader; the vertex shader, however, seems to read only 0 values (I use SV_VertexID as index into the buffers):
PS_IN VS_3DA ( uint vid : SV_VertexID )
{
    PS_IN output = (PS_IN)0;
    PS_IN input = vertexTable2[vid];
    output.pos = mul(input.pos, worldViewProj);
    output.col = input.col;
    return output;
}
No error messages or warnings from the hlsl compiler, the renderloop runs with 60 fps (using vsync), but the screen remains black. Since I blank the screen with Color.White before Draw(..) is called, the render pipeline seems to be active.
When I read the triangle data content via a UAView from the GPU into “vertArray” and feed it back into a vertex buffer, however, everything works:
let vertices = Buffer.Create(device, BindFlags.VertexBuffer, vertArray)
context.InputAssembler.SetVertexBuffers(0, new VertexBufferBinding(vertices, Utilities.SizeOf<Vector4>() * 2, 0))
PS_IN VS_3D (VS_IN input )
{
    PS_IN output = (PS_IN)0;
    output.pos = mul(input.pos, worldViewProj);
    output.col = input.col;
    return output;
}
Here is the definition of the 2D vertex and pixel shaders. Please note that PS_2D accesses the buffer "output2" in slot t0, and that's exactly the "trick" I want to replicate for the 3D vertex shader "VS_3DA":
float4 PS_2D ( float4 input : SV_Position) : SV_Target
{
    uint2 pixel = uint2(input.x, input.y);
    return output2[ pixel.y * width + pixel.x];
}
float4 VS_2D ( uint vid : SV_VertexID ) : SV_POSITION
{
    if (vid == 0)
        return float4(-1, -1, 0, 1);
    if (vid == 1)
        return float4( 1, -1, 0, 1);
    if (vid == 2)
        return float4(-1, 1, 0, 1);
    return float4( 1, 1, 0, 1);
}
For three days I have searched and experimented to no avail. All the information I gathered seems to confirm that my approach using SV_VertexID should work.
Can anybody give advice? Thanks for reading my post!
I like the concept of DirectX 11 compute shaders very much and I want to employ it for algebraic computing. As a test case I render fractals (Mandelbrot sets) in 3D. Everything works as expected – except one last brick in the wall is missing.
The computation takes the following steps:
1) Using a CS to compute a 2D texture (output is “counterTable” and “colorOutputTable”) (works)
2) Optionally render this texture to screen (works)
3) Using another CS to generate a mesh (triangle list). This CS takes x, y, and color values from step 1, computes the z coordinate, and finally creates a quad for each pixel. The result is stored in “vertexTable”. (works)
4) Feeding the triangles list to the vertex shader (problem!!!)
5) Render to screen (works - using a vertex buffer).
For programming I use F# 3.0 and SharpDX as .NET wrapper.
The ShaderResourceView for both shaders (pixel & vertex) is set up with the same parameters (except the size parameters):
let mutable descr = new BufferDescription()
descr.BindFlags <- BindFlags.UnorderedAccess ||| BindFlags.ShaderResource
descr.Usage <- ResourceUsage.Default
descr.CpuAccessFlags <- CpuAccessFlags.None
descr.StructureByteStride <- xxx // depends on shader
descr.SizeInBytes <- yyy // depends on shader
descr.OptionFlags <- ResourceOptionFlags.BufferStructured
Nothing special here.
Creation of 2D buffer (binds to buffer "output2" in slot t0):
outputBuffer2D <- new Buffer(device, descr)
outputView2D <- new UnorderedAccessView (device, outputBuffer2D)
shaderResourceView2D <- new ShaderResourceView (device, outputBuffer2D)
Creation of 3D buffer (binds to "vertexTable2" in slot t4):
vertexBuffer3D <- new Buffer(device, descr)
shaderResourceView3D <- new ShaderResourceView (device, vertexBuffer3D)
// UAView not required here
Setting resources for 2D:
context.InputAssembler.PrimitiveTopology <- PrimitiveTopology.TriangleStrip
context.VertexShader.Set (vertexShader2D)
context.PixelShader.Set (pixelShader2D)
render 2D:
context.PixelShader.SetShaderResource(COLOR_OUT_SLOT, shaderResourceView2D)
context.PixelShader.SetConstantBuffer(CONSTANT_SLOT_GLOBAL, constantBuffer2D )
context.ClearRenderTargetView (renderTargetView2D, Color.White.ToColor4())
swapChain.Present(1, PresentFlags.None)
Setting resources for 3D:
context.InputAssembler.PrimitiveTopology <- PrimitiveTopology.TriangleList
context.OutputMerger.SetTargets(depthView3D, renderTargetView2D)
context.VertexShader.SetShaderResource(TRIANGLE_SLOT, shaderResourceView3D )
context.VertexShader.SetConstantBuffer(CONSTANT_SLOT_3D, constantBuffer3D)
render 3D (doesn’t work – black screen as output result)
context.ClearDepthStencilView(depthView3D, DepthStencilClearFlags.Depth, 1.0f, 0uy)
context.Draw(dataXsize * dataYsize * 6, 0)
swapChain.Present(1, PresentFlags.None)
Finally the slot numbers:
static let CONSTANT_SLOT_3D = 1
static let COLOR_OUT_SLOT = 0
static let COUNTER_SLOT = 1
static let COLOR_SLOT = 2
static let TRIANGLE_SLOT = 4
OK, the first thing I would suggest is to turn on the debug layer (use the Debug flag when you create your device), then go to project properties, debug tab, and tick "Enable unmanaged code debugging" or "Enable native code debugging".
When you start to debug the program, the runtime will give you warnings if something is wrong with the pipeline state.
One potential issue (which looks like the most likely one from what you posted):
Make sure to clear your compute shader UAV slots after dispatching. If you try to bind vertexTable2 to your vertex shader while the resource is still bound as compute shader output, the runtime will automatically set your shader resource view to null (which will in turn return 0 when you try to read it).
To clear your compute shader UAV slot, call this on your device context once you're done with the dispatch:
ComputeShader.SetUnorderedAccessView(TRIANGLE_SLOT, null)
Please also note that a pixel shader can access a RWStructuredBuffer (technically you can use a RWStructuredBuffer in any shader type if you have feature level 11.1, which means a recent ATI card and Windows 8+).
Feeding the triangles list to the vertex shader (problem!!!)
Instead of using structured buffers (which don't let you bind as a vb), I would look into using raw buffers. It requires casting in the shader, but allows you to use the same buffer in your cs and vs.
When creating the buffer, do:
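D3D11_BUFFER_DESC desc = {};
desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_VERTEX_BUFFER;
desc.ByteWidth = byteSize;
desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS; // required for raw views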
You could then bind as a shader resource:
desc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
desc.BufferEx.FirstElement = 0;
desc.Format = DXGI_FORMAT_R32_TYPELESS; // raw views use the typeless R32 format
desc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
desc.BufferEx.NumElements = descBuf.ByteWidth / 4;
or Unordered Access View:
desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
desc.Buffer.FirstElement = 0;
desc.Format = DXGI_FORMAT_R32_TYPELESS; // Format must be DXGI_FORMAT_R32_TYPELESS, when creating Raw Unordered Access View
desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
desc.Buffer.NumElements = descBuf.ByteWidth / 4;
In the shader you would use something like this:
ByteAddressBuffer Buffer0 : register(t0);
ByteAddressBuffer Buffer1 : register(t1);
RWByteAddressBuffer BufferOut : register(u0);
int i0 = asint( Buffer0.Load( DTid.x*8 ) );
float f0 = asfloat( Buffer0.Load( DTid.x*8+4 ) );
int i1 = asint( Buffer1.Load( DTid.x*8 ) );
float f1 = asfloat( Buffer1.Load( DTid.x*8+4 ) );
BufferOut.Store( DTid.x*8, asuint(i0 + i1) );
BufferOut.Store( DTid.x*8+4, asuint(f0 + f1) );
Sample code above was taken from the BasicCompute11 sample from the DirectX June 2010 SDK. It demonstrates using both structured buffers and raw buffers.
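The byte offsets in the shader snippet (DTid.x*8 and DTid.x*8+4) follow from each element being an int followed by a float, 8 bytes in total. A rough CPU-side sketch of the same addressing follows; load_u32, as_int, as_float, to_raw and read_element are hypothetical helpers that only imitate ByteAddressBuffer.Load, asint and asfloat, they are not part of any D3D API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One element as the shader snippet sees it: int at byte 0, float at byte 4.
struct Element { std::int32_t i; float f; };
static_assert(sizeof(Element) == 8, "the shader strides by 8 bytes per element");

// Hypothetical stand-in for ByteAddressBuffer::Load: reads 4 raw bytes.
inline std::uint32_t load_u32(const std::vector<std::uint8_t>& buf, std::size_t byteOffset) {
    assert(byteOffset % 4 == 0 && byteOffset + 4 <= buf.size());
    std::uint32_t bits;
    std::memcpy(&bits, buf.data() + byteOffset, sizeof bits);
    return bits;
}

// Bit reinterpretation, imitating HLSL asint() and asfloat().
inline std::int32_t as_int(std::uint32_t bits) { std::int32_t v; std::memcpy(&v, &bits, sizeof v); return v; }
inline float as_float(std::uint32_t bits) { float v; std::memcpy(&v, &bits, sizeof v); return v; }

// Copies typed elements into an untyped byte buffer, as an upload would.
inline std::vector<std::uint8_t> to_raw(const std::vector<Element>& elems) {
    std::vector<std::uint8_t> raw(elems.size() * sizeof(Element));
    if (!raw.empty()) std::memcpy(raw.data(), elems.data(), raw.size());
    return raw;
}

// Same addressing as the shader: index * 8 for the int, index * 8 + 4 for the float.
inline Element read_element(const std::vector<std::uint8_t>& raw, std::size_t index) {
    return { as_int(load_u32(raw, index * 8)), as_float(load_u32(raw, index * 8 + 4)) };
}
```

A vertex made of two float4 values would use a stride of 32 bytes instead of 8, with the second float4 at offset 16.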

C++ shader question

I am using Nvidia CG and Direct3D9 and have a question about the following code.
It compiles, but doesn't load (using the cgLoadProgram wrapper), and the resulting failure is described simply as "D3D failure happened".
It's a part of the pixel shader compiled with shader model set to 3.0
What may be interesting is that this shader loads fine in the following cases:
1) Manually unrolling the while statement (to many if { } statements).
2) Removing the line with the tex2D function in the loop.
3) Switching to shader model 2_X and manually unrolling the loop.
Problem part of the shader code:
float2 tex = float2(1, 1);
float2 dtex = float2(0.01, 0.01);
float h = 1.0 - tex2D(height_texture1, tex);
float height = 1.00;
while ( h < height )
{
    height -= 0.1;
    tex += dtex;
    // Remove the next line and it works (not as expected,
    // of course)
    h = tex2D( height_texture1, tex );
}
If someone knows why this can happen, could test similar code in a non-CG environment, or could help me in some other way, I'm waiting for you ;)
I think you need to determine the gradients before the loop using ddx/ddy on the texture coordinates and then use tex2D(sampler2D samp, float2 s, float2 dx, float2 dy)
The GPU always renders quads, not pixels (even on pixel borders - superfluous pixels are discarded by the render backend). This is done because it allows it to always calculate the screen space texture derivatives even when you use calculated texture coordinates. It just needs to take the difference between the values at the pixel centers.
But this doesn't work when using dynamic branching like in the code in the question, because the shader processors at the individual pixels could diverge in control flow. So you need to calculate the derivatives manually via ddx/ddy before the program flow can diverge.
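As a rough CPU-side illustration of why the gradients have to be taken before the loop, here is a simplified model of a 2x2 quad. It is only a sketch: Float2, Quad, quad_ddx, quad_ddy and march are made-up names, and real hardware may compute derivatives per row or per column rather than only from the top-left pixel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Float2 { float x, y; };

// A 2x2 pixel quad: [0] top-left, [1] top-right, [2] bottom-left, [3] bottom-right.
using Quad = std::array<Float2, 4>;

// Simplified ddx: value at the right neighbour minus value at the left pixel.
inline Float2 quad_ddx(const Quad& q) { return { q[1].x - q[0].x, q[1].y - q[0].y }; }

// Simplified ddy: value at the lower neighbour minus value at the upper pixel.
inline Float2 quad_ddy(const Quad& q) { return { q[2].x - q[0].x, q[2].y - q[0].y }; }

// Models the while loop from the question: every pixel of the quad advances its
// texture coordinate by dtex a different number of times (divergent control flow).
inline Quad march(Quad q, const std::array<int, 4>& steps, Float2 dtex) {
    assert(steps[0] >= 0 && steps[1] >= 0 && steps[2] >= 0 && steps[3] >= 0);
    for (std::size_t lane = 0; lane < q.size(); ++lane) {
        q[lane].x += static_cast<float>(steps[lane]) * dtex.x;
        q[lane].y += static_cast<float>(steps[lane]) * dtex.y;
    }
    return q;
}
```

With coordinates that differ by 0.25 between neighbours, quad_ddx gives 0.25 before the loop. If only the top-right pixel takes two extra steps of 0.5, the same difference becomes 1.25, which no longer describes the screen-space footprint of the texture. Gradients captured before the loop stay valid and can be passed to the tex2D overload that takes dx and dy.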