Accessing the index buffer in shaders (DirectX 11) - C++

I have a vertex and an index buffer, and I am rendering a mesh to just one pixel. I want to know which triangle of the mesh was rendered and access its index in the index buffer on the CPU for further processing (based on my mesh, only one triangle can be rendered to that pixel).
I first implemented this with SV_PrimitiveID, hoping it would generate 0 for the first three indices of the index buffer (the first triangle), 1 for the next three, and so on. That way I could copy the data back from the GPU, read the ID, and find the triangle. The problem was that the IDs did not correspond to my index buffer (i.e. one run of the program gives the third triangle ID 7, another run ID 10, and so on).
I want to know: is there any way to determine which triangle the pixel shader is drawing and find its index in the index buffer so I can locate it on the CPU?

This should work:
C++:
...
Microsoft::WRL::ComPtr<ID3D11Texture2D> pPrimitiveIDs;
Microsoft::WRL::ComPtr<ID3D11RenderTargetView> pPIDsRTV;
Microsoft::WRL::ComPtr<ID3D11Texture2D> pPIDsStaging;
...
const int number_of_rtvs = 2;
ID3D11RenderTargetView* rtvs[number_of_rtvs] =
{
    pScreenRTV.Get(),
    pPIDsRTV.Get(),
};
pDeviceContext->OMSetRenderTargets(number_of_rtvs, rtvs, pDepthStencilView.Get());
...
pDeviceContext->CopyResource(pPIDsStaging.Get(), pPrimitiveIDs.Get());
D3D11_MAPPED_SUBRESOURCE MappedResource;
pDeviceContext->Map(pPIDsStaging.Get(), 0, D3D11_MAP_READ, 0, &MappedResource);
// here is the pid
// in case of a 1x1 back buffer you would just read the first value;
// note that mapped texture rows can be padded, so index with RowPitch rather than the window width
const UINT* row = (const UINT*)((const BYTE*)MappedResource.pData + MouseY * MappedResource.RowPitch);
UINT pid = row[MouseX];
pDeviceContext->Unmap(pPIDsStaging.Get(), 0);
...
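The answer doesn't show the resource creation; here is a minimal sketch of how the primitive-ID target and staging texture might be created (the device pointer name, format, and sizes are assumptions, not part of the original answer):
D3D11_TEXTURE2D_DESC pidDesc = {};
pidDesc.Width = WindowWidth;               // same size as the back buffer
pidDesc.Height = WindowHeight;
pidDesc.MipLevels = 1;
pidDesc.ArraySize = 1;
pidDesc.Format = DXGI_FORMAT_R32_UINT;     // matches the uint SV_Target1 output
pidDesc.SampleDesc.Count = 1;
pidDesc.Usage = D3D11_USAGE_DEFAULT;
pidDesc.BindFlags = D3D11_BIND_RENDER_TARGET;
pDevice->CreateTexture2D(&pidDesc, nullptr, &pPrimitiveIDs);
pDevice->CreateRenderTargetView(pPrimitiveIDs.Get(), nullptr, &pPIDsRTV);

// CPU-readable copy of the ID texture
D3D11_TEXTURE2D_DESC stagingDesc = pidDesc;
stagingDesc.Usage = D3D11_USAGE_STAGING;
stagingDesc.BindFlags = 0;
stagingDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
pDevice->CreateTexture2D(&stagingDesc, nullptr, &pPIDsStaging);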
Pixel Shader:
struct PSOutput
{
    float4 color : SV_Target0;
    uint pid : SV_Target1;
};

PSOutput main(..., uint pid : SV_PrimitiveId)
{
    ...
    PSOutput output =
    {
        color,
        pid,
    };
    return output;
}
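Once the ID is read back on the CPU, mapping it to the index buffer is direct for a plain indexed triangle list drawn with a single DrawIndexed call (a note added here, not part of the original answer):
// Triangle pid uses indices 3*pid, 3*pid+1 and 3*pid+2 of the index buffer
// (offset by StartIndexLocation if a non-zero one was passed to DrawIndexed).
UINT firstIndex = 3 * pid;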

Related

Fast rasterization with OpenCL

I am writing a rasterizer for real-time 3D rendering with OpenCL.
My current architecture:
Vertex shader: 1 thread per vertex
Rasterizer: 1 thread per face that loops over all pixels covered by the face
Fragment shader: 1 thread per pixel
This works well when the faces occupy a small screen space, but when one face covers a large portion of the screen the frame rate tanks, because the rasterization thread must synchronously loop over every pixel the face covers.
I think this could be solved by a tiled approach. The screen would be divided into subsections (tiles), and one thread would be launched per tile. Only the faces whose bounding box overlaps the tile would be processed.
I have some questions about this method, though:
Should I find each tile's overlapping faces on the CPU or the GPU?
What data structure should be used to store the face lists? They have variable length, but I believe OpenCL buffers are fixed length.
Sample of host code of current implementation:
// set up vertex shader args
queue.enqueueNDRangeKernel(vertexShader, cl::NullRange, numVerts, cl::NullRange);
// set up rasterizer args
queue.enqueueNDRangeKernel(rasterizer, cl::NullRange, numFaces, cl::NullRange);
// set up fragment shader args
queue.enqueueNDRangeKernel(fragmentShader, cl::NullRange, numPixels, cl::NullRange);
// read frame buffer to draw to screen
queue.enqueueReadBuffer(buffer_screen, CL_TRUE, 0, width * height * 3 * sizeof(unsigned char), screen);
Sample of rasterizer kernel:
float2 bboxmin = (float2)(INFINITY,INFINITY);
float2 bboxmax = (float2)(-INFINITY,-INFINITY);
float2 clampCoords = (float2)(width-1, height-1);
// get bounding box
for (int i=0; i<3; i++) {
    for (int j=0; j<2; j++) {
        bboxmin[j] = max(0.f, min(bboxmin[j], vs[i][j]));
        bboxmax[j] = min(clampCoords[j], max(bboxmax[j], vs[i][j]));
    }
}
// loop over all pixels in bounding box
// this is the part that needs to be improved
int2 pix;
for (pix.x=bboxmin.x; pix.x<=bboxmax.x; pix.x++) {
    for (pix.y=bboxmin.y; pix.y<=bboxmax.y; pix.y++) {
        float3 bc_screen = barycentric(vs[0].xy, vs[1].xy, vs[2].xy, (float2)(pix.x,pix.y), offset);
        float3 bc_clip = (float3)(bc_screen.x/vsVP[0][3], bc_screen.y/vsVP[1][3], bc_screen.z/vsVP[2][3]);
        bc_clip = bc_clip/(bc_clip.x+bc_clip.y+bc_clip.z);
        float frag_depth = dot(homoZs, bc_clip);
        int pixInd = pix.x+pix.y*width;
        if (bc_screen.x<0 || bc_screen.y<0 || bc_screen.z<0 || zbuffer[pixInd]>frag_depth) continue;
        zbuffer[pixInd] = frag_depth;
    }
}
A workaround is to cancel rasterization if a face gets too large and just return. This will lead to some visual artifacts, but at least the frame rate won't suffer.
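Regarding the question above about storing variable-length face lists in fixed-length OpenCL buffers: one common answer (not from the original thread) is a CSR-style layout made of per-tile counts, prefix-summed offsets, and a flat face-index array, which fits in two fixed-size buffers. Below is a minimal host-side C++ sketch of CPU binning under assumed names (tileSize, faceBBoxes, screen dimensions); the same two-pass scheme can later be moved to the GPU with atomics and a prefix-sum kernel.
#include <vector>
#include <algorithm>
#include <cstdint>

struct BBox { float minX, minY, maxX, maxY; };   // per-face screen-space bounds

// Build CSR-style tile lists: 'offsets' has one entry per tile (plus a final end
// marker) into the flat 'faceIds' array. Both arrays have fixed sizes, so they
// can be uploaded directly into OpenCL buffers.
void binFaces(const std::vector<BBox>& faceBBoxes, int width, int height, int tileSize,
              std::vector<uint32_t>& offsets, std::vector<uint32_t>& faceIds)
{
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<uint32_t> counts(tilesX * tilesY, 0);

    auto forEachOverlappedTile = [&](const BBox& b, auto&& fn) {
        int tx0 = std::max(0, (int)b.minX / tileSize);
        int ty0 = std::max(0, (int)b.minY / tileSize);
        int tx1 = std::min(tilesX - 1, (int)b.maxX / tileSize);
        int ty1 = std::min(tilesY - 1, (int)b.maxY / tileSize);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                fn(ty * tilesX + tx);
    };

    // Pass 1: count faces per tile.
    for (const BBox& b : faceBBoxes)
        forEachOverlappedTile(b, [&](int tile) { counts[tile]++; });

    // Exclusive prefix sum gives each tile's start offset.
    offsets.assign(tilesX * tilesY + 1, 0);
    for (int t = 0; t < tilesX * tilesY; ++t)
        offsets[t + 1] = offsets[t] + counts[t];

    // Pass 2: scatter face IDs into the flat array.
    faceIds.assign(offsets.back(), 0);
    std::vector<uint32_t> cursor(offsets.begin(), offsets.end() - 1);
    for (uint32_t f = 0; f < faceBBoxes.size(); ++f)
        forEachOverlappedTile(faceBBoxes[f], [&](int tile) { faceIds[cursor[tile]++] = f; });
}
The per-tile rasterizer kernel would then loop only over faceIds[offsets[tile] .. offsets[tile+1]).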

How to efficiently count the number of times a color is used during the whole pixel shader stage?

I am thinking of achieving this in the pixel shader. Here is part of my code:
Firstly, I create a Texture1D as a color table
D3D11_TEXTURE1D_DESC t1d;
t1d.Width = ModelInfo::ColorCount;
t1d.ArraySize = 1;
t1d.MipLevels = 1;
t1d.CPUAccessFlags = 0;
t1d.MiscFlags = 0;
t1d.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
t1d.Usage = D3D11_USAGE_DEFAULT;
t1d.BindFlags = D3D11_BIND_SHADER_RESOURCE;
ZeroMemory(&InitData, sizeof(InitData));
InitData.pSysMem = ModelInfo::Colors;
hr = m_D3DDevice->CreateTexture1D(&t1d, &InitData, &m_ColorTable);
if (FAILED(hr))
return hr;
D3D11_SHADER_RESOURCE_VIEW_DESC viewDesc;
viewDesc.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
viewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE1D;
viewDesc.Texture1D.MostDetailedMip = 0;
viewDesc.Texture1D.MipLevels = 1;
hr = m_D3DDevice->CreateShaderResourceView(m_ColorTable, &viewDesc, &m_ColorResView);
if (FAILED(hr))
return hr;
And then I pass it to the Pixel shader
m_ImmediateContext->PSSetShaderResources(0, 1, &m_ColorResView);
In the pixel shader, I use this color table like this:
Texture1D RandomTex : register(t0);
float4 PS(VS_OUTPUT Input, uint Primitive : SV_PrimitiveID) : SV_Target
{
    uint Index = Primitive % ColorCount.x;
    return RandomTex[Index];
}
I want to use the alpha channel of each color in this color table to count how many times the color is used during the whole pixel shader stage.
I want to modify the color table in the pixel shader just like the code below, but it seems to be infeasible:
RandomTex[Index].a = RandomTex[Index].a + 1;
I have been looking for a way to count the colors efficiently rather than rendering to a texture and counting it on the CPU in C++.
Every method I have thought of involves some extra counting work on the CPU, because I find it hard to do an operation like x++ on the GPU (presumably because of parallelism), and the methods I did come up with need to render the texture twice, which might be slower than counting it on the CPU directly.
I have been digging into this for a long time with no luck. Please help, or try to give some ideas on how to achieve this.
You can attach a small write buffer to your pixel shader (with an unordered access view) and use atomic operations in the pixel shader.
Here is the modified HLSL code:
Texture1D RandomTex : register(t0);
RWStructuredBuffer<uint> RWColorCountData : register(u1);
float4 PS(VS_OUTPUT Input, uint Primitive : SV_PrimitiveID) : SV_Target
{
    uint Index = Primitive % ColorCount.x;
    InterlockedAdd(RWColorCountData[Index], 1);
    return RandomTex[Index];
}
Here I use a StructuredBuffer, but you can also use a ByteAddressBuffer if you prefer. Also note that the resource is attached to register u1, as the first slot is still taken by your render target.
Your write buffer should have the same element count as your 1D texture, and it needs to be attached to the pipeline (on top of the render target) with OMSetRenderTargetsAndUnorderedAccessViews.
Every frame you will also need to clear your buffer back to 0 (if required), otherwise the values will keep incrementing over time; for this you can use ClearUnorderedAccessViewUint.
Please note that in your case, as you are using a uint buffer and the function expects UINT Values[4], only Values[0] will be used as the clear value.
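A minimal C++ sketch of the buffer setup, binding and clearing described above (the variable names are assumptions, not from the original posts):
// Structured buffer with one uint counter per color.
D3D11_BUFFER_DESC bd = {};
bd.ByteWidth = ModelInfo::ColorCount * sizeof(UINT);
bd.Usage = D3D11_USAGE_DEFAULT;
bd.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
bd.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
bd.StructureByteStride = sizeof(UINT);
m_D3DDevice->CreateBuffer(&bd, nullptr, &m_ColorCountBuffer);

D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.Format = DXGI_FORMAT_UNKNOWN;            // required for structured buffers
uavDesc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements = ModelInfo::ColorCount;
m_D3DDevice->CreateUnorderedAccessView(m_ColorCountBuffer, &uavDesc, &m_ColorCountUAV);

// Per frame: clear the counters, then bind the render target and the UAV (slot u1) together.
const UINT zeros[4] = { 0, 0, 0, 0 };            // only zeros[0] is used for a uint buffer
m_ImmediateContext->ClearUnorderedAccessViewUint(m_ColorCountUAV, zeros);
m_ImmediateContext->OMSetRenderTargetsAndUnorderedAccessViews(
    1, &m_RenderTargetView, m_DepthStencilView,
    1, 1, &m_ColorCountUAV, nullptr);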

OpenGL glMultiDrawElementsIndirect with Interleaved Buffers

Originally I was using glDrawElementsInstancedBaseVertex to draw the scene meshes. All the meshes' vertex attributes are interleaved in a single buffer object. In total there are only 30 unique meshes, so I've been calling draw 30 times with instance counts, etc., but now I want to batch the draw calls into one using glMultiDrawElementsIndirect. Since I have no experience with this function, I've been reading articles here and there to understand the implementation, with little success. (For testing purposes, all meshes are instanced only once.)
The command structure from the OpenGL reference page.
struct DrawElementsIndirectCommand
{
    GLuint vertexCount;
    GLuint instanceCount;
    GLuint firstVertex;
    GLuint baseVertex;
    GLuint baseInstance;
};
DrawElementsIndirectCommand commands[30];
// Populate commands.
for (size_t index { 0 }; index < 30; ++index)
{
    const Mesh* mesh{ m_meshes[index] };
    commands[index].vertexCount = mesh->elementCount;
    commands[index].instanceCount = 1; // Just testing with 1 instance, ATM.
    commands[index].firstVertex = mesh->elementOffset();
    commands[index].baseVertex = mesh->verticeIndex();
    commands[index].baseInstance = 0; // Shouldn't impact testing?
}
// Create and populate the GL_DRAW_INDIRECT_BUFFER buffer... bla bla
Then later down the line, after setup I do some drawing.
// Some prep before drawing like bind VAO, update buffers, etc.
// Draw?
if (RenderMode == MULTIDRAW)
{
    // Bind, Draw, Unbind
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, m_indirectBuffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, 30, 0);
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, 0);
}
else
{
    for (size_t index { 0 }; index < 30; ++index)
    {
        const Mesh* mesh { m_meshes[index] };
        glDrawElementsInstancedBaseVertex(
            GL_TRIANGLES,
            mesh->elementCount,
            GL_UNSIGNED_INT,
            reinterpret_cast<GLvoid*>(mesh->elementOffset()),
            1,
            mesh->verticeIndex());
    }
}
Now the glDrawElements... path still works fine like before when switched to. But the glMultiDraw... path gives indistinguishable meshes; when I set firstVertex to 0 for all commands, the meshes look almost correct (at least distinguishable) but are still largely wrong in places. I feel I'm missing something important about indirect multi-drawing?
//Indirect data
commands[index].firstVertex = mesh->elementOffset();
//Direct draw call
reinterpret_cast<GLvoid*>(mesh->elementOffset()),
That's not how it works for indirect rendering. firstVertex is not a byte offset; it is the index of the first element to read (i.e. measured in indices, not bytes). So you have to divide the byte offset by the size of the index type to compute firstVertex:
commands[index].firstVertex = mesh->elementOffset() / sizeof(GLuint);
The result of that should be a whole number. If it wasn't, then you were doing unaligned reads, which probably hurt your performance. So fix that ;)
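For reference, the "Create and populate the GL_DRAW_INDIRECT_BUFFER buffer" step elided in the question might look roughly like this sketch (the usage flag is an assumption; m_indirectBuffer and commands are the names from the question):
// Upload the command array once; the draw call then reads it from the
// GL_DRAW_INDIRECT_BUFFER binding instead of a client-memory pointer.
glGenBuffers(1, &m_indirectBuffer);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, m_indirectBuffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER, sizeof(commands), commands, GL_STATIC_DRAW);
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, 0);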

How can I feed compute shader results into vertex shader w/o using a vertex buffer?

Before I go into details I want to outline the problem:
I use RWStructuredBuffers to store the output of my compute shaders (CS). Since vertex and pixel shaders can’t read from RWStructuredBuffers, I map a StructuredBuffer onto the same slot (u0/t0) and (u4/t4):
cbuffer cbWorld : register (b1)
{
    float4x4 worldViewProj;
    int dummy;
}

struct VS_IN
{
    float4 pos : POSITION;
    float4 col : COLOR;
};

struct PS_IN
{
    float4 pos : SV_POSITION;
    float4 col : COLOR;
};

RWStructuredBuffer<float4> colorOutputTable : register (u0); // 2D color data
StructuredBuffer<float4> output2 : register (t0);            // same as u0
RWStructuredBuffer<int> counterTable : register (u1);        // depth data for z values
RWStructuredBuffer<VS_IN> vertexTable : register (u4);       // triangle list
StructuredBuffer<VS_IN> vertexTable2 : register (t4);        // same as u4
I use a ShaderResourceView to grant pixel and/or vertex shader access to the buffers. This concept works fine for my pixel shader; the vertex shader, however, seems to read only 0 values (I use SV_VertexID as an index into the buffers):
PS_IN VS_3DA ( uint vid : SV_VertexID )
{
    PS_IN output = (PS_IN)0;
    PS_IN input = vertexTable2[vid];
    output.pos = mul(input.pos, worldViewProj);
    output.col = input.col;
    return output;
}
No error messages or warnings from the HLSL compiler, and the render loop runs at 60 fps (using vsync), but the screen remains black. Since I blank the screen with Color.White before Draw(..) is called, the render pipeline seems to be active.
When I read the triangle data back from the GPU into “vertArray” via a UAV and feed it into a vertex buffer, however, everything works:
Program:
let vertices = Buffer.Create(device, BindFlags.VertexBuffer, vertArray)
context.InputAssembler.SetVertexBuffers(0, new VertexBufferBinding(vertices, Utilities.SizeOf<Vector4>() * 2, 0))
HLSL:
PS_IN VS_3D (VS_IN input )
{
    PS_IN output = (PS_IN)0;
    output.pos = mul(input.pos, worldViewProj);
    output.col = input.col;
    return output;
}
Here is the definition of the 2D vertex/pixel shaders. Please note that PS_2D accesses the buffer "output2" in slot t0 - and that's exactly the "trick" I want to replicate for the 3D vertex shader "VS_3DA":
float4 PS_2D ( float4 input : SV_Position) : SV_Target
{
    uint2 pixel = uint2(input.x, input.y);
    return output2[pixel.y * width + pixel.x];
}

float4 VS_2D ( uint vid : SV_VertexID ) : SV_POSITION
{
    if (vid == 0)
        return float4(-1, -1, 0, 1);
    if (vid == 1)
        return float4( 1, -1, 0, 1);
    if (vid == 2)
        return float4(-1,  1, 0, 1);
    return float4( 1, 1, 0, 1);
}
For three days I have searched and experimented to no avail. All the information I gathered seems to confirm that my approach using SV_VertexID should work.
Can anybody give advice? Thanks for reading my post!
=====================================================================
DETAILS:
I like the concept of DirectX 11 compute shaders very much and I want to employ it for algebraic computing. As a test case I render fractals (Mandelbrot sets) in 3D. Everything works as expected – except one last brick in the wall is missing.
The computation takes the following steps:
Using a CS to compute a 2D texture (outputs are “counterTable” and “colorOutputTable”) (works)
Optionally render this texture to screen (works)
Using another CS to generate a mesh (triangle list). This CS takes x, y, and color values from step 1, computes the z coordinate, and finally creates a quad for each pixel. The result is stored in “vertexTable”. (works)
Feeding the triangles list to the vertex shader (problem!!!)
Render to screen (works - using a vertex buffer).
For programming I use F# 3.0 and SharpDX as .NET wrapper.
The ShaderResourceView for both shaders (pixel & vertex) is set up with the same parameters (except the size parameters):
let mutable descr = new BufferDescription()
descr.BindFlags <- BindFlags.UnorderedAccess ||| BindFlags.ShaderResource
descr.Usage <- ResourceUsage.Default
descr.CpuAccessFlags <- CpuAccessFlags.None
descr.StructureByteStride <- xxx // depends on shader
descr.SizeInBytes <- yyy // depends on shader
descr.OptionFlags <- ResourceOptionFlags.BufferStructured
Nothing special here.
Creation of 2D buffer (binds to buffer "output2" in slot t0):
outputBuffer2D <- new Buffer(device, descr)
outputView2D <- new UnorderedAccessView (device, outputBuffer2D)
shaderResourceView2D <- new ShaderResourceView (device, outputBuffer2D)
Creation of 3D buffer (binds to "vertexTable2" in slot t4):
vertexBuffer3D <- new Buffer(device, descr)
shaderResourceView3D <- new ShaderResourceView (device, vertexBuffer3D)
// UAView not required here
Setting resources for 2D:
context.InputAssembler.PrimitiveTopology <- PrimitiveTopology.TriangleStrip
context.OutputMerger.SetRenderTargets(renderTargetView2D)
context.OutputMerger.SetDepthStencilState(depthStencilState2D)
context.VertexShader.Set (vertexShader2D)
context.PixelShader.Set (pixelShader2D)
render 2D:
context.PixelShader.SetShaderResource(COLOR_OUT_SLOT, shaderResourceView2D)
context.PixelShader.SetConstantBuffer(CONSTANT_SLOT_GLOBAL, constantBuffer2D )
context.ClearRenderTargetView (renderTargetView2D, Color.White.ToColor4())
context.Draw(4,0)
swapChain.Present(1, PresentFlags.None)
Setting resources for 3D:
context.InputAssembler.PrimitiveTopology <- PrimitiveTopology.TriangleList
context.OutputMerger.SetTargets(depthView3D, renderTargetView2D)
context.VertexShader.SetShaderResource(TRIANGLE_SLOT, shaderResourceView3D )
context.VertexShader.SetConstantBuffer(CONSTANT_SLOT_3D, constantBuffer3D)
context.VertexShader.Set(vertexShader3D)
context.PixelShader.Set(pixelShader3D)
render 3D (doesn’t work – black screen as output result)
context.ClearDepthStencilView(depthView3D, DepthStencilClearFlags.Depth, 1.0f, 0uy)
context.Draw(dataXsize * dataYsize * 6, 0)
swapChain.Present(1, PresentFlags.None)
Finally the slot numbers:
static let CONSTANT_SLOT_GLOBAL = 0
static let CONSTANT_SLOT_3D = 1
static let COLOR_OUT_SLOT = 0
static let COUNTER_SLOT = 1
static let COLOR_SLOT = 2
static let TRIANGLE_SLOT = 4
OK, the first thing I would suggest is to turn on the debug layer (use the Debug flag when you create your device), then go to project properties, Debug tab, and tick "Enable unmanaged code debugging" or "Enable native code debugging".
When you start to debug the program, the runtime will give you warnings if something is wrong with the pipeline state.
One potential issue (which looks the most likely one from what you posted):
Make sure to clean your compute shader UAV slots after dispatching. If you try to bind vertexTable2 to your vertex shader but the resource is still bound as compute shader output, the runtime will automatically set your ShaderResourceView to null (which will in turn return 0 when you try to read it).
To clean your compute shader slot, call this on your device context once you're done with the dispatch:
ComputeShader.SetUnorderedAccessView(TRIANGLE_SLOT, null)
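In native D3D11 C++ (rather than SharpDX), the equivalent unbind would look something like this sketch:
// Unbind the UAV from the compute shader stage so the same resource can be
// read through its ShaderResourceView by the vertex shader afterwards.
ID3D11UnorderedAccessView* nullUAV = nullptr;
pContext->CSSetUnorderedAccessViews(TRIANGLE_SLOT, 1, &nullUAV, nullptr);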
Please also note that the pixel shader can access RWStructuredBuffers (technically you can use RWStructuredBuffers in any shader stage if you have feature level 11.1, which means a recent ATI card and Windows 8+).
Feeding the triangles list to the vertex shader (problem!!!)
Instead of using structured buffers (which don't let you bind as a vb), I would look into using raw buffers. It requires casting in the shader, but allows you to use the same buffer in your cs and vs.
When creating the buffer, do:
D3D11_BUFFER_DESC desc = {};
desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_VERTEX_BUFFER;
desc.ByteWidth = byteSize;
desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
You could then bind as a shader resource:
D3D11_SHADER_RESOURCE_VIEW_DESC desc = {};
desc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
desc.BufferEx.FirstElement = 0;
desc.Format = DXGI_FORMAT_R32_TYPELESS;
desc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
desc.BufferEx.NumElements = descBuf.ByteWidth / 4;
or Unordered Access View:
D3D11_UNORDERED_ACCESS_VIEW_DESC desc = {};
desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
desc.Buffer.FirstElement = 0;
desc.Format = DXGI_FORMAT_R32_TYPELESS; // Format must be DXGI_FORMAT_R32_TYPELESS, when creating Raw Unordered Access View
desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
desc.Buffer.NumElements = descBuf.ByteWidth / 4;
In the shader you would use something like this:
ByteAddressBuffer Buffer0 : register(t0);
ByteAddressBuffer Buffer1 : register(t1);
RWByteAddressBuffer BufferOut : register(u0);
int i0 = asint( Buffer0.Load( DTid.x*8 ) );
float f0 = asfloat( Buffer0.Load( DTid.x*8+4 ) );
int i1 = asint( Buffer1.Load( DTid.x*8 ) );
float f1 = asfloat( Buffer1.Load( DTid.x*8+4 ) );
BufferOut.Store( DTid.x*8, asuint(i0 + i1) );
BufferOut.Store( DTid.x*8+4, asuint(f0 + f1) );
Sample code above was taken from the BasicCompute11 sample from the DirectX June 2010 SDK. It demonstrates using both structured buffers and raw buffers.

Precise Texture Overlay

I'm trying to set up a two-stage render of objects in a 3D engine I'm working on, written in C++ with DirectX9, to facilitate transparency (and other things). I thought it was all working nicely until I noticed some dodginess on the edge of objects rendered before objects using this two-stage method.
The two stage method is simple:
Draw model to off-screen ("side") texture of same size using same zbuffer (no MSAA is used anywhere)
Draw off-screen ("side") texture over the top of the main render target with a suitable blend and no alpha test or write
In the image below the left view is with the two stage render of the gray object (a lamppost) with the body in-front of it rendered directly to the target texture. The right view is with the two-stage render disabled, so both are rendered directly onto the target surface.
On close inspection it is as if the side texture is offset by exactly 1 pixel "down" and 1 pixel "right" when rendered over the target surface (but is rendered correctly in-place). This can be seen in an overlay of the off screen texture (which I get my program to write out to a bitmap file via D3DXSaveTextureToFile) over a screen shot below.
One last image so you can see where the edge in the side texture is coming from (it's because rendering to the side texture does use the z test). Left is the screen shot, right is the side texture (as overlaid above).
All this leads me to believe that my "overlaying" isn't very effective. The code that renders the side texture over the main render target is shown below (note that the same viewport is used for all scene rendering (on and off screen)). The "effect" object is an instance of a thin wrapper over LPD3DXEFFECT, with the "effect" field (sorry about shoddy naming) being a LPD3DXEFFECT itself.
void drawSideOver(LPDIRECT3DDEVICE9 dxDevice, drawData* ddat)
{   // "ddat" drawdata contains lots of render state information, but all we need here is the handles for the targetSurface and sideSurface
    D3DXMATRIX idMat;
    D3DXMatrixIdentity(&idMat); // create identity matrix
    dxDevice->SetRenderTarget(0, ddat->targetSurface); // switch to targetSurface
    dxDevice->SetRenderState(D3DRS_ZENABLE, false); // disable z test and z write
    dxDevice->SetRenderState(D3DRS_ZWRITEENABLE, false);

    vertexOver overVerts[4]; // create square
    overVerts[0] = vertexOver(-1, -1, 0, 0, 1);
    overVerts[1] = vertexOver(-1, 1, 0, 0, 0);
    overVerts[2] = vertexOver(1, -1, 0, 1, 1);
    overVerts[3] = vertexOver(1, 1, 0, 1, 0);

    effect.setTexture(ddat->sideTex); // use side texture as shader texture ("tex")
    effect.effect->SetTechnique("over"); // change to "over" technique
    effect.setViewProj(&idMat); // set viewProj to identity matrix so 1/-1 map directly
    effect.effect->CommitChanges();
    setAlpha(dxDevice); // this sets up the alpha blending which works fine

    UINT numPasses, pass;
    effect.effect->Begin(&numPasses, 0);
    effect.effect->BeginPass(0);
    dxDevice->SetVertexDeclaration(vertexDecOver);
    dxDevice->DrawPrimitiveUP(D3DPT_TRIANGLESTRIP, 2, overVerts, sizeof(vertexOver));
    effect.effect->EndPass();
    effect.effect->End();

    dxDevice->SetRenderState(D3DRS_ZENABLE, true); // revert these so we don't mess everything up drawn after this
    dxDevice->SetRenderState(D3DRS_ZWRITEENABLE, true);
}
The C++ side definition for the VertexOver struct and constructor (HLSL side shown below somewhere):
struct vertexOver
{
public:
    float x;
    float y;
    float z;
    float w;
    float tu;
    float tv;

    vertexOver() { }

    vertexOver(float xN, float yN, float zN, float tuN, float tvN)
    {
        x = xN;
        y = yN;
        z = zN;
        w = 1.0;
        tu = tuN;
        tv = tvN;
    }
};
The inefficiency of re-creating and passing the vertices down to the GPU each draw aside, what I really want to know is why this method doesn't quite work, and whether there are any better methods for overlaying textures like this with an alpha blend that won't exhibit this issue.
I figured that texture sampling might matter here, but messing about with the options didn't seem to help much (for example, using a LINEAR filter just makes it fuzzy, as you might expect, implying that the offset isn't as clear-cut as a 1-pixel discrepancy). Shader code:
struct VS_Input_Over
{
    float4 pos : POSITION0;
    float2 txc : TEXCOORD0;
};

struct VS_Output_Over
{
    float4 pos : POSITION0;
    float2 txc : TEXCOORD0;
    float4 altPos : TEXCOORD1;
};

struct PS_Output
{
    float4 col : COLOR0;
};

Texture tex;
sampler texSampler = sampler_state
{
    texture = <tex>;
    magfilter = NONE;
    minfilter = NONE;
    mipfilter = NONE;
    AddressU = mirror;
    AddressV = mirror;
};

// side/over shaders (these make up the "over" technique (pixel shader version 2.0))
VS_Output_Over VShade_Over(VS_Input_Over inp)
{
    VS_Output_Over outp = (VS_Output_Over)0;
    outp.pos = mul(inp.pos, viewProj);
    outp.altPos = outp.pos;
    outp.txc = inp.txc;
    return outp;
}

PS_Output PShade_Over(VS_Output_Over inp)
{
    PS_Output outp = (PS_Output)0;
    outp.col = tex2D(texSampler, inp.txc);
    return outp;
}
I've looked about for a "Blended Blit" or something but I can't find anything, and other related searches have only brought up forums implying that rendering a quad with an orthographic projection is the way to go about doing this.
Sorry if I've given far too much detail for this issue but it's both interesting and infuriating and any feedback would be greatly appreciated.
It looks to me like your problem is the mapping of texels to pixels. You must offset a screen-aligned quad by half a pixel to map the texels directly to screen pixels. This issue is explained here: Directly Mapping Texels to Pixels (MSDN)
For anyone else hitting a similar wall, my specific problem was solved by adjusting the U and V values of the vertices sent to the GPU for the overlaid texture triangles, thus:
for (int i = 0; i < 4; i++)
{
    overVerts[i].tu += 0.5 / (float)ddat->targetVp->Width;  // ddat->targetVp is the viewport in use, and the viewport is the same size as the texture
    overVerts[i].tv += 0.5 / (float)ddat->targetVp->Height;
}
See Directly Mapping Texels to Pixels as provided by Gnietschow's answer for an explanation as to why this makes sense.
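An equivalent approach (a sketch based on the half-pixel convention described in that article, not taken from the original posts) is to shift the quad's clip-space positions by half a pixel instead of adjusting the UVs:
// In clip space the viewport spans 2 units, so half a pixel is 1.0f / width (or height).
// Shifting the quad half a pixel left and up lines texel centres up with pixel centres
// under Direct3D 9's rasterization convention.
for (int i = 0; i < 4; i++)
{
    overVerts[i].x -= 1.0f / (float)ddat->targetVp->Width;
    overVerts[i].y += 1.0f / (float)ddat->targetVp->Height;
}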