I'm trying to set up a two-stage render of objects in a 3D engine I'm working on (written in C++ with DirectX 9) to facilitate transparency (and other things). I thought it was all working nicely until I noticed some dodginess on the edges of objects rendered before objects that use this two-stage method.
The two stage method is simple:
Draw the model to an off-screen ("side") texture of the same size, using the same z-buffer (no MSAA is used anywhere)
Draw the off-screen ("side") texture over the top of the main render target with a suitable blend and no alpha test or write
In the image below, the left view is with the two-stage render of the gray object (a lamppost), with the body in front of it rendered directly to the target texture. The right view is with the two-stage render disabled, so both are rendered directly onto the target surface.
On close inspection it is as if the side texture is offset by exactly 1 pixel "down" and 1 pixel "right" when rendered over the target surface (but is rendered correctly in place). This can be seen in an overlay of the off-screen texture (which I get my program to write out to a bitmap file via D3DXSaveTextureToFile) over a screenshot below.
One last image so you can see where the edge in the side texture is coming from (it's because rendering to the side texture does use the z test). Left is the screenshot, right is the side texture (as overlaid above).
All this leads me to believe that my "overlaying" isn't very effective. The code that renders the side texture over the main render target is shown below (note that the same viewport is used for all scene rendering (on and off screen)). The "effect" object is an instance of a thin wrapper over LPD3DXEFFECT, with the "effect" field (sorry about shoddy naming) being a LPD3DXEFFECT itself.
void drawSideOver(LPDIRECT3DDEVICE9 dxDevice, drawData* ddat)
{ // "ddat" drawdata contains lots of render state information, but all we need here is the handles for the targetSurface and sideSurface
    D3DXMATRIX idMat;
    D3DXMatrixIdentity(&idMat); // create identity matrix
    dxDevice->SetRenderTarget(0, ddat->targetSurface); // switch to targetSurface
    dxDevice->SetRenderState(D3DRS_ZENABLE, false); // disable z test and z write
    dxDevice->SetRenderState(D3DRS_ZWRITEENABLE, false);
    vertexOver overVerts[4]; // create square
    overVerts[0] = vertexOver(-1, -1, 0, 0, 1);
    overVerts[1] = vertexOver(-1, 1, 0, 0, 0);
    overVerts[2] = vertexOver(1, -1, 0, 1, 1);
    overVerts[3] = vertexOver(1, 1, 0, 1, 0);
    effect.setTexture(ddat->sideTex); // use side texture as shader texture ("tex")
    effect.effect->SetTechnique("over"); // change to "over" technique
    effect.setViewProj(&idMat); // set viewProj to identity matrix so 1/-1 map directly
    effect.effect->CommitChanges();
    setAlpha(dxDevice); // this sets up the alpha blending which works fine
    UINT numPasses, pass;
    effect.effect->Begin(&numPasses, 0);
    effect.effect->BeginPass(0);
    dxDevice->SetVertexDeclaration(vertexDecOver);
    dxDevice->DrawPrimitiveUP(D3DPT_TRIANGLESTRIP, 2, overVerts, sizeof(vertexOver));
    effect.effect->EndPass();
    effect.effect->End();
    dxDevice->SetRenderState(D3DRS_ZENABLE, true); // revert these so we don't mess everything up drawn after this
    dxDevice->SetRenderState(D3DRS_ZWRITEENABLE, true);
}
The C++ side definition for the VertexOver struct and constructor (HLSL side shown below somewhere):
struct vertexOver
{
public:
    float x;
    float y;
    float z;
    float w;
    float tu;
    float tv;
    vertexOver() { }
    vertexOver(float xN, float yN, float zN, float tuN, float tvN)
    {
        x = xN;
        y = yN;
        z = zN;
        w = 1.0;
        tu = tuN;
        tv = tvN;
    }
};
Setting aside the inefficiency of re-creating the vertices and passing them down to the GPU on every draw, what I really want to know is why this method doesn't quite work, and whether there are better methods for overlaying textures like this with an alpha blend that won't exhibit this issue.
I figured that texture sampling might play a part here, but messing about with the options didn't seem to help much (for example, using a LINEAR filter just makes it fuzzy, as you might expect, implying that the offset isn't as clear-cut as a 1-pixel discrepancy). Shader code:
struct VS_Input_Over
{
    float4 pos : POSITION0;
    float2 txc : TEXCOORD0;
};

struct VS_Output_Over
{
    float4 pos : POSITION0;
    float2 txc : TEXCOORD0;
    float4 altPos : TEXCOORD1;
};

struct PS_Output
{
    float4 col : COLOR0;
};

Texture tex;
sampler texSampler = sampler_state
{
    texture = <tex>;
    magfilter = NONE;
    minfilter = NONE;
    mipfilter = NONE;
    AddressU = mirror;
    AddressV = mirror;
};

// side/over shaders (these make up the "over" technique, pixel shader version 2.0)
VS_Output_Over VShade_Over(VS_Input_Over inp)
{
    VS_Output_Over outp = (VS_Output_Over)0;
    outp.pos = mul(inp.pos, viewProj);
    outp.altPos = outp.pos;
    outp.txc = inp.txc;
    return outp;
}

PS_Output PShade_Over(VS_Output_Over inp)
{
    PS_Output outp = (PS_Output)0;
    outp.col = tex2D(texSampler, inp.txc);
    return outp;
}
I've looked about for a "Blended Blit" or something but I can't find anything, and other related searches have only brought up forums implying that rendering a quad with an orthographic projection is the way to go about doing this.
Sorry if I've given far too much detail for this issue but it's both interesting and infuriating and any feedback would be greatly appreciated.
It looks to me like your problem is the mapping of texels to pixels. You must offset a screen-aligned quad by half a pixel to map the texels directly to the screen pixels. This issue is explained here: Directly Mapping Texels to Pixels (MSDN)
For anyone else hitting a similar wall, my specific problem was solved by adjusting the U and V values of the vertices sent to the GPU for the overlaid texture triangles, like so:
for (int i = 0; i < 4; i++)
{
    overVerts[i].tu += 0.5 / (float)ddat->targetVp->Width; // ddat->targetVp is the viewport in use, and the viewport is the same size as the texture
    overVerts[i].tv += 0.5 / (float)ddat->targetVp->Height;
}
See Directly Mapping Texels to Pixels as provided by Gnietschow's answer for an explanation as to why this makes sense.
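An equivalent alternative (just a sketch under the same assumptions, using the overVerts quad and ddat->targetVp from above; untested) is to leave the UVs alone and instead nudge the quad's clip-space positions by half a pixel, which is the same adjustment the article describes from the other direction:
// Sketch (untested): shift the full-screen quad itself by -0.5 pixels in screen space.
// In clip space the quad spans 2 units across the viewport, so half a pixel is
// 1/Width horizontally and 1/Height vertically; the y sign flips because screen
// y grows downwards while clip-space y grows upwards.
for (int i = 0; i < 4; i++)
{
    overVerts[i].x -= 1.0f / (float)ddat->targetVp->Width;
    overVerts[i].y += 1.0f / (float)ddat->targetVp->Height;
}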
So, basically, I'm trying to make an OBS filter that displaces the pixels based on a lightmap/luminance map. I decided to learn how to make a filter by following this tutorial. But, in this tutorial, they don't explain much in terms of pixel displacement. So, I made a function that gets the brightness value of a texture I input, and tested it by changing the pixel's alpha value to the red value of the texture:
float4 get_displacement(float2 position)
{
    float2 pattern_uv = position / pattern_size;
    float4 pattern_sample = pattern_texture.Sample(linear_wrap, pattern_uv / scale);
    return pattern_sample;
}

float4 pixel_shader(pixel_data pixel) : TARGET
{
    float4 source_sample = image.Sample(linear_wrap, pixel.uv);
    if (pattern_size.x <= 0){
        return source_sample;
    }
    float2 position = pixel.uv * float2(width, height);
    float4 lightmap = get_displacement(position);
    return float4(source_sample.rgb, lightmap.r);
    return source_sample;
}
Which results in this (note: the green is from a colour source behind the image, to show the alpha value).
But, for some reason, when I try it in the vertex_shader, the function that decides where the pixel is rendered, it doesn't seem to work:
pixel_data vertex_shader(vertex_data vertex)
{
    pixel_data pixel;
    pixel.uv = vertex.uv;
    if (pattern_size.x <= 0){
        pixel.pos = mul(float4(vertex.pos.xyz, 1.0), ViewProj);
        return pixel;
    }
    float2 position = vertex.uv * float2(width, height);
    float4 lightmap = get_displacement(position);
    pixel.pos = mul(float4(vertex.pos.x + (lightmap.r * testRamp1), vertex.pos.yz, 1.0), ViewProj);
    return pixel;
}
(Note: testRamp1 is used as a value that I can change from a slider inside of OBS via some filter Properties)
The result that I'm expecting is something similar to this
To see if the issue was from me changing the XY position, I tested it using this function:
pixel_data vertex_shader(vertex_data vertex)
{
    pixel_data pixel;
    pixel.uv = vertex.uv;
    pixel.pos = mul(float4(vertex.pos.x + 100, vertex.pos.yz, 1.0), ViewProj);
    return pixel;
}
And it gave me an expected result.
I also changed the 100 with the testRamp1 value, and it works just the same based on the value of the slider.
So, I then tested whether it was because the pixels all need to move the same distance as each other. So, I changed the function to this:
pixel_data vertex_shader(vertex_data vertex)
{
    pixel_data pixel;
    pixel.uv = vertex.uv;
    pixel.pos = mul(float4(vertex.pos.x + (vertex.uv.x * testRamp1), vertex.pos.yz, 1.0), ViewProj);
    return pixel;
}
This gives me a squashed image when testRamp1 is set to a negative value, and a stretched image when it's set to a positive value.
But as soon as I try to read the value of an image, be it the pattern or the source image, it no longer works (not even the filter parameters appear). For example, I used this function to read the values of the source image:
pixel_data vertex_shader(vertex_data vertex)
{
    pixel_data pixel;
    float4 source_sample = image.Sample(linear_wrap, vertex.uv);
    pixel.uv = vertex.uv;
    pixel.pos = mul(float4(vertex.pos.x + (source_sample.r * testRamp1), vertex.pos.yz, 1.0), ViewProj);
    return pixel;
}
At this point, I'm at a loss as to what could be causing this issue.
First of all, the vertex shader is not what you want to use for this kind of effect. What you actually want to do is sample the image in the pixel shader, but offset the UV values slightly by your displacement before you pass them to the Sample function.
The primary reason you don't want to do this in the vertex shader is that the number of vertices is usually much smaller than the number of pixels - in the worst case, you only have 4 vertices in total (one for each corner of your screen), so the granularity of things you can do in the vertex shader is rather coarse. (Note: I'm not too familiar with OBS filters and don't know how many vertices OBS dispatches, but it's certainly far fewer than the number of pixels on your screen.)
Now, the reason why your vertex shader didn't work at all is a bit more technical. In short, you can't use Sample in a vertex shader; you'd have to use SampleLevel or SampleGrad instead (note that these functions require more parameters). This is because Sample automatically calculates a UV gradient between adjacent pixels to figure out the level of detail that is needed for your texture (whether or not it actually has multiple levels of detail). But the vertex shader operates on vertices, not on pixels, so the concept of an "adjacent pixel" doesn't make sense in a vertex shader - thus, the Sample method doesn't work.
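For illustration only, a minimal sketch of the pixel-shader approach suggested above (untested; it reuses get_displacement, testRamp1, width, height, image and linear_wrap from the question, and displaces only horizontally) could look like this:
float4 pixel_shader(pixel_data pixel) : TARGET
{
    float2 position = pixel.uv * float2(width, height);
    float4 lightmap = get_displacement(position);
    // Convert the displacement from pixels to UV space by dividing by the image width,
    // then shift where we sample the source image from.
    float2 displaced_uv = pixel.uv + float2((lightmap.r * testRamp1) / width, 0.0);
    return image.Sample(linear_wrap, displaced_uv);
}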
I have a DirectX 12 desktop project on Windows 11 that implements post-processing using a combination of DXTK post-process effects.
The aim of the post-proc sequence is to end up with individual bloom and blur textures (along with a depth texture rendered in a depth pass) which are sampled in a 'big triangle' pixel shader to achieve a depth of field effect for the final backbuffer screen image.
The DXTK PostProcesses operate on the full-size (1920x1080) screen texture. Presently this isn't impacting performance (benchmarked at 60fps), but I imagine it could be an issue when I eventually want to support 4K resolutions in future, where full-size image post-processing could be expensive.
Since the recommended best practice is to operate on a scaled-down copy of the source image, I hoped to achieve this by using half-size (i.e. quarter resolution) working textures with the DownScale_2x2 BasicPostProcess option. But after several attempts experimenting with the effect, only the top-left quarter of the original source image is being rendered to the downsized texture... not the full image as expected per the documentation:
DownScale_2x2: Downscales each 2x2 block of pixels to an average. This is intended to write to a render target that is half the size of the source texture in each dimension.
Other points of note:
scene geometry is first rendered to a _R16G16B16A16_FLOAT MSAA render target and resolved to single-sample 16fp target
postprocessing operates on resolved single-sample 16fp target (where only the intermediate 'Pass1' & 'Pass2' working render targets are set to half the backbuffer length & width)
final processed image is tonemapped to the _R10G10B10A2_UNORM swapchain backbuffer for presentation.
The following code snippets show how I'm implementing the DownScale_2x2 shader into my post-process. Hopefully it's enough to resolve the issue and I can update with more info if necessary.
Resource initialization under CreateDeviceDependentResources():
namespace GameConstants {
constexpr DXGI_FORMAT BACKBUFFERFORMAT(DXGI_FORMAT_R10G10B10A2_UNORM); // back buffer to support hdr rendering
constexpr DXGI_FORMAT HDRFORMAT(DXGI_FORMAT_R16G16B16A16_FLOAT); // format for hdr render targets
constexpr DXGI_FORMAT DEPTHFORMAT(DXGI_FORMAT_D32_FLOAT); // format for render target depth buffer
constexpr UINT MSAACOUNT(4u); // requested multisample count
}
...
//
// Render targets
//
mMsaaHelper = std::make_unique<MSAAHelper>(GameConstants::HDRFORMAT, GameConstants::DEPTHFORMAT, GameConstants::MSAACOUNT);
mMsaaHelper->SetClearColor(GameConstants::CLEARCOLOR);
mDistortionRenderTex = std::make_unique<RenderTexture>(GameConstants::BACKBUFFERFORMAT);
mHdrRenderTex = std::make_unique<RenderTexture>(GameConstants::HDRFORMAT);
mPass1RenderTex = std::make_unique<RenderTexture>(GameConstants::HDRFORMAT);
mPass2RenderTex = std::make_unique<RenderTexture>(GameConstants::HDRFORMAT);
mBloomRenderTex = std::make_unique<RenderTexture>(GameConstants::HDRFORMAT);
mBlurRenderTex = std::make_unique<RenderTexture>(GameConstants::HDRFORMAT);
mDistortionRenderTex->SetClearColor(GameConstants::CLEARCOLOR);
mHdrRenderTex->SetClearColor(GameConstants::CLEARCOLOR);
mPass1RenderTex->SetClearColor(GameConstants::CLEARCOLOR);
mPass2RenderTex->SetClearColor(GameConstants::CLEARCOLOR);
mBloomRenderTex->SetClearColor(GameConstants::CLEARCOLOR);
mBlurRenderTex->SetClearColor(GameConstants::CLEARCOLOR);
mMsaaHelper->SetDevice(device); // Set the MSAA device. Note this updates GetSampleCount.
mDistortionRenderTex->SetDevice(device,
mPostProcSrvDescHeap->GetCpuHandle(SRV_PostProcDescriptors::DistortionMaskSRV),
mRtvDescHeap->GetCpuHandle(RTV_Descriptors::DistortionMaskRTV));
mHdrRenderTex->SetDevice(device,
mPostProcSrvDescHeap->GetCpuHandle(SRV_PostProcDescriptors::HdrSRV),
mRtvDescHeap->GetCpuHandle(RTV_Descriptors::HdrRTV));
mPass1RenderTex->SetDevice(device,
mPostProcSrvDescHeap->GetCpuHandle(SRV_PostProcDescriptors::Pass1SRV),
mRtvDescHeap->GetCpuHandle(RTV_Descriptors::Pass1RTV));
mPass2RenderTex->SetDevice(device,
mPostProcSrvDescHeap->GetCpuHandle(SRV_PostProcDescriptors::Pass2SRV),
mRtvDescHeap->GetCpuHandle(RTV_Descriptors::Pass2RTV));
mBloomRenderTex->SetDevice(device,
mPostProcSrvDescHeap->GetCpuHandle(SRV_PostProcDescriptors::BloomSRV),
mRtvDescHeap->GetCpuHandle(RTV_Descriptors::BloomRTV));
mBlurRenderTex->SetDevice(device,
mPostProcSrvDescHeap->GetCpuHandle(SRV_PostProcDescriptors::BlurSRV),
mRtvDescHeap->GetCpuHandle(RTV_Descriptors::BlurRTV));
...
RenderTargetState ppState(GameConstants::HDRFORMAT, DXGI_FORMAT_UNKNOWN); // 2d postproc rendering
...
// Set other postprocessing effects
mBloomExtract = std::make_unique<BasicPostProcess>(device, ppState, BasicPostProcess::BloomExtract);
mBloomPass = std::make_unique<BasicPostProcess>(device, ppState, BasicPostProcess::BloomBlur);
mBloomCombine = std::make_unique<DualPostProcess>(device, ppState, DualPostProcess::BloomCombine);
mGaussBlurPass = std::make_unique<BasicPostProcess>(device, ppState, BasicPostProcess::GaussianBlur_5x5);
mDownScalePass = std::make_unique<BasicPostProcess>(device, ppState, BasicPostProcess::DownScale_2x2);
Resource resizing under CreateWindowSizeDependentResources():
// Get current backbuffer dimensions
CD3DX12_RECT outputRect(mDeviceResources->GetOutputSize());
// Determine the render target size in pixels
mBackbufferSize.x = std::max<UINT>(outputRect.right - outputRect.left, 1u);
mBackbufferSize.y = std::max<UINT>(outputRect.bottom - outputRect.top, 1u);
...
mMsaaHelper->SetWindow(outputRect);
XMUINT2 halfSize(mBackbufferSize.x / 2u, mBackbufferSize.y / 2u);
mBloomRenderTex->SetWindow(outputRect);
mBlurRenderTex->SetWindow(outputRect);
mDistortionRenderTex->SetWindow(outputRect);
mHdrRenderTex->SetWindow(outputRect);
mPass1RenderTex->SizeResources(halfSize.x, halfSize.y);
mPass2RenderTex->SizeResources(halfSize.x, halfSize.y);
Post-processing implementation:
mMsaaHelper->Prepare(commandList);
Clear(commandList);
// Render 3d scene
mMsaaHelper->Resolve(commandList, mHdrRenderTex->GetResource(),
D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_RENDER_TARGET);
//
// Postprocessing
//
// Set texture descriptor heap in prep for postprocessing if necessary.
// Unbind dsv for postprocess textures and sprites.
ID3D12DescriptorHeap* postProcHeap[] = { mPostProcSrvDescHeap->Heap() };
commandList->SetDescriptorHeaps(UINT(std::size(postProcHeap)), postProcHeap);
// downscale pass
CD3DX12_CPU_DESCRIPTOR_HANDLE rtvDownScaleDescriptor(mRtvDescHeap->GetCpuHandle(RTV_Descriptors::Pass1RTV));
commandList->OMSetRenderTargets(1u, &rtvDownScaleDescriptor, FALSE, nullptr);
mPass1RenderTex->BeginScene(commandList); // transition to render target state
mDownScalePass->SetSourceTexture(mPostProcSrvDescHeap->GetGpuHandle(SRV_PostProcDescriptors::HdrSRV), mHdrRenderTex->GetResource());
mDownScalePass->Process(commandList);
mPass1RenderTex->EndScene(commandList); // transition to pixel shader resource state
// blur horizontal pass
commandList->OMSetRenderTargets(1u, &rtvPass2Descriptor, FALSE, nullptr);
mPass2RenderTex->BeginScene(commandList); // transition to render target state
mGaussBlurPass->SetSourceTexture(mPostProcSrvDescHeap->GetGpuHandle(SRV_PostProcDescriptors::Pass1SRV), mPass1RenderTex->GetResource());
//mGaussBlurPass->SetSourceTexture(mPostProcSrvDescHeap->GetGpuHandle(SRV_PostProcDescriptors::HdrSRV), mHdrRenderTex->GetResource());
mGaussBlurPass->SetGaussianParameter(1.f);
mGaussBlurPass->SetBloomBlurParameters(TRUE, 4.f, 1.f); // horizontal blur
mGaussBlurPass->Process(commandList);
mPass2RenderTex->EndScene(commandList); // transition to pixel shader resource
// blur vertical pass
CD3DX12_CPU_DESCRIPTOR_HANDLE rtvBlurDescriptor(mRtvDescHeap->GetCpuHandle(RTV_Descriptors::BlurRTV));
commandList->OMSetRenderTargets(1u, &rtvBlurDescriptor, FALSE, nullptr);
mBlurRenderTex->BeginScene(commandList); // transition to render target state
mGaussBlurPass->SetSourceTexture(mPostProcSrvDescHeap->GetGpuHandle(SRV_PostProcDescriptors::Pass2SRV), mPass2RenderTex->GetResource());
mGaussBlurPass->SetBloomBlurParameters(FALSE, 4.f, 1.f); // vertical blur
mGaussBlurPass->Process(commandList);
mBlurRenderTex->EndScene(commandList); // transition to pixel shader resource
// render the final image to hdr texture
CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHdrDescriptor(mRtvDescHeap->GetCpuHandle(RTV_Descriptors::HdrRTV));
commandList->OMSetRenderTargets(1u, &rtvHdrDescriptor, FALSE, nullptr);
//mHdrRenderTex->BeginScene(commandList); // transition to render target state
commandList->SetGraphicsRootSignature(mRootSig.Get()); // bind root signature
commandList->SetPipelineState(mPsoDepthOfField.Get()); // set PSO
...
commandList->SetGraphicsRootConstantBufferView(RootParameterIndex::PSDofCB, psDofCB.GpuAddress());
commandList->SetGraphicsRootDescriptorTable(RootParameterIndex::PostProcDT, mPostProcSrvDescHeap->GetFirstGpuHandle());
// use the big triangle optimization to draw a fullscreen quad
commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
commandList->DrawInstanced(3u, 1u, 0u, 0u);
...
PIXBeginEvent(commandList, PIX_COLOR_DEFAULT, L"Tone Map");
// Set swapchain backbuffer as the tonemapping render target and unbind depth/stencil for sprites (UI)
CD3DX12_CPU_DESCRIPTOR_HANDLE rtvDescriptor(mDeviceResources->GetRenderTargetView());
commandList->OMSetRenderTargets(1u, &rtvDescriptor, FALSE, nullptr);
CD3DX12_GPU_DESCRIPTOR_HANDLE postProcTexture(mPostProcSrvDescHeap->GetGpuHandle(SRV_PostProcDescriptors::HdrSRV));
ApplyToneMapping(commandList, postProcTexture);
Vertex shader:
/*
We use the 'big triangle' optimization that only requires three vertices to completely
cover the full screen area.
v0 v1 ID NDC UV
*____* -- ------- ----
| | / 0 (-1,+1) (0,0)
|_|/ 1 (+3,+1) (2,0)
| / 2 (-1,-3) (0,2)
|/
*
v2
*/
TexCoordVertexOut VS(uint id : SV_VertexID)
{
TexCoordVertexOut vout;
vout.texCoord = float2((id << 1u) & 2u, id & 2u);
// See Luna p.687
float x = vout.texCoord.x * 2.f - 1.f;
float y = -vout.texCoord.y * 2.f + 1.f;
// Procedurally generate each NDC vertex.
// The big triangle produces a quad covering the screen in NDC space.
vout.posH = float4(x, y, 0.f, 1.f);
// Transform quad corners to view space near plane.
float4 ph = mul(vout.posH, InvProj);
vout.posV = ph.xyz / ph.w;
return vout;
}
Pixel shader:
float4 PS(TexCoordVertexOut pin) : SV_TARGET
//float4 PS(float2 texCoord : TEXCOORD0) : SV_TARGET
{
...
// Get downscale texture sample
float3 colorDownScale = Pass1Tex.Sample(PointSampler, pin.texCoord).rgb;
...
return float4(colorDownScale, 1.f); // only top-quarter of source input is rendered!
//return float4(colorOutput, 1.f);
//return float4(distortCoords, 0.f, 1.f);
//return float4(colorHDR, 1.f);
//return float4(colorBlurred, 1.f);
//return float4(colorBloom, 1.f);
//return float4((p.z * 0.01f).rrr, 1.f); // multiply by a contrast factor
}
The PostProcess class uses a 'full-screen quad' rendering model. Since we can rely on Direct3D 10.0 or later class hardware, it makes use of the 'self-generating quad' model to avoid the need for a VB.
As such, the self-generating quad is going to be positioned wherever you have the viewport set. The scissor settings are also needed since it uses the "big-triangle" optimization, to avoid having a diagonal seam across the image if you have the viewport positioned anywhere except the full render target.
I have this detail in the Writing custom shaders tutorial, but I forgot to replicate it in the PostProcess docs on the wiki.
TL;DR: When you go to render to the smaller render target, use:
auto vp = m_deviceResources->GetScreenViewport();
Viewport halfvp(vp);
halfvp.height /= 2.f;
halfvp.width /= 2.f;
commandList->RSSetViewports(1, halfvp.Get12());
Then when you switch back to your full-size render target, use:
commandList->RSSetViewports(1, &vp);
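Since the scissor settings are mentioned above, a rough sketch of keeping them in step with the viewport (assuming the halfSize and mBackbufferSize values computed in CreateWindowSizeDependentResources()) would be:
// Sketch: match the scissor rect to the reduced viewport for the half-size passes...
CD3DX12_RECT halfScissor(0, 0, LONG(halfSize.x), LONG(halfSize.y));
commandList->RSSetScissorRects(1, &halfScissor);
// ...and restore the full-size rect along with the full viewport afterwards.
CD3DX12_RECT fullScissor(0, 0, LONG(mBackbufferSize.x), LONG(mBackbufferSize.y));
commandList->RSSetScissorRects(1, &fullScissor);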
Updated the wiki page.
I am new to DirectX, but I've been able to use the Windows Desktop Duplication API to capture video successfully. The API also allows you to retrieve mouse cursor information, including the position, height, width, and the raw pixel data (in system memory) of the cursor image. The mouse cursor is not drawn on the captured screen image by default; it needs to be handled manually.
I'm trying to "copy" this mouse cursor data to the main screen capture image to create a single image with a visible mouse cursor. So far I have been able to make the cursor show up by creating an ID3D11Texture2D from the cursor pixel data, then performing an ID3D11DeviceContext::CopySubresourceRegion to copy the cursor to the main screen image, also stored as an ID3D11Texture2D. The main screen image texture is always in the DXGI_FORMAT_B8G8R8A8_UNORM format, and the raw cursor pixel data seems to be in the same format, at least for the DXGI_OUTDUPL_POINTER_SHAPE_TYPE_COLOR shape.
My current issue seems to be related to the alpha handling of this copy. The cursor shows up, but when the rectangle is copied, the alpha surrounding the cursor is instead filled in with black. Here is an example of what it looks like: Black border around mouse
Also, it is important to me that this happens in video memory, as the final texture goes straight from video memory into a video encoder.
I'm willing to change my method if CopySubresourceRegion is not the right tool for the job. Any ideas on how I can get this cursor onto the main screen image texture with proper alpha?
The only way to access the alpha blending capabilities of your GPU is with draw commands. Copy calls only do replacement, as you have seen.
You already have your mouse cursor in an 'ID3D11Texture2D'; what you need now is an 'ID3D11ShaderResourceView' to use it as a texture, an 'ID3D11VertexShader' and 'ID3D11PixelShader' pair to render onto a surface, and an 'ID3D11RenderTargetView' for your destination surface.
You also need a set of 'ID3D11RasterizerState', 'ID3D11DepthStencilState' and 'ID3D11BlendState' objects to configure the GPU state with no depth test, alpha blending and other meaningful settings; most of them at their defaults should be fine for you.
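As a minimal sketch of the blend state described above (assuming straight, non-premultiplied alpha in the cursor texture, and that device and context are your existing ID3D11Device and ID3D11DeviceContext):
// Sketch: classic SRC_ALPHA / INV_SRC_ALPHA blending for the cursor quad.
// If the cursor data is premultiplied, use D3D11_BLEND_ONE for SrcBlend instead.
D3D11_BLEND_DESC bd = {};
bd.RenderTarget[0].BlendEnable = TRUE;
bd.RenderTarget[0].SrcBlend = D3D11_BLEND_SRC_ALPHA;
bd.RenderTarget[0].DestBlend = D3D11_BLEND_INV_SRC_ALPHA;
bd.RenderTarget[0].BlendOp = D3D11_BLEND_OP_ADD;
bd.RenderTarget[0].SrcBlendAlpha = D3D11_BLEND_ONE;
bd.RenderTarget[0].DestBlendAlpha = D3D11_BLEND_INV_SRC_ALPHA;
bd.RenderTarget[0].BlendOpAlpha = D3D11_BLEND_OP_ADD;
bd.RenderTarget[0].RenderTargetWriteMask = D3D11_COLOR_WRITE_ENABLE_ALL;
ID3D11BlendState* blendState = nullptr;
device->CreateBlendState(&bd, &blendState);
context->OMSetBlendState(blendState, nullptr, 0xFFFFFFFF);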
Then you need to draw a quad with all of that to display your cursor. Depending on how you write the shader, you will also need either a constant buffer, a vertex buffer plus an input layout, or both.
For that kind of quad blit, I usually prefer to deal with only a single constant buffer and rebuild the vertex position from SV_VertexID inside the vertex shader, but it is up to you.
This is how you can write the blit shader without a vertex buffer to manage; a single Draw(4, 0) with a triangle strip primitive topology is enough:
struct IA {
    uint vid : SV_VertexID;
};

struct VSPS {
    float4 pos : SV_Position;
    float2 uv : COLOR;
};

struct Root {
    float left;
    float top;
    float right;
    float bottom;
};

ConstantBuffer<Root> root_ : register(b0);
Texture2D<float4> texture_ : register(t0);
SamplerState sampler_ : register(s0);

void mainvs( in IA input, out VSPS output ) {
    float x = input.vid < 2 ? 0.f : 1.f;
    float y = (input.vid & 1) ? 1.f : 0.f;
    output.uv = float2(x, y);
    float px = input.vid < 2 ? root_.left : root_.right;
    float py = (input.vid & 1) ? root_.bottom : root_.top;
    output.pos = float4(px, py, 0.f, 1.f);
    output.pos.y = 1 - output.pos.y;
    output.pos.xy *= 2;
    output.pos.xy -= 1;
}

float4 mainps( in VSPS input ) : SV_TARGET {
    return texture_.Sample( sampler_, input.uv );
}
(Edit) I have working geometry picking with a framebuffer. My goal is to draw a huge scene in one draw call, but I need to draw to a multisample color texture attachment (GL_COLOR_ATTACHMENT0) and (edited) to a non-multisample picking texture attachment (GL_COLOR_ATTACHMENT1). The problem is that if I use a multisample texture to pick, picking is corrupted because of the multisampling.
I write the geometry ID in the fragment shader like this:
//...
// Given geometry id
uniform int in_object_id;
// Drawed to screen (GL_COLOR_ATTACHMENT0)
out vec4 out_frag_color0;
// Drawed to pick texture (GL_COLOR_ATTACHMENT1)
out vec4 out_frag_color1;
// ...
void main() {
    out_frag_color0 = ...; // Calculating lighting and other stuff
    //...
    const int max_byte1 = 256;
    const int max_byte2 = 65536;
    const float fmax_byte = 255.0;
    int a1 = in_object_id % max_byte1;
    int a2 = (in_object_id / max_byte1) % max_byte1;
    int a3 = (in_object_id / max_byte2) % max_byte1;
    //out_frag_color0 = vec4(a3 / fmax_byte, a2 / fmax_byte, a1 / fmax_byte, 1);
    out_frag_color1 = vec4(a3 / fmax_byte, a2 / fmax_byte, a1 / fmax_byte, 1);
}
(The point of that code is to use RGB space to store the geometry ID, which is then read back and used to change the color of the cube.)
This happens when I move the cursor one pixel to the left:
Because of the alpha value of the cube pixel:
Without multisampling it works well. But multisampling blends my output color, and the geometry ID is then corrupted, so it selects a random cube based on the blended value.
(Edit) I can't attach a multisample texture target to color0 and a non-multisample texture target to color1; it's not supported. How can I do this in one draw call?
Multisampling is not my friend; I am not sure if I understand it well (the whole framebuffer business). Anyway, this way of picking geometry looks horrible to me (I mean encoding the ID into a color). Am I doing it right? How can I solve the multisample problem? Is there a better way?
PS: Sorry for my poor English. :)
Thanks.
You can't do multisampled and non-multisampled rendering in a single draw call.
As you already found, using two color targets in an FBO, with only one of them being multisampled, is not supported. From the "Framebuffer Completeness" section in the spec:
The value of RENDERBUFFER_SAMPLES is the same for all attached renderbuffers; the value of TEXTURE_SAMPLES is the same for all attached textures; and, if the attached images are a mix of renderbuffers and textures, the value of RENDERBUFFER_SAMPLES matches the value of TEXTURE_SAMPLES.
You also can't render to multiple framebuffers at the same time. There is always one single current framebuffer.
The only reasonable option I can think of is to do picking in a separate pass. Then you can easily switch the framebuffer/attachment to a non-multisampled renderbuffer, and avoid all these issues.
Using a separate pass for picking seems cleaner to me anyway. This also allows you to use a specialized shader for each case, instead of always producing two outputs even if one of them is mostly unused.
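For illustration, a rough sketch of such a picking pass (assuming a hypothetical pre-built single-sample FBO pickFbo with an RGBA8 color attachment and a depth attachment, and a drawScenePickingPass() helper that writes only the encoded ID color) could look like this:
// Sketch: render the IDs to a non-multisampled FBO, then read the pixel under the cursor.
glBindFramebuffer(GL_FRAMEBUFFER, pickFbo);
glViewport(0, 0, viewportWidth, viewportHeight);
glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
drawScenePickingPass(); // hypothetical: a shader variant that outputs only the ID color
unsigned char rgba[4] = {};
glReadPixels(cursorX, viewportHeight - cursorY - 1, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
int pickedId = rgba[0] * 65536 + rgba[1] * 256 + rgba[2]; // reverses the a3/a2/a1 encoding above
glBindFramebuffer(GL_FRAMEBUFFER, 0);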
I think it is possible...
You have to set the picking texture to multisampled, and after rendering the scene you can render 2 triangles over the screen; inside another fragment shader you can read out each sample. To do that you have to use the GLSL command:
texelFetch(sampler, pixelposition /* ivec2 in [0, texturesize) */, /* important: the sample index */ samplenumber);
Then you can render it into a single-sampled texture and read the color via glReadPixels.
I haven't tested it, but I think it works.
I am using Nvidia Cg and Direct3D 9 and have a question about the following code.
It compiles, but doesn't "load" (using a cgLoadProgram wrapper), and the resulting failure is described simply as "D3D failure happened".
It's part of a pixel shader compiled with the shader model set to 3.0.
What may be interesting is that this shader loads fine in the following cases:
1) Manually unrolling the while statement (to many if { } statements).
2) Removing the line with the tex2D function in the loop.
3) Switching to shader model 2_X and manually unrolling the loop.
Problem part of the shader code:
float2 tex = float2(1, 1);
float2 dtex = float2(0.01, 0.01);
float h = 1.0 - tex2D(height_texture1, tex);
float height = 1.00;
while ( h < height )
{
    height -= 0.1;
    tex += dtex;
    // Remove the next line and it works (not as expected,
    // of course)
    h = tex2D( height_texture1, tex );
}
If someone knows why this can happen, or could test similar code in a non-Cg environment, or could help me in some other way, I'm waiting for you ;)
Thanks.
I think you need to determine the gradients before the loop using ddx/ddy on the texture coordinates, and then use tex2D(sampler2D samp, float2 s, float2 dx, float2 dy).
The GPU always renders quads of pixels, not individual pixels (even at primitive borders - the superfluous pixels are discarded by the render backend). This is done because it allows the hardware to always calculate the screen-space texture derivatives, even when you use calculated texture coordinates: it just needs to take the difference between the values at the pixel centers.
But this doesn't work when using dynamic branching like the code in the question does, because the shader processors at the individual pixels could diverge in control flow. So you need to calculate the derivatives manually via ddx/ddy before the program flow can diverge.
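As a sketch of what that could look like, reusing the loop from the question (Cg/HLSL, untested):
// Compute the texture-coordinate gradients once, before the control flow can diverge,
// then use the explicit-gradient overload of tex2D inside the loop.
float2 tex = float2(1, 1);
float2 dtex = float2(0.01, 0.01);
float2 dx = ddx(tex);
float2 dy = ddy(tex);
float h = 1.0 - tex2D(height_texture1, tex, dx, dy);
float height = 1.00;
while ( h < height )
{
    height -= 0.1;
    tex += dtex;
    h = tex2D(height_texture1, tex, dx, dy);
}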