Keeping Texture2D object(s) for interpolation in HLSL

Recently someone added a module to OBS Studio which lets anyone incorporate their own shaders into OBS. I've never written a shader before, but after reading some material I get the gist: it's a function that returns a bit of memory representing the RGBA values of a specific pixel.
Here's the issue: I'm too new to this, and it looks to me like there are a few different high-level shader languages. I have no clue which one OBS Studio is using, and the author of https://github.com/nleseul/obs-shaderfilter doesn't seem to know either. Any pointers to the right syntax / documentation would of course be greatly appreciated.
What I'm aiming for is a very dumbed-down motion blur. More specifically, my goal is to keep a few frames in a buffer of some sort to work with; I figure that would be a pretty useful thing for other effects too... and that's where I'm stuck. Here's what I've got, following Shaders for Game Programmers and Artists, pg. 87, adapted to work with the shader plugin*:
uniform float4 blur_factor;
texture2d previmage;
//texture2d image; //this is the input data we're playing around with

float4 mainImage(VertData v_in) : TARGET
{
    float4 originalFrame = image.Sample(textureSampler, v_in.uv);
    float4 oldFrame = previmage.Sample(textureSampler, v_in.uv);

    //this function's going pixel by pixel, I should be pushing
    //to the frame buffer here* or there should be some way of modifying
    //the frame buffer pixel by pixel, second option makes more sense to do
    if(v_in.uv.x == 1 && v_in.uv.y == 1){
        //it couldn't have been this easy...it's never this easy
        //uncommenting the line below clearly causes a problem, I don't have a debugger for this
        //previmage = image;

        //this doesn't work either, wishful thinking
        //previmage = texture2d(image);
    }

    //this may not actually be the function to use for a motion blur but at the very least
    //it's mixing two textures together so that I'd have a proof of concept working*
    return lerp(originalFrame, oldFrame, blur_factor);
}

Fortunately, it turns out that when a shader fails to compile, the error is actually output to OBS's logs. I had a gut feeling that something was up with declaring variables globally. For the initial code I wrote, I got this:
10:53:49.413: C:\Program Files (x86)\obs-studio\bin\64bit\ (Pixel shader, technique Draw, pass 0)(38,40-50): error X3004: undeclared identifier 'blur_factor'
10:53:49.413: C:\Program Files (x86)\obs-studio\bin\64bit\ (Pixel shader, technique Draw, pass 0)(38,12-51): error X3013: 'lerp': no matching 3 parameter intrinsic function
10:53:49.413: C:\Program Files (x86)\obs-studio\bin\64bit\ (Pixel shader, technique Draw, pass 0)(38,12-51): error X3013: Possible intrinsic functions are:
10:53:49.413: C:\Program Files (x86)\obs-studio\bin\64bit\ (Pixel shader, technique Draw, pass 0)(38,12-51): error X3013: lerp(float|half|min10float|min16float, float|half|min10float|min16float, float|half|min10float|min16float)
So in short, lerp was in fact NOT what I was looking to use, or at least I wasn't calling it with the right data types to begin with. After tossing that, I decided to write the shader assuming I could make an array of texture2d objects...
texture2d previmage[8];

float4 mainImage(VertData v_in) : TARGET
{
    //...etc etc
    //manipulating the array, copying over the texture2d OBS provides
}
Then I get this:
11:12:46.880: C:\Program Files (x86)\obs-studio\bin\64bit\ (Pixel shader, technique Draw, pass 0)(27,3-31): error X3025: global variables are implicitly constant, enable compatibility mode to allow modification
Oh... so global variables in HLSL have a behavior I didn't know about, and that makes this approach moot: I can't modify constant values, and I need some sort of frame buffer.
Potential solution? Static Variables!
float4 mainImage(VertData v_in) : TARGET
{
    int i = 0;
    static texture2d previmage[8] = { image, image, image, image, image, image, image, image };

    if(v_in.uv.x == 1 && v_in.uv.y == 1){
        for(i = 0; i <= 6; i++){
            previmage[i+1] = previmage[i];
        }
        previmage[0] = image;
    }

    float4 sum = 0;
    float4 samples[8];
    for(i = 0; i < 8; i++){
        samples[i] = previmage[i].Sample(textureSampler, v_in.uv);
        sum += samples[i];
    }
    return sum / 8.0;
}
Now, at least the thing isn't throwing errors and it is rendering to the screen. Unfortunately, I don't see the effect: there's potentially a semantic error in here*, or maybe 8 frames just isn't enough to make a noticeable motion blur.
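Just to illustrate the kind of frame buffer I mean: my guess is it would have to live on the host application side rather than in the shader itself. A rough, purely hypothetical sketch in plain D3D11 (not the actual OBS plugin API) of keeping the last few frames around:

#include <d3d11.h>

// Purely hypothetical host-side frame history (plain D3D11, NOT the OBS plugin API):
// each frame the application copies the current source into a ring of textures and
// binds the whole ring, so a pixel shader could blend the last few frames together.
struct FrameHistory {
    static const int kFrames = 8;
    ID3D11Texture2D*          tex[kFrames] = {};  // created elsewhere with the source's description
    ID3D11ShaderResourceView* srv[kFrames] = {};  // one SRV per history texture
    int head = 0;

    void Push(ID3D11DeviceContext* ctx, ID3D11Texture2D* currentFrame)
    {
        head = (head + 1) % kFrames;
        ctx->CopyResource(tex[head], currentFrame);          // keep a copy of this frame
    }

    void Bind(ID3D11DeviceContext* ctx, UINT firstSlot)
    {
        ctx->PSSetShaderResources(firstSlot, kFrames, srv);  // expose the ring to the pixel shader
    }
};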

Related

Uniform affecting shader flow and performance

I was experimenting with OpenGL fragment shaders by doing a huge blur (300*300) in two passes, one horizontal, one vertical.
I noticed that passing the direction as a uniform (vec2) is about 10 times slower than writing it directly in the code (down from 140 fps to 12 fps).
I.e. this:
vec2 dir = vec2(0, 1) / textureSize(tex, 0);
int size = 150;
for(int i = -size; i != size; ++i) {
    float w = // compute weight here...
    acc += w * texture(tex, + coord + vec2(i) * dir);
}
appears to be faster than:
uniform vec2 dir;
/*
...
*/
int size = 150;
for(int i = -size; i != size; ++i) {
    float w = // compute weight here...
    acc += w * texture(tex, + coord + vec2(i) * dir);
}
Creating two programs with different uniforms doesn't change anything.
Does anyone know why there is such a huge difference, and why the driver doesn't see that "inlining" dir might be much faster?
EDIT: Making size a uniform also has an impact, but not as much as dir.
If you are interested in seeing what it looks like (FRAPS provides the FPS counter), I have screenshots of the uniform blur, the "inline" blur, and no blur.
Quick notes: I am running on an NVIDIA GTX 760M using OpenGL 4.2 and GLSL 420. Also, puush's JPEG compression is responsible for the colors in the images.
A good guess would be that the UBOs are stored in shared memory, but might require an occasional round-trip to global memory (vram), while the non-uniform version stores that little piece of data in registers or constant memory.
However, since the OpenGL standard does not dictate where your data is stored, you would have to look at a profiler, and try to gain better understanding of how NVIDIA's GL implementation works.
I'd recommend you start by profiling, using NVIDIA PerfKit or NVIDIA Nsight for Visual Studio, even if you think it's too much trouble for now. If you want to write high-performance code, you should start getting used to the process; you will see how easy it gets eventually.
EDIT:
So why is it so much slower? Because in this case, one failed optimization (data not in registers) can cause other (if not most other) optimizations to also fail. And, coincidentally, optimizations are absolutely necessary for GPU code to run fast.

Simple curiosity about performance using OpenGL and GLSL

I'm developing a small 3D engine using OpenGL and GLSL.
Here's part of the rendering code:
void video::RenderBatch::Render(void)
{
    type::EffectPtr pShaderEffect = EffectManager::GetSingleton()
        .FindEffectByName(this->m_pMaterial->GetAssocEffectName());

    pShaderEffect->Bind();
    {
        ///VERTEX ATTRIBUTES LOCATIONS.
        {
            pShaderEffect->BindAttribLocation(scene::VERTEX_POSITION, "VertexPosition");
            pShaderEffect->BindAttribLocation(scene::VERTEX_TEXTURE, "VertexTexture");
            pShaderEffect->BindAttribLocation(scene::VERTEX_NORMAL, "VertexNormal");
        }
        //SEND MATRIX UNIFORMS.
        {
            glm::mat3 normalMatrix = glm::mat3(glm::vec3(this->m_ModelViewMatrix[0]),
                glm::vec3(this->m_ModelViewMatrix[1]), glm::vec3(this->m_ModelViewMatrix[2]));

            pShaderEffect->SetUniform("ModelViewProjMatrix", this->m_ModelViewProjMatrix);
            pShaderEffect->SetUniform("ModelViewMatrix", this->m_ModelViewMatrix);
            pShaderEffect->SetUniform("NormalMatrix", normalMatrix);
        }
        this->SendLightUniforms(pShaderEffect);  //LIGHT MATERIALS TO BE SENT JUST ONCE
        pShaderEffect->SendMaterialUniforms(     //SEND MATERIALS IF CHANGED
            this->m_pMaterial->GetName());

        this->m_pVertexArray->Lock();
        {
            this->m_pIndexBuffer->Lock();
            {
                RenderData renderData = this->GetVisibleGeometryData();
                {
                    glMultiDrawElements(GL_TRIANGLES, (GLsizei*)&renderData.count[0], GL_UNSIGNED_INT,
                        (const GLvoid **)&renderData.indices[0], renderData.count.size());
                }
            }
            this->m_pIndexBuffer->Unlock();
        }
        this->m_pVertexArray->Unlock();
    }
    pShaderEffect->Release();
}
I noticed that calling the 'SetUniform' function causes a huge loss of FPS (from more than 1000 FPS without it to around 65 FPS with it!). Just ONE simple call of this function is enough!
Here's the code of 'SetUniform' (for 4x4 matrices):
void video::IEffectBase::SetUniform(char const *pName, glm::mat4 mat)
{
    int location = glGetUniformLocation(this->m_Handle, pName);
    if (location >= 0)
        glUniformMatrix4fv(location, 1, GL_FALSE, glm::value_ptr(mat));
}
In reality, just the call to 'glGetUniformLocation' or to 'glUniformMatrix4fv' is enough to cause such a loss of FPS. Is it normal to go from over 1000 FPS to 65 FPS with a single call to this function? Buffer binding or shader program binding don't have such an effect! (If I comment out all the 'SetUniform' calls I still have more than 1000 FPS, even with all the bindings (state changes)!)
So, to sum up the situation: all the functions I need to send uniform information to the shader program (matrices, material data and so on) seem to have a huge impact on the frame rate. However, in this example my scene is composed of a single cube mesh! Nothing terrible for the GPU to render!
But I don't think the problem comes from the GPU, because the impact of my program on it is laughable (according to 'GPUShark'):
Only 6%! And just displaying the window (without the geometry) is enough to reach 6%! So rendering my cube has almost no impact on the GPU. I think the problem comes from the CPU-to-GPU data transfer... I understand there is some cost to using these functions, but going from more than 1000 FPS to 65 FPS is incredible! And just to draw a simple piece of geometry!
Is there a way to get better performance, or is it normal to have such a loss of FPS with this way of sending data?
What do you think about that?
Thank you very much for your help!
Don't call glGetUniformLocation every time you need to set a uniform's value. Uniform locations don't change for a given shader (unless you recompile it), so look up the uniforms once after compiling the shader and save the location values for use in your Render function.
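For example, a minimal sketch of that caching (assuming GLEW and GLM; the struct and helper names are just placeholders, and the uniform names are taken from your code):

#include <GL/glew.h>
#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>

// Look the locations up once, right after the program is linked.
struct EffectUniforms {
    GLint modelViewProj;
    GLint modelView;
    GLint normalMatrix;
};

EffectUniforms CacheUniformLocations(GLuint program)
{
    EffectUniforms u;
    u.modelViewProj = glGetUniformLocation(program, "ModelViewProjMatrix");
    u.modelView     = glGetUniformLocation(program, "ModelViewMatrix");
    u.normalMatrix  = glGetUniformLocation(program, "NormalMatrix");
    return u;
}

// Per frame: no string lookups, just upload through the cached locations.
// Assumes the program is currently bound (glUseProgram).
void UploadMatrices(const EffectUniforms& u, const glm::mat4& mvp,
                    const glm::mat4& mv, const glm::mat3& normal)
{
    glUniformMatrix4fv(u.modelViewProj, 1, GL_FALSE, glm::value_ptr(mvp));
    glUniformMatrix4fv(u.modelView,     1, GL_FALSE, glm::value_ptr(mv));
    glUniformMatrix3fv(u.normalMatrix,  1, GL_FALSE, glm::value_ptr(normal));
}

That way the per-frame path only issues glUniform* calls with integer locations and never does a string lookup.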

DirectX using multiple Render Targets as input to each other

I have a fairly simple DirectX 11 framework setup that I want to use for various 2D simulations. I am currently trying to implement the 2D Wave Equation on the GPU. It requires I keep the grid state of the simulation at 2 previous timesteps in order to compute the new one.
How I went about it was this - I have a class called FrameBuffer, which has the following public methods:
bool Initialize(D3DGraphicsObject* graphicsObject, int width, int height);
void BeginRender(float clearRed, float clearGreen, float clearBlue, float clearAlpha) const;
void EndRender() const;
// Return a pointer to the underlying texture resource
const ID3D11ShaderResourceView* GetTextureResource() const;
In my main draw loop I have an array of 3 of these buffers. Each iteration I use the textures from the previous 2 buffers as inputs to the next frame buffer, and I also draw any user input to change the simulation state. I then draw the result.
int nextStep = simStep + 1;
if (nextStep > 2)
    nextStep = 0;

mFrameArray[nextStep]->BeginRender(0.0f, 0.0f, 0.0f, 1.0f);
{
    mGraphicsObj->SetZBufferState(false);
    mQuad->GetRenderer()->RenderBuffers(d3dGraphicsObj->GetDeviceContext());
    ID3D11ShaderResourceView* texArray[2] = { mFrameArray[simStep]->GetTextureResource(),
                                              mFrameArray[prevStep]->GetTextureResource() };
    result = mWaveShader->Render(d3dGraphicsObj, mQuad->GetRenderer()->GetIndexCount(), texArray);
    if (!result)
        return false;

    // perform any extra input
    I_InputSystem *inputSystem = ServiceProvider::Instance().GetInputSystem();
    if (inputSystem->IsMouseLeftDown()) {
        int x, y;
        inputSystem->GetMousePos(x, y);
        int width, height;
        mGraphicsObj->GetScreenDimensions(width, height);
        float xPos = MapValue((float)x, 0.0f, (float)width, -1.0f, 1.0f);
        float yPos = MapValue((float)y, 0.0f, (float)height, -1.0f, 1.0f);
        mColorQuad->mTransform.position = Vector3f(xPos, -yPos, 0);
        result = mColorQuad->Render(&viewMatrix, &orthoMatrix);
        if (!result)
            return false;
    }
    mGraphicsObj->SetZBufferState(true);
}
mFrameArray[nextStep]->EndRender();

prevStep = simStep;
simStep = nextStep;

ID3D11ShaderResourceView* currTexture = mFrameArray[nextStep]->GetTextureResource();

// Render texture to screen
mGraphicsObj->SetZBufferState(false);
mQuad->SetTexture(currTexture);
result = mQuad->Render(&viewMatrix, &orthoMatrix);
if (!result)
    return false;
mGraphicsObj->SetZBufferState(true);
The problem is that nothing is happening. Whatever I draw appears on the screen (I draw using a small quad), but no part of the simulation is actually run. I can provide the shader code if required, but I am certain it works, since I've implemented this before on the CPU using the same algorithm. I'm just not certain how D3D render targets work and whether I'm simply drawing wrong every frame.
EDIT 1:
Here is the code for the begin and end render functions of the frame buffers:
void D3DFrameBuffer::BeginRender(float clearRed, float clearGreen, float clearBlue, float clearAlpha) const {
    ID3D11DeviceContext *context = pD3dGraphicsObject->GetDeviceContext();
    context->OMSetRenderTargets(1, &(mRenderTargetView._Myptr), pD3dGraphicsObject->GetDepthStencilView());

    // Setup the color to clear the buffer to.
    float color[4];
    color[0] = clearRed;
    color[1] = clearGreen;
    color[2] = clearBlue;
    color[3] = clearAlpha;

    // Clear the back buffer.
    context->ClearRenderTargetView(mRenderTargetView.get(), color);
    // Clear the depth buffer.
    context->ClearDepthStencilView(pD3dGraphicsObject->GetDepthStencilView(), D3D11_CLEAR_DEPTH, 1.0f, 0);
}

void D3DFrameBuffer::EndRender() const {
    pD3dGraphicsObject->SetBackBufferRenderTarget();
}
Edit 2: OK, after I set up the DirectX debug layer I saw that I was using an SRV as a render target while it was still bound to the pixel stage in one of the shaders. I fixed that by setting the shader resources to NULL after rendering with the wave shader, but the problem still persists: nothing actually gets run or updated. I took the render target code from here and slightly modified it, if it's any help: http://rastertek.com/dx11tut22.html
Okay, if I understand correctly, you need multipass rendering to texture.
Basically you do it like I've described here: link
You create the textures with both the D3D11_BIND_SHADER_RESOURCE and D3D11_BIND_RENDER_TARGET bind flags.
You create render target views from those textures.
You set the first texture as input (*SetShaderResources()) and the second texture as output (OMSetRenderTargets()).
You Draw()*.
Then you bind the second texture as input and the third as output.
Draw()*.
etc.
Additional advice:
If your target GPU is capable of writing to UAVs from non-compute shaders, you can use that. It is much simpler and less error-prone.
If your target GPU is suitable, consider using a compute shader. It is a pleasure to work with.
Don't forget to enable the DirectX debug layer. Sometimes we make obvious errors, and the debug output can point to them.
Use a graphics debugger to review your textures after each draw call.
Edit 1:
As far as I can see, you call BeginRender and OMSetRenderTargets only once, so all rendering goes into mRenderTargetView. But what you need is to interleave:
SetSRV(texture1);
SetRT(texture2);
Draw();
SetSRV(texture2);
SetRT(texture3);
Draw();
SetSRV(texture3);
SetRT(backBuffer);
Draw();
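For illustration, here's a minimal D3D11 sketch of one step of that interleaving (the view arrays, indices, and index count are hypothetical placeholders; error handling omitted):

#include <d3d11.h>

// One simulation step of the interleaving above. rtv/srv are views over the same
// three textures; prev/prevPrev/next index them, backBufferRTV is the swap chain's RTV.
void PingPongStep(ID3D11DeviceContext* context,
                  ID3D11RenderTargetView* rtv[3], ID3D11ShaderResourceView* srv[3],
                  int prev, int prevPrev, int next,
                  ID3D11RenderTargetView* backBufferRTV, UINT indexCount)
{
    ID3D11ShaderResourceView* nullSRVs[2] = { nullptr, nullptr };

    // Simulation pass: read the two previous states, write into the next one.
    context->PSSetShaderResources(0, 2, nullSRVs);          // unbind inputs before rebinding the RT
    context->OMSetRenderTargets(1, &rtv[next], nullptr);
    ID3D11ShaderResourceView* inputs[2] = { srv[prev], srv[prevPrev] };
    context->PSSetShaderResources(0, 2, inputs);
    context->DrawIndexed(indexCount, 0, 0);

    // Display pass: switch the render target to the back buffer first,
    // then bind the freshly written texture as input and draw it.
    context->PSSetShaderResources(0, 2, nullSRVs);
    context->OMSetRenderTargets(1, &backBufferRTV, nullptr);
    context->PSSetShaderResources(0, 1, &srv[next]);
    context->DrawIndexed(indexCount, 0, 0);
}

The important part is that a texture is never bound as a shader resource and a render target at the same time, which is exactly what the debug layer was complaining about.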
Also, we don't know what mRenderTargetView is yet.
So, before
result = mColorQuad->Render(&viewMatrix,&orthoMatrix);
there must be an OMSetRenderTargets call somewhere.
It's probably better to review your Begin()/End() design to make the resource binding more clearly visible.
Happy coding! =)

C++ shader question

I am using NVIDIA Cg and Direct3D 9 and have a question about the following code.
It compiles, but doesn't "load" (using a cgLoadProgram wrapper), and the resulting failure is described simply as "D3D failure happened".
It's part of a pixel shader compiled with the shader model set to 3.0.
What may be interesting is that this shader loads fine in the following cases:
1) Manually unrolling the while statement (to many if { } statements).
2) Removing the line with the tex2D function in the loop.
3) Switching to shader model 2_X and manually unrolling the loop.
Problem part of the shader code:
float2 tex = float2(1, 1);
float2 dtex = float2(0.01, 0.01);
float h = 1.0 - tex2D(height_texture1, tex);
float height = 1.00;
while ( h < height )
{
    height -= 0.1;
    tex += dtex;
    // Remove the next line and it works (not as expected,
    // of course)
    h = tex2D( height_texture1, tex );
}
If someone knows why this can happen, or could test similar code in a non-Cg environment, or could help me in some other way, I'm waiting for you ;)
Thanks.
I think you need to determine the gradients before the loop using ddx/ddy on the texture coordinates, and then use tex2D(sampler2D samp, float2 s, float2 dx, float2 dy).
The GPU always renders quads, not pixels (even on pixel borders; superfluous pixels are discarded by the render backend). This is done because it allows the GPU to always calculate the screen-space texture derivatives, even when you use calculated texture coordinates: it just takes the difference between the values at the pixel centers.
But this doesn't work when using dynamic branching like the code in the question, because the shader processors at the individual pixels could diverge in control flow. So you need to calculate the derivatives manually via ddx/ddy before the program flow can diverge.

HLSL and ID3DXFont/ID3DXSprite

I've started at the beginning, and my code can capably display the grand total of some text. I've been adding support for sprites. The trouble I've run into is that it doesn't seem to recognize my HLSL. I set the technique, began it, began the pass, drew the sprites, flushed them, ended the pass and then the technique. And D3D comes up with this little "Using FF to PS converter" message in the VS output, and the same for the vertex shader. I'm not trying to do anything advanced with my HLSL - just use it, get a little more familiar with it, and make sure I know how to implement it. That's C++0x auto in the code below, by the way, so automatic type deduction (because I'm lazy).
#define D3DCALL(a) { auto __ = a; if (FAILED(__)) DXTrace(__FILE__, __LINE__, __, WIDEN(#a), TRUE); }
D3DCALL(spriteeffect->SetTechnique(spritetechnique));
D3DCALL(spriteeffect->Begin(&passes, NULL));
D3DCALL(spriteeffect->BeginPass(0)); // We know this is zero.
D3DCALL(sprite->Begin(D3DXSPRITE_OBJECTSPACE | D3DXSPRITE_DO_NOT_ADDREF_TEXTURE | D3DXSPRITE_SORT_TEXTURE | D3DXSPRITE_ALPHABLEND | D3DXSPRITE_SORT_DEPTH_FRONTTOBACK));
RenderAndCleanUp(common->sprites);
D3DCALL(sprite->End());
D3DCALL(spriteeffect->EndPass());
D3DCALL(spriteeffect->End());
where RenderAndCleanUp is a simple templated function that loops through the sprites, destroys those that need to be destroyed, and renders the rest, and common->sprites is a simple vector of all the sprite objects. Since DXTrace never goes off, I can guarantee that none of the functions fail. I've also set the control panel to max debugging.
I checked the D3DXHANDLEs and they're all non-NULL. It doesn't report any compilation errors, or any other errors or warnings.
// Contains the HLSL for sprites.
// Based on transform.fx, by Frank Luna.

// FX parameter (global variable to the shader).
uniform extern float4x4 gWVP;

// Structure
struct OutputVS
{
    float4 posH : POSITION0;
    float4 color : COLOR0;
};

// Vertex shader
OutputVS SpriteVS(float3 post : POSITION0,
                  float4 col : COLOR0)
{
    // Zero out our output.
    OutputVS outVS = (OutputVS)0;
    outVS.posH = mul(float4(post, 1.0f), gWVP); // Transform
    outVS.color = col;
    // Done--return the output.
    return outVS;
}

// Pixel shader - take the original colour of the pixel and just return it. Nothing fancy.
float4 SpritePS( float4 col : COLOR0 ) : COLOR
{
    return col;
}

technique Sprite
{
    pass P0
    {
        // Specify the vertex and pixel shader associated
        // with this pass.
        vertexShader = compile vs_3_0 SpriteVS();
        pixelShader = compile ps_3_0 SpritePS();
    }
}
This is native C++ looking at Direct3D9.
As far as I remember, D3DXSprite and D3DXFont rendering is implemented inside D3DX itself. It sets its own shaders and states (emulating the fixed-function pipeline) and renders the text/sprites, so your shaders and states have no effect on these objects.
You may implement your own text/sprite rendering subsystem; it's not that hard. Another argument for this is that Microsoft has officially deprecated D3DX.
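If you go that route, each sprite is essentially one textured quad. Here's a minimal D3D9 sketch of the idea (a hypothetical helper; render states, the transform setup, and your own effect are assumed to be handled elsewhere, and DrawPrimitiveUP is used only to keep the example short):

#include <d3d9.h>

// One sprite = one textured, colored quad drawn as a two-triangle strip.
struct SpriteVertex { float x, y, z; D3DCOLOR color; float u, v; };
const DWORD kSpriteFVF = D3DFVF_XYZ | D3DFVF_DIFFUSE | D3DFVF_TEX1;

void DrawSprite(IDirect3DDevice9* dev, IDirect3DTexture9* tex,
                float x, float y, float w, float h, D3DCOLOR color)
{
    SpriteVertex quad[4] = {
        { x,     y,     0.0f, color, 0.0f, 0.0f },   // top-left
        { x + w, y,     0.0f, color, 1.0f, 0.0f },   // top-right
        { x,     y + h, 0.0f, color, 0.0f, 1.0f },   // bottom-left
        { x + w, y + h, 0.0f, color, 1.0f, 1.0f },   // bottom-right
    };
    dev->SetFVF(kSpriteFVF);   // position, diffuse, one set of texture coordinates
    dev->SetTexture(0, tex);
    dev->DrawPrimitiveUP(D3DPT_TRIANGLESTRIP, 2, quad, sizeof(SpriteVertex));
}

Because you issue the draw call yourself, whatever effect/technique you have set beforehand actually applies, unlike with ID3DXSprite.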