I want to have discard in my fragment shader prevent that fragment from writing to the stencil buffer; I had an idea for a visual effect in my game, and it might be implementable if I can rely on that behaviour for discard.
My fragment shader doesn't do anything very fancy: it samples a texture and, if the alpha is below a threshold, discards; otherwise it writes the sampled colour.
uniform sampler2D tex;
in vec2 texCoord;
out vec4 FragColor;

void main()
{
    vec4 texColor = texture(tex, texCoord);
    if (texColor.a < 0.5)
    {
        discard;
    }
    FragColor = texColor;
}
However, it sounds like I cannot rely on discard preventing the stencil buffer from getting written to:
If Stencil Testing is active, then discarded fragments can still affect the stencil buffer. The stencil test can modify the stencil buffer, even on a stencil or depth test failure. And since the stencil test happens before the depth test, the depth test failures cannot prevent the stencil test from updating the stencil buffer.
I do need the stencil test active while I'm rendering. Can I not rely on discard preventing writes to the stencil buffer?
By the normal rules of OpenGL's order of operations, the stencil test happens after the execution of the fragment shader. As such, a fragment shader that executes a discard statement prevents that fragment from writing to any framebuffer image, stencil buffer included. With one exception.
That exception is that the order of operations can be changed by explicit request of the fragment shader. If no such request is given, then things must proceed as defined by the original order (which is why using discard prevents the implementation from performing early fragment tests as an optimization).
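For reference, the explicit request in question is the early_fragment_tests layout qualifier from GL 4.2 / ARB_shader_image_load_store; applied to the shader from the question, a sketch would look like this:

#version 420
layout(early_fragment_tests) in; // depth/stencil tests, and their buffer writes, now happen before this shader runs

uniform sampler2D tex;
in vec2 texCoord;
out vec4 FragColor;

void main()
{
    vec4 texColor = texture(tex, texCoord);
    if (texColor.a < 0.5)
    {
        discard; // with the qualifier above, this can no longer prevent the stencil/depth update
    }
    FragColor = texColor;
}

Leave that qualifier out and the ordering works in your favour: discard then prevents the stencil write.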
The paragraph in question is talking about how part of the operation called the "stencil test" includes potentially updating the stencil buffer. That is, the stencil test can both discard a fragment and change the stencil buffer's value based on how the fragment failed. This is unlike the depth test, which never updates the depth buffer's value if it fails.
Related
Given a fragment shader that modifies the fragment's original depth and writes the result to gl_FragDepth: will the code after the write to gl_FragDepth still be executed if the depth test fails at that point?
Example:
in float depth;
layout(depth_less) out float gl_FragDepth;

void main() {
    float newDepth = depth - someModification;
    gl_FragDepth = newDepth;
    A();
}
Will A be executed if newDepth is greater than the value currently stored in the depth buffer?
If so, what would be the alternative to stop the shader from doing unnecessary computations in A, without using a custom depth buffer?
In your example, A() will always be executed as long as it contributes to any output value, e.g. a color (otherwise, the compiler will remove it as an optimization). The depth test is a per-sample operation usually performed after the fragment shader. Under special circumstances, it is possible to do this test before the fragment shader, but this requires that the fragment shader itself does not write to gl_FragDepth.
Is the modification you do a uniform one, or is it different for each fragment? If it is uniform, you could do it in a geometry shader: just apply the depth modification to the whole primitive. Then you could use early-z. If it's on a per-fragment basis, you could try binding the current depth render target as a read-only image, fetch the stored depth value, perform a manual comparison in your shader, and discard the fragment if it fails. However, I neither know whether you can actually bind the currently bound framebuffer's render targets as images, even as read-only, nor whether this would be more or less performant than just executing A().
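A rough sketch of that manual-comparison idea, under the assumption that a copy of the current depth buffer is available as an ordinary sampler2D (depthCopy below is a placeholder name) rather than binding the live attachment as an image, which the answer above is unsure is even legal:

in float depth;
uniform float someModification; // however the modification is obtained
uniform sampler2D depthCopy;    // assumed: a copy of the current depth buffer
out vec4 FragColor;

void main() {
    float newDepth = depth - someModification;
    float storedDepth = texelFetch(depthCopy, ivec2(gl_FragCoord.xy), 0).r;
    if (newDepth > storedDepth) // this fragment would fail a GL_LESS depth test anyway
        discard;                // so skip the expensive work in A()
    gl_FragDepth = newDepth;
    FragColor = A();            // A() stands in for the costly work; here assumed to produce the colour so it isn't optimized away
}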
Let's say I have this scene
And I want to add depth information from a custom made fragment shader.
Now the intuitive thing to do would be to draw a quad over my teapot without the depth test enabled, but with glDepthMask( 1 ) and glColorMask( 0, 0, 0, 0 ): write gl_FragDepth for some fragments and discard the others.
if ( gl_FragCoord.x < 100 )
    gl_FragDepth = 0.1;
else
    discard;
For some reason, on an NVidia Quadro 600 and K5000 it works as expected, but on an NVidia K3000M and a FirePro (don't remember which one), all the area covered by my discarded fragments is given the depth value of the quad.
Can't I leave the discarded fragments depth values unmodified?
EDIT I have found a solution to my problem. It turns out that, as Andon M. Coleman and Matt Fishman pointed out, I have early_fragment_tests enabled, not because I enabled it deliberately, but because I use imageStore/imageLoad.
With the little time I had to address the problem, I simply copied the contents of my current depth buffer to a texture just before the "add depth" pass, assigned it to a uniform sampler2D, and this is the code in my shader:
if ( gl_FragCoord.x < 100 )
    gl_FragDepth = 0.1;
else
{
    // re-emit the depth already stored at this pixel
    gl_FragDepth = texelFetch( depthTex, ivec2( gl_FragCoord.xy ), 0 ).r;
    color = vec4( 0.0, 0.0, 0.0, 0.0 );
    return;
}
This writes a completely transparent pixel with an unchanged z value.
Well, it sounds like a driver bug to me -- discarded fragments should not hit the depth buffer. You could bind the original depth buffer as a texture, sample it using the gl_FragCoord, and then write the result back instead of using discard. That would add an extra texture lookup -- but it might be a suitable workaround.
EDIT: From section 6.4 of the GLSL 4.40 Specification:
The discard keyword is only allowed within fragment shaders. It can be used within a fragment shader to abandon the operation on the current fragment. This keyword causes the fragment to be discarded and no updates to any buffers will occur. Control flow exits the shader, and subsequent implicit or explicit derivatives are undefined when this exit is non-uniform. It would typically be used within a conditional statement, for example:

if (intensity < 0.0) discard;

A fragment shader may test a fragment's alpha value and discard the fragment based on that test. However, it should be noted that coverage testing occurs after the fragment shader runs, and the coverage test can change the alpha value.
(emphasis mine)
Posting a separate answer, because I found some new info in the OpenGL spec:
If early fragment tests are enabled, any depth value computed by the fragment shader has no effect. Additionally, the depth buffer, stencil buffer, and occlusion query sample counts may be updated even for fragments or samples that would be discarded after fragment shader execution due to per-fragment operations such as alpha-to-coverage or alpha tests.
Do you have early fragment testing enabled? More info: Early Fragment Test
I'm using a fragment shader which discards some fragments using the discard keyword. My shader also enables early_fragment_tests (my use of image load/store obliges it).
EDIT :
I do not write gl_FragDepth; I let standard OpenGL handle the depth value.
Will my depth buffer be updated with the fragment's depth before the discard keyword is executed?
EDIT :
It does not seem like it on my NVidia Quadro 600 and K5000.
Any clue where I could find that information? FYI, I searched http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt. I found close enough topics but not that particular one.
Will my depth buffer be updated with the fragment's depth before the discard keyword is executed?
No, this sort of behavior is explicitly forbidden in a shader that contains discard or that writes an arbitrary value to gl_FragDepth. This is because in such a shader, the depth of your fragment after it is shaded may be unrelated to the position generated during initial rasterization (pre-shading).
Without writing to gl_FragDepth or discarding, the depth of a fragment is actually known long before the actual fragment shader executes and this forms the foundation for early depth tests. Rasterization/shading can be skipped for some (individual tiled regions) or all of a primitive if it can be determined that it would have failed a depth test before the fragment shader is evaluated, but if the fragment shader itself is what determines a fragment's depth, then all bets are off.
There is an exception to this rule in DX11 / OpenGL 4.x. If you write your shaders in such a way that you can guarantee the output depth will always preserve the result of a depth test (same result as the depth generated during rasterization), early fragment tests can be enabled in a shader that uses discard or writes to gl_FragDepth. This feature is known as conservative depth, and unless you use this it is generally understood that discard is going to break early depth optimizations across the board.
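To make that concrete (an illustration of the mechanism, not code from the posts), a conservative-depth declaration in GLSL 4.2+ / ARB_conservative_depth looks like this:

layout (depth_greater) out float gl_FragDepth; // promise: any depth written here is >= the rasterized depth

void main()
{
    // ... shading ...
    // Pushing the fragment further away keeps the promise, so the hardware may
    // still run the early depth test against the rasterized gl_FragCoord.z.
    gl_FragDepth = gl_FragCoord.z + 0.01; // the offset is an arbitrary illustration
}

The layout(depth_less) redeclaration in the earlier question is the same mechanism with the opposite promise: the written depth is never greater than the rasterized one.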
Now, since you should never write to the depth buffer before you know whether the value you are writing passes or fails a depth test (gl_FragDepth may be different) or if the fragment even survives (discard may be used), you can see why a primitive shaded by a fragment shader that contains discard cannot write to the depth buffer before the shader is evaluated.
I think the information you are looking for is on that page:
If early fragment tests are enabled, any depth value computed by the fragment shader has no effect. Additionally, the depth buffer, stencil buffer, and occlusion query sample counts may be updated even for fragments or samples that would be discarded after fragment shader execution due to per-fragment operations such as alpha-to-coverage or alpha tests.
The word "may" in "the depth buffer, [etc.] may be updated", implies it is implementation dependent (or completely random).
In a deferred shading framework, I am using different framebuffer objects to perform various render passes. In the first pass I write the DEPTH_STENCIL_ATTACHMENT for the whole scene to a texture; let's call it DepthStencilTexture.
To access the depth information stored in DepthStencilTexture from different render passes, for which I use different framebuffer objects, I know two ways:
1) I bind the DepthStencilTexture to the shader and access it in the fragment shader, where I do the depth test manually, like this:
uniform sampler2D DepthStencilTexture;
uniform vec2 WinSize; // window dimensions

vec2 uv = gl_FragCoord.st / WinSize;
float depth = texture( DepthStencilTexture, uv ).r;
if ( gl_FragCoord.z > depth ) discard;
I also set glDisable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE)
2) I bind the DepthStencilTexture to the framebuffer object as the DEPTH_STENCIL_ATTACHMENT and set glEnable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE) (edit: in this case I won't bind the DepthStencilTexture to the shader, to avoid a feedback loop, see the answer by Nicol Bolas, and if I need the depth in the fragment shader I will use gl_FragCoord.z).
In certain situations, such as drawing light volumes, for which I need the Stencil Test and writing to the stencil buffer, I am going for the solution 2).
In other situations, in which I completely ignore the Stencil and just need the depth stored in the DepthStencilTexture, does option 1) give any advantage over the more "natural" option 2)?
For example, I have a (silly, I think) doubt about it. Sometimes in my fragment shaders I compute the WorldPosition from the depth. In case 1) it would be like this:
uniform mat4 invPV; // inverse PV matrix
vec2 uv = gl_FragCoord.st / WinSize;
// map the [0,1] window-space uv and depth to [-1,1] NDC before unprojecting (default depth range assumed)
vec4 WorldPosition = invPV * vec4( vec3( uv, texture( DepthStencilTexture, uv ).r ) * 2.0 - 1.0, 1.0 );
WorldPosition = WorldPosition / WorldPosition.w;
In case 2) it would be like this (edit: this is wrong, gl_FragCoord.z is the current fragment's depth, not the actual depth stored in the texture):
uniform mat4 invPV; // inverse PV matrix
vec2 uv = gl_FragCoord.st / WinSize;
// same mapping to NDC, but using the incoming fragment's depth instead of the stored one
vec4 WorldPosition = invPV * vec4( vec3( uv, gl_FragCoord.z ) * 2.0 - 1.0, 1.0 );
WorldPosition = WorldPosition / WorldPosition.w;
I am assuming that gl_FragCoord.z in case 2) will be the same as texture(DepthStencilTexture, uv).r in case 1), or, in other words, the depth stored in the DepthStencilTexture. Is that true? Is gl_FragCoord.z read from the currently bound DEPTH_STENCIL_ATTACHMENT even with glDisable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE)?
Going strictly by the OpenGL specification, option 2 is not allowed. Not if you're also reading from that texture.
Yes, I realize you're using write masks to prevent depth writes. It doesn't matter; the OpenGL specification is quite clear. In accord with 9.3.1 of OpenGL 4.4, a feedback loop is established when:
an image from texture object T is attached to the currently bound draw framebuffer object at attachment point A,
the texture object T is currently bound to a texture unit U, and
the current programmable vertex and/or fragment processing state makes it possible (see below) to sample from the texture object T bound to texture unit U.
That is the case in your code. So you technically have undefined behavior.
One reason this is left undefined is so that simply changing write masks doesn't force the implementation to do things like flushing framebuffer and/or texture caches.
That being said, you can get away with option 2 if you employ NV_texture_barrier. Which, despite the name, is quite widely available on AMD hardware. The main thing to do here is to issue a barrier after you do all of your depth writing, so that all subsequent reads are guaranteed to work. The barrier will do all of the cache clearing and such you need.
Otherwise, option 1 is the only choice: doing the depth test manually.
I am assuming that gl_FragCoord.z in case 2) will be the same as texture(DepthStencilTexture, uv).r in case 1), or, in other words, the depth stored in the DepthStencilTexture. Is that true?
Neither is true. gl_FragCoord is the coordinate of the fragment being processed. This is the fragment generated by the rasterizer, based on the data for the primitive being rasterized. It has nothing to do with the contents of the framebuffer.
In my project, I used the 'discard' call to perform a customized stencil test, which tries to draw things only in a specified area defined by a stencil texture. Here is the code from my fragment shader:
uniform sampler2D stencilTexture;
uniform float desiredValue;

// get the stencil value from a texture
float value = texture2D( stencilTexture, gl_FragCoord.xy / 1024.0 ).x;
// check if the value equals the desired value; if not, draw nothing
if ( abs( value - desiredValue ) > 0.1 )
{
    discard;
}
This code works, but suffers from a performance problem because of the 'discard' call. Is there an alternative way to do this through GPU shaders? Tell me how.
If you access a texture, you must suffer the performance penalties associated with accessing a texture. In the same way, if you want to stop a fragment from being rendered, you must suffer the performance penalties associated with stopping fragments from being rendered.
This will be true regardless of how you stop that fragment. Whether it's a true stencil test, your shader-based discard, or alpha testing, all of these will encounter the same general performance issues (for hardware where discard leads to any significant performance problems, which is mainly mobile hardware). The only exception is the depth test, and that's because of why certain hardware has problems with discard.
For platforms where discard has a substantial impact in performance, the rendering algorithm works most optimally if the hardware can assume that the depth is the final arbiter of whether a fragment will be rendered (and thus, the fragment with the highest/lowest depth always wins). Therefore, any method of culling the fragment other than the depth test will interfere with this optimization.