How can I "add" Depth information to the main frame buffer - opengl

Let's say I have this scene
And I want to add depth information from a custom made fragment shader.
Now the intuitive thing to do would be to draw a quad over my teapot without the depth test enabled, but with glDepthMask( 1 ) and glColorMask( 0, 0, 0, 0 ): write gl_FragDepth for some fragments and discard the others.
if ( gl_FragCoord.x < 100 )
    gl_FragDepth = 0.1;
else
    discard;
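For reference, a minimal sketch of the GL state such an "add depth" pass might use (illustrative names only; note that OpenGL only writes the depth buffer while GL_DEPTH_TEST is enabled, so this sketch uses glDepthFunc(GL_ALWAYS) rather than disabling the test outright):
// Hypothetical "add depth" pass: write depth only, no color.
glEnable(GL_DEPTH_TEST);                              // depth writes only happen while the test is enabled
glDepthFunc(GL_ALWAYS);                               // ...so pass every fragment instead of disabling the test
glDepthMask(GL_TRUE);                                 // allow depth writes
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  // no color writes
drawFullScreenQuad();                                 // hypothetical helper drawing the quad over the scene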
For some reason, on an NVidia Quadro 600 and a K5000 it works as expected, but on an NVidia K3000M and a FirePro (don't remember which one), all the area covered by my discarded fragments is given the depth value of the quad.
Can't I leave the discarded fragments depth values unmodified?
EDIT I have found a solution to my problem. It turns out that, as Andon M. Coleman and Matt Fishman pointed out, I have early_fragment_tests enabled, not because I enabled it myself, but because I use imageStore / imageLoad.
With the little time I had to address the problem, I simply copied the content of my current depth buffer to a texture just before the "add depth" pass, assigned that texture to a uniform sampler2D, and used this code in my shader:
if ( gl_FragCoord.x < 100 )
    gl_FragDepth = 0.1;
else
{
    // re-emit the depth already stored at this pixel
    gl_FragDepth = texelFetch( depthTex, ivec2( gl_FragCoord.xy ), 0 ).r;
    color = vec4( 0.0, 0.0, 0.0, 0.0 );
    return;
}
This writes a completely transparent pixel with an unchanged z value.
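The post does not show how the depth buffer was copied; one common way (a sketch with illustrative FBO names) is to blit the depth attachment of the scene framebuffer into a framebuffer whose depth attachment is the texture bound to depthTex, just before the pass:
// Copy the scene's current depth into the texture sampled as "depthTex".
glBindFramebuffer(GL_READ_FRAMEBUFFER, sceneFBO);      // source: FBO holding the rendered scene
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, depthCopyFBO);  // destination: FBO whose depth attachment is the copy texture
glBlitFramebuffer(0, 0, width, height,
                  0, 0, width, height,
                  GL_DEPTH_BUFFER_BIT, GL_NEAREST);     // depth blits must use GL_NEAREST
glBindFramebuffer(GL_FRAMEBUFFER, sceneFBO);           // resume rendering to the scene FBO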

Well, it sounds like a driver bug to me -- discarded fragments should not hit the depth buffer. You could bind the original depth buffer as a texture, sample it using gl_FragCoord, and then write the result back instead of using discard. That would add an extra texture lookup -- but it might be a suitable workaround.
EDIT: From section 6.4 of the GLSL 4.40 Specification:
The discard keyword is only allowed within fragment shaders. It can be used within a fragment shader to abandon the operation on the current fragment. This keyword causes the fragment to be discarded and no updates to any buffers will occur. Control flow exits the shader, and subsequent implicit or explicit derivatives are undefined when this exit is non-uniform. It would typically be used within a conditional statement, for example:
if (intensity < 0.0) discard;
A fragment shader may test a fragment’s alpha value and discard the fragment based on that test. However, it should be noted that coverage testing occurs after the fragment shader runs, and the coverage test can change the alpha value.
(emphasis mine)

Posting a separate answer, because I found some new info in the OpenGL spec:
If early fragment tests are enabled, any depth value computed by the fragment shader has no effect. Additionally, the depth buffer, stencil buffer, and occlusion query sample counts may be updated even for fragments or samples that would be discarded after fragment shader execution due to per-fragment operations such as alpha-to-coverage or alpha tests.
Do you have early fragment testing enabled? More info: Early Fragment Test
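For reference, the explicit way to force early fragment tests in GLSL (4.20+ / GL_ARB_shader_image_load_store) is a layout qualifier in the fragment shader:
// Run the depth/stencil tests before this fragment shader executes.
// With this in place, writes to gl_FragDepth are ignored, and discard may no longer
// prevent the depth write that has already happened.
layout(early_fragment_tests) in;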

Related

MSAA and vertex interpolation cause out-of-range values

I am using GLSL with a vertex shader and a fragment shader.
The vertex shader outputs a highp float in the range [0,1].
When it arrives in the fragment shader, I see values (at triangle edges) that exceed 1.1, no less!
This issue goes away if I...
either disable MSAA,
or disable the interpolation by using the GLSL interpolation qualifier flat.
How can a clamped 0-to-1 high precision float arrive in fragment shader as a value that is substantially larger than 1, if MSAA is enabled?
vertex shader code:
out highp float lightcontrib2;
...
lightcontrib2 = clamp( irrad, 0.0, 1.0 );
fragment shader code:
in highp float lightcontrib2;
...
if (lightcontrib2>1.1) { fragColor = vec4(1,0,1,1); return; }
And sure enough, with MSAA 4x, this is the image generated by OpenGL. (Observe the magenta coloured pixels in the centre of the window.)
I've ruled out Not-A-Number values.
GL_VERSION: 3.2.0 NVIDIA 450.51.06
How can a clamped 0-to-1 high precision float arrive in fragment shader as a value that is substantially larger than 1, if MSAA is enabled?
Multisampling at its core is a variation of supersampling: of taking multiple samples from a pixel-sized area of a primitive. Different locations within the space of that pixel-sized area are sampled to produce the resulting value.
When you're at the edge of a primitive however, some of the locations in that pixel-sized area are outside of the area that the primitive actually covers. In supersampling that's fine; you just don't use those samples.
However, multisampling is different. In multisampling, the depth samples are distinct from the fragment shader generated samples. That is, the system might execute the FS only once, but take 4 depth samples and test them against 4 samples in the depth buffer. Any samples that pass the depth test get their color values from the single FS invocation that was executed. If some of those 4 depth samples are outside of the primitive's area, that's fine; they don't count.
However, by divorcing the FS invocation values from the depth sampling, we now encounter an issue: exactly where did that single FS invocation execute within the pixel area?
And that's where we encounter the problem. If the FS invocation executes on a location that's outside of the area of the primitive, that normally gets tossed away. But if any depth samples are within the area of the primitive, then those depth samples still need to get color data. And the whole point of MSAA is to not execute the FS for each sample, so they may get their color data from an FS invocation executed on a different location.
Ideally, it would be from an FS invocation executed on a location within the primitive's area. But hardware can't guarantee that. Well, it can't guarantee it by default, at any rate. Not every algorithm has issues if an FS location happens to fall slightly outside of the primitive's area.
But some algorithms do have issues. This is why we have the centroid qualifier for fragment shader inputs. It ensures that a particular interpolated value will be generated within the area of the primitive.
As you might have guessed, this isn't the default because it's slower than non-centroid interpolation. So use it only when you need it.
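Applied to the code in the question, the fix is a small change to the declarations (a sketch; the variable is the one from the question, only the qualifier is added, in both stages):
// Vertex shader
centroid out highp float lightcontrib2;   // interpolate at a location covered by the primitive
// Fragment shader
centroid in highp float lightcontrib2;    // the value now comes from inside the primitive's area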

Using 'discard' in GLSL 4.1 fragment shader with multisampling

I'm attempting depth peeling with multisampling enabled, and having some issues with incorrect data ending up in my transparent layers. I use the following to check if a sample (originally a fragment) is valid for this pass:
float depth = texelFetch(depthMinima, ivec2(gl_FragCoord.xy), gl_SampleID).r;
if (gl_FragCoord.z <= depth)
{
    discard;
}
Where depthMinima is defined as
uniform sampler2DMS depthMinima;
I have enabled GL_SAMPLE_SHADING which, if I understand correctly, should result in the fragment shader being called on a per-sample basis. If this isn't the case, is there a way I can get this to happen?
The result is that the first layer or two look right, but beneath that (and I'm doing 8 layers) I start getting junk values - mostly plain blue, sometimes values from previous layers.
This works fine for single-sampled buffers, but not for multi-sampled buffers. Does the discard keyword still discard the entire fragment?
I have enabled GL_SAMPLE_SHADING which, if I understand correctly, should result in the fragment shader being called on a per-sample basis.
It's not enough to only enable GL_SAMPLE_SHADING. You also need to set:
glMinSampleShading(1.0f)
A value of 1.0 indicates that each sample in the framebuffer should be independently shaded. A value of 0.0 effectively allows the GL to ignore sample rate shading. Any value between 0.0 and 1.0 allows the GL to shade only a subset of the total samples within each covered fragment. Which samples are shaded and the algorithm used to select that subset of the fragment's samples is implementation dependent.
– glMinSampleShading
In other words 1.0 tells it to shade all samples. 0.5 tells it to shade at least half the samples.
// Check the current value
GLfloat value;
glGetFloatv(GL_MIN_SAMPLE_SHADING_VALUE, &value);
If either GL_MULTISAMPLE or GL_SAMPLE_SHADING is disabled then sample shading has no effect.
There'll be multiple fragment shader invocations for each fragment, with each invocation covering a subset of the fragment's samples. In other words, sample shading specifies the minimum fraction of a fragment's samples that must be shaded independently.
If GL_MIN_SAMPLE_SHADING_VALUE is set to 1.0, a fragment shader invocation will be issued for each sample (within the primitive).
If it's set to 0.5, there'll be a shader invocation for every second sample. The number of invocations per fragment is:
max(ceil(MIN_SAMPLE_SHADING_VALUE * SAMPLES), 1)
Each invocation is evaluated at its own sample location (gl_SamplePosition), with gl_SampleID being the index of the sample that is currently being processed.
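Putting the setup together, a minimal host-side sketch (the shader then reads gl_SampleID / gl_SamplePosition as above):
// Request full per-sample shading (standard GL 4.0+ / ARB_sample_shading API).
glEnable(GL_MULTISAMPLE);    // normally enabled by default
glEnable(GL_SAMPLE_SHADING);
glMinSampleShading(1.0f);    // shade every covered sample independently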
Should discard work on a per-sample basis, or does it still only work per-fragment?
With or without sample shading, discard still only terminates the current invocation of the shader.
Resources:
ARB_sample_shading
Fragment Shader
Per-Sample Processing
I faced a similar problem when using depth peeling on a multisample buffer.
Some artifacts appeared due to depth test errors when comparing the multisample depth texture from the previous peel against the current fragment's depth.
vec4 previous_peel_depth_tex = texelFetch(previous_peel_depth, coord, 0);
The third argument is the sample you want to use for your comparison, which will give a different value than the one at the fragment center. Like the author said, you can use gl_SampleID:
vec4 previous_peel_depth_tex = texelFetch(previous_peel_depth, ivec2(gl_FragCoord.xy), gl_SampleID);
This solved my problem, but with a huge performance drop: if you have 4 samples you will run your fragment shader 4 times, and with 4 peels that means 4x4 invocations. You don't need to set the OpenGL sample shading flags as long as at least glEnable(GL_MULTISAMPLE); is on, because:
Any static use of [gl_SampleID] in a fragment shader causes the entire
shader to be evaluated per-sample
I decided to use a different approach and add a bias when doing the depth comparison:
float previous_linearized = linearize_depth(previous_peel_depth_tex.r, near, far);
float current_linearized = linearize_depth(gl_FragCoord.z, near, far);
float bias_meter = 0.05;
// assumed (not shown in the post): delta_depth is the difference of the two linearized depths
float delta_depth = current_linearized - previous_linearized;
bool belong_to_previous_peel = delta_depth < bias_meter;
This solved my problem, but some artifacts might still appear, and you need to adjust the bias in your eye-space units (meters, cm, ...).
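linearize_depth is not shown in the post; for a standard perspective projection with a [0,1] depth range it is commonly written along these lines (a sketch under that assumption):
// Convert a non-linear depth-buffer value back to an eye-space distance.
float linearize_depth(float d, float near, float far)
{
    float ndc = d * 2.0 - 1.0;                                   // back to NDC [-1,1]
    return (2.0 * near * far) / (far + near - ndc * (far - near));
}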

Shader execution after writing to gl_FragDepth

Given a fragment shader that modifies the original depth of the fragment and writes to gl_FragDepth.
Will the code after writing to gl_FragDepth still be executed if the depth test fails at that point?
Example:
in float depth;
layout(depth_less) out float gl_FragDepth;
void main() {
    float newDepth = depth - someModification;
    gl_FragDepth = newDepth;
    A();
}
Will A be executed if newDepth is greater than the value currently stored in the depth buffer?
If so, what would be the alternative to stop the shader from doing unnecessary computations in A, without using a custom depth buffer?
In your example, A() will always be executed as long as it contributes to any output value, e.g. a color (otherwise, the compiler will remove it as an optimization). The depth test is a per-sample operation usually performed after the fragment shader. Under special circumstances, it is possible to do this test before the fragment shader, but this requires that the fragment shader itself does not write to gl_FragDepth.
Is the modification you do a uniform one or is it different for each fragment? If it is uniform, you could do it in a geometry shader - just apply the depth modification to the whole primitive. Then you could use early-z. If it's on a per-fragment basis, you could try binding the current depth render target as a readonly image, fetch the stored depth value, perform a manual comparison in your shader and discard the fragment if it fails. However, I neither know whether you can actually bind the currently bound framebuffer's render targets as images, even as readonly, nor whether this would be more or less performant than just executing A().
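A sketch of that manual comparison, using a copy of the depth buffer bound as a plain sampler2D rather than an image (depthCopy and the someModification uniform are assumed names; the test assumes a GL_LESS depth func):
in float depth;
uniform sampler2D depthCopy;        // assumed: a copy of the current depth buffer
uniform float someModification;     // assumed uniform, as used in the question
layout(depth_less) out float gl_FragDepth;

void main() {
    float newDepth = depth - someModification;
    float stored = texelFetch(depthCopy, ivec2(gl_FragCoord.xy), 0).r;
    if (newDepth >= stored)
        discard;                    // would fail the depth test anyway, so skip A() entirely
    gl_FragDepth = newDepth;
    A();                            // A() as in the question
}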

What happens to the depth buffer if I discard a fragment in a shader using early_fragment_tests?

I'm using a fragment shader which discards some fragments using the discard keyword. My shader also uses early_fragment_tests (image load/store obliges).
EDIT: I do not write to gl_FragDepth; I let standard OpenGL handle the depth value.
Will my depth buffer be updated with the fragment's depth before the discard keyword is executed?
EDIT: It does not seem like it on my NVidia Quadro 600 and K5000.
Any clue where I could find that information? FYI, I searched http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt. I found closely related topics, but not that particular one.
Will my depth buffer be updated with the fragment's depth before the discard keyword is executed?
No, this sort of behavior is explicitly forbidden in a shader that contains discard or that writes an arbitrary value to gl_FragDepth. This is because in such a shader, the depth of your fragment after it is shaded may be unrelated to the position generated during initial rasterization (pre-shading).
Without writing to gl_FragDepth or discarding, the depth of a fragment is actually known long before the actual fragment shader executes and this forms the foundation for early depth tests. Rasterization/shading can be skipped for some (individual tiled regions) or all of a primitive if it can be determined that it would have failed a depth test before the fragment shader is evaluated, but if the fragment shader itself is what determines a fragment's depth, then all bets are off.
There is an exception to this rule in DX11 / OpenGL 4.x. If you write your shaders in such a way that you can guarantee the output depth will always preserve the result of a depth test (same result as the depth generated during rasterization), early fragment tests can be enabled in a shader that uses discard or writes to gl_FragDepth. This feature is known as conservative depth, and unless you use this it is generally understood that discard is going to break early depth optimizations across the board.
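For illustration, a conservative depth declaration (GL 4.2+ / GL_ARB_conservative_depth) looks like this; it promises the implementation that any depth you write will only move in the stated direction relative to the rasterized depth, so early rejection can be kept:
// Promise: gl_FragDepth, if written, will always be >= the fragment's rasterized depth,
// so a fragment rejected by an early depth test would also fail with the shader-written depth.
layout(depth_greater) out float gl_FragDepth;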
Now, since you should never write to the depth buffer before you know whether the value you are writing passes or fails a depth test (gl_FragDepth may be different) or if the fragment even survives (discard may be used), you can see why a primitive shaded by a fragment shader that contains discard cannot write to the depth buffer before the shader is evaluated.
I think the information you are looking for is on that page:
If early fragment tests are enabled, any depth value computed by the fragment shader has no effect. Additionally, the depth buffer, stencil buffer, and occlusion query sample counts may be updated even for fragments or samples that would be discarded after fragment shader execution due to per-fragment operations such as alpha-to-coverage or alpha tests.
The word "may" in "the depth buffer, [etc.] may be updated", implies it is implementation dependent (or completely random).

Bind pre rendered depth texture to fbo or to fragment shader?

In a deferred shading framework, I am using different framebuffer objects to perform various render passes. In the first pass I write the DEPTH_STENCIL_ATTACHMENT for the whole scene to a texture, let's call it DepthStencilTexture.
To access the depth information stored in DepthStencilTexture from different render passes, for which I use different framebuffer objects, I know two ways:
1) I bind the DepthStencilTexture to the shader and I access it in the fragment shader, where I do the depth test manually, like this:
uniform vec2 WinSize; //window dimensions
vec2 uv=gl_FragCoord.st/WinSize;
float depth=texture(DepthStencilTexture ,uv).r;
if(gl_FragCoord.z>depth) discard;
I also set glDisable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE)
2) I bind the DepthStencilTexture to the framebuffer object as DEPTH_STENCIL_ATTACHMENT and set glEnable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE) (edit: in this case I won't bind the DepthStencilTexture to the shader, to avoid a feedback loop, see the answer by Nicol Bolas; if I need the depth in the fragment shader I will use gl_FragCoord.z)
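For concreteness, option 2) corresponds to host-side state along these lines (a sketch with an illustrative FBO name):
// Reuse DepthStencilTexture as the depth/stencil attachment of the later pass.
glBindFramebuffer(GL_FRAMEBUFFER, lightingFBO);   // illustrative second-pass FBO
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_STENCIL_ATTACHMENT,
                       GL_TEXTURE_2D, DepthStencilTexture, 0);
glEnable(GL_DEPTH_TEST);   // test against the stored depth...
glDepthMask(GL_FALSE);     // ...but do not overwrite it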
In certain situations, such as drawing light volumes, for which I need the stencil test and writing to the stencil buffer, I am going for solution 2).
In other situations, in which I completely ignore the stencil and just need the depth stored in the DepthStencilTexture, does option 1) give any advantages over the more "natural" option 2)?
For example, I have a (silly, I think) doubt about it. Sometimes in my fragment shaders I compute the WorldPosition from the depth. In case 1) it would be like this:
uniform mat4 invPV; //inverse PV matrix
vec2 uv=gl_FragCoord.st/WinSize;
vec4 WorldPosition=invPV*vec4(uv, texture(DepthStencilTexture ,uv).r ,1.0f );
WorldPosition=WorldPosition/WorldPosition.w;
In case 2) it would be like this (edit: this is wrong; gl_FragCoord.z is the current fragment's depth, not the depth stored in the texture):
uniform mat4 invPV; //inverse PV matrix
vec2 uv=gl_FragCoord.st/WinSize;
vec4 WorldPosition=invPV*vec4(uv, gl_FragCoord.z, 1.0f );
WorldPosition=WorldPosition/WorldPosition.w;
I am assuming that gl_FragCoord.z in case 2) will be the same as texture(DepthStencilTexture ,uv).r in case 1), or, in other words, the depth stored in the DepthStencilTexture. Is it true? Is gl_FragCoord.z read from the currently bound DEPTH_STENCIL_ATTACHMENT even with glDisable(GL_DEPTH_TEST) and glDepthMask(GL_FALSE)?
Going strictly by the OpenGL specification, option 2 is not allowed. Not if you're also reading from that texture.
Yes, I realize you're using write masks to prevent depth writes. It doesn't matter; the OpenGL specification is quite clear. According to section 9.3.1 of the OpenGL 4.4 specification, a feedback loop is established when:
an image from texture object T is attached to the currently bound draw framebuffer object at attachment point A,
the texture object T is currently bound to a texture unit U, and
the current programmable vertex and/or fragment processing state makes it possible (see below) to sample from the texture object T bound to texture unit U.
That is the case in your code. So you technically have undefined behavior.
One reason this is undefined is so that simply changing write masks won't have to do things like clearing framebuffer and/or texture caches.
That being said, you can get away with option 2 if you employ NV_texture_barrier, which, despite the name, is quite widely available on AMD hardware. The main thing to do here is to issue a barrier after you do all of your depth writing, so that all subsequent reads are guaranteed to work. The barrier will do all of the cache clearing and such that you need.
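A sketch of where the barrier goes, assuming NV_texture_barrier is available (glTextureBarrierNV; GL 4.5 offers the equivalent core call glTextureBarrier):
// Pass 1 (hypothetical helper): write the scene's depth into DepthStencilTexture.
renderDepthPass();

// Make all prior depth writes visible to subsequent texture fetches.
glTextureBarrierNV();

// Pass 2 (hypothetical helper): DepthStencilTexture stays attached with depth writes
// masked off, and is also sampled by the shader.
glDepthMask(GL_FALSE);
renderLightingPass();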
Otherwise, option 1 is the only choice: doing the depth test manually.
I am assuming that gl_FragCoord.z in case 2) will be the same as texture(DepthStencilTexture ,uv).r in case 1), or, in other words, the depth stored in the DepthStencilTexture. Is it true?
Neither is true. gl_FragCoord is the coordinate of the fragment being processed. This is the fragment generated by the rasterizer, based on the data for the primitive being rasterized. It has nothing to do with the contents of the framebuffer.