Using 'discard' in GLSL 4.1 fragment shader with multisampling

I'm attempting depth peeling with multisampling enabled, and having some issues with incorrect data ending up in my transparent layers. I use the following to check if a sample (originally a fragment) is valid for this pass:
float depth = texelFetch(depthMinima, ivec2(gl_FragCoord.xy), gl_SampleID).r;
if (gl_FragCoord.z <= depth)
{
    discard;
}
Where depthMinima is defined as
uniform sampler2DMS depthMinima;
I have enabled GL_SAMPLE_SHADING which, if I understand correctly, should result in the fragment shader being called on a per-sample basis. If this isn't the case, is there a way I can get this to happen?
The result is that the first layer or two look right, but beneath that (and I'm doing 8 layers) I start getting junk values - mostly plain blue, sometimes values from previous layers.
This works fine for single-sampled buffers, but not for multi-sampled buffers. Does the discard keyword still discard the entire fragment?

I have enabled GL_SAMPLE_SHADING which, if I understand correctly, should result in the fragment shader being called on a per-sample basis.
It's not enough to only enable GL_SAMPLE_SHADING. You also need to set:
glMinSampleShading(1.0f);
A value of 1.0 indicates that each sample in the framebuffer should be independently shaded. A value of 0.0 effectively allows the GL to ignore sample rate shading. Any value between 0.0 and 1.0 allows the GL to shade only a subset of the total samples within each covered fragment. Which samples are shaded and the algorithm used to select that subset of the fragment's samples is implementation dependent.
– glMinSampleShading
In other words, 1.0 tells it to shade all samples, and 0.5 tells it to shade at least half the samples.
// Check the current value
GLfloat value;
glGetFloatv(GL_MIN_SAMPLE_SHADING_VALUE, &value);
If either GL_MULTISAMPLE or GL_SAMPLE_SHADING is disabled then sample shading has no effect.
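Putting it together, a minimal setup sketch (assuming a GL 4.x context with a multisampled framebuffer):
// enable multisample rasterization (usually already on with an MSAA framebuffer)
glEnable(GL_MULTISAMPLE);
// request per-sample fragment shader execution
glEnable(GL_SAMPLE_SHADING);
// shade 100% of the covered samples, i.e. one invocation per sample
glMinSampleShading(1.0f);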
There'll be multiple fragment shader invocations for each fragment, with each invocation covering a subset of the fragment's samples. In other words, sample shading specifies the minimum number of samples to process for each fragment.
If GL_MIN_SAMPLE_SHADING_VALUE is set to 1.0 then a fragment shader invocation is issued for each sample (within the primitive).
If it's set to 0.5 then there'll be a shader invocation for at least every second sample; in general the number of invocations per fragment is:
max(ceil(MIN_SAMPLE_SHADING_VALUE * SAMPLES), 1)
Each invocation is evaluated at its own sample location (gl_SamplePosition), with gl_SampleID being the index of the sample currently being processed.
Should discard work on a per-sample basis, or does it still only work per-fragment?
With or without sample shading, discard still only terminates the current invocation of the shader. When shading per-sample, each invocation covers a single sample, so only that sample is discarded.
Resources:
ARB_sample_shading
Fragment Shader
Per-Sample Processing

I faced a similar problem when using depth peeling on a multisample buffer.
Some artifacts appear due to depth-test errors when comparing the multisample depth texture from the previous peel against the current fragment's depth.
vec4 previous_peel_depth_tex = texelFetch(previous_peel_depth, coord, 0);
The third argument is the sample to fetch for the comparison; hard-coding 0 gives a depth value taken at a location different from the sample currently being shaded. As the answer above says, you can use gl_SampleID:
vec4 previous_peel_depth_tex = texelFetch(previous_peel_depth, ivec2(gl_FragCoord.xy), gl_SampleID);
This solved my problem, but with a huge performance drop: with 4 samples the fragment shader runs 4 times per fragment, and with 4 peels that means 4x4 invocations. You don't need to set the OpenGL flags as long as glEnable(GL_MULTISAMPLE) is on, because:
Any static use of [gl_SampleID] in a fragment shader causes the entire shader to be evaluated per-sample
I decided to use a different approach and add a bias when doing the depth comparison:
float previous_linearized = linearize_depth(previous_peel_depth_tex.r, near, far);
float current_linearized = linearize_depth(gl_FragCoord.z, near, far);
float bias_meter = 0.05;
float delta_depth = abs(current_linearized - previous_linearized); // eye-space separation between the peels (assumed definition)
bool belong_to_previous_peel = delta_depth < bias_meter;
This solved my problem, but some artifacts might still appear, and you need to adjust your bias to your eye-space units (meters, cm, ...).
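For reference, the linearize_depth helper isn't shown above; a common implementation for a standard perspective projection looks like this (a sketch under that assumption):
float linearize_depth(float depth, float near, float far)
{
    // map the [0,1] depth-buffer value back to NDC [-1,1],
    // then invert the perspective projection's depth mapping
    float ndc = depth * 2.0 - 1.0;
    return (2.0 * near * far) / (far + near - ndc * (far - near));
}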

Related

MSAA and vertex interpolation cause out-of-range values

I am using GLSL with a vertex shader and a fragment shader.
The vertex shader outputs a highp float in the range of [0,1].
When it arrives in the fragment shader, I see values (at triangle edges) that exceed 1.1, no less!
This issue goes away if I...
either disable MSAA,
or disable the interpolation by using the GLSL interpolation qualifier flat.
How can a clamped 0-to-1 high precision float arrive in fragment shader as a value that is substantially larger than 1, if MSAA is enabled?
vertex shader code:
out highp float lightcontrib2;
...
lightcontrib2 = clamp( irrad, 0.0, 1.0 );
fragment shader code:
in highp float lightcontrib2;
...
if (lightcontrib2>1.1) { fragColor = vec4(1,0,1,1); return; }
And sure enough, with MSAA 4x, that is exactly what OpenGL renders: magenta-coloured pixels appear in the centre of the window.
I've ruled out Not-A-Number values.
GL_VERSION: 3.2.0 NVIDIA 450.51.06
How can a clamped 0-to-1 high precision float arrive in fragment shader as a value that is substantially larger than 1, if MSAA is enabled?
Multisampling at its core is a variation of supersampling: of taking multiple samples from a pixel-sized area of a primitive. Different locations within the space of that pixel-sized area are sampled to produce the resulting value.
When you're at the edge of a primitive however, some of the locations in that pixel-sized area are outside of the area that the primitive actually covers. In supersampling that's fine; you just don't use those samples.
However, multisampling is different. In multisampling, the depth samples are distinct from the fragment shader generated samples. That is, the system might execute the FS only once, but take 4 depth samples and test them against 4 samples in the depth buffer. Any samples that pass the depth test get their color values from the single FS invocation that was executed. If some of those 4 depth samples are outside of the primitive's area, that's fine; they don't count.
However, by divorcing the FS invocation values from the depth sampling, we now encounter an issue: exactly where did that single FS invocation execute within the pixel area?
And that's where we encounter the problem. If the FS invocation executes on a location that's outside of the area of the primitive, that normally gets tossed away. But if any depth samples are within the area of the primitive, then those depth samples still need to get color data. And the whole point of MSAA is to not execute the FS for each sample, so they may get their color data from an FS invocation executed on a different location.
Ideally, it would be from an FS invocation executed on a location within the primitive's area. But hardware can't guarantee that. Well, it can't guarantee it by default, at any rate. Not every algorithm has issues if an FS location happens to fall slightly outside of the primitive's area.
But some algorithms do have issues. This is why we have the centroid qualifier for fragment shader inputs. It ensures that a particular interpolated value will be generated within the area of the primitive.
As you might have guessed, this isn't the default because it's slower than non-centroid interpolation. So use it only when you need it.
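Applied to the shaders in the question, that means one qualifier on each side of the interface (a minimal sketch):
// vertex shader
centroid out highp float lightcontrib2;
// fragment shader
centroid in highp float lightcontrib2;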

How can I simulate "Glow Dodge" blending by using OpenGL Shader?

I wanted to simulate the 'glow dodge' effect from Clip Studio using OpenGL and its shaders.
I found out that the following equation is how 'glow dodge' works:
final.rgb = dest.rgb / (1 - source.rgb)
Then I came up with 2 ways to actually do it, but neither one seems to work.
The first one was to calculate 1 / (1 - source.rgb) in the shader and do multiplicative blending with glBlendFunc(GL_ZERO, GL_SRC_COLOR) or glBlendFunc(GL_DST_COLOR, GL_ZERO).
But as the Khronos page says, all scale factors are clamped to the range [0, 1], which means I can't multiply by numbers over 1, and 1 / (1 - source.rgb) exceeds 1 in most cases. So I can't use this method.
The second one was to read back the background texels with glReadPixels(), calculate everything in the shader, and then blend with glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA).
But regardless of the outcome, glReadPixels() itself takes far too much time, even for a 30 x 30 texel area, so I can't use this method either.
I wonder if there's any other way to get the result the glow dodge blending mode should produce.
With the extension EXT_shader_framebuffer_fetch a value from the framebuffer can be read:
This extension provides a mechanism whereby a fragment shader may read existing framebuffer data as input. This can be used to implement compositing operations that would have been inconvenient or impossible with fixed-function blending. It can also be used to apply a function to the framebuffer color, by writing a shader which uses the existing framebuffer color as its only input.
The extension is available for desktop OpenGL and OpenGL ES. The extension has to be enabled in the shader (see Core Language (GLSL) - Extensions). The fragment color can be retrieved through the built-in variable gl_LastFragData, e.g.:
#version 400
#extension GL_EXT_shader_framebuffer_fetch : require

out vec4 fragColor;
// [...]

void main()
{
    // [...] compute source.rgb, the color contributed by this fragment
    fragColor = vec4(gl_LastFragData[0].rgb / (1.0 - source.rgb), 1.0);
}
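Note that with framebuffer fetch the compositing happens entirely inside the shader, so fixed-function blending would normally be disabled for that pass:
glDisable(GL_BLEND);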

How does texture lookup in non-fragment shaders work?

The following is an excerpt from the GLSL spec:
"Texture lookup functions are available in all shading stages. However, automatic level of detail is computed only for fragment shaders. Other shaders operate as though the base level of detail were computed as zero."
So this is how I see it:
Vertex shader:
vec4 texel = texture(SamplerObj, texCoord);
// since this is vertex shader, sampling will always take place
// from 0th Mipmap level of the texture.
Fragment shader:
vec4 texel = texture(SamplerObj, texCoord);
// since this is fragment shader, sampling will take place
// from Nth Mipmap level of the texture, where N is decided
// based on the distance of object on which texture is applied from camera.
Is my understanding correct?
That sounds right. You can specify an explicit LOD by using textureLod() instead of texture() in the vertex shader.
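For example (a sketch using the question's names; level 2.0 chosen arbitrarily):
vec4 texel = textureLod(SamplerObj, texCoord, 2.0);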
I believe you could also make it use a higher LOD by setting the GL_TEXTURE_MIN_LOD parameter on the texture. If you call e.g.:
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_LOD, 2.0f);
while the texture is bound, it should use mipmap level 2 when you sample the texture in the vertex shader. I have never tried this, but this is my understanding of how the behavior is defined.
// since this is fragment shader, sampling will take place
// from Nth Mipmap level of the texture, where N is decided
// based on the distance of object on which texture is applied from camera.
I think the bit about the distance isn't correct. The mipmap level to use is determined using the derivatives of the texture coordinates across neighbouring pixels. The sampler hardware can determine this because the generated code for the fragment shader typically uses SIMD instructions and generates values for multiple pixels simultaneously. For example, on Intel hardware a single thread usually operates on a 4x4 grid of pixels. That means that whenever a message is sent to the sampler hardware it is given a set of 16 texture coordinates and 16 texels are expected in reply. The sampler hardware can determine the derivatives by looking at the differences between those 16 texture coordinates. That is probably why further down the GLSL spec says:
Implicit derivatives are undefined within non-uniform control flow and for non-fragment-shader texture fetches.
Non-uniform control flow would mess up the implicit derivatives because potentially not all of the fragments being processed in the thread would be sampling at the same time.
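If you need well-defined results in those situations, you can supply the gradients explicitly with textureGrad, or pin the level with textureLod (a sketch; dPdx and dPdy are placeholders for derivatives you compute yourself):
vec4 texel = textureGrad(SamplerObj, texCoord, dPdx, dPdy);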

How can I "add" Depth information to the main frame buffer

Let's say I have a scene already rendered, and I want to add depth information to it from a custom-made fragment shader.
Now the intuitive thing to do would be to draw a quad over my teapot with the depth test disabled but with glDepthMask( 1 ) and glColorMask( 0, 0, 0, 0 ), writing gl_FragDepth for some fragments and discarding the others:
if ( gl_FragCoord.x < 100 )
    gl_FragDepth = 0.1;
else
    discard;
For some reason, on an NVidia Quadro 600 and a K5000 it works as expected, but on an NVidia K3000M and a FirePro (don't remember which one), all the area covered by my discarded fragments is given the depth value of the quad.
Can't I leave the discarded fragments' depth values unmodified?
EDIT: I have found a solution to my problem. It turns out that, as Andon M. Coleman and Matt Fishman pointed out, I had early fragment tests enabled, not because I enabled them explicitly, but because I use imageStore/imageLoad.
With the little time I had to address the problem, I simply copied the content of my current depth buffer to a texture just before the "add depth" pass, assigned it to a uniform sampler2D, and used this code in my shader:
if ( gl_FragCoord.x < 100 )
    gl_FragDepth = 0.1;
else
{
    // re-emit the depth saved before this pass, fetched at this fragment's own pixel
    gl_FragDepth = texelFetch( depthTex, ivec2(gl_FragCoord.xy), 0 ).r;
    color = vec4(0.0, 0.0, 0.0, 0.0);
    return;
}
This writes a completely transparent pixel with an unchanged z value.
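The copy itself can be done with a framebuffer blit (a sketch; depthTexFBO is a hypothetical FBO that has the depth texture as its depth attachment, and width/height are the buffer dimensions):
// save the current depth buffer into the depth texture before the "add depth" pass
glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);            // source: default framebuffer
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, depthTexFBO);  // destination: FBO wrapping depthTex
glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                  GL_DEPTH_BUFFER_BIT, GL_NEAREST);   // depth blits require GL_NEAREST
glBindFramebuffer(GL_FRAMEBUFFER, 0);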
Well, it sounds like a driver bug to me -- discarded fragments should not hit the depth buffer. You could bind the original depth buffer as a texture, sample it using the gl_FragCoord, and then write the result back instead of using discard. That would add an extra texture lookup -- but it might be a suitable workaround.
EDIT: From section 6.4 of the GLSL 4.40 Specification:
The discard keyword is only allowed within fragment shaders. It can be used within a fragment shader to abandon the operation on the current fragment. This keyword causes the fragment to be discarded and no updates to any buffers will occur. Control flow exits the shader, and subsequent implicit or explicit derivatives are undefined when this exit is non-uniform. It would typically be used within a conditional statement, for example:
if (intensity < 0.0) discard;
A fragment shader may test a fragment's alpha value and discard the fragment based on that test. However, it should be noted that coverage testing occurs after the fragment shader runs, and the coverage test can change the alpha value.
(emphasis mine)
Posting a separate answer, because I found some new info in the OpenGL spec:
If early fragment tests are enabled, any depth value computed by the fragment shader has no effect. Additionally, the depth buffer, stencil buffer, and occlusion query sample counts may be updated even for fragments or samples that would be discarded after fragment shader execution due to per-fragment operations such as alpha-to-coverage or alpha tests.
Do you have early fragment testing enabled? More info: Early Fragment Test
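For reference, the explicit opt-in is a layout qualifier in the fragment shader (GLSL 4.20 and later):
layout(early_fragment_tests) in;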

Texture lookup into rendered FBO is off by half a pixel

I have a scene that is rendered to texture via FBO and I am sampling it from a fragment shader, drawing regions of it using primitives rather than drawing a full-screen quad: I'm conserving resources by only generating the fragments I'll need.
To test this, I am issuing the exact same geometry as my texture-render, which means that the rasterization pattern produced should be exactly the same: When my fragment shader looks up its texture with the varying coordinate it was given it should match up perfectly with the other values it was given.
Here's how I'm giving my fragment shader the coordinates to auto-texture the geometry with my fullscreen texture:
// Vertex shader
uniform mat4 proj_modelview_mat;
in vec2 in_pos; // 2D position attribute
out vec2 f_sceneCoord;
void main(void) {
    gl_Position = proj_modelview_mat * vec4(in_pos, 0.0, 1.0);
    f_sceneCoord = (gl_Position.xy + vec2(1,1)) * 0.5;
}
I'm working in 2D so I didn't concern myself with the perspective divide here. I just set the sceneCoord value using the clip-space position scaled back from [-1,1] to [0,1].
uniform sampler2D scene;
in vec2 f_sceneCoord;
//in vec4 gl_FragCoord;
in float f_alpha;
out vec4 out_fragColor;
void main (void) {
    //vec4 color = texelFetch(scene, ivec2(gl_FragCoord.xy - vec2(0.5,0.5)), 0);
    vec4 color = texture(scene, f_sceneCoord);
    if (color.a == f_alpha) {
        out_fragColor = vec4(color.rgb, 1);
    } else
        out_fragColor = vec4(1, 0, 0, 1);
}
Notice I output a red fragment if my alphas don't match up. The texture render sets the alpha for each rendered object to a specific index so I know what matches up with what.
Sorry I don't have a picture to show but it's very clear that my pixels are off by (0.5,0.5): I get a thin, one pixel red border around my objects, on their bottom and left sides, that pops in and out. It's quite "transient" looking. The giveaway is that it only shows up on the bottom and left sides of objects.
Notice I have a line commented out which uses texelFetch: This method works, and I no longer get my red fragments showing up. However I'd like to get this working right with texture and normalized texture coordinates because I think more hardware will support that. Perhaps the real question is, is it possible to get this right without sending in my viewport resolution via a uniform? There's gotta be a way to avoid that!
Update: I tried shifting the texture access by half a pixel, a quarter of a pixel, and one hundredth of a pixel, but it all made things worse, producing a solid border of wrong values all around the edges. It seems like my (gl_Position.xy + vec2(1,1)) * 0.5 trick sets the right values, but sampling is just off by a little somehow. This is quite strange: the red fragments shimmer in and out ever so slightly when objects are in motion, meaning the alpha values I set aren't matching up perfectly on those pixels.
It's not critical for me to get pixel perfect accuracy for that alpha-index-check for my actual application but this behavior is just not what I expected.
Well, first consider dropping that f_sceneCoord varying and just using gl_FragCoord / screenSize as the texture coordinate (you already have this in your example, but the -0.5 is rubbish), with screenSize being a uniform (maybe pre-divided). This should work almost exactly, because by default gl_FragCoord is at the pixel center (meaning i+0.5) and OpenGL returns exact texel values when sampling the texture at the texel center ((i+0.5)/textureSize).
This may still introduce very slight deviations from exact texel values (if any) due to finite precision and such. But then again, you will likely want to use a filtering mode of GL_NEAREST for such one-to-one texture-to-screen mappings anyway. Actually, your existing f_sceneCoord approach may already work well, and it's just those small rounding issues, prevented by GL_NEAREST, that create your artefacts. But then again, you still don't need that f_sceneCoord thing.
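A sketch of that gl_FragCoord-based variant (screenSize is a uniform you would have to pass in yourself; set the scene texture's GL_TEXTURE_MIN_FILTER and GL_TEXTURE_MAG_FILTER to GL_NEAREST):
uniform sampler2D scene;
uniform vec2 screenSize; // viewport size in pixels (hypothetical uniform)
in float f_alpha;
out vec4 out_fragColor;
void main(void) {
    // gl_FragCoord.xy is at the pixel centre (i + 0.5), so dividing by the
    // viewport size lands exactly on the matching texel centre
    vec4 color = texture(scene, gl_FragCoord.xy / screenSize);
    out_fragColor = (color.a == f_alpha) ? vec4(color.rgb, 1.0) : vec4(1.0, 0.0, 0.0, 1.0);
}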
EDIT: Regarding the portability of texelFetch: that function was introduced with GLSL 1.30 (~SM4/GL3/DX10 hardware, ~GeForce 8), I think. But that version is already required by the new in/out syntax you're using (in contrast to the old varying/attribute syntax). So if you're not going to change those, assuming texelFetch is available is absolutely no problem, and it might even be slightly faster than texture (which also requires GLSL 1.30, in contrast to the old texture2D) by circumventing filtering completely.
If you are working in perfect X,Y [0,1] with no rounding errors, that's great... But sometimes, especially when working with polar coords, you might consider aligning your calculated coords to the texture 'grid'...
I use:
// align it to the nearest centered texel (imgW/imgH are the texture dimensions in texels)
curPt -= mod(curPt, (0.5 / vec2(imgW, imgH)));
works like a charm and I no longer get random rounding errors at the screen edges...