MSAA and vertex interpolation cause out-of-range values - glsl

I am using GLSL with a vertex shader and a fragment shader.
The vertex shader outputs a highp float in the range of [0,1]
When it arrives in the fragment shader, I see values (at triangle edges) that exceed 1.1 no less!
This issue goes away if I either:
disable MSAA, or
disable the interpolation by using the GLSL interpolation qualifier flat.
How can a clamped 0-to-1 high precision float arrive in fragment shader as a value that is substantially larger than 1, if MSAA is enabled?
vertex shader code:
out highp float lightcontrib2;
...
lightcontrib2 = clamp( irrad, 0.0, 1.0 );
fragment shader code:
in highp float lightcontrib2;
...
if (lightcontrib2>1.1) { fragColor = vec4(1,0,1,1); return; }
And sure enough, with MSAA 4x, this is the image generated by OpenGL. (Observe the magenta coloured pixels in the centre of the window.)
I've ruled out Not-A-Number values.
GL_VERSION: 3.2.0 NVIDIA 450.51.06

How can a clamped 0-to-1 high precision float arrive in fragment shader as a value that is substantially larger than 1, if MSAA is enabled?
Multisampling at its core is a variation of supersampling: of taking multiple samples from a pixel-sized area of a primitive. Different locations within the space of that pixel-sized area are sampled to produce the resulting value.
When you're at the edge of a primitive however, some of the locations in that pixel-sized area are outside of the area that the primitive actually covers. In supersampling that's fine; you just don't use those samples.
However, multisampling is different. In multisampling, the depth samples are distinct from the fragment shader generated samples. That is, the system might execute the FS only once, but take 4 depth samples and test them against 4 samples in the depth buffer. Any samples that pass the depth test get their color values from the single FS invocation that was executed. If some of those 4 depth samples are outside of the primitive's area, that's fine; they don't count.
However, by divorcing the FS invocation values from the depth sampling, we now encounter an issue: exactly where did that single FS invocation execute within the pixel area?
And that's where we encounter the problem. If the FS invocation executes on a location that's outside of the area of the primitive, that normally gets tossed away. But if any depth samples are within the area of the primitive, then those depth samples still need to get color data. And the whole point of MSAA is to not execute the FS for each sample, so they may get their color data from an FS invocation executed on a different location.
Ideally, it would be from an FS invocation executed on a location within the primitive's area. But hardware can't guarantee that. Well, it can't guarantee it by default, at any rate. Not every algorithm has issues if an FS location happens to fall slightly outside of the primitive's area.
But some algorithms do have issues. This is why we have the centroid qualifier for fragment shader inputs. It ensures that a particular interpolated value will be generated within the area of the primitive.
As you might have guessed, this isn't the default because it's slower than non-centroid interpolation. So use it only when you need it.
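To see why the value can leave [0,1] at all, here is a minimal numeric sketch in Python (not GLSL). The triangle, the 4x sample offsets, and the helper names are all made up for illustration; real hardware is free to pick different sample positions and a different centroid location. It evaluates the barycentric interpolation at a pixel center that lies outside the triangle, then at a covered sample position, which is one of the locations centroid is allowed to use.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    w0 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w1 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return (w0, w1, 1.0 - w0 - w1)


def interpolate(p, tri, values):
    """Interpolate (or extrapolate, if p is outside) per-vertex values at p."""
    return sum(w * v for w, v in zip(barycentric(p, *tri), values))


def is_inside(p, tri):
    """A point is inside the triangle when no weight is negative."""
    return all(w >= 0.0 for w in barycentric(p, *tri))


# Hypothetical triangle in pixel units; per-vertex values are clamped to [0, 1].
TRI = ((0.0, 0.0), (4.0, 0.0), (0.0, 2.0))
LIGHT = (0.0, 1.0, 1.0)

# Assumed 4x sample offsets inside a pixel (illustrative, not a guaranteed pattern).
SAMPLE_OFFSETS = ((0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875))

pixel = (1, 1)
center = (pixel[0] + 0.5, pixel[1] + 0.5)
samples = [(pixel[0] + dx, pixel[1] + dy) for dx, dy in SAMPLE_OFFSETS]
covered = [s for s in samples if is_inside(s, TRI)]

# Default interpolation: evaluated at the pixel center, even though it is outside.
center_value = interpolate(center, TRI, LIGHT)
# Centroid-style interpolation: evaluated at a location the primitive covers.
centroid_value = interpolate(covered[0], TRI, LIGHT)

print(is_inside(center, TRI), center_value, centroid_value)
# prints: False 1.125 0.90625
```

In this setup only one of the four samples is covered, so the pixel still produces a fragment, but the center lies past the edge and one barycentric weight goes negative. That turns the interpolation into an extrapolation, which is how 1.125 appears from inputs that never exceed 1.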

Related

Using 'discard' in GLSL 4.1 fragment shader with multisampling

I'm attempting depth peeling with multisampling enabled, and having some issues with incorrect data ending up in my transparent layers. I use the following to check if a sample (originally a fragment) is valid for this pass:
float depth = texelFetch(depthMinima, ivec2(gl_FragCoord.xy), gl_SampleID).r;
if (gl_FragCoord.z <= depth)
{
discard;
}
Where depthMinima is defined as
uniform sampler2DMS depthMinima;
I have enabled GL_SAMPLE_SHADING which, if I understand correctly, should result in the fragment shader being called on a per-sample basis. If this isn't the case, is there a way I can get this to happen?
The result is that the first layer or two look right, but beneath that (and I'm doing 8 layers) I start getting junk values - mostly plain blue, sometimes values from previous layers.
This works fine for single-sampled buffers, but not for multi-sampled buffers. Does the discard keyword still discard the entire fragment?
I have enabled GL_SAMPLE_SHADING which, if I understand correctly, should result in the fragment shader being called on a per-sample basis.
It's not enough to only enable GL_SAMPLE_SHADING. You also need to set:
glMinSampleShading(1.0f)
A value of 1.0 indicates that each sample in the framebuffer should be independently shaded. A value of 0.0 effectively allows the GL to ignore sample rate shading. Any value between 0.0 and 1.0 allows the GL to shade only a subset of the total samples within each covered fragment. Which samples are shaded and the algorithm used to select that subset of the fragment's samples is implementation dependent.
– glMinSampleShading
In other words 1.0 tells it to shade all samples. 0.5 tells it to shade at least half the samples.
// Check the current value
GLfloat value;
glGetFloatv(GL_MIN_SAMPLE_SHADING_VALUE, &value);
If either GL_MULTISAMPLE or GL_SAMPLE_SHADING is disabled then sample shading has no effect.
There will be multiple fragment shader invocations for each fragment, each one handling a subset of the fragment's samples. In other words, sample shading specifies the minimum number of samples to shade independently for each fragment.
If GL_MIN_SAMPLE_SHADING_VALUE is set to 1.0, a fragment shader invocation is issued for each sample (within the primitive).
If it's set to 0.5, there will be a shader invocation for at least every second sample. The minimum number of invocations is:
max(ceil(MIN_SAMPLE_SHADING_VALUE * SAMPLES), 1)
Each invocation is evaluated at its sample location (gl_SamplePosition),
with gl_SampleID being the index of the sample that is currently being processed.
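The formula above is easy to check by hand. A small Python sketch (the function name is made up for illustration) evaluates it for a 4x multisampled framebuffer:

```python
import math


def min_shaded_samples(min_sample_shading_value, samples):
    """Minimum number of samples the GL must shade independently per fragment."""
    return max(math.ceil(min_sample_shading_value * samples), 1)


for value in (0.0, 0.25, 0.5, 1.0):
    print(value, min_shaded_samples(value, 4))
# prints:
# 0.0 1
# 0.25 1
# 0.5 2
# 1.0 4
```

Note that this is only a lower bound; an implementation may shade more samples than this.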
Should discard work on a per-sample basis, or does it still only work per-fragment?
With or without sample shading, discard still only terminates a single invocation of the shader.
Resources:
ARB_sample_shading
Fragment Shader
Per-Sample Processing
I faced a similar problem when using depth peeling on a multisample buffer.
Some artifacts appear because the depth test is wrong when comparing a multisample depth texture from the previous peel against the current fragment depth:
vec4 previous_peel_depth_tex = texelFetch(previous_peel_depth, coord, 0);
The third argument is the index of the sample to fetch for the comparison, and its value can differ from the depth at the fragment center. As the author said, you can use gl_SampleID instead:
vec4 previous_peel_depth_tex = texelFetch(previous_peel_depth, ivec2(gl_FragCoord.xy), gl_SampleID);
This solved my problem, but with a huge performance drop: if you have 4 samples, the fragment shader runs 4 times per fragment, and with 4 peels that means 4x4 invocations. You don't need to set any additional OpenGL flags as long as glEnable(GL_MULTISAMPLE); is on, because:
Any static use of [gl_SampleID] in a fragment shader causes the entire
shader to be evaluated per-sample
I decided to use a different approach and add a bias when doing the depth comparison:
float previous_linearized = linearize_depth(previous_peel_depth_tex.r, near, far);
float current_linearized = linearize_depth(gl_FragCoord.z, near, far);
// delta_depth was not defined in the original snippet; the absolute difference is assumed here
float delta_depth = abs(current_linearized - previous_linearized);
float bias_meter = 0.05;
bool belong_to_previous_peel = delta_depth < bias_meter;
This solved my problem, but some artifacts might still appear, and you need to adjust the bias to your eye-space units (meters, cm, ...).
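The answer does not show linearize_depth, so the following Python sketch is only an assumption about what it might look like. It uses the common formula for a standard perspective projection with the default depth range, and it assumes the comparison uses the absolute difference of the two linearized depths.

```python
def linearize_depth(depth, near, far):
    """Map window-space depth in [0, 1] to a positive eye-space distance.

    Assumes a standard perspective projection and the default depth range.
    """
    z_ndc = 2.0 * depth - 1.0
    return 2.0 * near * far / (far + near - z_ndc * (far - near))


def belongs_to_previous_peel(previous_depth, current_depth, near, far, bias):
    """True when the two depths are closer than `bias` in eye-space units."""
    delta_depth = abs(linearize_depth(current_depth, near, far)
                      - linearize_depth(previous_depth, near, far))
    return delta_depth < bias


near, far = 1.0, 100.0
print(linearize_depth(0.0, near, far), linearize_depth(1.0, near, far))
print(belongs_to_previous_peel(0.5, 0.5, near, far, 0.05))
print(belongs_to_previous_peel(0.5, 0.9, near, far, 0.05))
```

Linearizing first means the bias is expressed in scene units rather than in non-linear depth-buffer units, so the same threshold behaves similarly near and far from the camera.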

How vertex and fragment shaders communicate in OpenGL?

I really do not understand how fragment shader works.
I know that
the vertex shader runs once per vertex
the fragment shader runs once per fragment
Since the fragment shader works per fragment rather than per vertex, how can the vertex shader send data to it? The number of vertices and the number of fragments are not equal.
How is it decided which fragment belongs to which vertex?
To make sense of this, you'll need to consider the whole render pipeline. The outputs of the vertex shader (besides the special output gl_Position) are passed along as "associated data" of the vertex to the next stages in the pipeline.
While the vertex shader works on a single vertex at a time, not caring about primitives at all, further stages of the pipeline do take the primitive type (and the vertex connectivity info) into account. That's what is typically called "primitive assembly". Now, we still have the single vertices with the associated data produced by the VS, but we also know which vertices are grouped together to define a basic primitive like a point (1 vertex), a line (2 vertices) or a triangle (3 vertices).
During rasterization, fragments are generated for every pixel location in the output pixel raster which belongs to the primitive. In doing so, the associated data of the vertices defining the primitive can be interpolated across the whole primitive. In a line, this is rather simple: a linear interpolation is done. Let's call the endpoints A and B, each with some associated output vector v, so that we have v_A and v_B. Across the line, we get the interpolated value for v as v(x)=(1-x) * v_A + x * v_B at each point, where x is in the range of 0 (at point A) to 1 (at point B). For a triangle, barycentric interpolation between the data of all 3 vertices is used. So while there is no 1:1 mapping between vertices and fragments, the outputs of the VS still define the values of the corresponding inputs of the FS, just not in a direct way, but indirectly by the interpolation across the primitive type used.
The formulas I have given so far are a bit simplified. Actually, by default, a perspective correction is applied, which modifies the formula so that the distortion caused by the perspective projection is taken into account. This simply means that the interpolation behaves as if it were applied linearly in object space (before the distortion by the projection was applied). For example, if you have a perspective projection and some primitive which is not parallel to the image plane, going 1 pixel to the right in screen space means moving a variable distance on the real object, depending on the distance of the actual point to the camera plane.
You can disable the perspective correction by using the noperspective qualifier for the in/out variables in GLSL. Then, the linear/barycentric interpolation is used as I described it.
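The difference between the two modes can be sketched numerically. The Python below is an illustration for the line case only, with made-up function names: t is the screen-space position between A and B, and w_a and w_b stand for the clip-space w of the two endpoints.

```python
def lerp_noperspective(v_a, v_b, t):
    """Plain screen-space linear interpolation (what noperspective gives)."""
    return (1.0 - t) * v_a + t * v_b


def lerp_perspective(v_a, v_b, w_a, w_b, t):
    """Perspective-correct interpolation: interpolate v/w and 1/w, then divide."""
    numerator = (1.0 - t) * v_a / w_a + t * v_b / w_b
    denominator = (1.0 - t) / w_a + t / w_b
    return numerator / denominator


# Endpoint B is three times as far from the camera as endpoint A.
v_a, v_b, w_a, w_b = 0.0, 1.0, 1.0, 3.0
print(lerp_noperspective(v_a, v_b, 0.5))
print(lerp_perspective(v_a, v_b, w_a, w_b, 0.5))
```

Halfway across the line on screen, the plain interpolation gives 0.5, while the perspective-correct result comes out at about 0.25, because the screen-space midpoint is much closer to A on the actual object. When both endpoints have the same w, the two functions agree.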
You can also use the flat qualifier, which will disable the interpolation entirely. In that case, the value of just one vertex (the so-called "provoking vertex") is used for all fragments of the whole primitive. Integer data can never be automatically interpolated by the GL and has to be qualified as flat when sent to the fragment shader.
The answer is that they don't, at least not directly. There's an additional thing called "the rasterizer" that sits between the vertex processor and the fragment processor in the pipeline. The rasterizer is responsible for collecting the vertices that come out of the vertex shader, reassembling them into primitives (usually triangles), breaking up those triangles into "rasters" of (partially) covered pixels, and sending these fragments to the fragment shader.
This is a (mostly) fixed-function piece of hardware that you don't program directly. There are some configuration tweaks you can do that affect what it treats as a primitive and what it produces as fragments, but for the most part it's just there between the vertex shader and fragment shader doing its thing.

How does texture lookup in non-fragment shaders work?

The following is an excerpt from GLSL spec:
"Texture lookup functions are available in all shading stages. However, automatic level of detail is computed only for fragment shaders. Other shaders operate as though the base level of detail were computed as zero."
So this is how I see it:
Vertex shader:
vec4 texel = texture(SamplerObj, texCoord);
// since this is vertex shader, sampling will always take place
// from 0th Mipmap level of the texture.
Fragment shader:
vec4 texel = texture(SamplerObj, texCoord);
// since this is fragment shader, sampling will take place
// from Nth Mipmap level of the texture, where N is decided
// based on the distance of object on which texture is applied from camera.
Is my understanding correct?
That sounds right. You can specify an explicit LOD by using textureLod() instead of texture() in the vertex shader.
I believe you could also make it use a higher LOD by setting the GL_TEXTURE_MIN_LOD parameter on the texture. If you call e.g.:
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_LOD, 2.0f);
while the texture is bound, it should use mipmap level 2 when you sample the texture in the vertex shader. I have never tried this, but this is my understanding of how the behavior is defined.
// since this is fragment shader, sampling will take place
// from Nth Mipmap level of the texture, where N is decided
// based on the distance of object on which texture is applied from camera.
I think the bit about the distance isn't correct. The mipmap level to use is determined using the derivatives of the texture coordinates across neighbouring pixels. The sampler hardware can determine this because the generated code for the fragment shader typically uses SIMD instructions and generates values for multiple pixels simultaneously. For example, on Intel hardware a single thread usually operates on a 4x4 grid of pixels. That means that whenever a message is sent to the sampler hardware it is given a set of 16 texture coordinates and 16 texels are expected in reply. The sampler hardware can determine the derivatives by looking at the differences between those 16 texture coordinates. That is probably why further down in the GLSL spec it says:
Implicit derivatives are undefined within non-uniform control flow and for non-fragment-shader texture fetches.
Non-uniform control flow would mess up the implicit derivatives because potentially not all of the fragments being processed in the thread would be sampling at the same time.
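As a rough illustration of how the level follows from those derivatives rather than from distance, here is a simplified Python sketch of the scale-factor calculation. The function name is made up, and real implementations are allowed to approximate this and then clamp and bias the result.

```python
import math


def mip_lod(duv_dx, duv_dy, tex_size):
    """Approximate level of detail from per-pixel texture coordinate derivatives.

    duv_dx / duv_dy: change of the normalized (u, v) coordinate per pixel step
    in x and y. tex_size: (width, height) of mipmap level 0 in texels.
    """
    width, height = tex_size
    # Convert the derivatives to texel units and measure their length.
    len_x = math.hypot(duv_dx[0] * width, duv_dx[1] * height)
    len_y = math.hypot(duv_dy[0] * width, duv_dy[1] * height)
    rho = max(len_x, len_y)
    return math.log2(rho) if rho > 0.0 else float("-inf")


# One pixel step covers 4 texels of a 256x256 texture: roughly level 2.
print(mip_lod((1.0 / 64.0, 0.0), (0.0, 1.0 / 64.0), (256, 256)))
# One pixel step covers exactly 1 texel: roughly level 0.
print(mip_lod((1.0 / 256.0, 0.0), (0.0, 1.0 / 256.0), (256, 256)))
```

A distant object usually ends up at a higher level only because its texture coordinates change faster from pixel to pixel. In a vertex shader there are no neighbouring pixels to difference against, which is why the base level is used there.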

Use shader on texture instead of screen

I've written a simple GL fragment shader which performs an RGB gamma adjustment on an image:
uniform sampler2D tex;
uniform vec3 gamma;
void main()
{
vec3 texel = texture2D(tex, gl_TexCoord[0].st).rgb;
texel = pow(texel, gamma);
gl_FragColor.rgb = texel;
}
The texture paints most of the screen and it's occurred to me that this is applying the adjustment per output pixel on the screen, instead of per input pixel on the texture. Although this doesn't change its appearance, this texture is small compared to the screen.
For efficiency, how can I make the shader process the texture pixels instead of the screen pixels? If it helps, I am changing/reloading this texture's data on every frame anyway, so I don't mind if the texture gets permanently altered.
and it's occurred to me that this is applying the adjustment per output pixel on the screen
Almost. Fragment shaders are executed per output fragment (hence the name). A fragment is the smallest unit of rasterization, before it's written into a pixel. Every pixel that's covered by a piece of visible rendered geometry is turned into one or more fragments (yes, there may be even more fragments than covered pixels, for example when drawing to an antialiased framebuffer).
For efficiency,
Modern GPUs won't even "notice" the slightly reduced load. This is a kind of micro-optimization that's on the brink of being unmeasurable. My advice: Don't worry about it.
how can I make the shader process the texture pixels instead of the screen pixels?
You could preprocess the texture by first rendering it through a texture-sized, non-antialiased framebuffer object to an intermediate texture. However, if your change is nonlinear, and a gamma adjustment is exactly that, then you should not do this. You want to process images in a linear color space and apply nonlinear transformations only as late as possible.
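One way to see that the order matters is to compare the two paths on a pair of neighbouring texels. This Python sketch assumes the texture is drawn with linear filtering and uses made-up helper names; it is an illustration, not a statement about which result is visually preferable in a given pipeline.

```python
def apply_gamma(value, gamma):
    """Per-channel gamma adjustment, like pow(texel, gamma) in the shader."""
    return value ** gamma


def lerp(a, b, t):
    """Linear filtering between two neighbouring texels."""
    return (1.0 - t) * a + t * b


texel_a, texel_b, gamma = 0.0, 1.0, 2.2

# Shader as written in the question: filter the texture, then adjust per fragment.
filter_then_gamma = apply_gamma(lerp(texel_a, texel_b, 0.5), gamma)
# Preprocessed texture: adjust each texel once, then filter when drawing.
gamma_then_filter = lerp(apply_gamma(texel_a, gamma), apply_gamma(texel_b, gamma), 0.5)

print(filter_then_gamma, gamma_then_filter)
```

Halfway between a black and a white texel, the first path gives roughly 0.22 and the second gives 0.5, so moving a nonlinear adjustment ahead of the filtering step changes the image wherever the texture is magnified or minified.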

Texture lookup into rendered FBO is off by half a pixel

I have a scene that is rendered to texture via FBO and I am sampling it from a fragment shader, drawing regions of it using primitives rather than drawing a full-screen quad: I'm conserving resources by only generating the fragments I'll need.
To test this, I am issuing the exact same geometry as my texture-render, which means that the rasterization pattern produced should be exactly the same: When my fragment shader looks up its texture with the varying coordinate it was given it should match up perfectly with the other values it was given.
Here's how I'm giving my fragment shader the coordinates to auto-texture the geometry with my fullscreen texture:
// Vertex shader
uniform mat4 proj_modelview_mat;
out vec2 f_sceneCoord;
void main(void) {
gl_Position = proj_modelview_mat * vec4(in_pos,0.0,1.0);
f_sceneCoord = (gl_Position.xy + vec2(1,1)) * 0.5;
}
I'm working in 2D so I didn't concern myself with the perspective divide here. I just set the sceneCoord value using the clip-space position scaled back from [-1,1] to [0,1].
uniform sampler2D scene;
in vec2 f_sceneCoord;
//in vec4 gl_FragCoord;
in float f_alpha;
out vec4 out_fragColor;
void main (void) {
//vec4 color = texelFetch(scene,ivec2(gl_FragCoord.xy - vec2(0.5,0.5)),0);
vec4 color = texture(scene,f_sceneCoord);
if (color.a == f_alpha) {
out_fragColor = vec4(color.rgb,1);
} else
out_fragColor = vec4(1,0,0,1);
}
Notice I spit out a red fragment if my alphas don't match up. The texture render sets the alpha for each rendered object to a specific index so I know what matches up with what.
Sorry I don't have a picture to show but it's very clear that my pixels are off by (0.5,0.5): I get a thin, one pixel red border around my objects, on their bottom and left sides, that pops in and out. It's quite "transient" looking. The giveaway is that it only shows up on the bottom and left sides of objects.
Notice I have a line commented out which uses texelFetch: This method works, and I no longer get my red fragments showing up. However I'd like to get this working right with texture and normalized texture coordinates because I think more hardware will support that. Perhaps the real question is, is it possible to get this right without sending in my viewport resolution via a uniform? There's gotta be a way to avoid that!
Update: I tried shifting the texture access by half a pixel, quarter of a pixel, one hundredth of a pixel, it all made it worse and produced a solid border of wrong values all around the edges: It seems like my gl_Position.xy+vec2(1,1))*0.5 trick sets the right values, but sampling is just off by just a little somehow. This is quite strange... See the red fragments? When objects are in motion they shimmer in and out ever so slightly. It means the alpha values I set aren't matching up perfectly on those pixels.
It's not critical for me to get pixel perfect accuracy for that alpha-index-check for my actual application but this behavior is just not what I expected.
Well, first consider dropping that f_sceneCoord varying and just using gl_FragCoord.xy / screenSize as the texture coordinate (you already have this in your example, but the -0.5 is rubbish), with screenSize being a uniform (maybe pre-divided). This should work almost exactly, because by default gl_FragCoord is at the pixel center (meaning i+0.5) and OpenGL returns exact texel values when sampling the texture at the texel center ((i+0.5)/textureSize).
This may still introduce very, very slight deviations from exact texel values (if any) due to finite precision and such. But then again, you will likely want to use a filtering mode of GL_NEAREST for such one-to-one texture-to-screen mappings anyway. Actually, your existing f_sceneCoord approach may already work well, and it's just those small rounding issues, which GL_NEAREST would prevent, that create your artefacts. But then again, you still don't need that f_sceneCoord thing.
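The texel-center argument can be checked with a short Python sketch. The helper names and the 640-pixel size are made up for illustration, and nearest_texel only mimics the usual GL_NEAREST selection rule in one dimension.

```python
import math


def frag_coord_to_uv(pixel_index, screen_size):
    """gl_FragCoord for pixel i is i + 0.5, so dividing by the size hits the texel center."""
    return (pixel_index + 0.5) / screen_size


def nearest_texel(u, texture_size):
    """Texel index a nearest-filtered lookup would select for coordinate u."""
    return min(int(math.floor(u * texture_size)), texture_size - 1)


size = 640
mismatches = [i for i in range(size)
              if nearest_texel(frag_coord_to_uv(i, size), size) != i]
print(len(mismatches))
# prints: 0
```

Because every coordinate lands half a texel away from the nearest texel boundary, small floating-point errors cannot push the lookup into a neighbouring texel. A coordinate that sits exactly on a boundary, such as i / size, has no such margin.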
EDIT: Regarding the portability of texelFetch. That function was introduced with GLSL 1.30 (~SM4/GL3/DX10-hardware, ~GeForce 8), I think. But this version is already required by the new in/out syntax you're using (in contrast to the old varying/attribute syntax). So if you're not gonna change these, assuming texelFetch as given is absolutely no problem and might also be slightly faster than texture (which also requires GLSL 1.30, in contrast to the old texture2D), by circumventing filtering completely.
If you are working in perfect X,Y [0,1] with no rounding errors that's great... But sometimes - especially if working with polar coords, you might consider aligning your calculated coords to the texture 'grid'...
I use:
// align it to the nearest centered texel
curPt -= mod(curPt, (0.5 / vec2(imgW, imgH)));
works like a charm and I no longer get random rounding errors at the screen edges...