Any way to obtain percentage coverage of fragment (pixel) by primitive in hlsl/glsl fragment shader? - opengl

When the rasterizer processes a primitive, it splits it into a collection of fragments (pixels). The fragment shader is then called for every resulting fragment. Is there any way to get an additional float parameter in my fragment shader that stores how much of the pixel is covered by the source primitive? It should have a non-trivial value between 0 and 1 on triangle border pixels and, obviously, be 1 on every pixel inside the triangle.
I want the rasterizer to calculate and pass this value for me.
I thought "conservative rasterization" could help with that, but as I understand it, it is used for slightly different tasks (mostly collision detection).
Also, as I understand it, there is no built-in method to do that. Maybe I can change the rasterizer's behavior to do this? Is it possible?

When rendering to a multisampled framebuffer, you can look at the gl_SampleMaskIn[] bitmask array in the fragment shader to detect how many samples will be covered by the current fragment. This is about as close as you're going to get, and it's not great for what you want.
Obviously, its granularity is limited to the sample locations within a pixel. But the mask may also contain fewer bits than there are samples in the framebuffer. If the implementation decides to generate multiple fragments per pixel during multisample rasterization, the sample mask of each such fragment covers only the samples that this particular fragment will write.
So if you have a 16-sample multisample framebuffer, the implementation may generate 4 fragments per pixel, each covering a distinct set of 4 samples. The sample bitmask for a fragment will then never have more than 4 bits set, even though you asked for 16x multisample rendering. And there's basically nothing you can do to detect if this is happening (outside of doing tests on specific hardware). All of this is implementation-defined.
Basically, what you want isn't really available; gl_SampleMaskIn is the closest you can get, and how useful it is will be very implementation-dependent.
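To make the idea concrete, here is a minimal CPU-side sketch of the arithmetic a fragment shader could perform on its input sample mask. The function name and the `samplesInMask` parameter are hypothetical; `mask` stands in for `gl_SampleMaskIn[0]`, and in GLSL 4.00 or later the bit count could presumably be obtained with `bitCount`. As noted above, the number of samples a fragment is responsible for may be smaller than the framebuffer's sample count, so the result is only as meaningful as that assumption.

```cpp
#include <bitset>
#include <cstdint>

// CPU-side model: estimate coverage as (covered samples) / (samples in mask).
// `mask` stands in for gl_SampleMaskIn[0]; `samplesInMask` is an assumption
// supplied by the caller, because the implementation may split a pixel's
// samples across several fragments.
float coverageFraction(std::uint32_t mask, int samplesInMask) {
    if (samplesInMask <= 0) {
        return 0.0f;  // nothing to divide by; treat as uncovered
    }
    const int covered = static_cast<int>(std::bitset<32>(mask).count());
    return static_cast<float>(covered) / static_cast<float>(samplesInMask);
}
```

With 4x multisampling this can only ever yield 0, 0.25, 0.5, 0.75 or 1, which illustrates why the answer calls the approach "not great" as a true area-coverage value.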

Maybe one could use GL_POLYGON_SMOOTH somehow for this. As far as I understand, it does exactly this: it calculates the coverage of the current fragment and then modulates the fragment's alpha based on it.

Related

Multisampling in pipeline

In multisampling, there is more than one sample point in each pixel during rasterization, and the rasterizer decides which sample points are covered by the primitive.
Which attributes are the same for each sample in a pixel? I read somewhere that color and texture values are the same but depth and stencil values differ between the samples in a pixel. But if the fragment shader is executed for each sample point, then they should be different.
Also, when are the multiple samples resolved in the pipeline, after the fragment shader? And are they linearly averaged?
First you have to realize how multisampling works and why it was created.
I am going to approach this from the perspective of anti-aliasing, because that was the primary use-case for multisampling up until multisample textures were introduced in GL3.
When something is multisampled, this means that each pixel contains multiple samples. These samples may be identical to one another if a primitive has relatively uniform characteristics (e.g. has the same depth everywhere), and smart GL/hardware implementations are capable of identifying such situations and reducing memory bandwidth by reading/writing shared samples intelligently (similar to color/depth buffer compression). However, the cost in terms of required storage for a 4x MSAA framebuffer is the same as for 4x SSAA because GL has to accommodate the worst-case scenario, where each of the 4 samples is unique.
Which attributes are same for each sample in a pixel?
Each fragment may cover multiple sample points for attributes such as color, texture coordinates, etc. Rather than invoking the fragment shader 4x as frequently to achieve 4x anti-aliasing, a trick was devised where each attribute would be sampled at the fragment center (this is the default) and then a single output written to each of the covered sample locations. The default behavior is somewhat lacking in the situation where the center of a fragment is not part of the actual coverage area - for this, centroid sampling was introduced... vertex attributes will be interpolated at the center of a primitive's coverage area within a fragment rather than the center of the fragment itself.
Later, when it comes time to write a color to the framebuffer, all of these samples need to be averaged to produce a single pixel; we call this multisample resolve. This works well for some things, but it does not address aliasing that occurs during fragment shading itself.
Texturing occurs during the execution of a fragment shader, and this means that the sample frequency for texturing remains the same, so MSAA generally does not help with texture aliasing. Thus, while supersample anti-aliasing improves both aliasing at geometric edges and texture / shader aliasing (things like specular highlights), multisampling generally only reduces "jaggies."
I read somewhere that color and texture values are same but depth and stencil values for samples in a pixel are different.
In short, anything that is computed in the fragment shader will be the same for all covered samples. Anything that can be determined before fragment shading (e.g. depth) may vary.
Fragment tests such as depth/stencil are evaluated for each sub-sample. But multisampled depth buffers carry some restrictions. Up until D3D 10.1, hardware was not required to support multisampled depth textures so you could not sample multisampled depth buffers in a fragment shader.
But as fragment shader is executed for each sample points then they should be different.
There is a feature called sample shading, which can force an implementation of MSAA to work more like SSAA by improving the ratio between fragments shaded and samples generated during rasterization. But by default, the behavior you are describing is not multisampling.
When are the multiple samples resolved in the pipeline, after the fragment shader? And are they linearly averaged?
Multisample resolution occurs after fragment shading, any time you have to write the contents of a multisampled buffer into a single-sampled buffer. This includes things like glBlitFramebuffer (...). You can also implement multisample resolution yourself in the fragment shader, if you use multisampled textures.
Finally, regarding the process used for multisample resolution, that is implementation-specific as is the sample layout. If you ever open your display driver's control panel and look at the myriad of anti-aliasing options available you will see multiple options for sample layout and MSAA resolve algorithm.
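As an illustration only, the simplest resolve one could write by hand is a box filter, i.e. a plain average of all samples in a pixel. The sketch below is a CPU-side model with hypothetical names (`Color`, `resolveBox`); it does not claim to be what any particular driver does, since the actual resolve filter is implementation-specific.

```cpp
#include <vector>

// One RGBA sample of a multisampled pixel.
struct Color {
    float r, g, b, a;
};

// Box-filter resolve: every sample of the pixel gets equal weight.
// Returns transparent black for an empty sample list.
Color resolveBox(const std::vector<Color>& samples) {
    Color sum{0.0f, 0.0f, 0.0f, 0.0f};
    if (samples.empty()) {
        return sum;
    }
    for (const Color& s : samples) {
        sum.r += s.r;
        sum.g += s.g;
        sum.b += s.b;
        sum.a += s.a;
    }
    const float inv = 1.0f / static_cast<float>(samples.size());
    return Color{sum.r * inv, sum.g * inv, sum.b * inv, sum.a * inv};
}
```

A pixel on a triangle edge where two of four samples hold the triangle's color and two hold the background would resolve to the midpoint of the two colors under this filter.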
I would highly suggest you take a look at this article. While it is related to D3D10+ and not OpenGL, the general rules apply to both APIs (D3D9 has slightly different rules) and the quality of the diagrams is phenomenal.
In particular, pay special attention to the section on MSAA rasterization rules for triangles, which states:
For a triangle, a coverage test is performed for each sample location (not for a pixel center). If more than one sample location is covered, a pixel shader runs once with attributes interpolated at the pixel center. The result is stored (replicated) for each covered sample location in the pixel that passes the depth/stencil test.
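The rule quoted above can be modeled on the CPU for a single pixel. This is a simplified sketch under stated assumptions: four samples per pixel, a single color channel, a "less than" depth test, and no early rejection of fully occluded fragments. All names (`PixelMS`, `clearedPixel`, `shadePixel`) are hypothetical. The sketch runs the shader whenever at least one sample is covered.

```cpp
#include <array>

constexpr int kSamples = 4;  // assumed sample count per pixel

// Per-pixel multisample storage: one color channel and one depth per sample.
struct PixelMS {
    std::array<float, kSamples> color;
    std::array<float, kSamples> depth;
};

// A pixel after clearing: color 0, depth at the far plane (1.0).
PixelMS clearedPixel() {
    PixelMS px;
    px.color.fill(0.0f);
    px.depth.fill(1.0f);
    return px;
}

// Coverage and depth are evaluated per sample, but `shade` runs at most once.
// Its result is replicated to every covered sample that passes the depth test.
// Returns the number of shader invocations (0 or 1).
template <typename Shade>
int shadePixel(PixelMS& px,
               const std::array<bool, kSamples>& covered,
               const std::array<float, kSamples>& fragDepth,
               Shade shade) {
    bool anyCovered = false;
    for (bool c : covered) {
        anyCovered = anyCovered || c;
    }
    if (!anyCovered) {
        return 0;  // no fragment is generated for this pixel
    }
    const float shaded = shade();  // one invocation for the whole pixel
    for (int i = 0; i < kSamples; ++i) {
        if (covered[i] && fragDepth[i] < px.depth[i]) {
            px.color[i] = shaded;
            px.depth[i] = fragDepth[i];
        }
    }
    return 1;
}
```

The point of the model is that the samples of a pixel can end up with different colors only because different primitives wrote to them, never because one primitive was shaded several times.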

How does multisample really work?

I am very interested in understanding how multisampling works. I have found a lot of literature on how to enable or use it, but very little information about what it really does in order to achieve antialiased rendering. What I have found, in many places, is conflicting information that only confused me more.
Please note that I know how to enable and use multisampling (I actually already use it), what I don't know is what kind of data really gets into the multisampled renderbuffers/textures, and how this data is used in the rendering pipeline.
I can understand very well how supersampling works, but multisampling still has some obscure areas that I would like to understand.
here is what the specs say: (OpenGL 4.2)
Pixel sample values, including color, depth, and stencil values, are stored in this
buffer (the multisample buffer). Samples contain separate color values for each fragment color.
...
During multisample rendering the contents of a pixel fragment are changed
in two ways. First, each fragment includes a coverage value with SAMPLES bits.
...
Second, each fragment includes SAMPLES depth values and sets of associated
data, instead of the single depth value and set of associated data that is maintained
in single-sample rendering mode.
So, each sample contains a distinct color, coverage bit, and depth. What's the difference from a normal supersampling? Seems like a "weighted" supersampling to me, where each final pixel value is determined by the coverage value of its samples instead of a simple average, but I am very unsure about this. And what about texture coordinates at sample level?
If I store, say, normals in a RGBF multisampled texture, will I read them back "antialiased" (that is, approaching 0) on the edges of a polygon?
A fragment shader is called once per fragment, unless it uses gl_SampleID, gl_SamplePosition or an input with the 'sample' storage qualifier. How can a fragment shader be invoked once per fragment and still produce antialiased rendering?
OpenGL on Silicon Graphics Systems:
http://www-f9.ijs.si/~matevz/docs/007-2392-003/sgi_html/ch09.html#LE68984-PARENT
mentions: When you use multisampling and read back color, you get the resolved color value (that is, the average of the samples). When you read back stencil or depth, you typically get back a single sample value rather than the average. This sample value is typically the one closest to the center of the pixel.
And there's this technical spec (1994) from the OpenGL site. It explains in full detail what is done If MULTISAMPLE_SGIS is enabled: http://opengl.org/registry/specs/SGIS/multisample.txt
See also this related question: How are depth values resolved in OpenGL textures when multisampling?
And the answers to this question, where GL_MULTISAMPLE_ARB is recommended: where is GL_MULTISAMPLE defined?. The specs for GL_MULTISAMPLE_ARB (2002) are here: http://www.opengl.org/registry/specs/ARB/multisample.txt

Defining a custom Blend Function (OpenGL)

For implementing a physically accurate motion blur by actually rendering at intermediate locations, it seems that to do this correctly I need a special blending function. Additive blending would only work on a black background, and the standard "transparency" function (GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) may look okay for small numbers of samples, but it is physically inaccurate because samples rendered at the end will contribute more to the resulting color.
The function I need has to produce a color which is the weighted average of the original and destination colors, depending on the number of samples covering a fragment. However I can generalize this to better account for rendering differences between samples: Suppose I am to render a blurred object n times. Treating color as a 3-vector, Let D be the color DEST - SRC. I want each render to add D/n to the source color.
Can this be done using the fixed-function pipeline? The glBlendFunc reference is rather cryptic, at least to me. It seems like this can be done either trivially or is impossible. It seems like I would want to set alpha to 1/n. For the behavior I just described, am I in need of a GL_DEST_MINUS_SRC_COLOR option?
I also have a related question: At which stage does this blending operation occur? Before or after the fragment shader program? Would I be able to access the source and destination colors in a fragment shader?
I know that one way to accomplish what I want is by using an accumulation buffer. I do not want to do this because it is a waste of memory and fillrate.
The solution I ended up using to implement my effect is a combination of additive blending and a render target that I access as a texture from the fragment shader.

GLSL Interlacing

I would like to efficiently render in an interlaced mode using GLSL.
I can already do this like:
vec4 background = texture2D(plane[5], gl_TexCoord[1].st);
if (is_even_row(gl_TexCoord[1].t))
{
    // Composite the foreground over the background on even rows only.
    vec4 foreground = get_my_color();
    gl_FragColor = vec4(foreground.rgb * foreground.a + background.rgb * (1.0 - foreground.a), background.a + foreground.a);
}
else
{
    gl_FragColor = background;
}
However, as far as I understand the nature of branching in GLSL, both branches will actually be executed, since the result of "is_even_row" is considered a run-time value.
Is there any trick I can use here to avoid unnecessarily calling the rather heavy function "get_my_color"? The behavior of is_even_row is quite static.
Or is there some other way to do this?
NOTE: glPolygonStipple will not work since I have custom blend functions in my GLSL code.
(comment to answer, as requested)
The problem with interlacing is that GPUs run shaders in 2x2 clusters, which means that you gain nothing from interlacing (a good software implementation might possibly only execute the actual pixels that are needed, unless you ask for partial derivatives).
At best, interlacing runs at the same speed; at worst, it runs slower because of the extra work for the interlacing. Some years ago, there was an article in ShaderX4 which suggested interlaced rendering. I tried that method on half a dozen graphics cards (3 generations of hardware from each of the "two big" manufacturers), and it ran slower (sometimes slightly, sometimes up to 50%) in every case.
What you could do is do all the expensive rendering in 1/2 the vertical resolution, this will reduce the pixel shader work (and texture bandwidth) by 1/2. You can then upscale the texture (GL_NEAREST), and discard every other line.
The stencil test can be used to discard pixels before the pixel shader is executed. Of course the hardware still runs shaders in 2x2 groups, so in this pass you do not gain anything. However, that does not matter if it's just the very last pass, which is a trivial shader writing out a single fetched texel. The more costly composition shaders (the ones that matter!) run at half resolution.
You can find a detailed description including code here: fake dynamic branching. This demo avoids lighting pixels by discarding those that are outside the light's range using the stencil.
Another way which does not need the stencil buffer is to use "explicit Z culling". This may in fact be even easier and faster.
For this, clear Z, disable color writes (glColorMask), and draw a fullscreen quad whose vertices have some "close" Z coordinate, and have the shader kill fragments in every odd line (or use the deprecated alpha test if you want, or whatever). gl_FragCoord.y is a very simple way of knowing which line to kill, using a small texture that wraps around would be another (if you must use GLSL 1.0).
Now draw another fullscreen quad with "far away" Z values in the vertices (and with depth test, of course). Simply fetch your half-res texture (GL_NEAREST filtering), and write it out. Since the depth buffer has a value that is "closer" in every other row, it will discard those pixels.
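The two passes just described can be modeled per row on the CPU. This is a sketch under assumptions: depth is cleared to 1.0, the depth test is "less than", one depth value per row stands in for the whole scan line, and the pixel-center convention `gl_FragCoord.y = row + 0.5` is used. The function names are hypothetical.

```cpp
#include <vector>

// Pass 1: a "close" fullscreen quad whose shader kills every odd line.
// Surviving (even) lines write the near depth; killed lines keep the
// cleared value 1.0.
std::vector<float> buildDepthMask(int height, float nearZ) {
    std::vector<float> depth(height, 1.0f);  // state after clearing Z
    for (int y = 0; y < height; ++y) {
        const float fragCoordY = static_cast<float>(y) + 0.5f;  // pixel center
        const bool oddLine = (static_cast<int>(fragCoordY) % 2) == 1;
        if (oddLine) {
            continue;  // models the shader discarding this fragment
        }
        depth[y] = nearZ;
    }
    return depth;
}

// Pass 2: a "far away" quad passes the depth test only on lines that the
// first pass left untouched.
bool farQuadPasses(const std::vector<float>& depth, int y, float farZ) {
    return farZ < depth[y];
}
```

With this choice the second pass is written only on odd lines; killing even lines in the first pass instead selects the other field.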
How does glPolygonStipple compare to this? Polygon stipple is a deprecated feature, because it is not directly supported by the hardware and has to be emulated by the driver either by "secretly" rewriting the shader to include extra logic or by falling back to software.
This is probably not the right way to do interlacing. If you really need to achieve this effect, don't do it in the fragment shader like this. Instead, here is what you could do:
Initialize a full screen 1-bit stencil buffer, where each bit stores the parity of its corresponding row.
Render your scene as usual to a temporary FBO with 1/2 the vertical resolution.
Turn on the stencil test, and switch the stencil func depending on which set of scan lines you are going to draw.
Draw a rescaled version of the aforementioned FBO (containing the contents of your frame) to the screen as a textured full-screen quad, so that the stencil test applies (glBlitFramebuffer bypasses the stencil test).
Note that you could skip the offscreen FBO step and draw directly using the stencil buffer, but this would waste some fill rate testing pixels that are just going to be discarded anyway. If your program is shader heavy, the solution I just mentioned would be optimal. If it is not, you may end up being marginally better off drawing directly to the screen.
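The stencil steps above can be sketched as a CPU-side model. It simplifies in two labeled ways: one stencil value per row stands in for the per-pixel stencil buffer, and the half-height frame has the same width as the screen. `Image`, `makeParityStencil` and `drawField` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<int>>;  // rows of pixel values

// 1-bit stencil storing the parity of each row.
std::vector<int> makeParityStencil(int height) {
    std::vector<int> stencil(height);
    for (int y = 0; y < height; ++y) {
        stencil[y] = y & 1;
    }
    return stencil;
}

// Stretch the half-height frame over the screen with nearest filtering,
// writing only rows whose stencil value equals `field` (0 = even rows,
// 1 = odd rows). Rows that fail the test are left untouched.
void drawField(Image& screen, const Image& halfRes,
               const std::vector<int>& stencil, int field) {
    for (std::size_t y = 0; y < screen.size(); ++y) {
        if (y >= stencil.size() || stencil[y] != field) {
            continue;  // stencil test failed
        }
        const std::size_t srcRow = y / 2;  // nearest-neighbor upscale
        if (srcRow < halfRes.size()) {
            screen[y] = halfRes[srcRow];
        }
    }
}
```

Calling `drawField` with `field` alternating between 0 and 1 on successive frames fills the even and odd scan lines in turn, while the expensive rendering only ever happens at half the vertical resolution.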

Quick question about glColorMask and its work

I want to render depth buffer to do some nice shadow mapping. My drawing code though, consists of many shader switches. If I set glColorMask(0,0,0,0) and leave all shader programs, textures and others as they are, and just render the depth buffer, will it be 'OK' ? I mean, if glColorMask disables the "write of color components", does it mean that per-fragment shading IS NOT going to be performed?
For rendering a shadow map, you will normally want to bind a depth texture (preferably square and power of two, because stereo drivers take this as a hint!) to an FBO and use exactly one shader (as simple as possible) for everything. You do not want to attach a color buffer, because you are not interested in color at all, and it puts unnecessary pressure on the ROPs (plus, some hardware can render at double speed or more with depth-only). You do not want to switch between many shaders.
Depending on whether you do "classic" shadow mapping, or something more sophisticated such as exponential shadow maps, the shader that you will use is either as simple as it can be (constant color, and no depth write), or performs some (moderately complex) calculations on depth, but you normally do not want to perform any colour calculations, since that will mean needless calculations which will not be visible in any way.
No, the fragment operations will be performed anyway, but their result will be squashed by your zero color mask.
If you don't want some fragment operations to be performed - use the proper shader program which has an empty fragment shader attached and set the draw buffer to GL_NONE.
There is another way to disable fragment processing - to enable GL_RASTERIZER_DISCARD, but you won't get even the depth values in this case :)
No, the shader programs execute independent of the fixed function pipeline. Setting the glColorMask will have no effect on the shader programs.