Multisampling, how to read back "unique" texels - glsl

I am looking at how I am going to implement antialiasing in a deferred lighting renderer. That means three passes: a geometry pass, a lighting accumulation pass, and then a 2nd geometry pass for shading.
With normal multisampling (MSAA), the goal is to multisample only the pixels on polygon edges, and for each triangle to write the result of the fragment shader only to the subpixels it covers. But of course it is a known problem that this does not combine well with deferred lighting.
The goal is to avoid evaluating all the subpixels in the 2nd and 3rd pass, since that would basically be supersampling. If anybody knows another (better/possible) way of achieving that, I would very much like to hear it. But here is my idea:
Make the fragment shader in the first pass write only to the first subpixel the triangle covers. That allows you to ignore unwritten texels in the lighting pass. Then, in the 2nd geometry pass, somehow read back only the first subpixel that the triangle covers, which is the one we wrote to originally and then did lighting for (and now write to all of the covered texels as normal so the result can be resolved). This way only the "unique" texels will be evaluated in the 2nd and 3rd pass.
Can somebody say how this can be done in glsl (or confirm it is not possible)? I do not really see a reason why this would theoretically not be possible, but also do not see any way to do it in glsl.

For a moment, I'm going to ignore the goal of your question and instead focus on the specific request:
Can you write to the "first" sample only from a fragment shader?
Yes. What you have to do is have your fragment shader declare an input integer array using the decoration SampleMask (or, in GLSL parlance, use gl_SampleMaskIn, an array of signed integers). You would then iterate through this array bit-by-bit, to find the first bit that is set.
This bit is the "first sample". So you then declare an output integer array using the decoration SampleMask (in GLSL parlance, gl_SampleMask, an array of signed integers). You set the "first sample" bit to 1 and all others to zero.
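To make the bit scan concrete, here is a minimal CPU-side sketch in Python of the logic such a shader would run. It is not GLSL; the function names and the `num_samples` parameter are made up for illustration, and the input mask is treated as a single integer rather than an array.

```python
def first_sample_index(sample_mask_in: int, num_samples: int) -> int:
    """Return the index of the lowest covered sample, or -1 if none is covered."""
    # Walk the mask bit by bit, as the answer describes.
    for i in range(num_samples):
        if sample_mask_in & (1 << i):
            return i
    return -1


def first_sample_mask(sample_mask_in: int, num_samples: int) -> int:
    """Return an output mask with only the "first sample" bit set."""
    i = first_sample_index(sample_mask_in, num_samples)
    return 0 if i < 0 else (1 << i)
```

For example, an input mask of 0b0110 with 4 samples gives index 1 and an output mask of 0b0010.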
Can you know what the "first sample" that was written is for a particular pixel in a multisample image?
Not unless you write that data to some other piece of memory, like an SSBO or something. The multisample image does not know which samples have been written to, so it has no way to know which is first.
And even if you could:
Your whole idea will not work.
Multisampling is just supersampling based on a single simplifying assumption. Namely, that it is OK to give all of the samples generated by a triangle the same per-fragment values (except for depth). In all other respects, it is just supersampling: adding more samples per-pixel.
If two triangles overlap, then your "first sample" approach is meaningless. Why? Because there are two "first samples": the first sample from triangle 1 and the first sample from triangle 2. And triangle 2 may have overwritten the "first sample" from triangle 1.
Even if there was no overwriting of a first sample, you still don't know how many samples each triangle contributed. If one triangle contributed the right 50% of the pixel's samples, and an overlapping triangle contributed the bottom 50% of the pixel's samples, then you should only get 25% of the first triangle's contribution. How do you know to do that with your method?

Related

How to write integers alongside pixels in the framebuffer, and then use the written integer to ignore the depth buffer

What I want to do
I want to have a set of triangles bleed through, or rather ignore the depth buffer, for another set of triangles, but only if they have the same number.
Problem (optional reading)
I do not know how to do this without introducing a ton of bubbles into the pipeline. Right now I have very high throughput because I can throw my geometry onto the GPU, tell it to render, and forget about it. However, if I have to keep toggling the state when drawing, I'm worried I'm going to tank my performance. Other people who have done what I've just described (doing a ton of draw calls and state changes) have much worse performance than me. This performance hit is also significantly worse on older hardware, where we are talking on the order of a 50 - 100+ times performance loss by doing it the state-change way.
Unfortunately this triangle bleeding scenario happens many thousands of times, so the state machine will be getting flooded with "draw triangles, depth off, draw triangles that bleed through, depth on, ...", except N times, where N can get large (N >= 1000).
A good way of imagining this is having a set of triangles T_i, and a set of triangles that bleed through B_i where B_i only bleeds through T_i, and i ranges from 0...1000+. Note that if we are drawing B_100, then it should only bleed through T_100, not T_99 or T_101.
My next thought is to draw all the triangles with their integer into one framebuffer (along with the integer), then draw the bleed through triangles into another framebuffer (also with the integer), and then merge these framebuffers together. I figure they will have the color, depth, and the integer, so I can hopefully merge them in the fragment shader.
Problem is, I have no idea how to write an integer alongside the out vec4 fragColor in the fragment shader.
Questions (and in short)
This leaves me with two questions:
How do I write an integer into a framebuffer? Do I need to write to 4 separate texture framebuffers? (like one color/depth framebuffer texture, another integer framebuffer texture, and then double this so I can merge the pairs of framebuffers together at some point?)
To make this more clear, the algorithm would look like
1. Render all the 'could be bled from' triangles, described above as set T_i: write colors and depth info into FB1, and write integers into FB2.
2. Render all the 'bleeding' triangles, described above as set B_i: write colors and depth into FB3, and write integers to FB4.
3. Bind the textures for FB1, FB2, FB3, FB4.
4. Render each pixel by sampling the RGBA, depth, and integers from the appropriate texture and write those out into the final framebuffer.
I would need to access the color and depth from the textures in the shader. I would also need to access the integer from the other texture. Then I can do the comparison and choose which pixel to write to the default framebuffer.
Is this idea possible? I assume that if (1) is, then the answer is yes. Another question is whether there's a better way. I tried to think of a way to do this with the stencil buffer but had no luck.
What you want is theoretically possible, but I can't speak as to its performance. You'll be reading and writing a whole lot of texels in a lot of textures for every program iteration.
Anyway to answer your questions:
A framebuffer can have multiple color attachments: call glFramebufferTexture2D with GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, etc. Each texture can then have its own internal format. In your example you probably want a regular RGB texture for your color output and a second, single-channel integer texture.
Your depth buffer is complicated, because you don't want to let OpenGL handle it as normal. If you want to take over depth handling yourself, you probably want to attach yet another texture, a float one, and write depth into it so that you can choose whether or not to compare your screen-space fragments against it.
If you have doubts about your shader, remember that you can bind any number of textures as input samplers (up to the implementation's limit), and each color attachment gets its own output variable (your shader runs per fragment, so you output one value per attachment at a time). Make sure the type of each output is correct, i.e. vec3/vec4 for the color buffer, int for your integer buffer and float for the float buffer.
And stencil buffers won't help you turn depth checking on or off in a single (possibly indirect) draw call. I can't visualize what your bleeding effect means, so the stencil buffer might help with that part, but definitely not with conditional depth checking.
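For the merge pass described in the question, the per-pixel decision could look roughly like the Python sketch below. This is one assumed reading of "B_i only bleeds through T_i": if the bleeding fragment carries the same integer as the regular fragment at that pixel, it wins regardless of depth; otherwise an ordinary depth comparison decides. The `Texel` type, the `merge` function and the use of -1 for "nothing drawn here" are all invented for illustration.

```python
from typing import NamedTuple, Tuple


class Texel(NamedTuple):
    color: Tuple[float, float, float, float]  # RGBA read from the color texture
    depth: float                              # smaller means closer to the camera
    ident: int                                # integer tag; -1 means nothing was drawn


def merge(t: Texel, b: Texel) -> Texel:
    """Pick the texel to output, given the T-pass texel and the B-pass texel."""
    if b.ident < 0:
        return t                  # no bleeding triangle covers this pixel
    if t.ident < 0:
        return b                  # no regular triangle covers this pixel
    if b.ident == t.ident:
        return b                  # same number: bleed through, ignore depth
    return b if b.depth < t.depth else t  # different numbers: normal depth test
```

In a real fragment shader the same branches would run on values sampled from the four textures; whether that beats toggling depth state is something only profiling can tell.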

Any way to obtain percentage coverage of fragment (pixel) by primitive in hlsl/glsl fragment shader?

When the rasterizer processes a primitive, it splits it into a collection of fragments (pixels). Next, the fragment shader is called for every fragment obtained. Is there any way for me to have an additional float parameter in my fragment shader that stores how much of that exact pixel is covered by the source primitive? It should have a non-trivial value between 0 and 1 on triangle border pixels. Obviously it will be 1 on every "inside" triangle pixel.
I want the rasterizer to calculate and pass this value for me.
I thought "conservative rasterization" could help with that, but as I understand it, it is used for slightly different tasks (mostly collision detection).
Also, as I understand it, there is no built-in method to do that. Maybe I can change how the rasterizer works to do this? Is it possible?
When rendering to a multisampled framebuffer, you can look at the gl_SampleMaskIn[] bitmask array in the fragment shader to detect how many samples will be covered by the current fragment. This is about as close as you're going to get, and it's not great for what you want.
Obviously, it has the limitation of having the same granularity as the sample locations within a pixel. But the mask may also contain fewer bits than the number of samples in the framebuffer. If the renderer decides to generate multiple fragments per pixel during multisample rasterization, the sample mask of any such fragment will only cover the samples that this particular fragment will write.
So if you have a 16-sample multisample framebuffer, the implementation may generate 4 fragments per pixel, each covering a distinct set of 4 samples. So the sample bitmask for a fragment will never have more than 4 bits set, even though you asked for 16x multisample rendering. And there's basically nothing you can do to detect if this is happening (outside of doing tests on specific hardware). All of this is implementation-defined.
Basically, what you want isn't really available; gl_SampleMaskIn is the closest you can get, and how useful it is will be very implementation-dependent.
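As a rough illustration of what reading the mask buys you, here is a Python sketch of turning a coverage mask into a fraction. The function name and the `num_samples` parameter are assumptions for the example; in a shader the equivalent would be a bit count on gl_SampleMaskIn[0] divided by the sample count, with all the caveats above about the mask possibly covering only part of the pixel's samples.

```python
def coverage_fraction(sample_mask_in: int, num_samples: int) -> float:
    """Approximate pixel coverage as covered samples / total samples."""
    # Ignore any bits above the sample count before counting.
    valid = sample_mask_in & ((1 << num_samples) - 1)
    return bin(valid).count("1") / num_samples
```

With 4 samples, a mask of 0b0101 gives 0.5. The result can only ever take num_samples + 1 distinct values, which is the granularity limit mentioned above.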
Maybe one could use GL_POLYGON_SMOOTH somehow for this, since as far as I understand it does exactly this: it calculates the coverage of the current fragment and then modulates the fragment's alpha based on it.

How can I apply a depth test to vertices (not fragments)?

TL;DR I'm computing a depth map in a fragment shader and then trying to use that map in a vertex shader to see if vertices are 'in view' or not and the vertices don't line up with the fragment texel coordinates. The imprecision causes rendering artifacts, and I'm seeking alternatives for filtering vertices based on depth.
Background. I am very loosely attempting to implement a scheme outlined in this paper (http://dash.harvard.edu/handle/1/4138746). The idea is to represent arbitrary virtual objects as lots of tangent discs. While they wanted to replace triangles in some graphics card of the future, I'm implementing this on conventional cards; my discs are just fans of triangles ("Discs") around center points ("Points").
This is targeting WebGL.
The strategy I intend to use, similar to what's done in the paper, is:
Render the Discs in a Depth-Only pass.
In a second (or more) pass, compute what's visible based solely on which Points are "visible" - ie their depth is <= the depth from the Depth-Only pass at that x and y.
I believe the authors of the paper used a gaussian blur on top of the equivalent of a GL_POINTS render applied to the Points (ie re-using the depth buffer from the DepthOnly pass, not clearing it) to actually render their object. It's hard to say: the process is unfortunately a one line comment, and I'm unsure of how to duplicate it in WebGL anyway (a naive gaussian blur will just blur in the background pixels that weren't touched by the GL_POINTS call).
Instead, I'm hoping to do something slightly different, by re-rendering the discs in a second pass as cones (center of disc becomes apex of cone, think "close the umbrella") and effectively computing a Voronoi diagram on the surface of the object (à la the Red Book, http://www.glprogramming.com/red/chapter14.html#name19). The idea is that an output pixel is the color value of the first disc to reach it when growing radii from 0 -> their natural size.
The crux of the problem is that only discs whose centers pass the depth test in the first pass should be allowed to carry on (as cones) to the 2nd pass. Because what's true at the disc center applies to the whole disc/cone, I believe this requires evaluating a depth test at a vertex or object level, and not at a fragment level.
Since WebGL support for accessing depth buffers is still poor, in my first pass I am packing depth info into an RGBA framebuffer in a fragment shader. I then intended to use this in the vertex shader of the second pass via a sampler2D; any disc center that was closer than the corresponding texture2D() lookup would be allowed on to the second pass; otherwise I would hack "discarding" the vertex (its alpha would be set to 0, or some flag would be set that causes fragments associated with the disc/cone to be discarded).
This actually kind of worked but it caused horrendous z-fighting between discs that were close together (very small perturbations wildly changed which discs were visible). I believe there is some floating point error between depth->rgba->depth. More importantly, though, the depth texture is being set by fragment texel coords, but I'm looking up vertices, which almost certainly don't line up exactly on top of relevant texel coordinates; so I get depth +/- noise, essentially, and the noise is the issue. Adding or subtracting .000001 or something isn't sufficient: you trade Type I errors for Type II. My render became more accurate when I switched from NEAREST to LINEAR for the depth texture interpolation, but it still wasn't good enough.
How else can I determine which disc's centers would be visible in a given render, so that I can do a second vertex/fragment (or more) pass focused on objects associated with those points? Or: is there a better way to go about this in general?
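For readers unfamiliar with the depth -> RGBA -> depth round trip mentioned above, here is a minimal Python sketch of one way to do it. It uses exact 32-bit fixed-point arithmetic, which is an assumption for illustration and not what the question's shader does; a WebGL shader works with limited-precision floats, so its round-trip error would be larger than what this shows. The function names are made up.

```python
MAX_U32 = 0xFFFFFFFF  # largest value representable in four 8-bit channels


def pack_depth(depth: float) -> tuple:
    """Encode a depth in [0, 1] as four bytes (R, G, B, A), most significant first."""
    clamped = min(max(depth, 0.0), 1.0)
    fixed = round(clamped * MAX_U32)
    return tuple(fixed.to_bytes(4, "big"))


def unpack_depth(rgba: tuple) -> float:
    """Decode four bytes produced by pack_depth back into a depth in [0, 1]."""
    return int.from_bytes(bytes(rgba), "big") / MAX_U32
```

Even with an exact encoding like this, the lookup position problem remains: a disc center rarely falls exactly on a texel center, so the decoded value belongs to a neighboring surface point rather than to the vertex itself.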

How can I deterministically detect the shader fragment location in its 2x2 pixel quad?

I've been trying to utilize the techniques in Eric Penner's "Shader Amortization using Pixel Quad Message Passing" from GPU Pro 2, Chapter VI.2. The basic idea is that modern GPUs process fragment shaders in 2x2 fragment quads, and you can use ddx() and ddy() to get the value of some_var at all four fragments as long as the following hold:
Your GPU supports high-quality derivatives
You know which fragment you're processing (top-left, top-right, bottom-left, bottom-right)
This opens up a lot of opportunities for fragment shader optimization (like distributing texture fetches over a 2x2 pixel quad) that you'd need Compute Shaders to beat.
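To show why both conditions matter, here is a small Python emulation of the idea. It assumes "fine" derivatives, i.e. ddx is the difference between the two fragments in the current row and ddy the difference between the two fragments in the current column; the helper names and the 2x2 list layout are invented for the example.

```python
def fine_derivatives(quad, x, y):
    """Emulate ddx/ddy for the fragment at quad-local position (x, y).

    quad[y][x] holds the value of some variable at each of the four fragments.
    """
    ddx = quad[y][1] - quad[y][0]  # difference along x within this row
    ddy = quad[1][x] - quad[0][x]  # difference along y within this column
    return ddx, ddy


def neighbor_values(value, ddx, ddy, x, y):
    """Recover the horizontal and vertical neighbors' values from the derivatives."""
    # Which way to step depends on which fragment of the quad we are.
    horizontal = value - ddx if x == 1 else value + ddx
    vertical = value - ddy if y == 1 else value + ddy
    return horizontal, vertical


def check_quad(quad):
    """Return True if every fragment recovers both neighbors correctly."""
    for y in (0, 1):
        for x in (0, 1):
            ddx, ddy = fine_derivatives(quad, x, y)
            h, v = neighbor_values(quad[y][x], ddx, ddy, x, y)
            if h != quad[y][1 - x] or v != quad[1 - y][x]:
                return False
    return True
```

If a fragment guesses its own x or y position wrong, the sign of the step flips and it reconstructs a value that no neighbor actually holds, which is why knowing the position within the quad is essential.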
My problem is this:
I can't deterministically detect which fragment I'm processing. Ideally, each fragment block would start at even-numbered output pixel coords like (0, 0), (2, 0), ... (1024, 1024), ..., so you'd just need to check whether the output pixel x and y coords are even or odd to know which fragment you're currently processing. The method Penner uses in the book assumes this works...but it seems to be going wrong for me.
Unfortunately, my 2x2 fragment quads appear to be starting in nondeterministic places: I've seen them start at (even, even), (even, odd), and (odd, even). I can't remember if I've seen (odd, odd) or not, but anyway, the arrangement seems to depend on a myriad of factors I don't understand, including the output resolution and shader specifics. (I'm testing on an 8800 GTS, in case anyone's wondering.)
Does anyone know what might be causing this nondeterminism or have any documentation on it? I understand there's virtually no official standardization in this area, but I'm more interested in how things work in practice on modern desktop-level GPUs, and I'm hoping there's a way to get this technique to work. If no one knows how to reason about the even/odd start behavior, does anyone know any other way of determining the current fragment's relative location in its 2x2 quad?
Thanks :)
As it turns out, the premise of my question was mostly wrong:
The 2x2 fragment quads DO almost always start on even pixel numbers...as long as the output resolution is even-numbered.
If the output resolution is odd-numbered (a possibility with the underlying program I'm working with), things can get more complicated, for obvious reasons. I don't expect there's any uniformity here across drivers/GPUs/etc. either, but my current tests (which themselves may still be buggy) appear to demonstrate 2x2 pixel quads starting at an odd pixel along the dimension with odd resolution, at least when the odd dimension is horizontal.
All of this weirdness helped obscure my bigger issue: The code I used to detect the fragment's location in the pixel quad was buggy. I tested by setting the texture coordinates equal within a pixel quad (set to the pixel quad center)...or so I thought. However, I calculated the screen coordinates based on a full-screen quad where the uv mapping has the +v axis pointing downward. The screenspace origin starts at the bottom-left, because it's based on the top-right quadrant of Cartesian coordinates, and I accidentally forgot to invert the v-coordinate of the uv offset I used to find the pixel quad center. Many of my nondeterministic observations came from failing to check my assumptions while debugging and misinterpreting things as a result, particularly in combination with odd resolutions.
This was an embarrassing mistake I should have caught a lot sooner, but I figured I'd detail it as a warning to others to always double-check the direction of your vertical axis when you're dealing with opposite-facing coordinate frames. ;)
UPDATE:
I ran across a situation where 2x2 pixel quads started on even pixel numbers even when the resolution was odd. Thanks to the nondeterminism under odd resolutions, I had to work out another solution:
1. If you're deriving your screen pixel numbers from the uv coords of a fullscreen quad (for post-processing), the fragment location derived from this is only useful for arranging/placing shared samples between fragments, etc., not for the quad-pixel communication itself. You'll need to have screen pixel numbers with respect to the screenspace origin for that. You can derive these from vertex positions, or you can use ddx().x and ddy().y on the uv-based pixel numbers to find out their screen direction and mirror the fragment position in the appropriate direction from there.
2. Calculate the fragment location based on your screen pixel numbers (with respect to the true screenspace origin) and the assumption 2x2 pixel quads start on even pixels. (If you used uv-based pixel numbers, now is the time to mirror things.)
3. Do a ddx().x and ddy().y on the fragment location, and if they're negative in either direction, you know the pixel quad starts at an odd pixel number in that direction...so mirror in that direction.
4. If you calculate two fragment positions, one based on a uv origin and one based on a screen origin, use the uv-based one for reasoning about uv-based sample placement, and use the screen-based one for actually obtaining the values of a variable at neighboring fragments.
5. Profit.
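The "guess even start, then mirror if the derivative is negative" step can be emulated on the CPU along one axis. In this Python sketch `quad_start_x` stands in for what the hardware knows and the shader does not; the shader only sees the sign of ddx. All names here are made up for illustration.

```python
def quad_location_x(pixel_x: int, quad_start_x: int) -> int:
    """Return 0 if the pixel is the low-x fragment of its quad, 1 if it is the high-x one."""
    # Step 1: guess, assuming quads start on even pixel numbers.
    guess = pixel_x % 2
    # Step 2: emulate ddx() of that guess across the quad (high fragment minus low fragment).
    ddx_of_guess = ((quad_start_x + 1) % 2) - (quad_start_x % 2)
    # Step 3: a negative derivative means the quad really starts on an odd pixel, so mirror.
    if ddx_of_guess < 0:
        guess = 1 - guess
    return guess
```

For a quad starting at pixel 5, pixels 5 and 6 come out as 0 and 1 respectively, even though the naive even/odd guess would have said the opposite. The same logic applies independently along y.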
I'll post a link to my working MIT-licensed code once I release it on Github, along with usage examples (the speedup is unfortunately not what I expected, but whatever ;)). I'm just waiting to get done with a larger shader I'll be uploading along with it.

The advantages of using a Z-Buffer versus prioritising pixels according to depth

This is a bit more of an academic question. I am preparing for an exam and I'm just trying to truly understand this concept.
Allow me to explain the context. The issue at hand is hiding objects (or more specifically polygons) behind each other when drawing to the screen. A calculation needs to be done to decide which one gets drawn last and therefore to the forefront.
In a lecture I attended the other day, my professor stated that prioritising pixels in terms of their depth value was computationally inefficient. He then gave us a short explanation of Z-buffers and how they test the depth values of pixels and compare them with the depth values of pixels in a buffer. How is this any different from 'prioritising pixels in terms of their depth'?
Thanks!
Deciding which polygon a fragment belongs to is computationally expensive, because it would require finding the closest polygon for every single pixel (and having the entire geometry information available during pixel shading!).
It is easy, almost trivial, to sort entire objects, each consisting of many triangles (a polygon is no more than one or several triangles), according to their depth. This, however, is only a rough approximation: nearby objects will overlap and produce artefacts, so something needs to be done to make it pixel perfect.
This is where the z buffer comes in. If it turns out that a fragment's calculated depth is greater than what's already stored in the z-buffer, this means the fragment is "behind something", so it is discarded. Otherwise, the fragment is written to the color buffer and the depth value is written to the z-buffer. Of course that means that when 20 triangles are behind each other, then the same pixel will be shaded 19 times in vain. Alas, bad luck.
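The test described above fits in a few lines. This is a toy Python sketch, not how hardware is organized; the fragment tuple layout `(x, y, depth, color)` and the function name are assumptions made for the example.

```python
def rasterize(fragments, width, height, far=1.0):
    """Resolve visibility with a z-buffer; fragments are (x, y, depth, color) tuples."""
    depth_buffer = [[far] * width for _ in range(height)]
    color_buffer = [[None] * width for _ in range(height)]
    for x, y, z, color in fragments:
        if z > depth_buffer[y][x]:
            continue                  # behind what is already stored: discard
        depth_buffer[y][x] = z        # otherwise keep the new depth...
        color_buffer[y][x] = color    # ...and the new color
    return color_buffer, depth_buffer
```

Each pixel only ever stores one depth and one color, and the fragments can arrive in any order. That is the difference from collecting every candidate per pixel and picking the closest afterwards.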
Modern graphics hardware addresses this by doing the z test before actually shading a pixel, using the depth interpolated from the triangle's vertices (this optimization is obviously not possible if the fragment shader computes its own per-pixel depth).
Also, they employ conservative (sometimes hierarchical, sometimes just tiled) optimizations which discard entire groups of fragments quickly. For this, the z-buffer holds some additional (unknown to you) information, such as for example the maximum depth rendered to a 64x64 rectangular area. With this information, it can immediately discard any fragments in this screen area which are greater than that, without actually looking at the stored depths, and it can fully discard any fragments belonging to a triangle of which all vertices have a greater depth. Because, obviously, there is no way that any of it could be visible.
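A simplified Python sketch of that tile-level rejection follows. Real hardware keeps the per-tile maximum incrementally rather than rescanning the buffer, and the details vary by platform; the function names and the small tile size used here are purely illustrative.

```python
def tile_max_depth(depth_buffer, x0, y0, size):
    """Largest depth stored anywhere in the size x size tile whose corner is (x0, y0)."""
    return max(depth_buffer[y][x]
               for y in range(y0, y0 + size)
               for x in range(x0, x0 + size))


def triangle_rejected(vertex_depths, tile_max):
    """True if every vertex is farther than anything stored in the tile."""
    # Interpolated depths lie between the vertex depths, so if the nearest
    # vertex is already behind the tile's farthest stored depth, nothing shows.
    return min(vertex_depths) > tile_max
```

The check is conservative: it never rejects something visible, but it can let through triangles that the per-pixel test later discards.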
Those are implementation details, and very platform specific, though.
EDIT: Though this is probably obvious, I'm not sure if I made that point clear enough: When sorting to exploit z-culling, you would do the exact opposite of what you do with painter's algorithm. You want the closest things drawn first (roughly, does not have to be 100% precise), so instead of determining a pixel's final color in the sense of "last man standing", you have it in the sense of "first come, first served, and only one served".
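To put a number on the effect of draw order, this Python sketch counts how many fragments get shaded at a single pixel when the depth test runs before shading. The function name and the choice of 20 evenly spaced layers are assumptions for the example.

```python
def count_shaded(depths_in_draw_order, far=1.0):
    """Count fragments that pass an early z test at one pixel, in the given draw order."""
    stored = far
    shaded = 0
    for z in depths_in_draw_order:
        if z > stored:
            continue      # rejected before shading
        shaded += 1       # passed the test, so it is shaded and written
        stored = z
    return shaded


layers = [(i + 1) / 21 for i in range(20)]            # 20 surfaces behind each other
front_to_back = count_shaded(sorted(layers))
back_to_front = count_shaded(sorted(layers, reverse=True))
```

Here front_to_back is 1 and back_to_front is 20: the same 20 layers, and the only difference is the order they were submitted in.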
The first thing you need to understand is what your professor meant by 'prioritising pixels in terms of their depth'. My guess is that it's about storing all requested fragments for a given screen pixel and then producing the resulting color by choosing the closest fragment. That is inefficient, whereas a Z-buffer allows us to store only a single value per pixel instead of all of them.