Is the stencil buffer still relevant in modern OpenGL?

A friend and I have been having an ongoing argument about the stencil buffer. In short, I haven't been able to find a situation where the stencil buffer would provide any advantage over the programmable pipeline tools in OpenGL 3.2+. Are there any uses for the stencil buffer in modern OpenGL?
[EDIT]
Thanks, everyone, for all the input on the subject.

It is more useful than ever, since you can now sample stencil index textures from fragment shaders; it hardly makes sense to argue that the stencil buffer is not part of the programmable pipeline.
The depth buffer is used for simple pass/fail fragment rejection, which the stencil buffer can also do, as suggested in the comments. However, the stencil buffer can additionally accumulate information about test results over multiple passes, which enables all sorts of logic and counting applications: measuring a scene's depth complexity, constructive solid geometry, and so on.
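As a concrete illustration of the counting use case, here is a minimal sketch (assuming a current context with an 8-bit stencil buffer; drawScene() is a placeholder for your own rendering) that measures depth complexity by letting the stencil buffer count every fragment rasterized at each pixel:

```c
/* Sketch: count per-pixel overdraw (depth complexity) with the stencil buffer.
   Assumes a current GL context with an 8-bit stencil buffer; drawScene() is
   a placeholder for your own draw calls. */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);

glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 0, 0xFF);        /* every fragment passes the stencil test */
glStencilOp(GL_KEEP, GL_INCR, GL_INCR);   /* ...and bumps the count whether the
                                             depth test passes or fails            */
drawScene();

/* The stencil buffer now holds the per-pixel overdraw count; it can be
   visualized by drawing full-screen quads with glStencilFunc(GL_LEQUAL, n, 0xFF)
   for increasing n, or read back with glReadPixels(..., GL_STENCIL_INDEX, ...). */
```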

To add a recent example to Andon's answer, GTA V uses the stencil buffer somewhat like an ID buffer to mark the player character, cars, vegetation, etc.
It then uses those stencil values to, for example, apply subsurface scattering only to the character, or to exclude the character from motion blur.
See the GTA V Graphics Study (highly recommended, it's a great read!)
Edit: sure, you can do this in software, but you can do rasterization or tessellation in software just as well... In the end it comes down to performance: with GL_DEPTH24_STENCIL8 you get a hardware-supported packed format, and the stencil test is most likely faster than doing discards in the fragment shader.
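As a rough sketch of that idea (the ID values, state and draw calls below are purely illustrative, not taken from the game), you tag pixels with a category ID during the geometry pass and then restrict a post-processing pass to one category:

```c
/* Illustrative only: stencil values used as per-pixel object-category IDs. */
enum { STENCIL_ID_CHARACTER = 1, STENCIL_ID_VEHICLE = 2, STENCIL_ID_VEGETATION = 3 };

/* Geometry pass: replace the stencil value with the category ID wherever
   the object actually ends up visible. */
glEnable(GL_STENCIL_TEST);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);

glStencilFunc(GL_ALWAYS, STENCIL_ID_CHARACTER, 0xFF);
drawPlayerCharacter();                     /* placeholder draw calls */
glStencilFunc(GL_ALWAYS, STENCIL_ID_VEGETATION, 0xFF);
drawVegetation();

/* Post-process pass (e.g. subsurface scattering): only touch pixels that
   were tagged as the character. */
glStencilFunc(GL_EQUAL, STENCIL_ID_CHARACTER, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);    /* stencil is read-only in this pass */
drawFullscreenPass();
```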

Just to provide one other use case, shadow volumes (aka "stencil shadows") are still very relevant: https://en.wikipedia.org/wiki/Shadow_volume
They're useful for indoor scenes where shadows are supposed to be pixel perfect, and you're less likely to have alpha-tested foliage messing up the extruded shadow volumes.
It's true that shadow maps are more common, but I suspect that stencil shadows will make a comeback once the brain-dead Creative/3DLabs patent on the z-fail method expires.
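For reference, the counting pass of the z-fail variant boils down to a handful of state calls. This is only a sketch: it assumes the depth buffer was already filled by an ambient/depth pre-pass, and drawShadowVolumes()/drawLitScene() stand in for your own geometry.

```c
/* Sketch of a z-fail ("Carmack's reverse") stencil shadow pass. */
glEnable(GL_DEPTH_TEST);
glEnable(GL_STENCIL_TEST);
glDepthMask(GL_FALSE);                                 /* keep depth intact      */
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);   /* keep color intact      */
glDisable(GL_CULL_FACE);                               /* both faces in one pass */

glStencilFunc(GL_ALWAYS, 0, 0xFF);
/* Z-fail: where the volume fails the depth test, count back faces up and
   front faces down. */
glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_INCR_WRAP, GL_KEEP);
glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_DECR_WRAP, GL_KEEP);
drawShadowVolumes();

/* Lighting pass: pixels with a net count of zero are outside every volume
   and therefore lit. */
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glEnable(GL_CULL_FACE);
glStencilFunc(GL_EQUAL, 0, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
drawLitScene();
```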

Related

Is it possible to depth test against a depth texture I am also sampling, in the same draw call?

Context:
I am using a deferred rendering setup, where in the first stage I have two FBOs: one is the G-buffer, which stores the normals, albedo, and material information for all visible fragments. This FBO has a 32-bit depth texture, and it is filled in a geometry pass before any lighting is calculated.
The second FBO is color-only; it starts out black and accumulates lighting over several passes, using lighting shaders that sample from the G-buffer and write to the color-only buffer with additive blending.
The problem is that I would really like to use early depth testing so that my lighting is only calculated for fragments that contain actual geometry (not just sky). The best way I can think of to do this is to use the depth test to reject any pixels with a depth of 1.0 in the case of sunlight, or to reject any pixels that lie behind the sphere of influence of a point light. However, I don't think I can bind this depth texture to my color FBO, since I also sample from it inside the lighting shader to reconstruct the fragment's position in world space.
So my question is: is there a way to use the same depth texture both for the early depth test and for sampling inside the shader? Or, if not, is there some other (reasonably performant) way of rejecting pixels that don't contain geometry? I will not be writing to this depth texture at all in my lighting pass.
I only have to target modern graphics hardware on PCs (so I can use any common extensions or OpenGL 4.6 features).
There are rules in OpenGL about reading data in a shader that is also being updated by a framebuffer operation, and those rules used to be quite strict. Indeed, pre-GL 4.4, they were so strict that what you're trying to do was actually undefined behavior. That is, if an image from a texture was attached to the rendering FBO, and you sampled from that texture in such a way that it was at all possible to read from the attached image, you got undefined behavior. It didn't matter if your write mask meant that no writing actually happened; it was still UB.
Fortunately, it's well-defined now. You only get UB if you do an actual write, not merely because you have an image attached to the FBO. And by "now", I mean basically any hardware made in the last 10 years. While ARB_texture_barrier and GL 4.5 are fairly recent, their predecessor NV_texture_barrier is actually quite old, and despite being an NVIDIA extension by name, it was so widely implemented that it is even available on macOS implementations.
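Under those newer rules, something like the following sketch is the well-defined version of what you describe (all names are placeholders, and it assumes the G-buffer depth texture is already filled and nothing writes to it during the pass):

```c
/* Sketch: lighting pass that depth tests against the G-buffer depth texture
   while also sampling it. gbufferDepthTex, lightingFBO and the draw call are
   placeholders; depth writes are masked off, so there is no feedback loop. */
glBindFramebuffer(GL_FRAMEBUFFER, lightingFBO);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                       GL_TEXTURE_2D, gbufferDepthTex, 0);

glEnable(GL_DEPTH_TEST);
glDepthMask(GL_FALSE);        /* read-only depth: never write the attached texture */
glDepthFunc(GL_GREATER);      /* a full-screen sun quad drawn at depth 1.0 fails on
                                 sky pixels (stored depth == 1.0) and passes on
                                 everything in front of them                       */

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, gbufferDepthTex);   /* also sampled in the shader */

/* If something earlier in the same pass *did* write to the texture, issue a
   glTextureBarrier() (ARB_texture_barrier / GL 4.5) between write and read. */
drawSunlightFullscreenQuad();
```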

Writing to depth buffer from opengl compute shader

Generally, on modern desktop OpenGL hardware, what is the best way to fill a depth buffer from a compute shader and then use that depth buffer for graphics-pipeline rendering with triangles, etc.?
Specifically, I am wondering about concerns regarding Hi-Z. I also wonder whether it's better to do the compute-shader modifications to the depth buffer before or after the graphics rendering.
If the compute shader runs after the graphics rendering, I assume the depth buffer will typically be decompressed behind the scenes. But I worry that, done the other way around, the depth buffer may be left in a decompressed/non-optimal state for the graphics pipeline.
As far as I know, you cannot bind textures with any of the depth formats as images, and thus cannot write to depth-format textures in compute shaders. See the glBindImageTexture documentation; it lists the formats your texture format must be compatible with. Depth formats are not among them, and the specification says the depth formats are not compatible with the normal formats.
Texture copying functions have the same compatibility restrictions, so you can't even, say, write to a normal texture in the compute shader and then copy it to a depth texture. glCopyImageSubData does not explicitly state that restriction, but I haven't tried it.
What might work is writing to a normal texture, then rendering a fullscreen triangle and setting gl_FragDepth to values read from the texture, but that's an additional fullscreen pass.
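A sketch of that full-screen pass's fragment shader, assuming the compute shader wrote [0,1] depth values into an r32f texture bound to unit 0 (the uniform name and binding are made up):

```c
/* Fragment shader source (GLSL 4.30) for the extra full-screen pass: copy the
   compute-shader results from an r32f color-format texture into the real depth
   buffer through gl_FragDepth. */
static const char *depth_copy_fs =
    "#version 430 core\n"
    "layout(binding = 0) uniform sampler2D uComputedDepth;\n"
    "void main() {\n"
    "    gl_FragDepth = texelFetch(uComputedDepth, ivec2(gl_FragCoord.xy), 0).r;\n"
    "}\n";

/* Draw a single full-screen triangle with this shader, color writes disabled
   and glDepthFunc(GL_ALWAYS), so only the depth buffer is touched. */
```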
I don't quite understand your second question: if your compute shader modifies the depth buffer, the result will most likely differ depending on whether you do it before or after regular rendering, because different parts will be visible or occluded.
But maybe that question is moot, since it seems you cannot manually write into depth buffers at all. That might also answer your third question: by never writing into the depth buffer directly, you can't mess with its compression. :)
Please note that I'm no expert in this; I had a similar problem and looked at the docs/spec myself, so this all might be wrong. :) Please let me know if you manage to write to depth buffers from compute shaders!

In OpenGL, (how) can I depth test between two depth buffers?

Say I have two depth buffers ("DA" and "DB"). I'd like to attach DA to a framebuffer and render to it in the normal/"traditional" way: every fragment is depth tested with GL_LESS, and if it passes, it is drawn and its depth is written to the depth buffer.
Then I'd like to attach DB; here I want to use DB in the traditional way, but also depth test against DA with GL_GREATER. If a fragment passes both tests, I'd still just like to write its depth to DB.
What this would accomplish: anything drawn in the second pass only draws if it is behind the contents of the first pass, but in front of anything else drawn in the second pass.
Does OpenGL 2.1 (or OpenGL ES 2) offer any of this functionality? What could I do to work around it not formally offering it?
If it doesn't offer it, here's an idea for how I'd do it that I would like sanity checked...
Attach a renderbuffer that acts as a depth buffer, manually populate it with depths, and manually perform the "is the depth greater than the texel at DA" test (discarding the fragment when it fails). What problems would I run into? Is there an easy way to emulate the precision of formally specified depth buffers? Would the performance be awful? Any other concerns?
I don't believe there is a way to have multiple depth buffers. I thought there might at least be vendor specific extensions, but a search through the extension registry came up empty even for that.
What you already started considering in the last paragraph looks like the only reasonably straightforward option that comes to mind. You use a depth texture as your second depth buffer, and discard fragments based on their comparison with the value from the depth texture in your fragment shader.
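The shader-side part of that could look roughly like this (a sketch in GLSL 1.20 to stay within OpenGL 2.1; uDepthDA and uScreenSize are invented uniform names, and it assumes DA has already been copied into a depth texture, as covered by the copy step described below):

```c
/* Fragment shader source (GLSL 1.20 / OpenGL 2.1) for the second pass:
   emulate the extra GL_GREATER test against DA by hand. The depth texture
   must have GL_TEXTURE_COMPARE_MODE set to GL_NONE so it returns raw depth. */
static const char *second_pass_fs =
    "#version 120\n"
    "uniform sampler2D uDepthDA;     // depth of the first pass (DA)\n"
    "uniform vec2      uScreenSize;  // viewport size in pixels\n"
    "void main() {\n"
    "    float da = texture2D(uDepthDA, gl_FragCoord.xy / uScreenSize).r;\n"
    "    if (gl_FragCoord.z <= da)\n"
    "        discard;                // keep only fragments BEHIND pass 1\n"
    "    gl_FragColor = vec4(1.0);   // real shading would go here\n"
    "}\n";
```

The regular hardware depth test against DB still runs alongside this, which gives you the "passes both tests" behavior from the question.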
You don't have to manually populate the depth texture. If the depth values are generated as the result of an earlier rendering pass, you can copy them from the primary depth buffer to a depth texture using glCopyTexImage2D(). This function is available in OpenGL 2.1. In newer versions, you can also use glBlitFramebuffer() to perform the copy.
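The GL 2.1 copy itself is just a couple of calls; a sketch, with depthTexDA and the dimensions as placeholders:

```c
/* After the first pass (with DA bound as the depth buffer), copy its contents
   into a depth texture so the second pass can sample it. */
glBindTexture(GL_TEXTURE_2D, depthTexDA);
glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24,
                 0, 0, width, height, 0);
/* On later frames, glCopyTexSubImage2D avoids reallocating the texture. */
```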
On your concerns:
Precision is a very valid concern. I had some fairly severe precision issues when I played with something similar in the past. They can probably be solved, but you definitely need to watch out for it, and carefully think about what your precision requirements are, and how you can fudge the values if necessary.
I would certainly expect performance to suffer somewhat. The fixed function depth buffer has the advantage of heavy optimizations, like early depth testing, that you won't get with your "home made" depth buffer. I wouldn't necessarily expect it to be horrible, though. And if this is what you need, it's worth trying.
I think this is pretty much a no-go in ES 2.0, at least without relying on extensions. You could potentially consider writing depth values into color textures, but I think it would get very ugly and inefficient. ES 2.0 is great for its simplicity, but for what you're planning to do, you're exceeding its intended feature set. ES 3.0 adds depth textures and some other features that could come into play.

openGl GPU optimization when rendering stacks of primitives

If I render 15 fully opaque quads of the same size on top of each other, with the depth test disabled, is the GPU hardware/software clever enough to process only the topmost quad and discard the other vertices/fragments? Or would one benefit from using the stencil buffer to achieve the same effect?
Most GPUs will overdraw in this scenario, which will be very bad for performance if your quads are large. Rather than using the stencil buffer, the best way to optimise is probably to enable depth testing, assign appropriate depth values, and render your quads front to back.
However, under certain conditions (e.g. no blending) tile based deferred rendering (TBDR) GPUs common in many mobile devices (particularly PowerVR devices used by all iOS devices and many Android devices) will do a process known as hidden surface removal (HSR) which will optimize this case and avoid rendering the pixels that will be obscured.
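In plain OpenGL terms, the depth-tested front-to-back version amounts to the following sketch (the quad ordering and drawQuadAtDepth() helper are assumptions):

```c
/* Sketch: draw the 15 quads front to back with depth testing enabled so
   early-Z can reject the occluded ones before shading. drawQuadAtDepth()
   is a placeholder for your own quad rendering. */
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

for (int i = 0; i < 15; ++i) {
    /* i == 0 is the nearest (topmost) quad; all later quads fail the depth
       test and most GPUs reject them before running the fragment shader. */
    drawQuadAtDepth(0.1f + 0.05f * (float)i);
}
```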
The GPU will definitely generate fragments for all of the opaque quads. Also, if you disable the depth test, you may end up seeing a back surface on screen, because with depth testing disabled, whatever is rendered last wins.
Even if you use the stencil buffer, fragments are still generated for all of the quads and still pass through the stencil and depth tests.

Reading FBO depth attachment whilst depth testing

I'm working with a deferred rendering engine using OpenGL 3.3. I have an FBO set up as my G-buffer with a texture attached as the depth component.
In my lighting pass I need to depth test (with writes disabled) to cull unnecessary pixels. However, I'm also writing code that reconstructs world-space position coordinates, and this will need access to the depth buffer as well.
Is it legal in OpenGL 3.3 to bind a depth attachment to a texture unit and sample it while also using it for depth testing in the same pass?
I can't find anything specifically about it in the docs but my gut tells me that using the same buffer/texture for two different purposes will produce undefined behaviour. Does anybody know for sure? I have a limited set of hardware to test on and don't want to make false assumptions about what works.
At the very least, this creates a situation where memory coherency cannot be guaranteed (pre-GL4, coherency is something you implicitly assume at every stage of the traditional pipeline and have no standardized control over).
Because this behavior is undefined, the driver might cache this memory in an undesirable way. You would like to think that an appropriate combination of write mask and sampling would be a strong hint not to do that, but that is entirely up to whoever designed the driver, and your results will tend to vary by hardware vendor, platform, and hardware generation.
This scenario is a use case for things like NV's texture barrier extension, but that is vendor-specific and still does not tackle the problem entirely. If you want to do this sort of thing portably, your best bet is to promote the engine to GL4 and use standardized features for early fragment tests, barriers, etc.
Does your composite pass really need a depth buffer in the first place, though? It sounds like you want to reconstruct per-pixel position during lighting from the stored depth buffer. That's entirely possible in a framebuffer with no depth attachment at all.
Your G-Buffers will already be filled at this point, and after that you no longer need to do any fragment tests. The one fragment that passed all previous tests is what's eventually written to the G-Buffer and there's no reason to apply any additional tests to it when it comes time to do lighting.
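For completeness, reconstructing position from the stored depth without any depth attachment might look roughly like this inside the lighting shader (a sketch; uGDepth, uInvProjection and vTexCoord are invented names, and a standard [0,1] depth range is assumed):

```c
/* Fragment shader snippet (GLSL 3.30) for the lighting pass: rebuild the
   view-space position from the G-buffer depth texture alone, with no depth
   attachment on the lighting FBO. All names here are placeholders. */
static const char *position_from_depth_glsl =
    "#version 330 core\n"
    "uniform sampler2D uGDepth;         // G-buffer depth texture\n"
    "uniform mat4      uInvProjection;  // inverse of the camera projection\n"
    "in vec2 vTexCoord;                 // full-screen quad UV in [0,1]\n"
    "vec3 viewSpacePosition() {\n"
    "    float depth = texture(uGDepth, vTexCoord).r;\n"
    "    vec4 ndc    = vec4(vTexCoord * 2.0 - 1.0, depth * 2.0 - 1.0, 1.0);\n"
    "    vec4 view   = uInvProjection * ndc;\n"
    "    return view.xyz / view.w;\n"
    "}\n";
```

Multiplying the result by the inverse view matrix then gives the world-space position the lighting code needs.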