OpenGL stencil buffer OR operation? - opengl

I'm not sure if this is possible to do, but it's worth a shot. I'm using the stencil buffer to reduce overdraw for light volumes in a deferred renderer using this algorithm (when the camera is outside the volume):
Using a cheap shader, draw back faces with depth testing set to LEQUAL, marking them in the stencil buffer.
Using the expensive lighting shader, draw front faces with depth testing set to GEQUAL.
This will cause only pixels within the light volume to be shaded. The problem with this comes when drawing multiple lights. First, since state changes are expensive, it's probably not the best idea to repeatedly switch between the cheap and expensive shader for each light. Ideally I'd like to take advantage of all 8 bits of the stencil buffer by rendering 8 light volumes with the cheap shader followed by 8 light volumes with the expensive shader. However, issues arise when lights overlap, as there's no way to tell which pixels belong to which lights.
The solution that comes to mind would be to use 1 bit in the stencil buffer per light. So, for light n, mark the nth bit in the stencil buffer in the cheap pass, then only render pixels with that bit on during the expensive pass.
I haven't used the stencil buffer before, but from what I'm reading this doesn't seem possible. For this to work I'd have to set the stencil buffer using bitwise OR and the stencil function would have to be bitwise AND. However, the only operations on the stencil buffer I can see are: KEEP, ZERO, REPLACE, INCR, DECR, and INVERT, and the only functions are: NEVER, ALWAYS, LESS, EQUAL, LEQUAL, GEQUAL, GREATER, and NOTEQUAL.
Is there any way to somehow get this OR and ANDing behavior using the stencil buffer? And if not, is there an alternative approach to efficiently rendering light volumes?

To get ORing update behavior, use glStencilMask combined with GL_REPLACE. To get ANDing test behavior, use the glStencilFunc mask parameter combined with GL_EQUAL. In both cases, you'll want the ref parameter to glStencilFunc to be 0xff.

Related

How to write integers alongside pixels in the framebuffer, and then use the written integer to ignore the depth buffer

What I want to do
I want to have a set triangles bleed through, or rather ignore the depth buffer, for another set triangles, but only if they have the same number.
Problem (optional reading)
I do not know how to do this without introducing a ton of bubbles into the pipeline. Right now I have very high throughput because I can throw my geometry onto the GPU, tell it to render, and forget about it. However, if I have to keep toggling the state when drawing, I'm worried I'm going to tank my performance. Other people who have done what I've just said (doing a ton of draw calls and state changes) have much worse performance than me. This performance hit is also significantly worse on older hardware, where we are talking on order of 50 - 100+ times performance loss by doing it the state-change way.
Unfortunately this triangle bleeding scenario happens many thousands of times, so the state machine will be getting flooded with "draw triangles, depth off, draw triangles that bleed through, depth on, ...", except N times, where N can get large (N >= 1000).
A good way of imagining this is having a set of triangles T_i, and a set of triangles that bleed through B_i where B_i only bleeds through T_i, and i ranges from 0...1000+. Note that if we are drawing B_100, then it should only bleed through T_100, not T_99 or T_101.
My next thought is to draw all the triangles with their integer into one framebuffer (along with the integer), then draw the bleed through triangles into another framebuffer (also with the integer), and then merge these framebuffers together. I figure they will have the color, depth, and the integer, so I can hopefully merge them in the fragment shader.
Problem is, I have no idea how to write an integer alongside the out vec4 fragColor in the fragment shader.
Questions (and in short)
This leaves me with two questions:
How do I write an integer into a framebuffer? Do I need to write to 4 separate texture framebuffers? (like one color/depth framebuffer texture, another integer framebuffer texture, and then double this so I can merge the pairs of framebuffers together at some point?)
To make this more clear, the algorithm would look like
Render all the 'could be bled from triangles', described above as set T_i,
write colors and depth info into FB1, and write integers into FB2
Render all the 'bleeding' triangles, described above as set B_i,
write colors and depth into FB3, and write integers to FB4
Bind the textures for FB1, FB2, FB3, FB4
Render each pixel by sampling the RGBA, depth, and integers
from the appropriate texture and write those out into the
final framebuffer
I would need to access the color and depth from the textures in the shader. I would also need to access the integer from the other texture. Then I can do the comparison and choose which pixel to write to the default framebuffer.
Is this idea possible? I assume if (1) is, then the answer is yes. Maybe another question could be whether there's a better way. I tried thinking of doing this with the stencil buffer but had no luck
What you want is theoretically possible, but I can't speak as to its performance. You'll be reading and writing a whole lot of texels in a lot of textures for every program iteration.
Anyway to answer your questions:
A framebuffer can have multiple color attachments by using glFramebufferTexture2D with GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, etc. Each texture can then have its own internal format, in your example you probably want a regular RGB texture for your color output, and a second 1-integer only texture.
Your depth buffer is complicated, because you don't want to let OpenGL handle it as normal. If you want to take over the depth buffer, you probably want to attach it as yet another, float texture that you can check against or not your screen-space fragments.
If you have doubts about your shader, remember that you can bind the any number of textures as input samplers you program in code, and each color bind gets its own output value (your shader runs per-texel, so you output one value at a time). Make sure the format of your output is correct, ie vec3/vec4 for the color buffer, int for your integer buffer and float for the float buffer.
And stencil buffers won't help you turn depth checking on or off in a single (possibly indirect) draw call. I can't visualize what your bleeding thing means, but it can probably help with that? Maybe? But definitely not conditional depth checking.

OpenGL trim/inline contour of stencil

I have created a shape in my stencil buffer (black in the picture below). Now I would like to render to the backbuffer. I would like one texture on the outer pixels (say 4 pixels) of my stencil (red), and an other texture on the remaining pixels (red).
I have read several solutions that involve scaling, but that will not work when there is no obvious center of the shape.
How do I acquire the desired effect?
The stencil buffer works great for doing operations on the specific fragments being overlaid onto them. However, it's not so great for doing operations that require looking at pixels other than the one corresponding to the fragment being rendered. In order to do outlining, you have to ask about the values of neighboring pixels, which stencil operations don't allow.
So, if it is possible to put the stencil data you want to test against in a non-stencil format image (ie: a color image, maybe with an integer texture format), that would make things much simpler. You can do the effect of stencil discarding by using discard directly in the fragment shader. Since you can fetch arbitrarily from the texture (as long as you're not trying to modify it), you can fetch neighboring pixels and test their values. You can use that to identify when a fragment is near a border.
However, if you're relying on specialized stencil operations to build the stencil data itself (like bitwise operations), then that's more complicated. You will have to employ stencil texturing operations, so you're going to have to render to an FBO texture that has a depth/stencil format. And you'll have to set it up to allow you to read from the stencil aspect of the texture. This is an OpenGL 4.3 feature.
This effectively converts it into an 8-bit unsigned integer texture. That allows you to play whatever games you need to. But if you want to use stencil tests to discard fragments, you will also need texture barrier functionality to allow you to read from an image that's attached to the current FBO. But you don't need to actually use the barrier, since you should mask off stencil writing. You just need GL 4.5 or the NV/ARB_texture_barrier extension to be available, which they widely are.
Either way this happens, the biggest difficulty is going to be varying the size of the border. It is easy to just test the neighboring 9 pixels to see if it is at a border. But the larger the border size, the larger the area of pixels each fragment has to test. At that point, I would suggest trying to look for a different solution, one that is based on some knowledge of what pattern is being written into the stencil buffer.
That is, if the rendering operation that lays down the stencil has some knowledge of the shape, then it could compute a distance to the edge of the shape in some way. This might require constructing the geometry in a way that it has distance information in it.

Is the stencil buffer still relevant in modern OpenGL?

Me and a friend have been having an ongoing argument about the stencil buffer. In short I haven't been able to find a situation where the stencil buffer would provide any advantage over the programmable pipeline tools in OpenGL 3.2+. Are there any uses to the stencil buffer in modern OpenGL?
[EDIT]
Thanks everyone for all the inputs on the subject.
It is more useful than ever since you can sample stencil index textures from fragment shaders. It should not even be argued that the stencil buffer is not part of the programmable pipeline.
The depth buffer is used for simple pass/fail fragment rejection, which the stencil buffer can also do as suggested in comments. However, the stencil buffer can also accumulate information about test results over multiple passes. All sorts of logic and counting applications exist such as measuring a scene's depth complexity, constructive solid geometry, etc.
To add a recent example to Andon's answer, GTA V uses the stencil buffer kinda like an ID buffer to mark the player character, cars, vegetation etc.
It subsequently uses the stencil buffer to e.g. apply subsurface scattering only to the character or exclude him from motion blur.
See the GTA V Graphics Study (highly recommended, it's a great read!)
Edit: sure you can do this in software. But you can do rasterization or tessellation in software just as well... In the end it's about performance I guess. With depth24stencil8 you have a nice hardware-supported format, and the stencil test is most likely faster then doing discards in the fragment shader.
Just to provide one other use case, shadow volumes (aka "stencil shadows") are still very relevant: https://en.wikipedia.org/wiki/Shadow_volume
They're useful for indoor scenes where shadows are supposed to be pixel perfect, and you're less likely to have alpha-tested foliage messing up the extruded shadow volumes.
It's true that shadow maps are more common, but I suspect that stencil shadows will have a comeback once the brain dead Createive/3DLabs patent expires on the zfail method.

Using the stencil buffer to merge dynamic shadow [duplicate]

I'm not sure if this is possible to do, but it's worth a shot. I'm using the stencil buffer to reduce overdraw for light volumes in a deferred renderer using this algorithm (when the camera is outside the volume):
Using a cheap shader, draw back faces with depth testing set to LEQUAL, marking them in the stencil buffer.
Using the expensive lighting shader, draw front faces with depth testing set to GEQUAL.
This will cause only pixels within the light volume to be shaded. The problem with this comes when drawing multiple lights. First, since state changes are expensive, it's probably not the best idea to repeatedly switch between the cheap and expensive shader for each light. Ideally I'd like to take advantage of all 8 bits of the stencil buffer by rendering 8 light volumes with the cheap shader followed by 8 light volumes with the expensive shader. However, issues arise when lights overlap, as there's no way to tell which pixels belong to which lights.
The solution that comes to mind would be to use 1 bit in the stencil buffer per light. So, for light n, mark the nth bit in the stencil buffer in the cheap pass, then only render pixels with that bit on during the expensive pass.
I haven't used the stencil buffer before, but from what I'm reading this doesn't seem possible. For this to work I'd have to set the stencil buffer using bitwise OR and the stencil function would have to be bitwise AND. However, the only operations on the stencil buffer I can see are: KEEP, ZERO, REPLACE, INCR, DECR, and INVERT, and the only functions are: NEVER, ALWAYS, LESS, EQUAL, LEQUAL, GEQUAL, GREATER, and NOTEQUAL.
Is there any way to somehow get this OR and ANDing behavior using the stencil buffer? And if not, is there an alternative approach to efficiently rendering light volumes?
To get ORing update behavior, use glStencilMask combined with GL_REPLACE. To get ANDing test behavior, use the glStencilFunc mask parameter combined with GL_EQUAL. In both cases, you'll want the ref parameter to glStencilFunc to be 0xff.

GLSL Interlacing

I would like to efficiently render in an interlaced mode using GLSL.
I can alrdy do this like:
vec4 background = texture2D(plane[5], gl_TexCoord[1].st);
if(is_even_row(gl_TexCoord[1].t))
{
vec4 foreground = get_my_color();
gl_FragColor = vec4(fore.rgb * foreground .a + background .rgb * (1.0-foreground .a), background .a + fore.a);
}
else
gl_FragColor = background;
However, as far as I have understood the nature of branching in GLSL is that both branches will actually be executed, since "even_row" is considered as run-time value.
Is there any trick I can use here in order to avoid unnecessarily calling the rather heavy function "get_color"? The behavior of is_even_row is quite static.
Or is there some other way to do this?
NOTE: glPolygonStipple will not work since I have custom blend functions in my GLSL code.
(comment to answer, as requested)
The problem with interlacing is that GPUs run shaders in 2x2 clusters, which means that you gain nothing from interlacing (a good software implementation might possibly only execute the actual pixels that are needed, unless you ask for partial derivatives).
At best, interlacing runs at the same speed, at worst it runs slower because of the extra work for the interlacing. Some years ago, there was an article in ShaderX4, which suggested interlaced rendering. I tried that method on half a dozen graphics cards (3 generations of hardware of each the "two big" manufacturers), and it ran slower (sometimes slightly, sometimes up to 50%) in every case.
What you could do is do all the expensive rendering in 1/2 the vertical resolution, this will reduce the pixel shader work (and texture bandwidth) by 1/2. You can then upscale the texture (GL_NEAREST), and discard every other line.
The stencil test can be used to discard pixels before the pixel shader is executed. Of course the hardware still runs shaders in 2x2 groups, so in this pass you do not gain anything. However, that does not matter if it's just the very last pass, which is a trivial shader writing out a single fetched texel. The more costly composition shaders (the ones that matter!) run at half resolution.
You find a detailled description including code here: fake dynamic branching. This demo avoids lighting pixels by discarding those that are outside the light's range using the stencil.
Another way which does not need the stencil buffer is to use "explicit Z culling". This may in fact be even easier and faster.
For this, clear Z, disable color writes (glColorMask), and draw a fullscreen quad whose vertices have some "close" Z coordinate, and have the shader kill fragments in every odd line (or use the deprecated alpha test if you want, or whatever). gl_FragCoord.y is a very simple way of knowing which line to kill, using a small texture that wraps around would be another (if you must use GLSL 1.0).
Now draw another fullscreen quad with "far away" Z values in the vertices (and with depth test, of course). Simply fetch your half-res texture (GL_NEAREST filtering), and write it out. Since the depth buffer has a value that is "closer" in every other row, it will discard those pixels.
How does glPolygonStipple compare to this? Polygon stipple is a deprecated feature, because it is not directly supported by the hardware and has to be emulated by the driver either by "secretly" rewriting the shader to include extra logic or by falling back to software.
This is probably not the right way to do interlacing. If you really need to achieve this effect, don't do it in the fragment shader like this. Instead, here is what you could do:
Initialize a full screen 1-bit stencil buffer, where each bit stores the parity of its corresponding row.
Render your scene like usual to a temporary FBO with 1/2 the vertical resoltion.
Turn on the stencil test, and switch the stencil func depending on which set of scan lines you are going to draw.
Blit a rescaled version of the aforementioned fbo (containing the contents of your frame) to the stencil buffer.
Note that you could skip the offscreen FBO step and draw directly using the stencil buffer, but this would waste some fill rate testing those pixels that are just going to clipped anyway. If your program is shader heavy, the solution I just mentioned would be optimal. If it is not, you may end up being marginally better off drawing directly to the screen.