Early-stencil culling without early-z culling - opengl

I already have an idea of the answer but I need to be sure.
I render a scene in two passes.
In the first pass, if the depth test succeeds, I set the stencil bit to 1:
glEnable(GL_STENCIL_TEST);
glStencilMask(GL_TRUE);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
glStencilFunc(GL_ALWAYS, 1, GL_TRUE);
The second pass only writes where the stencil value is 1:
glStencilFunc(GL_EQUAL, 1, GL_TRUE); // Pass test if stencil value is 1
glStencilMask(GL_FALSE); // Don't write anything to stencil buffer
This works correctly, but I expected a large performance increase.
The shader used in the second pass is unusual: it uses discard and assigns to gl_FragDepth.
That makes early-z culling impossible. Fortunately, I'm only interested in early-stencil culling.
So here is my question: is there a way to take advantage of early-stencil culling without early-z culling?
This thread is closely related to this one, but I really need to use discard and assign to gl_FragDepth in the second shader...

There is no such thing as early stencil tests, or early Z/depth tests for that matter. There are only early fragment tests, which include the stencil and depth tests along with other operations. They cannot be performed early piecemeal: either all of them run early or all of them run late.
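To make the "all early or all late" point concrete, here is a toy CPU-side model, not real driver or GPU behavior. It treats the stencil and depth tests as one group that runs either before or after the fragment shader and counts how often the shader would run. The `Pixel`, `Stats`, `fragment_tests` and `run_pass` names are made up for illustration, and the stencil function is assumed to be GL_EQUAL with ref 1 and mask 1, as in the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: one entry per pixel covered by the draw call. */
typedef struct {
    uint8_t stencil; /* value already stored in the stencil buffer */
    float   depth;   /* value already stored in the depth buffer */
} Pixel;

typedef struct {
    size_t shader_runs;    /* how many times the fragment shader executed */
    size_t pixels_written; /* how many fragments survived the tests */
} Stats;

/* The whole test group: stencil (GL_EQUAL, ref 1, mask 1), then depth (GL_LESS). */
static bool fragment_tests(const Pixel *p, float frag_depth) {
    bool stencil_ok = (p->stencil & 1u) == 1u;
    return stencil_ok && frag_depth < p->depth;
}

/* tests_early = true: the group runs before shading, so failing fragments
 * never reach the shader. tests_early = false: the shader always runs first
 * (for example because it writes depth) and the group runs afterwards. */
static Stats run_pass(const Pixel *px, size_t n, float frag_depth, bool tests_early) {
    assert(px != NULL || n == 0);
    Stats s = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        if (tests_early) {
            if (!fragment_tests(&px[i], frag_depth)) continue; /* culled before shading */
            s.shader_runs++;
            s.pixels_written++;
        } else {
            s.shader_runs++; /* shading cost is paid regardless of the tests */
            if (fragment_tests(&px[i], frag_depth)) s.pixels_written++;
        }
    }
    return s;
}
```

In this model the final image is the same either way; only the number of shader executions differs, which is where the expected speedup would come from.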

Related

Read depth values while stencil testing (same texture)

I know that it is a very bad idea to read/write from/to the same texture/location, because this would result in undefined behaviour.
But in my case, if depth testing is disabled and I read the depth values in a shader, is it ok to do the stencil testing at the same time as reading the depth values within the same texture?
In my opinion there should not be any problems, because I'm not reading the stencil buffer values in the shader. Or could there be hardware-related problems when the texture is bound for reading in a shader and OpenGL also uses it for stencil testing?
This texture is filled with depth/stencil values. I want to avoid some heavy BRDF lighting (directional light) calculations on specific pixels (the sky).
Example code:
//Contains the depth/stencil texture(deferredDepthStencilTextureID).
//Two FBO's are sharing the same depth/stencil texture.
lightAccumulationFBO->bind();
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_EQUAL, 1, 0xFF);
//Disable writing to the stencil buffer, i.e. all the bits are write-protected.
glStencilMask(0x00);
glDisable(GL_DEPTH_TEST);
glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE); //Additive blending. Light accumulation.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, deferredDepthStencilTextureID); //GL_DEPTH24_STENCIL8
//Bind other textures here...
shader.bind();
//Uniforms here...
postProcessQuad->renderQuad();
shader.unbind();
//Unbind all the textures here.
...
glDisable(GL_BLEND);
glEnable(GL_DEPTH_TEST);
glDisable(GL_STENCIL_TEST);
lightAccumulationFBO->unbind();
"But in my case, if depth testing is disabled and I read the depth values in a shader, is it ok to do the stencil testing at the same time as reading the depth values within the same texture?"
No. Whether an operation is defined or not depends on which images are attached to the FBO and which images are read from, not on which components of those images are used. And no, write masking will not save you from undefined behavior.
So unless your stencil texture is separate from your depth texture, that's not going to work. And good luck finding hardware that will allow you to separate depth/stencil images.
Even with GL 4.5/NV/ARB_texture_barrier, the answer is still no. That functionality makes certain exceptions to the above rule, but only for operations that are due to fragment shader outputs. Stencil test operations are not fragment shader outputs, so they don't apply.

Check a stencil bit, and write to another bit

I'm rendering reflective surfaces in my 3D engine. I need to perform two stencil operations. First, I draw out the reflective surface with depth testing to find visible areas of the surface. I then need to draw out models into a G-Buffer, which is also stencilled to find areas to draw my skybox.
Draw Surface: always draw, write bit #1
Draw Models: draw only if bit #1 is set, write bit #2
How would I do this using OpenGL? I'm unsure of the relationship between the glStencilFunc ref and mask values, and the glDepthMask value.
Relevant documentation: glStencilOp, glStencilFunc, glStencilMask.
The docs are quite specific, but it's not always intuitive or obvious what to do if you simply want to create and then activate a mask. I use these as a simple starting point...
Create the mask
Clear the stencil buffer to zeroes and write 1s to all pixels you draw to.
void createStencilMask()
{
// Clear the stencil buffer with zeroes.
// This assumes glClearStencil() is unchanged.
glClear(GL_STENCIL_BUFFER_BIT);
// Enable stencil raster ops
// 1. Enables writing to the stencil buffer (glStencilOp)
// 2. If stencil test (glStencilFunc) fails, no frags are written
glEnable(GL_STENCIL_TEST);
// sfail GL_KEEP - keep original value if stencil test fails
// dpfail GL_KEEP - also keep if the stencil passes but depth test fails
// dppass GL_REPLACE - write only if both pass and you'd normally see it
// this writes the value of 'ref' to the buffer
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
// func GL_ALWAYS - the stencil test always passes
// ref 1 - replace with ones instead of zeroes
// mask 1 - only operate on the first bit (don't need any more)
// Assumes glStencilMask() is unchanged
glStencilFunc(GL_ALWAYS, 1, 1);
}
Call the above function, and draw your stuff. You can set glDepthMask and glColorMask so this doesn't actually affect the current colour/depth and your scene, only the stencil buffer.
Draw, using the mask
Only draw to pixels with 1s from the previous step.
void useStencilMask()
{
// Enable stencil raster ops
glEnable(GL_STENCIL_TEST);
// Just using the test, don't need to replace anything.
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
// Only render if the current pixel stencil value is one.
glStencilFunc(GL_EQUAL, 1, 1);
}
Draw and then leave the test disabled glDisable(GL_STENCIL_TEST) when you're done.
Reading and writing different bits
Now to focus on your question...
This bit's a little tricky because the same ref value of glStencilFunc() is used in both the stencil test and as the value to replace. However, you can get around this with masks:
Use mask in glStencilFunc() to ignore bits when testing/reading.
Use glStencilMask() to stop certain bits from getting written.
In your case you don't need to mask out the first bit from being written because it's already set. Simply update useStencilMask() to call glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE) and glStencilFunc(GL_EQUAL, 3, 1). Because mask is 1, only the first bit is used in the equality test. However, the whole ref of 3 (which is 0b11) is written. You could use glStencilMask(2) (which is 0b10) to stop the first bit from being written, but it's already one so it doesn't matter.
You could also make use of GL_INCR, which would set the second bit and clear the first (0b01 becomes 0b10). Or perhaps clear with ones and use GL_ZERO to mark both your bits.
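If the interaction between ref, the glStencilFunc() mask and glStencilMask() is still hard to picture, the arithmetic can be tried on the CPU. The following is only a sketch that mimics the GL_EQUAL test and a few pass operations on an 8-bit stencil value; `StencilState`, `stencil_equal` and `stencil_apply` are hypothetical names, not OpenGL API, and the failure operations are assumed to be GL_KEEP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { OP_KEEP, OP_REPLACE, OP_INCR, OP_ZERO } StencilOp;

typedef struct {
    uint8_t   ref;        /* glStencilFunc ref */
    uint8_t   func_mask;  /* glStencilFunc mask: bits used by the comparison */
    uint8_t   write_mask; /* glStencilMask: bits that may be modified */
    StencilOp pass_op;    /* dppass argument of glStencilOp */
} StencilState;

/* GL_EQUAL: both ref and the stored value are ANDed with the func mask. */
static bool stencil_equal(const StencilState *s, uint8_t stored) {
    return (s->ref & s->func_mask) == (stored & s->func_mask);
}

/* Returns the stencil value after one fragment that passes the depth test.
 * A fragment that fails the stencil test leaves the value alone (sfail = GL_KEEP). */
static uint8_t stencil_apply(const StencilState *s, uint8_t stored) {
    assert(s != 0);
    if (!stencil_equal(s, stored)) return stored;
    uint8_t result = stored;
    switch (s->pass_op) {
    case OP_KEEP:    break;
    case OP_REPLACE: result = s->ref; break;              /* the whole ref, not ref & func_mask */
    case OP_INCR:    if (stored < 0xFF) result = (uint8_t)(stored + 1); break; /* clamps at max */
    case OP_ZERO:    result = 0; break;
    }
    /* Only bits enabled in the write mask are actually updated. */
    return (uint8_t)((stored & (uint8_t)~s->write_mask) | (result & s->write_mask));
}
```

For example, with ref 3, func mask 1, write mask 0xFF and OP_REPLACE, a stored value of 1 becomes 3 while a stored value of 0 is left untouched, which is the "test bit #1, write bit #2" behaviour described above.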

Occlusion queries with stencil test only

Do occlusion queries still work if I disable depth testing altogether when the obstacle set is known a priori to be strictly in-between the camera and the object to be tested?
This is an attempt to improve performance, as, logically, I don't need complex z-tests if none of the occluders are behind the occludee.
I'm using the following commands to initialize color/depth/stencil buffers:
SDL_GL_SetAttribute(SDL_GL_RED_SIZE, 0);
SDL_GL_SetAttribute(SDL_GL_GREEN_SIZE, 0);
SDL_GL_SetAttribute(SDL_GL_BLUE_SIZE, 0);
SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 0);
SDL_GL_SetAttribute(SDL_GL_STENCIL_SIZE, 1);
...
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDisable(GL_CULL_FACE);
glDisable(GL_DEPTH_TEST);
glDepthMask(GL_FALSE);
glEnable(GL_STENCIL_TEST);
glStencilMask(0x00000001);
...
glClear(GL_STENCIL_BUFFER_BIT);
The most conclusive document is the latest OpenGL spec. From the OpenGL 4.5 spec, section "17.3.7 Occlusion Queries", on page 476 (with emphasis added by me):
Occlusion queries use query objects to track the number of fragments or samples that pass the depth test.
When an occlusion query is active, the samples-passed count is incremented for each fragment that passes the depth test.
Therefore, the real question becomes: What does "pass the depth test" mean? Does a pixel pass the depth test if there is no depth test? And how does the stencil test come into play?
The key is that the stencil test is applied before the depth test, which is the behavior defined in the spec. So only fragments that pass the stencil test will go through the depth test, and will therefore be counted in the occlusion query. Or in other words, only fragments that pass both the stencil and depth test are counted.
One approach that will definitely work is that you enable the depth test, and let all fragments pass the depth test. This will then count all the fragments that passed the stencil test. The settings to use for this are:
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_ALWAYS);
glEnable(GL_STENCIL_TEST);
...
Now, will it also work as desired without having a depth buffer, or with the depth buffer disabled? The first part of this is answered at the end of section "17.3.6 Depth Buffer Test":
If there is no depth buffer, it is as if the depth buffer test always passes.
In this case, the answer is yes, you can use an occlusion query without a depth buffer, and it will count the fragments that pass the stencil test.
The second case is covered earlier in section "17.3.6 Depth Buffer Test":
When disabled, the depth comparison and subsequent possible updates to the depth buffer value are bypassed and the fragment is passed to the next operation.
Figure 17.1 in the spec shows "Occlusion Query" as the next operation following "Depth Buffer Test". Therefore, all fragments that passed the earlier tests (including stencil) will be counted by the occlusion query if the depth test is disabled.
And the final answer is: YES, you can use occlusion queries with just a stencil test.
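The ordering argument above can be restated as a small CPU-side model. This is a sketch of the reasoning, not of any real implementation: the stencil test runs first, a disabled or missing depth test counts as passing, and only then is the sample counted. The `DepthState`, `Sample` and `samples_passed` names are hypothetical, and since the question does not show its stencil function, GL_EQUAL with ref 1 and mask 1 is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool depth_test_enabled; /* models glEnable/glDisable(GL_DEPTH_TEST) */
    bool has_depth_buffer;   /* false models a framebuffer with no depth bits */
} DepthState;

typedef struct {
    uint8_t stencil; /* stored stencil value */
    float   depth;   /* stored depth value (ignored without a depth buffer) */
} Sample;

/* Assumed stencil function: GL_EQUAL, ref 1, mask 1. */
static bool stencil_passes(uint8_t stored) {
    return (stored & 1u) == 1u;
}

/* Without a depth buffer, or with the test disabled, every fragment passes. */
static bool depth_passes(const DepthState *d, float stored, float incoming) {
    if (!d->has_depth_buffer || !d->depth_test_enabled) return true;
    return incoming < stored; /* GL_LESS */
}

/* Counts samples in pipeline order: stencil test, depth test, occlusion query. */
static unsigned samples_passed(const DepthState *d, const Sample *s, size_t n,
                               float incoming_depth) {
    assert(d != NULL && (s != NULL || n == 0));
    unsigned count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!stencil_passes(s[i].stencil)) continue; /* never reaches the depth test */
        if (!depth_passes(d, s[i].depth, incoming_depth)) continue;
        ++count;
    }
    return count;
}
```

With the depth test disabled or no depth buffer at all, the count in this model equals the number of samples that pass the stencil test, which matches the conclusion above.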
Acknowledgement: Latest version revised based on feedback by @jozxyqk and @user2464424
From www.opengl.org/registry/specs/ARB/occlusion_query.txt
This extension solves both of those [HP_occlusion_test] problems. It returns as its
result the number of samples that pass the depth and stencil tests
...
Exactly what stage in the pipeline are we counting samples at?
RESOLVED: We are counting immediately after both the depth and
stencil tests, i.e., samples that pass both. Note that the depth
test comes after the stencil test, so to say that it is the
number that pass the depth test is sufficient; though it is often
conceptually helpful to think of the depth and stencil tests as
being combined, because the depth test's result impacts the
stencil operation used.
From www.opengl.org/registry/specs/ARB/occlusion_query2.txt
This extension trivially adds a boolean occlusion query to ARB_occlusion_query
With the depth test off, I'd assume all fragments pass. From the above it sounds like you can rely on the stencil test alone affecting occlusion query results, which is at odds with the following from opengl.org/wiki.
The stencil test, alpha test, or fragment shader discard is irrelevant with queries
The extension does not mention discard. The occlusion query section in GL 4.5 core/compat specs only mentions counting fragments that pass the depth test. If the fragment doesn't make it to the depth test, then I guess it isn't considered to pass it.
A bit of a side note, but I think it's also worth mentioning the early fragment test.

How to render a mesh behind another mesh, like a mask?

I would like it so that when mesh A (the character), is behind mesh B (a wall), it is still rendered but with a solid gray color.
I'm just starting with OpenGL ES 2.0 and I'm still unsure how to go about this. From what I understand, the depth buffer lets meshes fight out which one will be seen in the fragments they cover. There are also various blend functions that could possibly be involved in this, and finally the stencil buffer looks like it might provide this functionality as well.
So is there a way to output different colors through the shader based on a failed depth test? Is there a way to do this through blending? Or must I use the stencil buffer somehow?
And what is this technique called for future reference? I've seen it used in a lot of video games.
This can be done using the stencil buffer. The stencil buffer gives each pixel some additional bits which can be used as a bitmask or a counter. In your case you'd configure the stencil test unit to set a specific bitmask when the depth test for the character fails (because it's obstructed by the wall). Then you switch the stencil function so that the test passes only for this specific bitmask, and render a full-viewport, solid quad in the desired color, with depth testing and depth writes disabled.
Code
I strongly recommend you dive deep into the documentation for the stencil test unit. It's a very powerful mechanism, often overlooked. Your particular problem would be solved by the following. I suggest you take this example code and read it alongside the reference pages for the stencil test functions glStencilFunc and glStencilOp.
You must add a stencil buffer to your frame buffer's pixel format – how you do that is platform dependent. For example, if you're using GLUT, then you'd add |GLUT_STENCIL to the format bitmask of glutInitDisplayMode; on iOS you'd set a property on your GLKView; etc. Once you've added a stencil buffer, you should clear it along with your other render buffers by adding |GL_STENCIL_BUFFER_BIT to the initial glClear call of each drawing.
GLint const silouhette_stencil_mask = 0x1;
void display()
{
/* ... */
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
glDisable(GL_STENCIL_TEST);
/* The following two are not necessary according to specification.
* But drivers can be buggy and this makes sure we don't run into
* issues caused by not wanting to change the stencil buffer, but
* it happening anyway due to a buggy driver.
*/
glStencilFunc(GL_NEVER, 0, 0);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
draw_the_wall();
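/* Character pass: the stencil test always passes; where the character
 * fails the depth test (hidden behind the wall), dpfail = GL_REPLACE
 * writes the ref value into the stencil buffer. */
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, silouhette_stencil_mask, 0xffffffff);
glStencilOp(GL_KEEP, GL_REPLACE, GL_KEEP);
draw_the_character();
/* Fill pass: draw the solid color only where the stencil was marked. */
glStencilFunc(GL_EQUAL, silouhette_stencil_mask, 0xffffffff);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
glDisable(GL_DEPTH_TEST);
glDepthMask(GL_FALSE);
draw_full_viewport_solid_color();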
/* ... */
}
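To see why marking on depth-test failure gives the silhouette, the three passes can be traced on a single row of pixels. The following is a CPU-side sketch, not OpenGL code: `Scanline`, `draw_span`, `fill_where_marked` and `render_silhouette`, as well as the span positions and depths, are made up for illustration. The buffers are assumed to be cleared to depth 1.0 and stencil 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH 10

typedef struct {
    char    color[WIDTH + 1]; /* one character per pixel, NUL-terminated */
    float   depth[WIDTH];
    uint8_t stencil[WIDTH];
} Scanline;

static void clear_scanline(Scanline *s) {
    memset(s->color, '.', WIDTH);
    s->color[WIDTH] = '\0';
    for (int x = 0; x < WIDTH; ++x) {
        s->depth[x] = 1.0f;
        s->stencil[x] = 0;
    }
}

/* Draws pixels [x0, x1) at depth z with a GL_LESS depth test. If
 * mark_on_depth_fail is set, a depth failure writes 1 to the stencil
 * buffer, like glStencilOp(GL_KEEP, GL_REPLACE, GL_KEEP) with ref 1. */
static void draw_span(Scanline *s, int x0, int x1, float z, char c,
                      int mark_on_depth_fail) {
    assert(0 <= x0 && x0 <= x1 && x1 <= WIDTH);
    for (int x = x0; x < x1; ++x) {
        if (z < s->depth[x]) {
            s->color[x] = c;
            s->depth[x] = z;
        } else if (mark_on_depth_fail) {
            s->stencil[x] = 1;
        }
    }
}

/* Full-viewport fill with the depth test off and stencil func GL_EQUAL, ref 1. */
static void fill_where_marked(Scanline *s, char c) {
    for (int x = 0; x < WIDTH; ++x) {
        if (s->stencil[x] == 1) s->color[x] = c;
    }
}

static void render_silhouette(Scanline *s) {
    clear_scanline(s);
    draw_span(s, 2, 6, 0.3f, 'W', 0); /* wall, close to the camera */
    draw_span(s, 4, 8, 0.6f, 'C', 1); /* character, partly behind the wall */
    fill_where_marked(s, 'G');        /* gray where the character was hidden */
}
```

After `render_silhouette`, the row reads `..WWGGCC..`: plain wall, then gray where the character is hidden by the wall, then the visible part of the character.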

Early stencil culling

I'm trying to get early fragment culling to work, based on the stencil test.
My scenario is the following: I have a fragment shader that does a lot of work, but needs to be run only on very few fragments when I render my scene. These fragments can be located pretty much anywhere on the screen (I can't use a scissor to quickly filter out these fragments).
In rendering pass 1, I generate a stencil buffer with two possible values. Values will have the following meaning for pass 2:
0: do not do anything
1: ok to proceed, (eg. enter the fragment shader, and render)
Pass 2 renders the scene properly speaking. The stencil buffer is configured this way:
glStencilMask(1);
glStencilFunc(GL_EQUAL, 1, 1); // if the value is NOT 1, please early cull!
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP); // never write to stencil buffer
Now I run my app. The color of selected pixels is altered based on the stencil value, which means the stencil test works fine.
However, I should see a huge, spectacular performance boost with early stencil culling... but nothing happens. My guess is that the stencil test either happens after the depth test, or even after the fragment shader has been called. Why?
nVidia apparently has a patent on early stencil culling:
http://www.freepatentsonline.com/7184040.html
Is this the right way to get it enabled?
I'm using an nVidia GeForce GTS 450 graphics card.
Is early stencil culling supposed to work with this card?
Running Windows 7 with latest drivers.
Like early Z, early stencil is often done using hierarchical stencil buffering.
There are a number of factors that can prevent hierarchical tiling from working properly, including rendering into an FBO on older hardware. However, the biggest obstacle to getting early stencil testing working in your example is that you've left stencil writes enabled for 1 of the 8 stencil bits in the second pass.
I would suggest calling glStencilMask(0x00) at the beginning of the second pass to let the GPU know you are not going to write anything to the stencil buffer.
There is an interesting read on early fragment testing as it is implemented in current generation hardware here. That entire blog is well worth reading if you have the time.