Occlusion queries with stencil test only - opengl

Do occlusion queries still work if I disable depth testing altogether, given that the obstacle set is known a priori to be strictly between the camera and the object to be tested?
This is an attempt to improve performance: logically, I don't need any depth tests if none of the occluders can be behind the occludee.
I'm using the following commands to initialize color/depth/stencil buffers:
SDL_GL_SetAttribute(SDL_GL_RED_SIZE, 0);
SDL_GL_SetAttribute(SDL_GL_GREEN_SIZE, 0);
SDL_GL_SetAttribute(SDL_GL_BLUE_SIZE, 0);
SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, 0);
SDL_GL_SetAttribute(SDL_GL_STENCIL_SIZE, 1);
...
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDisable(GL_CULL_FACE);
glDisable(GL_DEPTH_TEST);
glDepthMask(GL_FALSE);
glEnable(GL_STENCIL_TEST);
glStencilMask(0x00000001);
...
glClear(GL_STENCIL_BUFFER_BIT);
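For context, one way such a stencil-only query could be wired up is sketched below; the stencil-marking strategy is my assumption based on the question, and the draw_* calls are hypothetical placeholders:
GLuint query;
glGenQueries(1, &query);
/* Pass 1: rasterize the occluders, writing 1 into stencil bit 0. With the
 * depth test disabled, the "depth pass" stencil op is the one that applies. */
glStencilFunc(GL_ALWAYS, 1, 1);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
draw_occluders();                 /* hypothetical */
/* Pass 2: count occludee samples that fall on pixels NOT covered by any
 * occluder (stencil still 0); zero samples passed means fully occluded. */
glStencilFunc(GL_EQUAL, 0, 1);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
glBeginQuery(GL_SAMPLES_PASSED, query);
draw_occludee_bounds();           /* hypothetical */
glEndQuery(GL_SAMPLES_PASSED);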

The most conclusive document is the latest OpenGL spec. From the OpenGL 4.5 spec, section "17.3.7 Occlusion Queries", on page 476:
Occlusion queries use query objects to track the number of fragments or samples that pass the depth test.
When an occlusion query is active, the samples-passed count is incremented for each fragment that passes the depth test.
Therefore, the real question becomes: What does "pass the depth test" mean? Does a fragment pass the depth test if there is no depth test? And how does the stencil test come into play?
The key is that the stencil test is applied before the depth test; this ordering is defined by the spec. Only fragments that pass the stencil test reach the depth test, and only those can be counted by the occlusion query. In other words, only fragments that pass both the stencil and the depth test are counted.
One approach that will definitely work is to enable the depth test but let every fragment pass it. The query then counts all fragments that passed the stencil test. The settings for this are:
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_ALWAYS);
glEnable(GL_STENCIL_TEST);
...
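Whichever of these configurations you end up with, reading the result back looks the same. A minimal sketch, assuming query is a query object created earlier with glGenQueries:
GLint available = 0;
GLuint samples = 0;
/* Optional: poll availability first to avoid stalling the pipeline. */
glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &available);
/* This call blocks until the result is ready. */
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &samples);
if (samples == 0) {
    /* No fragment passed the stencil test (and the always-passing depth test). */
}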
Now, will it also work as desired without a depth buffer, or with the depth test disabled? The first part is answered at the end of section "17.3.6 Depth Buffer Test":
If there is no depth buffer, it is as if the depth buffer test always passes.
In this case, the answer is yes, you can use an occlusion query without a depth buffer, and it will count the fragments that pass the stencil test.
The second case is covered earlier in section "17.3.6 Depth Buffer Test":
When disabled, the depth comparison and subsequent possible updates to the depth buffer value are bypassed and the fragment is passed to the next operation.
Figure 17.1 in the spec shows "Occlusion Query" as the next operation following "Depth Buffer Test". Therefore, all fragments that passed the earlier tests (including stencil) will be counted by the occlusion query if the depth test is disabled.
And the final answer is: YES, you can use occlusion queries with just a stencil test.
Acknowledgement: latest version revised based on feedback from @jozxyqk and @user2464424.

From www.opengl.org/registry/specs/ARB/occlusion_query.txt
This extension solves both of those [HP_occlusion_test] problems. It returns as its result the number of samples that pass the depth and stencil tests
...
Exactly what stage in the pipeline are we counting samples at?
RESOLVED: We are counting immediately after both the depth and stencil tests, i.e., samples that pass both. Note that the depth test comes after the stencil test, so to say that it is the number that pass the depth test is sufficient; though it is often conceptually helpful to think of the depth and stencil tests as being combined, because the depth test's result impacts the stencil operation used.
From www.opengl.org/registry/specs/ARB/occlusion_query2.txt
This extension trivially adds a boolean occlusion query to ARB_occlusion_query
With the depth test off, I'd assume all fragments pass. From the above it sounds like you can rely on the stencil test alone to affect occlusion query results, which is at odds with the following statement from the opengl.org wiki:
The stencil test, alpha test, or fragment shader discard is irrelevant with queries
The extension does not mention discard. The occlusion query section in GL 4.5 core/compat specs only mentions counting fragments that pass the depth test. If the fragment doesn't make it to the depth test, then I guess it isn't considered to pass it.
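To illustrate the boolean query that ARB_occlusion_query2 adds (GL_ANY_SAMPLES_PASSED, core since OpenGL 3.3), here is a minimal sketch; the draw call is a hypothetical placeholder:
GLuint query;
GLint any_passed = GL_FALSE;
glGenQueries(1, &query);
glBeginQuery(GL_ANY_SAMPLES_PASSED, query);
draw_occludee_bounds();   /* hypothetical */
glEndQuery(GL_ANY_SAMPLES_PASSED);
glGetQueryObjectiv(query, GL_QUERY_RESULT, &any_passed);   /* GL_TRUE or GL_FALSE */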
A bit of a side note, but I think it's also worth mentioning the early fragment test.


Early-stencil culling without early-z culling

I already have an idea of the answer but I need to be sure.
I render a scene in two pass.
In the first pass, if the depth test succeeds, I set the stencil bit to 1:
glEnable(GL_STENCIL_TEST);
glStencilMask(GL_TRUE);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
glStencilFunc(GL_ALWAYS, 1, GL_TRUE);
The second pass only writes where the stencil value is 1:
glStencilFunc(GL_EQUAL, 1, GL_TRUE); // Pass test if stencil value is 1
glStencilMask(GL_FALSE); // Don't write anything to stencil buffer
This works correctly, but I expected a huge performance increase and I'm not seeing one.
The shader used in the second pass is special: it uses discard and writes to gl_FragDepth.
That makes early-z culling impossible. Fortunately, I'm only interested in early-stencil culling.
So here is my question: is there a way to take advantage of early-stencil culling without early-z culling?
This thread is closely related to this one, but I really need to use discard and write gl_FragDepth in the second shader...
There is no such thing as an early stencil test. Or an early Z/depth test, for that matter. There are only early fragment tests, which happen to include the stencil and depth tests, but also other operations. They cannot be performed early piecemeal; it's all early or all late.

How can I write a different value to the depth buffer than the value used for the depth comparison?

In OpenGL, is it possible to render a polygon with the regular depth test enabled, but write a custom value when the depth value is actually stored in the depth buffer?
(The reason is I'm rendering a particle system, which should be depth-tested against the geometry in the scene, but I want to write a very large depth value where the particle system is located, thus utilizing the depth-blur post-processing step to further blur the particle system)
Update
To further refine the question, is it possible without rendering in multiple passes?
You don't.
OpenGL does not permit you to lie. The depth test tests the value in the depth buffer against the incoming value to be written in the depth buffer. Once that test passes, the tested depth value will be written to the depth buffer. Period.
You cannot (ab)use the depth test to do something other than testing depth values.
ARB_conservative_depth/GL 4.2 does allow you a very limited form of this. Basically, you promise the implementation that you will only change the depth in a way that makes it smaller or larger than the original value. But even then, it would only work in specific cases, and then only so long as you stay within the limits you specified.
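As a rough sketch of what that looks like in the shader (GLSL 4.20 or ARB_conservative_depth assumed; the offset is an arbitrary illustration):
/* Fragment shader source: promise that depth is only ever pushed further away. */
const char *fs_src =
    "#version 420 core\n"
    "layout(depth_greater) out float gl_FragDepth;\n"
    "out vec4 color;\n"
    "void main() {\n"
    "    gl_FragDepth = gl_FragCoord.z + 0.05;  /* must honor the promise */\n"
    "    color = vec4(1.0);\n"
    "}\n";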
Enforcing early fragment tests will similarly not allow you to do this. The specification explicitly states that the depth will be written before your shader executes. So anything you write to gl_FragDepth will be ignored.
One way to do it in a single pass is by doing the depth-test "manually".
Set glDepthFunc to GL_ALWAYS
Then, in the fragment shader, sample the current value of the depth buffer and, depending on it, discard the fragment with the discard keyword.
To sample the current value of the depth buffer, you either need ARM_shader_framebuffer_fetch_depth_stencil (usually available on mobile platforms) or NV_texture_barrier. The latter, however, will yield undefined results if multiple particles of the same draw call render on top of each other, while the former will in that case use the depth value written by the last particle rendered at the same location.
You can also do it without any extension by copying the current depth buffer into a depth texture before you render the particles, and then using that depth texture for your manual depth test. That also prevents particles that render on top of each other from interfering with one another, as they will all use the old depth value for the manual test.
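A rough sketch of that extension-free approach is below; depth_copy_fbo, depth_copy_tex, custom_far_depth, width and height are hypothetical names, and the shader assumes the usual GL_LESS depth convention:
/* 1. Copy the scene's depth buffer into a depth texture attached to depth_copy_fbo. */
glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, depth_copy_fbo);
glBlitFramebuffer(0, 0, width, height, 0, 0, width, height,
                  GL_DEPTH_BUFFER_BIT, GL_NEAREST);   /* depth blits require GL_NEAREST */
glBindFramebuffer(GL_FRAMEBUFFER, 0);
/* 2. Render the particles with the fixed-function test forced to pass. */
glDepthFunc(GL_ALWAYS);
/* 3. Do the test manually in the fragment shader and write the custom depth. */
const char *particle_fs =
    "#version 330 core\n"
    "uniform sampler2D depth_copy_tex;\n"
    "uniform float custom_far_depth;\n"
    "out vec4 color;\n"
    "void main() {\n"
    "    float scene_depth = texelFetch(depth_copy_tex, ivec2(gl_FragCoord.xy), 0).r;\n"
    "    if (gl_FragCoord.z > scene_depth) discard;   /* manual depth test */\n"
    "    gl_FragDepth = custom_far_depth;             /* the custom value */\n"
    "    color = vec4(1.0);\n"
    "}\n";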
You can use gl_FragDepth in the fragment shader to write your custom value:
gl_FragDepth = 0.3;
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/gl_FragDepth.xhtml

What happens to the depth buffer if I discard a fragment in a shader using early_fragment_tests?

I'm using a fragment shader that discards some fragments with the discard keyword. My shader also uses early_fragment_tests (my use of image load/store obliges it).
EDIT :
I do not write gl_FragDepth; I let OpenGL handle the depth value.
Will my depth buffer be updated with the fragment's depth before the discard keyword is executed?
EDIT :
It does not seem to be the case on my NVIDIA Quadro 600 and K5000.
Any clue where I could find that information? FYI, I searched http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt. I found close enough topics but not that particular one.
Will my depth buffer be updated with the fragment's depth before the discard keyword is executed?
No, this sort of behavior is explicitly forbidden in a shader that contains discard or that writes an arbitrary value to gl_FragDepth. This is because, in such a shader, the depth of your fragment after it is shaded may be unrelated to the position generated during initial rasterization (pre-shading).
Without writing to gl_FragDepth or discarding, the depth of a fragment is actually known long before the actual fragment shader executes and this forms the foundation for early depth tests. Rasterization/shading can be skipped for some (individual tiled regions) or all of a primitive if it can be determined that it would have failed a depth test before the fragment shader is evaluated, but if the fragment shader itself is what determines a fragment's depth, then all bets are off.
There is an exception to this rule in DX11 / OpenGL 4.x. If you write your shaders in such a way that you can guarantee the output depth will always preserve the result of a depth test (same result as the depth generated during rasterization), early fragment tests can be enabled in a shader that uses discard or writes to gl_FragDepth. This feature is known as conservative depth, and unless you use this it is generally understood that discard is going to break early depth optimizations across the board.
Now, since you should never write to the depth buffer before you know whether the value you are writing passes or fails a depth test (gl_FragDepth may be different) or if the fragment even survives (discard may be used), you can see why a primitive shaded by a fragment shader that contains discard cannot write to the depth buffer before the shader is evaluated.
I think the information you are looking for is on that page:
If early fragment tests are enabled, any depth value computed by the fragment shader has no effect. Additionally, the depth buffer, stencil buffer, and occlusion query sample counts may be updated even for fragments or samples that would be discarded after fragment shader execution due to per-fragment operations such as alpha-to-coverage or alpha tests.
The word "may" in "the depth buffer, [etc.] may be updated", implies it is implementation dependent (or completely random).

How to render a mesh behind another mesh, like a mask?

I would like it so that when mesh A (the character), is behind mesh B (a wall), it is still rendered but with a solid gray color.
I'm just starting out with OpenGL ES 2.0 and I'm still unsure how to go about this. From what I understand, the depth buffer lets meshes fight out which one is visible in the fragments they cover; there are also various blend functions that could possibly be involved; finally, the stencil buffer looks like it would also provide the desired functionality.
So is there a way to output a different color through the shader based on a failed depth test? Is there a way to do this through blending? Or must I use the stencil buffer somehow?
And what is this technique called for future reference? I've seen it used in a lot of video games.
This can be done using the stencil buffer. The stencil buffer gives each pixel some additional bits, which can be used as a bitmask or a counter. In your case you'd configure the stencil test unit to set a specific bitmask wherever the depth test for the character fails (because it's obstructed by the wall). Then you switch the stencil function so that only pixels carrying this bitmask pass, and render a full-viewport, solid quad in the desired color, with depth testing and depth writes disabled.
Code
I strongly recommend you dive deep into the documentation for the stencil test unit. It's a very powerful mechanism, and it is often overlooked. Your particular problem can be solved by the following. I suggest you take this example code and read it alongside the reference pages for the stencil test functions glStencilFunc and glStencilOp.
You must add a stencil buffer to your frame buffer's pixel format – how you do that is platform dependent. For example, if you're using GLUT, then you'd add |GLUT_STENCIL to the format bitmask of glutInitDisplayMode; on iOS you'd set a property on your GLKView; etc. Once you've added a stencil buffer, you should clear it along with your other render buffers by adding |GL_STENCIL_BUFFER_BIT to the initial glClear call of each frame.
GLint const silhouette_stencil_mask = 0x1;
void display()
{
    /* ... */

    /* Pass 1: draw the wall normally; the stencil buffer is left untouched. */
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LESS);
    glDisable(GL_STENCIL_TEST);
    /* The following two are not necessary according to the specification.
     * But drivers can be buggy, and this makes sure we don't run into
     * issues caused by not wanting to change the stencil buffer, but
     * it happening anyway due to a buggy driver.
     */
    glStencilFunc(GL_NEVER, 0, 0);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);

    draw_the_wall();

    /* Pass 2: draw the character; wherever it fails the depth test
     * (i.e. is hidden behind the wall), the stencil mask is written. */
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, silhouette_stencil_mask, 0xffffffff);
    glStencilOp(GL_KEEP, GL_REPLACE, GL_KEEP);

    draw_the_character();

    /* Pass 3: fill every pixel carrying the mask with the silhouette color,
     * with depth testing and depth writes disabled. */
    glStencilFunc(GL_EQUAL, silhouette_stencil_mask, 0xffffffff);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    glDisable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);

    draw_full_viewport_solid_color();

    /* ... */
}
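If you happen to be using SDL (as in the first question above), the stencil buffer request and the per-frame clear would look something like this:
/* Request an 8-bit stencil buffer before creating the GL context. */
SDL_GL_SetAttribute(SDL_GL_STENCIL_SIZE, 8);
...
/* Clear it together with the other buffers at the start of each frame. */
glClearStencil(0);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);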

Early stencil culling

I'm trying to get early fragment culling to work, based on the stencil test.
My scenario is the following: I have a fragment shader that does a lot of work, but needs to be run only on very few fragments when I render my scene. These fragments can be located pretty much anywhere on the screen (I can't use a scissor to quickly filter out these fragments).
In rendering pass 1, I generate a stencil buffer with two possible values. Values will have the following meaning for pass 2:
0: do not do anything
1: ok to proceed, (eg. enter the fragment shader, and render)
Pass 2 renders the scene properly speaking. The stencil buffer is configured this way:
glStencilMask(1);
glStencilFunc(GL_EQUAL, 1, 1); // if the value is NOT 1, please early cull!
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP); // never write to stencil buffer
Now I run my app. The color of selected pixels is altered based on the stencil value, which means the stencil test works fine.
However, I should see a huge, spectacular performance boost with early stencil culling... but nothing happens. My guess is that the stencil test either happens after the depth test, or even after the fragment shader has been called. Why?
nVidia apparently has a patent on early stencil culling:
http://www.freepatentsonline.com/7184040.html
Is this the right way to get it enabled?
I'm using an nVidia GeForce GTS 450 graphics card.
Is early stencil culling supposed to work with this card?
Running Windows 7 with latest drivers.
Like early Z, early stencil is often done using hierarchical stencil buffering.
There are a number of factors that can prevent hierarchical tiling from working properly, including rendering into an FBO on older hardware. However, the biggest obstacle to getting early stencil testing working in your example is that you've left stencil writes enabled for one of the (typically 8) stencil bits in the second pass.
I would suggest calling glStencilMask(0x00) at the beginning of the second pass to let the GPU know you are not going to write anything to the stencil buffer.
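Concretely, the second-pass stencil state would then look something like this (keeping the 1-bit mask from the question):
glEnable(GL_STENCIL_TEST);
glStencilMask(0x00);                      /* promise: no stencil writes in this pass */
glStencilFunc(GL_EQUAL, 1, 0x01);         /* run only where the stencil bit is set */
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);   /* never modify the stencil buffer */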
There is an interesting read on early fragment testing as it is implemented in current generation hardware here. That entire blog is well worth reading if you have the time.