Why is the scissor test placed after the fragment shader? - opengl

If I understand correctly, the scissor test is a per-fragment operation, but I was wondering whether it's possible to move the test before the fragment shader, so that fragments outside the scissor box don't need to be shaded, or even before the rasterizer?
The only reason for not doing so that I can think of is to scissor the clear color.

The scissor test will, in almost any real-world scenario, affect the rasterization itself: the GPU's rasterizer units will not produce fragments outside of the scissor rect. Keep in mind that when the OpenGL pipeline was first specified, the fragment shader didn't exist yet; stuff like texture mapping was considered part of the rasterization stage in earlier versions of the GL spec.
However, this conceptual pipeline is not what actual hardware implements, and that poses no problem as long as the final result is not changed by the deviating implementation.
Even more important than the scissor test: the depth test will typically also be carried out before the fragment shader is invoked ("early Z"). This works as long as the fragment shader does not modify the depth value of the fragment. Typically, the implementation will automatically enable early Z behind your back as long as there is no assignment to gl_FragDepth in the fragment shader.
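For illustration, a minimal fragment shader sketch of the kind that typically defeats this automatic early Z (the halving of the depth is just a made-up example):
#version 330
out vec4 color;
void main()
{
    color = vec4(1.0);
    // Because this shader writes gl_FragDepth, the driver can no longer assume
    // the fragment's depth is the one produced by rasterization, so the
    // automatic "early Z" optimization is typically disabled.
    gl_FragDepth = gl_FragCoord.z * 0.5;
}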
Modern versions of the GL specification explicitly mention these "early" tests. Section 14.9 "Early Per-Fragment Tests" of the OpenGL 4.5 core profile specification states:
Once fragments are produced by rasterization, a number of per-fragment operations may be performed prior to fragment shader execution (see section 15). If a fragment is discarded during any of these operations, it will not be processed by any subsequent stage, including fragment shader execution. Up to five operations are performed on each fragment, in the following order:
- the pixel ownership test (see section 17.3.1);
- the scissor test (see section 17.3.2);
- the stencil test (see section 17.3.5);
- the depth buffer test (see section 17.3.6); and
- occlusion query sample counting (see section 17.3.7).
The pixel ownership and scissor tests are always performed. The other operations are performed if and only if early fragment tests are enabled in the active fragment shader (see section 15.2). [...]
So actually, the "late" scissor test doesn't really exist in the GL any more, even if the pipeline diagram in the very same document still lists it after the fragment shader.
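For completeness, here is a minimal sketch of how a shader opts in to the early tests explicitly (GL 4.2 / GLSL 4.20; the scissor and pixel ownership tests are early regardless, per the quote above):
#version 420
layout(early_fragment_tests) in;  // force stencil/depth tests and occlusion
                                  // query sample counting before this shader
out vec4 color;
void main()
{
    color = vec4(1.0);
}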

Related

Does a fragment shader only run for visible fragments?

We need to write a raytracer in OpenGL. Now, I decided I would shoot a ray for every fragment shader call since, as far as I understand, a fragment is a screen pixel that could be written to by a geometry object. So I was wondering if a fragment shader would only run for visible pixels or for all pixels. If it only runs for visible ones, it would be a given that the primary ray (from screen to object) is not obstructed. This would save a lot of calculations.
There is absolutely no guarantee that the execution of a fragment shader means that the fragment is certainly visible.
Early depth test by itself will not save you. Rendering each triangle front-to-back will not save you; there is no guarantee in OpenGL that fragments are generated in order (only "as if" in order). And that's ignoring cases of overlap where it's impossible to have a proper ordering. Even issuing each triangle in its own separate rendering command guarantees nothing as far as OpenGL is concerned.
The only thing you can do to ensure this is to perform a depth pre-pass. That is, render your entire scene, but without a fragment shader active (and turn off color writes to the framebuffer). That will write all of the depth data to the depth buffer. That way, if you use early depth tests, when you render your scene again, the only fragments that pass the depth test will be those that are visible.
Depth pre-passes can be pretty fast, depending on your vertex shader and other aspects of your rendering pipeline.
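A minimal sketch of such a pre-pass, assuming a hypothetical drawScene() helper that issues your draw calls:
/* Pass 1: depth only. No fragment shader is required, and color writes are off. */
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
glDepthMask(GL_TRUE);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
drawScene();

/* Pass 2: shade only what survived the pre-pass. */
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);      /* the depth buffer is already complete */
glDepthFunc(GL_LEQUAL);     /* only fragments at exactly the stored depth pass */
drawScene();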

Write gl_FragDepth while still executing depth pre-test

Given a depth pre-pass renderer, I have the minimum depth value a given fragment can possibly contain; thus, it makes no sense to consider any fragments farther away than that.
Now, I have a shader which writes to gl_FragDepth, but is guaranteed to write a value greater than or equal to the depth of its polygonal face. How can I still get a depth pre-test (i.e., if the fragment's depth is farther than the buffer value, discard it without shader execution) while still being allowed to write a value different from (greater than) the interpolated face depth if it passes the pre-test?
Starting with OpenGL 4.2 (GLSL 4.20), the functionality you're looking for is available as a layout qualifier on gl_FragDepth. It allows you to specify your intent on how you are going to modify the depth output in the fragment shader. For example, the following specifies that you are only going to change the depth value to make it greater:
layout (depth_greater) out float gl_FragDepth;
This will allow the early depth test to still be used when the depth is modified in the fragment shader. If you do not follow the "contract" you establish with this qualifier, you will get undefined behavior. For example, with the qualifier above, if you make the depth smaller, fragments that would otherwise be visible may get eliminated.
The functionality is based on the GL_AMD_conservative_depth and GL_ARB_conservative_depth extensions. If you want to use it with OpenGL versions lower than 4.2, you can check for the presence of these extensions and use one of the extension forms if available.
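For illustration, a minimal fragment shader using the qualifier might look like the sketch below; heightOffset is a hypothetical input that is assumed to always be >= 0, and on pre-4.2 contexts you would request the extension instead of relying on #version 420:
#version 420
// Pre-4.2 alternative:
// #extension GL_ARB_conservative_depth : enable
layout (depth_greater) out float gl_FragDepth;
in float heightOffset;   // hypothetical input, assumed >= 0
out vec4 color;
void main()
{
    color = vec4(1.0);
    // Keeps the promise made by the qualifier: the written depth is never
    // smaller than the depth interpolated from the primitive, so the early
    // depth test remains valid.
    gl_FragDepth = gl_FragCoord.z + heightOffset;
}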

Multisampling in pipeline

In multisampling, during rasterization there is more than one sample point in each pixel, and it is decided which of those sample points are covered by the primitive.
Which attributes are the same for each sample in a pixel? I read somewhere that the color and texture values are the same, but that the depth and stencil values of the samples in a pixel are different. But if the fragment shader is executed for each sample point, then they should all be different.
Also, when are the multiple samples resolved in the pipeline - after the fragment shader? And are they just linearly averaged?
First you have to realize how multisampling works and why it was created.
I am going to approach this from the perspective of anti-aliasing, because that was the primary use-case for multisampling up until multisample textures were introduced in GL3.
When something is multisampled, this means that each pixel contains multiple samples. These samples may be identical to one another if a primitive has relatively uniform characteristics (e.g. the same depth everywhere), and smart GL/hardware implementations are capable of identifying such situations and reducing memory bandwidth by reading/writing shared samples intelligently (similar to color/depth buffer compression). However, the cost in terms of required storage for a 4x MSAA framebuffer is the same as for 4x SSAA, because GL has to accommodate the worst-case scenario where each of the 4 samples is unique.
Which attributes are the same for each sample in a pixel?
Each fragment may cover multiple sample points for attributes such as color, texture coordinates, etc. Rather than invoking the fragment shader 4x as frequently to achieve 4x anti-aliasing, a trick was devised whereby each attribute is sampled at the fragment center (this is the default) and a single output is then written to each of the covered sample locations. The default behavior falls short when the center of the fragment is not part of the actual coverage area - for this, centroid sampling was introduced: vertex attributes are interpolated at the center of the primitive's coverage area within the fragment rather than at the center of the fragment itself.
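A hedged sketch of the qualifier involved (vTexCoord is a hypothetical attribute name):
// Vertex shader output
centroid out vec2 vTexCoord;

// Matching fragment shader input: for partially covered edge pixels the
// attribute is interpolated at the centroid of the covered samples rather
// than at the pixel center, which may lie outside the primitive.
centroid in vec2 vTexCoord;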
Later, when it comes time to write a color to the framebuffer, all of these samples need to be averaged to produce a single pixel; we call this multisample resolve. This works well for some things, but it does not address issues of aliasing that occurs during fragment shading itself.
Texturing occurs during the execution of a fragment shader, and this means that the sample frequency for texturing remains the same, so MSAA generally does not help with texture aliasing. Thus, while supersample anti-aliasing improves both aliasing at geometric edges and texture / shader aliasing (things like specular highlights), multisampling generally only reduces "jaggies."
I read somewhere that the color and texture values are the same, but that the depth and stencil values of the samples in a pixel are different.
In short, anything that is computed in the fragment shader will be the same for all covered samples. Anything that can be determined before fragment shading (e.g. depth) may vary.
Fragment tests such as depth/stencil are evaluated for each sub-sample. But multisampled depth buffers carry some restrictions. Up until D3D 10.1, hardware was not required to support multisampled depth textures so you could not sample multisampled depth buffers in a fragment shader.
But if the fragment shader is executed for each sample point, then they should all be different.
There is a feature called sample shading, which can force an implementation of MSAA to work more like SSAA by improving the ratio between fragments shaded and samples generated during rasterization. But by default, the behavior you are describing is not multisampling.
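For illustration, this is one way per-sample shading can be requested; the uTex / vTexCoord names are hypothetical, and calling glMinSampleShading() on the API side achieves a similar effect:
#version 400
// Declaring an input with the "sample" qualifier forces the shader to be
// invoked once per covered sample instead of once per fragment.
sample in vec2 vTexCoord;
uniform sampler2D uTex;
out vec4 color;
void main()
{
    color = texture(uTex, vTexCoord);  // now evaluated at each sample position
}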
When are the multiple samples resolved in the pipeline - after the fragment shader? And are they just linearly averaged?
Multisample resolution occurs after fragment shading, any time you have to write the contents of a multisampled buffer into a single-sampled buffer. This includes things like glBlitFramebuffer (...). You can also implement multisample resolution manually yourself in the fragment shader, if you use multisampled textures.
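A minimal sketch of such a manual resolve pass, assuming a hypothetical 4x multisampled color texture bound as uColorMS and a full-screen triangle/quad:
#version 150
uniform sampler2DMS uColorMS;  // hypothetical multisampled color attachment
out vec4 fragColor;
void main()
{
    ivec2 p = ivec2(gl_FragCoord.xy);
    vec4 sum = vec4(0.0);
    for (int i = 0; i < 4; ++i)             // assuming 4x MSAA
        sum += texelFetch(uColorMS, p, i);  // fetch each sample explicitly
    fragColor = sum / 4.0;                  // simple box-filter resolve
}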
Finally, regarding the process used for multisample resolution, that is implementation-specific as is the sample layout. If you ever open your display driver's control panel and look at the myriad of anti-aliasing options available you will see multiple options for sample layout and MSAA resolve algorithm.
I would highly suggest you take a look at this article. While it is related to D3D10+ and not OpenGL, the general rules apply to both APIs (D3D9 has slightly different rules) and the quality of the diagrams is phenomenal.
In particular, pay special attention to the section on MSAA rasterization rules for triangles, which states:
For a triangle, a coverage test is performed for each sample location (not for a pixel center). If more than one sample location is covered, a pixel shader runs once with attributes interpolated at the pixel center. The result is stored (replicated) for each covered sample location in the pixel that passes the depth/stencil test.

What happens to the depth buffer if I discard a fragment in a shader using early_fragment_tests?

I'm using a fragment shader which discards some fragments using the discard keyword. My shader also uses early_fragment_tests (image load/store obliges it).
EDIT:
I do not write gl_FragDepth; I let OpenGL handle the depth value as usual.
Will my depth buffer be updated with the fragment's depth before the discard keyword is executed?
EDIT:
It does not seem to be the case on my NVidia Quadro 600 and K5000.
Any clue where I could find that information? FYI, I searched http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt. I found close enough topics but not that particular one.
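A minimal sketch of the kind of shader in question (the image binding, format, and alpha cutoff are hypothetical):
#version 420
layout(early_fragment_tests) in;   // forced on because of the image store below
layout(binding = 0, rgba8) uniform writeonly image2D uImg;  // hypothetical
uniform float uAlphaCutoff;                                  // hypothetical
in vec4 vColor;
out vec4 color;
void main()
{
    imageStore(uImg, ivec2(gl_FragCoord.xy), vColor);
    if (vColor.a < uAlphaCutoff)
        discard;   // the question: has the depth buffer already been written by now?
    color = vColor;
}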
Will my depth buffer be updated with the fragment's depth before the discard keyword is executed?
No, this sort of behavior is explicitly forbidden in a shader that contains discard or that writes an arbitrary value to gl_FragDepth. This is because in such a shader, the depth of your fragment after it is shaded may be unrelated to the position generated during initial rasterization (pre-shading).
Without writing to gl_FragDepth or discarding, the depth of a fragment is actually known long before the actual fragment shader executes and this forms the foundation for early depth tests. Rasterization/shading can be skipped for some (individual tiled regions) or all of a primitive if it can be determined that it would have failed a depth test before the fragment shader is evaluated, but if the fragment shader itself is what determines a fragment's depth, then all bets are off.
There is an exception to this rule in DX11 / OpenGL 4.x. If you write your shaders in such a way that you can guarantee the output depth will always preserve the result of a depth test (same result as the depth generated during rasterization), early fragment tests can be enabled in a shader that uses discard or writes to gl_FragDepth. This feature is known as conservative depth, and unless you use this it is generally understood that discard is going to break early depth optimizations across the board.
Now, since you should never write to the depth buffer before you know whether the value you are writing passes or fails a depth test (gl_FragDepth may be different) or if the fragment even survives (discard may be used), you can see why a primitive shaded by a fragment shader that contains discard cannot write to the depth buffer before the shader is evaluated.
I think the information you are looking for is on that page:
If early fragment tests are enabled, any depth value computed by the fragment shader has no effect. Additionally, the depth buffer, stencil buffer, and occlusion query sample counts may be updated even for fragments or samples that would be discarded after fragment shader execution due to per-fragment operations such as alpha-to-coverage or alpha tests.
The word "may" in "the depth buffer, [etc.] may be updated", implies it is implementation dependent (or completely random).

Does OpenGL stencil test happen before or after fragment program runs?

When I set glStencilFunc(GL_NEVER, ...), effectively disabling all drawing, and then run my [shader-bound] program, I get no performance increase over letting the fragment shader run. I thought the stencil test happened before the fragment program. Is that not the case, or at least not guaranteed? Replacing the fragment shader with one that simply writes a constant to gl_FragColor does result in a higher FPS.
It's actually a bit of both. Per-fragment operations should happen after the fragment program, as you can see in this OpenGL ES 2.0 pipeline diagram. However, many modern graphics cards have an early Z test that discards fragments earlier, as long as you don't write to the depth in the fragment shader.
Here is a paper from AMD/ATI that talks about such tests. I remember reading that the spec allows early tests as long as doing them before the shader produces the same result as doing them after, which is why you wouldn't want to modify the depth or discard a fragment in the shader. This thread on OpenGL forums has some interesting discussion about it.
In addition to fragment depth modification, there are a few other things that can prevent the depth/stencil test from happening before the fragment shader. If z-writes are enabled, then any method of aborting the fragment in the shader will do this, such as alpha-test or the discard shader instruction.
If the GPU wants to do the stencil/z test in the same operation as the z/stencil write, it has to wait until after the fragment shader executes so that it knows the fragment is allowed to write to the z-buffer. This may vary between different cards though. At least it should be easy to tell if it's your current problem.
Take a look at the following outline of the DX10 pipeline; it says that the stencil test runs before the pixel shader:
http://3.bp.blogspot.com/_2YU3pmPHKN4/Sz_0vqlzrBI/AAAAAAAAAcg/CpDXxOB-r3U/s1600-h/D3D10CheatSheet.jpg
and the same is true in DX11:
http://4.bp.blogspot.com/_2YU3pmPHKN4/S1KhDSPmotI/AAAAAAAAAcw/d38b4oA_DxM/s1600-h/DX11.JPG
I don't know if this is mandated in the OpenGL spec but it would be detrimental for an implementation to not do the stencil test before running the fragment program.
As you can see here:
http://www.opengl.org/wiki/Stencil_Test
the stencil test is run after the fragment shader.
I understand that this is not good for performance.