openGl GPU optimization when rendering stacks of primitives - opengl

If I render 15 fully opaque quads with the same size on top of each other, depth test disabled, is the GPU hardware /software cleaver enough just to process the topmost quad and discard the other vertices/fragments? Or would one benefit from using the stencil buffer to achieve the same effect?

Most GPUs will overdraw in this scenario which will be very bad for performance if your quads are large. Rather than use the stencil buffer, the best way to optimise is probably to enable depth testing, assign appropriate depth values and render your quads front to back.
However, under certain conditions (e.g. no blending) tile based deferred rendering (TBDR) GPUs common in many mobile devices (particularly PowerVR devices used by all iOS devices and many Android devices) will do a process known as hidden surface removal (HSR) which will optimize this case and avoid rendering the pixels that will be obscured.

Definitely, it will generate fragments for all of the opaque quads. Also, if you disable the depth test, you may see the back surface on the screen. Because the depth testing is disable and whoever is rendered last will draw the screen.
Even, if you use the stencil buffer the fragments are still generated for the quads, pass through stencil and depth tests.

Related

Optimize for expensive fragment shader

I'm rendering multiple layers of flat triangles with a raytracer in the fragment shader. The upper layers have holes, and I'm looking for a way how I can avoid running the shader for pixels that are filled already by one of the upper layers, i.e. I want only the parts of the lower layers rendered that lie in the holes in the upper layers. Of course, if there's a hole or not is not known unless the fragment shader did his thing for a layer.
As far as I understand, I cannot use early depth testing because there, the depth values are interpolated between the vertices and do not come from the fragment shader. Is there a way to "emulate" that behavior?
The best way to solve this issue is to not use layers. You are only using layers because of the limitations of using a 3D texture to store your scene data. So... don't do that.
SSBOs and buffer textures (if your hardware is too old for SSBOs) can access more memory than a 3D texture. And you could even employ manual swizzling of the data to improve cache locality if that is viable.
As far as I understand, I cannot use early depth testing because there, the depth values are interpolated between the vertices and do not come from the fragment shader.
This is correct insofar as you cannot use early depth tests, but it is incorrect as to why.
The "depth" provided by the VS doesn't need to be the depth of the actual fragment. You are rendering your scene in layers, presumably with each layer being a full-screen quad. By definition, everything in one layer of rendering is beneath everything in a lower layer. So the absolute depth value doesn't matter; what matters is whether there is something from a higher layer over this fragment.
So each layer could get its own depth value, with lower layers getting a lower depth value. The exact value is arbitrary and irrelevant; what matters is that higher layers have higher values.
The reason this doesn't work is this: if your raytracing algorithm detects a miss within a layer (a "hole"), you must discard that fragment. And the use of discard at all turns off early depth testing in most hardware, since the depth testing logic is usually tied to the depth writing logic (it is an atomic read/conditional-modify/conditional-write).

Is the stencil buffer still relevant in modern OpenGL?

Me and a friend have been having an ongoing argument about the stencil buffer. In short I haven't been able to find a situation where the stencil buffer would provide any advantage over the programmable pipeline tools in OpenGL 3.2+. Are there any uses to the stencil buffer in modern OpenGL?
[EDIT]
Thanks everyone for all the inputs on the subject.
It is more useful than ever since you can sample stencil index textures from fragment shaders. It should not even be argued that the stencil buffer is not part of the programmable pipeline.
The depth buffer is used for simple pass/fail fragment rejection, which the stencil buffer can also do as suggested in comments. However, the stencil buffer can also accumulate information about test results over multiple passes. All sorts of logic and counting applications exist such as measuring a scene's depth complexity, constructive solid geometry, etc.
To add a recent example to Andon's answer, GTA V uses the stencil buffer kinda like an ID buffer to mark the player character, cars, vegetation etc.
It subsequently uses the stencil buffer to e.g. apply subsurface scattering only to the character or exclude him from motion blur.
See the GTA V Graphics Study (highly recommended, it's a great read!)
Edit: sure you can do this in software. But you can do rasterization or tessellation in software just as well... In the end it's about performance I guess. With depth24stencil8 you have a nice hardware-supported format, and the stencil test is most likely faster then doing discards in the fragment shader.
Just to provide one other use case, shadow volumes (aka "stencil shadows") are still very relevant: https://en.wikipedia.org/wiki/Shadow_volume
They're useful for indoor scenes where shadows are supposed to be pixel perfect, and you're less likely to have alpha-tested foliage messing up the extruded shadow volumes.
It's true that shadow maps are more common, but I suspect that stencil shadows will have a comeback once the brain dead Createive/3DLabs patent expires on the zfail method.

Partially render a 3D scene

I want to partially render a 3D scene, by this I mean I want to render some pixels and skip others. There are many non-realtime renderers that allow selecting a section that you want to render.
Example, fully rendered image (all pixels rendered) vs partially rendered:
I want to make the renderer not render part of a scene, in this case the renderer would just skip rendering these areas and save resources (memory/CPU).
If it's not possible to do in OpenGL, can someone suggest any other open source renderer, it could even be a software renderer.
If you're talking about rendering rectangular subportions of a display, you'd use glViewport and adjust your projection appropriately.
If you want to decide whether to render or not per pixel, especially with the purely fixed pipeline, you'd likely use a stencil buffer. That does exactly much the name says — you paint as though spraying through a stencil. It's a per-pixel mask, reliably at least 8 bits per pixel, and has supported in hardware for at least the last fifteen years. Amongst other uses, it used to be how you could render a stipple without paying for the 'professional' cards that officially supported glStipple.
With GLSL there is also the discard statement that immediately ends processing of a fragment and produces no output. The main caveat is that on some GPUs — especially embedded GPUs — the advice is to prefer returning any colour with an alpha of 0 (assuming that will have no effect according to your blend mode) if you can avoid a conditional by doing so. Conditionals and discards otherwise can have a strong negative effect on parallelism as fragment shaders are usually implemented by SIMD units doing multiple pixels simultaneously, so any time that a shader program look like they might diverge there can be a [potentially unnecessary] splitting of tasks. Very GPU dependent stuff though, so be sure to profile in real life.
EDIT: as pointed out in the comments, using a scissor rectangle would be smarter than adjusting the viewport. That both means you don't have to adjust your projection and, equally, that rounding errors in any adjustment can't possibly create seams.
It's also struck me that an alternative to using the stencil for a strict binary test is to pre-populate the z-buffer with the closest possible value on pixels you don't want redrawn; use the colour mask to draw to the depth buffer only.
You can split the scene and render it in parts - this way you will render with less memory consumption and you can simply skip unnecessary parts or regions. Also read this

Preventing Z-fighting on coplanar polygons?

I'm trying to deal with Z-fighting on co-planar polygons in an OpenGL based renderer. Due to legacy issues I have my hands tied in many regards. Mainly the renderer has to be myopic due to how the data flow works within the older systems.
Users are able to draw geometry in arbitrary positions and I have no real method of detecting when they've decided to overlap two polygons. With glPolygonOffset I'd need to have that knowledge to offset on or the other of the polygons. I'm also fairly sure playing with the projection matrix won't help matters as the Z-fighting is coming from round off error due to the co-planar nature of the data. Turning depth-writing on and off isn't really an option either as I don't know when this problem is about to occur during the render loop.
So suggestions?
The Red Book has some. For example you could first render all the coplanar polygons into the stencil buffer, without touching the color buffer (i.e. glColorMask(0,0,0,0)) but doing Z tests and updating the depth buffer. Then turn of depth testing and depth writes, enable color writes and stencil test and render the polygons in their layering order; the stencil buffer will apply the result of the earlier depth test preparations.

My own z-buffer

How I can make my own z-buffer for correct blending alpha channels? I'm using glsl.
I have only one idea. And this is use 2 "buffers", one of them storing depth-component and another color (with alpha channel). I don't need access to buffer in my program. I cant use uniform array because glsl have a restriction for the number of uniforms variables. I cant use FBO because behaviour for sometime writing and reading Frame Buffer is not defined (and dont working at any cards).
How I can resolve this problem?!
Or how to read actual real time z-buffer from glsl? (I mean for each fragment shader call z-buffer must be updated)
How I can make my own z-buffer for correct blending alpha channels?
That's not possible. For perfect order-independent transparency you must get rid of z-buffer and replace it with another mechanism for hidden surface removal.
With z-buffer there are two possible ways to tackle the problem.
Multi-layered z-buffer (impractical with hardware acceleration) - basically it'll store several layers of "depth" values and will use it for blending transparent surfaces. Will hog a lot of memory, and there will be maximum number of transparent overlayying surfaces, once you're over the limit, there will be artifacts.
Depth peeling (google it). Order independent transparency, but there's a limit for maximum number of "overlaying" transparent polygons per pixel. Can actually be implemented on hardware.
Both approaches will have a limit (maximum number of overlapping transparent polygons per pixel), once you go over the limit, scene will no longer render properly. Which means the whole thing rather useless.
What you could actually do (to get perfect solution) is to remove the zbuffer completely, and make a graphic rendering pipeline that will gather all polygons to be rendered, clip them, split them (when two polygons intersect), sort them and then paint them on screen in correct order to ensure that you'll get correct result. However, this is hard, and doing it with hardware acceleration is harder. I think (I'm not completely certain it happened) 5 ot 6 years ago some ATI GPU-related document mentioned that some of their cards could render correct scene with Z-Buffer disabled by enabling some kind of extension. However, they didn't say a thing about alpha-blending. I haven't heard about this feature since. Perhaps it didn't become popular and shared the fate of TruForm (forgotten). Also such rendering pipeline will not be able to some things that are possible on z-buffer
If it's order-independent transparencies you're after then the fundamental problem is that a depth buffer stores on depth per pixel but if you're composing a view of partially transparent geometry then more than one fragment contributes to each pixel.
If you were to solve the problem robustly you'd need an ordered list of depths per pixel, going back to the closest opaque fragment. You'd then walk the list in reverse order. In practice OpenGL doesn't do things like variably sized arrays so people achieve pretty much that by drawing their geometry in back-to-front order.
An alternative embodied by GL_SAMPLE_ALPHA_TO_COVERAGE is to switch to screen-door transparency, which is indistinguishable from real transparency either at a really high resolution or with multisampling. Ideally you'd do that stochastically, but that would void the OpenGL rule of repeatability. Nevertheless since you're in GLSL you can do it for yourself. Your sampler simply takes the input alpha and uses that as the probability that it'll output the final pixel. So grab a random value in the range 0.0 to 1.0 from somewhere and if it's greater than the alpha then discard the pixel. Always output with an alpha of 1.0 and just use the normal depth buffer. Answers like this say a bit more on what you can do to get randomish numbers in GLSL, and obviously you want to turn multisampling up as high as possible.
Eric Enderton has written a decent paper (which has a slide version) on stochastic order-independent transparency that goes alongside a DirectX implementation that's worth checking out.