GLSL Interlacing - c++

I would like to efficiently render in an interlaced mode using GLSL.
I can alrdy do this like:
vec4 background = texture2D(plane[5], gl_TexCoord[1].st);
if(is_even_row(gl_TexCoord[1].t))
{
vec4 foreground = get_my_color();
gl_FragColor = vec4(fore.rgb * foreground .a + background .rgb * (1.0-foreground .a), background .a + fore.a);
}
else
gl_FragColor = background;
However, as far as I have understood the nature of branching in GLSL is that both branches will actually be executed, since "even_row" is considered as run-time value.
Is there any trick I can use here in order to avoid unnecessarily calling the rather heavy function "get_color"? The behavior of is_even_row is quite static.
Or is there some other way to do this?
NOTE: glPolygonStipple will not work since I have custom blend functions in my GLSL code.

(comment to answer, as requested)
The problem with interlacing is that GPUs run shaders in 2x2 clusters, which means that you gain nothing from interlacing (a good software implementation might possibly only execute the actual pixels that are needed, unless you ask for partial derivatives).
At best, interlacing runs at the same speed, at worst it runs slower because of the extra work for the interlacing. Some years ago, there was an article in ShaderX4, which suggested interlaced rendering. I tried that method on half a dozen graphics cards (3 generations of hardware of each the "two big" manufacturers), and it ran slower (sometimes slightly, sometimes up to 50%) in every case.
What you could do is do all the expensive rendering in 1/2 the vertical resolution, this will reduce the pixel shader work (and texture bandwidth) by 1/2. You can then upscale the texture (GL_NEAREST), and discard every other line.
The stencil test can be used to discard pixels before the pixel shader is executed. Of course the hardware still runs shaders in 2x2 groups, so in this pass you do not gain anything. However, that does not matter if it's just the very last pass, which is a trivial shader writing out a single fetched texel. The more costly composition shaders (the ones that matter!) run at half resolution.
You find a detailled description including code here: fake dynamic branching. This demo avoids lighting pixels by discarding those that are outside the light's range using the stencil.
Another way which does not need the stencil buffer is to use "explicit Z culling". This may in fact be even easier and faster.
For this, clear Z, disable color writes (glColorMask), and draw a fullscreen quad whose vertices have some "close" Z coordinate, and have the shader kill fragments in every odd line (or use the deprecated alpha test if you want, or whatever). gl_FragCoord.y is a very simple way of knowing which line to kill, using a small texture that wraps around would be another (if you must use GLSL 1.0).
Now draw another fullscreen quad with "far away" Z values in the vertices (and with depth test, of course). Simply fetch your half-res texture (GL_NEAREST filtering), and write it out. Since the depth buffer has a value that is "closer" in every other row, it will discard those pixels.
How does glPolygonStipple compare to this? Polygon stipple is a deprecated feature, because it is not directly supported by the hardware and has to be emulated by the driver either by "secretly" rewriting the shader to include extra logic or by falling back to software.

This is probably not the right way to do interlacing. If you really need to achieve this effect, don't do it in the fragment shader like this. Instead, here is what you could do:
Initialize a full screen 1-bit stencil buffer, where each bit stores the parity of its corresponding row.
Render your scene like usual to a temporary FBO with 1/2 the vertical resoltion.
Turn on the stencil test, and switch the stencil func depending on which set of scan lines you are going to draw.
Blit a rescaled version of the aforementioned fbo (containing the contents of your frame) to the stencil buffer.
Note that you could skip the offscreen FBO step and draw directly using the stencil buffer, but this would waste some fill rate testing those pixels that are just going to clipped anyway. If your program is shader heavy, the solution I just mentioned would be optimal. If it is not, you may end up being marginally better off drawing directly to the screen.

Related

How to write integers alongside pixels in the framebuffer, and then use the written integer to ignore the depth buffer

What I want to do
I want to have a set triangles bleed through, or rather ignore the depth buffer, for another set triangles, but only if they have the same number.
Problem (optional reading)
I do not know how to do this without introducing a ton of bubbles into the pipeline. Right now I have very high throughput because I can throw my geometry onto the GPU, tell it to render, and forget about it. However, if I have to keep toggling the state when drawing, I'm worried I'm going to tank my performance. Other people who have done what I've just said (doing a ton of draw calls and state changes) have much worse performance than me. This performance hit is also significantly worse on older hardware, where we are talking on order of 50 - 100+ times performance loss by doing it the state-change way.
Unfortunately this triangle bleeding scenario happens many thousands of times, so the state machine will be getting flooded with "draw triangles, depth off, draw triangles that bleed through, depth on, ...", except N times, where N can get large (N >= 1000).
A good way of imagining this is having a set of triangles T_i, and a set of triangles that bleed through B_i where B_i only bleeds through T_i, and i ranges from 0...1000+. Note that if we are drawing B_100, then it should only bleed through T_100, not T_99 or T_101.
My next thought is to draw all the triangles with their integer into one framebuffer (along with the integer), then draw the bleed through triangles into another framebuffer (also with the integer), and then merge these framebuffers together. I figure they will have the color, depth, and the integer, so I can hopefully merge them in the fragment shader.
Problem is, I have no idea how to write an integer alongside the out vec4 fragColor in the fragment shader.
Questions (and in short)
This leaves me with two questions:
How do I write an integer into a framebuffer? Do I need to write to 4 separate texture framebuffers? (like one color/depth framebuffer texture, another integer framebuffer texture, and then double this so I can merge the pairs of framebuffers together at some point?)
To make this more clear, the algorithm would look like
Render all the 'could be bled from triangles', described above as set T_i,
write colors and depth info into FB1, and write integers into FB2
Render all the 'bleeding' triangles, described above as set B_i,
write colors and depth into FB3, and write integers to FB4
Bind the textures for FB1, FB2, FB3, FB4
Render each pixel by sampling the RGBA, depth, and integers
from the appropriate texture and write those out into the
final framebuffer
I would need to access the color and depth from the textures in the shader. I would also need to access the integer from the other texture. Then I can do the comparison and choose which pixel to write to the default framebuffer.
Is this idea possible? I assume if (1) is, then the answer is yes. Maybe another question could be whether there's a better way. I tried thinking of doing this with the stencil buffer but had no luck
What you want is theoretically possible, but I can't speak as to its performance. You'll be reading and writing a whole lot of texels in a lot of textures for every program iteration.
Anyway to answer your questions:
A framebuffer can have multiple color attachments by using glFramebufferTexture2D with GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, etc. Each texture can then have its own internal format, in your example you probably want a regular RGB texture for your color output, and a second 1-integer only texture.
Your depth buffer is complicated, because you don't want to let OpenGL handle it as normal. If you want to take over the depth buffer, you probably want to attach it as yet another, float texture that you can check against or not your screen-space fragments.
If you have doubts about your shader, remember that you can bind the any number of textures as input samplers you program in code, and each color bind gets its own output value (your shader runs per-texel, so you output one value at a time). Make sure the format of your output is correct, ie vec3/vec4 for the color buffer, int for your integer buffer and float for the float buffer.
And stencil buffers won't help you turn depth checking on or off in a single (possibly indirect) draw call. I can't visualize what your bleeding thing means, but it can probably help with that? Maybe? But definitely not conditional depth checking.

Partially render a 3D scene

I want to partially render a 3D scene, by this I mean I want to render some pixels and skip others. There are many non-realtime renderers that allow selecting a section that you want to render.
Example, fully rendered image (all pixels rendered) vs partially rendered:
I want to make the renderer not render part of a scene, in this case the renderer would just skip rendering these areas and save resources (memory/CPU).
If it's not possible to do in OpenGL, can someone suggest any other open source renderer, it could even be a software renderer.
If you're talking about rendering rectangular subportions of a display, you'd use glViewport and adjust your projection appropriately.
If you want to decide whether to render or not per pixel, especially with the purely fixed pipeline, you'd likely use a stencil buffer. That does exactly much the name says — you paint as though spraying through a stencil. It's a per-pixel mask, reliably at least 8 bits per pixel, and has supported in hardware for at least the last fifteen years. Amongst other uses, it used to be how you could render a stipple without paying for the 'professional' cards that officially supported glStipple.
With GLSL there is also the discard statement that immediately ends processing of a fragment and produces no output. The main caveat is that on some GPUs — especially embedded GPUs — the advice is to prefer returning any colour with an alpha of 0 (assuming that will have no effect according to your blend mode) if you can avoid a conditional by doing so. Conditionals and discards otherwise can have a strong negative effect on parallelism as fragment shaders are usually implemented by SIMD units doing multiple pixels simultaneously, so any time that a shader program look like they might diverge there can be a [potentially unnecessary] splitting of tasks. Very GPU dependent stuff though, so be sure to profile in real life.
EDIT: as pointed out in the comments, using a scissor rectangle would be smarter than adjusting the viewport. That both means you don't have to adjust your projection and, equally, that rounding errors in any adjustment can't possibly create seams.
It's also struck me that an alternative to using the stencil for a strict binary test is to pre-populate the z-buffer with the closest possible value on pixels you don't want redrawn; use the colour mask to draw to the depth buffer only.
You can split the scene and render it in parts - this way you will render with less memory consumption and you can simply skip unnecessary parts or regions. Also read this

GLSL Shaders: blending, primitive-specific behavior, and discarding a vertex

Criteria: I’m using OpenGL with shaders (GLSL) and trying to stay with modern techniques (e.g., trying to stay away from deprecated concepts).
My questions, in a very general sense--see below for more detail—are as follows:
Do shaders allow you to do custom blending that help eliminate z-order transparency issues found when using GL_BLEND?
Is there a way for a shader to know what type of primitive is being drawn without “manually” passing it some sort of flag?
Is there a way for a shader to “ignore” or “discard” a vertex (especially when drawing points)?
Background: My application draws points connected with lines in an ortho projection (vertices have varying depth in the projection). I’ve only recently started using shaders in the project (trying to get away from deprecated concepts). I understand that standard blending has ordering issues with alpha testing and depth testing: basically, if a “translucent” pixel at a higher z level is drawn first (thus blending with whatever colors were already drawn to that pixel at a lower z level), and an opaque object is then drawn at that pixel but at a lower z level, depth testing prevents changing the pixel that was already drawn for the “higher” z level, thus causing blending issues. To overcome this, you need to draw opaque items first, then translucent items in ascending z order. My gut feeling is that shaders wouldn’t provide an (efficient) way to change this behavior—am I wrong?
Further, for speed and convenience, I pass information for each vertex (along with a couple of uniform variables) to the shaders and they use the information to find a subset of the vertices that need special attention. Without doing a similar set of logic in the app itself (and slowing things down) I can’t know a priori what subset of vericies that is. Thus I send all vertices to the shader. However, when I draw “points” I’d like the shader to ignore all the vertices that aren’t in the subset it determines. I think I can get the effect by setting alpha to zero and using an alpha function in the GL context that will prevent drawing anything with alpha less than, say, 0.01. However, is there a better or more “correct” glsl way for a shader to say “just ignore this vertex”?
Do shaders allow you to do custom blending that help eliminate z-order transparency issues found when using GL_BLEND?
Sort of. If you have access to GL 4.x-class hardware (Radeon HD 5xxx or better, or GeForce 4xx or better), then you can perform order-independent transparency. Earlier versions have techniques like depth peeling, but they're quite expensive.
The GL 4.x-class version uses essentially a series of "linked lists" of transparent samples, which you do a full-screen pass to resolve into the final sample color. It's not free of course, but it isn't as expensive as other OIT methods. How expensive it would be for your case is uncertain; it is proportional to how many overlapping pixels you have.
You still have to draw opaque stuff first, and you have to draw transparent stuff using special shader code.
Is there a way for a shader to know what type of primitive is being drawn without “manually” passing it some sort of flag?
No.
Is there a way for a shader to “ignore” or “discard” a vertex (especially when drawing points)?
No in general, but yes for points. A Geometry shader can conditionally emit vertices, thus allowing you to discard any vertex for arbitrary reasons.
Discarding a vertex in non-point primitives is possible, but it will also affect the interpretation of that primitive. The reason it's simple for points is because a vertex is a primitive, while a vertex in a triangle isn't a whole primitive. You can discard lines, but discarding a vertex within a line is... of dubious value.
That being said, your explanation for why you want to do this is of dubious merit. You want to update vertex data with essentially a boolean value that says "do stuff with me" or not to. That means that, every frame, you have to modify your data to say which points should be rendered and which shouldn't.
The simplest and most efficient way to do this is to simply not render with them. That is, arrange your data so that the only thing on the GPU are the points you want to render. Thus, there's no need to do anything special at all. If you're going to be constantly updating your vertex data, then you're already condemned to dealing with streaming vertex data. So you may as well stream it in a way that makes rendering efficient.

My own z-buffer

How I can make my own z-buffer for correct blending alpha channels? I'm using glsl.
I have only one idea. And this is use 2 "buffers", one of them storing depth-component and another color (with alpha channel). I don't need access to buffer in my program. I cant use uniform array because glsl have a restriction for the number of uniforms variables. I cant use FBO because behaviour for sometime writing and reading Frame Buffer is not defined (and dont working at any cards).
How I can resolve this problem?!
Or how to read actual real time z-buffer from glsl? (I mean for each fragment shader call z-buffer must be updated)
How I can make my own z-buffer for correct blending alpha channels?
That's not possible. For perfect order-independent transparency you must get rid of z-buffer and replace it with another mechanism for hidden surface removal.
With z-buffer there are two possible ways to tackle the problem.
Multi-layered z-buffer (impractical with hardware acceleration) - basically it'll store several layers of "depth" values and will use it for blending transparent surfaces. Will hog a lot of memory, and there will be maximum number of transparent overlayying surfaces, once you're over the limit, there will be artifacts.
Depth peeling (google it). Order independent transparency, but there's a limit for maximum number of "overlaying" transparent polygons per pixel. Can actually be implemented on hardware.
Both approaches will have a limit (maximum number of overlapping transparent polygons per pixel), once you go over the limit, scene will no longer render properly. Which means the whole thing rather useless.
What you could actually do (to get perfect solution) is to remove the zbuffer completely, and make a graphic rendering pipeline that will gather all polygons to be rendered, clip them, split them (when two polygons intersect), sort them and then paint them on screen in correct order to ensure that you'll get correct result. However, this is hard, and doing it with hardware acceleration is harder. I think (I'm not completely certain it happened) 5 ot 6 years ago some ATI GPU-related document mentioned that some of their cards could render correct scene with Z-Buffer disabled by enabling some kind of extension. However, they didn't say a thing about alpha-blending. I haven't heard about this feature since. Perhaps it didn't become popular and shared the fate of TruForm (forgotten). Also such rendering pipeline will not be able to some things that are possible on z-buffer
If it's order-independent transparencies you're after then the fundamental problem is that a depth buffer stores on depth per pixel but if you're composing a view of partially transparent geometry then more than one fragment contributes to each pixel.
If you were to solve the problem robustly you'd need an ordered list of depths per pixel, going back to the closest opaque fragment. You'd then walk the list in reverse order. In practice OpenGL doesn't do things like variably sized arrays so people achieve pretty much that by drawing their geometry in back-to-front order.
An alternative embodied by GL_SAMPLE_ALPHA_TO_COVERAGE is to switch to screen-door transparency, which is indistinguishable from real transparency either at a really high resolution or with multisampling. Ideally you'd do that stochastically, but that would void the OpenGL rule of repeatability. Nevertheless since you're in GLSL you can do it for yourself. Your sampler simply takes the input alpha and uses that as the probability that it'll output the final pixel. So grab a random value in the range 0.0 to 1.0 from somewhere and if it's greater than the alpha then discard the pixel. Always output with an alpha of 1.0 and just use the normal depth buffer. Answers like this say a bit more on what you can do to get randomish numbers in GLSL, and obviously you want to turn multisampling up as high as possible.
Eric Enderton has written a decent paper (which has a slide version) on stochastic order-independent transparency that goes alongside a DirectX implementation that's worth checking out.

Quick question about glColorMask and its work

I want to render depth buffer to do some nice shadow mapping. My drawing code though, consists of many shader switches. If I set glColorMask(0,0,0,0) and leave all shader programs, textures and others as they are, and just render the depth buffer, will it be 'OK' ? I mean, if glColorMask disables the "write of color components", does it mean that per-fragment shading IS NOT going to be performed?
For rendering a shadow map, you will normally want to bind a depth texture (preferrably square and power of two, because stereo drivers take this as hint!) to a FBO and use exactly one shader (as simple as possible) for everything. You do not want to attach a color buffer, because you are not interested in color at all, and it puts more unnecessary pressure on ROP (plus, some hardware can render double speed or more with depth-only). You do not want to switch between many shaders.
Depending on whether you do "classic" shadow mapping, or something more sophisticated such as exponential shadow maps, the shader that you will use is either as simple as it can be (constant color, and no depth write), or performs some (moderately complex) calculations on depth, but you normally do not want to perform any colour calculations, since that will mean needless calculations which will not be visible in any way.
No, the fragment operations will be performed anyway, but their result will be squashed by your zero color mask.
If you don't want some fragment operations to be performed - use the proper shader program which has an empty fragment shader attached and set the draw buffer to GL_NONE.
There is another way to disable fragment processing - to enable GL_RASTERIZER_DISCARD, but you won't get even the depth values in this case :)
No, the shader programs execute independent of the fixed function pipeline. Setting the glColorMask will have no effect on the shader programs.