How to write integers alongside pixels in the framebuffer, and then use the written integer to ignore the depth buffer - opengl

What I want to do
I want to have one set of triangles bleed through, or rather ignore the depth buffer, for another set of triangles, but only if they have the same number.
Problem (optional reading)
I do not know how to do this without introducing a ton of bubbles into the pipeline. Right now I have very high throughput because I can throw my geometry onto the GPU, tell it to render, and forget about it. However, if I have to keep toggling state while drawing, I'm worried I'm going to tank my performance. Other people who have done what I've just described (doing a ton of draw calls and state changes) have much worse performance than me. This performance hit is also significantly worse on older hardware, where we are talking about a 50-100+ times performance loss by doing it the state-change way.
Unfortunately this triangle bleeding scenario happens many thousands of times, so the state machine will be getting flooded with "draw triangles, depth off, draw triangles that bleed through, depth on, ...", except N times, where N can get large (N >= 1000).
A good way of imagining this is having a set of triangles T_i, and a set of triangles that bleed through B_i where B_i only bleeds through T_i, and i ranges from 0...1000+. Note that if we are drawing B_100, then it should only bleed through T_100, not T_99 or T_101.
My next thought is to draw all the triangles into one framebuffer (along with their integer), then draw the bleed-through triangles into another framebuffer (also with their integer), and then merge these framebuffers together. I figure they will have the color, depth, and the integer, so I can hopefully merge them in the fragment shader.
Problem is, I have no idea how to write an integer alongside the out vec4 fragColor in the fragment shader.
Questions (and in short)
This leaves me with two questions:
How do I write an integer into a framebuffer? Do I need to write to 4 separate texture framebuffers? (like one color/depth framebuffer texture, another integer framebuffer texture, and then double this so I can merge the pairs of framebuffers together at some point?)
To make this more clear, the algorithm would look like
Render all the 'could be bled from triangles', described above as set T_i,
write colors and depth info into FB1, and write integers into FB2
Render all the 'bleeding' triangles, described above as set B_i,
write colors and depth into FB3, and write integers to FB4
Bind the textures for FB1, FB2, FB3, FB4
Render each pixel by sampling the RGBA, depth, and integers
from the appropriate texture and write those out into the
final framebuffer
I would need to access the color and depth from the textures in the shader. I would also need to access the integer from the other texture. Then I can do the comparison and choose which pixel to write to the default framebuffer.
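To make the merge step concrete, here is a rough sketch of what I imagine that final pass looking like (GLSL 3.30; the sampler names are placeholders, and handling of a cleared/background id is left out):
#version 330 core
// Placeholder sampler names; bind the attachments of FB1-FB4 to these units.
uniform sampler2D  colorT, colorB;   // colors of the T_i and B_i passes
uniform sampler2D  depthT, depthB;   // their depth attachments
uniform isampler2D idT, idB;         // their integer attachments
in vec2 uv;
out vec4 fragColor;
void main() {
    float dT = texture(depthT, uv).r;
    float dB = texture(depthB, uv).r;
    int   iT = texture(idT, uv).r;
    int   iB = texture(idB, uv).r;
    // Same integer: the bleed fragment wins regardless of depth.
    // Otherwise: an ordinary depth comparison between the two layers.
    // (A cleared/background id value would need special-casing; omitted here.)
    bool useB = (iT == iB) || (dB < dT);
    fragColor    = useB ? texture(colorB, uv) : texture(colorT, uv);
    gl_FragDepth = useB ? dB : dT;
}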
Is this idea possible? I assume if (1) is, then the answer is yes. Maybe another question could be whether there's a better way. I tried thinking of doing this with the stencil buffer but had no luck.

What you want is theoretically possible, but I can't speak as to its performance. You'll be reading and writing a whole lot of texels in a lot of textures for every program iteration.
Anyway, to answer your questions:
A framebuffer can have multiple color attachments by using glFramebufferTexture2D with GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, etc. Each texture can then have its own internal format; in your example you probably want a regular RGB texture for your color output, and a second, single-channel integer texture.
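As a minimal sketch (assuming GL 3.3+; width, height and the texture names are placeholders), the attachment setup could look like this:
GLuint fbo, colorTex, idTex, depthTex;
glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);

// Regular color attachment.
glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);

// Single-channel signed-integer attachment; must be sampled with NEAREST filtering.
glGenTextures(1, &idTex);
glBindTexture(GL_TEXTURE_2D, idTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R32I, width, height, 0, GL_RED_INTEGER, GL_INT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, idTex, 0);

// Depth attachment, as a texture so a later pass can read it.
glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, width, height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0);

GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
glDrawBuffers(2, bufs);
You would need two such FBOs (one per set of triangles) for the scheme described in the question.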
Your depth buffer is complicated, because you don't want to let OpenGL handle it as normal. If you want to take over the depth buffer, you probably want to attach it as yet another float texture that you can check your screen-space fragments against (or not).
If you have doubts about your shader, remember that you can bind any number of textures as input samplers in your program, and each color attachment gets its own output variable (your shader runs per-fragment, so you output one value at a time for each attachment). Make sure the format of your output is correct, i.e. vec3/vec4 for the color buffer, int for your integer buffer and float for the float buffer.
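A corresponding fragment shader sketch, with one output per attachment (the input names are illustrative):
#version 330 core
layout(location = 0) out vec4 fragColor;  // goes to GL_COLOR_ATTACHMENT0
layout(location = 1) out int  fragId;     // goes to GL_COLOR_ATTACHMENT1
flat in int triangleId;   // the per-triangle integer, however you choose to pass it
in vec4 shadedColor;
void main() {
    fragColor = shadedColor;
    fragId    = triangleId;
}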
And stencil buffers won't help you turn depth checking on or off within a single (possibly indirect) draw call. I can't quite visualize what your bleeding scenario looks like, so they can probably help with part of it? Maybe? But definitely not with conditional depth checking.

Related

OpenGL trim/inline contour of stencil

I have created a shape in my stencil buffer (black in the picture below). Now I would like to render to the backbuffer. I would like one texture on the outer pixels (say 4 pixels) of my stencil (red), and another texture on the remaining pixels (red).
I have read several solutions that involve scaling, but that will not work when there is no obvious center of the shape.
How do I acquire the desired effect?
The stencil buffer works great for doing operations on the specific fragments being rendered over it. However, it's not so great for doing operations that require looking at pixels other than the one corresponding to the fragment being rendered. In order to do outlining, you have to ask about the values of neighboring pixels, which stencil operations don't allow.
So, if it is possible to put the stencil data you want to test against in a non-stencil format image (ie: a color image, maybe with an integer texture format), that would make things much simpler. You can do the effect of stencil discarding by using discard directly in the fragment shader. Since you can fetch arbitrarily from the texture (as long as you're not trying to modify it), you can fetch neighboring pixels and test their values. You can use that to identify when a fragment is near a border.
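For illustration, a sketch of that approach (GLSL 3.30; shapeMask, borderTex and fillTex are placeholder names, and a fixed 1-pixel border is assumed):
#version 330 core
uniform isampler2D shapeMask;   // non-zero texels = inside the shape
uniform sampler2D borderTex, fillTex;
in vec2 uv;
out vec4 fragColor;
void main() {
    ivec2 p = ivec2(gl_FragCoord.xy);
    if (texelFetch(shapeMask, p, 0).r == 0)
        discard;                            // emulates the stencil discard
    bool border = false;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            if (texelFetch(shapeMask, p + ivec2(dx, dy), 0).r == 0)
                border = true;              // a neighbour lies outside the shape
    fragColor = border ? texture(borderTex, uv) : texture(fillTex, uv);
}
A wider border just means widening those loops, which is exactly the cost problem discussed further down.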
However, if you're relying on specialized stencil operations to build the stencil data itself (like bitwise operations), then that's more complicated. You will have to employ stencil texturing operations, so you're going to have to render to an FBO texture that has a depth/stencil format. And you'll have to set it up to allow you to read from the stencil aspect of the texture. This is an OpenGL 4.3 feature.
This effectively converts it into an 8-bit unsigned integer texture. That allows you to play whatever games you need to. But if you want to use stencil tests to discard fragments while doing so, you will also need texture barrier functionality to allow you to read from an image that's attached to the current FBO. You don't need to actually issue the barrier, since you should mask off stencil writing; you just need GL 4.5 or the NV/ARB_texture_barrier extension to be available, which is widely the case.
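A sketch of the relevant setup, assuming a GL_DEPTH24_STENCIL8 texture named sceneDepthStencil that you rendered your stencil data into:
// Reinterpret the stencil aspect of a packed depth/stencil texture (GL 4.3+).
glBindTexture(GL_TEXTURE_2D, sceneDepthStencil);
glTexParameteri(GL_TEXTURE_2D, GL_DEPTH_STENCIL_TEXTURE_MODE, GL_STENCIL_INDEX);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
// If the texture is still attached to the FBO being rendered to, mask off
// stencil writes; with GL 4.5 / ARB_texture_barrier available the read is
// then well defined without actually calling glTextureBarrier().
glStencilMask(0x00);
// In the shader, sample it as a usampler2D; values are 0-255.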
Either way this happens, the biggest difficulty is going to be varying the size of the border. It is easy to test the surrounding 3x3 block of pixels to see whether a fragment is at a border. But the larger the border size, the larger the area of pixels each fragment has to test. At that point, I would suggest trying to look for a different solution, one that is based on some knowledge of what pattern is being written into the stencil buffer.
That is, if the rendering operation that lays down the stencil has some knowledge of the shape, then it could compute a distance to the edge of the shape in some way. This might require constructing the geometry in a way that it has distance information in it.

(Modern) OpenGL Different Colored Faces on a Cube - Using Shaders

A cube with different colored faces in immediate mode is very simple. But doing this same thing with shaders seems to be quite a challenge.
I have read that in order to create a cube with different colored faces, I should create 24 vertices instead of 8 vertices for the cube - in other words, I visualise this as 6 squares that don't quite touch.
Is perhaps another (better?) solution to texture the faces of the cube using a really simple texture - a flat color - perhaps a 1x1 pixel texture?
My texturing idea seems simpler to me - from a coder's point of view - but which method would be the most efficient from a GPU/graphics card perspective?
I'm not sure what your overall goal is (e.g. what you're learning to do in the long term), but generally for high-performance applications (e.g. games) your goal is to reduce GPU load. Every time you switch certain state (e.g. change textures, render targets, shader uniform values, etc.) the GPU stalls while reconfiguring itself to meet your demands.
So, you can pass in a 1x1 pixel texture for each face, but then you'd need six draw calls (usually not so bad, but there is some prep work and potential cache misses) and six texture sets (can be very bad, often as bad as changing shader uniform values).
Suppose you wanted to pass in one texture and use that as a texture map for the cube. This is a little less trivial than it sounds -- you need to lay out each face on the texture in a way that maps to the vertices. Often you need to pass in a texture coordinate for each vertex, and due to the spatial configuration of the texture this normally doesn't end up meaning one texture coordinate per spatial vertex.
However, if you use an environment/reflection map, the complexities of mapping are handled for you. In this way, you could draw a single texture on all sides of your cube. (Or on your sphere, or whatever sphere-mapped shape you wanted.) I'm not sure I'd call this easier, since you have to form the environment texture carefully, and you still have to set a different texture for each new color you want to represent -- or change the texture either via the GPU or in step with the GPU, and that's tricky and usually not performant.
Which brings us back to the canonical way of doing as you mentioned: use vertex values -- they're fast, you can draw many, many cubes very quickly by only specifying different vertex data, and it's easy to understand. It really is the best way, and how GPUs are designed to run quickly.
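For illustration, a sketch of what that per-face vertex data might look like (positions plus colors, only two of the six faces written out):
/* 6 floats per vertex: x, y, z, r, g, b.  Two of the six faces shown. */
static const float cubeVertices[] = {
    /* +Z face, red */
    -1.f, -1.f,  1.f,   1.f, 0.f, 0.f,
     1.f, -1.f,  1.f,   1.f, 0.f, 0.f,
     1.f,  1.f,  1.f,   1.f, 0.f, 0.f,
    -1.f,  1.f,  1.f,   1.f, 0.f, 0.f,
    /* -Z face, green */
    -1.f, -1.f, -1.f,   0.f, 1.f, 0.f,
     1.f, -1.f, -1.f,   0.f, 1.f, 0.f,
     1.f,  1.f, -1.f,   0.f, 1.f, 0.f,
    -1.f,  1.f, -1.f,   0.f, 1.f, 0.f,
    /* ... the remaining four faces follow the same pattern ... */
};
/* Two triangles per face, indexing the four corners of that face. */
static const unsigned short cubeIndices[] = {
    0, 1, 2,   0, 2, 3,    /* +Z */
    4, 6, 5,   4, 7, 6,    /* -Z */
    /* ... */
};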
Additionally..
And yes, you can do this with just shaders... but it'd be ugly and slow, and the GPU would end up computing it per pixel. Pass the object-space coordinates to the fragment shader, and in the fragment shader test which side you're on and output the corresponding color. Highly not recommended; it's not particularly easier, and it's definitely not faster for the GPU -- to change colors you'd again end up changing uniform values for the shaders.
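For completeness, a sketch of that shader-only approach, assuming a unit cube centered at the origin and arbitrary placeholder colors:
#version 330 core
in vec3 objPos;     // object-space position, passed through from the vertex shader
out vec4 fragColor;
void main() {
    vec3 a = abs(objPos);
    if      (a.x >= a.y && a.x >= a.z) fragColor = objPos.x > 0.0 ? vec4(1,0,0,1) : vec4(0,1,0,1);
    else if (a.y >= a.z)               fragColor = objPos.y > 0.0 ? vec4(0,0,1,1) : vec4(1,1,0,1);
    else                               fragColor = objPos.z > 0.0 ? vec4(1,0,1,1) : vec4(0,1,1,1);
}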

My own z-buffer

How can I make my own z-buffer for correct alpha blending? I'm using GLSL.
I have only one idea: use two "buffers", one storing the depth component and the other the color (with alpha channel). I don't need access to the buffer in my program. I can't use a uniform array because GLSL has a restriction on the number of uniform variables. I can't use an FBO because the behaviour when writing to and reading from the same framebuffer is undefined (and doesn't work on any card).
How can I resolve this problem?
Or how can I read the actual, real-time z-buffer from GLSL? (I mean the z-buffer must be updated for each fragment shader invocation.)
How can I make my own z-buffer for correct alpha blending?
That's not possible. For perfect order-independent transparency you must get rid of the z-buffer and replace it with another mechanism for hidden surface removal.
With z-buffer there are two possible ways to tackle the problem.
Multi-layered z-buffer (impractical with hardware acceleration) - basically it stores several layers of "depth" values and uses them for blending transparent surfaces. It will hog a lot of memory, and there will be a maximum number of overlapping transparent surfaces; once you're over the limit, there will be artifacts.
Depth peeling (google it). Order-independent transparency, but there's a limit on the maximum number of "overlapping" transparent polygons per pixel. Can actually be implemented on hardware.
Both approaches have a limit (a maximum number of overlapping transparent polygons per pixel); once you go over the limit, the scene will no longer render properly. Which makes the whole thing rather useless.
What you could actually do (to get a perfect solution) is to remove the z-buffer completely, and make a rendering pipeline that gathers all polygons to be rendered, clips them, splits them (where two polygons intersect), sorts them and then paints them on screen in the correct order to ensure a correct result. However, this is hard, and doing it with hardware acceleration is harder. I think (I'm not completely certain it happened) that 5 or 6 years ago some ATI GPU-related document mentioned that some of their cards could render a correct scene with the Z-buffer disabled by enabling some kind of extension. However, they didn't say a thing about alpha blending. I haven't heard about this feature since. Perhaps it didn't become popular and shared the fate of TruForm (forgotten). Also, such a rendering pipeline will not be able to do some things that are possible with a z-buffer.
If it's order-independent transparency you're after, then the fundamental problem is that a depth buffer stores one depth per pixel, but if you're composing a view of partially transparent geometry then more than one fragment contributes to each pixel.
If you were to solve the problem robustly you'd need an ordered list of depths per pixel, going back to the closest opaque fragment. You'd then walk the list in reverse order. In practice OpenGL doesn't do things like variably sized arrays so people achieve pretty much that by drawing their geometry in back-to-front order.
An alternative, embodied by GL_SAMPLE_ALPHA_TO_COVERAGE, is to switch to screen-door transparency, which is indistinguishable from real transparency either at a really high resolution or with multisampling. Ideally you'd do that stochastically, but that would void the OpenGL rule of repeatability. Nevertheless, since you're in GLSL you can do it yourself. Your shader simply takes the input alpha and uses it as the probability that it'll output the final pixel. So grab a random value in the range 0.0 to 1.0 from somewhere, and if it's greater than the alpha then discard the pixel. Always output with an alpha of 1.0 and just use the normal depth buffer. Answers like this say a bit more on what you can do to get random-ish numbers in GLSL, and obviously you want to turn multisampling up as high as possible.
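A minimal sketch of that idea, in the same old-style GLSL as the rest of this thread; shadeFragment() is a placeholder for whatever computes your RGBA color:
vec4 color = shadeFragment();   // placeholder for your usual color computation
// Cheap hash of the fragment position as the pseudo-random value; any source works.
float rnd = fract(sin(dot(gl_FragCoord.xy, vec2(12.9898, 78.233))) * 43758.5453);
if (rnd > color.a)
    discard;                              // alpha is the survival probability
gl_FragColor = vec4(color.rgb, 1.0);      // always opaque; the normal depth buffer applies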
Eric Enderton has written a decent paper (which has a slide version) on stochastic order-independent transparency that goes alongside a DirectX implementation that's worth checking out.

GLSL Interlacing

I would like to efficiently render in an interlaced mode using GLSL.
I can already do this like:
vec4 background = texture2D(plane[5], gl_TexCoord[1].st);
if (is_even_row(gl_TexCoord[1].t))
{
    vec4 foreground = get_my_color();
    // blend foreground over background
    gl_FragColor = vec4(foreground.rgb * foreground.a + background.rgb * (1.0 - foreground.a),
                        background.a + foreground.a);
}
else
    gl_FragColor = background;
However, as far as I have understood, the nature of branching in GLSL is that both branches will actually be executed, since "is_even_row" is considered a run-time value.
Is there any trick I can use here in order to avoid unnecessarily calling the rather heavy function "get_my_color"? The behavior of is_even_row is quite static.
Or is there some other way to do this?
NOTE: glPolygonStipple will not work since I have custom blend functions in my GLSL code.
(Posting my comment as an answer, as requested.)
The problem with interlacing is that GPUs run shaders in 2x2 clusters, which means that you gain nothing from interlacing (a good software implementation might possibly only execute the actual pixels that are needed, unless you ask for partial derivatives).
At best, interlacing runs at the same speed, at worst it runs slower because of the extra work for the interlacing. Some years ago, there was an article in ShaderX4, which suggested interlaced rendering. I tried that method on half a dozen graphics cards (3 generations of hardware of each the "two big" manufacturers), and it ran slower (sometimes slightly, sometimes up to 50%) in every case.
What you could do is do all the expensive rendering in 1/2 the vertical resolution, this will reduce the pixel shader work (and texture bandwidth) by 1/2. You can then upscale the texture (GL_NEAREST), and discard every other line.
The stencil test can be used to discard pixels before the pixel shader is executed. Of course the hardware still runs shaders in 2x2 groups, so in this pass you do not gain anything. However, that does not matter if it's just the very last pass, which is a trivial shader writing out a single fetched texel. The more costly composition shaders (the ones that matter!) run at half resolution.
You can find a detailed description including code here: fake dynamic branching. This demo avoids lighting pixels by discarding those that are outside the light's range using the stencil.
Another way which does not need the stencil buffer is to use "explicit Z culling". This may in fact be even easier and faster.
For this, clear Z, disable color writes (glColorMask), and draw a fullscreen quad whose vertices have some "close" Z coordinate, and have the shader kill fragments in every odd line (or use the deprecated alpha test if you want, or whatever). gl_FragCoord.y is a very simple way of knowing which line to kill, using a small texture that wraps around would be another (if you must use GLSL 1.0).
Now draw another fullscreen quad with "far away" Z values in the vertices (and with depth test, of course). Simply fetch your half-res texture (GL_NEAREST filtering), and write it out. Since the depth buffer has a value that is "closer" in every other row, it will discard those pixels.
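A rough sketch of those two passes (drawFullscreenQuad and halfResTex are placeholder names; shader compilation is assumed to happen elsewhere):
/* Pass 1: stamp a "near" depth value into every other line. */
glClear(GL_DEPTH_BUFFER_BIT);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthFunc(GL_ALWAYS);
/* fragment shader: if (mod(gl_FragCoord.y, 2.0) >= 1.0) discard; */
drawFullscreenQuad(0.1f);   /* quad at a Z close to the near plane */

/* Pass 2: composite; the rows stamped in pass 1 now fail the depth test. */
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthFunc(GL_LESS);
glBindTexture(GL_TEXTURE_2D, halfResTex);   /* GL_NEAREST filtering */
/* fragment shader: gl_FragColor = texture2D(halfResTex, uv); */
drawFullscreenQuad(0.9f);   /* quad at a Z close to the far plane */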
How does glPolygonStipple compare to this? Polygon stipple is a deprecated feature, because it is not directly supported by the hardware and has to be emulated by the driver either by "secretly" rewriting the shader to include extra logic or by falling back to software.
This is probably not the right way to do interlacing. If you really need to achieve this effect, don't do it in the fragment shader like this. Instead, here is what you could do:
Initialize a full screen 1-bit stencil buffer, where each bit stores the parity of its corresponding row.
Render your scene like usual to a temporary FBO with 1/2 the vertical resolution.
Turn on the stencil test, and switch the stencil func depending on which set of scan lines you are going to draw.
Blit a rescaled version of the aforementioned FBO (containing the contents of your frame) to the screen, with the stencil test masking out the other set of scan lines.
Note that you could skip the offscreen FBO step and draw directly using the stencil buffer, but this would waste some fill rate testing those pixels that are just going to be clipped anyway. If your program is shader-heavy, the solution I just mentioned would be optimal. If it is not, you may end up being marginally better off drawing directly to the screen.
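For illustration, a sketch of the stencil setup described above (fullscreenPass, maskShader, blitShader and evenField are placeholder names; maskShader discards odd rows via gl_FragCoord.y):
/* One-time setup: write 1 into the stencil buffer on even rows only. */
glClearStencil(0);
glClear(GL_STENCIL_BUFFER_BIT);
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 1, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
fullscreenPass(maskShader);      /* discards where the row is odd */
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

/* Per frame: draw the rescaled half-res FBO only where the stencil matches. */
glStencilFunc(GL_EQUAL, evenField ? 1 : 0, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
fullscreenPass(blitShader);      /* samples the half-res texture */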

Count image similarity on GPU [OpenGL/OcclusionQuery]

OpenGL. Let's say I've drawn one image and then a second one using XOR. Now I've got a black buffer with non-black pixels somewhere. I've read that I can use shaders to count the black [rgb(0,0,0)] pixels on the GPU?
I've also read that it has something to do with OcclusionQuery.
http://oss.sgi.com/projects/ogl-sample/registry/ARB/occlusion_query.txt
Is it possible and how? [any programming language]
If you've got other idea on how to find similarity via OpenGL/GPU - that would be great too.
I'm not sure how you do the XOR bit (at the very least it would be slow; I don't think any current GPUs accelerate that), but here's my idea:
have two input images
turn on occlusion query.
draw the two images to the screen (i.e. a full-screen quad with two textures set up), with a fragment shader that computes abs(texel1 - texel2) and kills the pixel (discard in GLSL) if the pixels are the same (difference is zero or below some threshold). Easiest is probably just a GLSL fragment shader: read the two textures, compute abs() of the difference and discard the pixel. Very basic GLSL knowledge is enough here.
get the number of pixels that passed the query. For pixels that are the same, the query won't pass (they are discarded by the shader); for pixels that are different, it will. A minimal sketch of this pass follows below.
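(diffProgram and drawFullscreenQuad are placeholder names; the shader samples both textures and discards pixels whose difference is below a threshold.)
GLuint query, passed = 0;
glGenQueries(1, &query);
glUseProgram(diffProgram);       /* samples both images; discards pixels that match */
glBeginQuery(GL_SAMPLES_PASSED, query);
drawFullscreenQuad();            /* both textures bound to units 0 and 1 */
glEndQuery(GL_SAMPLES_PASSED);
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &passed);
/* "passed" now holds the number of differing pixels. */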
At first I thought of a more complex approach that involves the depth buffer, but then realized that just killing pixels should be enough. Here's my original thought (but the above one is simpler and more efficient):
have two input images
clear screen and depth buffer
draw the two images to the screen (i.e. a full-screen quad with two textures set up), with a fragment shader that computes abs(texel1 - texel2) and kills the pixel (discard in GLSL) if the pixels are different. Draw the quad so that its depth buffer value is something close to the near plane.
after this step, the depth buffer will contain small depth values for pixels that are the same, and large (far-plane) depth values for pixels that are different.
turn on the occlusion query, and draw another full-screen quad with a depth closer than the far plane but farther than the previous quad.
get the number of pixels that passed the query. For pixels that are the same, the query won't pass (the depth buffer is already closer), and for pixels that are different, the query will pass. You'd use SAMPLES_PASSED_ARB to get this. There's an occlusion query example at CodeSampler.com to get you started.
Of course all this requires a GPU with occlusion query support. Most GPUs since 2002 or so support that, with the exception of some low-end ones (in particular, Intel 915 (aka GMA 900) and Intel 945 (aka GMA 950)).