Quick question about glColorMask and how it works - c++

I want to render the depth buffer to do some nice shadow mapping. My drawing code, though, consists of many shader switches. If I set glColorMask(0,0,0,0), leave all shader programs, textures and other state as they are, and just render the depth buffer, will it be 'OK'? I mean, if glColorMask disables the "write of color components", does it mean that per-fragment shading IS NOT going to be performed?

For rendering a shadow map, you will normally want to bind a depth texture (preferably square and power of two, because stereo drivers take this as a hint!) to an FBO and use exactly one shader (as simple as possible) for everything. You do not want to attach a color buffer, because you are not interested in color at all, and it puts unnecessary pressure on the ROPs (plus, some hardware can render at double speed or more with depth-only). You do not want to switch between many shaders.
Depending on whether you do "classic" shadow mapping or something more sophisticated such as exponential shadow maps, the shader you use is either as simple as it can be (constant color, and no explicit depth write from the shader), or performs some (moderately complex) calculations on depth. Either way, you normally do not want to perform any colour calculations, since they are needless work whose results will not be visible in any way.
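For illustration, here is a CPU-side C++ sketch of the two depth comparisons mentioned above. It is not the answerer's code. The classic test compares the receiver's light-space depth against the stored occluder depth, using a small bias. Exponential shadow maps instead store exp(c * z) and estimate visibility as exp(-c * d) * stored, clamped to [0, 1]. The bias and the constant c are assumed example values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Classic shadow mapping: the map stores the nearest occluder depth in light
// space ([0, 1]). A receiver is lit if it is not farther than that depth; the
// bias (assumed example value) avoids self-shadowing ("shadow acne").
float classic_visibility(float stored_depth, float receiver_depth,
                         float bias = 0.005f) {
    return (receiver_depth - bias <= stored_depth) ? 1.0f : 0.0f;
}

// Exponential shadow maps: the map stores exp(c * occluder_depth), which can be
// filtered. c is an assumed example sharpness constant.
float esm_stored_value(float occluder_depth, float c = 80.0f) {
    return std::exp(c * occluder_depth);
}

// Visibility estimate exp(c * (occluder - receiver)), clamped to [0, 1].
float esm_visibility(float stored_value, float receiver_depth, float c = 80.0f) {
    assert(stored_value > 0.0f);
    float v = std::exp(-c * receiver_depth) * stored_value;
    return std::min(1.0f, std::max(0.0f, v));
}
```

On the GPU, the shadow pass itself only produces the depth (or its exponential). These comparisons happen later, when the shadow map is sampled during the lighting pass.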

No, the fragment operations will be performed anyway, but their result will be squashed by your zero color mask.
If you don't want the fragment operations to be performed, use a proper shader program with an empty fragment shader attached, and set the draw buffer to GL_NONE.
Another way to disable fragment processing is to enable GL_RASTERIZER_DISCARD, but then you won't even get the depth values :)

No, shader programs execute independently of the fixed-function pipeline. Setting glColorMask will have no effect on the shader programs.


Can I use different shader programs for the same rendering job?

My question was unclear at first, I'll try to rephrase it:
How do I use different shaders to do different rendering operations on the same mesh polygons? For example, I want to add lighting using one shader and add fog using another shader. I need to use the color interpolated from the first shader in the calculation of the second shader, but I don't know how to do it if I can't (or rather am not supposed to) pass the color buffer around between shaders.
Also (and that was where my question started), I need the same world-view-projection calculations for both shaders, so am I supposed to calculate them in every shader separately? Am I supposed to use one big shader for all my rendering operations?
Original question:
Say I have two different shader programs. The first one calculates the vertex positions in the vertex shader and does some operations in the fragment shader.
Let's say I want to use the fragment shader to do different calculations, but I still want to use the same vertex positions calculated by the first vertex shader. Do I have to calculate the vertex positions again or is there a way to share state between different shader programs?
You have several options:
Multi-pass
This usually renders the geometry into the depth and "color" buffers first, and then, in the following passes, uses them as input textures for rendering a single rectangle covering the whole screen/view. Deferred shading is an example of this, but many other effects are implemented this way without being related to deferred shading. Here is an example of multi-pass rendering:
In the first pass the planets, stars and other objects are rendered; in the second, the atmosphere is added.
You can combine the passes either by blending or by direct rendering. Direct rendering requires that you render each pass to a texture and combine them in the last pass. Blending changes the color of the output in each pass.
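Here is a rough CPU-side illustration of the blending variant. It is not taken from the linked example. Each later pass (src) is mixed into the framebuffer contents (dst) by the fixed-function blender. The formula below is the one glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) applies to all four channels. The Color type and the sample values are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Color { float r, g, b, a; };  // hypothetical RGBA value in [0, 1]

// What glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) computes when a later
// pass (src, e.g. the atmosphere) is drawn over the framebuffer contents (dst,
// e.g. the planets rendered in the first pass). The same factors apply to alpha.
Color blend_over(Color src, Color dst) {
    float a = src.a;
    return {src.r * a + dst.r * (1.0f - a),
            src.g * a + dst.g * (1.0f - a),
            src.b * a + dst.b * (1.0f - a),
            src.a * a + dst.a * (1.0f - a)};
}
```

In the direct variant, the final fullscreen pass would sample each pass's texture and could apply any formula, not just this fixed blend equation.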
Single pass
What you describe sounds more like encoding the different shaders as functions of a single fragment shader. Yes, you can combine multiple shaders into a single one if they are compatible, and combine their results into the final output color.
A big shader is a performance hit, but I think it would still be faster than having multiple passes doing the same work.
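Here is a sketch of the single-pass idea, written as plain C++ functions that mirror what the fragment shader's functions would look like. The lit color is simply passed as an argument to the fog function, so no color buffer has to travel between shaders. The Lambert term and the linear fog formula (the one classic GL_LINEAR fog uses) are assumed stand-ins for the asker's actual effects. The fog start/end values are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// "Shader 1" as a function: Lambert diffuse lighting (assumed model).
// normal and to_light are expected to be normalized.
Vec3 apply_lighting(Vec3 base_color, Vec3 normal, Vec3 to_light) {
    float diffuse = std::max(0.0f, dot(normal, to_light));
    return scale(base_color, diffuse);
}

// "Shader 2" as a function: linear fog, f = (end - dist) / (end - start),
// clamped to [0, 1]; f == 1 means no fog at all.
Vec3 apply_fog(Vec3 lit_color, Vec3 fog_color, float dist,
               float fog_start, float fog_end) {
    float f = (fog_end - dist) / (fog_end - fog_start);
    f = std::min(1.0f, std::max(0.0f, f));
    return add(scale(lit_color, f), scale(fog_color, 1.0f - f));
}

// The combined fragment "main": the output of the lighting step feeds the fog
// step directly. With one program, the world-view-projection transform also
// lives in a single vertex shader instead of being repeated per effect.
Vec3 shade_fragment(Vec3 base, Vec3 normal, Vec3 to_light, Vec3 fog_color,
                    float dist) {
    Vec3 lit = apply_lighting(base, normal, to_light);
    return apply_fog(lit, fog_color, dist, /*fog_start=*/10.0f, /*fog_end=*/100.0f);
}
```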
Take a look at this example:
This one computes environmental reflection, lighting and geometry color, and combines them into a single output color.
Exotic shaders
There are also exotic shaders that work around the pipeline limitations, like this one:
These are used for things believed to be impossible to implement in the GL/GLSL pipeline. Anyway, if the limitations are too binding, you can still use a compute shader...

I'm using some standard GLSL (version 120) vertex and fragment shaders to simulate LIDAR. In other words, instead of just returning a color at each x,y position (each pixel, via the fragment shader), it should return color and distance.
I suppose I don't actually need all of the color bits, since I really only want the intensity; so I could store the distance in gl_FragColor.b, for example, and use .rg for the intensity. But then I'm not entirely clear on how I get the value back out again.
Is there a simple way to return values from the fragment shader? I've tried varying, but it seems like the fragment shader can't write variables other than gl_FragColor.
I understand that some people use the GLSL pipeline for general-purpose (non-graphics) GPU processing, and that might be an option — except I still do want to render my objects normally.
OpenGL already returns this "distance calculation" via the depth buffer, although it's not linear. You can simply create a frame buffer object (FBO), attach colour and depth buffers, render to it, and you have the result sitting in the depth buffer (although you'll have to undo the depth transformation). This is the easiest option to program provided you are familiar with the depth calculations.
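Here is one way to undo the depth transformation. It assumes a standard OpenGL perspective projection (glFrustum/gluPerspective style) with near and far planes n and f, and the default glDepthRange(0, 1). Under those assumptions, a depth-buffer value d maps back to a positive eye-space distance along the view axis as shown below. This is the distance along the viewing direction, not along each pixel's ray. For a LIDAR-style range, you would still divide by the cosine of the angle between that pixel's ray and the view axis. The second helper is a hypothetical illustration of that step.

```cpp
#include <cassert>
#include <cmath>

// Converts a [0, 1] depth-buffer value back to a positive eye-space distance
// along the view axis. Assumes a standard OpenGL perspective projection with
// near plane n, far plane f, and the default glDepthRange(0, 1).
double linearize_depth(double d, double n, double f) {
    assert(n > 0.0 && f > n);
    double z_ndc = 2.0 * d - 1.0;  // window depth [0, 1] -> NDC [-1, 1]
    return 2.0 * n * f / (f + n - z_ndc * (f - n));
}

// Hypothetical helper: distance along a pixel's ray, given the angle theta
// (radians) between that ray and the view axis.
double ray_distance(double view_axis_depth, double theta) {
    return view_axis_depth / std::cos(theta);
}
```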
Another method, as you suggest, is storing the value in a colour buffer. You don't want to use the main colour buffer, because then you'd lose your colour or have to render twice. Instead, attach a second render target (texture) to your FBO (GL_COLOR_ATTACHMENT1), enable both attachments with glDrawBuffers, and use gl_FragData[0] for normal colour and gl_FragData[1] for your distance (for newer GL versions you should be declaring out variables in the fragment shader instead). It depends on the precision you need, but you'll probably want to make the distance texture 32-bit float (GL_R32F) and write to gl_FragData[1].r.
- This is a decent place to start: http://www.opengl.org/wiki/Framebuffer_Object
Yes, GLSL can be used for compute purposes, especially with ARB_shader_image_load_store and NVIDIA's bindless graphics. You even have access to shared memory via compute shaders (though I've never got one faster than 5 times slower). As @Jherico says, fragment shaders generally output to a single place in a framebuffer attachment/render target, and recent features such as image units (ARB_shader_image_load_store) allow you to write to arbitrary locations from a shader. It's probably overkill and slower, but you could also write your distances to a buffer via image units.
Finally, if you want the data back on the host (CPU accessible) side, use glGetTexImage with your distance texture (or glMapBuffer if you decided to use image units).
Fragment shaders output to a render buffer. If you want to use the GPU for computing and fetch data back into host memory, you have a few options:
Create a framebuffer and attach a texture to it to hold your data. Once the image has been rendered, you can read the information back from the texture into host memory.
Use CUDA, OpenCL or an OpenGL compute shader to write the data into an arbitrary bound buffer, and read back the buffer contents.

GLSL Shaders: blending, primitive-specific behavior, and discarding a vertex

Criteria: I’m using OpenGL with shaders (GLSL) and trying to stay with modern techniques (e.g., trying to stay away from deprecated concepts).
My questions, in a very general sense (see below for more detail), are as follows:
Do shaders allow you to do custom blending that help eliminate z-order transparency issues found when using GL_BLEND?
Is there a way for a shader to know what type of primitive is being drawn without “manually” passing it some sort of flag?
Is there a way for a shader to “ignore” or “discard” a vertex (especially when drawing points)?
Background: My application draws points connected with lines in an ortho projection (vertices have varying depth in the projection). I’ve only recently started using shaders in the project (trying to get away from deprecated concepts). I understand that standard blending has ordering issues with alpha testing and depth testing: basically, if a “translucent” pixel at a higher z level is drawn first (thus blending with whatever colors were already drawn to that pixel at a lower z level), and an opaque object is then drawn at that pixel but at a lower z level, depth testing prevents changing the pixel that was already drawn for the “higher” z level, thus causing blending issues. To overcome this, you need to draw opaque items first, then translucent items in ascending z order. My gut feeling is that shaders wouldn’t provide an (efficient) way to change this behavior; am I wrong?
Further, for speed and convenience, I pass information for each vertex (along with a couple of uniform variables) to the shaders, and they use the information to find a subset of the vertices that need special attention. Without doing a similar set of logic in the app itself (and slowing things down), I can’t know a priori what subset of vertices that is. Thus I send all vertices to the shader. However, when I draw “points” I’d like the shader to ignore all the vertices that aren’t in the subset it determines. I think I can get the effect by setting alpha to zero and using an alpha function in the GL context that will prevent drawing anything with alpha less than, say, 0.01. However, is there a better or more “correct” GLSL way for a shader to say “just ignore this vertex”?
Do shaders allow you to do custom blending that help eliminate z-order transparency issues found when using GL_BLEND?
Sort of. If you have access to GL 4.x-class hardware (Radeon HD 5xxx or better, or GeForce 4xx or better), then you can perform order-independent transparency. Earlier versions have techniques like depth peeling, but they're quite expensive.
The GL 4.x-class version uses essentially a series of "linked lists" of transparent samples, which you do a full-screen pass to resolve into the final sample color. It's not free of course, but it isn't as expensive as other OIT methods. How expensive it would be for your case is uncertain; it is proportional to how many overlapping pixels you have.
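Here is a CPU-side sketch of the resolve idea. It is a simplification, not the actual GL 4.x implementation, which builds the per-pixel lists on the GPU with atomic counters and image load/store. Each pixel's stored transparent fragments are sorted by depth and composited back to front over the opaque color. The Fragment struct and the "smaller depth is closer" convention are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Rgb { float r, g, b; };

struct Fragment {      // one entry of a pixel's (hypothetical) linked list
    Rgb color;
    float alpha;
    float depth;       // assumed convention: smaller = closer to the viewer
};

// Full-screen resolve for one pixel. Fragments behind the opaque surface are
// assumed to have been rejected by the depth test when the list was built.
// Sorting far-to-near and applying "over" makes the result independent of the
// order in which the fragments were stored.
Rgb resolve_pixel(std::vector<Fragment> frags, Rgb opaque) {
    std::sort(frags.begin(), frags.end(),
              [](const Fragment& a, const Fragment& b) { return a.depth > b.depth; });
    Rgb out = opaque;
    for (const Fragment& fr : frags) {
        out.r = fr.color.r * fr.alpha + out.r * (1.0f - fr.alpha);
        out.g = fr.color.g * fr.alpha + out.g * (1.0f - fr.alpha);
        out.b = fr.color.b * fr.alpha + out.b * (1.0f - fr.alpha);
    }
    return out;
}
```

The cost of the sort grows with the number of overlapping transparent fragments per pixel, which matches the point above that the expense is proportional to the overlap.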
You still have to draw opaque stuff first, and you have to draw transparent stuff using special shader code.
Is there a way for a shader to know what type of primitive is being drawn without “manually” passing it some sort of flag?
Is there a way for a shader to “ignore” or “discard” a vertex (especially when drawing points)?
No in general, but yes for points. A geometry shader can conditionally emit vertices, thus allowing you to discard any vertex for arbitrary reasons.
Discarding a vertex in non-point primitives is possible, but it will also affect the interpretation of that primitive. The reason it's simple for points is that a vertex is a primitive, while a vertex in a triangle isn't a whole primitive. You can discard lines, but discarding a vertex within a line is... of dubious value.
That being said, your explanation for why you want to do this is of dubious merit. You want to update vertex data with essentially a boolean value that says "do stuff with me" or not to. That means that, every frame, you have to modify your data to say which points should be rendered and which shouldn't.
The simplest and most efficient way to do this is to simply not render with them. That is, arrange your data so that the only thing on the GPU are the points you want to render. Thus, there's no need to do anything special at all. If you're going to be constantly updating your vertex data, then you're already condemned to dealing with streaming vertex data. So you may as well stream it in a way that makes rendering efficient.
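The "only put the points you want on the GPU" approach can be sketched on the CPU side as a compaction step before each upload. You filter the point array with whatever test the shader was doing, then upload and draw only the survivors. The Point layout and the is_selected predicate are hypothetical placeholders. The upload itself would go through glBufferData or glBufferSubData, followed by a single glDrawArrays(GL_POINTS, 0, count).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

struct Point { float x, y, z; float weight; };  // hypothetical vertex layout

// Hypothetical stand-in for the per-vertex selection the shader performed.
bool is_selected(const Point& p, float threshold) { return p.weight >= threshold; }

// Returns a tightly packed array containing only the points to draw; its
// size() is the count you would pass to glDrawArrays(GL_POINTS, 0, count).
std::vector<Point> compact_points(const std::vector<Point>& all, float threshold) {
    std::vector<Point> visible;
    visible.reserve(all.size());
    std::copy_if(all.begin(), all.end(), std::back_inserter(visible),
                 [threshold](const Point& p) { return is_selected(p, threshold); });
    return visible;
}
```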

GLSL Interlacing

I would like to efficiently render in an interlaced mode using GLSL.
I can already do this like so:
vec4 background = texture2D(plane[5], gl_TexCoord[1].st);
if (is_even_row) // row-parity test; its definition isn't shown here
{
    vec4 foreground = get_my_color();
    gl_FragColor = vec4(foreground.rgb * foreground.a + background.rgb * (1.0 - foreground.a), background.a + foreground.a);
}
else
    gl_FragColor = background;
However, as far as I understand, the nature of branching in GLSL is that both branches are actually executed, since is_even_row is considered a run-time value.
Is there any trick I can use here to avoid unnecessarily calling the rather heavy function get_my_color? The behavior of is_even_row is quite static.
Or is there some other way to do this?
NOTE: glPolygonStipple will not work since I have custom blend functions in my GLSL code.
(comment to answer, as requested)
The problem with interlacing is that GPUs run shaders in 2x2 clusters, which means that you gain nothing from interlacing (a good software implementation might possibly only execute the actual pixels that are needed, unless you ask for partial derivatives).
At best, interlacing runs at the same speed; at worst, it runs slower because of the extra work for the interlacing. Some years ago, there was an article in ShaderX4 which suggested interlaced rendering. I tried that method on half a dozen graphics cards (3 generations of hardware from each of the "big two" manufacturers), and it ran slower (sometimes slightly, sometimes by up to 50%) in every case.
What you could do is render all the expensive work at half the vertical resolution; this halves the pixel shader work (and texture bandwidth). You can then upscale the texture (GL_NEAREST) and discard every other line.
The stencil test can be used to discard pixels before the pixel shader is executed. Of course the hardware still runs shaders in 2x2 groups, so in this pass you do not gain anything. However, that does not matter if it's just the very last pass, which is a trivial shader writing out a single fetched texel. The more costly composition shaders (the ones that matter!) run at half resolution.
You can find a detailed description, including code, here: fake dynamic branching. This demo avoids lighting pixels that are outside the light's range by discarding them with the stencil test.
Another way which does not need the stencil buffer is to use "explicit Z culling". This may in fact be even easier and faster.
For this, clear Z, disable color writes (glColorMask), draw a fullscreen quad whose vertices have some "close" Z coordinate, and have the shader kill fragments in every odd line (or use the deprecated alpha test if you want, or whatever). gl_FragCoord.y is a very simple way of knowing which line to kill; using a small texture that wraps around would be another (if you must use GLSL 1.0).
Now draw another fullscreen quad with "far away" Z values in the vertices (and with depth test, of course). Simply fetch your half-res texture (GL_NEAREST filtering), and write it out. Since the depth buffer has a value that is "closer" in every other row, it will discard those pixels.
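The two passes above can be simulated on the CPU to check the logic. This is an illustration, not code from the answer. It assumes a tiny software framebuffer, arbitrary "near"/"far" depth values, and GL_LESS depth testing. Pass 1 (color writes off) kills odd rows, so the near depth lands on even rows. Pass 2 draws the far quad, which samples the half-height image with nearest filtering, and only the odd rows pass the depth test.

```cpp
#include <cassert>
#include <vector>

// Tiny software framebuffer used to simulate the two passes.
struct Framebuffer {
    int width, height;
    std::vector<float> depth;  // cleared to 1.0 (farthest)
    std::vector<int> color;    // arbitrary color ids; -1 = previous frame's contents
    Framebuffer(int w, int h) : width(w), height(h), depth(w * h, 1.0f), color(w * h, -1) {}
};

// Pass 1: glColorMask off, fullscreen quad at a "near" depth; the shader kills
// fragments on odd rows (e.g. via gl_FragCoord.y), so only even rows get near_z.
void pass_mark_even_rows(Framebuffer& fb, float near_z) {
    for (int y = 0; y < fb.height; ++y) {
        if (y % 2 == 1) continue;  // fragment killed by the shader
        for (int x = 0; x < fb.width; ++x) {
            float& d = fb.depth[y * fb.width + x];
            if (near_z < d) d = near_z;  // GL_LESS test passes, depth is written
        }
    }
}

// Pass 2: fullscreen quad at a "far" depth (near_z < far_z < 1.0) sampling the
// half-height image with GL_NEAREST; rows holding near_z fail the depth test.
void pass_upscale(Framebuffer& fb, const std::vector<int>& half_res, float far_z) {
    assert(fb.height % 2 == 0);
    assert(static_cast<int>(half_res.size()) == (fb.height / 2) * fb.width);
    for (int y = 0; y < fb.height; ++y) {
        for (int x = 0; x < fb.width; ++x) {
            int i = y * fb.width + x;
            if (!(far_z < fb.depth[i])) continue;           // depth test fails
            fb.color[i] = half_res[(y / 2) * fb.width + x];  // nearest: row y -> y / 2
            fb.depth[i] = far_z;
        }
    }
}
```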
How does glPolygonStipple compare to this? Polygon stipple is a deprecated feature, because it is not directly supported by the hardware and has to be emulated by the driver either by "secretly" rewriting the shader to include extra logic or by falling back to software.
This is probably not the right way to do interlacing. If you really need to achieve this effect, don't do it in the fragment shader like this. Instead, here is what you could do:
Initialize a full screen 1-bit stencil buffer, where each bit stores the parity of its corresponding row.
Render your scene as usual to a temporary FBO with half the vertical resolution.
Turn on the stencil test, and switch the stencil func depending on which set of scan lines you are going to draw.
Draw a rescaled version of the aforementioned FBO (containing the contents of your frame) to the screen as a textured fullscreen quad, so that the stencil test masks out the other set of rows (glBlitFramebuffer bypasses the stencil test, so a plain blit won't do this).
Note that you could skip the offscreen FBO step and draw directly using the stencil buffer, but this would waste some fill rate testing pixels that are just going to be clipped anyway. If your program is shader-heavy, the solution I just mentioned would be optimal. If it is not, you may end up marginally better off drawing directly to the screen.

GLSL: How to access nearby vertex colors? (bilinear interpolation without uniforms)

I'm trying to do bilinear color interpolation on a quad. I succeeded with the help of my previous question on here, but it has bad performance, because it requires me to repeat glBegin() and glEnd() for every quad, with 4 glUniform() calls before each glBegin().
The question is: is it anyhow possible to apply bilinear color interpolation on a quad like this:
glColor4f(...); glVertexAttrib2f(uv, 0, 0); glTexCoord2f(...); glVertex3f(...);
glColor4f(...); glVertexAttrib2f(uv, 1, 0); glTexCoord2f(...); glVertex3f(...);
glColor4f(...); glVertexAttrib2f(uv, 1, 1); glTexCoord2f(...); glVertex3f(...);
glColor4f(...); glVertexAttrib2f(uv, 0, 1); glTexCoord2f(...); glVertex3f(...);
... // here can be any amount of quads without repeating glBegin()/glEnd()
To do this, I think I should somehow access the nearby vertex colors, but how? Or are there any other solutions for this?
I need this to work this way so I can easily switch between different interpolation shaders.
Any other solution that works with one glBegin() command is good too, but sending all corner colors per vertex isn't acceptable, unless that's the only solution here.
Edit: The example code uses immediate mode for clarity only. Even with vertex arrays/buffers the problem would be the same: I would have to split the rendering calls into 4-vertex chunks, which causes the whole speed drop here!
Long story short: You cannot do this with a vertex shader.
The interpolator (or rasterizer) is one of the components of the graphics pipeline that is not programmable. Given how the graphics pipe works, neither a vertex shader nor a fragment shader is allowed access to anything but its own vertex (or fragment, respectively), for reasons of speed, simplicity, and parallelism.
The workaround is to use a texture lookup, which has already been noted in previous answers.
In newer versions of OpenGL (core since 3.2) there is the concept of a geometry shader. Geometry shaders are more complicated to implement than the relatively simple vertex and fragment shaders, but they are given topological information. That is, they execute on a primitive (point, line, triangle, etc.) rather than a single vertex. With that information, they could create additional geometry in order to resolve your alternate color interpolation method.
However, that's far more complicated than necessary. I'd stick with a 4-texel texture map and implement your logic in the fragment lookup.
Under the hood, OpenGL (and all the hardware that it drives) will do everything as triangles, so if you choose to blend colors via vertex interpolation, it will be triangular interpolation because the hardware doesn't work any other way.
If you want "quad" interpolation, you should put your colors into a texture, because in hardware a texture is always "quad" shaped.
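To see why the two give different results, this small C++ sketch evaluates one color channel at the quad's center both ways. It is illustrative only, and the corner values are arbitrary. The first way is bilinear, which is what a GL_LINEAR lookup into a 2x2 texture produces. The second is barycentric over the two triangles the quad is assumed to be split into along the (0,0)-(1,1) diagonal. For the texture approach, the quad's 0..1 UVs must be remapped to 0.25..0.75 so the corners land exactly on the texel centers.

```cpp
#include <cassert>
#include <cmath>

// Bilinear ("quad") interpolation of four corner values at (u, v) in [0, 1]^2,
// which is what GL_LINEAR sampling of a 2x2 texture computes.
float bilinear(float c00, float c10, float c01, float c11, float u, float v) {
    return (1 - u) * (1 - v) * c00 + u * (1 - v) * c10 + (1 - u) * v * c01 + u * v * c11;
}

// Per-vertex interpolation after splitting the quad into the triangles
// (0,0)-(1,0)-(1,1) and (0,0)-(1,1)-(0,1): plain barycentric weights.
float triangle_split(float c00, float c10, float c01, float c11, float u, float v) {
    if (u >= v) return (1 - u) * c00 + (u - v) * c10 + v * c11;
    return (1 - v) * c00 + u * c11 + (v - u) * c01;
}

// Maps a quad UV in [0, 1] to the texture coordinate that hits the texel
// centers (0.25 and 0.75) of a 2-texel-wide texture.
float texcoord_for_2x2(float u) { return 0.25f + 0.5f * u; }
```

With only one corner set to 1, the center comes out as 0.25 bilinearly but 0.5 with triangles, because the triangle version ignores the two corners off the shared diagonal.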
If you really think it's the number of draw calls that causes your performance drop, you can try instancing (using glDrawArraysInstanced + glVertexAttribDivisor); glDrawArraysInstanced is core since GL 3.1, and glVertexAttribDivisor since GL 3.3 (or via ARB_instanced_arrays).
An alternative might be point sprites, depending on your usage model (mostly, the maximum size of your quads, and whether they always face the viewer). That's available since GL 2.0 core.
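With instancing, the per-quad data (e.g. the four corner colors) can live in instanced attributes instead of uniforms. The indexing rule behind glVertexAttribDivisor fits in a few lines. An attribute with divisor 0 advances per vertex. One with divisor N > 0 advances once every N instances. The helper below only illustrates that rule on the CPU, assuming a base instance of 0. It is not a GL call.

```cpp
#include <cassert>

// Which element of an attribute array is fetched for a given vertex/instance,
// following the glVertexAttribDivisor rule (base instance assumed to be 0):
//   divisor == 0 -> per-vertex attribute, indexed by the vertex index
//   divisor == N -> per-instance attribute, advancing once every N instances
unsigned attribute_element(unsigned vertex_index, unsigned instance_id,
                           unsigned divisor) {
    if (divisor == 0) return vertex_index;
    return instance_id / divisor;
}
```

For quads, one option under these assumptions is a single glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, quad_count). The corner UVs would use divisor 0, and the per-quad corner colors would use divisor 1.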
Linear interpolation with colours specified per vertex can be set up efficiently using glColorPointer. Similarly you should use glTexCoordPointer/glVertexAttribPointer/glVertexPointer to replace all those individual per-vertex calls with a single call referencing the data in an array. Then render all your quads with a single (or at most a handful of) glDrawArrays or glDrawElements call. You'll see a huge improvement from this even without VBOs (which just change where the arrays are stored).
You mention you want to change shaders (between ShaderA and ShaderB say) on a quad by quad basis. You should either:
Arrange things so you can batch all of the ShaderA quads together and all the ShaderB quads together and render all of each together with a single call. Changing shader is generally quite expensive so you want to minimise the number of changes.
Implement all the different shader logic you want in a single "unified" shader, with another vertex attribute selecting between the different code paths. Whether this is anywhere near as efficient as the batching approach (which is preferable) depends on whether each "tile" of SIMD shader invocations tends to run a mixture of paths or just one.
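The batching option can be sketched as a CPU-side sort. You order the quads by the shader they need, then emit one (shader, first vertex, vertex count) range per shader. Each range then costs one program switch plus one draw call. The Quad struct, the shader ids, and the 6 vertices per quad (two triangles) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Quad { int shader_id; /* corner positions, colors, ... */ };

struct DrawRange { int shader_id; int first_vertex; int vertex_count; };

// Sorts quads so that all quads using the same shader are contiguous, then
// returns one draw range per shader. The vertex buffer must be filled in this
// sorted order; each range maps to glUseProgram + glDrawArrays(GL_TRIANGLES,
// first_vertex, vertex_count), assuming 6 vertices (two triangles) per quad.
std::vector<DrawRange> build_batches(std::vector<Quad>& quads) {
    const int kVertsPerQuad = 6;
    std::stable_sort(quads.begin(), quads.end(),
                     [](const Quad& a, const Quad& b) { return a.shader_id < b.shader_id; });
    std::vector<DrawRange> ranges;
    for (int i = 0; i < static_cast<int>(quads.size()); ++i) {
        if (ranges.empty() || ranges.back().shader_id != quads[i].shader_id) {
            ranges.push_back({quads[i].shader_id, i * kVertsPerQuad, 0});
        }
        ranges.back().vertex_count += kVertsPerQuad;
    }
    return ranges;
}
```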