I've implemented deferred rendering in OpenGL; it's fairly simple at the moment. However, I'm having major performance issues due to the stencil pass (at least in the way I'm currently using it). I've mainly been using the ogldev.atspace tutorial as a reference (I'm only allowed two links per post at the moment, sorry!), alongside a few dozen tidbits of information from other articles.
It works like this:
1) G-buffer pass (render geometry and fill normals, diffuse, ambient, etc.)
2) For each light:
   2a) stencil pass
   2b) light pass
3) Swap to screen
The thing is, using the stencil pass in this fashion incurs huge costs, because I need to swap between light-pass mode and stencil mode for every light in the scene. That's a lot of GL state swaps.
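For illustration, here is a rough sketch of what one iteration of step 2 tends to look like, in the spirit of the ogldev tutorial; the Light type and drawLightVolume helper are hypothetical, and the exact state choices are assumptions:

#include <GL/glew.h>
#include <vector>

struct Light { /* position, color, radius... (placeholder) */ };
void drawLightVolume(const Light&);   // hypothetical: draws the light's bounding volume

void lightingPasses(const std::vector<Light>& lights)
{
    // Every iteration reconfigures depth/stencil/cull/blend state twice,
    // which is exactly the state churn that shows up in the profile.
    for (const Light& light : lights) {
        // 2a) stencil pass: mark pixels inside the light volume
        glEnable(GL_STENCIL_TEST);
        glClear(GL_STENCIL_BUFFER_BIT);
        glEnable(GL_DEPTH_TEST);
        glDepthMask(GL_FALSE);
        glDisable(GL_CULL_FACE);
        glStencilFunc(GL_ALWAYS, 0, 0);
        glStencilOpSeparate(GL_BACK,  GL_KEEP, GL_INCR_WRAP, GL_KEEP);
        glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_DECR_WRAP, GL_KEEP);
        drawLightVolume(light);            // no color output, just depth/stencil

        // 2b) light pass: shade only the stencilled pixels, additively
        glStencilFunc(GL_NOTEQUAL, 0, 0xFF);
        glDisable(GL_DEPTH_TEST);
        glEnable(GL_BLEND);
        glBlendFunc(GL_ONE, GL_ONE);
        glEnable(GL_CULL_FACE);
        glCullFace(GL_FRONT);              // the camera may be inside the volume
        drawLightVolume(light);            // full lighting shader this time
        glCullFace(GL_BACK);
        glDisable(GL_BLEND);
        glDisable(GL_STENCIL_TEST);
    }
}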
An alternate method without the stencil pass would look like this:
1) G-buffer fill
2) Set light pass
3) Compute for all lights
4) Swap to screen
Doing this skips the need to swap all of that OpenGL state (and do the buffer clears, etc.) for each light in the scene.
I've tested/profiled this using CodeXL and basic FPS logging via std::cout. With the stencil-pass method, state-change functions account for 44% of my GL calls (compared to 6% for draws and 6% for textures), and buffer swaps/clears also cost a fair bit more. When I switch to the second method, GL state changes drop to 2.98%, and the rest drop by a fair margin as well. The FPS also changes drastically: with ~65 dynamically moving lights in my scene, the stencil-pass method gives me around 20-30 FPS in release mode if I'm lucky (with rendering taking the majority of the frame time), while the second method gives me ~71 (with rendering taking the smaller part of the frame time).
Now, why not just use the second method? Well, it causes serious lighting issues that I don't get with the first, and I have no idea how to get rid of them. Here's an example:
Second (non-stencil) version (the light essentially bleeds and overlaps onto things outside its range):
http://imgur.com/BNn9SP2
First (stencil) version (how it should look):
http://imgur.com/kVGRwH2
So my main question is: is there a way to avoid the stencil pass (using something similar to the second method, but without the graphical glitch) without completely changing the algorithm to something like tiled deferred rendering?
And if not, is there an alternative deferred rendering method that isn't too much of a jump from the style of deferred renderer I'm using?
Getting rid of the stencil pass isn't a new problem for me; I was looking for an alternative six months or so back, when I first implemented it and suspected it might be too much overhead for what I had in mind. But I couldn't find anything at the time, and I still can't.
Another technique, which is used in Doom 3 for lighting, is the following:
http://fabiensanglard.net/doom3/renderer.php
For each light:
- render only the geometry affected by that light
- accumulate the light result (clamping to 255)
As an optimization, you can add a scissor test so that you render only the visible part of the geometry for each light.
The advantage of this over stencil lights is that you can do complex light calculations if you want, or just keep the lights simple. The whole workload stays on the GPU, and you don't have redundant state changes: you set up only one shader and one VBO, and you re-draw each time, changing only the light's uniform parameters and the scissor area. You don't even need a G-buffer.
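A minimal host-side sketch of that loop, assuming hypothetical helpers for the light's screen rectangle, its uniforms, and the affected geometry:

#include <GL/glew.h>
#include <vector>

struct Light { /* placeholder light parameters */ };
struct ScreenRect { int x, y, width, height; };
ScreenRect projectLightBounds(const Light&);           // hypothetical
void setLightUniforms(GLuint program, const Light&);   // hypothetical
void drawGeometryAffectedBy(const Light&);             // hypothetical

void accumulateLights(const std::vector<Light>& lights, GLuint lightShader)
{
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE);     // additive accumulation, clamped by the target
    glEnable(GL_SCISSOR_TEST);
    glUseProgram(lightShader);       // the one shader used for every light
    for (const Light& light : lights) {
        ScreenRect r = projectLightBounds(light);      // screen-space light range
        glScissor(r.x, r.y, r.width, r.height);
        setLightUniforms(lightShader, light);          // only uniforms change per light
        drawGeometryAffectedBy(light);                 // geometry within the light's range
    }
    glDisable(GL_SCISSOR_TEST);
    glDisable(GL_BLEND);
}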
In Minecraft, for example, you can place torches anywhere, each one affects the light level in the world, and there is no limit to the number of torches/light sources you can put down. I am 99% sure that the lighting for the torches is handled on the CPU and stored for each block, so when rendering, the light value at a given block just needs to be passed into the shader; but light sources cannot move for this reason. If you had a game where you could place light sources that could move around (an arrow on fire, a minecart with a light on it, a glowing ball of energy) and the lighting wasn't as simple (color was included), what are the most efficient ways to calculate the lighting effects?
From my research I have found deferred rendering, deferred lighting, dynamically creating shaders with different numbers of lights and using a for loop (you can't use uniforms for the count because of loop unrolling), and static light maps (these would probably only be used for the stationary lights). Are there any other ways to do lighting calculations, such as doing what Minecraft does but allowing moving lights? Or is it possible to take an unlimited number of lights and mathematically combine them into an approximation involving only a few lights (this is an idea I came up with, but I can't figure out how it could be done)?
If it helps, I am a programmer with decent experience in OpenGL (legacy and modern), so you can give me code snippets, although I haven't done much with lighting, so brief explanations would be appreciated. I am also willing to do research if you can point me in the right direction!
Your title is a bit misleading: "infinite light" implies a directional light at infinite distance, like the Sun. I would use "unlimited number of lights" instead. Here are some approaches for this that I know of:
(back) ray-tracers
they can handle any number of light sources natively. Light is just another object in the engine: if a ray hits a light source, it just takes the light's intensity and stops the recursion. Unfortunately, current graphics hardware is not suited to this kind of rendering. There are GPU-accelerated engines for this, but the specialized hardware is still in development and has not hit the market yet. Memory requirements are not much different from standard BR rendering, and you can still use BR meshes, but mathematical (analytical) meshes are natively supported and better suited for this.
Standard BR rendering
BR means boundary representation. Such engines (like the OpenGL fixed-function pipeline) can handle only a limited number of lights, because each primitive/fragment needs the complete list of lights, and the computation is done for every light on a per-primitive or per-fragment basis. With many lights this becomes slow.
For a GLSL example with a fixed number of light sources, see the fragment shader there.
Also, current GPUs have limited memory for uniforms (registers), in which the lights and other rendering parameters are stored. There are workarounds, such as storing the light parameters in a texture and iterating over all of them per primitive/fragment inside the GLSL shader, but the number of lights affects performance, of course, so you are limited by your target frame rate and the available computational power. The additional memory needed is just the texture with the light parameters, which is not much (a few vectors per light).
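As a sketch of that workaround, one possible (assumed) layout packs each light into two RGBA32F texels that the shader can then read back with texelFetch:

#include <GL/glew.h>
#include <vector>

// Hypothetical packing: texel 2*i   = (position.xyz, radius),
//                       texel 2*i+1 = (color.rgb, intensity).
GLuint createLightTexture(const std::vector<float>& packed, int lightCount)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    // Exact texel reads (texelFetch in the shader), so no filtering.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, 2 * lightCount, 1, 0,
                 GL_RGBA, GL_FLOAT, packed.data());
    // The fragment shader then loops: for (int i = 0; i < lightCount; ++i) { ... }
    return tex;
}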
light maps
they can be computed even for moving objects. Complex light maps can be computed slowly (not every frame); this leads to small lighting artifacts, but you need to know what to look for to spot them. Light maps and shadow maps are very similar and are often computed together. There are simple light map and complex radiation map models out there; see Shading mask algorithm for radiation calculations.
These are either:
- projected 2D maps (hard to implement/use and often less precise)
- 3D voxel maps (memory-demanding but easier to compute/use)
Some approaches use a pre-rendered Z-buffer as the geometry source and then fill in the lights via radiosity or some other technique. These can handle any number of lights. As these maps can be computationally demanding, they are often computed in the background and updated once in a while.
fast-moving light sources are usually updated more often, or excluded from the maps and rendered as transparent geometry to give the impression of light. The computational power needed depends on the computation method; the basic version is done like this:
1) set a camera looking at the largest visible surfaces
2) render the scene and treat the result as the light/shadow map
3) store it as a 2D or 3D texture or a voxel map
4) then continue with the normal rendering from the camera view
So you need to render the scene more than once per frame/map update, and you also need additional buffers to store the rendered result, which for high resolutions or voxel maps can be a big chunk of memory.
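As a rough sketch of one such map update following the steps above (the FBO handle, map size, and camera helper are assumptions):

#include <GL/glew.h>

void setCameraToLargestVisibleSurface();  // hypothetical, per step 1 above
void drawScene();                         // hypothetical scene draw

// One light/shadow map update: render the scene into an offscreen FBO,
// then sample that texture during the normal camera-view pass.
void updateLightMap(GLuint lightMapFBO, int mapSize)
{
    glBindFramebuffer(GL_FRAMEBUFFER, lightMapFBO);   // assumed pre-built FBO
    glViewport(0, 0, mapSize, mapSize);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    setCameraToLargestVisibleSurface();
    drawScene();
    glBindFramebuffer(GL_FRAMEBUFFER, 0);             // back to the default framebuffer
    // ...continue with the normal rendering from the camera view.
}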
multi-pass light layer
there are cases when light is added after the scene is rendered; for example, I used it for:
Atmospheric scattering in GLSL
This is where multi-pass rendering techniques come in. You need additional buffers to store the intermediate results, and the passes are usually done on the same view/scene, so pre-rendered geometry is reused, which speeds things up significantly: either as a locked VAO, or as the already-rendered Z-buffer, color, and index buffers from the first pass. The later passes are then handled as a single quad or a few quads (like in the Atmospheric scattering link), so the computational power needed is not much greater than for basic BR rendering.
forward rendering vs. deferred rendering
Googling it, this forward rendering vs. deferred rendering is the first relevant hit I found. It is not a very good one (a bit too vague for my taste), but for starters it is enough.
forward rendering techniques are usually standard single-pass BR renderers
deferred rendering is standard multi-pass rendering. In the first pass, all the scene geometry is rendered into the Z-buffer, color buffer, and some auxiliary buffers, just to record which fragment of the result belongs to which object, material, etc. Then, in the later passes, effects, lights, shadows, etc. are added, but the geometry is not rendered again; instead, just a single quad or a few overlay quads are rendered per pass, so the later passes are usually pretty fast.
The link suggests that deferred rendering is better suited to high light counts, but that strongly depends on which of the previous techniques is used. Usually the multi-pass light layer is used (which is one of the standard deferred rendering techniques), so in that case it is true, and the memory and computational demands are the same; see the previous section.
I'm working on rendering a scene that potentially has multiple intersecting transparent objects. This makes the standard method of sorting and drawing back to front problematic (even sorting triangles wouldn't work if the triangles intersect). So I've implemented depth peeling, using a GLSL fragment shader to do the second depth test. It works great.
Now I want to be able to apply certain effects using shaders. One of the objects in the scene is a syringe, and I would like to apply a glass effect. If I were drawing back to front, this would be easy: just enable the shader when I draw the syringe, since everything behind it is already in the frame buffer. However, this approach won't work with depth peeling.
So my questions are:
How do I apply shader effects to a single object in a scene when using depth peeling?
How do I combine effect shaders with my depth peeling shader (assuming they need to run at the same time)?
I should note that I'm pretty new at using shaders, so code examples are appreciated!
I'd be surprised if that's possible without ray tracing. As far as I know, the way to use refraction shaders is to do texture lookups in an environment map. This map can either be precomputed or computed on the fly in a separate rendering pass. For the latter option, you would need one separate environment map and one extra pass for each object that uses the shader. I kinda doubt that's possible if the objects intersect each other. Even if it were, each of these passes would also need another couple of passes for the depth peeling. And if you also wanted the depth-peeling shader passes to factor in refraction from the surrounding objects, this would quickly get out of hand.
EDIT: I'm still looking for some help on using OpenCL or compute shaders. I would prefer to keep using OpenGL 3.3 and not have to deal with the bad driver support for OpenGL 4.3 and OpenCL 1.2, but I can't think of any way to do this type of shading without using one of the two (to match lights to tiles). Is it possible to implement tile-based culling without using GPGPU?
I wrote a deferred renderer in OpenGL 3.3. Right now I don't do any culling for the light pass (I just render a full-screen quad for every light). This (obviously) has a ton of overdraw (sometimes ~100%). Because of this I've been looking into ways to improve performance during the light pass. It seems the best way, in (almost) everyone's opinion, is to cull the scene using screen-space tiles. This was the method used in Frostbite 2. I read the presentation by Andrew Lauritzen from SIGGRAPH 2010 (http://download-software.intel.com/sites/default/files/m/d/4/1/d/8/lauritzen_deferred_shading_siggraph_2010.pdf), and I'm not sure I fully understand the concept (and, for that matter, why it's better than anything else, and whether it's better for me).
In the presentation, Lauritzen goes over deferred shading with light volumes, quads, and tiles for culling the scene. According to his data, the tile-based deferred renderer was the fastest (by far). I don't understand why it is, though. I'm guessing it has something to do with the fact that for each tile, all the lights are batched together. The presentation says to read the G-buffer once and then compute the lighting, but this doesn't make sense to me. In my mind, I would implement it like this:
for each tile {
    for each light affecting the tile {
        render quad (the tile) and compute lighting
        blend with previous tiles (GL_ONE, GL_ONE)
    }
}
This would still involve sampling the G-buffer a lot. I would think that would have the same (if not worse) performance as rendering a screen-aligned quad for every light. From how it's worded, though, it seems like this is what's happening:
for each tile {
    render quad (the tile) and compute all lights
}
But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs. Can anyone help me with this? It also seems like almost every tile-based deferred renderer uses compute shaders or OpenCL (to batch the lights). Why is this, and what would happen if I didn't use them?
But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs.
It rather depends on how many lights you have. The "instruction limits" are pretty high; it's generally not something you need to worry about outside of degenerate cases. Even if 100+ lights affect a tile, odds are fairly good that your lighting computations aren't going to exceed the limits.
Modern GL 3.3 hardware can run at least 65536 dynamic instructions in a fragment shader, and likely more. For 100 lights, that's still 655 instructions per light. Even if you take 2000 instructions to compute the camera-space position, that still leaves 635 instructions per light. Even if you were doing Cook-Torrance directly in the GPU, that's probably still sufficient.
Criteria: I’m using OpenGL with shaders (GLSL) and trying to stick with modern techniques (e.g., staying away from deprecated concepts).
My questions, in a very general sense (see below for more detail), are as follows:
Do shaders allow you to do custom blending that help eliminate z-order transparency issues found when using GL_BLEND?
Is there a way for a shader to know what type of primitive is being drawn without “manually” passing it some sort of flag?
Is there a way for a shader to “ignore” or “discard” a vertex (especially when drawing points)?
Background: My application draws points connected with lines in an ortho projection (vertices have varying depth in the projection). I’ve only recently started using shaders in the project (trying to get away from deprecated concepts). I understand that standard blending has ordering issues with alpha testing and depth testing: basically, if a “translucent” pixel at a higher z level is drawn first (thus blending with whatever colors were already drawn to that pixel at a lower z level), and an opaque object is then drawn at that pixel but at a lower z level, depth testing prevents changing the pixel that was already drawn for the “higher” z level, causing blending issues. To overcome this, you need to draw opaque items first, then translucent items in ascending z order. My gut feeling is that shaders wouldn’t provide an (efficient) way to change this behavior; am I wrong?
Further, for speed and convenience, I pass information for each vertex (along with a couple of uniform variables) to the shaders, and they use that information to find a subset of the vertices that need special attention. Without duplicating that logic in the app itself (and slowing things down), I can’t know a priori what subset of vertices that is, so I send all vertices to the shader. However, when I draw “points” I’d like the shader to ignore all the vertices that aren’t in the subset it determines. I think I can get the effect by setting alpha to zero and using an alpha function in the GL context that prevents drawing anything with alpha less than, say, 0.01. However, is there a better or more “correct” GLSL way for a shader to say “just ignore this vertex”?
Do shaders allow you to do custom blending that help eliminate z-order transparency issues found when using GL_BLEND?
Sort of. If you have access to GL 4.x-class hardware (Radeon HD 5xxx or better, or GeForce 4xx or better), then you can perform order-independent transparency. Earlier versions have techniques like depth peeling, but they're quite expensive.
The GL 4.x-class version essentially builds per-pixel "linked lists" of transparent samples, which you then resolve into the final sample color with a full-screen pass. It's not free, of course, but it isn't as expensive as other OIT methods. How expensive it would be in your case is uncertain; it's proportional to how many overlapping pixels you have.
You still have to draw opaque stuff first, and you have to draw transparent stuff using special shader code.
Is there a way for a shader to know what type of primitive is being drawn without “manually” passing it some sort of flag?
No.
Is there a way for a shader to “ignore” or “discard” a vertex (especially when drawing points)?
Not in general, but yes for points. A geometry shader can conditionally emit vertices, thus allowing you to discard any vertex for arbitrary reasons.
Discarding a vertex in non-point primitives is possible, but it will also affect the interpretation of that primitive. The reason it's simple for points is that each vertex is a whole primitive, while a vertex in a triangle isn't. You can discard whole lines, but discarding a single vertex within a line is... of dubious value.
That being said, your explanation of why you want to do this is of dubious merit. You want to tag vertex data with what is essentially a boolean saying "do stuff with me" or not. That means that, every frame, you have to modify your data to say which points should be rendered and which shouldn't.
The simplest and most efficient way to do this is to simply not render them. That is, arrange your data so that the only things on the GPU are the points you want to render; then there's no need to do anything special at all. If you're going to be constantly updating your vertex data, then you're already condemned to dealing with streaming vertex data, so you may as well stream it in a way that makes rendering efficient.
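For illustration, a minimal sketch of that streaming approach, assuming the per-point test runs on the CPU each frame; Vertex and shouldDraw are placeholders for your own layout and logic:

#include <GL/glew.h>
#include <vector>

struct Vertex { float x, y, z; };     // placeholder for your real layout
bool shouldDraw(const Vertex&);       // hypothetical CPU-side test

void drawVisiblePoints(const std::vector<Vertex>& allPoints, GLuint streamVBO)
{
    // Keep only the points that pass the test.
    std::vector<Vertex> visible;
    for (const Vertex& v : allPoints)
        if (shouldDraw(v))
            visible.push_back(v);

    // Re-specify the buffer each frame (orphaning) and draw exactly that many
    // points; nothing else ever reaches the vertex shader.
    glBindBuffer(GL_ARRAY_BUFFER, streamVBO);
    glBufferData(GL_ARRAY_BUFFER, visible.size() * sizeof(Vertex),
                 visible.data(), GL_STREAM_DRAW);
    glDrawArrays(GL_POINTS, 0, (GLsizei)visible.size());
}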
I would like to efficiently render in an interlaced mode using GLSL.
I can already do this like so:
vec4 background = texture2D(plane[5], gl_TexCoord[1].st);
if (is_even_row(gl_TexCoord[1].t))
{
    vec4 foreground = get_my_color();
    // standard "over" blend of the foreground onto the background
    gl_FragColor = vec4(foreground.rgb * foreground.a + background.rgb * (1.0 - foreground.a),
                        background.a + foreground.a);
}
else
    gl_FragColor = background;
However, as far as I understand, the nature of branching in GLSL is that both branches will actually be executed, since is_even_row() is evaluated at run time.
Is there any trick I can use here to avoid unnecessarily calling the rather heavy function get_my_color()? The behavior of is_even_row() is quite static.
Or is there some other way to do this?
NOTE: glPolygonStipple will not work since I have custom blend functions in my GLSL code.
(comment to answer, as requested)
The problem with interlacing is that GPUs run shaders in 2x2 clusters, which means that you gain nothing from interlacing (a good software implementation might possibly only execute the actual pixels that are needed, unless you ask for partial derivatives).
At best, interlacing runs at the same speed; at worst, it runs slower because of the extra work for the interlacing. Some years ago there was an article in ShaderX4 which suggested interlaced rendering. I tried that method on half a dozen graphics cards (three generations of hardware from each of the "two big" manufacturers), and it ran slower (sometimes slightly, sometimes up to 50%) in every case.
What you could do instead is do all the expensive rendering at 1/2 the vertical resolution; this reduces the pixel shader work (and texture bandwidth) by half. You can then upscale the texture (GL_NEAREST) and discard every other line.
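A minimal sketch of the half-height render target this needs (formats and handles are assumptions):

#include <GL/glew.h>

// Build a render target at full width but half height; the expensive passes
// draw into this, and a cheap final pass upscales it with GL_NEAREST.
void createHalfResTarget(int width, int height, GLuint& tex, GLuint& fbo)
{
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); // no row blending
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height / 2, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);
    glViewport(0, 0, width, height / 2);  // render the heavy passes at half height
}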
The stencil test can be used to discard pixels before the pixel shader is executed. Of course the hardware still runs shaders in 2x2 groups, so in this pass you do not gain anything. However, that does not matter if it's just the very last pass, which is a trivial shader writing out a single fetched texel. The more costly composition shaders (the ones that matter!) run at half resolution.
You can find a detailed description, including code, here: fake dynamic branching. This demo avoids lighting pixels outside the light's range by discarding them with the stencil.
Another way which does not need the stencil buffer is to use "explicit Z culling". This may in fact be even easier and faster.
For this, clear Z, disable color writes (glColorMask), and draw a fullscreen quad whose vertices have some "close" Z coordinate, with the shader killing fragments in every odd line (or use the deprecated alpha test if you want, or whatever). gl_FragCoord.y is a very simple way of knowing which line to kill; using a small texture that wraps around would be another (if you must use GLSL 1.0).
Now draw another fullscreen quad with "far away" Z values at the vertices (and with the depth test on, of course). Simply fetch your half-res texture (GL_NEAREST filtering) and write it out. Since the depth buffer has a value that is "closer" in every other row, those pixels will be discarded.
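Sketched out, the two quads might look like this; the shader names and quad helper are hypothetical, and the "close"/"far" Z values are arbitrary:

#include <GL/glew.h>

void drawFullscreenQuad(float z);   // hypothetical: quad at the given depth

void interlaceByDepth(GLuint killOddLinesShader, GLuint fetchHalfResShader)
{
    // Pass 1: write a "close" depth into the rows to protect, color writes off.
    // The shader discards odd lines, e.g.: if ((int(gl_FragCoord.y) & 1) == 1) discard;
    glClear(GL_DEPTH_BUFFER_BIT);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    glUseProgram(killOddLinesShader);
    drawFullscreenQuad(0.1f);                 // "close" Z
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

    // Pass 2: a "far" quad that just fetches the half-res texture (GL_NEAREST);
    // the depth test rejects the rows that received the "close" value.
    glUseProgram(fetchHalfResShader);
    drawFullscreenQuad(0.9f);                 // "far" Z
}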
How does glPolygonStipple compare to this? Polygon stipple is a deprecated feature, because it is not directly supported by the hardware; it has to be emulated by the driver, either by "secretly" rewriting the shader to include extra logic or by falling back to software.
This is probably not the right way to do interlacing. If you really need to achieve this effect, don't do it in the fragment shader like this. Instead, here is what you could do:
1) Initialize a fullscreen 1-bit stencil buffer, where each bit stores the parity of its corresponding row.
2) Render your scene as usual to a temporary FBO at 1/2 the vertical resolution.
3) Turn on the stencil test, and switch the stencil func depending on which set of scan lines you are going to draw.
4) Blit a rescaled version of the aforementioned FBO (containing the contents of your frame) to the screen through the stencil mask.
Note that you could skip the offscreen FBO step and draw directly using the stencil buffer, but this would waste some fill rate testing pixels that are just going to be clipped anyway. If your program is shader-heavy, the solution I just mentioned would be optimal. If it is not, you may end up marginally better off drawing directly to the screen.
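As a rough sketch of steps 1 and 3 above (the row-parity shader and quad helpers are hypothetical):

#include <GL/glew.h>

void drawFullscreenQuad();        // hypothetical helper
void drawRescaledFrameQuad();     // hypothetical: draws the half-res FBO contents upscaled

void setupAndUseInterlaceStencil(GLuint oddRowsOnlyShader, bool drawOddField)
{
    // One-time: write 1 into the stencil on every odd scan line by drawing a
    // fullscreen quad with a shader that discards even rows.
    glEnable(GL_STENCIL_TEST);
    glClearStencil(0);
    glClear(GL_STENCIL_BUFFER_BIT);
    glStencilFunc(GL_ALWAYS, 1, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glUseProgram(oddRowsOnlyShader);
    drawFullscreenQuad();
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

    // Per frame: pick the field by flipping the stencil comparison, then draw
    // the rescaled frame through the mask.
    glStencilFunc(drawOddField ? GL_EQUAL : GL_NOTEQUAL, 1, 0xFF);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    drawRescaledFrameQuad();
}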