Tiled deferred shading without compute shader - opengl

I'm building a deferred renderer and since I want to support a large amount of lights in the scene I've had a look at tiled deferred shading.
The problem is that I have to target OpenGL 3.3 hardware and it doesn't support GLSL compute shaders.
Is there a possibility to implement tiled deferred shading with normal shaders?

Tiled deferred rendering does not strictly require a compute shader. What it requires is that, for each tile, you have a series of lights which it will process. A compute shader is merely one way to accomplish that.
An alternative is to build the light lists for each frustum on the CPU, then upload that data to the GPU for its eventual use. Obviously it requires much more memory work than the CS version. But it's probably not that expensive, and it allows you to easily play with tile sizes to find the most optimal. More tiles means more CPU work and more data to be uploaded, but fewer lights-per-tile (generally speaking) and more efficient processing.
One way to do that for GL 3.3-class hardware is to make each tile a separate quad. The quad will be given, as part of its per-vertex parameters, the starting index for its part of the total light list and the total number of lights for that tile to process. The idea being that there is a globally-accessible array, and each tile has a contiguous region of this array that it will process.
This array could be the actual lights themselves, or it could be indices into a second (much smaller) array of lights. You'll have to measure the difference to tell if it's worthwhile to have the additional indirection in the access.
The primary array should probably be a buffer texture, since it can get quite large, depending on the number of lights and tiles. If you go with the indirect route, then the array of actual light data will likely fit into a uniform block. But in either case, you're going to need to employ buffer streaming techniques when uploading to it.

Related

Efficiently providing geometry for terrain physics

I have been researching different approaches to terrain systems in game engines for a bit now, trying to familiarize myself with the work. A number of the details seem straightforward, but I am getting hung up on a single detail.
For performance reasons many terrain solutions utilize shaders to generate parts or all of the geometry, such as vertex shaders to generate positions or tessellation shaders for LoD. At first I figured those approaches were exclusively for renders that weren't concerned about physics simulations.
The reason I say that is because as I understand shaders at the moment, the results of a shader computation generally are discarded at the end of the frame. So if you rely on shaders heavily then the geometry information will be gone before you could access it and send it off to another system (such as physics running on the CPU).
So, am I wrong about shaders? Can you store the results of them generating geometry to be accessed by other systems? Or am I forced to keep the terrain geometry on CPU and leave the shaders to the other details?
Shaders
You understand parts of the shaders correctly, that is: after a frame, the data is stored as a final composed image in the backbuffer.
BUT: Using transform feedback it is possible to capture transformed geometry into a vertex buffer and reuse it. Transform Feedback happens AFTER the vertex/geometry/tessellation shader, so you could use the geometry shader to generate a terrain (or visible parts of it once), push it through transform-feedback and store it.
This way, you potentially could use CPU collision detection with your terrain! You can even combine this with tessellation.
You will love this: A Framework for Real-Time, Deformable Terrain.
For the LOD and tessellation: LOD is not the prerequisite of tessellation. You can use tessellation to allow some more sophisticated effects such as adding a detail by recursive subdivision of rough geometry. Linking it with LOD is simply a very good optimization avoiding RAM-memory based LOD-mesh-levels, since you just have your "base mesh" and subdivide it (Although this will be an unsatisfying optimization imho).
Now some deeper info on GPU and CPU exclusive terrain.
GPU Generated Terrain (Procedural)
As written in the NVidia article Generating Complex Procedural Terrains Using the GPU:
1.2 Marching Cubes and the Density Function Conceptually, the terrain surface can be completely described by a single function, called the
density function. For any point in 3D space (x, y, z), the function
produces a single floating-point value. These values vary over
space—sometimes positive, sometimes negative. If the value is
positive, then that point in space is inside the solid terrain.
If the value is negative, then that point is located in empty space
(such as air or water). The boundary between positive and negative
values—where the density value is zero—is the surface of the terrain.
It is along this surface that we wish to construct a polygonal mesh.
Using Shaders
The density function used for generating the terrain, must be available for the collision-detection shader and you have to fill an output buffer containing the collision locations, if any...
CUDA
See: https://www.youtube.com/watch?v=kYzxf3ugcg0
Here someone used CUDA, based on the NVidia article, which however implies the same:
In CUDA, performing collision detection, the density function must be shared.
This will however make the transform feedback techniques a little harder to implement.
Both, Shaders and CUDA, imply resampling/recalculation of the density at at least one location, just for the collision detection of a single object.
CPU Terrain
Usually, this implies a RAM-memory stored set of geometry in the form of vertex/index-buffer pairs, which are regularly processed by the shader-pipeline. As you have the data available here, you will also most likely have a collision mesh, which is a simplified representation of your terrain, against which you perform collision.
Alternatively you could spend your terrain a set of colliders, marking the allowed paths, which is imho performed in the early PS1 Final Fantasy games (which actually don't really have a terrain in the sense we understand terrain today).
This short answer is neither extensively deep nor complete. I just tried to give you some insight into some concepts used in dozens of solutions.
Some more reading: http://prideout.net/blog/?tag=opengl-transform-feedback.

(Modern) OpenGL Different Colored Faces on a Cube - Using Shaders

A cube with different colored faces in intermediate mode is very simple. But doing this same thing with shaders seems to be quite a challenge.
I have read that in order to create a cube with different coloured faces, I should create 24 vertices instead of 8 vertices for the cube - in other words, (I visualies this as 6 squares that don't quite touch).
Is perhaps another (better?) solution to texture the faces of the cube using a real simple texture a flat color - perhaps a 1x1 pixel texture?
My texturing idea seems simpler to me - from a coder's point of view.. but which method would be the most efficient from a GPU/graphic card perspective?
I'm not sure what your overall goal is (e.g. what you're learning to do in the long term), but generally for high performance applications (e.g. games) your goal is to reduce GPU load. Every time you switch certain states (e.g. change textures, render targets, shader uniform values, etc..) the GPU stalls reconfiguring itself to meet your demands.
So, you can pass in a 1x1 pixel texture for each face, but then you'd need six draw calls (usually not so bad, but there is some prep work and potential cache misses) and six texture sets (can be very bad, often as bad as changing shader uniform values).
Suppose you wanted to pass in one texture and use that as a texture map for the cube. This is a little less trivial than it sounds -- you need to express each texture face on the texture in a way that maps to the vertices. Often you need to pass in a texture coordinate for each vertex, and due to the spacial configuration of the texture this normally doesn't end up meaning one texture coordinate for one spatial vertex.
However, if you use an environmental/reflection map, the complexities of mapping are handled for you. In this way, you could draw a single texture on all sides of your cube. (Or on your sphere, or whatever sphere-mapped shape you wanted.) I'm not sure I'd call this easier since you have to form the environmental texture carefully, and you still have to set a different texture for each new colors you want to represent -- or change the texture either via the GPU or in step with the GPU, and that's tricky and usually not performant.
Which brings us back to the canonical way of doing as you mentioned: use vertex values -- they're fast, you can draw many, many cubes very quickly by only specifying different vertex data, and it's easy to understand. It really is the best way, and how GPUs are designed to run quickly.
Additionally..
And yes, you can do this with just shaders... But it'd be ugly and slow, and the GPU would end up computing it per each pixel.. Pass the object space coordinates to the fragment shader, and in the fragment shader test which side you're on and output the corresponding color. Highly not recommended, it's not particularly easier, and it's definitely not faster for the GPU -- to change colors you'd again end up changing uniform values for the shaders.

How lighting in building games with unlimited number of lights works?

In Minecraft for example, you can place torches anywhere and each one effects the light level in the world and there is no limit to the amount of torches / light sources you can put down in the world. I am 99% sure that the lighting for the torches is taken care of on the CPU and stored for each block and so when rendering the light value at that certain block just needs to be passed into the shader, but light sources cannot move for this reason. If you had a game where you could place light sources that could move around (arrow on fire, minecart with a light on it, glowing ball of energy) and the lighting wasn't as simple (color was included) what are the most efficient ways to calculate the lighting effects.
From my research I have found differed rendering, differed lighting, dynamically creating shaders with different amounts of lights available and using a for loop (can't use uniforms due to unrolling), and static light maps (these would probably only be used for the still lights). Are there any other ways to do lighting calculations such as doing what minecraft does except allowing moving lights, or is it possible to take an infinite amount of lights and mathematically combine them into an approximation that only involves a few lights (this is an idea I came up with but I can't figure out how it could be done)?
If it helps, I am a programmer with decent experience in OpenGL (legacy and modern) so you can give me code snippets although I have not done too much with lighting so brief explanations would be appreciated. I am also willing to do research if you can point me in the right direction!
Your title is a bit misleading infinite light implies directional light in infinite distance like Sun. I would use unlimited number of lights instead. Here some approaches for this I know of:
(back) ray-tracers
they can handle any number of light sources natively. Light is just another object in engine. If ray hits the light source it just take the light intensity and stop the recursion. Unfortunately current gfx hardware is not suited for this kind of rendering. There are GPU enhanced engines for this but the specialized gfx HW is still in development and did not hit the market yet. Memory requirements are not much different then standard BR rendering and You can still use BR meshes but mathematical (analytical) meshes are natively supported and are better for this.
Standard BR rendering
BR means boundary representation such engines (Like OpenGL fixed function) can handle only limited number of lights. This is because each primitive/fragment needs the complete list of lights and the computations are done for all light on per primitive or per fragment basis. If you got many light this would be slow.
GLSL example of fixed number of light sources see the fragment shader
Also the current GPU's have limited memory for uniforms (registers) in which the lights and other rendering parameters are stored so there are possible workarounds like have light parameters stored in a texture and iterate over all of them per primitive/fragment inside GLSL shader but the number of lights affect performance of coarse so you are limited by target frame-rate and computational power. Additional memory requirements for this is just the texture with light parameters which is not so much (few vectors per light).
light maps
they can be computed even for moving objects. Complex light maps can be computed slowly (not per frame). This leads to small lighting artifacts but you need to know what to look for to spot it. Light maps and shadow maps are very similar and often computed at once. There are simple light maps and complex radiation maps models out there
look Shading mask algorithm for radiation calculations
These are either:
projected 2D maps (hard to implement/use and often less precise)
3D Voxel maps (Memory demanding but easier to compute/use)
Some approaches uses pre-rendered Z-Buffer as geometry source and then fill the lights via Radiosity or any other technique. These can handle any number of lights as these maps can be computation demanding they are often computed in the background and updated once in a while.
fast moving light sources are usually updated more often or excluded from maps and rendered as transparent geometry to make impression of light. The computational power needed for this depends on the computation method the basic are done like:
set a camera to the larges visible surfaces
render scene and handle the result as light/shadow map
store it as 2D or 3D texture or voxel map
and then continue with normal rendering from camera view
So you need to render scene more then once per frame/map update and also need additional buffers to store the rendered result which for high resolution or Voxel maps can be a big chunk of memory.
multi pass light layer
there are cases when light is added after rendering of the scene for example I used it for
Atmospheric scattering in GLSL
Here comes all multi pass rendering techniques you need additional buffers to store the sub results and usually the multi pass rendering is done on the same view/scene so pre-rendered geometry is used which significantly speeds this up either as locked VAO or as already rendered Z-buffer Color and Index buffers from first pass. After this handle next passes as single or few Quads (like in the Atmospheric scattering link) so the computational power needed for this is not much bigger in comparison to basic BR rendering
forward rendering vs. deferred-rendering
in a google this forward rendering vs. deferred-rendering is first relevant hit I found. It is not very good one (a bit to vague for my taste) but for starters it is enough
forward rendering techniques are usually standard single pass BR renders
deffered rendering is standard multi pass renders. In first pass is rendered all the geometries of the scene into Z buffer, Color buffer and some auxiliary buffers just to know which fragment of the result belongs to which object,material,... And then in the next passes are added effects,light,shadows,... but the geometry is not rendered again instead just single or few overlay QUADs/per pass are rendered so the next passes are usually pretty fast ...
The link suggest that for high lights number is the deffered rendering more suited but that strongly depends on which of the previous technique is used. Usually the multi pass light layer is used (with is one of the standard deffered rendering techniques) so in that case it is true, and the memory and computational power demands are the same see the previous section.

GPU particle metaball-surface rendering

I have a question about a very specific method on how to render surface particles. The method is explained very well in the Nvidia GPU Gems 3 chapter 7 "Point-Based Visualization of Metaballs on a GPU", link to this chapter.
The article is about rendering an implicit surface using points or splats that are evenly distributed over the surface. They say that the computation of these particles is done completely on the GPU. Only the data which defines the surface is sent from CPU to the GPU to keep the traffic as low as possible.
They also gave some pseudo code examples of fragment shader programs to compute the particle positions, velocity etc. and for me it looks like these programs should run once for every particle.
Now my question is, how do they store these particles? What kind of data structure is it?
It must be some kind of buffer or texture that can be accessed for reading as well as for writing operations on the GPU. But how do I render this buffer/texture again in the next rendering step?
My first idea was some kind of vertex-buffer-object which is sent to the GPU once at the beginning and continuously updated there at each rendering pass. Is that possible at all?
One requirement for me is that it must be implemented using OpenGL/GLSL, I hope that is possible.
Yes you need some kind of VBO and repeated passes over the same data. The data structure can be a SoA (Struct of Arrays) or AoS (Array of Structs) depending on how you prefer to code the access to the different properties of the array, ie:
SoA:
Positions Array
Speed Array
Normal Array
AoS:
Just one Array containing [Position, Speed, Normal].
AoS are the same as interleaved arrays for rendering where in only one array you keep all the properties of the mesh.
You could use either a VBO or a Texture, the only difference is the way the caching is done, since textures are optimized for 2D access.
The rendering is done in steps exactly like you are picturing it, so all you need to do is to "render" the physical stepping of the system using shaders that compute the properties you want and then bind the same structures to the true graphics rendering in a subsequent step.

Is it possible to reuse glsl vertex shader output later?

I have a huge mesh(100k triangles) that needs to be drawn a few times and blend together every frame. Is it possible to reuse the vertex shader output of the first pass of mesh, and skip the vertex stage on later passes? I am hoping to save some cost on the vertex pipeline and rasterization.
Targeted OpenGL 3.0, can use features like transform feedback.
I'll answer your basic question first, then answer your real question.
Yes, you can store the output of vertex transformation for later use. This is called Transform Feedback. It requires OpenGL 3.x-class hardware or better (aka: DX10-hardware).
The way it works is in two stages. First, you have to set your program up to have feedback-based varyings. You do this with glTransformFeedbackVaryings. This must be done before linking the program, in a similar way to things like glBindAttribLocation.
Once that's done, you need to bind buffers (given how you set up your transform feedback varyings) to GL_TRANSFORM_FEEDBACK_BUFFER with glBindBufferRange, thus setting up which buffers the data are written into. Then you start your feedback operation with glBeginTransformFeedback and proceed as normal. You can use a primitive query object to get the number of primitives written (so that you can draw it later with glDrawArrays), or if you have 4.x-class hardware (or AMD 3.x hardware, all of which supports ARB_transform_feedback2), you can render without querying the number of primitives. That would save time.
Now for your actual question: it's probably not going to help buy you any real performance.
You're drawing terrain. And terrain doesn't really get any transformation. Typically you have a matrix multiplication or two, possibly with normals (though if you're rendering for shadow maps, you don't even have that). That's it.
Odds are very good that if you shove 100,000 vertices down the GPU with such a simple shader, you've probably saturated the GPU's ability to render them all. You'll likely bottleneck on primitive assembly/setup, and that's not getting any faster.
So you're probably not going to get much out of this. Feedback is generally used for either generating triangle data for later use (effectively pseudo-compute shaders), or for preserving the results from complex transformations like matrix palette skinning with dual-quaternions and so forth. A simple matrix multiply-and-go will barely be a blip on the radar.
You can try it if you like. But odds are you won't have any problems. Generally, the best solution is to employ some form of deferred rendering, so that you only have to render an object once + X for every shadow it casts (where X is determined by the shadow mapping algorithm). And since shadow maps require different transforms, you wouldn't gain anything from feedback anyway.