Improving shadow map performance of point lights? - opengl

I'm making point lights with shadow maps, and currently I have it done with six depth map textures, each rendered individually and applied to the light map. This works quite well, but the performance cost is high.
Now, to improve performance by reducing FBO changes and the shader swapping between the depth-map shader and the light-map shader, I was thinking of a couple of approaches. The first involves a single texture, six times larger than the shadow map, rendering all of the point light's depth maps "at once", and then using this texture to lay out the light map in one call. Is it possible to render to only a portion of a texture?
To elaborate on the first approach, it would be something like this:
1. Set shadow map size to 128x128
2. Create a depth map texture of 384x256 (3 x 2 shadow map)
3. Render the first 'side' of a point light to the rectangle (0, 0, 128, 128)
4. Render the second 'side' to the rectangle (128, 0, 128, 128)
5. - 9. Render the other 'sides' in a similar manner
10. Swap to light map mode
11. Render the light map in a single call using the 'cube-map'-ish depth texture
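For reference, rendering to only a portion of a texture is possible: glViewport restricts rasterization to a sub-rectangle of the attachment, and glScissor confines the clear. A minimal sketch of steps 3-9 under that assumption, where depthFBO is presumed to already have the 384x256 depth texture attached and renderDepthPass() is a hypothetical per-face scene draw:

/* One 384x256 depth texture laid out as a 3x2 grid of 128x128 faces. */
glBindFramebuffer(GL_FRAMEBUFFER, depthFBO);
glEnable(GL_SCISSOR_TEST);
for (int face = 0; face < 6; ++face) {
    int x = (face % 3) * 128;              /* column in the 3x2 grid */
    int y = (face / 3) * 128;              /* row in the 3x2 grid */
    glViewport(x, y, 128, 128);            /* rasterize only into this cell */
    glScissor(x, y, 128, 128);             /* confine the clear to this cell */
    glClear(GL_DEPTH_BUFFER_BIT);
    renderDepthPass(face);                 /* hypothetical: draw the scene for this face */
}
glDisable(GL_SCISSOR_TEST);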
The second method I thought of is to use a 3D texture instead of partial rendering, but I still have a similar question: can I render to only a certain 'layer' of a 3D texture while retaining the other layers?
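For reference, yes: since GL 3.0 you can attach a single layer with glFramebufferTextureLayer and render into just that layer, leaving the others untouched. One caveat: depth formats are not legal for GL_TEXTURE_3D, so for depth maps the layered target would be a GL_TEXTURE_2D_ARRAY. A minimal sketch under those assumptions:

/* Depth formats are not legal for GL_TEXTURE_3D, so use a 2D array texture. */
GLuint depthArray;
glGenTextures(1, &depthArray);
glBindTexture(GL_TEXTURE_2D_ARRAY, depthArray);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_DEPTH_COMPONENT24,
             128, 128, 6, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
/* Attach ONE layer; rendering writes only that layer, the others are retained. */
glBindFramebuffer(GL_FRAMEBUFFER, depthFBO);
glFramebufferTextureLayer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, depthArray, 0, face);
glClear(GL_DEPTH_BUFFER_BIT);
/* ... draw the scene for this face ... */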

Why would combined shadow maps have any advantage?
You still have to render the scene six times.
You could render into a 128x128 FBO six times and then create a 384x256 texture and fill it with the previously rendered textures, but the bandwidth and per-pixel sampling rate remain exactly the same.
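A sketch of that fill step: on GL 4.3 / ARB_copy_image hardware, glCopyImageSubData can do the copy without a draw call; faceTex[i] and atlasTex are hypothetical names here.

/* Copy face i (128x128) into its cell of the 384x256 atlas, no draw call needed. */
glCopyImageSubData(faceTex[i], GL_TEXTURE_2D, 0, 0, 0, 0,
                   atlasTex,   GL_TEXTURE_2D, 0, (i % 3) * 128, (i / 3) * 128, 0,
                   128, 128, 1);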
Rendering one part while preserving the others might be done with the stencil buffer, but in this case I don't see the point of creating a combined shadow map.
Hope this helps. Any feedback would be appreciated.

Related

How to implement density maps with instanced rendering

I came across this amazing blog about rendering grass by Kévin Boulanger.
In his project he has used a certain density map:
The black areas represent places in the 3D world where grass is to be rendered, and the white areas are where the grass is not present.
My question is: I am rendering grass in my scene using OpenGL's instanced rendering feature.
But the grass spans pretty much the whole terrain, and I have not been able to map such a density map to the positions of the grass that I am rendering.
How do I use such density maps with instanced rendering?
What Kevin did was basically read that density map (on the CPU) when generating the grass patch meshes and store the value in the vertices (the same value in all the vertices of one blade). Then, when the patch was rendered, the pixel shader discarded the pixel if this value was less than some threshold. The system is a little different for patches made of billboards: there, the density is stored in a texture instead. So basically you render grass everywhere, but the grass becomes invisible on the road.
It's explained in his thesis, pages 67-71.
But what he describes doesn't work with instancing, because each patch is a different mesh or uses different textures, so you can't draw them all with instancing.
A solution would be to store this density value per grass-blade instance in a uniform buffer instead, so that each grass patch is the same mesh and can be instanced.
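A minimal sketch of that idea; it uses a per-instance vertex attribute (glVertexAttribDivisor) rather than a uniform buffer, which gives the same per-instance lookup. densities[], numBlades, bladeVertexCount, and DENSITY_LOC are hypothetical names, with densities[] filled from the density map on the CPU:

/* One float per blade, read from the density map on the CPU beforehand. */
GLuint densityVBO;
glGenBuffers(1, &densityVBO);
glBindBuffer(GL_ARRAY_BUFFER, densityVBO);
glBufferData(GL_ARRAY_BUFFER, numBlades * sizeof(float), densities, GL_STATIC_DRAW);
glVertexAttribPointer(DENSITY_LOC, 1, GL_FLOAT, GL_FALSE, 0, 0);
glEnableVertexAttribArray(DENSITY_LOC);
glVertexAttribDivisor(DENSITY_LOC, 1);   /* advance once per instance, not per vertex */
glDrawArraysInstanced(GL_TRIANGLES, 0, bladeVertexCount, numBlades);
/* Fragment shader side (GLSL): if (density < threshold) discard; */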
For the patches made of billboards it's more complicated: you would need "one texture" per billboard. You could make a big texture (or a texture array) that contains the equivalent of several billboards and store a texture-coordinate offset per billboard in the constant buffer. Be sure to reuse the same part of the texture for billboards that look identical in this case.

How many depth textures can I bind to a framebuffer?

I am trying to create shadow maps of many objects in a sceneRoom, with their shadows being projected onto the sceneRoom. Until now I've been able to project the shadows of the sceneRoom onto itself, but I want to project the shadows of other objects in the sceneRoom onto the sceneRoom's floor.
Is it possible to create multiple depth textures in one framebuffer, or should I use several framebuffers, each with one depth texture?
There is only one GL_DEPTH_ATTACHMENT point, so you can have at most one attached depth buffer at any time. You have to use some other method.
No, there is only one attachment point (well, technically two if you count GL_DEPTH_STENCIL_ATTACHMENT) for depth in an FBO. You can only attach one thing to the depth, but that does not mean you are limited to a single image.
You can use an array texture to store multiple depth images and then attach this array texture to GL_DEPTH_ATTACHMENT.
However, the only way to draw into an explicit array level of this texture would be to use a Geometry Shader to do layered rendering. Since it sounds like each of the depth images you are interested in is actually a completely different set of geometry, this does not sound like the approach you want: if you used a Geometry Shader this way, you would process the same set of geometry for each layer.
One thing you could consider is actually using a single depth buffer, but packing your shadow maps into an atlas. If each of your shadow maps is 512x512, you could store 4 of them in a single texture with dimensions 1024x1024 and adjust texture coordinates (and viewport when you draw into the atlas) appropriately. The reason you might consider doing this is because changing the render target (FBO state) tends to be the most expensive thing you would do between draw calls in a series of depth-only draws. You might change a few uniforms or vertex pointers, but those are dirt cheap to change.
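The viewport and texture-coordinate adjustment is just a per-map offset and scale; a sketch for the 2x2 layout described above, where i is the shadow-map index (0-3):

/* Shadow map i lives in one quadrant of the 1024x1024 atlas. */
float scale   = 0.5f;                     /* 512 / 1024 */
float offsetU = (float)(i % 2) * 0.5f;
float offsetV = (float)(i / 2) * 0.5f;
/* When drawing INTO the atlas: */
glViewport((i % 2) * 512, (i / 2) * 512, 512, 512);
/* Upload scale/offsetU/offsetV as uniforms for the sampling pass; in GLSL:
   atlasUV = shadowUV * scale + vec2(offsetU, offsetV); */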

Reverse triangle lookup from affected pixels?

Assume I have a 3D triangle mesh and an OpenGL framebuffer to which I can render the mesh.
For each rendered pixel, I need to build a list of triangles that rendered to that pixel, even those that are occluded.
The only way I could think of doing this is to individually render each triangle from the mesh, then go through each pixel in the framebuffer to determine if it was affected by the triangle (using the depth buffer or a user-defined fragment shader output variable). I would then have to clear the framebuffer and do the same for the next triangle.
Is there a more efficient way to do this?
I considered, for each fragment in the fragment shader, writing out a triangle identifier, but GLSL doesn't allow outputting a list of integers.
For each rendered pixel, I need to build a list of triangles that rendered to that pixel, even those that are occluded.
You will not be able to do it for the entire scene; there's no structure that allows you to associate a "list" with every pixel.
You can get the list of primitives that affected a certain area using the selection buffer (see glRenderMode(GL_SELECT)).
You can get scene depth complexity using stencil buffer techniques.
If there are 8 triangles total, then you can get the list of triangles that affected every pixel using the stencil buffer: basically, assign a unique (1 << n) stencil bit to each triangle and OR it into the existing stencil buffer value for every stencil op.
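A sketch of that stencil trick: the stencil write mask restricts REPLACE to a single bit, which effectively ORs triangle n's bit into the buffer; drawTriangle() is a hypothetical per-triangle draw.

glEnable(GL_STENCIL_TEST);
glStencilMask(0xFF);
glClear(GL_STENCIL_BUFFER_BIT);                 /* start with all bits zero */
glStencilFunc(GL_ALWAYS, 0xFF, 0xFF);           /* reference value: all bits set */
glStencilOp(GL_KEEP, GL_REPLACE, GL_REPLACE);   /* write even when the depth test fails */
for (int n = 0; n < 8; ++n) {                   /* 8-bit stencil => at most 8 triangles */
    glStencilMask(1u << n);                     /* REPLACE can only touch triangle n's bit */
    drawTriangle(n);                            /* hypothetical: issue triangle n alone */
}
/* Read back with glReadPixels(..., GL_STENCIL_INDEX, GL_UNSIGNED_BYTE, ...) and decode. */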
But to solve it in the generic case, you'll need your own rasterizer and LOTS of memory to store per-pixel triangle lists. The problem is quite similar to a multi-layered depth buffer, after all.
Is there a more efficient way to do this?
Actually, yes, but it is not hardware accelerated and OpenGL has nothing to do with it. Store all the triangles in an octree. Launch a "ray" through that octree for every pixel you want to test and count the triangles the ray hits. That's a collision-detection problem.

Off-screen multiple render targets using Frame Buffer Object (FBO) or?

Situation: generating N samples of a shape and the corresponding edges (using a Sobel filter or my own), with different transformations and rotations, while the viewport (size = 600*600) and the camera remain constant; i.e. there will be N samples + N corresponding edges.
I am thinking of doing it like this:
Use one FBO with 2 renderbuffers [i.e. the size of each buffer will be (N*600) * 600]: the 1st for the N shapes and the 2nd for the edges of the corresponding shapes.
Questions:
Which is the best way to achieve the above?
Though the viewport size is 600*600 pixels, the shape will only occupy around 50*50 pixels. So is there an efficient way to apply edge detection to the bounding box/AABB region only in the 2nd buffer? And is there an efficient way to read back only the 2N bounding boxes (N samples + N corresponding edges)?
1: I'm not sure what you call "best way". Use Multiple Render Targets: you create the two 600*N textures, attach them both to the FBO, select them with glDrawBuffers, and in your fragment shader do something like this:
layout(location = 0) out vec3 color;
layout(location = 1) out vec3 edges;
When writing to "color" and "edges", you'll effectively write into your textures.
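The FBO side of that, as a minimal sketch; fbo, colorTex, and edgeTex are assumed to have been created already at the target size:

glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colorTex, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, edgeTex, 0);
const GLenum bufs[2] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
glDrawBuffers(2, bufs);   /* out location 0 -> colorTex, out location 1 -> edgeTex */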
2: You shouldn't do this. Compute your bounding boxes on the CPU and project them (i.e. multiply each corner by your ModelViewProjection matrix) to get the bounding boxes in 2D.
By the way: compute your bounding boxes first, so that you won't need 600*600 textures but 50*50 ones...
EDIT: You usually restrict the drawn zone with glViewport. But there is only one viewport, and you need several. You can try the viewport array extension and live on the bleeding edge, or pass the AABBs in a texture, or not worry about that until performance matters...
Oh, and you can't use Sobel just like that... Sobel requires that you can read all the texels around a pixel, which is not the case here since you're currently rendering said texels. Either make a two-pass algorithm without MRTs (first color, then edges) or don't use Sobel and guess your edges in the shader (I don't really see how).
Like Calvin said, you have to first render your object into the first framebuffer and then bind that as a texture (use a texture attachment rather than a renderbuffer) for the second pass to find the edges, as edge detection usually needs access to a pixel's surrounding pixels.
Regarding your second question, you could probably use the stencil buffer. Just draw your shapes in the first pass and let them write a reference value into the stencil buffer. Then do the edge detection (usually by rendering a screen-sized quad with the corresponding fragment shader) and configure the stencil test to only pass where the stencil buffer contains the reference value. This way (assuming early-Z hardware, which is quite common now) the fragment shader will only be executed on the pixels the shape has actually been drawn onto.
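A sketch of that stencil setup, assuming a reference value of 1 and hypothetical drawShapes()/drawFullscreenQuad() helpers:

/* Pass 1: draw the shapes, tagging covered pixels with stencil value 1. */
glEnable(GL_STENCIL_TEST);
glStencilFunc(GL_ALWAYS, 1, 0xFF);
glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
drawShapes();                                   /* hypothetical: first-pass scene draw */
/* Pass 2: screen-sized edge-detection quad, shaded only where stencil == 1. */
glStencilFunc(GL_EQUAL, 1, 0xFF);
glStencilMask(0x00);                            /* keep the stencil contents intact */
drawFullscreenQuad();                           /* hypothetical: runs the edge shader */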

Avoiding glBindTexture() calls?

My game renders lots of cubes, each of which randomly has 1 of 12 textures. I already Z-order the geometry, so I can't just render all the cubes with texture 1, then 2, then 3, etc., because that would defeat the Z ordering. I already keep track of the previous texture, and if they are equal I do not call glBindTexture, but it's still way too many calls. What else can I do?
Thanks
The ultimate and fastest way would be to have an array texture (of normal 2D textures or cubemaps). Then dynamically fetch the texture slice according to an ID stored in each cube's instance data (or per-face data, if you want a different texture per cube face) using the GLSL built-ins gl_InstanceID or gl_PrimitiveID.
With this implementation you would bind your texture array just once.
This would of course require use of the gpu_shader4 and texture_array extensions:
http://developer.download.nvidia.com/opengl/specs/GL_EXT_gpu_shader4.txt
http://developer.download.nvidia.com/opengl/specs/GL_EXT_texture_array.txt
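A sketch of the binding side, using the GL 3.0+ core equivalents of those extensions; cubeTexArray, texW, texH, and pixels[] are hypothetical:

/* Build one 2D array texture holding all 12 cube textures as layers. */
glBindTexture(GL_TEXTURE_2D_ARRAY, cubeTexArray);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8, texW, texH, 12, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);
for (int layer = 0; layer < 12; ++layer)
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0, 0, 0, layer,
                    texW, texH, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixels[layer]);
/* Bind once for the whole frame; in GLSL, sample with
   texture(uTexArray, vec3(uv, float(layerId))) through a sampler2DArray. */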
I have used this mechanism (in D3D10, but the same principle applies) and it worked very well.
I had to map different textures onto sprites (3D points of a constant screen size, 9x9 or 15x15 pixels IIRC), each indicating a different meaning to the user.
Edit:
If you don't feel comfortable with all the shader stuff, I would simply sort the cubes by texture and not Z-order the geometry, then measure the performance gain.
Also, I would try adding a pre-Z pass where you render all your cubes to the depth buffer only, then render the normal scene, and see if it speeds things up (if you are fragment bound, it could help).
You can pack your textures into one texture (an atlas) and offset the texture coordinates accordingly.
glMatrixMode(GL_TEXTURE) will also allow you to perform transformations in texture space (to avoid changing all the texture coords).
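For example, a sketch in fixed-function GL, assuming the desired tile starts at hypothetical coordinates (u0, v0) and covers half the atlas in each direction:

glMatrixMode(GL_TEXTURE);
glLoadIdentity();
glTranslatef(u0, v0, 0.0f);    /* move to the tile's origin in the atlas */
glScalef(0.5f, 0.5f, 1.0f);    /* tile covers half the atlas in each direction */
glMatrixMode(GL_MODELVIEW);    /* restore the usual matrix mode */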
Also from NVIDIA:
Bindless Graphics