Implementing environment mapping and IBL for deferred shading - opengl

Our deferred renderer reached to a point where we need to apply environment maps & IBL to push the quality to a higher level ( As you can see the cubemaps are clearly missing ):
After a couple of hours of research on this topic i still found no solution which makes me really happy.
This is what i found so far
Additional forward pass whichs result is added to the LA-Buffer. This looks like a bad idea somehow, we use deferred shading to avoid multiple passes and then render everything again for cubemapping & IBL. But at least we could use the existing GBuffer-Data ( inc. depth ) to render the objects and the selection of the used cubemap can be easily done on the CPU.
Render a fullscreen plane and perform some crazy selection of the right cubemap in the shader ( what is done on the CPU in the shader ). This seems even more bad than rendering everything again, the glsl shader is going to be huge. Even if i implement tiled-deferred-rendering ( not done yet ), it still seems to be a bad idea.
Also storing the cubemap information directly into the GBuffer of the first pass is not applicable in my renderer. I've used all components of my 3 buffers ( 3 to stay compatible with ES 3.0 ), and already used compression on the color values ( YCoCg ) and the normals ( Spheremap Transform ).
Last but not least the very simple and not really a good solution: Use a single cubemap and apply it on the hole scene. This is not really a option, because this is going to have a huge impact on the quality.
I want to know if another approach exists for environment cubemapping. If not what is the best approach of them. My personal favorite is the second one so far even if this requires rendering the whole scene again ( At least on devices which only support 4 rendertargets ).

After testing different stuff i found out that the best idea is to use the second solution, however the implementation is tough. The best approach is to use compute shaders, however these are not supported on the most mobile devices these days.
So you need to use a single cubemap on mobile devices or get the data into the buffer in the first render pass. If this is not possible you need to render it tiled with some frustum culling for each tile ( to reduce number of operations on each pixel ).

Related

How should you efficiently batch complex meshes?

What is the best way to render complex meshes? I wrote different solutions below and wonder what is your opinion about them.
Let's take an example: how to render the 'Crytek-Sponza' mesh?
PS: I do not use Ubershader but only separate shaders
If you download the mesh on the following link:
http://graphics.cs.williams.edu/data/meshes.xml
and load it in Blender you'll see that the whole mesh is composed by about 400 sub-meshes with their own materials/textures respectively.
A dummy renderer (version 1) will render each of the 400 sub-mesh separately! It means (to simplify the situation) 400 draw calls with for each of them a binding to a material/texture. Very bad for performance. Very slow!
pseudo-code version_1:
foreach mesh in meshList //400 iterations :(!
mesh->BindVBO();
Material material = mesh->GetMaterial();
Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
bsdf->Bind();
bsdf->SetMaterial(material);
bsdf->SetTexture(material->GetTexture()); //Bind texture
mesh->Render();
Now, if we take care of the materials being loaded we can notice that the Sponza is composed in reality of ONLY (if I have a good memory :)) 25 different materials!
So a smarter solution (version 2) should be to gather all the vertex/index data in batches (25 in our example) and not store VBO/IBO into sub-meshes classes but into a new class called Batch.
pseudo-code version_2:
foreach batch in batchList //25 iterations :)!
batch->BindVBO();
Material material = batch->GetMaterial();
Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
bsdf->Bind();
bsdf->SetMaterial(material);
bsdf->SetTexture(material->GetTexture()); //Bind texture
batch->Render();
In this case each VBO contains data that share exactly the same texture/material settings!
It's so much better! Now I think 25 VBO for render the sponza is too much! The problem is the number of Buffer bindings to render the sponza! I think a good solution should be to allocate a new VBO if the first one if 'full' (for example let's assume that the maximum size of a VBO (value defined in the VBO class as attribute) is 4MB or 8MB).
pseudo-code version_3:
foreach vbo in vboList //for example 5 VBOs (depends on the maxVBOSize)
vbo->Bind();
BatchList batchList = vbo->GetBatchList();
foreach batch in batchList
Material material = batch->GetMaterial();
Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
bsdf->Bind();
bsdf->SetMaterial(material);
bsdf->SetTexture(material->GetTexture()); //Bind texture
batch->Render();
In this case each VBO does not contain necessary data that share exactly the same texture/material settings! It depends of the sub-mesh loading order!
So OK, there are less VBO/IBO bindings but not necessary less draw calls! (are you OK by this affirmation ?). But in a general manner I think this version 3 is better than the previous one! What do you think about this ?
Another optimization should be to store all the textures (or group of textures) of the sponza model in array(s) of textures! But if you download the sponza package you will see that all texture has different sizes! So I think they can't be bound together because of their format differences.
But if it's possible, the version 4 of the renderer should use only less texture bindings rather than 25 bindings for the whole mesh! Do you think it's possible ?
So, according to you, what is the best way to render the sponza mesh ? Have you another suggestion ?
You are focused on the wrong things. In two ways.
First, there's no reason you can't stick all of the mesh's vertex data into a single buffer object. Note that this has nothing to do with batching. Remember: batching is about the number of draw calls, not the number of buffers you use. You can render 400 draw calls out of the same buffer.
This "maximum size" that you seem to want to have is a fiction, based on nothing from the real world. You can have it if you want. Just don't expect it to make your code faster.
So when rendering this mesh, there is no reason to be switching buffers at all.
Second, batching is not really about the number of draw calls (in OpenGL). It's really about the cost of the state changes between draw calls.
This video clearly spells out (about 31 minutes in), the relative cost of different state changes. Issuing two draw calls with no state changes between them is cheap (relatively speaking). But different kinds of state changes have different costs.
The cost of changing buffer bindings is quite small (assuming you're using separate vertex formats, so that changing buffers doesn't mean changing vertex formats). The cost of changing programs and even texture bindings is far greater. So even if you had to make multiple buffer objects (which again, you don't have to), that's not going to be the primary bottleneck.
So if performance is your goal, you'd be better off focusing on the expensive state changes, not the cheap ones. Making a single shader that can handle all of the material settings for the entire mesh, so that you only need to change uniforms between them. Use array textures so that you only have one texture binding call. This will turn a texture bind into a uniform setting, which is a much cheaper state change.
There are even fancier things you can do, involving base instance counts and the like. But that's overkill for a trivial example like this.

Sprite Batching: advanced technique

I'm making OpenGL2 based application, which renders over 200 sprites in each iteration. I would like to use less drawcalls, since often I render multiple sprites with same texture. Unfortunately, regular batching technique is not good for me because of Z-Sorting. Draworder of all elements is important, so I can't group them and draw by groups.
I was wondering, is there another batching technique to use in that situation. For example, I could modify shader to work with multiple textures at the same time (sounds like a bad decision though). Share your knowledge.
UPD 09.10.13: I also thought, that atlas textures will reduce drawcalls because of significant material number reducement.
I found that instanced rendering could speed things up A LOT ( tracing 100000 icosahedrons at 2 FPS with normal rendering to over 60 fps with instanced rendering ). There is a good section "Instanced Rendering" in the redbook about that subject. Hope this can be applied to your problem.

shader - CG/GLSL/HLSL how to implement jittering

In computer graphics it's a common technique to apply jittering to sampling positions in order to avoid visible sampling patterns.
What's the proper way to apply jittering to sampl-positions in a fragment shader? One way I could think of would be to feed a noise-texture into the shader, and then depending on the texlvalue of this noise texture alter the sampling-positions of whatever one wants to sample.
Is there a better way of implementing jittering?
The various hardware driver AA schemes usually are jitter-based already -- each vendor has their own favored patterns. And unfortunately the user/player can often mess with the settings and turn things on and off.
This might make you think that there's no "correct" way -- and you'd be right! There are ways, some of which are correct for some tasks. Learn and use what you like.
For surface-texture jittering, you might try something like so:
In photoshop, make a 256x512 rgb image (or 256x511, for some versions of the NVIDIA DDS exporter), fill it ALL with random noise, save as DDS with "use existing mip maps." this will give you roughly-uniform noise regardless of how much the texture is scaled (at least through many useful size ranges).
in your shader, read the noise texture, then apply it to the UVs, and then read your other texture(s), e.g.
float4 noise = tex2D(noiseSampler,OriginalUV);
float2 newUV = OriginalUV + someSmallScale*(noise.xy-0.5);
float4 jitteredColor = tex2D(colorSampler,newUV);
Try other variations as you like. Be aware that strong texture jittering can cause a lot of texture cache misses, which may have a performance cost, and be careful around normal maps, since you are introducing high-frequency components into a signal that's already been filtered.

OpenGL Picking from a large set

I'm trying to, in JOGL, pick from a large set of rendered quads (several thousands). Does anyone have any recommendations?
To give you more detail, I'm plotting a large set of data as billboards with procedurally created textures.
I've seen this post OpenGL GL_SELECT or manual collision detection? and have found it helpful. However it can take my program up to several minutes to complete a rendering of the full set, so I don't think drawing 2x (for color picking) is an option.
I'm currently drawing with calls to glBegin/glVertex.../glEnd. Given that I made the switch to batch rendering on the GPU with vao's and vbo's, do you think I would receive a speedup large enough to facilitate color picking?
If not, given all of the recommendations against using GL_SELECT, do you think it would be worth me using it?
I've investigated multithreaded CPU approaches to picking these quads that completely sidestep OpenGL all together. Do you think a OpenGL-less CPU solution is the way to go?
Sorry for all the questions. My main question remains to be, whats a good way that one can pick from a large set of quads using OpenGL (JOGL)?
The best way to pick from a large number of quad cannot be easily defined. I don't like color picking or similar techniques very much, because they seem to be to impractical for most situations. I never understood why there are so many tutorials that focus on people that are new to OpenGl or even programming focus on picking that is just useless for nearly everything. For exmaple: Try to get a pixel you clicked on in a heightmap: Not possible. Try to locate the exact mesh in a model you clicked on: Impractical.
If you have a large number of quads you will probably need a good spatial partitioning or at least (better also) a scene graph. Ok, you don't need this, but it helps A LOT. Look at some tutorials for scene graphs for further information's, it's a good thing to know if you start with 3D programming, because you get to know a lot of concepts and not only OpenGl code.
So what to do now to start with some picking? Take the inverse of your modelview matrix (iirc with glUnproject(...)) on the position where your mouse cursor is. With the orientation of your camera you can now cast a ray into your spatial structure (or your scene graph that holds a spatial structure). Now check for collisions with your quads. I currently have no link, but if you search for inverse modelview matrix you should find some pages that explain this better and in more detail than it would be practical to do here.
With this raycasting based technique you will be able to find your quad in O(log n), where n is the number of quads you have. With some heuristics based on the exact layout of your application (your question is too generic to be more specific) you can improve this a lot for most cases.
An easy spatial structure for this is for example a quadtree. However you should start with they raycasting first to fully understand this technique.
Never faced such problem, but in my opinion, I think the CPU based picking is the best way to try.
If you have a large set of quads, maybe you can group quads by space to avoid testing all quads. For example, you can group the quads in two boxes and firtly test which box you
I just implemented color picking but glReadPixels is slow here (I've read somehere that it might be bad for asynchron behaviour between GL and CPU).
Another possibility seems to me using transform feedback and a geometry shader that does the scissor test. The GS can then discard all faces that do not contain the mouse position. The transform feedback buffer contains then exactly the information about hovered meshes.
You probably want to write the depth to the transform feedback buffer too, so that you can find the topmost hovered mesh.
This approach works also nice with instancing (additionally write the instance id to the buffer)
I haven't tried it yet but I guess it will be a lot faster then using glReadPixels.
I only found this reference for this approach.
I'm using the solution that I've borrowed from DirectX SDK, there's a nice example how to detect the selected polygon in a vertext buffer object.
The same algorithm works nice with OpenGL.

OpenGL equivalent of GDI's HatchBrush or PatternBrush?

I have a VB6 application (please don't laugh) which does a lot of drawing via BitBlt and the standard VB6 drawing functions. I am running up against performance issues (yes, I do the regular tricks like drawing to memory). So, I decided to investigate other ways of drawing, and have come upon OpenGL.
I've been doing some experimenting, and it seems straightforward to do most of what I want; the application mostly only uses very simple drawing -- relatively large 2D rectangles of solid colors and such -- but I haven't been able to find an equivalent to something like a HatchBrush or PatternBrush.
More specifically, I want to be able to specify a small monochrome pixel pattern, choose a color, and whenever I draw a polygon (or whatever), instead of it being solid, have it automatically tiled with that pattern, not translated or rotated or skewed or stretched, with the "on" bits of the pattern showing up in the specified color, and the "off" bits of the pattern left displaying whatever had been drawn under the area that I am now drawing on.
Obviously I could do all the calculations myself. That is, instead of drawing as a polygon which will somehow automatically be tiled for me, I could calculate all of the lines or pixels or whatever that actually need to be drawn, then draw them as lines or pixels or whatever. But is there an easier way? Like in GDI, where you just say "draw this polygon using this brush"?
I am guessing that "textures" might be able to accomplish what I want, but it's not clear to me (I'm totally new to this and the documentation I've found is not entirely obvious); it seems like textures might skew or translate or stretch the pattern, based upon the vertices of the polygon? Whereas I want the pattern tiled.
Is there a way to do this, or something like it, other than brute force calculation of exactly the pixels/lines/whatever that need to be drawn?
Thanks in advance for any help.
If I understood correctly, you're looking for glPolygonStipple() or glLineStipple().
PolygonStipple is very limited as it allows only 32x32 pattern but it should work like PatternBrush. I have no idea how to implement it in VB though.
First of all, are you sure it's the drawing operation itself that is the bottleneck here? Visual Basic is known for being very slow (Especially if your program is compiled to intermediary VM code - which is the default AFAIRC. Be sure you check the option to compile to native code!), and if it is your code that is the bottleneck, then OpenGL won't help you much - you'll need to rewrite your code in some other language - probably C or C++, but any .NET lang should also do.
OpenGL contains functions that allow you to draw stippled lines and polygons, but you shouldn't use them. They're deprecated for a long time, and got removed from OpenGL in version 3.1 of the spec. And that's for a reason - these functions don't map well to the modern rendering paradigm and are not supported by modern graphics hardware - meaning you will most likely get a slow software fallback if you use them.
The way to go is to use a small texture as a mask, and tile it over the drawn polygons. The texture will get stretched or compressed to match the texture coordinates you specify with the vertices. You have to set the wrapping mode to GL_REPEAT for both texture coordinates, and calculate the right coordinates for each vertex so that the texture appears at its original size, repeated the right amount of times.
You could also use the stencil buffer as you described, but... how would you fill that buffer with the pattern, and do it fast? You would need a texture anyway. Remember that you need to clear the stencil buffer every frame, before you start drawing. Not doing so could cost you a massive performance hit (the exact value of "massive" depending on the graphics hardware and driver version).
It's also possible to achieve the desired effect using a fragment shader, but learning shaders for that would be an overkill for an OpenGL beginner like yourself :-).
Ah, I think I've found it! I can make a stencil across the entire viewport in the shape of the pattern I want (or its mask, I guess), and then enable that stencil when I want to draw with that pattern.
You could just use a texture. Put the pattern in as in image and turn on texture repeating and you are good to go.
Figured this out a a year or two ago.