OpenGL, are unused shadow maps in the shader bad? - opengl

Is it a bad idea to have something like this in your shader code?
uniform sampler2DShadow shadowMaps[8];
uniform int numShadowMaps;
With this one could have an universal shader that works for all cases from 0 to 8 shadow maps. Or is it better for performance to have 8 different shaders with hardcoded shadowmap numbers?

The issue isn't really one of performance so much as taking up so many binding points. Many implementations only allow 16 texture binding points (per shader stage), so you'd be using up half of them just for shadows. Even if they go unused, they're still taking up those resources.
It would be better to use a 2D array texture shadow map sampler (sampler2DArrayShadow). You would still have a uniform to tell you how many array layers are populated with useful data.
Plus, this way you are not hard-coding a limit into your shader. So if you decide that the upper limit of shadow maps is only 6 or should be expanded to 10, you don't have to change your shader code.
The downside of course is that all layers of an array texture must be the same size. So if you need some of your shadow maps to be smaller, you can't do that.

Related

Should Uniform Buffer Objects be used over glUniform when uniform block size is small?

I've recently replaced all usage of glUniform() in my code to instead use Uniform Buffer Objects (UBO) to better organize my object rendering code. However, my performance has sharply plummeted as the number of objects drawn increases (an object in my case is anything with it's own offset into a UBO; only one UBO is shared between everything that uses the same shader program).
I think this may be because my objects actually use very little uniform data, sometimes as little as one vec4, while a UBO requires each offset be in multiples of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT which can be rather large.
I understood UBO to be a strict upgrade to using glUniform(), am I mistaken? Should I have a glUniform() path for these simple objects, or is there a necessary step to using UBOs effectively with many different simple objects that I haven't taken?
My scene is mostly quads, with very few objects using more than 4 verts, 6 indices. Some of these quads have large uniform blocks while most have very simple ones, maybe one or two vec4.
(For now, I've gotten around this problem by embedding this object data into the vertices, though it definitely feels wrong to do it this way.)

Shader storage buffer object slow – alternative?

I'm trying to make an array of vec3 available to a fragment shader. In the targeted application, there could be several hundred elements.
I tested transferring data in the form of a shader storage buffer object, declared as
layout(binding = 0) buffer voxels { vec3 xyz[]; }
and set using glBufferData, but I found that my fragment shader becomes very slow, even with only 33 elements.
Moreover, when I convert the same data into the GLSL code of a const vec3[] and include it in the shader code, the shader becomes noticeably faster.
Is there a better way – faster than an SSBO and more elegant than creating shader code?
As might already be apparent from the above, the array is only read from in the shader. It is constant within the shader as well as over shader invocations for different fragments, so effectively a uniform, and it is set only once or a few times over the runtime of the program.
I'd recommend using std430 layout specifier on the SSBO given that you are using vec3 data types, otherwise you'll be forced to pad the data, which isn't going to be great. In general, if the buffer is a fixed size, then prefer using glBufferSubData instead of glBufferData (the latter may reallocate memory on the GPU).
As yet another alternative, if you are able to target GL 4.4+, consider using glBufferStorage instead (or even better, if GL4.5 is available, use glCreateuffers, and glNamedBufferStorage). This let's you pass a few more hints to the GL driver about the way in which the buffer will be consumed. I'd try out a few options (e.g. mapping v.s. sub-data v.s. recreating each time).

How do I deal with many variables per triangle in OpenGL?

I'm working with OpenGL and am not totally happy with the standard method of passing values PER TRIANGLE (or in my case, quads) that need to make it to the fragment shader, i.e., assign them to each vertex of the primitive and pass them through the vertex shader to presumably be unnecessarily interpolated (unless using the "flat" directive) in the fragment shader (so in other words, non-varying per fragment).
Is there some way to store a value PER triangle (or quad) that needs to be accessed in the fragment shader in such a way that you don't need redundant copies of it per vertex? Is so, is this way better than the likely overhead of 3x (or 4x) the data moving code CPU side?
I am aware of using geometry shaders to spread the values out to new vertices, but I heard geometry shaders are terribly slow on non up to date hardware. Is this the case?
OpenGL fragment language supports the gl_PrimitiveID input variable, which will be the index of the primitive for the currently processed fragment (starting at 0 for each draw call). This can be used as an index into some data store which holds per-primitive data.
Depending on the amount of data that you will need per primitive, and the number of primitives in total, different options are available. For a small number of primitives, you could just set up a uniform array and index into that.
For a reasonably high number of primitives, I would suggest using a texture buffer object (TBO). This is basically an ordinary buffer object, which can be accessed read-only at random locations via the texelFetch GLSL operation. Note that TBOs are not really textures, they only reuse the existing texture object interface. Internally, it is still a data fetch from a buffer object, and it is very efficient with none of the overhead of the texture pipeline.
The only issue with this approach is that you cannot easily mix different data types. You have to define a base data type for your TBO, and every fetch will get you the data in that format. If you just need some floats/vectors per primitive, this is not a problem at all. If you e.g. need some ints and some floats per primitive, you could either use different TBOs, one for each type, or with modern GLSL (>=3.30), you could use an integer type for the TBO and reinterpret the integer bits as floating point with intBitsToFloat(), so you can get around that limitation, too.
You can use one element in the vertex array for rendering multiple vertices. It's called instanced vertex attributes.

Is it possible to loop through a second VBO in the vertex shader?

So, let say that I have two vertex buffers. One that describes the actual shape I want to draw, and the other one is able to influence the first one.
So, what I actually want to be able to do is something like this:
uniform VBO second_one;
void main()
{
for (int i = 0; i < size_of_array(second_one); ++i)
Do things with second_one[i] to alter the values
create the output informations
}
Things I might want to do can be gravity, that that each point in second_one tries to drag a bit the point closer to it and so on and then after the point is adjusted, apply the matrices to have its actual location.
I would be really surprise that it's possible, or something close to it. But the whole point is to be able to use a second VBO, or the make it as a uniform of type vec3 let say so I can access it.
For what you're wanting, you have three options.
An array of uniforms. GLSL lets you do uniform vec3 stuff[50];. And arrays in GLSL have a .length() method, so you can find out how big they are. Of course, there are limits to the number of uniforms you use, but you shouldn't need more than 20-30 of these. Anything more than that and you'll really feel the performance drain.
Uniform buffer objects. These can store a bit more data than non-block uniforms, but they still have limits. And the storage comes from a buffer object. But accesses to them are, depending on hardware, slightly slower than accesses to direct uniforms.
Buffer textures. This is a way to attach a buffer object to a texture. With this, you can access vast amounts of memory from within a shader. But be warned: they're not fast to access. If you can make due with one of the above methods, do so.
Note that #2 and #3 will only be found on hardware capable of supporting GL 3.x and above. So DX10-class hardware.

GLSL per vertex fixed size array

Is it possible in desktop GLSL to pass a fixed size array of floats to the vertex shader as an attribute? If yes, how?
I want to have per vertex weights for character animation so I would like to have something like the following in my vertex shader:
attribute float weights[25];
How would I fill the attribute array from my C++ & OpenGL program? I have seen in another question that I could get the attribute location of the array attribute and then just add the index to that location. Could someone give an example on that for my pretty large array?
Thanks.
Let's start with what you asked for.
On pretty much no hardware that exists currently will attribute float weights[25]; compile. While shaders can have arrays of attributes, each array index represents a new attribute index. And on all hardware the currently exists, the maximum number of attribute indices is... 16. You'd need 25, and that's just for the weights.
Now, you can mitigate this easily enough by remembering that you can use vec4 attributes. Thus, you store every four array elements in a single attribute. Your array would be attribute vec4 weights[7]; which is doable. Your weight-fetching logic will have to change of course.
Even so, you don't seem to be taking in the ramifications of what this would actually mean for your vertex data. Each attribute represents a component of a vertex's data. Each vertex for a rendering call will have the same amount of data; the contents of that data will differ, but not how much data.
In order to do what you're suggesting, every vertex in your mesh would need 25 floats describing the weight. Even if this was stored as normalized unsigned bytes, that's still 25 extra bytes worth of data at a minimum. That's a lot. Especially considering that for the vast majority of vertices, most of these values will be 0. Even in the worst case, you'd be looking at maybe 6-7 bones affecting an single vertex.
The way skinning is generally done in vertex shaders is to limit the number of bones that affects a single vertex to four. This way, you don't use an array of attributes; you just use a vec4 attribute for the weights. Of course, you also now need to say which bone is associated with which weight. So you have a second vec4 attribute that specifies the bone index for that weight.
This strikes a good balance. You only take up 2 extra attributes (which can be unsigned bytes in terms of size). And for the vast majority of vertices, you'll never even notice, because most vertices are only influenced by 1-3 bones. A few uses 4, and fewer still use 5+. In those cases, you just cut off the lowest weights and recompute the weights of the others proportionately.
Nicol Bolas already gave you an answer how to restructure your task. You should do it, because processing 25 floats for a vertex, probably through some quaternion multiplication will waste a lot of good GPU processing power; most of the attributes for a vertex will translate close to an identity transform anyway.
However for academic reasons I'm going to tell you, how to pass 25 floats per vertex. The key is not using attributes for this, but fetching the data from some buffer, a texture. The GLSL vertex shader stage has the builtin variable gl_VertexID, which passes the index of the currently processed vertex. With recent OpenGL you can access textures from the vertex shader as well. So you have a texture of size vertex_count × 25 holding the values. In your vertex shader you can access them using the texelFetch function, i.e. texelFetch(param_buffer, vec2(gl_VertexID, 3));
If used in skeletal animation this system is often referred to as texture skinning. However it should be used sparingly, as it's a real performance hog. But sometimes you can't avoid it, for example when implementing a facial animation system where you have to weight all the vertices to 26 muscles, if you want to accurately simulate a human face.