Shader storage buffer object slow – alternative? - opengl

I'm trying to make an array of vec3 available to a fragment shader. In the targeted application, there could be several hundred elements.
I tested transferring data in the form of a shader storage buffer object, declared as
layout(binding = 0) buffer voxels { vec3 xyz[]; };
and set using glBufferData, but I found that my fragment shader becomes very slow, even with only 33 elements.
Moreover, when I convert the same data into the GLSL code of a const vec3[] and include it in the shader code, the shader becomes noticeably faster.
Is there a better way – faster than an SSBO and more elegant than creating shader code?
As might already be apparent from the above, the array is only read from in the shader. It is constant within the shader as well as over shader invocations for different fragments, so effectively a uniform, and it is set only once or a few times over the runtime of the program.

I'd recommend using the std430 layout qualifier on the SSBO, since you are using vec3 data; with the default std140 layout you'll be forced to pad the data, which isn't going to be great. (Note that even in std430 an array of vec3 still has a 16-byte per-element stride, so padding each element to vec4 size on the CPU side is often the pragmatic choice anyway.) In general, if the buffer is a fixed size, prefer glBufferSubData over glBufferData for updates (the latter may reallocate memory on the GPU).
As yet another alternative, if you are able to target GL 4.4+, consider using glBufferStorage instead (or better yet, if GL 4.5 is available, use glCreateBuffers and glNamedBufferStorage). This lets you pass a few more hints to the GL driver about the way in which the buffer will be consumed. I'd try out a few options (e.g. mapping vs. sub-data vs. recreating the buffer each time).

Related

Should Uniform Buffer Objects be used over glUniform when uniform block size is small?

I've recently replaced all usage of glUniform() in my code to instead use Uniform Buffer Objects (UBO) to better organize my object rendering code. However, my performance has sharply plummeted as the number of objects drawn increases (an object in my case is anything with its own offset into a UBO; only one UBO is shared between everything that uses the same shader program).
I think this may be because my objects actually use very little uniform data, sometimes as little as one vec4, while a UBO requires each offset be in multiples of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT which can be rather large.
I understood UBOs to be a strict upgrade over glUniform(); am I mistaken? Should I have a glUniform() path for these simple objects, or is there a necessary step to using UBOs effectively with many different simple objects that I haven't taken?
My scene is mostly quads, with very few objects using more than 4 verts, 6 indices. Some of these quads have large uniform blocks while most have very simple ones, maybe one or two vec4.
(For now, I've gotten around this problem by embedding this object data into the vertices, though it definitely feels wrong to do it this way.)

Why samplers cannot be part of uniform blocks in OpenGL, and any ways to get around it?

I want to render a scene to texture and share the texture sampler among several programs, similar to sharing the projection-view matrix across multiple programs. But unlike the projection-view matrix, which can be put into a uniform block, "Samplers cannot be part of uniform blocks". https://www.opengl.org/wiki/Uniform_(GLSL)
Some discussion here describes why:
multiple issues:
GLSL samplers embody two things: source of texture data and how to filter from it (nearest, linear, mipmapping, anisotropic, etc)
Samplers are opaque things, and to place them into a uniform buffer object requires that they have a well-defined (and cross-platform) size.
Those issues together make it dicey.
I want to see some explanation of why this cannot be handled and any other ways to achieve sharing texture samplers.
The explanation is right here:
Samplers are opaque things, and to place them into a uniform buffer object requires that they have a well-defined (and cross-platform) size.
The purpose of uniform blocks is that you can set them with a single OpenGL call from a single structure in your (client) program. But for this to work, the memory layout of this structure must be known, so that the memory layout produced by the compiler matches that of the shader uniform block.
But since the memory layout of samplers is not defined, no memory layout can be determined for such a structure. Without a definitive memory layout there can be no structure, and without a structure, no block.
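As for ways to get around it: a common pattern is simply to agree on a fixed texture unit and declare the same sampler in every program that needs the shared texture. With GL 4.2+ the unit can even be fixed in the shader source via a binding layout qualifier, so no per-program glUniform1i call is needed (the name sceneTex and unit 0 below are just illustrative):

```glsl
#version 420 core
// Declare this identically in every program that samples the shared
// render-to-texture result; the application binds that texture to
// texture unit 0 once, and all programs see it.
layout(binding = 0) uniform sampler2D sceneTex;
```

This shares the texture/sampler state itself rather than a sampler value inside a block, which sidesteps the opaque-type problem entirely.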

Shader Storage Block vs Uniform Blocks

I've read now that you can't write to uniform blocks, so shader storage blocks have an advantage over uniform blocks there. Furthermore, the size of a shader storage block (the upper limit) is much higher.
What I don't get is the atomic-operations capability of a shader storage block: when can this come in handy? Is there a real-life example?
Furthermore, when would I prefer one over the other?
I think your question is ill-posed. It sounds like you are trying to figure out the difference between uniform buffers and shader storage buffers. Blocks are simply a way to organize your shader inputs and outputs.
As you noted, the biggest difference between uniform buffers and shader storage buffers is that you can write to shader storage buffers from your shader programs.
Asking why writing to an SSBO is handy is like asking why a variable is handy. Any time you want to accumulate results or share data between render passes, you can use the SSBO as "scratch memory".
In the old days (I believe) you would have had to render to a texture if you wanted to share data, and that would have gone through the entire graphics pipeline with all the cost that entails.
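As a real-life example of the atomics asked about above: a compute-shader histogram, where many invocations may increment the same bin concurrently and atomicAdd keeps the counts correct. A sketch (the binding points and luminance weights are illustrative choices, not requirements):

```glsl
#version 430
layout(local_size_x = 16, local_size_y = 16) in;

layout(binding = 0) uniform sampler2D img;                  // input image
layout(std430, binding = 1) buffer Histogram { uint bins[256]; };

void main() {
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    // Rec. 709 luminance of this texel
    float lum = dot(texelFetch(img, p, 0).rgb,
                    vec3(0.2126, 0.7152, 0.0722));
    uint bin = clamp(uint(lum * 255.0), 0u, 255u);
    // Many invocations may hit the same bin at once; the atomic
    // read-modify-write on the SSBO makes this safe.
    atomicAdd(bins[bin], 1u);
}
```

A uniform block could not express this at all, since uniform blocks are read-only from the shader's point of view.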
More about shader storage buffer objects here:
https://www.opengl.org/wiki/Shader_Storage_Buffer_Object
To really make sure you understand the difference, look up the various ways to supply shaders with data in chronological order:
Textures & Frame Buffer Objects
Uniforms
Uniform Buffer Objects
Texture Buffer Objects
Textures with Image Load/Store
Shader Storage Buffer Objects
This answer on SO has a nice overview of almost all of them:
Passing a list of values to fragment shader

How do I deal with many variables per triangle in OpenGL?

I'm working with OpenGL and am not totally happy with the standard method of passing values PER TRIANGLE (or, in my case, per quad) that need to reach the fragment shader: assigning them to each vertex of the primitive and passing them through the vertex shader, where they are presumably interpolated unnecessarily (unless the "flat" qualifier is used), even though they do not vary per fragment.
Is there some way to store a value PER triangle (or quad) that needs to be accessed in the fragment shader, without redundant copies of it per vertex? If so, is this way better than the likely overhead of moving 3x (or 4x) the data on the CPU side?
I am aware of using geometry shaders to spread the values out to new vertices, but I have heard that geometry shaders are terribly slow on older hardware. Is this the case?
The OpenGL fragment language supports the gl_PrimitiveID input variable, which holds the index of the primitive for the currently processed fragment (starting at 0 for each draw call). This can be used as an index into some data store that holds per-primitive data.
Depending on the amount of data that you will need per primitive, and the number of primitives in total, different options are available. For a small number of primitives, you could just set up a uniform array and index into that.
For a reasonably high number of primitives, I would suggest using a texture buffer object (TBO). This is basically an ordinary buffer object, which can be accessed read-only at random locations via the texelFetch GLSL operation. Note that TBOs are not really textures, they only reuse the existing texture object interface. Internally, it is still a data fetch from a buffer object, and it is very efficient with none of the overhead of the texture pipeline.
The only issue with this approach is that you cannot easily mix different data types. You have to define a base data type for your TBO, and every fetch will get you the data in that format. If you just need some floats/vectors per primitive, this is not a problem at all. If you e.g. need some ints and some floats per primitive, you could either use different TBOs, one for each type, or with modern GLSL (>=3.30), you could use an integer type for the TBO and reinterpret the integer bits as floating point with intBitsToFloat(), so you can get around that limitation, too.
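A sketch of the TBO approach from the fragment shader's side (the uniform name and the RGBA32F element format are assumptions; the application creates the buffer texture with glTexBuffer and binds it to a texture unit):

```glsl
#version 330 core
// One vec4 of per-primitive data per primitive, stored in a buffer
// texture (TBO). texelFetch does a raw indexed read; no filtering,
// no sampling state involved.
uniform samplerBuffer perPrimitiveData;
out vec4 fragColor;

void main() {
    // gl_PrimitiveID restarts at 0 for every draw call, so the TBO
    // contents just need to follow the draw order of the primitives.
    vec4 data = texelFetch(perPrimitiveData, gl_PrimitiveID);
    fragColor = data;
}
```

Nothing per-vertex is duplicated here; the only per-primitive cost is one buffer slot and one fetch.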
You can use one element in the vertex array for rendering multiple vertices. It's called instanced vertex attributes.

Is it possible to loop through a second VBO in the vertex shader?

So, let's say that I have two vertex buffers: one that describes the actual shape I want to draw, and another one that is able to influence the first one.
So, what I actually want to be able to do is something like this:
uniform VBO second_one;   // pseudocode: "VBO" is not a real GLSL type

void main()
{
    for (int i = 0; i < size_of_array(second_one); ++i)
    {
        // do things with second_one[i] to alter the values
    }
    // create the output information
}
Things I might want to do include gravity, so that each point in second_one drags the vertex a bit closer to it, and so on; then, after the point is adjusted, the matrices are applied to get its actual location.
I would be really surprised if this, or something close to it, were possible. But the whole point is to be able to use a second VBO, or to make it a uniform of type vec3[], say, so I can access it.
For what you're wanting, you have three options.
An array of uniforms. GLSL lets you do uniform vec3 stuff[50];. And arrays in GLSL have a .length() method, so you can find out how big they are. Of course, there are limits to the number of uniforms you can use, but you shouldn't need more than 20-30 of these. Anything more than that and you'll really feel the performance drain.
Uniform buffer objects. These can store a bit more data than non-block uniforms, but they still have limits. And the storage comes from a buffer object. But accesses to them are, depending on hardware, slightly slower than accesses to direct uniforms.
Buffer textures. This is a way to attach a buffer object to a texture. With this, you can access vast amounts of memory from within a shader. But be warned: they're not fast to access. If you can make do with one of the above methods, do so.
Note that #2 and #3 will only be found on hardware capable of supporting GL 3.x and above. So DX10-class hardware.
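For the asker's gravity idea, option 1 might look like this in the vertex shader (attractors, attractorCount, and mvp are hypothetical names; the application fills them with glUniform3fv / glUniform1i / glUniformMatrix4fv before drawing):

```glsl
#version 330 core
// Option 1: a plain uniform array standing in for the "second VBO".
uniform vec3 attractors[30];  // well under typical uniform limits
uniform int attractorCount;   // how many entries are actually in use
uniform mat4 mvp;
in vec3 position;

void main() {
    vec3 p = position;
    // toy "gravity": nudge the vertex toward each attractor
    for (int i = 0; i < attractorCount; ++i)
        p += 0.01 * normalize(attractors[i] - p);
    // only after adjustment do we apply the matrices
    gl_Position = mvp * vec4(p, 1.0);
}
```

If the array outgrows the uniform limits, the same loop body works unchanged with the data moved into a uniform block or fetched from a buffer texture.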