GLSL : uniform buffer object example - c++

I have an array of GLubyte of variable size. I want to pass it to fragment shader. I have seen
This thread and this thread. So I decided to use "Uniform Buffer Objects". But being a newbie in GLSL, I do not know:
1 - If I am going to add this to fragment shader, how do I pass size? Should I create a struct?
layout(std140) uniform MyArray
{
GLubyte myDataArray[size]; //I know GLSL doesn't understand GLubyte
};
2- how and where in C++ code associate this buffer object ?
3 - how to deal with casting GLubyte to float?

1 - If I am going to add this to fragment shader, how do I pass size? Should I create a struct?
Using Uniform Buffers (UB), you cannot do this.
size must be static and known when you link your GLSL program. This means it has to be hard-coded into the actual shader.
The modern way around this is to use a feature from GL4 called Shader Storage Buffers (SSB).
SSBs can have variable length (the last field can be declared as an unsized array, like myDataArray[]) and they can also store much more data than UBs.
In older versions of GL, you can use a Buffer Texture to pass large amounts of dynamically sized data into a shader, but that is a cheap hack compared to SSBs and you cannot access the data using a nice struct-like interface either.
3 - how to deal with casting GLubyte to float?
You really would not do this at all, it is considerably more complicated.
The smallest data type you can use in a GLSL data structure is 32-bit. You can pack and unpack smaller pieces of data into a uint if need though using special functions like packUnorm4x8 (...). That was done intentionally, to avoid having to define new data types with smaller sizes.
You can do that even without using any special GLSL functions.
packUnorm4x8 (...) is roughly equivalent to performing the following:
for (int i = 0; i < 4; i++)
packed += round (clamp (vec [i], 0, 1) * 255.0) * pow (2, i * 8);
It takes a 4-component vector of floating-point values in the range [0,1] and does fixed-point arithmetic to pack each of them into an unsigned normalized (unorm) 8-bit integer occupying its own 1/4 of a uint.
Newer versions of GLSL introduce intrinsic functions that do that, but GPUs have actually been doing that sort of thing for as long as shaders have been around. Anytime you read/write a GL_RGBA8 texture from a shader you are basically packing or unpacking 4 8-bit unorms represented by a 32-bit integer.

Related

Vulkan: weird performance of uniform buffer

One of the inputs of my fragment shader is an array of 5 structures. The shader computes a color based on each of the 5 structures. In the end, these 5 colors are summed together to produce the final output. The total size of the array is 1440 bytes. To accommodate the alignment of the uniform buffer, the size of the uniform buffer changes to 1920 bytes.
1- If I define the array of 5 structures as a uniform buffer array, the rendering takes 5ms (measured by Nsight Graphics). The uniform buffer's memory property is 'VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT'. The uniform buffer in glsl is defined as follows
layout(set=0,binding=0) uniform UniformStruct { A a; } us[];
layout(location=0) out vec4 c;
void main()
{
vec4 col = vec4(0);
for (int i = 0; i < 5; i++)
col += func(us[nonuniformEXT(i)]);
c = col;
}
Besides, I'm using 'GL_EXT_nonuniform_qualifier' extension to access the uniform buffer array. This seems the most straightforward way for me but there are alternative implementations.
2- I can split the rendering from one vkCmdDraw to five vkCmdDraw, change the framebuffer's blend mode from overwriting to addition and define a uniform buffer instead of a uniform buffer array in the fragment shader. On the CPU side, I change the descriptor type from UNIFORM_BUFFER to UNIFORM_BUFFER_DYNAMICS. Before each vkCmdDraw, I bind the dynamic uniform buffer and the corresponding offsets. In the fragment shader, the for loop is removed. Although it seems that it should be slower than the first method, it is surprisingly much faster than the first method. The rendering only takes 2ms total for 5 draws.
3- If I define the array of 5 structures as a storage buffer and do one vkCmdDraw, the rendering takes only 1.4ms. In other words, if I change the array from the uniform buffer array to storage buffer but keep anything else the same as 1, it becomes faster.
4- If I define the array of 5 structures as a global constant in the glsl and do one vkCmdDraw, the rendering takes only 0.5ms.
In my opinion, 4 should be the fastest way, which is true in the test. Then 1 should be the next. Both 2 and 3 should be slower than 1. However, Neither 2 or 3 is slower than 1. In contrast, they are much faster than 1. Any ideas why using uniform buffer array slows down the rendering? Is it because it is a host visible buffer?
When it comes to UBOs, there are two kinds of hardware: the kind where UBOs are specialized hardware and the kind where they aren't. For GPUs where UBOs are not specialized hardware, a UBO is really just a readonly SSBO. You can usually tell the difference because hardware where UBOs are specialized will have different size limits on them from those of SSBOs.
For specialized hardware-based UBOs (which NVIDIA still uses, if I recall correctly), each UBO represents an upload from memory into a big block of constant data that all invocations of a particular shader stage can access.
For this kind of hardware, an array of UBOs is basically creating an array out of segments of this block of constant data. And some hardware has multiple blocks of constant data, so indexing then with non-constant expressions is tricky. This is why non-constant access to such indices is an optional feature of Vulkan.
By contrast, a UBO which contains an array is just one big UBO. It's special only in how big it is. Indexing through an array within a UBO is no different from indexing any array. There are no special rules with regard to the uniformity of the index of such accesses.
So stop using an array of UBOs and just use a single UBO which contains an array of data:
layout(set=0,binding=0) uniform UniformStruct { A a[5]; } us;
It'll also avoid additional padding due to alignment, additional descriptors, additional buffers, etc.
However, you might also speed things up by not lying to Vulkan. The expression nonuniformEXT(i) states that the expression i is not dynamically uniform. This is incorrect. Every shader invocation that executes this loop will generate i expressions that have values from 0 to 4. Every dynamic instance of the expression i for any invocation will have the same value at that place in the code as every other.
Therefore i is dynamically uniform, so telling Vulkan that it isn't is not helpful.

(unsigned) byte in GLSL

Some of my vertex attributes are single unsigned bytes, I need them in my GLSL fragment shader, not for any "real" calculations, but for comparing them (like enums if you will). I didnt find any unsigned byte or even byte data type in GLSL, so is there a way as using it as an input? If not (which at the moment it seems to be) what is the purpose of GL_UNSIGNED_BYTE?
GLSL doesn't deal in sized types (well, not sized types smaller than 32-bits). It only has signed/unsigned integers, floats, doubles, booleans, and vectors/matrices of them. If you pass an unsigned byte as an integer vertex attribute to a vertex shader, then it can read it as a uint type, which is 32-bits in size. Passing integral attributes requires the use of glVertexAttribIPointer/IFormat (note the "I").
The vertex shader can then pass this value to the fragment shader as a uint type (but only with the flat interpolation qualifier). Of course, every fragment for a triangle will get the same value.

How do I deal with many variables per triangle in OpenGL?

I'm working with OpenGL and am not totally happy with the standard method of passing values PER TRIANGLE (or in my case, quads) that need to make it to the fragment shader, i.e., assign them to each vertex of the primitive and pass them through the vertex shader to presumably be unnecessarily interpolated (unless using the "flat" directive) in the fragment shader (so in other words, non-varying per fragment).
Is there some way to store a value PER triangle (or quad) that needs to be accessed in the fragment shader in such a way that you don't need redundant copies of it per vertex? Is so, is this way better than the likely overhead of 3x (or 4x) the data moving code CPU side?
I am aware of using geometry shaders to spread the values out to new vertices, but I heard geometry shaders are terribly slow on non up to date hardware. Is this the case?
OpenGL fragment language supports the gl_PrimitiveID input variable, which will be the index of the primitive for the currently processed fragment (starting at 0 for each draw call). This can be used as an index into some data store which holds per-primitive data.
Depending on the amount of data that you will need per primitive, and the number of primitives in total, different options are available. For a small number of primitives, you could just set up a uniform array and index into that.
For a reasonably high number of primitives, I would suggest using a texture buffer object (TBO). This is basically an ordinary buffer object, which can be accessed read-only at random locations via the texelFetch GLSL operation. Note that TBOs are not really textures, they only reuse the existing texture object interface. Internally, it is still a data fetch from a buffer object, and it is very efficient with none of the overhead of the texture pipeline.
The only issue with this approach is that you cannot easily mix different data types. You have to define a base data type for your TBO, and every fetch will get you the data in that format. If you just need some floats/vectors per primitive, this is not a problem at all. If you e.g. need some ints and some floats per primitive, you could either use different TBOs, one for each type, or with modern GLSL (>=3.30), you could use an integer type for the TBO and reinterpret the integer bits as floating point with intBitsToFloat(), so you can get around that limitation, too.
You can use one element in the vertex array for rendering multiple vertices. It's called instanced vertex attributes.

(OpenGL 3.1 - 4.2) Dynamic Uniform Arrays?

Lets say I have 2 species such as humans and ponies. They have different skeletal systems so the uniform bone array will have to be different for each species. Do I have to implement two separate shader programs able to render each bone array properly or is there a way to dynamically declare uniform arrays and iterate through that dynamic array instead?
Keeping in mind performance (There's all of the shaders suck at decision branching going around).
Until OpenGL 4.3, arrays in GLSL had to be of a fixed, compile-time size. 4.3 allows the use of shader storage buffer objects, which allow for their ultimate length to be "unbounded". Basically, you can do this:
buffer BlockName
{
mat4 manyManyMatrices[];
};
OpenGL will figure out how many matrices are in this array at runtime based on how you use glBindBufferRange. So you can still use manyManyMatrices.length() to get the length, but it won't be a compile-time constant.
However, this feature is (at the time of this edit) very new and only implemented in beta. It also requires GL 4.x-class hardware (aka: Direct3D 11-class hardware). Lastly, since it uses shader storage blocks, accessing the data may be slower than one might hope for.
As such, I would suggest that you just use a uniform block with the largest number of matrices that you would use. If that becomes a memory issue (unlikely), then you can split your shaders based on array size or use shader storage blocks or whatever.
You can use n-by-1-Textures as a replacement for arrays. Texture size can be specified at run-time. I use this approach for passing an arbitrary number of lights to my shaders. I'm surprised how fast it runs despite the many loops and branches. For an example see the polygon.f shader file in the jogl3.glsl.nontransp in the jReality sources.
uniform sampler2D sys_globalLights;
uniform int sys_numGlobalDirLights;
uniform int sys_numGlobalPointLights;
uniform int sys_numGlobalSpotLights;
...
int lightTexSize = sys_numGlobalDirLights*3+sys_numGlobalPointLights*3+sys_numGlobalSpotLights*5;
for(int i = 0; i < numDir; i++){
vec4 dir = texture(sys_globalLights, vec2((3*i+1+0.5)/lightTexSize, 0));
...

Dynamic number of uniform blocks

Running openGL 3.1, the question is simple.
From GLSL site, here is how one can define array of uniform buffer blocks:
uniform BlockName
{
vec3 blockMember1, blockMember2;
float blockMember3;
} multiBlocks[3];
Now, is it possible to have dynamic number of these multiBlocks? There are no pointers in GLSL so no "new" statement etc.
If not, is there other approach to send dynamic number of elements?
My block is currently packing four floats and one vec2.
I haven't wrote shader yet so you can suggest anything, thanks ;)
You can't have a dynamic number of them, and you can't have a dynamic index into them. That means that even if you could change the count dynamically, it would be of little use since you'd still have to change the shader code to access the new elements.
One possible alternative would be to make the block members arrays:
#define BLOCK_COUNT %d
uniform BlockName
{
vec3 blockMember1[BLOCK_COUNT];
vec3 blockMember2[BLOCK_COUNT];
float blockMember3[BLOCK_COUNT];
}
multiBlocks;
Then you can alter BLOCK_COUNT to change the number of members, and you can use dynamic indexes just fine:
multiBlocks.blockMember2[i];
It still doesn't allow you to alter the number of elements without recompiling the shader, though.
Ok so i wrote also to openGL forum and this came out
So basicaly you have 3 solutions:
Uniform buffer objects or Texture buffers or static array with some high number of prepared elements and use another uniform for specifying actual size.
The last one could be upgraded with OneSadCookie's definition of max size in compile time.