Correct struct layout in GLSL bindless texture handles - c++

I've been trying to use the following code to do a global list of bindless texture handles, sent to the GPU using a UBO.
struct Material
{
sampler2D diff;
sampler2D spec;
sampler2D norm;
};
layout(std140, binding = 2) uniform Materials
{
Material materials[64];
};
However, I think I am filling in the buffer wrong in C++ and not taking the correct offsets into account. I can't seem to find anything on how the std140 layout handles sampler2D. How should I be doing this? What offsets do I need to take into account?

There's nothing special about handles in this regard. The standard says:
If the member is a scalar consuming N basic machine units, the base alignment is N.
Samplers are effectively 64-bit integers as far as being "scalars" is concerned, so the base alignment of those members is 8 bytes. But that's not really relevant, because in std140 the alignment of a struct is always rounded up to that of a vec4 (16 bytes). The three handles occupy 24 bytes, so that struct will take up 32 bytes.
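As a minimal sketch of the C++ side, assuming the bindless handles are uploaded as the 64-bit integers returned by glGetTextureHandleARB, the buffer contents could mirror the block like this. The names MaterialStd140 and MaterialsBlock are made up for illustration and are not part of the question:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical C++ mirror of the GLSL `Material` struct under std140.
// Each sampler2D handle is stored as a 64-bit integer (8 bytes, 8-byte aligned).
struct alignas(16) MaterialStd140 {
    std::uint64_t diff;  // offset 0
    std::uint64_t spec;  // offset 8
    std::uint64_t norm;  // offset 16
    std::uint64_t pad;   // offset 24: explicit padding up to a multiple of 16
};

// Mirror of the `Materials` uniform block: 64 elements with a 32-byte stride.
struct MaterialsBlock {
    MaterialStd140 materials[64];
};

static_assert(sizeof(MaterialStd140) == 32, "std140 rounds the struct up to 32 bytes");
static_assert(offsetof(MaterialStd140, spec) == 8, "second handle starts at byte 8");
static_assert(offsetof(MaterialStd140, norm) == 16, "third handle starts at byte 16");
static_assert(sizeof(MaterialsBlock) == 64 * 32, "array stride equals the struct size");
```

The static_asserts make the compiler check the offsets, so a mismatch with the shader-side layout shows up at build time rather than as garbage textures.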

Related

What use has the layout specifier scalar in EXT_scalar_block_layout?

Question
What is the use of the scalar layout specifier when accessing a storage buffer with GL_EXT_scalar_block_layout? (see below for an example)
What would be a use case for scalar?
Background
I recently programmed a simple raytracer using Vulkan and NVIDIA's VkRayTracing extension and was following this tutorial. In the section about the closest hit shader, it is required to access some data that's stored in, well, storage buffers (with usage flags vk::BufferUsageFlagBits::eStorageBuffer).
In the shader the extension GL_EXT_scalar_block_layout is used and those buffers are accessed like this:
layout(binding = 4, set = 1, scalar) buffer Vertices { Vertex v[]; } vertices[];
When I first used this code, the validation layers told me that structs like Vertex had an invalid layout, so I changed them to have each member aligned on 16-byte blocks:
struct Vertex {
vec4 position;
vec4 normal;
vec4 texCoord;
};
with the corresponding struct in C++:
#pragma pack(push, 1)
struct Vertex {
glm::vec4 position_1unused;
glm::vec4 normal_1unused;
glm::vec4 texCoord_2unused;
};
#pragma pack(pop)
Errors disappeared and I got a working raytracer. But I still don't understand why the scalar keyword is used here. I found this document talking about the GL_EXT_scalar_block_layout extension, but I really don't understand it. Probably I'm just not used to GLSL terminology? I can't see any reason why I would have to use this.
Also, I just tried to remove the scalar qualifier and it still worked without any difference, warnings or errors whatsoever. I would be grateful for any clarification or further resources on this topic.
The std140 and std430 layouts do quite a bit of rounding of the offsets, alignments, and sizes of objects. std140 rounds the alignment of every array element and struct up to that of a vec4 (16 bytes), and aligns a vec3 like a vec4. std430 drops the rounding for arrays and structs, but it still aligns a vec3 like a vec4.
scalar layout basically means to lay out the objects in accord with their component scalars. Anything that aggregates components (vectors, matrices, arrays, and structs) does not affect layout. In particular:
All types are sized/aligned only to the highest alignment of the scalar components that they actually use. So a struct containing a single uint is sized/aligned to the same size/alignment as a uint: 4 bytes. Under std140 rules, it would have 16-byte size and alignment.
Note that this layout makes vec3 and similar types actually viable, because C and C++ would then be capable of creating alignment rules that map to those of GLSL.
The array stride of elements in the array is based solely on the size/alignment of the element type, recursively. So an array of uint has an array stride of 4 bytes; under std140 rules, it would have a 16-byte stride.
Alignment and padding only matter for scalars. If you have a struct containing a uint followed by a uvec2, in std140/430, this will require 16 bytes, with 4 bytes of padding after the first uint. Under scalar layout, such a struct only takes 12 bytes (and is aligned to 4 bytes), with the uvec2 being conceptually misaligned. Padding therefore only exists if you have smaller scalars, like a uint16 followed by a uint.
In the specific case you showed, scalar layout was unnecessary since all of the types you used are vec4s.
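To make the size differences concrete, here is a rough C++ sketch of the uint-plus-uvec2 struct from the list above under both rule sets. The type names are invented for illustration, and the asserted sizes assume a common platform where uint32_t and float are 4 bytes with 4-byte alignment:

```cpp
#include <cstddef>
#include <cstdint>

// A uvec2 as std140/std430 see it: two uints aligned to 8 bytes.
struct UVec2Aligned {
    alignas(8) std::uint32_t x;
    std::uint32_t y;
};

// struct { uint a; uvec2 b; } under std430: 4 bytes of padding follow `a`.
struct PairStd430 {
    std::uint32_t a;  // offset 0
    UVec2Aligned b;   // offset 8
};

// The same struct under scalar layout: only the 4-byte scalar alignment matters.
struct PairScalar {
    std::uint32_t a;     // offset 0
    std::uint32_t b[2];  // offset 4, "misaligned" by std430 standards
};

// A vec3 under scalar layout maps directly onto three packed floats.
struct Vec3Scalar {
    float x, y, z;
};

static_assert(sizeof(PairStd430) == 16, "std430: 16 bytes including padding");
static_assert(offsetof(PairStd430, b) == 8, "std430: uvec2 starts at byte 8");
static_assert(sizeof(PairScalar) == 12, "scalar: 12 bytes, no padding");
static_assert(alignof(PairScalar) == 4, "scalar: aligned like a uint");
static_assert(sizeof(Vec3Scalar[4]) == 48, "scalar: vec3 array stride is 12 bytes");
```

With scalar layout, plain C++ structs such as PairScalar and Vec3Scalar can be copied into the buffer as they are, which is the main practical benefit.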

GLSL array of struct members locations

For example, I have code like this:
uniform struct MyStruct {
mat4 model;
mat4 view;
mat4 projection;
float f1;
vec4 v1;
}, myStructs[4];
Can I be sure that the location of myStructs[1].projection is the location of myStructs[0].projection + 5?
I didn't find exact information about this on khronos.org, but I found this vague statement:
struct Thingy
{
vec4 an_array[3];
int foo;
};
layout(location = 2) uniform Thingy some_thingies[6];
Each Thingy takes up 4 uniform locations; the first three going to
an_array and the fourth going to foo. Thus, some_thingies takes up 24
uniform locations.
It isn't clear from this whether the locations follow one after another. Is this stated more precisely somewhere?
Unless you explicitly specify the location of the uniform variable, the locations of arrays of non-basic types are not strictly defined, relative to the location of any particular member of that array. So you must query the location of every member of every array/struct that you use.
Or just explicitly specify the location with layout(location). That's a much easier option; those are explicitly required to allocate their locations sequentially. And for bonus points, you don't have to query anything.
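As a small sketch of what sequential allocation means in practice, the arithmetic for the quoted Thingy example can be written out as follows. The helper explicitLocation is hypothetical, and the numbers only hold when the array is declared with an explicit layout(location):

```cpp
// Hypothetical helper: location of one member inside one element of an array of
// structs that was declared with an explicit layout(location = baseLocation).
constexpr int explicitLocation(int baseLocation, int locationsPerElement,
                               int elementIndex, int memberOffset) {
    return baseLocation + elementIndex * locationsPerElement + memberOffset;
}

// Thingy uses 4 locations: an_array[0..2] take offsets 0..2, foo takes offset 3.
constexpr int kThingyLocations = 4;

static_assert(explicitLocation(2, kThingyLocations, 0, 0) == 2,
              "some_thingies[0].an_array[0]");
static_assert(explicitLocation(2, kThingyLocations, 1, 0) == 6,
              "some_thingies[1].an_array[0]");
static_assert(explicitLocation(2, kThingyLocations, 5, 3) == 25,
              "some_thingies[5].foo is the last of the 24 locations");
```

The same arithmetic would give the "+ 5" step for MyStruct, but only if myStructs were given an explicit location as well.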
Your first example uniform struct MyStruct is a raw uniform whose member locations are arbitrary and must be queried:
Uniform locations are unique to a specific program. If you do not explicitly assign a uniform to a location (via the OpenGL 4.3 or ARB_explicit_uniform_location feature mentioned above), then OpenGL will assign them arbitrarily.
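Your second example, layout(location = 2) uniform Thingy some_thingies[6];, is still a plain uniform rather than a uniform block, but its explicit location makes the member locations sequential. If you move the data into a uniform block instead, the following memory layout rules apply: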
Memory layout:
The specific size of basic types used by members of buffer-backed blocks is defined by OpenGL. However, implementations are allowed some latitude when assigning padding between members, as well as reasonable freedom to optimize away unused members. How much freedom implementations are allowed for specific blocks can be changed.
There are four memory layout qualifiers: shared, packed, std140, and std430. Defaults can be set the same as for matrix ordering (eg: layout(packed) buffer; sets all shader storage buffer blocks to use packed). The default is shared.
So the members of a block are sequential in memory, but as t.niese points out, only std140 and std430 give you guaranteed offsets (note that std430 can only be used with shader storage blocks, not uniform blocks). Since the default layout is shared, members might be padded differently depending on your driver; with packed, unused members might also be optimised out.
Use glGetUniformLocation to query each location of the members separately:
Uniform variables that are structures or arrays of structures may be queried by calling glGetUniformLocation for each field within the structure. The array element operator "[]" and the structure field operator "." may be used in name in order to select elements within an array or fields within a structure. The result of using these operators is not allowed to be another structure, an array of structures, or a subcomponent of a vector or a matrix. Except if the last part of name indicates a uniform variable array, the location of the first element of an array can be retrieved by using the name of the array, or by using the name appended by "[0]".
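A minimal sketch of building those query names in C++ follows. The helper uniformMemberName is an invented name, and the actual glGetUniformLocation call is shown only in a comment because it needs a live GL context:

```cpp
#include <string>

// Hypothetical helper: builds the name string to pass to glGetUniformLocation
// for one field of one element of an array of structs.
std::string uniformMemberName(const std::string& arrayName, int index,
                              const std::string& field) {
    return arrayName + "[" + std::to_string(index) + "]." + field;
}

// Usage with a real program object (not compiled here):
//   GLint loc = glGetUniformLocation(
//       program, uniformMemberName("myStructs", 1, "projection").c_str());
```

Querying every member once after linking and caching the results avoids any assumption about how the driver numbers the locations.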

Questions about uniform buffer objects

Is it guaranteed that if a uniform block is declared the same in multiple shader programs, say
uniform Matrices
{
mat4 ProjectionMatrix;
mat4 CameraMatrix;
mat4 ModelMatrix;
};
Will it have the same block index returned by glGetUniformBlockIndex(program, "Matrices")?
If the answer is yes, then I'm able to query the index of the block once and use it for all the shader programs that contain that block, right?
Second question: will ProjectionMatrix, CameraMatrix, ModelMatrix, always have the same layout order in memory, respectively? I'm asking this because the tutorial I read uses the next functions
// Query for the offsets of each block variable
const GLchar *names[] = { "InnerColor", "OuterColor",
"RadiusInner", "RadiusOuter" };
GLuint indices[4];
glGetUniformIndices(programHandle, 4, names, indices);
GLint offset[4];
glGetActiveUniformsiv(programHandle, 4, indices,
GL_UNIFORM_OFFSET, offset);
And I'm not sure if that's really needed, as long as I know the uniforms order inside the uniform block..?
will ProjectionMatrix, CameraMatrix, ModelMatrix, always have the same layout order in memory, respectively?
No. Here's what the standard states (emphasis mine):
If pname is UNIFORM_BLOCK_DATA_SIZE, then the implementation-dependent minimum total buffer object size, in basic machine units, required to hold all active uniforms in the uniform block identified by uniformBlockIndex is returned. It is neither guaranteed nor expected that a given implementation will arrange uniform values as tightly packed in a buffer object. The exception to this is the std140 uniform block layout, which guarantees specific packing behavior and does not require the application to query for offsets and strides.
I'm not sure if that's really needed, as long as I know the uniforms order inside the uniform block..?
So, yes, the author is right in not assuming the layout is contiguous and does what's sensible (guaranteed to work always in all implementations): gets the uniform indices and assigns their values respectively.
Specifying layout(std140) will do the trick then, right?
Yes, you can avoid querying the location and uploading data every time by making use of both uniform buffer objects and std140. However, make sure you understand its alignment requirements. This information is detailed in ARB_uniform_buffer_object's specification. For an elaborate treatment with examples see OpenTK's article on Uniform Buffer Objects (UBO) using the std140 layout specification.
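To illustrate how std140 offsets can be derived without querying, here is a simplified sketch that walks a list of members and aligns each one. It only covers scalars, vectors, and mat4; arrays and nested structs need extra rules from the specification. All names (Std140Member, std140Offsets, the k... constants) are made up for this example:

```cpp
#include <cstddef>
#include <vector>

// One block member described by its std140 base alignment and size in bytes.
struct Std140Member {
    std::size_t alignment;
    std::size_t size;
};

// Rounds `offset` up to the next multiple of `alignment`.
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) / alignment * alignment;
}

// Returns the byte offset of every member, in declaration order.
std::vector<std::size_t> std140Offsets(const std::vector<Std140Member>& members) {
    std::vector<std::size_t> offsets;
    std::size_t cursor = 0;
    for (const Std140Member& m : members) {
        cursor = alignUp(cursor, m.alignment);  // place the member on its alignment
        offsets.push_back(cursor);
        cursor += m.size;                       // advance past the member
    }
    return offsets;
}

// A few std140 entries: {base alignment, size}.
const Std140Member kFloat{4, 4};
const Std140Member kVec3{16, 12};
const Std140Member kVec4{16, 16};
const Std140Member kMat4{16, 64};  // four vec4 columns
```

For the Matrices block from the question, std140Offsets({kMat4, kMat4, kMat4}) yields offsets 0, 64, and 128, which is what a plain C++ struct of three 4x4 float matrices would give as well.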
Is it guaranteed that if a uniform block is declared the same in multiple shader programs, will it have the same block index returned by glGetUniformBlockIndex(program, "Matrices")?
No. I've searched the OpenGL 3.3 specification which gives no such guarantees. From the standard's viewpoint, uniform blocks (default or named) are associated to a program, period. No existence/association of uniform blocks beyond a program is made in the specification.
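Because there is no guarantee that uniform blocks will have the same index in different shader programs, does that mean I need to call glBindBufferBase() every time I switch programs?
Not necessarily. The block index is per program, but the binding point it maps to is up to you: call glUniformBlockBinding() once per program to point each program's block index at the same binding point, and glBindBufferBase() then only has to be called when the buffer attached to that binding point changes. See ARB_uniform_buffer_object's specification for an example.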

When should I use STD140 in OpenGL?

When do I use the STD140 for uniform blocks in OpenGL?
Although I am not 100% sure, I believe there is an alternative to it that can achieve the same thing, called "shared".
Is it just preference for the coder? Or are there reasons to use one over the other?
Uniform buffer objects are described in http://www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt
The data storage for a uniform block can be declared to use one of three layouts in memory: packed, shared, or std140.
packed uniform blocks have an implementation-dependent data layout for efficiency, and unused uniforms may be eliminated by the compiler to save space.
shared uniform blocks, the default layout, have an implementation-dependent data layout for efficiency, but the layout will be uniquely determined by the structure of the block, allowing data storage to be shared across programs.
std140 uniform blocks have a standard cross-platform cross-vendor layout. Unused uniforms will not be eliminated.
The std140 uniform block layout guarantees specific packing behavior and does not require the application to query for offsets and strides. In this case, the minimum size may still be queried, even though it is determined in advance based only on the uniform block declaration. The offset of each uniform in a uniform block can be derived from the definition of the uniform block by applying the set of rules described in the OpenGL specification.
std140 is most useful when you have a uniform block that you update all at once, for example a collection of matrix and lighting values for rendering a scene. Declare the block with std140 in your shader(s), and you can replicate the memory layout in C with a struct. Instead of having to query and save the offsets for every individual value within the block from C, you can upload the whole struct with one call: glBufferData(GL_UNIFORM_BUFFER, sizeof(my_struct), &my_struct, usage), with a usage hint such as GL_DYNAMIC_DRAW.
You do need to be a little careful with alignment in C; for instance, a vec3 is aligned like a vec4, so it usually takes up the space of 4 floats, not 3. But it is still much easier IMHO.
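As a rough illustration of replicating a std140 block with a struct, consider the sketch below. The Scene block and its member names are hypothetical, chosen only to show how vec3 members behave:

```cpp
#include <cstddef>

// Hypothetical shader-side block this struct is meant to mirror:
//   layout(std140) uniform Scene {
//       mat4  viewProj;    // offset 0
//       vec3  lightDir;    // offset 64
//       float intensity;   // offset 76, fills the slot after the vec3
//       vec3  lightColor;  // offset 80
//   };
struct SceneBlock {
    float viewProj[16];
    alignas(16) float lightDir[3];    // vec3 is aligned like a vec4
    float intensity;
    alignas(16) float lightColor[3];
};

static_assert(offsetof(SceneBlock, lightDir) == 64, "vec3 starts on a 16-byte boundary");
static_assert(offsetof(SceneBlock, intensity) == 76, "float packs right behind the vec3");
static_assert(offsetof(SceneBlock, lightColor) == 80, "next vec3 is realigned to 16");
static_assert(sizeof(SceneBlock) == 96, "C++ pads the struct to a multiple of 16");
```

A SceneBlock instance could then be uploaded in one go with glBufferData(GL_UNIFORM_BUFFER, sizeof(SceneBlock), &scene, GL_DYNAMIC_DRAW), where scene is whatever variable holds the data.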

(OpenGL 3.1 - 4.2) Dynamic Uniform Arrays?

Let's say I have two species, such as humans and ponies. They have different skeletal systems, so the uniform bone array will have to be different for each species. Do I have to implement two separate shader programs, each able to render its bone array properly, or is there a way to dynamically declare uniform arrays and iterate through that dynamic array instead?
Please keep performance in mind (there's a lot of talk going around that shaders are bad at branching).
Until OpenGL 4.3, arrays in GLSL had to be of a fixed, compile-time size. 4.3 allows the use of shader storage buffer objects, which allow for their ultimate length to be "unbounded". Basically, you can do this:
buffer BlockName
{
mat4 manyManyMatrices[];
};
OpenGL will figure out how many matrices are in this array at runtime based on how you use glBindBufferRange. So you can still use manyManyMatrices.length() to get the length, but it won't be a compile-time constant.
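The runtime length follows from simple arithmetic on the bound range. A sketch of that calculation, assuming a std430 block so that each mat4 has a 64-byte stride (the function name unsizedArrayLength is invented here):

```cpp
#include <cstddef>

// Number of elements an unsized trailing array appears to have, given the size
// in bytes of the bound buffer range, the array's byte offset inside the block,
// and the array stride.
constexpr std::size_t unsizedArrayLength(std::size_t boundBytes,
                                         std::size_t arrayOffset,
                                         std::size_t stride) {
    return boundBytes > arrayOffset ? (boundBytes - arrayOffset) / stride : 0;
}

// Binding 640 bytes to a block that contains only `mat4 manyManyMatrices[]`
// makes manyManyMatrices.length() report 10.
static_assert(unsizedArrayLength(640, 0, 64) == 10, "ten mat4 elements fit");
```

So the application controls the array length purely through the size it passes to glBindBufferRange.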
However, this feature is (at the time of this edit) very new and only implemented in beta. It also requires GL 4.x-class hardware (aka: Direct3D 11-class hardware). Lastly, since it uses shader storage blocks, accessing the data may be slower than one might hope for.
As such, I would suggest that you just use a uniform block with the largest number of matrices that you would use. If that becomes a memory issue (unlikely), then you can split your shaders based on array size or use shader storage blocks or whatever.
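A minimal sketch of the "largest size" approach on the C++ side might look like this. The limit kMaxBones, the BoneBlock struct, and fillBoneBlock are assumptions for illustration; the shader would declare a matching mat4 bones[64] in a std140 uniform block:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;    // column-major 4x4 matrix
constexpr std::size_t kMaxBones = 64;  // assumed upper bound; must match the shader

// Mirrors a hypothetical block: layout(std140) uniform Bones { mat4 bones[64]; };
struct BoneBlock {
    Mat4 bones[kMaxBones];
};

static_assert(sizeof(BoneBlock) == kMaxBones * 64, "mat4 array stride is 64 bytes");

// Copies a skeleton's matrices into the fixed-size block and returns how many
// were copied. Skeletons with more than kMaxBones bones are truncated.
std::size_t fillBoneBlock(BoneBlock& block, const std::vector<Mat4>& skeleton) {
    const std::size_t count = std::min(skeleton.size(), kMaxBones);
    std::copy_n(skeleton.begin(), count, block.bones);
    return count;
}
```

Both species can then share one shader program: each draw uploads its own bones and the vertex data simply never indexes past the bones that were filled in.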
You can use n-by-1 textures as a replacement for arrays. The texture size can be specified at run time. I use this approach for passing an arbitrary number of lights to my shaders, and I'm surprised how fast it runs despite the many loops and branches. For an example see the polygon.f shader file in the jogl3.glsl.nontransp in the jReality sources.
uniform sampler2D sys_globalLights;
uniform int sys_numGlobalDirLights;
uniform int sys_numGlobalPointLights;
uniform int sys_numGlobalSpotLights;
...
int lightTexSize = sys_numGlobalDirLights*3+sys_numGlobalPointLights*3+sys_numGlobalSpotLights*5;
for(int i = 0; i < numDir; i++){
vec4 dir = texture(sys_globalLights, vec2((3*i+1+0.5)/lightTexSize, 0));
...