Properties of the stride in a glTF file? - c++

The Khronos docs define the stride as:
When a buffer view is used for vertex attribute data, it may have a byteStride property. This property defines the stride in bytes between each vertex.
I am somewhat confused, as many of the examples I have tried so far (3 of them) had a stride of 0, so I simply ignored the attribute until now. Those examples render just fine.
I was inferring the "stride" from the type: e.g., if the type was a vec3 and the component type was float, I loaded every 12 bytes as one element. Some things I am still not entirely sure about after reading the spec:
When the stride is non-zero, does this mean the data could be interleaved?
When the stride is non-zero, can the data be non-contiguous (e.g., padding bytes)?
In other words, can you run into situations where the buffer is not interleaved, but sizeof(type_component) * element_count is not a divisor of the total subsection of memory to be read?

Yes, accessors (in glTF) are like vertex attributes in OpenGL/WebGL, and are allowed to interleave. The stride is on the bufferView, to force accessors that share that bufferView to all have the same stride. A value of zero means "tightly packed".
Note that you may interleave components of different sizes, such as vec3 (POSITION) with vec2 (TEXCOORD_0), so a stride might be the sum of different sizes.
Here's a diagram from the Data Interleaving section of the glTF tutorial. In this example, there are two accessors, one for POSITION and one for NORMAL, sharing a single BufferView.
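In code, a loader might derive the effective stride like this (a minimal sketch; the structs are hypothetical stand-ins for however your glTF library exposes accessors and buffer views):
#include <cstddef>
// Hypothetical mirrors of the relevant glTF fields.
struct BufferView { std::size_t byteStride; };  // 0 means "tightly packed"
struct Accessor {
    std::size_t componentSize;   // e.g. 4 for FLOAT
    std::size_t componentCount;  // e.g. 3 for VEC3
    const BufferView* view;
};
// Distance in bytes between the starts of two consecutive elements.
std::size_t effectiveStride(const Accessor& a) {
    std::size_t packed = a.componentSize * a.componentCount;  // 12 for a float vec3
    return (a.view->byteStride != 0) ? a.view->byteStride : packed;
}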

Related

What use has the layout specifier scalar in EXT_scalar_block_layout?

Question
What use does the scalar layout qualifier have when accessing a storage buffer with GL_EXT_scalar_block_layout? (see below for an example)
What would be a use case for scalar?
Background
I recently programmed a simple raytracer using Vulkan and NVIDIA's VkRayTracing extension, following this tutorial. In the section about the closest hit shader, it is required to access some data that's stored in, well, storage buffers (with usage flags vk::BufferUsageFlagBits::eStorageBuffer).
In the shader the extension GL_EXT_scalar_block_layout is used and those buffers are accessed like this:
layout(binding = 4, set = 1, scalar) buffer Vertices { Vertex v[]; } vertices[];
When I first used this code, the validation layers told me that structs like Vertex had an invalid layout, so I changed them to have each member aligned on 16-byte boundaries:
struct Vertex {
vec4 position;
vec4 normal;
vec4 texCoord;
};
with the corresponding struct in C++:
#pragma pack(push, 1)
struct Vertex {
glm::vec4 position_1unused;
glm::vec4 normal_1unused;
glm::vec4 texCoord_2unused;
};
#pragma pack(pop)
Errors disappeared and I got a working raytracer. But I still don't understand why the scalar keyword is used here. I found this document talking about the GL_EXT_scalar_block_layout extension, but I really don't understand it. Probably I'm just not used to GLSL terminology? I can't see any reason why I would have to use this.
Also, I just tried removing scalar and it still worked without any difference, warnings, or errors whatsoever. I would be grateful for any clarification or further resources on this topic.
The std140 and std430 layouts do quite a bit of rounding of the offsets, alignments, and sizes of objects. std140 basically makes any non-scalar type aligned to the same alignment as a vec4. std430 relaxes that somewhat, but it still does a lot of rounding up to a vec4's alignment.
scalar layout basically means laying out the objects in accordance with their component scalars. Anything that aggregates components (vectors, matrices, arrays, and structs) does not affect layout. In particular:
All types are sized/aligned only to the highest alignment of the scalar components that they actually use. So a struct containing a single uint is sized/aligned to the same size/alignment as a uint: 4 bytes. Under std140 rules, it would have 16-byte size and alignment.
Note that this layout makes vec3 and similar types actually viable, because C and C++ would then be capable of creating alignment rules that map to those of GLSL.
The array stride of elements in the array is based solely on the size/alignment of the element type, recursively. So an array of uint has an array stride of 4 bytes; under std140 rules, it would have a 16-byte stride.
Alignment and padding only matter for scalars. If you have a struct containing a uint followed by a uvec2, under std140/std430 this will require 16 bytes, with 4 bytes of padding after the first uint. Under scalar layout, such a struct only takes 12 bytes (and is aligned to 4 bytes), with the uvec2 being conceptually misaligned (see the sketch after this list). Padding therefore only exists if you have smaller scalars, like a uint16 followed by a uint.
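For illustration, here is how the uint-followed-by-uvec2 case could look on the C++ side (a hedged sketch; the member names are made up, and the static_assert just encodes the sizes described above):
#include <cstdint>
// C++ mirror of the GLSL struct { uint a; uvec2 b; } under scalar layout:
// every member is aligned only to its 4-byte scalar components.
struct ScalarSide {
    std::uint32_t a;    // offset 0
    std::uint32_t b_x;  // offset 4 - the uvec2 is "conceptually misaligned"
    std::uint32_t b_y;  // offset 8
};
static_assert(sizeof(ScalarSide) == 12, "scalar layout: 12 bytes, no padding");
// Under std140/std430 the uvec2 would be 8-byte aligned instead, placing it
// at offset 8 and growing the struct to 16 bytes with padding after 'a'.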
In the specific case you showed, scalar layout was unnecessary since all of the types you used are vec4s.

Confusion About glVertexAttrib... Functions

After a lot of searching, I still am confused about what the glVertexAttrib... functions (glVertexAttrib1d, glVertexAttrib1f, etc.) do and what their purpose is.
My current understanding from reading this question and the documentation is that their purpose is to somehow set a vertex attribute as constant (i.e. don't use an array buffer). But the documentation also talks about how they interact with "generic vertex attributes" which are defined as follows:
Generic attributes are defined as four-component values that are organized into an array. The first entry of this array is numbered 0, and the size of the array is specified by the implementation-dependent constant GL_MAX_VERTEX_ATTRIBS. Individual elements of this array can be modified with a glVertexAttrib call that specifies the index of the element to be modified and a value for that element.
It says that they are all "four-component values", yet it is entirely possible to have more or fewer components than that in a vertex attribute.
What is this saying exactly? Does this only work for vec4 types? What would be the index of a "generic vertex attribute"? A clear explanation is probably what I really need.
In OpenGL, a vertex is specified as a set of vertex attributes. With the advent of the programmable pipeline, you are responsible for writing your own vertex processing functionality. The vertex shader processes one vertex at a time and gets that specific vertex's attributes as input.
These vertex attributes are called generic vertex attributes, since their meaning is completely defined by you as the application programmer (in contrast to the legacy fixed-function pipeline, where the set of attributes was completely defined by the GL).
The OpenGL spec requires implementors to support at least 16 different vertex attributes. So each vertex attribute can be identified by its index from 0 to 15 (or whatever limit your implementation allows, see glGet(GL_MAX_VERTEX_ATTRIBS,...)).
A vertex attribute is conceptually treated as a four-dimensional vector. If the shader declares an attribute with fewer than four components, the additional elements are just ignored. If you supply fewer than 4 components, the vector is filled out to (0, 0, 0, 1), which makes sense for both RGBA color vectors and homogeneous vertex coordinates.
Though you can declare vertex attributes of mat types, these will just be mapped to a number of consecutive vertex attribute indices.
The vertex attribute data can come from either a vertex array (nowadays, these are required to lie in a Vertex Buffer Object, possibly directly in VRAM; in legacy GL, they could also come from the ordinary client address space) or from the current value of that attribute.
You enable fetching from attribute arrays via glEnableVertexAttribArray. If the vertex array for a particular attribute you access in your vertex shader is enabled, the GPU will fetch the i-th element from that array when processing vertex i. For all other attributes you access, you will get the current value for that attribute.
The current value can be set via the glVertexAttrib[1234]* family of GL functions. It cannot be changed during a draw call, so it remains constant for the whole draw call - just like uniform variables.
One important thing worth noting is that, by default, vertex attributes are always floating-point, and you must declare them as float/vec2/vec3/vec4 in the vertex shader to access them. Setting the current value with, for example, glVertexAttrib4ubv, or using GL_UNSIGNED_BYTE as the type parameter of glVertexAttribPointer, will not change this. The data will be automatically converted to floating-point.
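For example, feeding bytes to a float attribute might look like this (a sketch; location 2 and the buffer binding are assumed to be set up elsewhere):
// 4 unsigned bytes per vertex; normalized = GL_TRUE maps 0..255 to 0.0..1.0.
glVertexAttribPointer(2, 4, GL_UNSIGNED_BYTE, GL_TRUE, 0, nullptr);
// The shader still declares this attribute as: in vec4 color;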
Nowadays, the GL does support two other attribute data types, though: 32-bit integers and 64-bit double-precision floating-point values. You have to declare them as int/ivec*, uint/uvec* or double/dvec* respectively in the shader, and you have to use completely separate functions when setting up the array pointer or current values: glVertexAttribIPointer and glVertexAttribI* for signed/unsigned integers, and glVertexAttribLPointer and glVertexAttribL* for doubles ("long floats").
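To make the array-vs-current-value distinction concrete, here is a hedged sketch (positionVbo and vertexCount are assumed to exist; locations 0 and 1 are arbitrary):
// Attribute 0 (position): fetched per vertex from an enabled array.
glBindBuffer(GL_ARRAY_BUFFER, positionVbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);

// Attribute 1 (color): array disabled, so every vertex sees the same
// "current value" - constant for the whole draw call, like a uniform.
glDisableVertexAttribArray(1);
glVertexAttrib4f(1, 1.0f, 0.0f, 0.0f, 1.0f);  // opaque red

glDrawArrays(GL_TRIANGLES, 0, vertexCount);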

GLSL packing 4 float attributes into vec4

I have a question about resource consumption of attribute float in glsl.
Does it take as many resources as vec4, or no?
I ask this because uniforms can take as many resources as a vec4 (at least, they could): https://stackoverflow.com/a/20775024/1559666
If it does not, then does it make any sense to pack 4 floats into one vec4 attribute?
Yes, all vertex attributes require some multiple of a 4-component vector for storage.
This means that a float vertex attribute takes 1 slot, the same as a vec2, vec3 or vec4 would, and types larger than vec4 take multiple slots. A mat4 vertex attribute takes 4 vec4-sized slots, and a dvec4 (double-precision vector) vertex attribute takes 2. Since implementations are only required to offer 16 vertex attribute slots, if you naively used single float attributes you could exhaust all 16 just to store a single 4x4 matrix.
There is no getting around this. Unlike uniforms (scalar GPUs may be able to store float uniforms more efficiently than vec4 ones), attributes are always tied to a 4-component data type. So for vertex attributes, packing scalars into vectors is quite important.
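As a concrete (hedged) sketch, four per-vertex floats stored contiguously can be fed through a single vec4 attribute instead of four separate float attributes:
// One vec4 attribute at location 0 consumes a single slot; four separate
// float attributes would consume four slots for the same 16 bytes of data.
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 4 * sizeof(GLfloat), nullptr);
// Shader side: layout(location = 0) in vec4 packed4;
// then read the individual values as packed4.x, packed4.y, and so on.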
I have updated my answer to point out relevant excerpts from the GL and GLSL specifications:
OpenGL 4.4 Core Profile Specification - 10.2.1 Current Generic Attributes - pp. 307
Vertex shaders (see section 11.1) access an array of 4-component generic vertex attributes. The first slot of this array is numbered zero, and the size of the array is specified by the implementation-dependent constant GL_MAX_VERTEX_ATTRIBS.
GLSL 4.40 Specification - 4.4.1 Input Layout Qualifiers - pp. 60
If a vertex shader input is any scalar or vector type, it will consume a single location. If a non-vertex shader input is a scalar or vector type other than dvec3 or dvec4, it will consume a single location, while types dvec3 or dvec4 will consume two consecutive locations. Inputs of type double and dvec2 will consume only a single location, in all stages.
Admittedly, the behavior described for dvec4 differs slightly. In GL_ARB_vertex_attrib_64bit form, double-precision types may consume twice as much storage as floating-point, such that a dvec3 or dvec4 may consume two attribute slots. When it was promoted to core, that behavior changed... they are only supposed to consume 1 location in the vertex stage, potentially more in any other stage.
Original (extension) behaviour of double-precision vector types:
Name
ARB_vertex_attrib_64bit
[...]
Additionally, some vertex shader inputs using the wider 64-bit components may count double against the implementation-dependent limit on the number of vertex shader attribute vectors. A 64-bit scalar or a two-component vector consumes only a single generic vertex attribute; three- and four-component "long" vectors consume two. This approach is similar to the one used in the current GL where matrix attributes consume multiple attributes.
The attribute vec4 will take 4 times the memory of the attribute float.
With uniforms, due to alignment rules, you may lose some components (a vec4 is aligned to 16 bytes, i.e. 4 floats).

Usage of glBindBufferRange with transform feedback

I have a buffer that I would like to fill over successive transform feedbacks, and I am wondering how exactly to do this.
glBindBufferRange has five arguments, I understand that the first three are equivalent to the arguments of glBindBufferBase, but I have a few questions about the offset and size arguments.
If my first transform feedback produced n primitives, as retrieved from GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN, my primitives are points, and I want to continue from that position in the buffer, should the offset of glBindBufferRange be set to n*4*sizeof(GLfloat)? (assuming I am retrieving a vec4 geometry shader output)
The docs just say that offset and size should be in basic machine units (although they have two different types, GLintptr and GLsizeiptr), but I'm not exactly sure what that means, so I assumed bytes. Is this correct?
Yes, the amount of data written to a buffer during transform feedback is the number of primitives written * the number of components written per primitive * the size of each component. And yes, "basic machine units" is standardese for "byte".
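Putting that together, the follow-up pass could be set up roughly like this (a sketch; it assumes a GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query object named query was active around the first pass, and feedbackBuffer/bufferSize are hypothetical names):
GLuint primitivesWritten = 0;
glGetQueryObjectuiv(query, GL_QUERY_RESULT, &primitivesWritten);

// Points with a single vec4 output: each primitive wrote 4 floats.
GLintptr   offset = primitivesWritten * 4 * sizeof(GLfloat);
GLsizeiptr size   = bufferSize - offset;  // remaining capacity in bytes

glBindBufferRange(GL_TRANSFORM_FEEDBACK_BUFFER, 0, feedbackBuffer, offset, size);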

How does interleaved vertex submission help performance?

I have read and seen other questions that all generally point to the suggestion to interleave vertex positions, colors, etc. into one array, as this minimizes the data that gets sent from the CPU to the GPU.
What I'm not clear on is how OpenGL does this when, even with an interleaved array, you must still make separate GL calls for position and color pointers. If both pointers use the same array, just set to start at different points in that array, does the draw call not copy the array twice since it was the object of two different pointers?
This is mostly about cache. For example, imagine we have 4 vertices and 4 colors. You can provide the information this way (excuse me, but I don't remember the exact function names):
glVertexPointer(..., vertex);
glColorPointer(..., colors);
What it internally does is read vertex[0], then apply colors[0], then again vertex[1] with colors[1]. As you can see, if vertex is, for example, 20 megs long, vertex[0] and colors[0] will be, to say the least, 20 megabytes apart from each other.
Now, on the other hand, if you provide a structure like { vertex0, color0, vertex1, color1, etc.} there will be a lot of cache hits because, well, vertex0 and color0 are together, and so are vertex1 and color1.
Hope this helps answer the question
edit: on second read, I may not have answered the question. You might be wondering how OpenGL knows which values to read from that structure. Like I said before, with a structure such as { vertex, color, vertex, color } you tell OpenGL that vertex data starts at position 0 with a stride of 2 (so the next one will be at position 2, then 4, etc.) and that color data starts at position 1, also with a stride of 2 (so position 1, then 3, etc.).
addition: In case you want a more practical example, look at this link: http://www.lwjgl.org/wiki/index.php?title=Using_Vertex_Buffer_Objects_(VBO). You can see there how it provides the buffer only once and then uses offsets to render efficiently.
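In modern GL terms, the position/stride idea from the edit above maps onto the stride and offset parameters of glVertexAttribPointer; a hedged sketch for interleaved vec3 position + vec3 color (vbo is assumed to be filled with this data already):
#include <cstddef>  // offsetof
struct InterleavedVertex { GLfloat pos[3]; GLfloat color[3]; };

GLsizei stride = sizeof(InterleavedVertex);  // 24 bytes between consecutive vertices
glBindBuffer(GL_ARRAY_BUFFER, vbo);

glEnableVertexAttribArray(0);  // position starts at byte 0 of each vertex
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, stride, (void*)offsetof(InterleavedVertex, pos));
glEnableVertexAttribArray(1);  // color starts 12 bytes into each vertex
glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, stride, (void*)offsetof(InterleavedVertex, color));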
I suggest reading: Vertex_Specification_Best_Practices
h4lc0n provided quite a nice explanation, but I would like to add some additional info:
interleaved data can actually hurt performance when your data changes often. For instance, when you change the positions of point sprites, you update POS, but COLOR and TEXCOORD usually stay the same. When the data is interleaved, you must then "touch" that additional data as well. In that case it would be better to have one VBO for POS only (or, in general, for data that changes often) and a second VBO for data that is constant (see the sketch after this list).
it is not easy to give strict rules about VBO layout, since it is very vendor/driver specific. Also, your usage can be different from others'. In general, you need to run some benchmarks for your particular test cases.
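For instance, the split from the first point might look like this (a sketch; the buffer names and sizes are hypothetical):
// Frequently-changing positions get their own VBO...
glBindBuffer(GL_ARRAY_BUFFER, positionVbo);
glBufferData(GL_ARRAY_BUFFER, posBytes, positions, GL_DYNAMIC_DRAW);

// ...while constant color/texcoord data lives in a separate one.
glBindBuffer(GL_ARRAY_BUFFER, staticVbo);
glBufferData(GL_ARRAY_BUFFER, staticBytes, staticData, GL_STATIC_DRAW);

// A per-frame update then only rewrites the dynamic buffer.
glBindBuffer(GL_ARRAY_BUFFER, positionVbo);
glBufferSubData(GL_ARRAY_BUFFER, 0, posBytes, positions);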
You could also make an argument for separating different attributes. Assuming a GPU does not process one vertex after another but rather a bunch of them (e.g. 16) in parallel, you would get something like this while executing a vertex shader:
read attribute A for all 16 vertices
perform some computations
read attribute B for all 16 vertices
perform some more computations
....
So you read one attribute for many vertices at once. From this reasoning it would seem that interleaving the attributes actually hurts performance. Of course, this would only be visible if you are either bandwidth constrained or if the memory latency cannot be hidden for some reason (e.g. a complex shader that requires many registers will reduce the number of vertices that can be in flight at a given time).