Aligning structures to std140, CPU side - C++

I suppose this is kind of a cross between a pure C++ question and an OpenGL question. I have a uniform buffer and am allocating space in it of sizeof(ShaderData) bytes. I'm using the std140 layout on the GPU side in the shader.
According to std140 rules, I need to add padding in various places in my structure to make sure things like vectors are aligned correctly. The structure below is an example (for my light):
struct ShaderData {
    float Light_Intensity;
    float _pad1[3];  // align following vec3 on 4N boundary
    Math::Vec3f Light_Position;
    float _pad2;     // align following vec4 on 4N boundary
    Math::Colour4f Light_Ambient;
    Math::Colour4f Light_Diffuse;
    Math::Colour4f Light_Specular;
    float Light_AttenuationMin;
    float Light_AttenuationMax;
} MyShaderData;
Is this the way people generally do things in C++, or are there special keywords or pragmas for aligning individual elements of a structure CPU side that are a little tidier?

No: laid out this way you just waste space. You have to find an optimized layout according to the std140 rules:
a float needs 4 bytes and is 4-byte aligned
a vec3 needs 12 bytes and is 16-byte aligned
a vec4 needs 16 bytes and is 16-byte aligned
This means that you can find a better layout for your struct:
float Light_Intensity; X
float _pad1[3]; XXX
Math::Vec3f Light_Position; XXX
float _pad2; X
As you can see, you are wasting 4 bytes, when instead you could simply write:
Math::Vec3f Light_Position; XXX
float Light_Intensity; X
to have everything aligned without wasting any bytes. This works because the vec3 is aligned to a 16-byte boundary, while the float only needs a 4-byte boundary and so packs into the space left after the vec3.
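Taking that idea to its conclusion, the whole ShaderData struct from the question can be reordered so that no explicit padding members are needed at all. A sketch, assuming Math::Vec3f and Math::Colour4f are tightly packed structs of 3 and 4 floats (the offsets shown are the std140 offsets, which here coincide with the C++ offsets):

struct ShaderData {
    Math::Colour4f Light_Ambient;        // vec4,  offset  0
    Math::Colour4f Light_Diffuse;        // vec4,  offset 16
    Math::Colour4f Light_Specular;       // vec4,  offset 32
    Math::Vec3f    Light_Position;       // vec3,  offset 48, consumes 12 bytes
    float          Light_Intensity;      // float, offset 60, fills the vec3's tail
    float          Light_AttenuationMin; // float, offset 64
    float          Light_AttenuationMax; // float, offset 68
} MyShaderData;                          // 72 bytes, no padding members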

Aligning memory of SSBO that is an array of structs containing an array?

I'm flattening out an octree and sending it to my fragment shader using an SSBO, and I believe I am running into some memory alignment issues. I'm using std430 for the layout and binding a vector of voxels to this SSBO. This is the structure in my shader (GLSL 4.3, FYI):
struct Voxel
{
    bool data;    // 4
    vec4 pos;     // 16
    vec4 col;     // 16
    float size;   // 4
    int index;    // 4
    int pIndex;   // 4
    int cIdx[8];  // 4, 16 or 32 bytes?
};
layout (std430, binding = 2) buffer octreeData
{
    Voxel voxels[];
};
I'm not 100% sure, but I think I'm running into an issue with the int cIdx[8] array inside the struct. Looking at the spec (page 124, section 7.6):
If the member is an array of scalars or vectors, the base alignment and array
stride are set to match the base alignment of a single array element, according
to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The
array may have padding at the end; the base offset of the member following
the array is rounded up to the next multiple of the base alignment.
I'm not entirely sure what the alignment is. I know the vec4s take up 16 bytes of memory, but how much does my array? If it were just sizeof(int) * 8, that would be 32, but the spec says the alignment is set to that of a single array element and then rounded up to the base alignment of a vec4, right? So does that mean my cIdx array has a base alignment of 16 bytes? There's no follow-up member, so is padding being added to the end of my struct?
So, total structure memory = 52 bytes (if we only allocate 4 bytes for cIdx). Would that mean there are 12 bytes of padding being added that I need to account for, which may be causing my issues? If 16 bytes were allocated, would the structure be 64 bytes total, with no memory alignment issues?
My corresponding c++ structure
struct Voxel
{
    bool data;
    glm::vec4 pos;
    glm::vec4 col;
    float size;
    int index;
    int pIndex;
    int cIdx[8];
};
I'm then filling in my std::vector<Voxel> and passing it to my shader like so
glGenBuffers(1, &octreeSSBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, octreeSSBO);
glBufferData(GL_SHADER_STORAGE_BUFFER, voxelData.size()*sizeof(Voxel), voxelData.data(), GL_DYNAMIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, octreeSSBO);
Reading directly from the voxelData vector, I can confirm that the data is filled in correctly. I can even occasionally see that the data reaches the shader, but it behaves incorrectly compared to what I would expect based on the values I'm looking at.
Does it look like there are memory alignment issues here?
I'm not entirely sure what the alignment is
The specification is very clear about what the base alignments of things are. Your problem is not in rule #4 (std430 doesn't do the vec4 rounding specified in #4 anyway).
Your problem is in #2:
If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively.
In GLSL, vec4 has a base alignment of 16. That means that any vec4 must be allocated on a 16-byte boundary.
pos must be on a 16-byte boundary. However, data is only 4 bytes. Therefore, 12 bytes of padding must be inserted between data and pos to satisfy std430's alignment requirements.
However, glm::vec4 has a C++ alignment of 4. So the C++ compiler does not insert a bunch of padding between data and pos. Thus, the types in the two languages do not agree.
You should explicitly align all GLM vectors in C++ structs that you want to match GLSL, using C++11's alignas keyword:
struct Voxel
{
    bool data;
    alignas(16) glm::vec4 pos;
    alignas(16) glm::vec4 col;
    float size;
    int index;
    int pIndex;
    int cIdx[8];
};
Also, I would not assume that the C++ type bool and the GLSL type bool have the same size.
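If you want the compiler to verify the match, one option is to mirror the GLSL types with fixed-size scalars and assert the std430 offsets. A minimal sketch, assuming the layout derived above (note that offsetof on structs containing glm types is only conditionally supported, though it works on common compilers):

#include <cstddef>
#include <cstdint>
#include <glm/glm.hpp>

struct Voxel
{
    std::int32_t data;           // 32-bit int instead of bool
    alignas(16) glm::vec4 pos;   // forces 12 bytes of padding after data
    alignas(16) glm::vec4 col;
    float size;
    std::int32_t index;
    std::int32_t pIndex;
    std::int32_t cIdx[8];        // std430 array stride is 4, no vec4 rounding
};

static_assert(offsetof(Voxel, pos)  == 16, "pos starts on a 16-byte boundary");
static_assert(offsetof(Voxel, col)  == 32, "col starts on a 16-byte boundary");
static_assert(offsetof(Voxel, cIdx) == 60, "cIdx follows pIndex with no padding");
static_assert(sizeof(Voxel) == 96, "matches the std430 array stride of Voxel");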

I need explanation about std140 uniform blocks offsets

Sorry for the title, but I do not really know how to name my problem.
I am reading about uniform blocks in an OpenGL book and I am a bit confused about the default std140 offsets shown there.
layout(std140) uniform TransformBlock
{
    // component               base alignment | offset | aligned offset
    float scale;            // 4              | 0      | 0
    vec3 translation;       // 16             | 4      | 16
    float rotation[3];      // 16             | 28     | 32  (rotation[0])
                            //                           48  (rotation[1])
                            //                           64  (rotation[2])
    mat4 projection_matrix; // 16             | 80     | 80  (column 0)
                            //                           96  (column 1)
                            //                           112 (column 2)
                            //                           128 (column 3)
} transform;
I know that vec3's base alignment = vec4's base alignment = 16 bytes.
Scale is the first component, so its offset is 0; it is also 4 bytes, so it seemed clear to me that translation should be at currentPosition + 4 (calling scale's offset currentPosition).
I do not understand why translation's aligned offset is 16, though.
Also, it is unclear to me why rotation's offset is 28.
Translation is a vec3, which means 3 floats, so 3 * 4 = 12.
My first thought was that we might want to round it up to some standard boundary value, but 28 is not a value of that kind.
Same with projection_matrix's offset.
Could someone explain it to me as if I were an idiot, please?
OpenGL does not define a concept called "offset's alignment". Maybe your book is talking about some derived quantity, but since you did not name the book nor did you quote anything more than this example, I cannot say.
The quantities of worth when it comes to std140 layout are the size (how much space it takes up), base alignment, offset, and array stride (obviously only meaningful for arrays). The base alignment imposes a restriction on the offset; the offset must be divisible by the base alignment.
vec3 has a size of 12, since it contains 3 4-byte values. It has a base alignment of 16, because that's what the standard says it has:
If the member is a three-component vector with components consuming N
basic machine units, the base alignment is 4N.
The offset of a member is computed by computing the offset of the previous member, adding the previous member's size, and then applying the base alignment to that value.
So, given that scale has an offset of 0 and a size of 4, the offset of translation is 16 (0 + 4 = 4, rounded up to translation's base alignment of 16).
The base alignment and array stride of rotation is 16, because that's what the standard says:
If the member is an array of scalars or vectors, the base alignment and array
stride are set to match the base alignment of a single array element, according
to rules (1), (2), and (3), and rounded up to the base alignment of a vec4.
Emphasis added.
So, the offset of translation is 16, and its size is 12. Add them together, and you get 28. To get the offset of rotation, you apply rotation's base alignment, giving you 32.
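The round-up rule is mechanical enough to check by hand in a few lines of code. A sketch of the computation for this exact block (alignTo is a hypothetical helper, not a GL function):

#include <cstdio>

// Round offset up to the next multiple of alignment.
constexpr unsigned alignTo(unsigned offset, unsigned alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

int main()
{
    unsigned scale       = 0;                              // float, alignment 4
    unsigned translation = alignTo(scale + 4, 16);         // 16: prev ends at 4
    unsigned rotation    = alignTo(translation + 12, 16);  // 32: prev ends at 28
    unsigned projection  = alignTo(rotation + 3 * 16, 16); // 80: array stride 16
    std::printf("%u %u %u %u\n", scale, translation, rotation, projection);
}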
Also, stop using vec3s.

GLSL skips "if" statement

My GLSL fragment shader skips the "if" statement. The shader itself is very short.
I send some data via a uniform buffer object and use it further in the shader. However, the thing skips the assignment attached to the "if" statement for whatever reason.
I checked the values of the buffer object using glGetBufferSubData (tested with specific non-zero values). Everything is where it needs to be. So I'm really kind of lost here. Must be some GLSL weirdness I'm not aware of.
Currently the shader looks like this:
#version 420

layout(std140, binding = 2) uniform textureVarBuffer
{
    vec3 colorArray;          // 16 bytes
    int textureEnable;        // 20 bytes
    int normalMapEnable;      // 24 bytes
    int reflectionMapEnable;  // 28 bytes
};

out vec4 frag_colour;

void main() {
    frag_colour = vec4(1.0, 1.0, 1.0, 0.5);
    if (textureEnable == 0) {
        frag_colour = vec4(colorArray, 0.5);
    }
}
You are confusing the base alignment rules with the offsets. The spec states:
The base offset of the first
member of a structure is taken from the aligned offset of the structure itself. The
base offset of all other structure members is derived by taking the offset of the
last basic machine unit consumed by the previous member and adding one. Each
structure member is stored in memory at its aligned offset. The members of a top-level
uniform block are laid out in buffer storage by treating the uniform block as
a structure with a base offset of zero.
It is true that a vec3 requires a base alignment of 16 bytes, but it only consumes 12 bytes. As a result, the next element after the vec3 begins 12 bytes after the aligned offset of the vec3 itself. Since the alignment requirement for an int is just 4 bytes, no padding is inserted at all: textureEnable lands at offset 12, normalMapEnable at 16, and reflectionMapEnable at 20.
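A sketch of a C++ mirror with those offsets (TextureVarBuffer is a hypothetical name; fixed-width ints stand in for the GLSL ints):

#include <cstdint>

struct TextureVarBuffer
{
    float colorArray[3];               // offset  0, consumes 12 bytes
    std::int32_t textureEnable;        // offset 12, not 16
    std::int32_t normalMapEnable;      // offset 16, not 20
    std::int32_t reflectionMapEnable;  // offset 20, not 24
};
static_assert(sizeof(TextureVarBuffer) == 24, "the first int packs into the vec3's tail");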

GLSL layout std140 padding dilemma

I have the following uniform buffer:
layout(std140) uniform Light
{
    vec4 AmbientLight;
    vec4 LightIntensity;
    vec3 LightPosition;
    float LightAttenuation;
};
I have some issues when buffering the data and the padding I need to add. I have read http://ptgmedia.pearsoncmg.com/images/9780321552624/downloads/0321552628_AppL.pdf, which says I have to add an extra 4 bytes at the end of the vec3 for padding, so I would upload a total of 13 floats for 'Light'. When I do that, however, 'LightAttenuation' gets the value I padded onto 'LightPosition', rather than the value one float ahead; I get the correct values in the shader when I do NOT pad. Why is this?
See section 7.6.2.2 of the OpenGL spec for the details, but basically, std140 layout says that each variable is laid out immediately after the previous variable, with enough padding added to satisfy the alignment its type requires. vec3 and vec4 both require 16-byte alignment and are 12 and 16 bytes in size, respectively; float requires 4-byte alignment and is 4 bytes in size. So with std140 layout, LightPosition gets 16-byte alignment and will always end at an address that is 12 mod 16. Since that address is already 4-byte aligned, no extra padding is inserted before LightAttenuation.
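In other words, no padding is needed on the CPU side at all. A sketch of a matching C++ struct under that layout (a hypothetical host-side representation using plain float arrays):

struct Light
{
    float AmbientLight[4];    // vec4,  offset  0
    float LightIntensity[4];  // vec4,  offset 16
    float LightPosition[3];   // vec3,  offset 32, consumes 12 bytes
    float LightAttenuation;   // float, offset 44, fills the vec3's tail
};
static_assert(sizeof(Light) == 48, "three vec4-sized rows, no tail padding");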
Usually, yes, OpenGL will treat a vec3 as a vec4. But AFAIK in this case it appends the float LightAttenuation to the vec3 LightPosition, forming an overall vec4 (a kind of optimization done by the GLSL compiler).
The whole structure will be the size of 3 vec4s.
Try using a vec3 or vec4 for LightAttenuation instead.

Issue with glBindBufferRange() OpenGL 3.1

My vertex shader is:
uniform Block1
{
    vec4 offset_x1;
    vec4 offset_x2;
} block1;

out float value;
in vec4 position;

void main()
{
    value = block1.offset_x1.x + block1.offset_x2.x;
    gl_Position = position;
}
The code I am using to pass values is:
GLfloat color_values[8]; // contains valid values
glGenBuffers(1, &buffer_object);
glBindBuffer(GL_UNIFORM_BUFFER, buffer_object);
glBufferData(GL_UNIFORM_BUFFER, sizeof(color_values), color_values, GL_STATIC_DRAW);
glUniformBlockBinding(psId, blockIndex, 0);
glBindBufferRange(GL_UNIFORM_BUFFER, 0, buffer_object, 0, 16);
glBindBufferRange(GL_UNIFORM_BUFFER, 0, buffer_object, 16, 16);
Here what I am expecting is to pass 16 bytes for each vec4 uniform. I get a GL_INVALID_VALUE error for offset = 16, size = 16.
I am confused by the offset value. The spec says it is relative to "buffer_object".
There is an alignment restriction for UBOs when binding. Any glBindBufferRange/Base's offset must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT. This alignment could be anything, so you have to query it before building your array of uniform buffers. That means you can't do it directly in compile-time C++ logic; it has to be runtime logic.
Speaking of querying things at runtime, your code is horribly broken in many other ways. You did not define a layout qualifier for your uniform block; therefore, the default is used: shared. And you cannot use the shared layout without querying the layout of each block's members from OpenGL. Ever.
If you had done a query, you would have quickly discovered that your uniform block is at least 32 bytes in size, not 16. And since you only provided 16 bytes in your range, undefined behavior (which includes the possibility of program termination) results.
If you want to be able to define C/C++ objects that map exactly to the uniform block definition, you need to use std140 layout and follow the rules of std140's layout in your C/C++ object.
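A sketch of the runtime queries described above, reusing psId, blockIndex, and buffer_object from the question's code (the round-up arithmetic is plain integer math, not a GL call):

// The bind-offset alignment is implementation-defined; query it at runtime.
GLint uboAlignment = 1;
glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &uboAlignment);

// Query the block's actual data size instead of assuming 16 bytes.
GLint blockSize = 0;
glGetActiveUniformBlockiv(psId, blockIndex, GL_UNIFORM_BLOCK_DATA_SIZE, &blockSize);

// Round the desired byte offset up to the next legal bind offset
// (the buffer must be allocated large enough to hold offset + blockSize).
GLintptr desired = 16;
GLintptr offset = (desired + uboAlignment - 1) / uboAlignment * uboAlignment;
glBindBufferRange(GL_UNIFORM_BUFFER, 0, buffer_object, offset, blockSize);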