I have bound the shader storage buffer to the shader storage block like so
GLuint index = glGetProgramResourceIndex(myprogram, GL_SHADER_STORAGE_BLOCK, name);
glShaderStorageBlockBinding(myprogram, index, mybindingpoint);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, mybuffer);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, mybindingpoint, mybuffer, 0, 48);
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, 48, &mydata);
mydata points to a std::vector containing 4 glm::vec3 objects.
Because I bound 48 bytes as the buffer range I expect lights[] to hold 48/(4*3) = 4 vec3s.
layout(std430) buffer light {
vec3 lights[];
};
The element at index 1 in my std::vector holds the data x=1.0, y=1.0, z=1.0.
But viewing the output by doing
gl_FragColor = vec4(lights[1], 1.0);
I see yellow (x=1.0, y=1.0, z=0.0) pixels. This is not what I loaded into the buffer.
Can somebody tell me what I am doing wrong?
EDIT
I just changed the shader storage block to
layout(std430) buffer light {
float lights[];
};
and output
gl_FragColor = vec4(lights[3],lights[4],lights[5],1.0);
and it works (white pixels).
If somebody can explain this, that would still be great.
It's because people don't take this simple advice: never use a vec3 in a UBO/SSBO.
The base alignment of a vec3 is 16 bytes. Always. Therefore, when it is arrayed, the array stride (the number of bytes from one element to the next) is always 16. Exactly the same as a vec4.
Yes, std430 layout is different from std140. But it's not that different. Specifically, it only prevents the base alignment and stride of array elements (and base alignment of structs) from being rounded up to that of a vec4. But since the base alignment of vec3 is always equal to that of a vec4, it changes nothing about them. It only affects scalars and vec2's.
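A minimal sketch of that workaround on the C++ side, reusing the question's names (mybuffer, mybindingpoint) and assuming glm is available; padding each element to a glm::vec4 makes the uploaded data match the 16-byte array stride the shader actually uses:
#include <glm/glm.hpp>
#include <vector>

// Pad each light to 16 bytes so the CPU data matches the 16-byte array stride
// GLSL gives vec3 arrays in both std140 and std430.
std::vector<glm::vec4> mydata = {
    {0.0f, 0.0f, 0.0f, 0.0f},
    {1.0f, 1.0f, 1.0f, 0.0f},   // lights[1] now really starts at byte 16
    {0.0f, 0.0f, 0.0f, 0.0f},
    {0.0f, 0.0f, 0.0f, 0.0f},
};

glBindBuffer(GL_SHADER_STORAGE_BUFFER, mybuffer);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, mybindingpoint, mybuffer, 0,
                  mydata.size() * sizeof(glm::vec4));   // 64 bytes, not 48
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0,
                mydata.size() * sizeof(glm::vec4), mydata.data());
Alternatively, declare the array as vec4 lights[] in the shader and simply ignore the .w component.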
Related
In my OpenGL ES 3.0 program I need to have two separate Uniform Buffer Objects (UBOs). With just one UBO, things work as expected. The code for that case looks as follows:
GLSL vertex shader:
#version 300 es
layout (std140) uniform MatrixBlock
{
mat4 matrix[200];
};
C++ header file member variables:
GLint _matrixBlockLocation;
GLuint _matrixBuffer;
static constexpr GLuint _matrixBufferBindingPoint = 0;
glm::mat4 _matrixBufferContent[200];
C++ code to initialize the UBO:
_matrixBlockLocation = glGetUniformBlockIndex(_program, "MatrixBlock");
glGenBuffers(1, &_matrixBuffer);
glBindBuffer(GL_UNIFORM_BUFFER, _matrixBuffer);
glBufferData(GL_UNIFORM_BUFFER, 200 * sizeof(glm::mat4), _matrixBufferContent, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, _matrixBufferBindingPoint, _matrixBuffer);
glUniformBlockBinding(_program, _matrixBlockLocation, _matrixBufferBindingPoint);
To update the content of the UBO I modify the _matrixBufferContent array and then call
glBufferSubData(GL_UNIFORM_BUFFER, 0, 200 * sizeof(glm::mat4), _matrixBufferContent);
This works as I expect it. In the vertex shader I can access the matrices and the resulting image is as it should be.
The OpenGL ES 3.0 specification only guarantees 16 KB of storage per UBO (GL_MAX_UNIFORM_BLOCK_SIZE). Because the size of my matrix array comes close to that limit, I want to create a second UBO that stores additional data. But as soon as I add that second UBO I encounter problems. Here's the code to create the two UBOs:
GLSL vertex shader:
#version 300 es
layout (std140) uniform MatrixBlock
{
mat4 matrix[200];
};
layout (std140) uniform HighlightingBlock
{
int highlighting[200];
};
C++ header file member variables:
GLint _matrixBlockLocation;
GLint _highlightingBlockLocation;
GLuint _uniformBuffers[2];
static constexpr GLuint _matrixBufferBindingPoint = 0;
static constexpr GLuint _highlightingBufferBindingPoint = 1;
glm::mat4 _matrixBufferContent[200];
int32_t _highlightingBufferContent[200];
C++ code to initialize both UBOs:
_matrixBlockLocation = glGetUniformBlockIndex(_program, "MatrixBlock");
_highlightingBlockLocation = glGetUniformBlockIndex(_program, "HighlightingBlock");
glGenBuffers(2, _uniformBuffers);
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[0]);
glBufferData(GL_UNIFORM_BUFFER, 200 * sizeof(glm::mat4), _matrixBufferContent, GL_DYNAMIC_DRAW);
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[1]);
glBufferData(GL_UNIFORM_BUFFER, 200 * sizeof(int32_t), _highlightingBufferContent, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, _matrixBufferBindingPoint, _uniformBuffers[0]);
glBindBufferBase(GL_UNIFORM_BUFFER, _highlightingBufferBindingPoint, _uniformBuffers[1]);
glUniformBlockBinding(_program, _matrixBlockLocation, _matrixBufferBindingPoint);
glUniformBlockBinding(_program, _highlightingBlockLocation, _highlightingBufferBindingPoint);
To update the first UBO I still modify the _matrixBufferContent array but then call
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[0]);
glBufferSubData(GL_UNIFORM_BUFFER, 0, 200 * sizeof(glm::mat4), _matrixBufferContent);
To update the second UBO I modify the content of the _highlightingBufferContent array and then call
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[1]);
glBufferSubData(GL_UNIFORM_BUFFER, 0, 200 * sizeof(int32_t), _highlightingBufferContent);
From what I see, the first UBO still works as expected. But the data that I obtain in the vertex shader is not what I originally put into _highlightingBufferContent. If I run this code as WebGL 2.0 code I'm getting the following warning in Google Chrome:
GL_INVALID_OPERATION: It is undefined behaviour to use a uniform buffer that is too small.
In Firefox I'm getting the following:
WebGL warning: drawElementsInstanced: Buffer for uniform block is smaller than UNIFORM_BLOCK_DATA_SIZE.
So the second UBO is somehow not properly mapped. But I'm failing to see where things go wrong. How do I create two separate UBOs and use both of them in the same vertex shader?
Edit
Querying the GL_UNIFORM_BLOCK_DATA_SIZE value that OpenGL expects reveals that the buffer needs to be 4 times bigger than it currently is. Here's how I query the values:
GLint matrixBlock = 0;
GLint highlightingBlock = 0;
glGetActiveUniformBlockiv(_program, _matrixBlockLocation, GL_UNIFORM_BLOCK_DATA_SIZE, &matrixBlock);
glGetActiveUniformBlockiv(_program, _highlightingBlockLocation, GL_UNIFORM_BLOCK_DATA_SIZE, &highlightingBlock);
Essentially, this means that the buffer size must be
200 * sizeof(int32_t) * 4
and not just
200 * sizeof(int32_t)
However, it is not clear to me why that is. I'm putting 32-bit integers into that array, which I'd expect to be 4 bytes each, but they seem to occupy 16 bytes somehow. Not sure yet what is going on.
As hinted at by the edit section of the question and by Beko's comment, there are specific alignment rules associated with OpenGL's std140 layout. The OpenGL ES 3.0 standard specifies the following:
1) If the member is a scalar consuming N basic machine units, the base alignment is N.
2) If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively.
3) If the member is a three-component vector with components consuming N basic machine units, the base alignment is 4N.
4) If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The array may have padding at the end; the base offset of the member following the array is rounded up to the next multiple of the base alignment.
Note the phrase "rounded up to the base alignment of a vec4". This means every integer in the array does not simply occupy 4 bytes, but instead occupies the size of a vec4, which is 4 times larger.
Therefore, the array must be 4 times the original size. In addition, it is necessary to pad each integer to the corresponding size before copying the array content using glBufferSubData. If that is not done, the data is misaligned and hence gets misinterpreted by the GLSL shader.
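As a rough sketch of that padding, using the names from the question (_uniformBuffers, _highlightingBufferContent); the PaddedInt type is just illustrative:
#include <cstdint>

// std140 gives an int array a 16-byte array stride, so every int32_t has to be
// expanded to a 16-byte slot before uploading.
struct PaddedInt {
    int32_t value;
    int32_t pad[3];   // unused; fills the slot up to the vec4-sized stride
};
static_assert(sizeof(PaddedInt) == 16, "std140 array stride for int is 16 bytes");

PaddedInt padded[200];
for (int i = 0; i < 200; ++i) {
    padded[i].value = _highlightingBufferContent[i];
}

glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[1]);
glBufferData(GL_UNIFORM_BUFFER, sizeof(padded), padded, GL_DYNAMIC_DRAW);
In GLSL the block can stay as int highlighting[200]; only the CPU-side stride changes.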
I'm passing a uniform buffer to a compute shader in Vulkan. The buffer contains an array of 49 floating point numbers (a Gaussian kernel). Everything is fine, but when I read the array in the shader, it gives only 13 values; the others are 0 or garbage, and they correspond to elements 0, 4, 8, etc. of the initial array. I think it's some kind of alignment problem.
The shader layouts are:
struct Pixel
{
vec4 value;
};
layout(push_constant) uniform params_t
{
int width;
int height;
} params;
layout(std140, binding = 0) buffer buf
{
Pixel imageData[];
};
layout (binding = 1) uniform sampler2D inputTex;
layout (binding = 2) uniform unf_t
{
float gauss[SAMPLE_SIZE*SAMPLE_SIZE];
};
Could binding 0 be influencing binding 2? And if so, how can I copy the array to the buffer with the needed alignment? Currently I use
vkCmdUpdateBuffer(a_cmdBuff, a_uniform, 0, a_gaussSize, (const uint32_t *)gauss)
Or would it be better to split them into different descriptor sets?
Edit: By expanding the buffer and the array I managed to pass everything with an alignment of 16 and it works, but it looks like a waste of memory. How can I align the floats to 4 bytes?
Uniform blocks require that array elements are aligned to vec4 (16 bytes).
To work around this, use an array of vec4 instead: pass 52 floats (13 vec4s) and then take the correct component based on index/4 and index%4.
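A sketch of how that can look in the shader, reusing SAMPLE_SIZE from the question (the gaussPacked name and the gaussAt helper are just illustrative):
// 49 weights packed into 13 vec4s (52 floats); every element now sits on a
// 16-byte boundary, and the component is picked with index/4 and index%4.
layout (binding = 2) uniform unf_t
{
    vec4 gaussPacked[(SAMPLE_SIZE*SAMPLE_SIZE + 3) / 4];
};

float gaussAt(int i)
{
    return gaussPacked[i / 4][i % 4];
}
On the CPU side the float array just has to be copied into a buffer that is 13 * 16 bytes large; no per-element padding is needed.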
I'm trying my hand at shader storage buffer objects (aka Buffer Blocks) and there are a couple of things I don't fully grasp. What I'm trying to do is to store the (simplified) data of an indeterminate number of lights n in them, so my shader can iterate through them and perform calculations.
Let me start by saying that I get the correct results, and no errors from OpenGL. However, it bothers me not to know why it is working.
So, in my shader, I got the following:
struct PointLight {
vec3 pos;
float intensity;
};
layout (std430, binding = 0) buffer PointLights {
PointLight pointLights[];
};
void main() {
PointLight light;
for (int i = 0; i < pointLights.length(); i++) {
light = pointLights[i];
// etc
}
}
and in my application:
struct PointLightData {
glm::vec3 pos;
float intensity;
};
class PointLight {
// ...
PointLightData data;
// ...
};
std::vector<PointLight*> pointLights;
glGenBuffers(1, &BBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, BBO);
glNamedBufferStorage(BBO, n * sizeof(PointLightData), NULL, GL_DYNAMIC_STORAGE_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, BBO);
...
for (unsigned int i = 0; i < pointLights.size(); i++) {
glNamedBufferSubData(BBO, i * sizeof(PointLightData), sizeof(PointLightData), &(pointLights[i]->data));
}
In this last loop I'm storing a PointLightData struct with an offset equal to its size times the number of them I've already stored (so offset 0 for the first one).
So, as I said, everything seems correct. Binding points are correctly set to the zeroth, I have enough memory allocated for my objects, etc. The graphical results are OK.
Now to my questions. I am using std430 as the layout - in fact, if I change it to std140 as I originally did, it breaks. Why is that? My hypothesis is that the layout generated by std430 for the shader's PointLights buffer block happily matches that generated by the compiler for my application's PointLightData struct (as you can see in that loop I'm blindly storing one after the other). Do you think that's the case?
Now, assuming I'm correct in that assumption, the obvious solution would be to do the mapping for the sizes and offsets myself, querying OpenGL with glGetUniformIndices and glGetActiveUniformsiv (the latter called with GL_UNIFORM_SIZE and GL_UNIFORM_OFFSET), but I have the sneaking suspicion that these two only work with Uniform Blocks and not Buffer Blocks like I'm trying to do. At least, when I do the following, OpenGL throws a tantrum, gives me back a 1281 error (GL_INVALID_VALUE) and returns a very weird number as the indices (something like 3432898282 or whatever):
const char * names[2] = {
"pos", "intensity"
};
GLuint indices[2];
GLint size[2];
GLint offset[2];
glGetUniformIndices(shaderProgram->id, 2, names, indices);
glGetActiveUniformsiv(shaderProgram->id, 2, indices, GL_UNIFORM_SIZE, size);
glGetActiveUniformsiv(shaderProgram->id, 2, indices, GL_UNIFORM_OFFSET, offset);
Am I correct in saying that glGetUniformIndices and glGetActiveUniformsiv do not apply to buffer blocks?
If they do not, or if the fact that it's working is, like I imagine, just a coincidence, how could I do the mapping manually? I checked appendix H of the programming guide and the wording for arrays of structures is somewhat confusing. If I can't query OpenGL for sizes/offsets for what I'm trying to do, I guess I could compute them manually (cumbersome as it is) but I'd appreciate some help there, too.
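For what it's worth, the query functions that do cover buffer blocks are the program interface query ones (OpenGL 4.3+). A hedged sketch of querying the member offsets, reusing shaderProgram->id from the question (the resource names follow the usual reflection naming and may need adjusting):
// glGetUniformIndices / glGetActiveUniformsiv only know about uniform blocks;
// SSBO members are exposed through the GL_BUFFER_VARIABLE program interface.
const char* memberNames[2] = { "pointLights[0].pos", "pointLights[0].intensity" };
GLint memberOffset[2] = { -1, -1 };

for (int i = 0; i < 2; ++i) {
    GLuint index = glGetProgramResourceIndex(shaderProgram->id,
                                             GL_BUFFER_VARIABLE, memberNames[i]);
    const GLenum prop = GL_OFFSET;
    glGetProgramResourceiv(shaderProgram->id, GL_BUFFER_VARIABLE, index,
                           1, &prop, 1, nullptr, &memberOffset[i]);
}
// Querying GL_TOP_LEVEL_ARRAY_STRIDE on either member gives the per-element stride.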
There's an array of uniform blocks in my shader which is defined as such:
layout (std140) uniform LightSourceBlock
{
int shadowMapID;
int type;
vec3 position;
vec4 color;
float dist;
vec3 direction;
float cutoffOuter;
float cutoffInner;
float attenuation;
} LightSources[12];
To be able to bind my buffer objects to each LightSource, I've bound each block array element to its own uniform block binding point:
for(unsigned int i = 0; i < 12; i++)
    glUniformBlockBinding(program, locLightSourceBlock[i], i); // locLightSourceBlock contains the block indices of each element in LightSources[]
When rendering, I'm binding my buffers to the respective index using:
glBindBufferBase(GL_UNIFORM_BUFFER,i,buffer);
This works fine, as long as I only bind a single buffer to the binding index 0. As soon as there's more, everything is pitch black (Even things that use entirely different shaders). (glGetError returns no errors)
If I change the binding index range from 0-11 to 2-13 (skipping index 1), everything works as it should. I figured that if I use index 1, I'm overwriting something, but I don't have any other uniform blocks in my shader, and I'm not using glUniformBlockBinding or glBindBufferBase anywhere else in my code, so I'm not sure.
What could be causing such behavior? Is the index 1 reserved for something?
1) Don't use multiple blocks. Use one block with an array, something like this:
struct Light {
    ...
};

layout(std140, binding = 0) uniform lightBuffer {
    Light lights[42];
};
Skip glUniformBlockBinding and just call glBindBufferBase with the index specified in the shader.
2) Read up on alignment for std140 and std430. In short, buffer variables are aligned so they don't cross 128-bit boundaries. So in your case position would start at byte 16 (not 8). This results in a mismatch between CPU-side and GPU-side access. (Reorder variables or add padding; see the sketch below.)
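For illustration, a CPU-side mirror of the block with explicit padding might look like this (a sketch that assumes default, unaligned glm vector types; the struct name and _pad fields are made up):
#include <cstdint>
#include <glm/glm.hpp>

// std140 offsets of the LightSourceBlock members, reproduced with explicit
// padding so a whole struct can be copied into the uniform buffer as-is.
struct LightSourceStd140 {
    int32_t   shadowMapID;   // offset  0
    int32_t   type;          // offset  4
    int32_t   _pad0[2];      // vec3 below must start on a 16-byte boundary
    glm::vec3 position;      // offset 16
    float     _pad1;         // vec4 below must start on a 16-byte boundary
    glm::vec4 color;         // offset 32
    float     dist;          // offset 48
    float     _pad2[3];      // vec3 below must start on a 16-byte boundary
    glm::vec3 direction;     // offset 64
    float     cutoffOuter;   // offset 76 (a lone float may share the vec3's slot)
    float     cutoffInner;   // offset 80
    float     attenuation;   // offset 84
};
static_assert(sizeof(LightSourceStd140) == 88,
              "assumes default (unaligned) glm vector types");
The implementation may still round GL_UNIFORM_BLOCK_DATA_SIZE up beyond 88 bytes, so it is worth checking that value once when allocating the buffers.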
Suppose I have the following vertex shader code:
#version 330
uniform mat4 ProjectionMatrix, CameraMatrix, SingleModelMatrix;
uniform uint SingleModel;
layout (std140) uniform ModelBlock {
mat4 ModelMatrices[128];
};
void main(void) {
... something that uses ModelMatrices
}
If in my program I issue the following OpenGL call for a program object representing the shader above:
getUniformBlockParameter(GL_UNIFORM_BLOCK_DATA_SIZE)
I find out the following with an Intel HD Graphics 4000 card:
GLUniformBlock[
name: ModelBlock,
blockSize: 8192,
buffer: java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192],
metadata: {
ModelMatrices=GLUniformBlockAttributeMetadata[
name: ModelMatrices,
offset: 0,
arrayStride: 64,
matrixStride: 16,
matrixOrder: 0
]},
uniforms: GLUniformInterface[
uniforms: {
ModelMatrices=GLUFloatMat4 [
name:ModelMatrices,
idx:2,
loc:-1,
size:128,
glType:8b5c]
}
]
]
How should I interpret the blockSize parameter? The OpenGL 3 sdk docs state that:
If pname is GL_UNIFORM_BLOCK_DATA_SIZE, the implementation-dependent minimum total buffer object size, in basic machine units, required to hold all active uniforms in the uniform block identified by uniformBlockIndex is returned. It is neither guaranteed nor expected that a given implementation will arrange uniform values as tightly packed in a buffer object. The exception to this is the std140 uniform block layout, which guarantees specific packing behavior and does not require the application to query for offsets and strides. In this case the minimum size may still be queried, even though it is determined in advance based only on the uniform block declaration.
So in this case how should I measure that 8192 number in blockSize? In floats? In bytes?
If I do the numbers, a 4x4 mat4 uniform has a total of 16 float components, which fit into 64 bytes, so at the very least I need 16 x 4 x 128 bytes to store 128 matrices, which is indeed 8192.
Why then is the hardware also asking for 64 (bytes?) for array stride and 16 (bytes?) of matrix stride, yet requesting only 8192 bytes? Shouldn't getUniformBlockParameter(GL_UNIFORM_BLOCK_DATA_SIZE) have requested more space for alignment/padding purposes?
The array stride is the byte offset from the start of one array element to the start of the next. The array elements in this case are mat4s, which are 64 bytes in size. So the array stride is as small as it can be.
The matrix stride is the byte offset from one column/row of matrix data to the next. Each mat4 has a column or row size of a vec4, which means it is 16 bytes in size. So again, the matrix columns/rows are tightly packed.
So 8KiB is the expected size of such storage.
That being said, this is entirely irrelevant. You shouldn't be bothering to query this stuff at all. By using std140 layout, you have already forced OpenGL to adopt a specific layout. One that notably requires mat4s to have an array stride of 64 bytes and to have a matrix stride of 16 bytes. You don't have to ask the implementation for this; it's required by the standard.
You only need to ask if you're using shared or packed.
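As a hedged sketch of what that means in practice (program stands for the linked program object; glm is assumed for the host-side matrices, since its column-major mat4 layout matches std140 here):
#include <glm/glm.hpp>

// With std140, mat4 ModelMatrices[128] is guaranteed to be 128 column-major
// matrices with a 64-byte array stride and 16-byte matrix stride, which is
// exactly how glm::mat4 is laid out, so the array can be uploaded verbatim.
glm::mat4 modelMatrices[128];

GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferData(GL_UNIFORM_BUFFER, sizeof(modelMatrices), modelMatrices, GL_DYNAMIC_DRAW); // 8192 bytes

GLuint blockIndex = glGetUniformBlockIndex(program, "ModelBlock");
glUniformBlockBinding(program, blockIndex, 0);
glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);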