Meaning of size parameter in SSBO - opengl

I use two SSBOs in a fragment shader. For each fragment, I perform a calculation and, if some condition is met, I write the world-space coordinates of the fragment/pixel (they have been passed on to the fragment shader) to one SSBO and the fragment color to the other. The SSBOs are then read by the application, and those pixels which have been kept in the SSBOs are passed on to the next rendering pass.
The size parameter in
void glBufferData( GLenum target, GLsizeiptr size, const GLvoid * data, GLenum usage);
can have two values for the moment: 2500 or 20000.
For the passes where size = 2500, everything works fine. As soon as size = 20000, most pixels cease to be registered in the SSBOs.
My question: what is the actual meaning of the size parameter? Is it the size of what can be written by each fragment shader invocation (in this case, it would be only one vec4 per SSBO per fragment), or is it the size of what all invocations together can write in each rendering pass (in this case 2500 or 20000 vec4s per SSBO)?

I suppose your SSBOs contain vec4s.
Their size is their total size in bytes (as Reto Koradi said) for one frame, if you reset it every time with glBufferData. A vec4 is 16 bytes (4 bytes per float; floats are 32 bit), so a size of 2500 bytes holds 2500/16 = 156(.25) vec4s, and 20000 bytes holds 1250 vec4s.
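A minimal application-side sketch of sizing the buffer for the whole pass (maxRecords and the binding point are assumptions for illustration, not values from the question):
// Allocate one 16-byte vec4 slot per potential record for this frame.
const GLsizeiptr maxRecords = 20000;                            // number of vec4 records, not bytes
const GLsizeiptr bufferSize = maxRecords * 4 * sizeof(GLfloat); // 16 bytes per vec4
GLuint positionSsbo = 0;
glGenBuffers(1, &positionSsbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, positionSsbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, bufferSize, nullptr, GL_DYNAMIC_COPY); // size is in bytes
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, positionSsbo);
If every fragment invocation may write its own record, maxRecords has to cover the total number of invocations per pass, not the storage needed by a single invocation.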

Related

How to use multiple Uniform Buffer Objects

In my OpenGL ES 3.0 program I need to have two separate Uniform Buffer Objects (UBOs). With just one UBO, things work as expected. The code for that case looks as follows:
GLSL vertex shader:
#version 300 es
layout (std140) uniform MatrixBlock
{
mat4 matrix[200];
};
C++ header file member variables:
GLint _matrixBlockLocation;
GLuint _matrixBuffer;
static constexpr GLuint _matrixBufferBindingPoint = 0;
glm::mat4 _matrixBufferContent[200];
C++ code to initialize the UBO:
_matrixBlockLocation = glGetUniformBlockIndex(_program, "MatrixBlock");
glGenBuffers(1, &_matrixBuffer);
glBindBuffer(GL_UNIFORM_BUFFER, _matrixBuffer);
glBufferData(GL_UNIFORM_BUFFER, 200 * sizeof(glm::mat4), _matrixBufferContent, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, _matrixBufferBindingPoint, _matrixBuffer);
glUniformBlockBinding(_program, _matrixBlockLocation, _matrixBufferBindingPoint);
To update the content of the UBO I modify the _matrixBufferContent array and then call
glBufferSubData(GL_UNIFORM_BUFFER, 0, 200 * sizeof(glm::mat4), _matrixBufferContent);
This works as I expect it. In the vertex shader I can access the matrices and the resulting image is as it should be.
The OpenGL ES 3.0 specification guarantees that the available storage per UBO (GL_MAX_UNIFORM_BLOCK_SIZE) is at least 16K. Because the size of my matrix array comes close to that limit I want to create a second UBO that stores additional data. But as soon as I add that second UBO I encounter problems. Here's the code to create the two UBOs:
GLSL vertex shader:
#version 300 es
layout (std140) uniform MatrixBlock
{
mat4 matrix[200];
};
layout (std140) uniform HighlightingBlock
{
int highlighting[200];
};
C++ header file member variables:
GLint _matrixBlockLocation;
GLint _highlightingBlockLocation;
GLuint _uniformBuffers[2];
static constexpr GLuint _matrixBufferBindingPoint = 0;
static constexpr GLuint _highlightingBufferBindingPoint = 1;
glm::mat4 _matrixBufferContent[200];
int32_t _highlightingBufferContent[200];
C++ code to initialize both UBOs:
_matrixBlockLocation = glGetUniformBlockIndex(_program, "MatrixBlock");
_highlightingBlockLocation = glGetUniformBlockIndex(_program, "HighlightingBlock");
glGenBuffers(2, _uniformBuffers);
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[0]);
glBufferData(GL_UNIFORM_BUFFER, 200 * sizeof(glm::mat4), _matrixBufferContent, GL_DYNAMIC_DRAW);
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[1]);
glBufferData(GL_UNIFORM_BUFFER, 200 * sizeof(int32_t), _highlightingBufferContent, GL_DYNAMIC_DRAW);
glBindBufferBase(GL_UNIFORM_BUFFER, _matrixBufferBindingPoint, _uniformBuffers[0]);
glBindBufferBase(GL_UNIFORM_BUFFER, _highlightingBufferBindingPoint, _uniformBuffers[1]);
glUniformBlockBinding(_program, _matrixBlockLocation, _matrixBufferBindingPoint);
glUniformBlockBinding(_program, _highlightingBlockLocation, _highlightingBufferBindingPoint);
To update the first UBO I still modify the _matrixBufferContent array but then call
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[0]);
glBufferSubData(GL_UNIFORM_BUFFER, 0, 200 * sizeof(glm::mat4), _matrixBufferContent);
To update the second UBO I modify the content of the _highlightingBufferContent array and then call
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[1]);
glBufferSubData(GL_UNIFORM_BUFFER, 0, 200 * sizeof(int32_t), _highlightingBufferContent);
From what I see, the first UBO still works as expected. But the data that I obtain in the vertex shader is not what I originally put into _highlightingBufferContent. If I run this code as WebGL 2.0 code I'm getting the following warning in Google Chrome:
GL_INVALID_OPERATION: It is undefined behaviour to use a uniform buffer that is too small.
In Firefox I'm getting the following:
WebGL warning: drawElementsInstanced: Buffer for uniform block is smaller than UNIFORM_BLOCK_DATA_SIZE.
So the second UBO is somehow not properly mapped. But I'm failing to see where things go wrong. How do I create two separate UBOs and use both of them in the same vertex shader?
Edit
Querying the GL_UNIFORM_BLOCK_DATA_SIZE value that OpenGL expects reveals that the buffer needs to be 4 times bigger than it currently is. Here's how I query the values:
GLint matrixBlock = 0;
GLint highlightingBlock = 0;
glGetActiveUniformBlockiv(_program, _matrixBlockLocation, GL_UNIFORM_BLOCK_DATA_SIZE, &matrixBlock);
glGetActiveUniformBlockiv(_program, _highlightingBlockLocation, GL_UNIFORM_BLOCK_DATA_SIZE, &highlightingBlock);
Essentially, this means that the buffer size must be
200 * sizeof(int32_t) * 4
and not just
200 * sizeof(int32_t)
However, it is not clear to me why that is. I'm putting 32-bit integers into that array, which I'd expect to be 4 bytes in size, but they seem to be 16 bytes long somehow. Not sure yet what is going on.
As hinted at by the edit section of the question and by Beko's comment, there are specific alignment rules associated with OpenGL's std140 layout. The OpenGL ES 3.0 standard specifies the following:
1. If the member is a scalar consuming N basic machine units, the base alignment is N.
2. If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively.
3. If the member is a three-component vector with components consuming N basic machine units, the base alignment is 4N.
4. If the member is an array of scalars or vectors, the base alignment and array stride are set to match the base alignment of a single array element, according to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The array may have padding at the end; the base offset of the member following the array is rounded up to the next multiple of the base alignment.
Note the emphasis "rounded up to the base alignment of a vec4". This means every integer in the array does not simply occupy 4 bytes but instead occupies the size of a vec4 which is 4 times larger.
Therefore, the array must be 4 times the original size. In addition, it is necessary to pad each integer to the corresponding size before copying the array content using glBufferSubData. If that is not done, the data is misaligned and hence gets misinterpreted by the GLSL shader.
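A minimal sketch of one way to apply that padding on the CPU side (the PaddedInt helper below is an assumption for illustration, not code from the question):
// Pad each 32-bit int to 16 bytes to match the std140 array stride of int highlighting[200].
struct PaddedInt { int32_t value; int32_t pad[3]; };
static_assert(sizeof(PaddedInt) == 16, "std140 array stride for a scalar int array element is 16 bytes");
PaddedInt paddedHighlighting[200] = {};
for (int i = 0; i < 200; ++i)
    paddedHighlighting[i].value = _highlightingBufferContent[i];
glBindBuffer(GL_UNIFORM_BUFFER, _uniformBuffers[1]);
glBufferData(GL_UNIFORM_BUFFER, sizeof(paddedHighlighting), paddedHighlighting, GL_DYNAMIC_DRAW); // 200 * 16 bytes
Alternatively, declaring the GLSL array as ivec4 highlighting[50] and packing four ints per element avoids the wasted space entirely.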

Only get garbage from Shader Storage Block?

I have bound the shader storage buffer to the shader storage block like so
GLuint index = glGetProgramResourceIndex(myprogram, GL_SHADER_STORAGE_BLOCK, name);
glShaderStorageBlockBinding(myprogram, index, mybindingpoint);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, mybuffer);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, mybindingpoint, mybuffer, 0, 48);
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, 48, &mydata);
mydata points to a std::vector containing 4 glm::vec3 objects.
Because I bound 48 bytes as the buffer range I expect lights[] to hold 48/(4*3) = 4 vec3s.
layout(std430) buffer light {
vec3 lights[];
};
The element at index 1 in my std::vector holds the data x=1.0, y=1.0, z=1.0.
But viewing the output by doing
gl_FragColor = vec4(lights[1], 1.0);
I see yellow (x=1.0, y=1.0, z=0.0) pixels. This is not what I loaded into the buffer.
Can somebody tell me what I am doing wrong?
EDIT
I just changed the shader storage block to
layout(std430) buffer light {
float lights[];
};
and output
gl_FragColor = vec4(lights[3],lights[4],lights[5],1.0);
and it works (white pixels).
If somebody can explain this, that would still be great.
It's because people don't take this simple advice: never use a vec3 in a UBO/SSBO.
The base alignment of a vec3 is 16 bytes. Always. Therefore, when it is arrayed, the array stride (the number of bytes from one element to the next) is always 16. Exactly the same as a vec4.
Yes, std430 layout is different from std140. But it's not that different. Specifically, it only prevents the base alignment and stride of array elements (and base alignment of structs) from being rounded up to that of a vec4. But since the base alignment of vec3 is always equal to that of a vec4, it changes nothing about them. It only affects scalars and vec2s.
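A minimal sketch of the usual workaround, assuming the shader block is changed to declare vec4 lights[]; so the element stride is 16 bytes on both sides (the data values are just the ones mentioned in the question):
// With vec4 elements, CPU-side and GPU-side strides match: 16 bytes per element.
std::vector<glm::vec4> mydata = {
    {0.0f, 0.0f, 0.0f, 0.0f},
    {1.0f, 1.0f, 1.0f, 0.0f},   // element at index 1 from the question
    {0.0f, 0.0f, 0.0f, 0.0f},
    {0.0f, 0.0f, 0.0f, 0.0f},
};
glBindBuffer(GL_SHADER_STORAGE_BUFFER, mybuffer);
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, mydata.size() * sizeof(glm::vec4), mydata.data());
Note that the bound range then needs 4 * 16 = 64 bytes rather than 48.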

Generating Smooth Normals from active Vertex Array

I'm attempting to hack and modify several rendering features of an old OpenGL fixed-pipeline game by hooking into OpenGL calls, and my current mission is to implement shader lighting. I've already created an appropriate shader program that lights most of my objects correctly, but this game's terrain is drawn with no normal data provided.
The game calls:
void glVertexPointer(GLint size, GLenum type, GLsizei stride, const GLvoid * pointer);
and
void glDrawElements(GLenum mode, GLsizei count, GLenum type, const GLvoid * indices);
to define and draw the terrain, so I have both functions hooked, and I hope to loop through the given vertex array at the pointer and calculate normals for each surface, on either every glDrawElements call or every glVertexPointer call. But I'm having trouble coming up with an approach to do so - specifically, how to read, iterate over, and understand the data at the pointer. In this case, the usual parameters for the glVertexPointer calls are size = 3, type = GL_FLOAT, stride = 16, pointer = some pointer. Hooking glVertexPointer, I don't know how I could iterate through the pointer and grab all the vertices for the mesh, considering I don't know the total count of all the vertices, nor do I understand how the data is structured at the pointer given the stride - and similarly how I should structure the normal array.
Would it be a better idea to try to calculate the normals in glDrawElements for each index specified in the index array?
Depending on how your vertex array is built, the indices may be the only relevant information for building your normals.
Computing an averaged normal for one vertex is simple if you add a normal field to your vertex array and accumulate the per-face normal calculations while parsing your index array.
You then have to divide each normal sum by the number of times the vertex is repeated in the indices, a count you can keep in a temporary array indexed by vertex (incremented each time a normal is added to that vertex).
So, to be more clear:
Vertex[vertexCount]: {Pos, Normal}
normalCount[vertexCount]: int count
Indices[indexCount]: int vertexIndex
You may have up to 6 normals per vertex, so add a temporary array of normals to average them for each vertex:
NormalTemp[vertexCount][6]: {x, y, z}
Then, parsing your index array (if it is triangles):
for i = 0 to indexCount step 3
    for each triangle corner (t from 0 to 2)
        NormalTemp[Indices[i + t]][normalCount[Indices[i + t]]] = face normal, i.e. the cross product of the two edge vectors from this corner to the other two corners of the triangle
        normalCount[Indices[i + t]]++
Then you have to divide your sums by the count:
for i = 0 to vertexCount step 1
    sum = (0, 0, 0)
    for j = 0 to normalCount[i] step 1
        sum += NormalTemp[i][j]
    Vertex[i].Normal = sum / normalCount[i]
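For reference, here is a compact C++ sketch of the same averaging idea, under two assumptions not stated in the question: the hooked glVertexPointer data is interleaved positions with the 16-byte stride mentioned above, and the vertex count can be bounded (for example by the largest index seen in glDrawElements). The function name computeSmoothNormals is hypothetical.
#include <vector>
#include <glm/glm.hpp>

// positions: the pointer passed to glVertexPointer; stride: its byte stride (16 in the question).
std::vector<glm::vec3> computeSmoothNormals(const unsigned char* positions, size_t stride,
                                            const unsigned int* indices, size_t indexCount,
                                            size_t vertexCount)
{
    auto pos = [&](unsigned int i) {
        return *reinterpret_cast<const glm::vec3*>(positions + i * stride);
    };
    std::vector<glm::vec3> normals(vertexCount, glm::vec3(0.0f));
    for (size_t i = 0; i + 2 < indexCount; i += 3) {
        unsigned int a = indices[i], b = indices[i + 1], c = indices[i + 2];
        glm::vec3 faceNormal = glm::cross(pos(b) - pos(a), pos(c) - pos(a));
        normals[a] += faceNormal;  // accumulating unnormalized face normals
        normals[b] += faceNormal;  // implicitly weights larger triangles more
        normals[c] += faceNormal;
    }
    for (auto& n : normals)
        if (glm::length(n) > 0.0f)
            n = glm::normalize(n); // the final normalize replaces the explicit division by the count
    return normals;
}
Normalizing the accumulated sum at the end makes the per-vertex counter from the pseudocode unnecessary, since only the direction of the averaged normal matters.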
While I like and have upvoted j-p's answer, I would still like to point out that you could get away with calculating one normal per face and just using it for all 3 vertices. It would be faster, easier, and sometimes even more accurate.

OpenGL - Calling glBindBufferBase with index = 1 breaks rendering (Pitch black)

There's an array of uniform blocks in my shader which is defined as such:
layout (std140) uniform LightSourceBlock
{
int shadowMapID;
int type;
vec3 position;
vec4 color;
float dist;
vec3 direction;
float cutoffOuter;
float cutoffInner;
float attenuation;
} LightSources[12];
To be able to bind my buffer objects to each LightSource, I've bound each uniform to a uniform block index:
for(unsigned int i=0;i<12;i++)
glUniformBlockBinding(program,locLightSourceBlock[i],i); // locLightSourceBlock contains the locations of each element in LightSources[]
When rendering, I'm binding my buffers to the respective index using:
glBindBufferBase(GL_UNIFORM_BUFFER,i,buffer);
This works fine, as long as I only bind a single buffer to the binding index 0. As soon as there's more, everything is pitch black (Even things that use entirely different shaders). (glGetError returns no errors)
If I change the block indices range from 0-11 to 2-13 (Skipping index 1), everything works as it should. I figured if I use index 1, I'm overwriting something, but I don't have any other uniform blocks in my shader, and I'm not using glUniformBlockBinding or glBindBufferBase anywhere else in my code, so I'm not sure.
What could be causing such behavior? Is the index 1 reserved for something?
1) Don't use multiple blocks. Use one block with an array. Something like this:
struct Light{
...
};
layout(std140, binding=0) uniform lightBuffer{
Light lights[42];
};
Skip glUniformBlockBinding and only call glBindBufferBase with the index specified in the shader.
2) Read up on the alignment rules for std140 and std430. In short, buffer variables are aligned so they don't cross 128-bit boundaries. So in your case position would start at byte 16 (not 8). This results in a mismatch between CPU-side and GPU-side access. (Reorder variables or add padding.)
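For illustration, a possible CPU-side mirror of the question's light layout under std140 (the struct name and padding fields are assumptions derived from the block declaration in the question, not verified against the actual program):
struct LightSourceStd140 {
    int32_t shadowMapID;   // offset  0
    int32_t type;          // offset  4
    float   pad0[2];       // offsets 8..15: vec3 position must start on a 16-byte boundary
    float   position[3];   // offset 16
    float   pad1;          // offset 28
    float   color[4];      // offset 32
    float   dist;          // offset 48
    float   pad2[3];       // offsets 52..63: the next vec3 again aligns to 16
    float   direction[3];  // offset 64
    float   cutoffOuter;   // offset 76
    float   cutoffInner;   // offset 80
    float   attenuation;   // offset 84
};                         // 88 bytes of block data
static_assert(sizeof(LightSourceStd140) == 88, "layout mismatch");
The safe way to confirm the exact offsets and total size is still to query GL_UNIFORM_OFFSET and GL_UNIFORM_BLOCK_DATA_SIZE for the program.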

How to interpret the meaning of glGetActiveUniformBlockiv with GL_UNIFORM_BLOCK_DATA_SIZE

Suppose I have the following vertex shader code:
#version 330
uniform mat4 ProjectionMatrix, CameraMatrix, SingleModelMatrix;
uniform uint SingleModel;
layout (std140) uniform ModelBlock {
mat4 ModelMatrices[128];
};
void main(void) {
... something that uses ModelMatrices
}
If in my program I issue the following OpenGL call for a program object representing the shader above:
getUniformBlockParameter(GL_UNIFORM_BLOCK_DATA_SIZE)
I find out the following with an Intel HD Graphics 4000 card:
GLUniformBlock[
name: ModelBlock,
blockSize: 8192,
buffer: java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192],
metadata: {
ModelMatrices=GLUniformBlockAttributeMetadata[
name: ModelMatrices,
offset: 0,
arrayStride: 64,
matrixStride: 16,
matrixOrder: 0
]},
uniforms: GLUniformInterface[
uniforms: {
ModelMatrices=GLUFloatMat4 [
name:ModelMatrices,
idx:2,
loc:-1,
size:128,
glType:8b5c]
}
]
]
How should I interpret the blockSize parameter? The OpenGL 3 sdk docs state that:
If pname is GL_UNIFORM_BLOCK_DATA_SIZE, the implementation-dependent minimum total buffer object size, in basic machine units, required to hold all active uniforms in the uniform block identified by uniformBlockIndex is returned. It is neither guaranteed nor expected that a given implementation will arrange uniform values as tightly packed in a buffer object. The exception to this is the std140 uniform block layout, which guarantees specific packing behavior and does not require the application to query for offsets and strides. In this case the minimum size may still be queried, even though it is determined in advance based only on the uniform block declaration.
So in this case how should I measure that 8192 number in blockSize? In floats? In bytes?
If I do the numbers, a 4x4 matrix mat4 uniform has a total of 16 float components, which fit into 64 bytes, so at the very least I need 16 x 4 x 128 bytes to store 128 matrices, which is indeed 8192.
Why then is the hardware also asking for 64 (bytes?) for array stride and 16 (bytes?) of matrix stride, yet requesting only 8192 bytes? Shouldn't getUniformBlockParameter(GL_UNIFORM_BLOCK_DATA_SIZE) have requested more space for alignment/padding purposes?
The array stride is the byte offset from the start of one array element to the start of the next. The array elements in this case are mat4s, which are 64 bytes in size. So the array stride is as small as it can be.
The matrix stride is the byte offset from one column/row of matrix data to the next. Each mat4 has a column or row size of a vec4, which means it is 16 bytes in size. So again, the matrix columns/rows are tightly packed.
So 8KiB is the expected size of such storage.
That being said, this is entirely irrelevant. You shouldn't be bothering to query this stuff at all. By using std140 layout, you have already forced OpenGL to adopt a specific layout. One that notably requires mat4s to have an array stride of 64 bytes and to have a matrix stride of 16 bytes. You don't have to ask the implementation for this; it's required by the standard.
You only need to query if you're using the shared or packed layouts.
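As a small sketch of what that guarantee buys you, assuming the 128-matrix block from the question: any single matrix can be updated at a byte offset computed directly from the std140 rules, without querying strides (updateModelMatrix is a hypothetical helper name):
// With std140, ModelMatrices[i] is guaranteed to start at i * 64 bytes into the block.
void updateModelMatrix(GLuint buffer, int i, const glm::mat4& m)
{
    glBindBuffer(GL_UNIFORM_BUFFER, buffer);
    glBufferSubData(GL_UNIFORM_BUFFER, i * sizeof(glm::mat4),   // offset = i * 64
                    sizeof(glm::mat4), glm::value_ptr(m));      // glm::value_ptr from <glm/gtc/type_ptr.hpp>
}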