OpenGL 4.5 - Shader storage buffer objects layout - C++

I'm trying my hand at shader storage buffer objects (aka Buffer Blocks) and there are a couple of things I don't fully grasp. What I'm trying to do is to store the (simplified) data of an indeterminate number of lights n in them, so my shader can iterate through them and perform calculations.
Let me start by saying that I get the correct results, and no errors from OpenGL. However, it bothers me not to know why it is working.
So, in my shader, I got the following:
struct PointLight {
    vec3 pos;
    float intensity;
};

layout (std430, binding = 0) buffer PointLights {
    PointLight pointLights[];
};

void main() {
    PointLight light;
    for (int i = 0; i < pointLights.length(); i++) {
        light = pointLights[i];
        // etc
    }
}
and in my application:
struct PointLightData {
    glm::vec3 pos;
    float intensity;
};

class PointLight {
    // ...
    PointLightData data;
    // ...
};
std::vector<PointLight*> pointLights;

glGenBuffers(1, &BBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, BBO);
glNamedBufferStorage(BBO, n * sizeof(PointLightData), NULL, GL_DYNAMIC_STORAGE_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, BBO);

...

for (unsigned int i = 0; i < pointLights.size(); i++) {
    glNamedBufferSubData(BBO, i * sizeof(PointLightData), sizeof(PointLightData), &(pointLights[i]->data));
}
In this last loop I'm storing a PointLightData struct with an offset equal to its size times the number of them I've already stored (so offset 0 for the first one).
So, as I said, everything seems correct. The binding point is correctly set to zero, I have enough memory allocated for my objects, etc. The graphical results are OK.
Now to my questions. I am using std430 as the layout - in fact, if I change it to std140 as I originally did, it breaks. Why is that? My hypothesis is that the layout generated by std430 for the shader's PointLights buffer block happily matches the one generated by the compiler for my application's PointLightData struct (as you can see in that loop, I'm blindly storing one after the other). Do you think that's the case?
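(For what it's worth, this is the kind of compile-time sanity check I have in mind; the 0/12/16 values are the offsets and stride I'd expect std430 to produce for the block above, and offsetof on a struct holding GLM types is only conditionally supported, so treat this as a sketch rather than proof:)

#include <cstddef> // offsetof

// Host-side layout I expect to match the std430 block: pos at 0, intensity at 12, 16-byte stride.
static_assert(offsetof(PointLightData, pos) == 0, "pos offset mismatch");
static_assert(offsetof(PointLightData, intensity) == 12, "intensity offset mismatch");
static_assert(sizeof(PointLightData) == 16, "element stride mismatch");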
Now, assuming I'm correct in that assumption, the obvious solution would be to do the mapping of sizes and offsets myself, querying OpenGL with glGetUniformIndices and glGetActiveUniformsiv (the latter called with GL_UNIFORM_SIZE and GL_UNIFORM_OFFSET), but I have a sneaking suspicion that these two only work with Uniform Blocks and not with Buffer Blocks like the one I'm using. At least, when I do the following, OpenGL throws a tantrum, gives me back a 1281 (GL_INVALID_VALUE) error and returns a very weird number as the indices (something like 3432898282 or whatever):
const char * names[2] = {
    "pos", "intensity"
};
GLuint indices[2];
GLint size[2];
GLint offset[2];

glGetUniformIndices(shaderProgram->id, 2, names, indices);
glGetActiveUniformsiv(shaderProgram->id, 2, indices, GL_UNIFORM_SIZE, size);
glGetActiveUniformsiv(shaderProgram->id, 2, indices, GL_UNIFORM_OFFSET, offset);
Am I correct in saying that glGetUniformIndices and glGetActiveUniformsiv do not apply to buffer blocks?
If they do not, or if the fact that it's working is, as I imagine, just a coincidence, how could I do the mapping manually? I checked appendix H of the programming guide and the wording for arrays of structures is somewhat confusing. If I can't query OpenGL for the sizes/offsets of what I'm trying to do, I guess I could compute them manually (cumbersome as that is), but I'd appreciate some help there, too.
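For the record, the direction I suspect I should be looking in is the program interface query API from GL 4.3 (glGetProgramResourceIndex / glGetProgramResourceiv with the GL_BUFFER_VARIABLE interface). Something along these lines - untested on my part, and the member names passed to the query may need adjusting:

// Query the offsets/strides the compiler actually chose for the buffer block members.
const GLenum props[2] = { GL_OFFSET, GL_TOP_LEVEL_ARRAY_STRIDE };
GLint posParams[2], intensityParams[2];

GLuint posIndex = glGetProgramResourceIndex(shaderProgram->id, GL_BUFFER_VARIABLE, "pointLights[0].pos");
GLuint intensityIndex = glGetProgramResourceIndex(shaderProgram->id, GL_BUFFER_VARIABLE, "pointLights[0].intensity");

glGetProgramResourceiv(shaderProgram->id, GL_BUFFER_VARIABLE, posIndex, 2, props, 2, NULL, posParams);
glGetProgramResourceiv(shaderProgram->id, GL_BUFFER_VARIABLE, intensityIndex, 2, props, 2, NULL, intensityParams);
// posParams[0] / intensityParams[0] = byte offset of the member (for element 0),
// posParams[1] = byte stride between successive PointLight elements.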

Related

Unexpected value upon accessing an SSBO float

I am trying to calculate a morph offset for a gpu driven animation.
To that effect I have the following function (and SSBOs):
layout(std140, binding = 7) buffer morph_buffer
{
    vec4 morph_targets[];
};

layout(std140, binding = 8) buffer morph_weight_buffer
{
    float morph_weights[];
};
vec3 GetMorphOffset()
{
    vec3 offset = vec3(0);
    for (int target_index = 0; target_index < target_count; target_index++)
    {
        float w1 = morph_weights[1];
        offset += w1 * morph_targets[target_index * vertex_count + gl_VertexIndex].xyz;
    }
    return offset;
}
I am seeing strange behaviour, so I opened RenderDoc to trace the state:
As you can see, index 1 of the morph_weights SSBO is 0. However, if I step over in RenderDoc's built-in debugger I obtain:
Or in short, the variable I get back is 1, not 0.
So I did a little experiment and changed one of the values and now the SSBO looks like this:
And now I get this:
So my SSBO of type float is being treated like an SSBO of vec4s, it seems. I am aware of alignment issues with vec3s, but IIRC floats are fair game. What is happening?
After a bit of asking around, the issue turned out to be that the SSBO is declared as std140; the correct layout for a tightly packed float array is std430. Under std140, the elements of a scalar array are padded out to a 16-byte stride, which is exactly why the floats here behave like an array of vec4s.
For the Vulkan GLSL dialect, an alternative is to use the scalar block layout qualifier.
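If, for some reason, the block has to stay std140, another option is to pad the data on the CPU side to the 16-byte element stride. A minimal sketch (the names are placeholders, not from the code above):

#include <vector>

// std140 rounds the element stride of a float[] up to 16 bytes (one vec4 slot),
// so store each weight in the .x of its own slot and leave .yzw as padding.
std::vector<float> weights = { /* one weight per morph target */ };
std::vector<float> std140Weights(weights.size() * 4, 0.0f);
for (size_t i = 0; i < weights.size(); ++i)
    std140Weights[i * 4] = weights[i];

// Upload std140Weights to the morph_weight_buffer SSBO instead of the packed array;
// morph_weights[i] in the shader then reads slot i's .x component, i.e. the i-th weight.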

How do I load multiple structs into a single UBO?

I am following the tutorials on: Here.
I have completed everything up to the model loading chapter, so my code is similar up to that point.
I am now trying to pass another struct to the Uniform Buffer Object, in a similar way to previously shown.
I have created another struct defined outside the application to store the data as follows:
struct Light {
    alignas(16) glm::vec3 position;
    alignas(16) glm::vec3 colour;
};
After doing this, I resized the uniform buffer size in the following way:
void createUniformBuffers() {
    VkDeviceSize bufferSize = sizeof(CameraUBO) + sizeof(Light);
    ...
Next, when creating the descriptor sets, I added the lightBufferInfo below the already defined bufferInfo as shown below:
...
for (size_t i = 0; i < swapChainImages.size(); i++) {
    VkDescriptorBufferInfo bufferInfo = {};
    bufferInfo.buffer = uniformBuffers[i];
    bufferInfo.offset = 0;
    bufferInfo.range = sizeof(UniformBufferObject);

    VkDescriptorBufferInfo lightBufferInfo = {};
    lightBufferInfo.buffer = uniformBuffers[i];
    lightBufferInfo.offset = 0;
    lightBufferInfo.range = sizeof(Light);
    ...
I then added this to the descriptorWrites array:
...
descriptorWrites[2].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[2].dstSet = descriptorSets[i];
descriptorWrites[2].dstBinding = 2;
descriptorWrites[2].dstArrayElement = 0;
descriptorWrites[2].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
descriptorWrites[2].descriptorCount = 1;
descriptorWrites[2].pBufferInfo = &lightBufferInfo;
...
Now similarly to the UniformBufferObject I plan to use the updateUniformBuffer(uint32_t currentImage) function to change the lights position and colour, but first I just tried to set the position to a desired value:
void updateUniformBuffer(uint32_t currentImage) {
    ...
    ubo.proj[1][1] *= -1;

    Light light = {};
    light.position = glm::vec3(0, 10, 10);
    light.colour = glm::vec3(1, 1, 0);

    void* data;
    vkMapMemory(device, uniformBuffersMemory[currentImage], 0, sizeof(ubo), 0, &data);
    memcpy(data, &ubo, sizeof(ubo));
    vkUnmapMemory(device, uniformBuffersMemory[currentImage]);
}
I do not understand how the offset works when trying to pass two objects to a uniform buffer, so I do not know how to copy the light object to uniformBuffersMemory.
How would the offsets be defined in order to get this to work?
A note before reading further: Splitting data for a single UBO into two different structs and descriptors makes passing data a bit more complicated, as all your sizes and write offsets need to be aligned to the minUniformBufferOffsetAlignment limit of your device. If you're starting with Vulkan you may want to split the data either into two UBOs (creating two buffers), or just pass all values as a single struct.
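To make the single-struct option concrete, a minimal sketch (the light members are just examples mirroring the Light struct from the question; the GLSL uniform block would need to declare the same members in the same order):

// One UBO struct carrying both the camera matrices and the light data, so a single
// write at offset 0 updates everything and no offset alignment is required.
struct UniformBufferObject {
    glm::mat4 model;
    glm::mat4 view;
    glm::mat4 proj;
    alignas(16) glm::vec3 lightPosition;
    alignas(16) glm::vec3 lightColour;
};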
But if you want to continue with the way you described in your post:
First you need to properly size your buffer: because your copies need to be aligned to minUniformBufferOffsetAlignment, you probably can't just copy your light data to the area right after your other data. If your device has a minUniformBufferOffsetAlignment of 256 bytes and you want to copy over two host structs, your uniform buffer's size needs to be at least 2 * 256 bytes and not just sizeof(matrices) + sizeof(lights). So you need to adjust your bufferSize (a VkDeviceSize) accordingly.
Next you need to offset your lightBufferInfo VkDescriptorBufferInfo:
lightBufferInfo.offset = std::max<VkDeviceSize>(sizeof(UniformBufferObject), minUniformBufferOffsetAlignment);
This will let your vertex shader know where to start fetching data for that binding.
On most NVIDIA GPUs, for example, minUniformBufferOffsetAlignment is 256 bytes, whereas the size of your Light struct is 32 bytes. So to make this work on such a GPU you have to align at 256 bytes instead of 32.
Inspecting your setup in RenderDoc should then look similar to this:
Note that for more complex allocations and scenarios you'd need to properly get the right alignment size depending on the size of your data structure instead of using a simple max like above.
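One way to do that is a small align-up helper; a minimal sketch (Vulkan guarantees minUniformBufferOffsetAlignment is a power of two, which is what the bit trick relies on):

// Round value up to the next multiple of alignment (alignment must be a power of two).
VkDeviceSize alignUp(VkDeviceSize value, VkDeviceSize alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// e.g. the offset at which the light data starts inside the shared uniform buffer:
VkDeviceSize lightOffset = alignUp(sizeof(UniformBufferObject), minUniformBufferOffsetAlignment);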
And now when updating your uniform buffers you need to map and copy to the proper offset too:
void* mapped = nullptr;

// Copy matrix data to offset 0
vkMapMemory(device, uniformBuffersMemory[currentImage], 0, sizeof(ubo), 0, &mapped);
memcpy(mapped, &ubo, sizeof(ubo));
vkUnmapMemory(device, uniformBuffersMemory[currentImage]);

// Copy light data to its aligned offset
vkMapMemory(device, uniformBuffersMemory[currentImage], std::max<VkDeviceSize>(sizeof(ubo), minUniformBufferOffsetAlignment), sizeof(Light), 0, &mapped);
memcpy(mapped, &light, sizeof(Light));
vkUnmapMemory(device, uniformBuffersMemory[currentImage]);
Note that you may want to only map once after creating the buffers for performance reasons rather than mapping on every update. Just store the offset pointer somewhere in your code.
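A rough sketch of that idea, assuming the memory was allocated with VK_MEMORY_PROPERTY_HOST_COHERENT_BIT (otherwise you would also need vkFlushMappedMemoryRanges after writing):

// Map once, right after creating the buffers, and keep the pointer around.
void* persistentlyMapped = nullptr;
vkMapMemory(device, uniformBuffersMemory[i], 0, VK_WHOLE_SIZE, 0, &persistentlyMapped);

// Per frame: no map/unmap, just copy to the right offsets.
memcpy(persistentlyMapped, &ubo, sizeof(ubo));
memcpy(static_cast<char*>(persistentlyMapped) + lightOffset, &light, sizeof(Light)); // lightOffset as computed above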

OpenGL Shader Storage Buffer / memoryBarrierBuffer

I currently have two SSBOs to handle some lights, because the VS-FS in/out interface can't handle a lot of lights (I'm using forward shading).
For the first one I only pass values to the shader (basically a read-only one) [cpp]:
struct GLightProperties
{
    unsigned int numLights;
    LightProperties properties[];
};

...

glp = (GLightProperties*)malloc(sizeof(GLightProperties) + sizeof(LightProperties) * lastSize);

...

glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(GLightProperties) + sizeof(LightProperties) * lastSize, glp, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
Shader file [GLSL]:
layout(std430, binding = 1) buffer Lights
{
    uint numLights;
    LightProperties properties[];
} lights;
So this first SSBO turns out to work fine. However, the other one, whose purpose is to act as a VS-FS interface, has some issues:
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo2);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(float) * 4 * 3 * lastSize, nullptr, GL_DYNAMIC_COPY);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
GLSL:
struct TangentProperties
{
    vec4 TangentLightPos;
    vec4 TangentViewPos;
    vec4 TangentFragPos;
};

layout(std430, binding = 0) buffer TangentSpace
{
    TangentProperties tangentProperties[];
} tspace;
So here you'll notice I pass nullptr to glBufferData, because the VS will write into the buffer and the FS will read its contents.
Like so in the VS Stage:
for (int i = 0; i < lights.numLights; i++)
{
    tspace.tangentProperties[index].TangentLightPos.xyz = TBN * lights.properties[index].lightPosition.xyz;
    tspace.tangentProperties[index].TangentViewPos.xyz = TBN * camPos;
    tspace.tangentProperties[index].TangentFragPos.xyz = TBN * vec3(worldPosition);
    memoryBarrierBuffer();
}
After this the FS reads the values, which turn out to be just garbage. Am I doing something wrong with memory barriers?
The output turns out this way:
OK, let's get the obvious bug out of the way:
for (int i = 0; i < lights.numLights; i++)
{
    tspace.tangentProperties[index].TangentLightPos.xyz = TBN * lights.properties[index].lightPosition.xyz;
    tspace.tangentProperties[index].TangentViewPos.xyz = TBN * camPos;
    tspace.tangentProperties[index].TangentFragPos.xyz = TBN * vec3(worldPosition);
    memoryBarrierBuffer();
}
index never changes in this loop, so you're only ever writing a single light's values, and only to a single slot of the array. All other lights will have garbage/undefined values.
So you probably meant i rather than index.
But that's only the beginning of the problem. See, if you make that change, you get this:
for (int i = 0; i < lights.numLights; i++)
{
    tspace.tangentProperties[i].TangentLightPos.xyz = TBN * lights.properties[i].lightPosition.xyz;
    tspace.tangentProperties[i].TangentViewPos.xyz = TBN * camPos;
    tspace.tangentProperties[i].TangentFragPos.xyz = TBN * vec3(worldPosition);
}
memoryBarrierBuffer();
Note that the barrier is outside the loop.
That creates a new problem. This code will have every vertex shader invocation writing to the same memory buffer. SSBOs, after all, are not VS output variables. Output variables are stored as part of a vertex. The rasterizer then interpolates this vertex data across the primitive as it rasterizes it, which provides the input values for the FS. So one VS cannot stomp on the output variables of another VS.
That doesn't happen with SSBOs. Every VS is acting on the same SSBO memory. So if they write to the same indices of the same array, they're writing to the same memory address. Which is a race condition (since there can be no synchronization between sibling invocations) and therefore undefined behavior.
So, the only way what you're trying to do could possibly work is if your buffer has numLights entries for each vertex in the entire scene.
This is a fundamentally unreasonable amount of space. Even if you could get it down to just the number of vertices in a particular draw call (which is doable, but I'm not going to say how), you would still be behind in performance. Every FS invocation will have to perform reads of 144 bytes of data for each light (3 table entries, one for each vertex of the triangle), linearly interpolate those values, and then do lighting computations.
It would be faster for you to just pass the TBN matrix as a VS output and do the matrix multiplies in the FS. Yes, that's a lot of matrix multiplies, but GPUs are really fast at matrix multiplies, and are really slow at reading memory.
Also, reconsider whether you need the tangent-space fragment position. Generally speaking, you never do.

GLSL Compute Shader Setting buffer with lookup table results in no data written, setting the same buffer with other data works

I am attempting to implement a slightly modified version of this standard marching cubes algorithm in a compute shader.
I have reached the stage at which triTable is used to insert the correct vertex indices into a buffer and have modified the table to be 1 dimensional (const int triTable[4096]={-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,8,3...})
The following code shows the error that I am experiencing (this does not implement the algorithm however it demonstrates the current issue fully):
layout(binding = 1) buffer Grid
{
    float GridData[]; //contains 512*512*512 data volume previously generated, unused in this test case
};

uniform uint marchableCount;
uniform uint pointCount;

layout(std430, binding = 4) buffer X {uvec4 marchableList[];}; //format is x,y,z,cubeIndex
layout(std430, binding = 5) buffer v {vec4 vertices[];};
layout(std430, binding = 6) buffer n {vec4 normals[];};
layout(binding = 7) uniform atomic_uint triCount;

void main()
{
    uvec3 gid = marchableList[gl_GlobalInvocationID.x].xyz; //xyz of grid cell
    int E = int(edgeTable[marchableList[gl_GlobalInvocationID.x].w]);
    if (E != 0)
    {
        uint cubeIndex = marchableList[gl_GlobalInvocationID.x].w;
        uint index = atomicCounterIncrement(triCount);
        int tCount = 0; //unused in this test, used for iteration in actual algorithm
        int tGet = tCount + 16*int(cubeIndex); //correction from converting 2d array to 1d array
        vertices[index] = vec4(tGet);
    }
}
This code produces the expected values: the vertices buffer is filled with data and the atomic counter increments. Changing this line:
vertices[index] = vec4(tGet);
to
vertices[index] = vec4(triTable[tGet]);
or
vertices[index] = vec4(triTable[tGet]+1);
(demonstrating that triTable is not coincidentally returning zeros)
results in what appears to be a complete failure of the shader: the buffer is filled with zeros and the atomic counter does not increment. No error messages are output when the shader is compiled. tGet is less than 4096.
The following test cases also produce the correct output:
vertices[index] = vec4(triTable[3]); //-1
vertices[index] = vec4(triTable[4095]); //also -1
showing that triTable is in fact implemented correctly
What causes the shader to have issues in these very specific cases?
I'm more surprised that const int triTable[4096] = {...}; compiles at all. That array, if it is actually needed, is 16KB in size. That's a lot for a shader, even if the array lives in shared memory.
What is most likely happening is that, whenever the compiler detects a usage of this array that it can't optimize down to a simple value (triTable[3] will always be -1, so the compiler doesn't need to store the whole table for that access), the compilation either fails or results in a non-functional shader.
It would be best to make this table a uniform buffer. An SSBO might work too, but some hardware implements uniform blocks through specialized memory rather than with a global memory fetch.
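A rough host-side sketch of that suggestion, assuming you keep a copy of the same int triTable[4096] on the CPU (the binding point and names are placeholders; packing the 4096 ints as 1024 ivec4s keeps the std140 buffer at 16 KB, since std140 would otherwise give a plain int array a 16-byte element stride):

// Upload the table once as a uniform buffer instead of a giant const array in the shader.
GLuint triTableUBO;
glGenBuffers(1, &triTableUBO);
glBindBuffer(GL_UNIFORM_BUFFER, triTableUBO);
glBufferData(GL_UNIFORM_BUFFER, sizeof(triTable), triTable, GL_STATIC_DRAW); // 4096 * 4 bytes
glBindBufferBase(GL_UNIFORM_BUFFER, 2, triTableUBO); // binding point 2 chosen arbitrarily

// Matching GLSL declaration and lookup (sketch):
//   layout(std140, binding = 2) uniform TriTable { ivec4 triTablePacked[1024]; };
//   int entry = triTablePacked[tGet >> 2][tGet & 3];   // same as triTable[tGet]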

Getting data back from compute shader

I'm fairly new to OpenGL and I've found myself in a situation where I need to get data back from a compute shader, but as I'm missing some critical knowledge I can't get it to work. So I came here hoping you can give me some hints.
Say I have a compute shader like this:
#version 430 core

struct rmTriangle
{
    vec4 probeCenter;
    vec4 triangles[3];
};

layout(std430, binding = 9) buffer TriangleBuffer {
    rmTriangle triangles[];
} trBuffer;

//other uniforms, variables and stuff

void main()
{
    //here I make some computations and assign values to the
    //trBuffer's triangles array
}
Now I would like to use the trBuffer's data in my application.
I was told to make a shader storage buffer, so this is what I did:
private int ssbo;
gl.glGenBuffers(1, &ssbo);
gl.glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
//just allocate enough amount of memory
gl.glBufferData(GL_SHADER_STORAGE_BUFFER, MAX_TRIANGLES * SIZEOF_TRIANGLE, null, GL_DYNAMIC_READ);
Then this:
int blockIndex = gl.glGetProgramResourceIndex(program, GL_SHADER_STORAGE_BLOCK, name.getBytes(), 0);
if (blockIndex != GL_INVALID_INDEX) {
    gl.glShaderStorageBlockBinding(program, blockIndex, index);
} else {
    System.err.println("Warning: binding " + name + " not found");
}
where name = "TriangleBuffer"
and index = 9
I know how to access the SSBO I created in my application. What I don't know is how to assign/transfer the TriangleBuffer data into my SSBO.
Add glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 9, ssbo);
Also, when I fetch data from SSBOs I do glMapBufferRange and memcpy the stuff I need.
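To expand on that a little, here is a minimal readback sketch in C-style GL (MAX_TRIANGLES and SIZEOF_TRIANGLE are the sizes from the allocation above; trianglesOut is a placeholder for whatever CPU-side destination you use):

// After glDispatchCompute(...), make the shader's SSBO writes visible to buffer mapping.
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);

glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
void* ptr = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, MAX_TRIANGLES * SIZEOF_TRIANGLE, GL_MAP_READ_BIT);
if (ptr) {
    memcpy(trianglesOut, ptr, MAX_TRIANGLES * SIZEOF_TRIANGLE);
    glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
}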