Getting data back from compute shader - opengl

I'm fairly new to OpenGL and I've found myself in a situation where I need to get data back from a compute shader, but I'm missing some critical knowledge and can't get it to work. I'm hoping someone here can give me some hints.
Say I have a compute shader like this:
#version 430 core
struct rmTriangle
{
vec4 probeCenter;
vec4 triangles[3];
};
layout(std430, binding=9) buffer TriangleBuffer {
rmTriangle triangles[];
}trBuffer;
//other uniforms, variables and stuff
void main()
{
//here I make some computations and assign values to the
//trBuffer's triangles array
}
Now I would like to use the trBuffer's data in my application.
I was told to make a shader storage buffer, so this is what I did:
private int ssbo;
gl.glGenBuffers(1, &ssbo);
gl.glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
//just allocate enough amount of memory
gl.glBufferData(GL_SHADER_STORAGE_BUFFER, MAX_TRIANGLES * SIZEOF_TRIANGLE, null, GL_DYNAMIC_READ);
Then this:
int blockIndex = gl.glGetProgramResourceIndex(program,GL_SHADER_STORAGE_BLOCK, name.getBytes(), 0);
if (blockIndex != GL_INVALID_INDEX) {
gl.glShaderStorageBlockBinding(program, blockIndex, index);
} else {
System.err.println("Warning: binding " + name + " not found");
}
where name = "TriangleBuffer"
and index = 9
I know how to access the ssbo I created in my application. What I don't know is how to assign/transfer the TriangleBuffer data into my ssbo.

Add glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 9, ssbo); so that the buffer object is attached to binding point 9, the one the shader block declares.
Also, when I fetch data from SSBOs, I call glMapBufferRange and memcpy the data I need.
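The "map and memcpy" step can be sketched in C++ roughly as follows. This is only an illustration under assumptions: RmTriangle and copyTriangles are hypothetical names, and the mapped pointer is passed in as a plain const void* so the sketch needs no GL context. In a real program that pointer would come from glMapBufferRange on the SSBO, after a glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT) following the dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// CPU-side mirror of the shader's rmTriangle. Under std430 a vec4 is
// 16 bytes with 16-byte alignment, so the struct is 4 * 16 = 64 bytes.
struct RmTriangle {
    float probeCenter[4];
    float triangles[3][4];
};
static_assert(sizeof(RmTriangle) == 64, "must match the std430 layout of rmTriangle");

// Copies `count` triangles out of a mapped buffer. `mapped` stands in for the
// pointer a glMapBufferRange call with GL_MAP_READ_BIT would return.
std::vector<RmTriangle> copyTriangles(const void* mapped, std::size_t count) {
    std::vector<RmTriangle> out(count);
    if (count != 0) {
        std::memcpy(out.data(), mapped, count * sizeof(RmTriangle));
    }
    return out;
}
```

The static_assert also suggests that SIZEOF_TRIANGLE in the question should be 64 bytes, assuming the struct is exactly as shown.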

Related

GLSL Error : Undefined layout buffer variable in compute shader, though it is defined

I'm trying to make a simple compute shader using a Shader Storage Buffer (SSBO) to pass data to the shader. I'm coding in C++ with GLFW3 and GLEW. I'm passing an array of integers into an SSBO, binding it to the index 0, and expecting to retrieve the data in the shader from a layout buffer variable (as explained on various websites). However, I get an unexpected "undefined variable" error on shader compilation concerning this layout buffer variable, though it is clearly declared.
Here is the GLSL code of the compute shader (this script is only at its beginning) :
#version 430
layout (local_size_x = 1, local_size_y = 1, local_size_z = 1) in;
layout (std430, binding = 0) buffer params
{
ivec3 dims;
};
int index(ivec3 coords){
ivec3 dims = params.dims;
return coords.x + dims.y * coords.y + dims.x * dims.y * coords.z;
}
void main() {
ivec3 coords = ivec3(gl_GlobalInvocationID);
int i = index(coords);
}
I get the error : 0(12) : error C1503: undefined variable "params"
Here is the C++ script that sets up and runs the compute shader :
int dimensions[] {width, height, depth};
GLuint paramSSBO;
glGenBuffers(1, &paramSSBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, paramSSBO);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(dimensions), &dimensions, GL_STREAM_READ);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, paramSSBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
GLuint computeShaderID;
GLuint csProgramID;
char* computeSource;
loadShaderSource(computeSource, "compute.glsl");
computeShaderID = glCreateShader(GL_COMPUTE_SHADER);
compileShader(computeShaderID, computeSource);
delete[] computeSource;
csProgramID = glCreateProgram();
glAttachShader(csProgramID, computeShaderID);
glLinkProgram(csProgramID);
glDeleteShader(computeShaderID);
glUseProgram(csProgramID);
glDispatchCompute(width, height, depth);
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
glUseProgram(0);
glDeleteBuffers(1, &paramSSBO);
width, height and depth are int variables defined earlier in the program. I'm binding the dimensions array to the index 0 and I expect to retrieve it in the ivec3 params.dims variable in the shader. However the params variable is said to be undefined when used in the index() function.
This script is just the beginning and I wanted to add a second buffer where the shader would actually write its result, but I'm stuck here. For clarification : in the complete script I expect not to write in any texture (as all online examples show), but write the results in the second buffer from which I will get the data back into a C++ array for further use.
params is not a variable. Nor is it a struct or class. It is the name of an interface block. And the name of an interface block is not really part of GLSL itself; it's part of OpenGL. It's the name used by the OpenGL API to represent that particular block.
You never use an interface block's name in the shader text itself, outside of defining it.
Unless you give your interface block an instance name, the names of all variables within that block are essentially part of the global namespace. Indeed, scoping those names is the whole point of giving the block an instance name.
So the correct way to access the dims field in the interface block is as just "dims", not "params.dims".

OpenGL Shader Storage Buffer / memoryBarrierBuffer

I currently have two SSBOs to handle some lights, because the VS-FS in/out interface can't handle a lot of lights (I'm using forward shading).
For the first one I only pass the values to the shader (basically a read only one) [cpp]:
struct GLightProperties
{
unsigned int numLights;
LightProperties properties[];
};
...
glp = (GLightProperties*)malloc(sizeof(GLightProperties) + sizeof(LightProperties) * lastSize);
...
glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(GLightProperties) + sizeof(LightProperties) * lastSize, glp, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
Shader file [GLSL]:
layout(std430, binding = 1) buffer Lights
{
uint numLights;
LightProperties properties[];
}lights;
So this first SSBO turns out to work fine. However, the other one, whose purpose is to serve as the VS-FS interface, has some issues:
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo2);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(float) * 4 * 3 * lastSize, nullptr, GL_DYNAMIC_COPY);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
GLSL:
struct TangentProperties
{
vec4 TangentLightPos;
vec4 TangentViewPos;
vec4 TangentFragPos;
};
layout(std430, binding = 0) buffer TangentSpace
{
TangentProperties tangentProperties[];
}tspace;
So here you notice I pass nullptr to the glBufferData because the vs will write in the buffer and the fs will read its contents.
Like so in the VS Stage:
for(int i = 0; i < lights.numLights; i++)
{
tspace.tangentProperties[index].TangentLightPos.xyz = TBN * lights.properties[index].lightPosition.xyz;
tspace.tangentProperties[index].TangentViewPos.xyz = TBN * camPos;
tspace.tangentProperties[index].TangentFragPos.xyz = TBN * vec3(worldPosition);
memoryBarrierBuffer();
}
After this the FS reads the values, which turn out to be just garbage. Am I doing something wrong with memory barriers?
The output turns out this way:
OK, let's get the obvious bug out of the way:
for(int i = 0; i < lights.numLights; i++)
{
tspace.tangentProperties[index].TangentLightPos.xyz = TBN * lights.properties[index].lightPosition.xyz;
tspace.tangentProperties[index].TangentViewPos.xyz = TBN * camPos;
tspace.tangentProperties[index].TangentFragPos.xyz = TBN * vec3(worldPosition);
memoryBarrierBuffer();
}
index never changes in this loop, so you're only writing a single light, and you're only writing the last light's values. All other lights will have garbage/undefined values.
So you probably meant i rather than index.
But that's only the beginning of the problem. See, if you make that change, you get this:
for(int i = 0; i < lights.numLights; i++)
{
tspace.tangentProperties[i].TangentLightPos.xyz = TBN * lights.properties[i].lightPosition.xyz;
tspace.tangentProperties[i].TangentViewPos.xyz = TBN * camPos;
tspace.tangentProperties[i].TangentFragPos.xyz = TBN * vec3(worldPosition);
}
memoryBarrierBuffer();
Note that the barrier is outside the loop.
That creates a new problem. This code will have every vertex shader invocation writing to the same memory buffer. SSBOs, after all, are not VS output variables. Output variables are stored as part of a vertex. The rasterizer then interpolates this vertex data across the primitive as it rasterizes it, which provides the input values for the FS. So one VS cannot stomp on the output variables of another VS.
That doesn't happen with SSBOs. Every VS is acting on the same SSBO memory. So if they write to the same indices of the same array, they're writing to the same memory address. Which is a race condition (since there can be no synchronization between sibling invocations) and therefore undefined behavior.
So, the only way what you're trying to do could possibly work is if your buffer has numLights entries for each vertex in the entire scene.
This is a fundamentally unreasonable amount of space. Even if you could get it down to just the number of vertices in a particular draw call (which is doable, but I'm not going to say how), you would still be behind in performance. Every FS invocation will have to perform reads of 144 bytes of data for each light (3 table entries, one for each vertex of the triangle), linearly interpolate those values, and then do lighting computations.
It would be faster for you to just pass the TBN matrix as a VS output and do the matrix multiplies in the FS. Yes, that's a lot of matrix multiplies, but GPUs are really fast at matrix multiplies, and are really slow at reading memory.
Also, reconsider whether you need the tangent-space fragment position. Generally speaking, you never do.
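To make the 144-byte figure and the buffer-size concern concrete, here is a small C++ sketch of the arithmetic. The function names are hypothetical, and the struct is a CPU-side stand-in for the shader's TangentProperties (three vec4s of 16 bytes each under std430).

```cpp
#include <cassert>
#include <cstddef>

// CPU-side mirror of TangentProperties: three vec4s, 16 bytes each.
struct TangentProperties {
    float tangentLightPos[4];
    float tangentViewPos[4];
    float tangentFragPos[4];
};
static_assert(sizeof(TangentProperties) == 48, "three vec4s of 16 bytes");

// Bytes one fragment shader invocation would read per light: one entry
// for each of the triangle's three vertices.
constexpr std::size_t bytesReadPerLightPerFragment() {
    return 3 * sizeof(TangentProperties);
}

// Buffer size needed if every vertex gets its own numLights entries.
constexpr std::size_t perVertexBufferBytes(std::size_t vertexCount,
                                           std::size_t numLights) {
    return vertexCount * numLights * sizeof(TangentProperties);
}
```

For example, a draw call with 100,000 vertices and 16 lights would need perVertexBufferBytes(100000, 16) bytes, which is roughly 77 MB for that one draw.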

OpenGL 4.5 - Shader storage buffer objects layout

I'm trying my hand at shader storage buffer objects (aka Buffer Blocks) and there are a couple of things I don't fully grasp. What I'm trying to do is to store the (simplified) data of an indeterminate number of lights n in them, so my shader can iterate through them and perform calculations.
Let me start by saying that I get the correct results, and no errors from OpenGL. However, it bothers me not to know why it is working.
So, in my shader, I got the following:
struct PointLight {
vec3 pos;
float intensity;
};
layout (std430, binding = 0) buffer PointLights {
PointLight pointLights[];
};
void main() {
PointLight light;
for (int i = 0; i < pointLights.length(); i++) {
light = pointLights[i];
// etc
}
}
and in my application:
struct PointLightData {
glm::vec3 pos;
float intensity;
};
class PointLight {
// ...
PointLightData data;
// ...
};
std::vector<PointLight*> pointLights;
glGenBuffers(1, &BBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, BBO);
glNamedBufferStorage(BBO, n * sizeof(PointLightData), NULL, GL_DYNAMIC_STORAGE_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, BBO);
...
for (unsigned int i = 0; i < pointLights.size(); i++) {
glNamedBufferSubData(BBO, i * sizeof(PointLightData), sizeof(PointLightData), &(pointLights[i]->data));
}
In this last loop I'm storing a PointLightData struct with an offset equal to its size times the number of them I've already stored (so offset 0 for the first one).
So, as I said, everything seems correct. Binding points are correctly set to the zeroth, I have enough memory allocated for my objects, etc. The graphical results are OK.
Now to my questions. I am using std430 as the layout - in fact, if I change it to std140 as I originally did it breaks. Why is that? My hypothesis is that the layout generated by std430 for the shader's PointLights buffer block happily matches that generated by the compiler for my application's PointLightData struct (as you can see in that loop I'm blindly storing one after the other). Do you think that's the case?
Now, assuming I'm correct in that assumption, the obvious solution would be to do the mapping for the sizes and offsets myself, querying OpenGL with glGetUniformIndices and glGetActiveUniformsiv (the latter called with GL_UNIFORM_SIZE and GL_UNIFORM_OFFSET), but I've got a sneaking suspicion that these two guys only work with Uniform Blocks and not Buffer Blocks like I'm trying to do. At least, when I do the following OpenGL throws a tantrum, gives me back a 1281 error and returns a very weird number as the indices (something like 3432898282 or whatever):
const char * names[2] = {
"pos", "intensity"
};
GLuint indices[2];
GLint size[2];
GLint offset[2];
glGetUniformIndices(shaderProgram->id, 2, names, indices);
glGetActiveUniformsiv(shaderProgram->id, 2, indices, GL_UNIFORM_SIZE, size);
glGetActiveUniformsiv(shaderProgram->id, 2, indices, GL_UNIFORM_OFFSET, offset);
Am I correct in saying that glGetUniformIndices and glGetActiveUniformsiv do not apply to buffer blocks?
If they do not, or the fact that it's working is like I imagine just a coincidence, how could I do the mapping manually? I checked appendix H of the programming guide and the wording for array of structures is somewhat confusing. If I can't query OpenGL for sizes/offsets for what I'm trying to do, I guess I could compute them manually (cumbersome as it is) but I'd appreciate some help in there, too.
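One way to check the std430 hypothesis on the application side is to assert the offsets at compile time. The sketch below is only that, a sketch under assumptions: float pos[3] stands in for glm::vec3 (assumed to be three tightly packed floats), and packLights is a hypothetical helper. The expected numbers follow the std430 rules for this particular struct: a vec3 is aligned to 16 bytes but occupies 12, so the float fits at offset 12 and each array element takes 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct PointLightData {
    float pos[3];     // stand-in for glm::vec3
    float intensity;
};

// Offsets and stride expected for the shader's PointLight under std430.
static_assert(offsetof(PointLightData, pos) == 0, "pos offset");
static_assert(offsetof(PointLightData, intensity) == 12, "intensity offset");
static_assert(sizeof(PointLightData) == 16, "array stride");

// Gathers all lights into one contiguous block, so that a single
// glNamedBufferSubData call could replace the per-light upload loop.
std::vector<unsigned char> packLights(const std::vector<PointLightData>& lights) {
    std::vector<unsigned char> bytes(lights.size() * sizeof(PointLightData));
    if (!lights.empty()) {
        std::memcpy(bytes.data(), lights.data(), bytes.size());
    }
    return bytes;
}
```

For querying rather than hard-coding the layout: as far as I know, buffer variables are exposed through the program interface query API (glGetProgramResourceIndex with GL_BUFFER_VARIABLE, then glGetProgramResourceiv with properties such as GL_OFFSET), not through the glGetUniform* family.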

Compute Shader not writing to buffer

I'm looking for help calling a compute shader from Qt using QOpenGLFunctions_4_3_Core OpenGL functions.
Specifically, my call to glDispatchCompute(1024, 1, 1); does not seem to have any effect on the buffer bound to it. How do you bind a buffer to a compute shader in Qt such that the results of the shader can be read back on the C++ side?
I create my program and bind it with (Squircle.cpp):
computeProgram_ = new QOpenGLShaderProgram();
computeProgram_->addShaderFromSourceFile(QOpenGLShader::Compute, "../app/shaders/pointcloud.comp");
computeProgram_->bindAttributeLocation("Particles", 0);
m_ParticlesLoc = 0;
computeProgram_->link();
And then bind my QOpenGLBuffer with (Squircle.cpp):
// Setup our vertex buffer object.
pointOpenGLBuffer_.create();
pointOpenGLBuffer_.bind();
pointOpenGLBuffer_.allocate(pointBuffer_.data(), pointBuffer_.vertexCount() * pointBuffer_.stride_);
Then I invoke the compute shader with (Squircle.cpp):
computeProgram_->bind();
// ...
pointOpenGLBuffer_.bind();
glDispatchCompute(1024, 1, 1);
But when I read my buffer, either with read() or by map()'ing, the values are never changed, they're just what I originally inserted.
From the compute shader perspective, I accept my input with (pointcloud.comp):
#version 430
layout(local_size_x = 1024) in;
struct ParticleData
{
vec4 position;
};
// Particles from previous frame
layout (std430, binding = 0) coherent buffer Particles
{
ParticleData particles[];
} data;
Am I not binding my buffer properly maybe? Or is there another OpenGL command to call to actually dispatch the compute? I've tried different usages, etc.
I've posted all the relevant code here.
The problem seems to be a misunderstanding of how buffer binding works.
pointOpenGLBuffer_.bind();
only binds your buffer to a target in the OpenGL context, not to the shader's buffer binding point; calling it twice won't do the trick.
The second time, instead of just calling bind(), you need to call
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, pointOpenGLBuffer_.bufferId());
where 0 comes from your layout (std430, binding = 0).

Shader storage buffer object with bytes

I am working on a compute shader whose output is written to an SSBO. The consumer of this buffer is CUDA, which expects it to contain unsigned bytes. I currently can't find a way to write one byte per index in an SSBO. With a texture or image, the normalized float to unsigned byte conversion is handled by OpenGL. For example, I can attach a texture with internal format R8 and store one byte per entry. But nothing like this seems possible with an SSBO. Does that mean that, except for the bool data type, all numerical storage types in an SSBO are at least 4 bytes per entry?
Practically speaking I would like to be able to do the following:
Compute shader:
#version 430 core
layout (local_size_x = 8,local_size_y = 8 ) in;
struct SSBOBlock
{
byte mydata;
};
layout(std430,binding = BUFFER_OUTPUT) writeonly buffer bBuffer
{
SSBOBlock Ouput[];
} Out;
void main()
{
//..... Compute shader stuff...
//.......
Out.Ouput[globalIndex].mydata = val;//where val is normalized float
}
The smallest type exposed on GPUs tends to be 32-bit for scalars. Even the boolean type you mentioned is actually 32-bit. The same is often true in languages like C: a boolean needs no more than 1 bit, yet bool is not synonymous with "give me the smallest data type available."
There are intrinsic functions you can use to pack and unpack data types however and I will show an example of how to use them below:
#version 430 core
layout (local_size_x = 8,local_size_y = 8 ) in;
struct SSBOBlock
{
uint mydata;
};
layout(std430,binding = BUFFER_OUTPUT) writeonly buffer bBuffer
{
SSBOBlock Output[];
} Out;
void main()
{
//..... Compute shader stuff...
//.......
Out.Output[globalIndex].mydata = packUnorm4x8(val);
// where val is a vec4 of normalized floats; its four components are packed
// into the single uint stored at globalIndex
}
Your sample shader shows an attempt to write a single scalar to a "byte" data type. That is not possible, so you are going to have to modify this to work with indices that reference a packed group of 4 scalars. In the worst case, this might mean unpacking three values and then re-packing the entire thing just to write one scalar.
This intrinsic function is discussed in the extension specification for GL_ARB_shading_language_packing and is core in GL 4.2 and later.
Even if you were on an implementation that does not support that extension, it is explained in the text of the extension specification exactly what each does. The equivalent operation for packUnorm4x8 is:
uint fixed_val = uint(round(clamp(float_val, 0.0, 1.0) * 255.0));
Some bit-shifts will be necessary to properly pack each component, but those are trivial.
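For reference, the conversion plus the bit-shifts can be sketched on the CPU in C++ as below. This mirrors what packUnorm4x8 is specified to do (first component in the least significant byte, last in the most significant), but toUnorm8 and replaceByte are hypothetical helper names, and rounding of exact .5 cases may differ from a given GLSL implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Converts one normalized float to 8 bits: round(clamp(c, 0, 1) * 255).
std::uint32_t toUnorm8(float c) {
    const float clamped = std::min(std::max(c, 0.0f), 1.0f);
    return static_cast<std::uint32_t>(std::round(clamped * 255.0f));
}

// CPU equivalent of packUnorm4x8: x lands in bits 0-7, w in bits 24-31.
std::uint32_t packUnorm4x8(float x, float y, float z, float w) {
    return toUnorm8(x) | (toUnorm8(y) << 8) | (toUnorm8(z) << 16) | (toUnorm8(w) << 24);
}

// Replaces one byte (byteIndex 0..3) of an already packed word, which is
// the "unpack and re-pack to write one scalar" case done with masks.
std::uint32_t replaceByte(std::uint32_t word, unsigned byteIndex, float c) {
    assert(byteIndex < 4);
    const unsigned shift = byteIndex * 8;
    return (word & ~(0xFFu << shift)) | (toUnorm8(c) << shift);
}
```

With this packing, byte k of the buffer corresponds to component k % 4 of the uint at index k / 4, assuming a little-endian consumer.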
I found a way to write unsigned byte data into a buffer from a compute shader: a buffer texture does the job. It is basically an image texture with a buffer as its storage. This way I can specify the image format as R8, which lets me store a byte-sized value at each index of the buffer.
GLuint _tbo_buffer,_tbo_tex;
glGenBuffers(1, &_tbo_buffer);
glBindBuffer(GL_TEXTURE_BUFFER, _tbo_buffer);
glBufferData(GL_TEXTURE_BUFFER, SCREEN_WIDTH * SCREEN_HEIGHT, NULL, GL_DYNAMIC_COPY);
glGenTextures(1, &_tbo_tex);
glBindTexture(GL_TEXTURE_BUFFER, _tbo_tex);
//attach the TBO to the texture:
glTexBuffer(GL_TEXTURE_BUFFER, GL_R8, _tbo_buffer);
glBindImageTexture(0, _tbo_tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_R8);
Compute shader:
#version 430 core
layout (local_size_x = 8,local_size_y = 8 ) in;
layout(binding=0) uniform sampler2D TEX_IN;
layout(r8) writeonly uniform imageBuffer mybuffer;
void main(){
vec2 texSize = vec2(textureSize(TEX_IN,0));
vec2 uv = vec2(gl_GlobalInvocationID.xy / texSize);
vec4 tex = texture(TEX_IN,uv);
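// row-major index; the row length is assumed to be the total number of invocations along x
uint globalIndex = gl_GlobalInvocationID.y * (gl_NumWorkGroups.x * gl_WorkGroupSize.x) + gl_GlobalInvocationID.x;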
//store only r:
imageStore(mybuffer,int(globalIndex),vec4(0.5,0,0,0));
}
Then we can read byte by byte on CPU or map to CUDA buffer resource:
GLubyte* ptr = (GLubyte*)glMapBuffer(GL_TEXTURE_BUFFER, GL_READ_ONLY);