Only first Compute Shader array element appears updated - c++

I am trying to send an array of integers to a compute shader, set an arbitrary value for each integer, and then read the array back on the CPU/host. The problem is that only the first element of my array gets updated. The array is initialized with all elements = 5 on the CPU, then I try to set all the values to 2 in the compute shader:
C++ Code:
this->numOfElements = std::vector<int>(numOfVoxels, 5); //num of elements for each voxel, all initialized to 5
//Set the reset grid program as current program
glUseProgram(this->resetGridProgHandle);
//Binds and fill the buffer
glBindBuffer(GL_SHADER_STORAGE_BUFFER, this->counterBufferHandle);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(int) * numOfVoxels, this->numOfElements.data(), GL_DYNAMIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, this->counterBufferHandle);
//Flag used in the buffer map function
GLint bufMask = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT;
//calc maximum size for workgroups
//glGetIntegerv(GL_MAX_COMPUTE_WORK_GROUP_SIZE, &result);
//Executes the compute shader
glDispatchCompute(32, 1, 1); //
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
//Gets a pointer to the returned data
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);
//Free the buffer mapping
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
Shader:
#version 430
layout (local_size_x = 32) in;
layout(binding = 0) buffer SSBO{
int counter[];
};
void main(){
counter[gl_WorkGroupID.x * gl_WorkGroupSize.x + gl_LocalInvocationID.x] = 2;
}
If I print returnArray[0] it is 2 (correct), but any index > 0 gives me 5, which is the initial value set on the host.

glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
//Gets a pointer to the returned data
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);
The bit you use for glMemoryBarrier represents the way you want to read the data written by the shader. GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT says "I'm going to read this written data by using the buffer for vertex attribute arrays". In reality, you are going to read the buffer by mapping it.
So you should use the proper barrier bit:
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
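For completeness, a minimal sketch of the corrected read-back sequence, reusing the question's counterBufferHandle and numOfVoxels (and assuming the buffer is still bound to GL_SHADER_STORAGE_BUFFER at this point):
glDispatchCompute(32, 1, 1);
//We read the result by mapping the buffer, so use the matching barrier bit
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY);
//Read returnArray[0 .. numOfVoxels - 1] here, while the mapping is still valid
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);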

OK, I had this problem as well. I changed:
layout(binding = 0) buffer SSBO{
to:
layout(binding = 0, std430) buffer SSBO{
A likely explanation: without an explicit layout qualifier the block defaults to the shared layout, where the implementation may pad the array stride, so the shader's writes can land at different offsets than the tightly packed ints uploaded from the CPU; std430 guarantees that an int array is tightly packed.

Related

OpenGL how to get offset of array element in shared layout uniform block?

I have a shared layout uniform block in a shader:
layout(shared) uniform TestBlock
{
int test[5];
};
How do I get the offset of test[3]?
When I try to use glGetUniformIndices to get the index of test[3], it returns the same index as test[0].
So I cannot use glGetActiveUniformsiv to get the offset of test[3].
How, then, do I get the offset of test[3]?
(Note that I don't want to use the std140 layout.)
Arrays of basic types like int are treated as a single resource; you can't get the offset of an individual element of the array. You can, however, query the array stride, i.e. the number of bytes from one element of the array to the next, and then do the multiplication yourself.
Using the program introspection API:
auto ix = glGetProgramResourceIndex(prog, GL_UNIFORM, "TestBlock.test");
const GLenum props[] = {GL_ARRAY_STRIDE, GL_OFFSET};
GLint values[2] = {};
//Pass the arrays themselves (not their addresses) as the props and params arguments
glGetProgramResourceiv(prog, GL_UNIFORM, ix, 2, props, 2, NULL, values);
auto byteOffset = values[1] + (3 * values[0]); //offset of test[0] plus 3 array strides
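A hypothetical follow-up, assuming a uniform buffer handle ubo bound to TestBlock (ubo and newValue are not part of the original question), that uses byteOffset to update test[3]:
GLint newValue = 42;
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
//Write sizeof(GLint) bytes at the queried offset of test[3]
glBufferSubData(GL_UNIFORM_BUFFER, byteOffset, sizeof(GLint), &newValue);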

Update SSBO in Compute shader

I am currently trying to update an SSBO linked/bound to a compute shader. Doing it this way, I only write the first 32 bytes into out_picture, because I only memcpy that many (sizeof(pstruct)).
Computeshader:
#version 440 core
struct Pstruct{
float picture[1920*1080*3];
float factor;
};
layout(std430, binding = 0) buffer Result{
float out_picture[];
};
layout(std430, binding = 1) buffer In_p1{
Pstruct in_p1;
};
layout(local_size_x = 1000) in;
void main() {
out_picture[gl_GlobalInvocationID.x] = out_picture[gl_GlobalInvocationID.x] +
in_p1.picture[gl_GlobalInvocationID.x] * in_p1.factor;
}
C++ code:
struct Pstruct{
std::vector<float> picture;
float factor;
};
Pstruct tmp;
tmp.factor = 1.0f;
for(int i = 0; i < getNUM_PIX(); i++){
tmp.picture.push_back(5.0f);
}
SSBO ssbo;
glGenBuffers(1, &ssbo.handle);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo.handle);
glBufferData(GL_SHADER_STORAGE_BUFFER, (getNUM_PIX() + 1) * sizeof(float), NULL, GL_DYNAMIC_DRAW);
...
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo.handle);
Pstruct* ptr = (Pstruct *) glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY);
memcpy(ptr, &pstruct, sizeof(pstruct));
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
...
glUseProgram(program);
glDispatchCompute(getNUM_PIX() / getWORK_GROUP_SIZE(), 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
How can I copy both my picture array and my float factor at the same time?
Do I have to split the memcpy call into one for the array and one for the float? And if so, how? I can copy the first part, but I am not allowed to add an offset to the ptr.
First of all,
float picture[1920*1080*3];
clearly should be either a texture (you're only reading from it anyway) or at least an image.
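If you go the texture route, a minimal sketch could look like the following (the texture name tex is an assumption; the pixel data comes from the question's tmp.picture, assumed to hold 1920*1080 RGB floats as in the shader struct):
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
//Upload the 1920x1080 RGB float picture as a single-level texture
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB32F, 1920, 1080, 0, GL_RGB, GL_FLOAT, tmp.picture.data());
//The compute shader would then read the picture through a sampler2D (e.g. with texelFetch)
//instead of through the Pstruct SSBO member.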
Second:
struct Pstruct{
std::vector<float> picture;
float factor;
};
This definition does not match the definition in your shader in any way. The std::vector object is just a small management object that internally references the data storage used by the vector; memcpy'ing it into a GL buffer and passing that to the GPU does not make sense at all.
The correct approach would be either to copy the contents of that vector separately into the appropriate places inside the buffer, or to use a struct definition on the client side which actually matches the one you're using in the shader (taking all the rules of std430 into account). But, as my first point already said, the correct solution here is most likely to use a texture or image object instead.
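A minimal sketch of the copy-the-pieces-separately option, reusing the question's tmp and ssbo and relying on the std430 layout of In_p1 (the float factor sits immediately after the tightly packed float array):
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo.handle);
float* ptr = (float *) glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY);
//Copy the array contents first, then write the trailing factor
memcpy(ptr, tmp.picture.data(), tmp.picture.size() * sizeof(float));
ptr[tmp.picture.size()] = tmp.factor;
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);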

Behaviour of length function on buffer sized not divisible by size of its type

What will the length() function return if the buffer bound to SSBO binding point 0 has a size of 36 bytes (not divisible by the size of uvec4, which is 16)? And what is the rule?
#version 430 core
layout(local_size_x=256) in;
layout(std430, binding=0) buffer B { uvec4 data[]; };
void main() {
uint s = data.length();
//some other code...
}
For a shader storage block, the length() method on the unsized (run-time sized) array that is its last member returns a value of type int, calculated by the following formula:
max((buffer_object_size - offset_of_array) / stride_of_array, 0)
This means if a buffer with a size of 36 bytes is bound to the following shader storage block
layout(std430, binding=0) buffer B { uvec4 data[]; };
then data.length() will return 2.
buffer_object_size = 36
offset_of_array = 0
stride_of_array = 16
max((36 - 0) / 16, 0) = 2
See ARB_shader_storage_buffer_object, Issue (19) (near the end of the document):
In this expression, we allow unsized arrays at the end of shader storage blocks, and allow the ".length()" method to be used to determine the size of such arrays based on the size of the provided buffer object.
The derived array size can be derived by reversing the process described in issue (16):
array.length() =
max((buffer_object_size - offset_of_array) / stride_of_array, 0)
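To make the numbers concrete, a minimal host-side sketch that produces this situation (the buffer name ssbo is an assumption):
GLuint ssbo;
glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
//36 bytes: two full uvec4 elements (32 bytes) plus 4 leftover bytes that are ignored
glBufferData(GL_SHADER_STORAGE_BUFFER, 36, nullptr, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);
//In the shader above, data.length() now evaluates to 2 (the division truncates)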

Find the maximum float in the array

I have a compute shader program which looks for the maximum value in a float array. It uses reduction (compare two values and save the bigger one to the output buffer).
Now I am not quite sure how to run this program from Java code (using JogAmp). In the display() method I run the program once (each time with the halved array in the input SSBO, i.e. the result from the previous iteration) and finish when the result array has only one item - the maximum.
Is this the correct approach: creating and binding the input and output SSBOs, running the shader program, and then checking how many items were returned, every time in the display() method?
Java code:
FloatBuffer inBuffer = Buffers.newDirectFloatBuffer(array);
gl.glBindBuffer(GL3ES3.GL_SHADER_STORAGE_BUFFER, buffersNames.get(1));
gl.glBufferData(GL3ES3.GL_SHADER_STORAGE_BUFFER, itemsCount * Buffers.SIZEOF_FLOAT, inBuffer,
GL3ES3.GL_STREAM_DRAW);
gl.glBindBufferBase(GL3ES3.GL_SHADER_STORAGE_BUFFER, 1, buffersNames.get(1));
gl.glDispatchComputeGroupSizeARB(groupsCount, 1, 1, groupSize, 1, 1);
gl.glMemoryBarrier(GL3ES3.GL_SHADER_STORAGE_BARRIER_BIT);
ByteBuffer output = gl.glMapNamedBuffer(buffersNames.get(1), GL3ES3.GL_READ_ONLY);
Shader code:
#version 430
#extension GL_ARB_compute_variable_group_size : enable
layout (local_size_variable) in;
layout(std430, binding = 1) buffer MyData {
vec4 elements[];
} data;
void main() {
uint index = gl_GlobalInvocationID.x;
float n1 = data.elements[index].x;
float n2 = data.elements[index].y;
float n3 = data.elements[index].z;
float n4 = data.elements[index].w;
data.elements[index].x = max(max(n1, n2), max(n3, n4));
}

Shader storage block name issue

Something weird is happening with my shader storage blocks.
I have 2 SSBs:
#version 450 core
out vec4 out_color;
layout (binding = 0, std430) buffer A_SSB
{
float a_data[];
};
layout (binding = 1, std430) buffer B_SSB
{
float b_data[];
};
void main()
{
a_data[0] = 0.0f;
a_data[1] = 1.0f;
a_data[2] = 2.0f;
a_data[3] = 3.0f;
b_data[0] = 90.0f;
b_data[1] = 81.0f;
b_data[2] = 72.0f;
b_data[3] = 63.0f;
out_color = vec4(0.0f, 0.8f, 1.0f, 1.0f);
}
This works well, but if I swap the SSB names like this:
layout (binding = 0, std430) buffer B_SSB
{
float a_data[];
};
layout (binding = 1, std430) buffer A_SSB
{
float b_data[];
};
the SSB binding indices are swapped even though they are hard-coded, and data which should be written to a_data is written to b_data and vice versa.
Both SSBs are 250 MB large; the maximum size is more than 2 GB. It seems that the indices are allocated alphabetically, but this shouldn't happen. I'm binding the buffers like this:
glCreateBuffers(1, &a_ssb);
glNamedBufferStorage(a_ssb, 7187400 * 9 * sizeof(float), nullptr, GL_MAP_READ_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, a_ssb);
glShaderStorageBlockBinding(test_prog, 0, 0);
glCreateBuffers(1, &b_ssb);
glNamedBufferStorage(b_ssb, 7187400 * 9 * sizeof(float), nullptr, GL_MAP_READ_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, b_ssb);
glShaderStorageBlockBinding(test_prog, 1, 1);
Is this a bug or my fault? I would also like to ask why I'm getting the error "lvalue in array access too complex or possible array index out of bounds" when I assign values in a for loop?
for(unsigned int i = 0; i < 4; ++i)
a_data[i] = float(i);
glShaderStorageBlockBinding(test_prog, 0, 0);
This is your problem.
You assigned the binding index in the shader. You do not need to assign it again.
Your problem comes from the fact that you assigned it incorrectly.
The second parameter to this function is the index of the block you are assigning a binding index to. The only way to get a correct index is to query it via Program Introspection APIs. The block index is the resource index, queried through this call:
auto block_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "A_SSB");
It just so happened, in your original code, that the shader compiler assigned A_SSB's resource index to 0 and B_SSB's resource index to 1. This assignment was probably arbitrarily done based on their names. Thus, when you changed the names on them, the resource indices didn't change. So A_SSB was still resource index 0, but your shader assigned it binding index 1. Which was fine...
Until your C++ code overrode that assignment with your glShaderStorageBlockBinding(test_prog, 0, 0). That assigned resource index 0 (A_SSB) to binding index 0.
You should either set the binding index in the shader or in C++ code. Not in both.
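If you do want to assign the binding points from C++ instead of in the shader, a minimal sketch (reusing the question's test_prog and block names) would first query the resource indices and then bind them:
GLuint a_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "A_SSB");
GLuint b_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "B_SSB");
//Bind each block by its queried resource index, not by a hard-coded 0 or 1
glShaderStorageBlockBinding(test_prog, a_index, 0);
glShaderStorageBlockBinding(test_prog, b_index, 1);
//The layout(binding = ...) qualifiers would then be removed from the shader,
//so that only one side assigns the bindings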