Update SSBO in compute shader - OpenGL

I am currently trying to update an SSBO linked/bound to a compute shader. Doing it the way shown below, I only write the first 32 bytes into out_picture, because I only memcpy that many (sizeof(pstruct)).
Compute shader:
#version 440 core

struct Pstruct {
    float picture[1920*1080*3];
    float factor;
};

layout(std430, binding = 0) buffer Result {
    float out_picture[];
};

layout(std430, binding = 1) buffer In_p1 {
    Pstruct in_p1;
};

layout(local_size_x = 1000) in;

void main() {
    out_picture[gl_GlobalInvocationID.x] = out_picture[gl_GlobalInvocationID.x] +
        in_p1.picture[gl_GlobalInvocationID.x] * in_p1.factor;
}
C++:
struct Pstruct {
    std::vector<float> picture;
    float factor;
};

Pstruct pstruct;
pstruct.factor = 1.0f;
for (int i = 0; i < getNUM_PIX(); i++) {
    pstruct.picture.push_back(5.0f);
}
SSBO ssbo;
glGenBuffers(1, &ssbo.handle);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo.handle);
glBufferData(GL_SHADER_STORAGE_BUFFER, (getNUM_PIX() + 1) * sizeof(float), NULL, GL_DYNAMIC_DRAW);
...
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo.handle);
Pstruct* ptr = (Pstruct *) glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY);
memcpy(ptr, &pstruct, sizeof(pstruct));
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
...
glUseProgram(program);
glDispatchCompute(getNUM_PIX() / getWORK_GROUP_SIZE(), 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
How can I copy both my picture array and my float factor at the same time?
Do I have to split the memcpy call into array and float? And if so, how? I can copy the first part, but I am not allowed to add an offset to the ptr.

First of all,
float picture[1920*1080*3];
clearly should be either a texture (you're only reading from it anyway) or at least an image.
Second:
struct Pstruct{
std::vector<float> picture;
float factor;
};
This definition does not match the definition in your shader in any way. The std::vector object will just be a meta object internally managing the data storage used by the vector. memcpying that into a GL buffer and passing it to the GPU does not make sense at all.
The correct approach would be either to copy the contents of that vector separately into the appropriate place inside the buffer, or to just use a struct definition on the client side which actually matches the one you're using in the shader (taking all the rules of std430 into account). But, as my first point already suggested, the correct solution here is most likely to use a texture or image object instead.
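For reference, a minimal sketch of the split copy, assuming getNUM_PIX() equals 1920*1080*3 (under std430 a float array is tightly packed, so factor sits directly behind it). Mapping the buffer as a char* makes the byte offset legal:
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo.handle);
char* ptr = (char*) glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY);
// copy the array contents, not the std::vector object itself
memcpy(ptr, pstruct.picture.data(), getNUM_PIX() * sizeof(float));
// the factor lives right behind the tightly packed array
memcpy(ptr + getNUM_PIX() * sizeof(float), &pstruct.factor, sizeof(float));
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);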

Related

Why are my Uniforms and Storage Buffers showing the wrong data in Vulkan using GLSL?

EDIT2: I found the error: the code that creates the buffers was overwriting one of the storage buffers with one of the uniform buffers that I create afterwards, because of a copy-paste error.
So I'm currently trying to adapt the Ray Tracing in One Weekend project (https://raytracing.github.io/) from a CPU program into a compute shader using Vulkan. I'm writing the compute shader in GLSL, which is compiled to SPIR-V.
I send the scene in the form of a struct containing arrays of structs to the GPU as a storage buffer, which looks like this on the CPU (world_gpu being the storage buffer):
struct sphere_gpu
{
    point3 centre;
    float radius;
};

struct material_gpu
{
    vec3 albedo;
    float refraction_index;
    float fuzz;
    uint32_t material_type;
};

struct world_gpu
{
    sphere_gpu spheres[484];
    material_gpu materials[484];
    uint32_t size;
};
and this on the GPU:
// Struct definitions to mirror the CPU representation
struct sphere {
    vec4 centre;
    float radius;
};

struct material {
    vec4 albedo;
    float refraction_index;
    float fuzz;
    uint material_type;
};

// Input scene
layout(std430, binding = 0) buffer world {
    sphere[MAX_SPHERES] spheres;
    material[MAX_SPHERES] materials;
    uint size;
} wrld;
I've already fixed the alignment problem for vec3 on the CPU side by using alignas(16) for my vec3 type (class alignas(16) vec3), and by changing the types in the GPU representation to vec4s as shown above, to match the alignment of the data I'm sending over.
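As a sanity check, a couple of static_asserts on the CPU side can guard this mapping at compile time; a minimal sketch, assuming the struct definitions above (with point3/vec3 being my 16-byte-aligned types):
#include <cstddef> // offsetof
static_assert(sizeof(sphere_gpu) == 32, "must match the std430 stride of sphere[]");
static_assert(sizeof(material_gpu) == 32, "must match the std430 stride of material[]");
static_assert(offsetof(world_gpu, materials) == 484 * sizeof(sphere_gpu),
              "materials[] must start right after spheres[]");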
However, whilst testing this I only seem to be able to read 0s for the spheres when I inspect the data after the compute shader has finished running. (I've hijacked my output pixel array in the shader, writing debug data to it so that I can read it back and check certain things.)
Is there anything obviously stupid that I'm doing here, aside from being a Vulkan noob in general?
EDIT:
Here's my buffer uploading code: set_manual_buffer_data is where the data is actually copied to the buffer; create_manual_buffer is where the buffer and memory themselves are created.
template <typename T>
void set_manual_buffer_data(vk::Device device, vk::Buffer& buffer, vk::DeviceMemory& buffer_memory, T* elements, uint32_t num_elements, uint32_t element_size)
{
    uint32_t size = element_size * num_elements;
    // Get a pointer to the device memory
    void* buffer_ptr = device.mapMemory(buffer_memory, 0, size);
    // Copy data to buffer
    memcpy(buffer_ptr, elements, size);
    device.unmapMemory(buffer_memory);
}
// call with physical_device.getMemoryProperties() for second argument
void create_manual_buffer(vk::Device device, vk::PhysicalDeviceMemoryProperties memory_properties, uint32_t queue_family_index, const uint32_t buffer_size, vk::BufferUsageFlagBits buffer_usage, vk::Buffer& buffer, vk::DeviceMemory& buffer_memory)
{
vk::BufferCreateInfo buffer_create_info{};
buffer_create_info.flags = vk::BufferCreateFlags();
buffer_create_info.size = buffer_size;
buffer_create_info.usage = buffer_usage; // Play with this
buffer_create_info.sharingMode = vk::SharingMode::eExclusive; //concurrent or exclusive
buffer_create_info.pQueueFamilyIndices = &queue_family_index;
buffer_create_info.queueFamilyIndexCount = 1;
buffer = device.createBuffer(buffer_create_info);
vk::MemoryRequirements memory_requirements = device.getBufferMemoryRequirements(buffer);
uint32_t memory_type_index = static_cast<uint32_t>(~0);
vk::DeviceSize memory_heap_size = static_cast<uint32_t>(~0);
for (uint32_t current_memory_type_index = 0; current_memory_type_index < memory_properties.memoryTypeCount; ++current_memory_type_index)
{
// search for desired memory type from the device memory
vk::MemoryType MemoryType = memory_properties.memoryTypes[current_memory_type_index];
if ((vk::MemoryPropertyFlagBits::eHostVisible & MemoryType.propertyFlags) &&
(vk::MemoryPropertyFlagBits::eHostCoherent & MemoryType.propertyFlags))
{
memory_heap_size = memory_properties.memoryHeaps[MemoryType.heapIndex].size;
memory_type_index = current_memory_type_index;
break;
}
}
// Create device memory
vk::MemoryAllocateInfo buffer_allocate_info(memory_requirements.size, memory_type_index);
buffer_memory = device.allocateMemory(buffer_allocate_info);
device.bindBufferMemory(buffer, buffer_memory, 0);
}
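(Aside: this memory-type search never consults memory_requirements.memoryTypeBits, so it could in principle pick a memory type the buffer cannot be bound to. A minimal sketch of the stricter condition, reusing the loop variables from the function above:
if ((memory_requirements.memoryTypeBits & (1u << current_memory_type_index)) &&
    (vk::MemoryPropertyFlagBits::eHostVisible & MemoryType.propertyFlags) &&
    (vk::MemoryPropertyFlagBits::eHostCoherent & MemoryType.propertyFlags))
)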
This code is then called here (I haven't got to the refactoring stage yet, so please forgive the spaghetti):
std::vector<vk::Buffer> uniform_buffers;
std::vector<vk::DeviceMemory> uniform_buffers_memory;
std::vector<vk::Buffer> storage_buffers;
std::vector<vk::DeviceMemory> storage_buffers_memory;

void run_compute(Vulkan_Wrapper &vulkan, Vulkan_Compute &compute, world_gpu *world, color* image, uint32_t image_size, image_info img_info, camera_gpu camera_gpu)
{
    vulkan.init();

    uniform_buffers.resize(2);
    uniform_buffers_memory.resize(2);
    storage_buffers.resize(2);
    storage_buffers_memory.resize(2);

    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(world_gpu),
                                vk::BufferUsageFlagBits::eStorageBuffer, storage_buffers[0],
                                storage_buffers_memory[0]);
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, image_size * sizeof(color),
                                vk::BufferUsageFlagBits::eStorageBuffer, storage_buffers[1],
                                storage_buffers_memory[1]);

    vulkan.set_manual_buffer_data(vulkan.m_device, storage_buffers[0], storage_buffers_memory[0], world, 1, sizeof(world_gpu));
    vulkan.set_manual_buffer_data(vulkan.m_device, storage_buffers[1], storage_buffers_memory[1], image, image_size, sizeof(color));
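    // NOTE (cf. EDIT2 above): the next call passes storage_buffers[0] where
    // uniform_buffers[0] was intended -- this is the copy-paste error that
    // was overwriting the scene storage buffer.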
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(image_info),
                                vk::BufferUsageFlagBits::eUniformBuffer, storage_buffers[0],
                                uniform_buffers_memory[0]);
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(camera_gpu),
                                vk::BufferUsageFlagBits::eUniformBuffer, uniform_buffers[1],
                                uniform_buffers_memory[1]);

    vulkan.set_manual_buffer_data(vulkan.m_device, uniform_buffers[0], uniform_buffers_memory[0], &img_info, 1, sizeof(img_info));
    vulkan.set_manual_buffer_data(vulkan.m_device, uniform_buffers[1], uniform_buffers_memory[1], &camera_gpu, 1, sizeof(camera_gpu));

    // Run pipeline etc
I should note that it works perfectly fine when I check the values stored in the image storage buffer (storage_buffers_memory[1]); it's the other 3 buffers that are giving me difficulties.

Why do I get junk data when sending a structure to a shader storage buffer?

I am working on an OpenGL-based ray tracer application in C++. I would like to send data from the CPU side to the fragment shader. The data is a bounding volume hierarchy (BVH) tree.
Unfortunately, on the shader side I get junk data when I try to write out the coordinates as color with gl_FragColor. I don't know why. Any help is appreciated.
I have this structure on the CPU side:
struct FlatBvhNode {
    glm::vec4 min;
    glm::vec4 max;
    int order;
    bool isLeaf;
    bool createdEmpty;
    vector<glm::vec4> indices;

    FlatBvhNode() { }
    FlatBvhNode(glm::vec3 min, glm::vec3 max, int ind, bool isLeaf, bool createdEmpty, vector<glm::vec3> indices) {
        this->min = glm::vec4(min.x, min.y, min.z, 1.0f);
        this->max = glm::vec4(max.x, max.y, max.z, 1.0f);
        this->order = ind;
        this->isLeaf = isLeaf;
        this->createdEmpty = createdEmpty;
        this->indices.reserve(2);
        for (int i = 0; i < indices.size(); i++) {
            this->indices.push_back(glm::vec4(indices.at(i).x, indices.at(i).y, indices.at(i).z, 1));
        }
    }
};
I would like to receive the above structure in the fragment shader as:
struct FlatBvhNode {     // base alignment  aligned offset
    vec3 min;            // 16 byte         0
    vec3 max;            // 16 byte         16
    int order;           // 4 byte          20
    bool isLeaf;         // 4 byte          24
    bool createdEmpty;   // 4 byte          28
    vec3 indices[2];     // 32 byte         32
};
I am not sure that the aligned offsets are right. I have read that vec3s are treated as 16 bytes, and booleans and integers as 4 bytes. Also, the aligned offset has to be equal to, or a multiple of, the base alignment.
I am sending the above structure with this code:
vector<BvhNode::FlatBvhNode>* nodeArrays=bvhNode.putNodeIntoArray();
unsigned int nodesArraytoSendtoShader;
glGenBuffers(1, &nodesArraytoSendtoShader);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, nodesArraytoSendtoShader);
glBufferData(GL_SHADER_STORAGE_BUFFER, nodeArrays->size() * sizeof(BvhNode::FlatBvhNode), nodeArrays, GL_STATIC_DRAW);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 2, nodesArraytoSendtoShader, 0, nodeArrays->size() * sizeof(BvhNode::FlatBvhNode));
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
When you declare the shader storage buffer block, you have to use the std430 qualifier.
See OpenGL 4.6 API Core Profile Specification; 7.6.2.2 Standard Uniform Block Layout.
If you want to use the following data structure in the shader:
struct FlatBvhNode
{                        // base alignment  aligned offset
    vec4 min;            // 16 byte         0
    vec4 max;            // 16 byte         16
    int order;           // 4 byte          32
    int isLeaf;          // 4 byte          36
    int createdEmpty;    // 4 byte          40
    vec4 indices[2];     // 32 byte         48
};

layout(std430) buffer TNodes
{
    FlatBvhNode nodes[];
};
then you have to consider that indices is an array of type vec4, so indices is aligned to 16 bytes.
This corresponds to the following data structure in C++:
struct FlatBvhNode {
    glm::vec4 min;
    glm::vec4 max;
    int order;
    int isLeaf;
    int createdEmpty;
    int dummy; // padding so the following vec4 array starts at a 16-byte boundary
    std::array<glm::vec4, 2> indices;
};
Note that while std::vector encapsulates a dynamically sized array, std::array encapsulates a fixed-size array. std::array<glm::vec4, 2> corresponds to glm::vec4[2], but std::vector<glm::vec4> is more like a glm::vec4* plus some additional size information.
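The match can also be asserted at compile time; a small sketch, assuming the C++ struct above (and default, unaligned glm types):
#include <cstddef> // offsetof
static_assert(sizeof(FlatBvhNode) == 80, "must equal the std430 stride of nodes[]");
static_assert(offsetof(FlatBvhNode, indices) == 48, "indices must sit at byte offset 48");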

Find the maximum float in the array

I have a compute shader program which looks for the maximum value in a float array. It uses reduction (it compares two values and saves the bigger one to the output buffer).
Now I am not quite sure how to run this program from the Java code (using jogamp). In the display() method I run the program once (each time with the halved array in the input SSBO, i.e. the result from the previous iteration) and finish when the result array has only one item - the maximum.
Is this the correct method? Creating and binding the input and output SSBOs in the display() method every time, running the shader program, and then checking how many items were returned?
Java code:
FloatBuffer inBuffer = Buffers.newDirectFloatBuffer(array);
gl.glBindBuffer(GL3ES3.GL_SHADER_STORAGE_BUFFER, buffersNames.get(1));
gl.glBufferData(GL3ES3.GL_SHADER_STORAGE_BUFFER, itemsCount * Buffers.SIZEOF_FLOAT, inBuffer,
GL3ES3.GL_STREAM_DRAW);
gl.glBindBufferBase(GL3ES3.GL_SHADER_STORAGE_BUFFER, 1, buffersNames.get(1));
gl.glDispatchComputeGroupSizeARB(groupsCount, 1, 1, groupSize, 1, 1);
gl.glMemoryBarrier(GL3ES3.GL_SHADER_STORAGE_BARRIER_BIT);
ByteBuffer output = gl.glMapNamedBuffer(buffersNames.get(1), GL3ES3.GL_READ_ONLY);
Shader code:
#version 430
#extension GL_ARB_compute_variable_group_size : enable

layout (local_size_variable) in;

layout(std430, binding = 1) buffer MyData {
    vec4 elements[];
} data;

void main() {
    uint index = gl_GlobalInvocationID.x;
    float n1 = data.elements[index].x;
    float n2 = data.elements[index].y;
    float n3 = data.elements[index].z;
    float n4 = data.elements[index].w;
    data.elements[index].x = max(max(n1, n2), max(n3, n4));
}
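For illustration, a C-style sketch of that driving loop (the jogamp calls map one-to-one; ssbo stands for buffersNames.get(1), and itemsCount is assumed to be a power of 4). Each pass folds one vec4 into one float, so the element count shrinks by 4x per iteration:
GLuint count = itemsCount;              // floats currently in the buffer
while (count > 1) {
    GLuint vec4s  = count / 4;          // one invocation per vec4
    GLuint groups = (vec4s + groupSize - 1) / groupSize;
    glDispatchComputeGroupSizeARB(groups, 1, 1, groupSize, 1, 1);
    glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT); // next read is via mapping
    // compact the per-vec4 maxima (stored in each .x) to the front
    float* p = (float*) glMapNamedBuffer(ssbo, GL_READ_WRITE);
    for (GLuint i = 0; i < vec4s; ++i) p[i] = p[4 * i];
    glUnmapNamedBuffer(ssbo);
    count = vec4s;
}
// the maximum now sits in the first float of the buffer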

Shader storage block name issue

Something weird is happening with my shader storage blocks.
I have 2 SSBs:
#version 450 core
out vec4 out_color;

layout (binding = 0, std430) buffer A_SSB
{
    float a_data[];
};

layout (binding = 1, std430) buffer B_SSB
{
    float b_data[];
};

void main()
{
    a_data[0] = 0.0f;
    a_data[1] = 1.0f;
    a_data[2] = 2.0f;
    a_data[3] = 3.0f;
    b_data[0] = 90.0f;
    b_data[1] = 81.0f;
    b_data[2] = 72.0f;
    b_data[3] = 63.0f;
    out_color = vec4(0.0f, 0.8f, 1.0f, 1.0f);
}
This is working well, but if I swap the SSB names like this:
layout (binding = 0, std430) buffer B_SSB
{
    float a_data[];
};

layout (binding = 1, std430) buffer A_SSB
{
    float b_data[];
};
the SSB indices are swapped although they are hardcoded, and data which should be written to a_data is written to b_data and vice versa.
Both SSBs are 250MB large; the max size is more than 2GB. It seems that the indices are allocated alphabetically, but this shouldn't happen. I'm binding the buffers like this:
glCreateBuffers(1, &a_ssb);
glNamedBufferStorage(a_ssb, 7187400 * 9 * sizeof(float), nullptr, GL_MAP_READ_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, a_ssb);
glShaderStorageBlockBinding(test_prog, 0, 0);
glCreateBuffers(1, &b_ssb);
glNamedBufferStorage(b_ssb, 7187400 * 9 * sizeof(float), nullptr, GL_MAP_READ_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, b_ssb);
glShaderStorageBlockBinding(test_prog, 1, 1);
Is this a bug or my fault? Also, I would like to ask why I'm getting the error "lvalue in array access too complex or possible array index out of bounds" if I'm assigning values in a for loop:
for (unsigned int i = 0; i < 4; ++i)
    a_data[i] = float(i);
glShaderStorageBlockBinding(test_prog, 0, 0);
This is your problem.
You assigned the binding index in the shader. You do not need to assign it again.
Your problem comes from the fact that you assigned it incorrectly.
The second parameter to this function is the index of the block you are assigning a binding index to. The only way to get a correct index is to query it via Program Introspection APIs. The block index is the resource index, queried through this call:
auto block_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "A_SSB");
It just so happened, in your original code, that the shader compiler assigned A_SSB's resource index to 0 and B_SSB's resource index to 1. This assignment was probably arbitrarily done based on their names. Thus, when you changed the names on them, the resource indices didn't change. So A_SSB was still resource index 0, but your shader assigned it binding index 1. Which was fine...
Until your C++ code overrode that assignment with your glShaderStorageBlockBinding(test_prog, 0, 0). That assigned resource index 0 (A_SSB) to binding index 0.
You should either set the binding index in the shader or in C++ code. Not in both.
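If you do want to assign binding indices from C++, query each block's resource index first instead of hardcoding it; a minimal sketch:
auto a_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "A_SSB");
auto b_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "B_SSB");
glShaderStorageBlockBinding(test_prog, a_index, 0); // A_SSB -> binding point 0
glShaderStorageBlockBinding(test_prog, b_index, 1); // B_SSB -> binding point 1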

Only first Compute Shader array element appears updated

I'm trying to send an array of integers to a compute shader, set an arbitrary value for each integer, and then read them back on the CPU/host. The problem is that only the first element of my array gets updated. My array is initialized with all elements = 5 on the CPU; then I try to set all the values to 2 in the compute shader:
C++ Code:
this->numOfElements = std::vector<int>(numOfVoxels, 5); // num of elements for each voxel, all initialized to 5
//Set the reset grid program as current program
glUseProgram(this->resetGridProgHandle);
//Binds and fill the buffer
glBindBuffer(GL_SHADER_STORAGE_BUFFER, this->counterBufferHandle);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(int) * numOfVoxels, this->numOfElements.data(), GL_DYNAMIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, this->counterBufferHandle);
//Flag used in the buffer map function
GLint bufMask = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT;
//calc maximum size for workgroups
//glGetIntegerv(GL_MAX_COMPUTE_WORK_GROUP_SIZE, &result);
//Executes the compute shader
glDispatchCompute(32, 1, 1); //
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
//Gets a pointer to the returned data
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);
//Free the buffer mapping
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
Shader:
#version 430

layout (local_size_x = 32) in;

layout(binding = 0) buffer SSBO {
    int counter[];
};

void main() {
    counter[gl_WorkGroupID.x * gl_WorkGroupSize.x + gl_LocalInvocationID.x] = 2;
}
If I print returnArray[0] it's 2 (correct), but any index > 0 gives me 5, which is the initial value set on the host.
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
//Gets a pointer to the returned data
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);
The bit you pass to glMemoryBarrier represents the way you want to read the data written by the shader. GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT says "I'm going to read the written data by using the buffer as a vertex attribute array". In reality, you are going to read the buffer by mapping it.
So you should use the proper barrier bit:
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
OK, I had this problem as well. I changed:
layout(binding = 0) buffer SSBO{
to:
layout(binding = 0, std430) buffer SSBO{
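If you want to see what the compiler actually did without the qualifier, you can query the stride it chose for counter; a minimal sketch, assuming prog is the compute program object:
GLuint idx = glGetProgramResourceIndex(prog, GL_BUFFER_VARIABLE, "counter");
const GLenum prop = GL_ARRAY_STRIDE;
GLint stride = 0;
glGetProgramResourceiv(prog, GL_BUFFER_VARIABLE, idx, 1, &prop, 1, NULL, &stride);
// std430 guarantees a stride of 4 for an int array; the default (shared)
// layout may pad elements, leaving gaps between the ints the shader writes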