Say I have some struct MyUniform:
struct MyUniform {
    /*...*/
};
and I have an array of 10 of them on the host, declared in C like:
MyUniform my_uniform_data[10] = /*...*/;
and I want to access all ten (as an array) from a shader.
VkDescriptorSetLayoutBinding has a field descriptorCount:
descriptorCount is the number of descriptors contained in the binding, accessed in a shader as
an array.
So I assume at least one way to get this array into a shader is to set descriptorCount to 10, and then in GLSL I would be able to write:
layout(set = 2, binding = 4) uniform MyUniform {
    /*...*/
} my_uniform_data[10];
Now, when writing the VkDescriptorSet for this, I would have a single buffer of 10 * sizeof(MyUniform) bytes. So in a VkWriteDescriptorSet I would also set descriptorCount to 10, and I would have to create an array of 10 VkDescriptorBufferInfo:
VkDescriptorBufferInfo bi[10];
for (int i = 0; i < 10; i++) {
    bi[i].buffer = my_buffer;
    bi[i].offset = i * sizeof(MyUniform);
    bi[i].range  = sizeof(MyUniform);
}
This kind of arrangement clearly accommodates the case where each array element comes from a different buffer or offset.
Is there a way to arrange the descriptor layout and updating such that the entire array is written with a single descriptor?
Or is the only way to update a GLSL uniform array to use multiple descriptors in this fashion?
Descriptor arrays create arrays of descriptors (for example, buffers). But what you need is an array of structs inside a single buffer:
struct myData {
    /*...*/
};

layout(set = 2, binding = 4) uniform myUniform {
    // uniform blocks cannot end in a run-time sized array, so give it an explicit size
    myData data[10];
};
And remember about alignment: uniform blocks follow std140 layout rules by default.
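For illustration, a minimal sketch of the single-descriptor update (my_descriptor_set and device are placeholder handles; the buffer is assumed to hold the 10 structs packed according to the block's layout rules):

VkDescriptorBufferInfo bi{};
bi.buffer = my_buffer;
bi.offset = 0;
bi.range  = 10 * sizeof(MyUniform); // the whole array, one descriptor

VkWriteDescriptorSet write{};
write.sType           = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
write.dstSet          = my_descriptor_set;
write.dstBinding      = 4;
write.dstArrayElement = 0;
write.descriptorCount = 1; // one descriptor, not ten
write.descriptorType  = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
write.pBufferInfo     = &bi;
vkUpdateDescriptorSets(device, 1, &write, 0, nullptr);

The matching VkDescriptorSetLayoutBinding then also has descriptorCount = 1.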
EDIT2: I found the error: the code that creates the buffers was overwriting one of the storage buffers with one of the uniform buffers I create afterwards, because of a copy-paste error.
So I'm currently trying to adapt the Ray Tracing in One Weekend project (https://raytracing.github.io/) from a CPU program into a compute shader using Vulkan. I'm writing the compute shader in GLSL, which is compiled to SPIR-V.
I send the scene to the GPU as a storage buffer holding a struct that contains arrays of structs, which looks like this on the CPU (world_gpu being the storage buffer):
struct sphere_gpu
{
    point3 centre;
    float radius;
};

struct material_gpu
{
    vec3 albedo;
    float refraction_index;
    float fuzz;
    uint32_t material_type;
};

struct world_gpu
{
    sphere_gpu spheres[484];
    material_gpu materials[484];
    uint32_t size;
};
and this on the GPU:
// Struct definitions to mirror the CPU representation
struct sphere {
    vec4 centre;
    float radius;
};

struct material {
    vec4 albedo;
    float refraction_index;
    float fuzz;
    uint material_type;
};

// Input scene
layout(std430, binding = 0) buffer world {
    sphere[MAX_SPHERES] spheres;
    material[MAX_SPHERES] materials;
    uint size;
} wrld;
I've already fixed the alignment problem for vec3 on the CPU side by declaring my vec3 type as class alignas(16) vec3, and by changing the GPU representation to use vec4, as shown above, so that it matches the alignment of the data I'm sending over.
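As a sanity check that the two layouts agree, the CPU-side sizes and offsets can be asserted at compile time. A sketch (assuming point3 is an alias of the alignas(16) vec3, so each struct is padded to 32 bytes, matching the std430 array strides):

#include <cstddef> // offsetof

static_assert(sizeof(sphere_gpu) == 32, "must match the std430 array stride of sphere");
static_assert(sizeof(material_gpu) == 32, "must match the std430 array stride of material");
static_assert(offsetof(world_gpu, materials) == 484 * sizeof(sphere_gpu), "materials must start where the shader expects");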
However, whilst testing this I only seem to be able to read 0s for the spheres when I inspect the data after the compute shader has finished running. (I've hijacked my output pixel array in the shader, writing debug data to it so that I can read it back and debug certain things.)
Is there anything obviously stupid that I'm doing here, aside from being a Vulkan noob in general?
EDIT:
Here's my buffer uploading code. set_manual_buffer_data is where the data is actually copied to the buffer; create_manual_buffer is where the buffer and the memory itself are created.
template <typename T>
void set_manual_buffer_data(vk::Device device, vk::Buffer& buffer, vk::DeviceMemory& buffer_memory, T* elements, uint32_t num_elements, uint32_t element_size)
{
    uint32_t size = element_size * num_elements;
    // Get a pointer to the device memory
    void* buffer_ptr = device.mapMemory(buffer_memory, 0, size);
    // Copy data to buffer
    memcpy(buffer_ptr, elements, size);
    device.unmapMemory(buffer_memory);
}
// call with physical_device.getMemoryProperties() for the second argument
void create_manual_buffer(vk::Device device, vk::PhysicalDeviceMemoryProperties memory_properties, uint32_t queue_family_index, const uint32_t buffer_size, vk::BufferUsageFlagBits buffer_usage, vk::Buffer& buffer, vk::DeviceMemory& buffer_memory)
{
    vk::BufferCreateInfo buffer_create_info{};
    buffer_create_info.flags = vk::BufferCreateFlags();
    buffer_create_info.size = buffer_size;
    buffer_create_info.usage = buffer_usage; // Play with this
    buffer_create_info.sharingMode = vk::SharingMode::eExclusive; // concurrent or exclusive
    buffer_create_info.pQueueFamilyIndices = &queue_family_index;
    buffer_create_info.queueFamilyIndexCount = 1;
    buffer = device.createBuffer(buffer_create_info);

    vk::MemoryRequirements memory_requirements = device.getBufferMemoryRequirements(buffer);
    uint32_t memory_type_index = static_cast<uint32_t>(~0);
    vk::DeviceSize memory_heap_size = static_cast<uint32_t>(~0);
    for (uint32_t current_memory_type_index = 0; current_memory_type_index < memory_properties.memoryTypeCount; ++current_memory_type_index)
    {
        // search for the desired memory type among the device's memory types
        // (note: memory_requirements.memoryTypeBits should also be checked here)
        vk::MemoryType memory_type = memory_properties.memoryTypes[current_memory_type_index];
        if ((vk::MemoryPropertyFlagBits::eHostVisible & memory_type.propertyFlags) &&
            (vk::MemoryPropertyFlagBits::eHostCoherent & memory_type.propertyFlags))
        {
            memory_heap_size = memory_properties.memoryHeaps[memory_type.heapIndex].size;
            memory_type_index = current_memory_type_index;
            break;
        }
    }

    // Create device memory
    vk::MemoryAllocateInfo buffer_allocate_info(memory_requirements.size, memory_type_index);
    buffer_memory = device.allocateMemory(buffer_allocate_info);
    device.bindBufferMemory(buffer, buffer_memory, 0);
}
This code is then called here (I haven't got to the refactoring stage yet so please forgive the spaghetti):
std::vector<vk::Buffer> uniform_buffers;
std::vector<vk::DeviceMemory> uniform_buffers_memory;
std::vector<vk::Buffer> storage_buffers;
std::vector<vk::DeviceMemory> storage_buffers_memory;

void run_compute(Vulkan_Wrapper &vulkan, Vulkan_Compute &compute, world_gpu *world, color* image, uint32_t image_size, image_info img_info, camera_gpu camera_gpu)
{
    vulkan.init();

    uniform_buffers.resize(2);
    uniform_buffers_memory.resize(2);
    storage_buffers.resize(2);
    storage_buffers_memory.resize(2);

    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(world_gpu),
                                vk::BufferUsageFlagBits::eStorageBuffer, storage_buffers[0],
                                storage_buffers_memory[0]);
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, image_size * sizeof(color),
                                vk::BufferUsageFlagBits::eStorageBuffer, storage_buffers[1],
                                storage_buffers_memory[1]);

    vulkan.set_manual_buffer_data(vulkan.m_device, storage_buffers[0], storage_buffers_memory[0], world, 1, sizeof(world_gpu));
    vulkan.set_manual_buffer_data(vulkan.m_device, storage_buffers[1], storage_buffers_memory[1], image, image_size, sizeof(color));

    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(image_info),
                                vk::BufferUsageFlagBits::eUniformBuffer, storage_buffers[0], // BUG (see EDIT2): should be uniform_buffers[0]
                                uniform_buffers_memory[0]);
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(camera_gpu),
                                vk::BufferUsageFlagBits::eUniformBuffer, uniform_buffers[1],
                                uniform_buffers_memory[1]);

    vulkan.set_manual_buffer_data(vulkan.m_device, uniform_buffers[0], uniform_buffers_memory[0], &img_info, 1, sizeof(img_info));
    vulkan.set_manual_buffer_data(vulkan.m_device, uniform_buffers[1], uniform_buffers_memory[1], &camera_gpu, 1, sizeof(camera_gpu));

    // Run pipeline etc
I should note that it works perfectly fine when I check the values stored in the image storage buffer (storage_buffers_memory[1]); it's the other three buffers that are giving me difficulties.
I have a shared-layout uniform block in a shader:
layout(shared) uniform TestBlock
{
    int test[5];
};
How do I get the offset of test[3]?
When I try to use glGetUniformIndices to get the index of test[3], it returns the same index as test[0]'s.
So I cannot use glGetActiveUniformsiv to get the offset of test[3] from its index.
How, then, do I get the offset of test[3]?
(Note that I don't want to use layout std140.)
Arrays of basic types like int are treated as a single resource; you can't get the offset of an individual element in the array. You can, however, query the array stride: the number of bytes from one element in the array to the next. Then you can just do the multiplication.
Using the new program introspection API:
// the block has no instance name, so the member is queried as "test[0]", not "TestBlock.test"
auto ix = glGetProgramResourceIndex(prog, GL_UNIFORM, "test[0]");
GLenum props[] = {GL_ARRAY_STRIDE, GL_OFFSET};
GLint values[2] = {};
glGetProgramResourceiv(prog, GL_UNIFORM, ix, 2, props, 2, NULL, values);
auto byteOffset = values[1] + (3 * values[0]);
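If the program introspection API (GL 4.3) isn't available, the same two values can be queried with the functions mentioned in the question. A sketch, assuming prog is the linked program:

const GLchar* name = "test[0]";
GLuint index = GL_INVALID_INDEX;
glGetUniformIndices(prog, 1, &name, &index);

GLint offset = 0, arrayStride = 0;
glGetActiveUniformsiv(prog, 1, &index, GL_UNIFORM_OFFSET, &offset);
glGetActiveUniformsiv(prog, 1, &index, GL_UNIFORM_ARRAY_STRIDE, &arrayStride);

GLint offsetOfTest3 = offset + 3 * arrayStride;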
So I've read that storage buffers can contain a variable-length array at the end:
SSBOs can have variable storage, up to whatever buffer range was bound for that particular buffer; UBOs must have a specific, fixed storage size. This means that you can have an array of arbitrary length in an SSBO (at the end, rather). The actual size of the array, based on the range of the buffer bound, can be queried at runtime in the shader using the length function on the unbounded array variable
I know how to pass a storage buffer as a struct with simple fields like this:
struct GPUStorage {
    glm::vec3 mesh_pos;
    glm::vec4 mesh_rot;
};
And I know how to pass a storage buffer as an array of structs by shoving them into a vector and doing a memcpy from vector.data(), with sizeof(GPUStorage) * vector.size() as the copy length.
But I haven't found anywhere what the C++ syntax looks like for a struct containing a variable-length array:
struct GPUMesh {
    glm::vec3 mesh_pos;
    glm::vec4 mesh_rot;
};

struct GPUStorage {
    ??? // variable length array of GPUMesh
};
You have tricked yourself into thinking of things in a very limited way. Namely, that the only way to use a resource buffer (UBO/SSBO) from C++ is to define a C++ object type whose layout matches that of the resource. It's not.
The layout of a buffer-backed interface block defines how GLSL will interpret the bytes of data provided by the buffer. How those bytes get in those positions is entirely up to you.
A storage block defined as:
layout(binding = 0, std430) buffer Data
{
    vec2 first;
    vec4 array[];
};
The layout of this data is such that the first 8 bytes represent two floating-point values. Following them are 8 bytes that are skipped, followed by some multiple of 16 bytes, with each 16-byte datum representing 4 floating-point values.
How you create that in the buffer object is up to you. You could do this:
std::size_t numArrayEntries = <get number of array entries>;
glNamedBufferStorage(buff, 16 + (16 * numArrayEntries), nullptr, GL_DYNAMIC_STORAGE_BIT);

glm::vec2 first = <get first>;
glNamedBufferSubData(buff, 0, 8, glm::value_ptr(first));

for(std::size_t entry = 0; entry < numArrayEntries; ++entry)
{
    glm::vec4 entryValue = <get actual entry value>;
    // the array starts at byte 16; each element occupies 16 bytes
    glNamedBufferSubData(buff, 16 + (16 * entry), 16, glm::value_ptr(entryValue));
}
That's obviously heavy-handed and involves a lot of upload calls, but it is serviceable. You could also get the data there by mapping the buffer:
std::size_t numArrayEntries = <get number of array entries>;
std::size_t buffSize = 16 + (16 * numArrayEntries);
glNamedBufferStorage(buff, buffSize, nullptr, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
unsigned char *data = (unsigned char*)glMapNamedBufferRange(buff, 0, buffSize, GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);

glm::vec2 first = <get first>;
memcpy(data, glm::value_ptr(first), 8);

for(std::size_t entry = 0; entry < numArrayEntries; ++entry)
{
    data += 16; // the first element lives at byte 16; subsequent ones follow every 16 bytes
    glm::vec4 entryValue = <get actual entry value>;
    memcpy(data, glm::value_ptr(entryValue), 16);
}
Or by building a temporary memory buffer first and doing the copy when you create the storage. Or any other technique.
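To answer the literal syntax question: the C++ side doesn't need a struct with a flexible array member at all. One convenient middle ground is to assemble the header and the array into a single byte vector and upload it in one call. A sketch, assuming the Data block above and GL 4.5's glNamedBufferStorage (upload_data and DataHeader are made-up names):

#include <cstring>
#include <vector>
#include <glm/glm.hpp>

struct DataHeader {
    glm::vec2 first;
    glm::vec2 _pad; // pads the header to byte 16, where the std430 array begins
};

void upload_data(GLuint buff, glm::vec2 first, const std::vector<glm::vec4>& entries)
{
    std::vector<unsigned char> bytes(sizeof(DataHeader) + entries.size() * sizeof(glm::vec4));
    DataHeader header{first, glm::vec2(0.0f)};
    std::memcpy(bytes.data(), &header, sizeof(DataHeader));
    if (!entries.empty())
        std::memcpy(bytes.data() + sizeof(DataHeader), entries.data(), entries.size() * sizeof(glm::vec4));
    glNamedBufferStorage(buff, bytes.size(), bytes.data(), 0);
}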
What will the length() function return if the buffer bound to SSBO binding point 0 has size = 36 (not divisible by the size of uvec4 = 16)? And what's the rule?
#version 430 core
layout(local_size_x=256) in;
layout(std430, binding=0) buffer B { uvec4 data[]; };

void main() {
    uint s = data.length();
    //some other code...
}
For a shader storage block, the length() method on the unsized (run-time sized) array that is its last member returns a value of type int, calculated by the following formula:
max((buffer_object_size - offset_of_array) / stride_of_array, 0)
This means that if a buffer with a size of 36 bytes is bound to the following shader storage block

layout(std430, binding=0) buffer B { uvec4 data[]; };

then data.length() will return 2:

buffer_object_size = 36
offset_of_array    = 0
stride_of_array    = 16

max((36 - 0) / 16, 0) = 2

The division is integer division, so the 4 trailing bytes simply do not count.
See ARB_shader_storage_buffer_object, Issue (19) (near the end of the document):
In this expression, we allow unsized arrays at the end of shader storage blocks, and allow the ".length()" method to be used to determine the size of such arrays based on the size of the provided buffer object.
The derived array size can be derived by reversing the process described in issue (16):
array.length() = max((buffer_object_size - offset_of_array) / stride_of_array, 0)
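Since the size is derived from the range of the buffer binding, not from the buffer object as such, you can also see the rule in action by shrinking the bound range (a sketch; ssbo is the 36-byte buffer from the question):

glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 0, ssbo, 0, 36); // data.length() == 2
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 0, ssbo, 0, 16); // data.length() == 1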
I'm trying to send an array of integers to a compute shader, set an arbitrary value for each integer, and then read the array back on the CPU/host. The problem is that only the first element of my array gets updated. The array is initialized with all elements = 5 on the CPU; then I try to set all the values to 2 in the compute shader:
C++ Code:
this->numOfElements = std::vector<int>(numOfVoxels, 5); // num of elements for each voxel, all set to 5

//Set the reset grid program as current program
glUseProgram(this->resetGridProgHandle);

//Bind and fill the buffer
glBindBuffer(GL_SHADER_STORAGE_BUFFER, this->counterBufferHandle);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(int) * numOfVoxels, this->numOfElements.data(), GL_DYNAMIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, this->counterBufferHandle);

//Flag used in the buffer map function
GLint bufMask = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT;

//calc maximum size for workgroups
//glGetIntegerv(GL_MAX_COMPUTE_WORK_GROUP_SIZE, &result);

//Executes the compute shader
glDispatchCompute(32, 1, 1);
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);

//Gets a pointer to the returned data
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);

//Free the buffer mapping
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
Shader:
#version 430
layout (local_size_x = 32) in;

layout(binding = 0) buffer SSBO{
    int counter[];
};

void main(){
    counter[gl_WorkGroupID.x * gl_WorkGroupSize.x + gl_LocalInvocationID.x] = 2;
}
If I print returnArray[0] it's 2 (correct), but any index > 0 gives me 5, the initial value set on the host.
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
//Gets a pointer to the returned data
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);
The bit you pass to glMemoryBarrier represents the way you are going to read the data written by the shader. GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT says "I'm going to read the written data by using the buffer as a vertex attribute array". In reality, you are going to read the buffer by mapping it.
So you should use the proper barrier bit:
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
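Putting it together, the readback could look like this (a sketch; note that the mapped pointer must also be read before glUnmapBuffer, not after):

glDispatchCompute(32, 1, 1);
// The shader's writes will next be read via a buffer mapping
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);

int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY);
// ... read/copy the results here, while the mapping is still valid ...
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);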
OK, I had this problem as well. I changed:

layout(binding = 0) buffer SSBO{

to:

layout(binding = 0, std430) buffer SSBO{

Without an explicit layout qualifier, the block gets the default shared layout, whose offsets and array stride are implementation-defined and need not match the tightly packed ints uploaded from the CPU; std430 guarantees a 4-byte stride for an int array.