Why are my Uniforms and Storage Buffers showing the wrong data in Vulkan using GLSL?

EDIT2: I found the error: because of a copy-paste mistake, the code that creates the buffers was overwriting one of the storage buffers with one of the uniform buffers that I create afterwards.
So I'm currently trying to adapt the Ray Tracing Weekend project (https://raytracing.github.io/) from a CPU program into a compute shader using Vulkan. I'm writing the compute shader in GLSL, which is compiled to SPIR-V.
I send the scene to the GPU as a storage buffer, in the form of a struct containing arrays of structs. On the CPU it looks like this (world_gpu being the storage buffer):
struct sphere_gpu
{
    point3 centre;
    float radius;
};

struct material_gpu
{
    vec3 albedo;
    float refraction_index;
    float fuzz;
    uint32_t material_type;
};

struct world_gpu
{
    sphere_gpu spheres[484];
    material_gpu materials[484];
    uint32_t size;
};
and this on the GPU:
// Struct definitions to mirror the CPU representation
struct sphere {
    vec4 centre;
    float radius;
};

struct material {
    vec4 albedo;
    float refraction_index;
    float fuzz;
    uint material_type;
};

// Input scene
layout(std430, binding = 0) buffer world {
    sphere[MAX_SPHERES] spheres;
    material[MAX_SPHERES] materials;
    uint size;
} wrld;
I've already fixed the vec3 alignment problem on the CPU side by declaring my vec3 type as class alignas(16) vec3, and by changing the types in the GPU representation to vec4s, as shown above, so that they match the alignment of the data I'm sending over.
However, while testing this I only seem to be able to read 0s for the spheres when I inspect the data after the compute shader has finished running. (I've hijacked my output pixel array in the shader as a place to write debug data, so that I can read it back and inspect things.)
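To make that agreement explicit, here is a quick compile-time check I would add on the CPU side (a sketch, assuming point3 is an alias for the alignas(16) vec3 type and that vec3 holds three floats): under std430, sphere has a 32-byte array stride (vec4 at offset 0, radius at 16, then padding up to the 16-byte struct alignment), and material likewise has a 32-byte stride with material_type at offset 24, so the CPU structs should satisfy:
#include <cstddef> // offsetof

// Sanity checks against the expected std430 layout of the GLSL structs
// (assumes float components and the alignas(16) vec3/point3 described above).
static_assert(sizeof(sphere_gpu) == 32, "sphere_gpu must match the 32-byte std430 array stride");
static_assert(offsetof(sphere_gpu, radius) == 16, "radius should sit right after the 16-byte centre");
static_assert(sizeof(material_gpu) == 32, "material_gpu must match the 32-byte std430 array stride");
static_assert(offsetof(material_gpu, material_type) == 24, "material_type should be at offset 24");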
Is there anything obviously stupid that I'm doing here, aside from being a Vulkan noob in general?
EDIT:
Here's my buffer uploading code. set_manual_buffer_data is where the data is actually copied to the buffer; create_manual_buffer is where the buffer and the memory itself are created.
template <typename T>
void set_manual_buffer_data(vk::Device device, vk::Buffer& buffer, vk::DeviceMemory& buffer_memory,
                            T* elements, uint32_t num_elements, uint32_t element_size)
{
    uint32_t size = element_size * num_elements;
    // Get a pointer to the device memory
    void* buffer_ptr = device.mapMemory(buffer_memory, 0, size);
    // Copy data to buffer
    memcpy(buffer_ptr, elements, size);
    device.unmapMemory(buffer_memory);
}
// call with physical_device.getMemoryProperties() for the second argument
void create_manual_buffer(vk::Device device, vk::PhysicalDeviceMemoryProperties memory_properties,
                          uint32_t queue_family_index, const uint32_t buffer_size,
                          vk::BufferUsageFlagBits buffer_usage, vk::Buffer& buffer,
                          vk::DeviceMemory& buffer_memory)
{
    vk::BufferCreateInfo buffer_create_info{};
    buffer_create_info.flags = vk::BufferCreateFlags();
    buffer_create_info.size = buffer_size;
    buffer_create_info.usage = buffer_usage; // Play with this
    buffer_create_info.sharingMode = vk::SharingMode::eExclusive; // concurrent or exclusive
    buffer_create_info.pQueueFamilyIndices = &queue_family_index;
    buffer_create_info.queueFamilyIndexCount = 1;
    buffer = device.createBuffer(buffer_create_info);

    vk::MemoryRequirements memory_requirements = device.getBufferMemoryRequirements(buffer);

    uint32_t memory_type_index = static_cast<uint32_t>(~0);
    vk::DeviceSize memory_heap_size = static_cast<uint32_t>(~0);
    for (uint32_t current_memory_type_index = 0; current_memory_type_index < memory_properties.memoryTypeCount; ++current_memory_type_index)
    {
        // search for the desired memory type in the device memory properties
        vk::MemoryType memory_type = memory_properties.memoryTypes[current_memory_type_index];
        if ((vk::MemoryPropertyFlagBits::eHostVisible & memory_type.propertyFlags) &&
            (vk::MemoryPropertyFlagBits::eHostCoherent & memory_type.propertyFlags))
        {
            memory_heap_size = memory_properties.memoryHeaps[memory_type.heapIndex].size;
            memory_type_index = current_memory_type_index;
            break;
        }
    }

    // Create device memory
    vk::MemoryAllocateInfo buffer_allocate_info(memory_requirements.size, memory_type_index);
    buffer_memory = device.allocateMemory(buffer_allocate_info);
    device.bindBufferMemory(buffer, buffer_memory, 0);
}
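One side note on the memory-type search above (not the cause of the bug from EDIT2, just a common pitfall): it never checks memory_requirements.memoryTypeBits, so it could in principle pick a memory type the buffer cannot be bound to. A more defensive version might look like this sketch (find_memory_type is a hypothetical helper, not part of the original code):
#include <stdexcept>

// Sketch: memory-type search that also respects memoryTypeBits,
// as required when allocating memory that will be bound to this buffer.
uint32_t find_memory_type(const vk::PhysicalDeviceMemoryProperties& memory_properties,
                          const vk::MemoryRequirements& memory_requirements,
                          vk::MemoryPropertyFlags wanted)
{
    for (uint32_t i = 0; i < memory_properties.memoryTypeCount; ++i)
    {
        const bool allowed_for_buffer = (memory_requirements.memoryTypeBits & (1u << i)) != 0;
        const bool has_wanted_flags =
            (memory_properties.memoryTypes[i].propertyFlags & wanted) == wanted;
        if (allowed_for_buffer && has_wanted_flags)
            return i;
    }
    throw std::runtime_error("no suitable memory type found");
}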
This code is then called here (I haven't got to the refactoring stage yet, so please forgive the spaghetti):
std::vector<vk::Buffer> uniform_buffers;
std::vector<vk::DeviceMemory> uniform_buffers_memory;
std::vector<vk::Buffer> storage_buffers;
std::vector<vk::DeviceMemory> storage_buffers_memory;

void run_compute(Vulkan_Wrapper &vulkan, Vulkan_Compute &compute, world_gpu *world, color* image,
                 uint32_t image_size, image_info img_info, camera_gpu camera_gpu)
{
    vulkan.init();

    uniform_buffers.resize(2);
    uniform_buffers_memory.resize(2);
    storage_buffers.resize(2);
    storage_buffers_memory.resize(2);

    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(world_gpu),
                                vk::BufferUsageFlagBits::eStorageBuffer, storage_buffers[0],
                                storage_buffers_memory[0]);
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, image_size * sizeof(color),
                                vk::BufferUsageFlagBits::eStorageBuffer, storage_buffers[1],
                                storage_buffers_memory[1]);

    vulkan.set_manual_buffer_data(vulkan.m_device, storage_buffers[0], storage_buffers_memory[0], world, 1, sizeof(world_gpu));
    vulkan.set_manual_buffer_data(vulkan.m_device, storage_buffers[1], storage_buffers_memory[1], image, image_size, sizeof(color));

    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(image_info),
                                vk::BufferUsageFlagBits::eUniformBuffer, storage_buffers[0], // <-- the copy-paste error from EDIT2: should be uniform_buffers[0]
                                uniform_buffers_memory[0]);
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(camera_gpu),
                                vk::BufferUsageFlagBits::eUniformBuffer, uniform_buffers[1],
                                uniform_buffers_memory[1]);

    vulkan.set_manual_buffer_data(vulkan.m_device, uniform_buffers[0], uniform_buffers_memory[0], &img_info, 1, sizeof(img_info));
    vulkan.set_manual_buffer_data(vulkan.m_device, uniform_buffers[1], uniform_buffers_memory[1], &camera_gpu, 1, sizeof(camera_gpu));

    // Run pipeline etc
I should note that it works perfectly fine when I check the values stored in the image storage buffer (storage_buffers_memory[1]); it's the other three buffers that are giving me difficulties.
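For reference, the copy-paste error mentioned in EDIT2 is in the third create_manual_buffer call above (marked in the code): it passes storage_buffers[0] where it should pass uniform_buffers[0], so the handle of the world storage buffer gets overwritten by the newly created uniform buffer. The corrected call would be:
// Corrected third call: the image_info uniform buffer gets its own handle,
// instead of overwriting storage_buffers[0] (the world/scene buffer).
vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                            vulkan.m_queue_family_index, sizeof(image_info),
                            vk::BufferUsageFlagBits::eUniformBuffer, uniform_buffers[0],
                            uniform_buffers_memory[0]);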

Related

Array of uniform buffer(s) in Vulkan/GLSL?

Say I have some struct MyUniform:
struct MyUniform {
    /*...*/
};
and I have an array of 10 of them, so on the host in C it looks like:
MyUniform my_uniform_data[10] = /*...*/;
and I want to access all ten (as an array) from a shader.
In VkDescriptorSetLayoutBinding it has a field descriptorCount:
descriptorCount is the number of descriptors contained in the binding, accessed in a shader as
an array.
So I assume at least one way to get this array to a shader is to set descriptorCount to 10, and then I would be able in GLSL to write:
layout(set = 2, binding = 4) uniform MyUniform {
    /*...*/
} my_uniform_data[10];
Now when writing the VkDescriptorSet for this I would have a single buffer with 10 * sizeof(MyUniform) bytes. So in a VkWriteDescriptorSet I would also set descriptorCount to 10, and then I would have to create an array of 10 VkDescriptorBufferInfo:
VkDescriptorBufferInfo bi[10];
for (int i = 0; i < 10; i++) {
    bi[i].buffer = my_buffer;
    bi[i].offset = i * sizeof(MyUniform);
    bi[i].range  = sizeof(MyUniform);
}
This kind of arrangement clearly accommodates the case where each array element can come from a different buffer and offset.
Is there a way to arrange the descriptor layout and updating such that the entire array is written with a single descriptor?
Or is the only way to update a GLSL uniform array to use multiple descriptors in this fashion?
Descriptor arrays create arrays of descriptors (for example, of buffers), but what you need is an array of structs inside a single buffer:
struct myData {
    /*...*/
};

layout(set = 2, binding = 4) uniform myUniform {
    myData data[10]; // uniform blocks need a compile-time array size
};
And remember the std140 alignment rules for uniform blocks.
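For completeness, with the array-of-structs layout above the whole array can be written through a single descriptor: descriptorCount stays 1 and one VkDescriptorBufferInfo spans the entire array. A rough sketch (my_buffer is the buffer from the question; device and desc_set are assumed handles, and the range assumes the host-side array already matches the std140 stride):
// One binding, one descriptor, whole array in a single buffer.
VkDescriptorBufferInfo bi{};
bi.buffer = my_buffer;
bi.offset = 0;
bi.range  = 10 * sizeof(MyUniform); // or VK_WHOLE_SIZE; must cover the std140-padded array

VkWriteDescriptorSet write{};
write.sType           = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
write.dstSet          = desc_set;   // assumed: the set containing binding 4
write.dstBinding      = 4;
write.dstArrayElement = 0;
write.descriptorCount = 1;          // one descriptor for the whole array
write.descriptorType  = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
write.pBufferInfo     = &bi;
vkUpdateDescriptorSets(device, 1, &write, 0, nullptr);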

Update SSBO in Compute shader

I am currently trying to update an SSBO bound to a compute shader. Doing it the way shown below, I only write the first 32 bytes into out_picture, because I only memcpy that many (sizeof(pstruct)).
Compute shader:
#version 440 core

struct Pstruct {
    float picture[1920*1080*3];
    float factor;
};

layout(std430, binding = 0) buffer Result {
    float out_picture[];
};

layout(std430, binding = 1) buffer In_p1 {
    Pstruct in_p1;
};

layout(local_size_x = 1000) in;

void main() {
    out_picture[gl_GlobalInvocationID.x] = out_picture[gl_GlobalInvocationID.x] +
                                           in_p1.picture[gl_GlobalInvocationID.x] * in_p1.factor;
}
Host code (C++):
struct Pstruct {
    std::vector<float> picture;
    float factor;
};

Pstruct tmp;
tmp.factor = 1.0f;
for (int i = 0; i < getNUM_PIX(); i++) {
    tmp.picture.push_back(5.0f);
}

SSBO ssbo;
glGenBuffers(1, &ssbo.handle);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo.handle);
glBufferData(GL_SHADER_STORAGE_BUFFER, (getNUM_PIX() + 1) * sizeof(float), NULL, GL_DYNAMIC_DRAW);
...
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo.handle);
Pstruct* ptr = (Pstruct *) glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY);
memcpy(ptr, &pstruct, sizeof(pstruct));
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
...
glUseProgram(program);
glDispatchCompute(getNUM_PIX() / getWORK_GROUP_SIZE(), 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
How can I copy both my picture array and my float factor at the same time?
Do I have to split the memcpy call into one for the array and one for the float, and if so, how? I can copy the first part, but I am not allowed to add an offset to the ptr.
First of all,
float picture[1920*1080*3];
clearly should be either a texture (you're only reading from it anyway) or at least an image.
Second:
struct Pstruct {
    std::vector<float> picture;
    float factor;
};
This definition does not match the definition in your shader in any way. The std::vector object is just a small metadata object that internally manages the data storage used by the vector; memcpy-ing it into a GL buffer and passing that to the GPU does not make sense at all.
The correct approach would be either to copy the contents of that vector separately into the appropriate place inside the buffer, or to use a struct definition on the client side which actually matches the one you're using in the shader (taking all the rules of std430 into account). But, as my first point already said, the correct solution here is most likely to use a texture or image object instead.
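As a sketch of the first option (copying the vector contents and the factor separately; tmp, ssbo and getNUM_PIX() come from the question, and the layout relies on std430 packing the float array tightly with factor directly after it):
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo.handle);
// Map the buffer that was allocated with (getNUM_PIX() + 1) * sizeof(float).
float* ptr = static_cast<float*>(glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY));
memcpy(ptr, tmp.picture.data(), getNUM_PIX() * sizeof(float)); // the array contents
ptr[getNUM_PIX()] = tmp.factor;                                // the trailing factor
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);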

Metal Prevented Device Address Mode

I am creating a graphics application that uses Metal to render everything. When I did a frame debug, the pipeline statistics for all of my draw calls showed a high-priority ("!!") alert titled "Prevented Device Address Mode Load" with the details:
Indexing using unsigned int for offset prevents addressing calculation in device. To prevent this extra ALU operation use int for offset.
Here is what is going on for my simplest draw call that triggers this. There is a large amount of vertex data followed by an index buffer. The index buffer is created and filled at the start and is constant from then on; the vertex data is constantly changing.
I have the following types:
struct Vertex {
    float3 data;
};
typedef int32_t indexType;
Then the following draw call
[encoder drawIndexedPrimitives:MTLPrimitiveTypeTriangle indexCount:/*int here*/ indexType:MTLIndexTypeUInt32 indexBuffer:indexBuffer indexBufferOffset:0];
Which goes to the following vertex function
vertex VertexOutTC vertex_fun(constant Vertex* vertexBuffer [[ buffer(0) ]],
                              indexType vid [[ vertex_id ]],
                              constant matrix_float3x3* matrix [[ buffer(1) ]]) {
    const float2 coords[] = {float2(-1, -1), float2(-1, 1), float2(1, -1), float2(1, 1)};
    CircleVertex vert = vertexBuffer[vid];

    VertexOutTC out;
    out.position = float4((*matrix * float3(vert.data.x, vert.data.y, 1.0)).xy,
                          ((float)((int)vid / 4)) / 10000.0, 1.0);
    out.color = HSVtoRGB(vert.data.z, 1.0, 1.0);
    out.tc = coords[vid % 4];
    return out;
}
I am very confused about what exactly I am doing wrong here. The error seems to suggest I shouldn't use an unsigned type for the offset, which I'm guessing means the index buffer.
The thing is, for the index buffer there are only MTLIndexTypeUInt32 and MTLIndexTypeUInt16, both of which are unsigned. Furthermore, if I try to use a raw int as the type, the shader won't compile. What is going on here?
In Table 5.1 of the Metal Shading Language Specification, they list the "Corresponding Data Type" for vertex_id as ushort or uint. (There are similar tables in that document for all the rest of the types, my examples will use thread_position_in_grid which is the same).
Meanwhile, the hardware prefers signed types for addressing. So if you do
kernel void test(uint position [[thread_position_in_grid]], device float *test) {
    test[position] = position;
    test[position + 1] = position;
    test[position + 2] = position;
}
we are indexing test with an unsigned integer. Debugging this shader, we can see that it involves 23 instructions and has the "Prevented Device Mode Store" warning.
If we convert to int instead, this uses only 18 instructions:
kernel void test(uint position [[thread_position_in_grid]], device float *test) {
    test[(int)position] = position;
    test[(int)position + 1] = position;
    test[(int)position + 2] = position;
}
However, not all uint values fit into int, so this optimization only works for half the range of uint. Still, that covers many use cases.
What about ushort? Well,
kernel void test(ushort position [[thread_position_in_grid]], device float *test) {
    test[position] = position;
    test[position + 1] = position;
    test[position + 2] = position;
}
This version is only 17 instructions. We are also "warned" about using unsigned indexing here, even though it is faster than the signed versions above. This suggests to me the warning is not especially well-designed and requires significant interpretation.
kernel void test(ushort position [[thread_position_in_grid]], device float *test) {
    short p = position;
    test[p] = position;
    test[p + 1] = position;
    test[p + 2] = position;
}
This is the signed version of short, and fixes the warning, but is also 17 instructions. So it makes Xcode happier, but I'm not sure it's actually better.
Finally, here's the case I was in. My position ranges above signed short, but below unsigned short. Does it make sense to promote ushort to int for the indexing?
kernel void test(ushort position [[thread_position_in_grid]], device float *test) {
    int p = position;
    test[p] = position;
    test[p + 1] = position;
    test[p + 2] = position;
}
This is also 17 instructions, and generates the device store warning. I believe the compiler proves ushort fits into int, and ignores the conversion. This "unsigned" arithmetic then produces a warning telling me to use int, even though that's exactly what I did.
In summary, these warnings are a bit naive, and should really be confirmed or refuted through on-device testing.

Find the maximum float in the array

I have a compute shader program which looks for the maximum value in a float array. It uses reduction (compare two values and save the bigger one to the output buffer).
Now I am not quite sure how to run this program from the Java code (using jogamp). In the display() method I run the program once (each time with the halved array in the input SSBO, i.e. the result from the previous iteration) and finish when the result array has only one item: the maximum.
Is this the correct method? Creating and binding the input and output SSBOs, running the shader program, and then checking how many items were returned, every time in the display() method?
Java code:
FloatBuffer inBuffer = Buffers.newDirectFloatBuffer(array);
gl.glBindBuffer(GL3ES3.GL_SHADER_STORAGE_BUFFER, buffersNames.get(1));
gl.glBufferData(GL3ES3.GL_SHADER_STORAGE_BUFFER, itemsCount * Buffers.SIZEOF_FLOAT, inBuffer,
                GL3ES3.GL_STREAM_DRAW);
gl.glBindBufferBase(GL3ES3.GL_SHADER_STORAGE_BUFFER, 1, buffersNames.get(1));
gl.glDispatchComputeGroupSizeARB(groupsCount, 1, 1, groupSize, 1, 1);
gl.glMemoryBarrier(GL3ES3.GL_SHADER_STORAGE_BARRIER_BIT);
ByteBuffer output = gl.glMapNamedBuffer(buffersNames.get(1), GL3ES3.GL_READ_ONLY);
Shader code:
#version 430
#extension GL_ARB_compute_variable_group_size : enable

layout (local_size_variable) in;

layout(std430, binding = 1) buffer MyData {
    vec4 elements[];
} data;

void main() {
    uint index = gl_GlobalInvocationID.x;
    float n1 = data.elements[index].x;
    float n2 = data.elements[index].y;
    float n3 = data.elements[index].z;
    float n4 = data.elements[index].w;
    data.elements[index].x = max(max(n1, n2), max(n3, n4));
}

How to get all vertex coordinates from the DirectXTK (ToolKit) DirectX::Model class to use for collision detection

I'm doing some basic rendering with DirectXToolKit and I would like to be able to get the vertex coordinates for each model in order to compute collisions between models.
Currently, I have some test code to load the model, but the ID3D11Buffer is loaded internally using CreateFromSDKMESH:
void Model3D::LoadSDKMESH(ID3D11Device* p_device, ID3D11DeviceContext* device_context, const wchar_t* file_mesh)
{
    mAlpha = 1.0f;
    mTint = DirectX::Colors::White.v;

    mStates.reset(new DirectX::CommonStates(p_device));

    auto fx = new DirectX::EffectFactory(p_device);
    fx->SetDirectory(L"media");
    mFxFactory.reset(fx);

    mBatch.reset(new DirectX::PrimitiveBatch<DirectX::VertexPositionColor>(device_context));

    mBatchEffect.reset(new DirectX::BasicEffect(p_device));
    mBatchEffect->SetVertexColorEnabled(true);

    {
        void const* shaderByteCode;
        size_t byteCodeLength;
        mBatchEffect->GetVertexShaderBytecode(&shaderByteCode, &byteCodeLength);
        HR(p_device->CreateInputLayout(DirectX::VertexPositionColor::InputElements,
                                       DirectX::VertexPositionColor::InputElementCount,
                                       shaderByteCode, byteCodeLength,
                                       mBatchInputLayout.ReleaseAndGetAddressOf()));
    }

    mModel = DirectX::Model::CreateFromSDKMESH(p_device, file_mesh, *mFxFactory);
}
I know there is a way to get vertices from the ID3D11Buffer, answered here:
How to read vertices from vertex buffer in Direct3d11
But they suggest not loading from GPU memory, so I assume it's better to load the vertices ahead of time into a separate container.
I looked into CreateFromSDKMESH, and there are a few functions that are publicly accessible without making changes to XTK.
In order to get vertices while loading a model, replace the line mModel = DirectX::Model::CreateFromSDKMESH(p_device, file_mesh, *mFxFactory); from the question above with:
size_t data_size = 0;
std::unique_ptr<uint8_t[]> v_data;
HRESULT hr = DirectX::BinaryReader::ReadEntireFile(file_mesh, v_data, &data_size);
if (FAILED(hr))
{
    DirectX::DebugTrace("CreateFromSDKMESH failed (%08X) loading '%ls'\n", hr, file_mesh);
    throw std::exception("CreateFromSDKMESH");
}

uint8_t* mesh_data = v_data.get();

mModel = DirectX::Model::CreateFromSDKMESH(p_device, v_data.get(), data_size, *mFxFactory, false, false);
mModel->name = file_mesh;

auto v_header = reinterpret_cast<const DXUT::SDKMESH_HEADER*>(mesh_data);
auto vb_array = reinterpret_cast<const DXUT::SDKMESH_VERTEX_BUFFER_HEADER*>(mesh_data + v_header->VertexStreamHeadersOffset);

if (v_header->NumVertexBuffers < 1)
    throw std::exception("Vertex Buffers less than 1");

auto& vertex_header = vb_array[0];

uint64_t buffer_data_offset = v_header->HeaderSize + v_header->NonBufferDataSize;
uint8_t* buffer_data = mesh_data + buffer_data_offset;

auto verts_pairs = reinterpret_cast<std::pair<Vector3, Vector3>*>(buffer_data + (vertex_header.DataOffset - buffer_data_offset));
There, accessing a coordinate should be as simple as
float x = verts_pairs[0].first.x;
and the total number of vertices is stored in
vertex_header.NumVertices
Don't forget that the vertex data gets deleted after loading, so you may want to copy it out first, something like:
memcpy(vertexBuffer, reinterpret_cast<std::pair<Vector3,Vector3>*>(buffer_data + (vertex_header.DataOffset - buffer_data_offset)), vertexCnt * sizeof(std::pair<Vector3, Vector3>));
Also, the vertex buffer doesn't get transformed by the draw functions, so you will need to apply the transforms yourself.
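As an example of putting this to use for the collision detection mentioned in the question, a rough axis-aligned bounding box could be computed from the positions read above (a sketch: it assumes Vector3 is DirectX::SimpleMath::Vector3 and that .first of each pair is the position):
#include <cfloat> // FLT_MAX

// Compute an AABB over the positions read from the SDKMESH vertex buffer.
Vector3 aabb_min( FLT_MAX,  FLT_MAX,  FLT_MAX);
Vector3 aabb_max(-FLT_MAX, -FLT_MAX, -FLT_MAX);
for (uint64_t i = 0; i < vertex_header.NumVertices; ++i)
{
    const Vector3& p = verts_pairs[i].first; // assumed to be the position stream
    aabb_min = Vector3::Min(aabb_min, p);
    aabb_max = Vector3::Max(aabb_max, p);
}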
Thanks,