Vulkan payload doesn't update some struct members (ray tracing) - GLSL

I have a problem updating some values in a payload. I declare the payload inside the rgen shader like this:
struct HitPayload
{
    vec3     hitValue;
    uint32_t depth;
    vec3     worldPos;
    uint32_t instanceCustomIndexEXT;
    uint32_t primitiveID;
    float    thereIsNearObj;
    float    refractionIndex;
    float    _pad0;
};
layout(location = 0) rayPayloadEXT HitPayload prd;
Inside the other shaders, rchit (closest hit) and rahit (any hit), I access the payload with:
struct HitPayload
{
    vec3     hitValue;
    uint32_t depth;
    vec3     worldPos;
    uint32_t instanceCustomIndexEXT;
    uint32_t primitiveID;
    float    thereIsNearObj;
    float    refractionIndex;
    float    _pad0;
};
layout(location = 0) rayPayloadInEXT HitPayload prd;
The shaders compile and the program runs. The problem is that I can only update hitValue and depth; the other members never change. I update values in the shaders like this:
prd.hitValue = vec3(1.0,1.0,0.0); //works
prd.thereIsNearObj = 1.0; //not working
I checked:
Alignment
Removing ignoreIntersectionEXT (in case it blocks the payload update)
Implicit casts from int -> uint32_t

I think a good sleep is my best friend (joking). The problem is that Vulkan doesn't seem to support the extension types in the payload: I replaced all the uint32_t members with int and now it works fine. I haven't found an authoritative source for this solution, but it works.
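For reference, a minimal sketch of the payload after the workaround, assuming the only change is swapping the explicit-sized types (which come from GL_EXT_shader_explicit_arithmetic_types) for plain GLSL scalars:

// Payload using only native GLSL scalar types
struct HitPayload
{
    vec3  hitValue;
    int   depth;
    vec3  worldPos;
    int   instanceCustomIndexEXT;
    int   primitiveID;
    float thereIsNearObj;
    float refractionIndex;
    float _pad0;
};

// Same struct in every stage that touches the payload:
// rayPayloadEXT in the rgen shader, rayPayloadInEXT in rchit/rahit.
layout(location = 0) rayPayloadEXT HitPayload prd;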

Why are my Uniforms and Storage Buffers showing the wrong data in Vulkan using GLSL?

EDIT2: I found the error: the buffer-creation code was overwriting one of the storage buffers with one of the uniform buffers that I create afterwards, because of a copy-paste error.
So I'm currently trying to adapt the Ray Tracing in One Weekend project (https://raytracing.github.io/) from a CPU program into a compute shader using Vulkan. I'm writing the compute shader in GLSL, which is compiled to SPIR-V.
I send the scene in the form of a struct containing arrays of structs to the GPU as a storage buffer which looks like this on the CPU (world_gpu being the storage buffer):
struct sphere_gpu
{
    point3 centre;
    float  radius;
};

struct material_gpu
{
    vec3     albedo;
    float    refraction_index;
    float    fuzz;
    uint32_t material_type;
};

struct world_gpu
{
    sphere_gpu   spheres[484];
    material_gpu materials[484];
    uint32_t     size;
};
and this on the GPU:
// Struct definitions to mirror the CPU representation
struct sphere {
    vec4  centre;
    float radius;
};

struct material {
    vec4  albedo;
    float refraction_index;
    float fuzz;
    uint  material_type;
};

// Input scene
layout(std430, binding = 0) buffer world {
    sphere[MAX_SPHERES]   spheres;
    material[MAX_SPHERES] materials;
    uint                  size;
} wrld;
I've already fixed the alignment problem for vec3 on the CPU side by using alignas(16) for my vec3 type (class alignas(16) vec3) and by changing the types in the GPU representation to vec4, as shown above, to match the alignment of the data I'm sending over.
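For what it's worth, working through the std430 rules by hand for these declarations (my own calculation, not verified with a tool), I'd expect the following offsets and strides, which do line up with the alignas(16) CPU layout:

// Expected std430 layout for the block above (hand-calculated):
//   sphere   : centre at offset 0, radius at 16, array stride 32
//   material : albedo at 0, refraction_index at 16, fuzz at 20,
//              material_type at 24, array stride 32
//   world    : spheres at 0, materials at 484 * 32 = 15488,
//              size at 2 * 484 * 32 = 30976
// On the CPU, alignas(16) rounds sizeof(sphere_gpu) and sizeof(material_gpu)
// up to 32 as well, so the two layouts should match byte for byte.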
However, while testing this I only seem to read 0s for the spheres when I inspect the data after the compute shader has finished running (I've hijacked my output pixel array in the shader, which I write debug data to, so that I can read it back and debug things).
Is there anything obviously stupid that I'm doing here, aside from being a Vulkan noob in general?
EDIT:
Here's my buffer-uploading code. set_manual_buffer_data is where the data is actually copied to the buffer; create_manual_buffer is where the buffer and its memory are created.
template <typename T>
void set_manual_buffer_data(vk::Device device, vk::Buffer& buffer, vk::DeviceMemory& buffer_memory, T* elements, uint32_t num_elements, uint32_t element_size)
{
    uint32_t size = element_size * num_elements;
    // Get a pointer to the device memory
    void* buffer_ptr = device.mapMemory(buffer_memory, 0, size);
    // Copy data to buffer
    memcpy(buffer_ptr, elements, size);
    device.unmapMemory(buffer_memory);
}
// call with physical_device.getMemoryProperties() for second argument
void create_manual_buffer(vk::Device device, vk::PhysicalDeviceMemoryProperties memory_properties, uint32_t queue_family_index, const uint32_t buffer_size, vk::BufferUsageFlagBits buffer_usage, vk::Buffer& buffer, vk::DeviceMemory& buffer_memory)
{
    vk::BufferCreateInfo buffer_create_info{};
    buffer_create_info.flags = vk::BufferCreateFlags();
    buffer_create_info.size = buffer_size;
    buffer_create_info.usage = buffer_usage; // Play with this
    buffer_create_info.sharingMode = vk::SharingMode::eExclusive; // concurrent or exclusive
    buffer_create_info.pQueueFamilyIndices = &queue_family_index;
    buffer_create_info.queueFamilyIndexCount = 1;
    buffer = device.createBuffer(buffer_create_info);

    vk::MemoryRequirements memory_requirements = device.getBufferMemoryRequirements(buffer);

    uint32_t memory_type_index = static_cast<uint32_t>(~0);
    vk::DeviceSize memory_heap_size = static_cast<uint32_t>(~0);
    for (uint32_t current_memory_type_index = 0; current_memory_type_index < memory_properties.memoryTypeCount; ++current_memory_type_index)
    {
        // search for the desired memory type among the device's memory types
        vk::MemoryType memory_type = memory_properties.memoryTypes[current_memory_type_index];
        if ((vk::MemoryPropertyFlagBits::eHostVisible & memory_type.propertyFlags) &&
            (vk::MemoryPropertyFlagBits::eHostCoherent & memory_type.propertyFlags))
        {
            memory_heap_size = memory_properties.memoryHeaps[memory_type.heapIndex].size;
            memory_type_index = current_memory_type_index;
            break;
        }
    }

    // Create device memory
    vk::MemoryAllocateInfo buffer_allocate_info(memory_requirements.size, memory_type_index);
    buffer_memory = device.allocateMemory(buffer_allocate_info);
    device.bindBufferMemory(buffer, buffer_memory, 0);
}
This code is then called here (I haven't got to the refactoring stage yet so please forgive the spaghetti):
std::vector<vk::Buffer> uniform_buffers;
std::vector<vk::DeviceMemory> uniform_buffers_memory;
std::vector<vk::Buffer> storage_buffers;
std::vector<vk::DeviceMemory> storage_buffers_memory;
void run_compute(Vulkan_Wrapper& vulkan, Vulkan_Compute& compute, world_gpu* world, color* image, uint32_t image_size, image_info img_info, camera_gpu camera_gpu)
{
    vulkan.init();

    uniform_buffers.resize(2);
    uniform_buffers_memory.resize(2);
    storage_buffers.resize(2);
    storage_buffers_memory.resize(2);

    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(world_gpu),
                                vk::BufferUsageFlagBits::eStorageBuffer, storage_buffers[0],
                                storage_buffers_memory[0]);
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, image_size * sizeof(color),
                                vk::BufferUsageFlagBits::eStorageBuffer, storage_buffers[1],
                                storage_buffers_memory[1]);

    vulkan.set_manual_buffer_data(vulkan.m_device, storage_buffers[0], storage_buffers_memory[0], world, 1, sizeof(world_gpu));
    vulkan.set_manual_buffer_data(vulkan.m_device, storage_buffers[1], storage_buffers_memory[1], image, image_size, sizeof(color));
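    // NOTE (see EDIT2 above): copy-paste error in the next create_manual_buffer call --
    // it should receive uniform_buffers[0], but it passes storage_buffers[0],
    // overwriting the world storage buffer's handle.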
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(image_info),
                                vk::BufferUsageFlagBits::eUniformBuffer, storage_buffers[0],
                                uniform_buffers_memory[0]);
    vulkan.create_manual_buffer(vulkan.m_device, vulkan.m_physical_device.getMemoryProperties(),
                                vulkan.m_queue_family_index, sizeof(camera_gpu),
                                vk::BufferUsageFlagBits::eUniformBuffer, uniform_buffers[1],
                                uniform_buffers_memory[1]);

    vulkan.set_manual_buffer_data(vulkan.m_device, uniform_buffers[0], uniform_buffers_memory[0], &img_info, 1, sizeof(img_info));
    vulkan.set_manual_buffer_data(vulkan.m_device, uniform_buffers[1], uniform_buffers_memory[1], &camera_gpu, 1, sizeof(camera_gpu));

    // Run pipeline etc
I should note that it works perfectly fine when I check the values stored in the image storage buffer (storage_buffers_memory[1]); it's the other three that are giving me difficulties.

Difference between std140 and std430 layout

I'm having trouble with the differences between the std140 and std430 layouts.
This is my struct in the .cpp code:
struct Particle {
    glm::vec3 position = glm::vec3(0);
    float density = 1;
};

for (int z = 0; z < d; ++z) {
    for (int y = 0; y < d; ++y) {
        for (int x = 0; x < d; ++x) {
            int index = z * d * d + y * d + x;
            if (index >= num) break;
            // dam break
            m_InitParticles[index].position = glm::vec3(x, y, z) * distance;
            m_InitParticles[index].position += glm::vec3(getJitter(), getJitter(), getJitter());
            m_InitParticles[index].density = index;
        }
    }
}
And the compute shader code:
struct Particle {
    vec3 position;
    float density;
};

layout(std140, binding = 0) restrict buffer Particles {
    Particle particles[];
};
It seems that I get the correct data with std430 (screenshot: std430 data in RenderDoc), reading the buffer in RenderDoc with
pack#(std430)
struct Particle {
    vec3 position;
    float t;
}
And when I use pack#(std140), the struct seems to take up 8N of space (screenshot: std140 in RenderDoc):
pack#(std430)
struct Particle {
    vec3 position;
    float t;
}
With std140, glGetActiveUniformsiv returns offsets 0 and 12.
Why does a struct of vec3 + float take up extra space in std140?
The official OpenGL wiki has got you covered:
The rules for std140 layout are covered quite well in the OpenGL specification (OpenGL 4.5, Section 7.6.2.2, page 137). Among the most important is the fact that arrays of types are not necessarily tightly packed. An array of floats in such a block will not be the equivalent to an array of floats in C/C++. The array stride (the bytes between array elements) is always rounded up to the size of a vec4 (ie: 16-bytes). So arrays will only match their C/C++ definitions if the type is a multiple of 16 bytes
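As a concrete illustration of the array-stride rule quoted above (the block and member names here are made up for the example):

layout(std140, binding = 1) uniform Data {
    float values[4];   // std140: each element occupies 16 bytes,
                       // so offsets are 0, 16, 32, 48 -- not 0, 4, 8, 12 as in C/C++
};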
Warning: Implementations sometimes get the std140 layout wrong for vec3 components. You are advised to manually pad your structures/arrays out and avoid using vec3 at all.
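For the Particle struct in the question, one way to follow that advice, assuming the CPU side is changed to a matching glm::vec4, is to drop the vec3 entirely and carry the density in the w component, so each element is a single 16-byte vec4 under both std140 and std430:

struct Particle {
    vec4 position_density;   // xyz = position, w = density
};

layout(std140, binding = 0) restrict buffer Particles {
    Particle particles[];    // array stride is 16 bytes in both std140 and std430
};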

Metal Prevented Device Address Mode

I am creating a graphics application that uses Metal to render everything. When I do a frame debug, under Pipeline Statistics all of my draw calls have a !! priority alert titled "Prevented Device Address Mode Load" with the details:
Indexing using unsigned int for offset prevents addressing calculation in device. To prevent this extra ALU operation use int for offset.
For my simplest draw call that triggers this, here is what is going on: there is a large amount of vertex data followed by an index buffer. The index buffer is created and filled at the start and is constant from then on; the vertex data is constantly changing.
I have the following types:
struct Vertex {
    float3 data;
};

typedef int32_t indexType;
Then the following draw call
[encoder drawIndexedPrimitives:MTLPrimitiveTypeTriangle indexCount:/*int here*/ indexType:MTLIndexTypeUInt32 indexBuffer:indexBuffer indexBufferOffset:0];
Which goes to the following vertex function
vertex VertexOutTC vertex_fun(constant Vertex* vertexBuffer [[ buffer(0) ]],
                              indexType vid [[ vertex_id ]],
                              constant matrix_float3x3* matrix [[ buffer(1) ]]) {
    const float2 coords[] = {float2(-1, -1), float2(-1, 1), float2(1, -1), float2(1, 1)};
    CircleVertex vert = vertexBuffer[vid];
    VertexOutTC out;
    out.position = float4((*matrix * float3(vert.data.x, vert.data.y, 1.0)).xy,
                          ((float)((int)vid / 4)) / 10000.0, 1.0);
    out.color = HSVtoRGB(vert.data.z, 1.0, 1.0);
    out.tc = coords[vid % 4];
    return out;
}
I am very confused about what exactly I am doing wrong here. The warning seems to suggest I shouldn't use an unsigned type for the offset, which I am guessing refers to the index buffer.
The thing is, for the index buffer there is ultimately only MTLIndexTypeUInt32 and MTLIndexTypeUInt16, both of which are unsigned. Furthermore, if I try to use a plain int as the type, the shader won't compile. What is going on here?
In Table 5.1 of the Metal Shading Language Specification, they list the "Corresponding Data Type" for vertex_id as ushort or uint. (There are similar tables in that document for all the rest of the types, my examples will use thread_position_in_grid which is the same).
Meanwhile, the hardware prefers signed types for addressing. So if you do
kernel void test(uint position [[thread_position_in_grid]], device float *test) {
    test[position] = position;
    test[position + 1] = position;
    test[position + 2] = position;
}
we are indexing test with an unsigned integer. Debugging this shader, we can see that it involves 23 instructions and produces the "Prevented Device Mode Store" warning.
If we convert to int instead, this uses only 18 instructions:
kernel void test(uint position [[thread_position_in_grid]], device float *test) {
    test[(int)position] = position;
    test[(int)position + 1] = position;
    test[(int)position + 2] = position;
}
However, not all uint values fit into an int, so this optimization only works for half the range of uint. Still, that covers many use cases.
What about ushort? Well,
kernel void test(ushort position [[thread_position_in_grid]], device float *test) {
    test[position] = position;
    test[position + 1] = position;
    test[position + 2] = position;
}
This version is only 17 instructions. We are also "warned" about using unsigned indexing here, even though it is faster than the signed versions above. This suggests to me the warning is not especially well-designed and requires significant interpretation.
kernel void test(ushort position [[thread_position_in_grid]], device float *test) {
    short p = position;
    test[p] = position;
    test[p + 1] = position;
    test[p + 2] = position;
}
This is the version with signed short; it fixes the warning but is also 17 instructions. So it makes Xcode happier, but I'm not sure it's actually better.
Finally, here's the case I was in. My position ranges above signed short, but below unsigned short. Does it make sense to promote the ushort to int for the indexing?
kernel void test(ushort position [[thread_position_in_grid]], device float *test) {
    int p = position;
    test[p] = position;
    test[p + 1] = position;
    test[p + 2] = position;
}
This is also 17 instructions, and generates the device store warning. I believe the compiler proves ushort fits into int, and ignores the conversion. This "unsigned" arithmetic then produces a warning telling me to use int, even though that's exactly what I did.
In summary, these warnings are a bit naive, and should really be confirmed or refuted through on-device testing.

How to use a fragment shader without a main function

I found a shader that draws a drop shadow at http://madebyevan.com/shaders/fast-rounded-rectangle-shadows/:
// License: CC0 (http://creativecommons.org/publicdomain/zero/1.0/)

// This approximates the error function, needed for the gaussian integral
vec4 erf(vec4 x) {
    vec4 s = sign(x), a = abs(x);
    x = 1.0 + (0.278393 + (0.230389 + 0.078108 * (a * a)) * a) * a;
    x *= x;
    return s - s / (x * x);
}

// Return the mask for the shadow of a box from lower to upper
float boxShadow(vec2 lower, vec2 upper, vec2 point, float sigma) {
    vec4 query = vec4(point - lower, upper - point);
    vec4 integral = 0.5 + 0.5 * erf(query * (sqrt(0.5) / sigma));
    return (integral.z - integral.x) * (integral.w - integral.y);
}
I thought that a shader needs a main function and should return a color.
My question is: how do I use the boxShadow function from C++ code with OpenGL, given a box?
Thanks.
A function can be defined in a shader just as you would do with a C function. I mean, the code for the function is in the same "unit" as the rest of the shader.
#version XX-YY

//MyFunc(.....)
whateverreturn MyFunc(.....)
{
    // do something and return a whateverreturn
}

void main(void)
{
    //use MyFunc
    whateverreturn var = MyFunc(....);
}
A somewhat different case is when you have a function that can be part of several shaders but isn't a "full" shader: it has no main() function. This function lives in its own file or string array or something similar.
Say you have the function in a specific file:
#version XX-YY

//MyFunc(.....)
whateverreturn MyFunc(.....)
{
    // do something and return a whateverreturn
}
And the file with the shader where you want to use it:
#version XX-YY

//declare the function
whateverreturn MyFunc(.....);

void main(void)
{
    //use MyFunc
    whateverreturn var = MyFunc(....);
}
Compile it like you do with any common GLSL code, using glShaderSource and glCompileShader.
Now the key step is how to integrate that code with a full shader (the one with main): just use glAttachShader (again, like you do with a VS or FS) before glLinkProgram, and that's all.
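Applied to the boxShadow function from the question, a minimal sketch of the "full" fragment shader could look like this; the uniform names (u_boxLower, u_boxUpper, u_sigma, u_shadowColor) and the varying v_fragPos are made up for the example, and boxShadow/erf live in the separately compiled file shown earlier:

#version 330 core

// Declared here, defined in the separately compiled shadow shader
float boxShadow(vec2 lower, vec2 upper, vec2 point, float sigma);

// Hypothetical uniforms, set from C++ with glUniform*
uniform vec2 u_boxLower;
uniform vec2 u_boxUpper;
uniform float u_sigma;
uniform vec4 u_shadowColor;

in vec2 v_fragPos;       // fragment position in the same space as the box
out vec4 fragColor;

void main(void)
{
    float shadow = boxShadow(u_boxLower, u_boxUpper, v_fragPos, u_sigma);
    fragColor = vec4(u_shadowColor.rgb, u_shadowColor.a * shadow);
}

Both files are compiled as fragment shaders, attached to the same program, and linked.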

GLSL linker error (Sampler needs to be a uniform (global or parameter to main))

We have a GLSL fragment shader; the problem is in this code:
vec4 TFSelection(StrVolumeColorMap volumeColorMap, vec4 textureCoordinate)
{
    vec4 finalColor = vec4(0.0);

    if (volumeColorMap.TransferFunctions[0].numberOfBits == 0)
    {
        return texture(volumeColorMap.TransferFunctions[0].TransferFunctionID, textureCoordinate.x);
    }

    if (textureCoordinate.x == 0)
        return finalColor;

    float deNormalize = textureCoordinate.x * 65535 /*255*/;

    for (int i = 0; i < volumeColorMap.TransferFunctions.length(); i++)
    {
        int NormFactor = volumeColorMap.TransferFunctions[i].startBit + volumeColorMap.TransferFunctions[i].numberOfBits;
        float minval = CalculatePower(2, volumeColorMap.TransferFunctions[i].startBit);
        if (deNormalize >= minval)
        {
            float maxval = CalculatePower(2, NormFactor);
            if (deNormalize < maxval)
            {
                //float tempPower = CalculatePower(2, NormFactor);
                float coord = deNormalize / maxval /*tempPower*/;
                return texture(volumeColorMap.TransferFunctions[i].TransferFunctionID, coord);
            }
        }
    }
    return finalColor;
}
When we compile and link the shader, this message is logged:
Sampler needs to be a uniform (global or parameter to main), need to inline function or resolve conditional expression
With a simple change, the shader may link successfully, for example changing
float coord = deNormalize / maxval;
to
float coord = deNormalize;
Driver: NVIDIA 320.49