Incorrect constant buffer size - c++

I have created the following constant buffer:
// C++
struct IndexConstantBuffer
{
unsigned indexes[32]{};
};
// hlsl
cbuffer IndexConstantBuffer : register(b0)
{
uint indexes[32];
};
I am getting the following warning:
D3D11 WARNING: ID3D11DeviceContext::DrawIndexedInstanced: The size of
the Constant Buffer at slot 0 of the Pixel Shader unit is too small
(128 bytes provided, 512 bytes, at least, expected). This is OK, as
out-of-bounds reads are defined to return 0. It is also possible the
developer knows the missing data will not be used anyway. This is only
a problem if the developer actually intended to bind a sufficiently
large Constant Buffer for what the shader expects. [ EXECUTION WARNING
#351: DEVICE_DRAW_CONSTANT_BUFFER_TOO_SMALL]
What causes this warning? Do I need to add 384 bytes of padding (512 - 128), or is there another way around it?
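The 512 in the warning is consistent with HLSL's default constant buffer packing, where each array element starts in its own 16-byte register (32 x 16 = 512). A minimal sketch of a CPU-side layout that would match, assuming that packing rule applies here; PaddedUint and IndexConstantBufferPadded are names made up for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One array element as a cbuffer would lay it out under default packing:
// the value sits in the first 4 bytes of its own 16-byte register.
// (Hypothetical helper type, not part of the original code.)
struct alignas(16) PaddedUint
{
    std::uint32_t value = 0;
};

// CPU-side mirror of `uint indexes[32];` inside the cbuffer.
struct IndexConstantBufferPadded
{
    PaddedUint indexes[32]{};
};

// Tightly packed layout equivalent to the struct in the question.
struct IndexConstantBufferTight
{
    std::uint32_t indexes[32]{};
};

static_assert(sizeof(PaddedUint) == 16, "one element per 16-byte register");
static_assert(sizeof(IndexConstantBufferTight) == 128, "size the question provides");
static_assert(sizeof(IndexConstantBufferPadded) == 512, "size the warning asks for");
```

An alternative that keeps the 128-byte C++ struct would be to declare the shader side as uint4 indexes[8] and read element i as indexes[i / 4][i % 4]; that is a suggestion to try, not something the warning itself prescribes.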

Related

Aligning memory of SSBO that is an array of structs containing an array?

I'm flattening out an octree and sending it to my fragment shader using an SSBO, and I believe I am running into some memory alignment issues. I'm using std430 for the layout and binding a vector of voxels to this SSBO. I'm using GLSL 4.3, and this is the structure in my shader:
struct Voxel
{
bool data; // 4
vec4 pos; // 16
vec4 col; // 16
float size; // 4
int index; // 4
int pIndex; // 4
int cIdx[8]; // 4, 16 or 32 bytes?
};
layout (std430, binding=2) buffer octreeData
{
Voxel voxels[];
};
I'm not 100% sure, but I think I'm running into an issue with the int cIdx[8] array inside the struct. Looking at the spec (page 124, section 7.6):
If the member is an array of scalars or vectors, the base alignment and array
stride are set to match the base alignment of a single array element, according
to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The
array may have padding at the end; the base offset of the member following
the array is rounded up to the next multiple of the base alignment.
I'm not entirely sure what the alignment is. I know the vec4s take up 16 bytes of memory, but how much does my array take? If it were just sizeof(int)*8, that would be 32 bytes, but the spec says the alignment is set to that of a single array element and then rounded up to that of a vec4, right? So does that mean my cIdx array has a base alignment of 16 bytes? There are no members following it, so is padding getting added to the end of my struct?
So the total structure size is 52 bytes (if we only allocate 4 bytes for cIdx). Would that mean 12 bytes of padding are being added that I need to account for, and that may be causing my issues? If 16 bytes were allocated, would the structure be 64 bytes in total, with no memory alignment issues?
My corresponding C++ structure:
struct Voxel
{
bool data;
glm::vec4 pos;
glm::vec4 col;
float size;
int index;
int pIndex;
int cIdx[8];
};
I'm then filling in my std::vector<Voxel> and passing it to my shader like so:
glGenBuffers(1, &octreeSSBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, octreeSSBO);
glBufferData(GL_SHADER_STORAGE_BUFFER, voxelData.size()*sizeof(Voxel), voxelData.data(), GL_DYNAMIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, octreeSSBO);
Reading directly from the voxelData vector, I can confirm that the data is getting filled in correctly. I can even occasionally see that the data is reaching the shader, but it behaves differently from what I would expect based on the values I'm looking at.
Does it look like there are memory alignment issues here?
"I'm not entirely sure what the alignment is"
The specification is very clear as to what the base alignment of each thing is. Your problem is not in item #4 (std430 doesn't do the rounding specified in #4 anyway).
Your problem is in #2:
If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively.
In GLSL, vec4 has a base alignment of 16. That means that any vec4 must be allocated on a 16-byte boundary.
pos must be on a 16-byte boundary. However, data is only 4 bytes. Therefore, 12 bytes of padding must be inserted between data and pos to satisfy std430's alignment requirements.
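However, glm::vec4 has a C++ alignment of 4 by default. So the C++ compiler pads data only up to the next 4-byte boundary, not the 16-byte boundary that std430 requires, and pos ends up at a different offset than in GLSL. Thus, the types in the two languages do not agree.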
You should explicitly align all GLM vectors in C++ structs that you want to match GLSL, using C++11's alignas keyword:
struct Voxel
{
bool data;
alignas(16) glm::vec4 pos;
alignas(16) glm::vec4 col;
float size;
int index;
int pIndex;
int cIdx[8];
};
Also, I would not assume that the C++ type bool and the GLSL type bool have the same size.
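One way to catch this kind of mismatch at compile time is to assert the offsets you expect from std430. The following is a minimal sketch, not a drop-in replacement: Vec4 is a stand-in for glm::vec4 so the example has no dependencies, and data is stored as a 32-bit integer on the assumption that a bool in a buffer block occupies 4 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for glm::vec4 (four tightly packed floats, 4-byte alignment).
struct Vec4
{
    float x, y, z, w;
};

struct Voxel
{
    std::uint32_t data;    // offset 0; assumed 4-byte representation of the GLSL bool
    alignas(16) Vec4 pos;  // offset 16
    alignas(16) Vec4 col;  // offset 32
    float size;            // offset 48
    std::int32_t index;    // offset 52
    std::int32_t pIndex;   // offset 56
    std::int32_t cIdx[8];  // offset 60; std430 keeps an array stride of 4 for int
};

// Offsets worked out by hand from the std430 rules quoted above.
static_assert(offsetof(Voxel, pos) == 16, "vec4 must start on a 16-byte boundary");
static_assert(offsetof(Voxel, col) == 32, "second vec4");
static_assert(offsetof(Voxel, size) == 48, "float follows col directly");
static_assert(offsetof(Voxel, cIdx) == 60, "int array is only 4-byte aligned");
static_assert(alignof(Voxel) == 16, "struct alignment is that of its largest member");
static_assert(sizeof(Voxel) == 96, "92 bytes of members rounded up to a multiple of 16");
```

If any of these assertions fails with your real glm::vec4 type, the C++ and GLSL layouts differ and the array stride of voxels[] will not match sizeof(Voxel).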

Vulkan unpack uvec4 from integer

I've observed some strange behavior. I have an array of unsigned 32-bit integers.
I use one integer to encode 4 values, each one byte in size. I'd then like to pass this buffer to a vertex shader:
layout (location = 0) in uvec4 coords;
In order to achieve this, I use a VkVertexInputAttributeDescription with format set to VK_FORMAT_R8G8B8A8_UINT. I have defined this handy struct:
struct PackedUVec4{
unsigned char x;
unsigned char y;
unsigned char z;
unsigned char w;
};
Then I build my buffer as PackedUVec4[] and send it to the GPU. However, I have observed that the order of the bytes gets swapped. For example, if I have
layout (location = 0) in uvec4 coords;
void main(){
debugPrintfEXT("%v4d", coords);
}
it seems to print the correct output. But if I change the format to VK_FORMAT_R32_UINT and try to run
layout (location = 0) in uint coords;
void main(){
uint w = coords & 255u;
uint z = coords/256 & 255u;
uint y = coords/(256*256) & 255u;
uint x = coords/(256*256*256) & 255u;
debugPrintfEXT("%v4d", uvec4(x,y,z,w));
}
I get the bytes in the opposite order. Do the vector types use different endianness?
The problem is not with Vulkan, but with your code's interpretation of what's going on. Both sending and receiving.
Recall that endianness is about the (potential) difference between the logical location of a byte within a multi-byte value and the relative address of a byte within a multi-byte value. In little endian, if you write a four-byte value to memory, the first byte will be the least significant byte of the value.
Endianness applies to both reading and writing, but only when reading/writing multi-byte values as multi-byte values. Your PackedUVec4 is not a multi-byte value; it's a struct containing bytes with a specific layout. Therefore, if you write to the x component of a PackedUVec4, you are writing to the first byte of that structure, regardless of your CPU's endianness.
When you told Vulkan to read this data as a single 4-byte value (VK_FORMAT_R32_UINT), it does so as defined by your CPU's endianness. But your code didn't generate that data in accord with your CPU's endianness; it generated it in terms of the layout of a PackedUVec4. So the first byte in memory is x. If the GPU reads those 4 bytes as a little-endian 4-byte value, then the first byte will map to the least significant byte of the 4-byte value.
But your code that manually decodes the data is decoding it wrong. It expects the least significant byte to be w.
If you want your code to be endian-independent, then you need the GPU to read the data as 4 individual bytes, in the order stored in memory, which is what VK_FORMAT_R8G8B8A8_UINT represents. If you want the GPU to read it as an endian-based ordering within a single 32-bit integer, then it needs to be written that way by the CPU.
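The difference between "bytes in memory order" and "one 32-bit value" can be checked on the CPU. This is a small sketch of the idea; the helper names (reinterpret_as_uint, pack_x_lowest, unpack_x_lowest, host_is_little_endian) are made up for the illustration and are not Vulkan API:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct PackedUVec4
{
    unsigned char x;
    unsigned char y;
    unsigned char z;
    unsigned char w;
};
static_assert(sizeof(PackedUVec4) == 4, "four tightly packed bytes");

// What a 32-bit read of the struct's bytes yields on this host.
inline std::uint32_t reinterpret_as_uint(const PackedUVec4& p)
{
    std::uint32_t value = 0;
    std::memcpy(&value, &p, sizeof value);
    return value;
}

// Builds the integer arithmetically so that x is the least significant byte,
// independent of the host's endianness.
inline std::uint32_t pack_x_lowest(const PackedUVec4& p)
{
    return static_cast<std::uint32_t>(p.x)
         | (static_cast<std::uint32_t>(p.y) << 8)
         | (static_cast<std::uint32_t>(p.z) << 16)
         | (static_cast<std::uint32_t>(p.w) << 24);
}

// Matching decode: x comes from the lowest byte, w from the highest.
inline PackedUVec4 unpack_x_lowest(std::uint32_t v)
{
    PackedUVec4 p;
    p.x = static_cast<unsigned char>(v & 255u);
    p.y = static_cast<unsigned char>((v >> 8) & 255u);
    p.z = static_cast<unsigned char>((v >> 16) & 255u);
    p.w = static_cast<unsigned char>((v >> 24) & 255u);
    return p;
}

// True when the first byte in memory is the least significant one.
inline bool host_is_little_endian()
{
    const std::uint32_t one = 1;
    unsigned char first = 0;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian host, reinterpret_as_uint and pack_x_lowest agree, which is why the shader in the question would need to take x from the lowest byte rather than the highest.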

2D array in constant buffer

I'm trying to pass a 2D float array to a constant buffer:
//In the shader:
cbuffer myBuffer
{
// ... other buffer elements ...
float myArray[16][16];
};
//In the CPU:
struct myBuffer_struct
{
// ... other buffer elements ...
float myArray[16][16];
};
But I'm having a lot of problems dealing with the padding. I tried using
float4[size/4][size]
in my cbuffer, along with a lot of other type combinations, but I can't find any way to access my array by index. What is the proper way to do this?
Thank you.
I've had this issue, and it basically comes down to the alignment of the buffer. Your HLSL cbuffer definition will almost certainly be padded differently from what you have defined in your struct.
The alignment is probably along 16-byte (4 floats) boundaries. In my code, I was writing 4 floats at a time into a buffer, as shown below, because the array alignment was different in the cbuffer.
for (int i = 0; i < 8; i++)
{
stream.Write<float>(m_waveLengths[i] );
stream.Write<float>(m_waveSpeeds[i] );
stream.Write<float>(m_amplitudes[i] );
stream.Write<float>(m_steepness[i]);
}
To read this, I used a float4 array definition.
// hlsl definition
float4 Wave[8];
I then referenced the relevant item as Wave[0].x, Wave[0].y, Wave[0].z, Wave[0].w
The memory alignment would make the buffer 4 times bigger if I didn't pack it like this. This is because in the HLSL code, the buffer definition seems to have aligned each element of the array along 16-byte boundaries (4 x floats). So instead, I interleaved my 4 arrays into 1 array and used the components of float4 to reference it.
Otherwise, the alignment of float waveLengths[8] would have meant that I would have to write it into the buffer like this:
for (int i = 0; i < 8; i++)
{
stream.Write<float>(m_waveLengths[i] );
stream.Write<float>(0.0f);
stream.Write<float>(0.0f);
stream.Write<float>(0.0f);
}
For some reason (and I am probably not setting a certain HLSL compiler directive), using arrays in the cbuffer had some quirks where each element was padded to a 16-byte boundary.
So, for your float myArray[16][16], I would suggest looking at the alignment. You may have to write the buffer for it in a similar manner, padding out 12 bytes after each element in the array. I'm sure someone will respond with the correct compiler directive to get rid of this quirk; I solved this a while ago, and your problem looks similar to what I had.
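The interleaving trick above can also be applied to the 16x16 case: 256 floats fit exactly into 64 float4 registers with no padding. A minimal sketch of the CPU side, assuming the shader declares float4 myArray[64] and reads element (row, col) as myArray[flat / 4][flat % 4] with flat = row * 16 + col; Float4, MyArrayPacked, slot_for and store are hypothetical names for this illustration:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSize = 16;  // the array is kSize x kSize floats

// Stand-in for one 16-byte constant register (an HLSL float4).
struct alignas(16) Float4
{
    float lane[4];
};

// 256 floats packed four per register: 64 registers, 1024 bytes, no padding.
struct MyArrayPacked
{
    Float4 regs[kSize * kSize / 4];
};
static_assert(sizeof(MyArrayPacked) == kSize * kSize * sizeof(float),
              "packed layout has no per-element padding");

struct RegisterSlot
{
    std::size_t reg;   // which float4 register
    std::size_t lane;  // which component: 0 = x, 1 = y, 2 = z, 3 = w
};

// Maps a 2D index to the register and component that hold it.
constexpr RegisterSlot slot_for(std::size_t row, std::size_t col)
{
    return RegisterSlot{(row * kSize + col) / 4, (row * kSize + col) % 4};
}

// Writes one element of the logical 16x16 array into the packed buffer.
inline void store(MyArrayPacked& dst, std::size_t row, std::size_t col, float value)
{
    const RegisterSlot slot = slot_for(row, col);
    dst.regs[slot.reg].lane[slot.lane] = value;
}
```

Because the packed bytes are identical to a contiguous float[16][16], the original CPU array could in principle be copied as-is, provided it starts on a 16-byte boundary inside the constant buffer struct.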

GL_SHADER_STORAGE_BUFFER memory limitations

I'm writing a ray tracer using OpenGL compute shaders, and I use buffers to pass data to and from the shaders.
When the size of the vec2 output buffer (which is equal to the number of rays multiplied by the number of faces) reaches ~30 MB, every attempt to map the buffer returns a NULL pointer. Range mapping also fails.
I can't find any information about GL_SHADER_STORAGE_BUFFER limitations in the OpenGL documentation. Is ~30 MB a limit, or could this mapping failure be caused by something else?
And is there any way to avoid this other than dispatching the shader multiple times?
Data declaration in shader:
#version 440
layout(std430, binding=0) buffer rays{
vec4 r[];
};
layout(std430, binding=1) buffer faces{
vec4 f[];
};
layout(std430, binding=2) buffer outputs{
vec2 o[];
};
uniform int face_count;
uniform vec4 origin;
Calling code (using some Qt5 wrappers):
QOpenGLBuffer ray_buffer;
QOpenGLBuffer face_buffer;
QOpenGLBuffer output_buffer;
QVector<QVector2D> output;
output.resize(rays[r].size()*faces.size());
if(!ray_buffer.create()) { /*...*/ }
if(!ray_buffer.bind()) { /*...*/ }
ray_buffer.allocate(rays.data(), rays.size()*sizeof(QVector4D));
if(!face_buffer.create()) { /*...*/ }
if(!face_buffer.bind()) { /*...*/ }
face_buffer.allocate(faces.data(), faces.size()*sizeof(QVector4D));
if(!output_buffer.create()) { /*...*/ }
if(!output_buffer.bind()) { /*...*/ }
output_buffer.allocate(output.size()*sizeof(QVector2D));
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ray_buffer.bufferId());
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, face_buffer.bufferId());
ogl->glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, output_buffer.bufferId());
int face_count = faces.size();
compute.setUniformValue("face_count", face_count);
compute.setUniformValue("origin", pos);
ogl->glDispatchCompute(rays.size()/256, faces.size(), 1);
ray_buffer.destroy();
face_buffer.destroy();
QVector2D* data = (QVector2D*)output_buffer.map(QOpenGLBuffer::ReadOnly);
First of all, you have to understand that the OpenGL specification defines minimum maxima for a variety of values (the ones starting with a MAX_ prefix). That means that implementations are required to at least provide the specified amount as the maximum value, but are free to increase the limit as implementors see fit. This way, developers can at least rely on some upper bound, but can still make provisions for possibly larger values.
Section 23 - State Tables summarizes what has been previously specified in the corresponding sections. The information you were looking for is found in table 23.64 - Implementation Dependent Aggregate Shader Limits (cont.). If you want to know about which state belongs where (because there is per-object state, quasi-global state, program state and so on), you go to section 23.
The minimum maximum size of a shader storage buffer is represented by the symbolic constant MAX_SHADER_STORAGE_BLOCK_SIZE as per section 7.8 of the core OpenGL 4.5 specification.
Since their adoption into core, the required size (i.e. the minimum maximum) has been significantly increased. In core OpenGL 4.3 and 4.4, the minimum maximum was pow(2, 24) (or 16MB with 1 byte basic machine units and 1MB = 1024^2 bytes). In core OpenGL 4.5 this value is now pow(2, 27) (or 128MB).
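To put those numbers next to the question's output buffer, here is a small sketch of the arithmetic. It assumes a vec2 array under std430 uses 8 bytes per element; the constant and function names are made up for the illustration, and the real limit on a given driver should still be queried at runtime:

```cpp
#include <cassert>
#include <cstdint>

// Minimum guaranteed values of MAX_SHADER_STORAGE_BLOCK_SIZE, as stated above.
constexpr std::uint64_t kMinMaxBlockBytesGL43 = 1ull << 24;  // 16 * 1024^2 bytes
constexpr std::uint64_t kMinMaxBlockBytesGL45 = 1ull << 27;  // 128 * 1024^2 bytes

// Size of the vec2 output array: one vec2 (two floats) per ray/face pair.
constexpr std::uint64_t output_bytes(std::uint64_t ray_count, std::uint64_t face_count)
{
    return ray_count * face_count * 2 * sizeof(float);
}

// True when the buffer fits within a given guaranteed block size.
constexpr bool fits_guaranteed_size(std::uint64_t bytes, std::uint64_t guaranteed)
{
    return bytes <= guaranteed;
}
```

For example, 4096 rays against 1024 faces gives 32 * 1024^2 bytes, which is above the 4.3/4.4 guarantee but below the 4.5 one, so whether it works depends on what the implementation actually reports.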
Summary: When in doubt about OpenGL state, refer to section 23 of the core specification.
From OpenGL Wiki:
SSBOs can be much larger. The OpenGL spec guarantees that UBOs can be
up to 16KB in size (implementations can allow them to be bigger). The
spec guarantees that SSBOs can be up to 128MB. Most implementations
will let you allocate a size up to the limit of GPU memory.
OpenGL versions before 4.5 guarantee only 16 MiB (OpenGL 4.5 increased the minimum to 128 MiB). You can use glGet() to query whether your implementation lets you bind more:
GLint64 max;
glGetInteger64v(GL_MAX_SHADER_STORAGE_BLOCK_SIZE, &max);
In fact, the problem seems to be in the Qt wrappers. I didn't look into it in depth, but when I replaced QOpenGLBuffer's create(), bind(), allocate() and map() with glCreateBuffers(), glBindBuffer(), glNamedBufferData() and glMapNamedBuffer(), all called through QOpenGLFunctions_4_5_Core, the memory problem went away until I reached 2 GB (which is the GPU's physical memory limit).
The second mistake I made was not using glMemoryBarrier(), but adding it didn't help while QOpenGLBuffer was in use.

OpenGL compute shader buffer allocation fails

I am trying to use a buffer in a compute shader like this:
layout (binding = 1, std430) writeonly buffer bl1
{
uint data[gl_WorkGroupSize.x * gl_NumWorkGroups.x * gl_NumWorkGroups.y];
};
but I get the following error (because of using gl_NumWorkGroups for the size):
Array size must be a constant integer expression
How can I work around this?
Stop putting in a length at all:
layout (binding = 1, std430) writeonly buffer bl1
{
uint data[];
};
This is a feature unique to SSBOs. You can only have one unsized array in an SSBO, and it must be the last member in the interface block. The size of data will be computed based on the size of the buffer object range you bind to that binding point. So if you bind 32KB of buffer space, you will get 8K items (the size of a uint is 4 bytes).
At runtime, your shader can use gl_WorkGroupSize.x * gl_NumWorkGroups.x * gl_NumWorkGroups.y to compute the length of data. Alternatively, just use data.length() to get the length of the buffer that the user gave you. Alternatively... you don't need to explicitly know the length, depending on how you use it.
As long as your OpenGL buffer binding code uses a buffer with enough memory for your dispatch count and work group size, you're fine.
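On the host side, the sizing works out to simple arithmetic. A minimal sketch, assuming the unsized uint array is the only member of the block (so it starts at offset 0 with a stride of 4 bytes); both function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Number of elements the shader sees in an unsized array, given the size of
// the bound buffer range, the array's offset in the block, and its stride.
constexpr std::uint64_t unsized_array_length(std::uint64_t bound_bytes,
                                             std::uint64_t array_offset,
                                             std::uint64_t stride)
{
    return bound_bytes > array_offset ? (bound_bytes - array_offset) / stride : 0;
}

// Bytes needed so that every invocation of a (groups_x, groups_y, 1) dispatch
// with the given local size in x can write its own uint.
constexpr std::uint64_t required_bytes(std::uint64_t local_size_x,
                                       std::uint64_t groups_x,
                                       std::uint64_t groups_y)
{
    return local_size_x * groups_x * groups_y * sizeof(std::uint32_t);
}
```

For instance, a local size of 64 with a 16 x 8 dispatch needs 32768 bytes, and binding that range gives the shader an array of 8192 elements.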