I'm trying to pass a 2D float array to a constant buffer:
// In the shader:
cbuffer myBuffer
{
    // other buffer elements
    // ...
    float myArray[16][16];
};
// On the CPU:
struct myBuffer_struct
{
    // other buffer elements
    // ...
    float myArray[16][16];
};
But I'm having a lot of problems dealing with the padding. I tried using
float4[size/4][size]
in my cbuffer, and a lot of other type combinations, but I can't index into my array in any way. What is the proper way to do this?
Thank you.
I've had this issue, and it basically comes down to the alignment of the buffer. Your HLSL cbuffer definition will almost certainly be padded differently from what you have defined in your struct.
The alignment is probably along 16-byte (4-float) boundaries. In my code, I was writing 4 floats out into the buffer like this, because the array alignment was different in the cbuffer:
for (int i = 0; i < 8; i++)
{
    stream.Write<float>(m_waveLengths[i]);
    stream.Write<float>(m_waveSpeeds[i]);
    stream.Write<float>(m_amplitudes[i]);
    stream.Write<float>(m_steepness[i]);
}
To read this, I used a float4 array definition.
// hlsl definition
float4 Wave[8];
I then referenced the relevant item as Wave[0].x, Wave[0].y, Wave[0].z, Wave[0].w.
The memory alignment would make the buffer 4 times bigger if I didn't pack it like this. This is because, in the HLSL code, the buffer definition seems to align each element of the array along 16-byte boundaries (4 floats). So instead, I interleaved my 4 arrays into 1 array and used the properties of float4 to reference it.
I did this because the alignment of float waveLengths[8] would have meant writing it into the buffer like this:
for (int i = 0; i < 8; i++)
{
    stream.Write<float>(m_waveLengths[i]);
    stream.Write<float>(0.0f);
    stream.Write<float>(0.0f);
    stream.Write<float>(0.0f);
}
For some reason (and I am probably not setting a certain HLSL compiler directive), using arrays in the cbuffer had the quirk of padding each element to a 16-byte boundary.
So, for your float myArray[16][16], I would look at the alignment; you may have to write the buffer out in a similar manner, padding out 12 bytes after each element in the array. I'm sure someone will respond with the correct compiler directive to get rid of this quirk; I just solved this a while ago, and your problem looks similar to what I had.
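To make that concrete for your 16x16 case, here's a minimal CPU-side sketch, assuming each HLSL array element really does occupy its own 16-byte register (the PaddedFloat name is just an illustration):

struct PaddedFloat
{
    float value;
    float pad[3]; // 12 bytes of padding to fill out the 16-byte register
};

struct myBuffer_struct
{
    // other buffer elements
    // ...
    PaddedFloat myArray[16][16]; // mirrors float myArray[16][16] in the cbuffer
};

One common alternative is to declare the cbuffer member as float4 myArray[16][4] and index it in the shader as myArray[i][j / 4][j % 4], which avoids wasting three quarters of the buffer at the cost of slightly uglier indexing.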
I have created the following constant buffer:
// C++
struct IndexConstantBuffer
{
    unsigned indexes[32]{};
};

// hlsl
cbuffer IndexConstantBuffer : register(b0)
{
    uint indexes[32];
};
I am getting the following warning:
D3D11 WARNING: ID3D11DeviceContext::DrawIndexedInstanced: The size of
the Constant Buffer at slot 0 of the Pixel Shader unit is too small
(128 bytes provided, 512 bytes, at least, expected). This is OK, as
out-of-bounds reads are defined to return 0. It is also possible the
developer knows the missing data will not be used anyway. This is only
a problem if the developer actually intended to bind a sufficiently
large Constant Buffer for what the shader expects. [ EXECUTION WARNING
#351: DEVICE_DRAW_CONSTANT_BUFFER_TOO_SMALL]
What causes this warning? Do I need to add 384 bytes (512 - 128) of padding, or is there another way around it?
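For reference, here is a minimal CPU-side sketch of the padded layout I think the warning is asking for, assuming each array element starts on its own 16-byte register (32 x 16 = 512 bytes); the PaddedIndex name is just an illustration:

struct PaddedIndex
{
    unsigned value;
    unsigned pad[3]; // widen each element to a full 16-byte register
};

struct IndexConstantBuffer
{
    PaddedIndex indexes[32]{}; // 32 * 16 = 512 bytes
};

static_assert(sizeof(IndexConstantBuffer) == 512, "CPU struct should match the shader-side size");

The other common approach is to keep the 128-byte buffer and declare uint4 indexes[8] in HLSL, indexing it as indexes[i / 4][i % 4].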
I'm flattening out an octree and sending it to my fragment shader using an SSBO, and I believe I am running into some memory alignment issues. I'm using std430 for the layout and binding a vector of voxels to this SSBO. This is the structure in my shader (I'm using GLSL 4.3, FYI):
struct Voxel
{
    bool data;   // 4
    vec4 pos;    // 16
    vec4 col;    // 16
    float size;  // 4
    int index;   // 4
    int pIndex;  // 4
    int cIdx[8]; // 4, 16 or 32 bytes?
};

layout (std430, binding=2) buffer octreeData
{
    Voxel voxels[];
};
I'm not 100% sure, but I think I'm running into an issue with the int cIdx[8] array inside the struct. Looking at the spec (page 124, section 7.6):
If the member is an array of scalars or vectors, the base alignment and array
stride are set to match the base alignment of a single array element, according
to rules (1), (2), and (3), and rounded up to the base alignment of a vec4. The
array may have padding at the end; the base offset of the member following
the array is rounded up to the next multiple of the base alignment.
I'm not entirely sure what the alignment is. I know the vec4s take up 16 bytes of memory, but how much does my array? If it were just sizeof(int) * 8, that would be 32, but it says the alignment is set to that of a single array element and then rounded up to a vec4, right? So does that mean my cIdx array has a base alignment of 16 bytes? There are no follow-up members, so is there padding getting added to my struct?
So the total structure memory = 52 bytes (if we only allocate 4 bytes for cIdx). Would that mean there are 12 bytes of padding being added that I need to account for, which may be causing my issues? If it were allocating 16 bytes, would that be 64 bytes total for the structure and no memory alignment issues?
My corresponding C++ structure:
struct Voxel
{
    bool data;
    glm::vec4 pos;
    glm::vec4 col;
    float size;
    int index;
    int pIndex;
    int cIdx[8];
};
I'm then filling in my std::vector<Voxel> and passing it to my shader like so:
glGenBuffers(1, &octreeSSBO);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, octreeSSBO);
glBufferData(GL_SHADER_STORAGE_BUFFER, voxelData.size()*sizeof(Voxel), voxelData.data(), GL_DYNAMIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, octreeSSBO);
Reading directly from the voxelData vector, I can confirm that the data is getting filled in correctly, and I can even occasionally see that the data is getting passed to the shader, but it behaves incorrectly compared to what I would expect based on the values I'm looking at.
Does it look like there are memory alignment issues here?
I'm not entirely sure what the alignment is
The specification is very clear about what the base alignments of things are. Your problem is not with item #4 (std430 doesn't do the rounding specified in #4 anyway).
Your problem is in #2:
If the member is a two- or four-component vector with components consuming N basic machine units, the base alignment is 2N or 4N, respectively.
In GLSL, vec4 has a base alignment of 16. That means that any vec4 must be allocated on a 16-byte boundary.
pos must be on a 16-byte boundary. However, data is only 4 bytes. Therefore, 12 bytes of padding must be inserted between data and pos to satisfy std430's alignment requirements.
However, glm::vec4 has a C++ alignment of 4. So the C++ compiler does not insert a bunch of padding between data and pos. Thus, the types in the two languages do not agree.
You should explicitly align all GLM vectors in C++ structs that you want to match GLSL, using C++11's alignas keyword:
struct Voxel
{
    bool data;
    alignas(16) glm::vec4 pos;
    alignas(16) glm::vec4 col;
    float size;
    int index;
    int pIndex;
    int cIdx[8];
};
Also, I would not assume that the C++ type bool and the GLSL type bool have the same size.
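If you want to be careful about that last point, here is a sketch that mirrors the GLSL bool with a fixed-width 32-bit integer, on the assumption that a bool has the same 4-byte layout as a uint under std430:

#include <cstdint>
#include <glm/glm.hpp>

struct Voxel
{
    std::uint32_t data; // stands in for the GLSL bool: 0 = false, 1 = true
    alignas(16) glm::vec4 pos;
    alignas(16) glm::vec4 col;
    float size;
    int index;
    int pIndex;
    int cIdx[8];
};

// The vec4 members force 16-byte alignment, so the struct gets padded to a
// multiple of 16 bytes, which should line up with the std430 array stride.
static_assert(sizeof(Voxel) % 16 == 0, "unexpected struct size");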
I've observed some strange behavior. I have an array of unsigned 32-bit integers.
I use one integer to encode 4 values, each one byte in size. I'd like to then pass such a buffer to the vertex shader:
layout (location = 0) in uvec4 coords;
In order to achieve this, I use VkVertexInputAttributeDescription with format set to VK_FORMAT_R8G8B8A8_UINT. I have defined this handy struct:
struct PackedUVec4{
    unsigned char x;
    unsigned char y;
    unsigned char z;
    unsigned char w;
};
Then I build my buffer as a PackedUVec4[], and that buffer is sent to the GPU. However, what I have observed is that the order of bytes gets swapped. For example, if I have
layout (location = 0) in uvec4 coords;
void main(){
    debugPrintfEXT("%v4d", coords);
}
it seems to print the correct output. But if I change the format to VK_FORMAT_R32_UINT and try to run
layout (location = 0) in uint coords;
void main(){
    uint w = coords & 255u;
    uint z = coords/256 & 255u;
    uint y = coords/(256*256) & 255u;
    uint x = coords/(256*256*256) & 255u;
    debugPrintfEXT("%v4d", uvec4(x,y,z,w));
}
I get the bytes in the opposite order. Do the vector types use different endianness?
The problem is not with Vulkan, but with your code's interpretation of what's going on. Both sending and receiving.
Recall that endianness is about the (potential) difference between the logical location of a byte within a multi-byte value and the relative address of a byte within a multi-byte value. In little endian, if you write a four-byte value to memory, the first byte will be the least significant byte of the value.
Endianness applies to both reading and writing, but only when reading/writing multi-byte values as multi-byte values. Your PackedUVec4 is not a multi-byte value; it's a struct containing bytes with a specific layout. Therefore, if you write to the x component of a PackedUVec4, you are writing to the first byte of that structure, regardless of your CPU's endian.
When you told Vulkan to read this data as a single 4-byte value (VK_FORMAT_R32_UINT), it does so as defined by your CPU's endian. But your code didn't generate that data in accord with your CPU's endian; it generated it in terms of the layout of a PackedUVec4. So the first byte in memory is x. If the GPU reads those 4 bytes as a little endian 4-byte value, then the first byte will map to the least significant byte of the 4-byte value.
But your code that manually decodes the data is decoding it wrong. It expects the least significant byte to be w.
If you want your code to be endian-independent, then you need the GPU to read the data as 4 individual bytes, in the order stored in memory. Which is what the VK_FORMAT_R8G8B8A8_UINT represents. If you want the GPU to read it as an endian-based ordering within a single 32-bit integer, then it needs to be written that way by the CPU.
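If you do want the single 32-bit integer route with the decoding shown above (w in the least significant byte), here is a minimal CPU-side sketch; PackCoords is just a hypothetical helper:

#include <cstdint>

// Building the value with shifts writes a *value*, not individual bytes, so
// the shader's divide/mask arithmetic recovers x, y, z, w regardless of the
// CPU's byte order.
std::uint32_t PackCoords(std::uint8_t x, std::uint8_t y, std::uint8_t z, std::uint8_t w)
{
    return (std::uint32_t(x) << 24) |
           (std::uint32_t(y) << 16) |
           (std::uint32_t(z) << 8)  |
            std::uint32_t(w);
}

The returned value would then be stored in the vertex buffer and read back with VK_FORMAT_R32_UINT.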
I need to access a big (~2 MB) one-dimensional buffer from a shader. However, I don't know which type of OpenGL buffer I should use. I'm going to store floats (16F) and unsigned integers (16UI). My data will be like this struct:
struct{
    float d;  // 16F
    int a[7]; // or a1, a2, a3, a4, a5, a6, a7; 7 x 16UI
}
I read about buffer textures and other kinds of buffers (Passing a list of values to fragment shader), but they only work for one type of data (float or int), not both. I could use two buffers, but I think that wouldn't be cache-friendly or easy.
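For what it's worth, this is a sketch of the element layout I have in mind on the C++ side, with the half-float stored as its raw 16-bit pattern (names are placeholders):

#include <cstdint>

struct Element
{
    std::uint16_t d;    // 16F, stored as the raw bits of a half-float
    std::uint16_t a[7]; // 7 x 16UI
};

static_assert(sizeof(Element) == 16, "each element should pack to 16 bytes");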
Using C++ (GCC specifically, I should have mentioned this sooner), I'm storing raw texture data in an array of unsigned bytes, in RGBA format, with 32 bits per pixel (8 bits per color value including alpha, and so on). The thing is, I want to write a function that returns the raw data as an array of Colors, where a Color is a struct defined as follows:
struct Color
{
    uint8 r;
    uint8 g;
    uint8 b;
    uint8 a;
};
Plus functions and whatnot, but those are the only member variables in the struct. My thinking is that, since each color is 4 bytes long, I can somehow cast the raw byte array to a Color array that is 1/4 of the original size (in "length" of the array, not in absolute size). I think reinterpret_cast is what I am looking for, but I cannot find anything after a Google search that confirms 100% that you can convert it into an array of structs instead of just one struct.
So I guess I am just asking someone to either confirm that this is indeed possible with reinterpret_cast, or tell me if there is a different cast or way to do this. Thanks.
EDIT: My wording is a little weird, so as an arbitrary example, I'd like to somehow cast an array of 16 unsigned bytes into an array of 4 Colors.
EDIT: Also, I know it's a little late, but I can't seem to find how to cast a small portion of the array at a specific place to a single struct using reinterpret_cast, if that is possible, without copying to a smaller array and casting that. So any help with this problem would also be greatly appreciated.
As an arbitrary example, I'd like to somehow cast an array of 16 unsigned bytes into an array of 4 Colors.
Like this:
#pragma pack(push, 1)
struct Color
{
    uint8 r;
    uint8 g;
    uint8 b;
    uint8 a;
};
#pragma pack(pop)
uint8 bytearray[16];
...
Color *colorarray = reinterpret_cast<Color*>(bytearray);
Then you can do things like this:
for (int idx = 0; idx < 4; ++idx)
{
    Color &c = colorarray[idx];
    // use c.r, c.g, c.b, c.a as needed...
}
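For the second edit in the question, a small sketch along the same lines (the index value is just an example), reinterpreting a single element at a specific offset without copying:

const unsigned pixelIndex = 2; // hypothetical: pick the third pixel in the raw data
Color &c = *reinterpret_cast<Color*>(&bytearray[pixelIndex * sizeof(Color)]);
// c.r, c.g, c.b, c.a now alias bytes 8..11 of bytearray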