Wrong alignment for float array - C++

I'm passing a uniform buffer to a compute shader in Vulkan. The buffer contains an array of 49 floating-point numbers (a Gaussian kernel). Everything works, but when I read the array in the shader, only 13 values come through; the others are 0 or garbage, and the valid ones correspond to indices 0, 4, 8, etc. of the initial array. I think it's some kind of alignment problem.
The shader layouts are:
struct Pixel
{
    vec4 value;
};

layout(push_constant) uniform params_t
{
    int width;
    int height;
} params;

layout(std140, binding = 0) buffer buf
{
    Pixel imageData[];
};

layout (binding = 1) uniform sampler2D inputTex;

layout (binding = 2) uniform unf_t
{
    float gauss[SAMPLE_SIZE*SAMPLE_SIZE];
};
Could binding 0 be influencing binding 2? And if so, how can I copy the array to the buffer with the needed alignment? Currently I use
vkCmdUpdateBuffer(a_cmdBuff, a_uniform, 0, a_gaussSize, (const uint32_t *)gauss);
Or would it be better to split them across different descriptor sets?
Edit: by expanding the buffer and the array I managed to pass it with an alignment of 16 and everything works, but that looks like a waste of memory. How can I align the floats to 4 bytes?

Uniform blocks require that array elements are aligned to a vec4 (16 bytes), so a float array in a uniform block has a 16-byte stride and only every fourth float you upload lands on an element.
To work around this, use a vec4 array instead: pass the 49 floats packed into 13 vec4s (52 float slots) and pick the correct component in the shader using index/4 and index%4.
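For illustration, a minimal sketch of that workaround against the shader above (the member name gauss_packed and the helper gaussAt are made up):
layout (binding = 2) uniform unf_t
{
    // 49 floats packed into 13 vec4s; the last 3 components are unused padding
    vec4 gauss_packed[(SAMPLE_SIZE*SAMPLE_SIZE + 3) / 4];
};

// Read logical element i of the float array.
float gaussAt(int i)
{
    return gauss_packed[i / 4][i % 4];
}
On the CPU side the 49 floats can then be copied contiguously (padded up to a multiple of 16 bytes), since the packed vec4 layout matches a tightly packed float array.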

Related

OpenGL glUniform for arrays of arrays (ARB_arrays_of_arrays)

If I have a fragment-shader that looks like this:
#version 450
#define MAX_NUM_LIGHTS 10
#define NUM_CASCADES 6
uniform sampler2D depthMap[NUM_CASCADES][MAX_NUM_LIGHTS];
...
How do I send a value from my C++ program via glUniform... to the shader?
If I had just:
#define MAX_NUM_LIGHTS 10
uniform sampler2D depthMap[MAX_NUM_LIGHTS];
...
I would do this like so:
...
GLint tmp[MAX_NUM_LIGHTS];
for(GLint i = 0; i < MAX_NUM_LIGHTS; i++)
{
    tmp[i] = 2 + i; // all textures up to GL_TEXTURE1 are already bound
    glActiveTexture(GL_TEXTURE2 + i);
    glBindTexture(GL_TEXTURE_2D, depthMapID[i]);
}
glUniform1iv(model.depthMap_UniformLocation, MAX_NUM_LIGHTS, tmp);
glUniform1iv does not work for multidimensional arrays, and I couldn't find a function that fits here: https://www.khronos.org/registry/OpenGL-Refpages/es2.0/xhtml/glUniform.xml or here: https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_arrays_of_arrays.txt
Arrays of arrays in OpenGL work like arrays of structs. This means that each innermost array has its own uniform location, and therefore its own name. However, once you get down to an array of basic types, it acts like a regular array of basic types: you can upload many values through the first location of that array.
In your case, you have 6 uniforms, named "depthMap[0]" through "depthMap[5]". Each of these is a 10-element array.
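A rough sketch of how that could be driven from the C++ side, assuming the shader from the question (program, depthMapID[c][i], and the texture-unit numbering are illustrative, not from the original code):
GLint tmp[MAX_NUM_LIGHTS];
for(GLint c = 0; c < NUM_CASCADES; c++)
{
    // Each inner array "depthMap[c]" has its own location and name.
    std::string name = "depthMap[" + std::to_string(c) + "]";
    GLint rowLocation = glGetUniformLocation(program, name.c_str());
    for(GLint i = 0; i < MAX_NUM_LIGHTS; i++)
    {
        GLint unit = 2 + c*MAX_NUM_LIGHTS + i; // units 0 and 1 assumed already in use
        tmp[i] = unit;
        glActiveTexture(GL_TEXTURE0 + unit);
        glBindTexture(GL_TEXTURE_2D, depthMapID[c][i]);
    }
    glUniform1iv(rowLocation, MAX_NUM_LIGHTS, tmp); // upload the 10 units for this cascade
}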

Only get garbage from Shader Storage Block?

I have bound the shader storage buffer to the shader storage block like so
GLuint index = glGetProgramResourceIndex(myprogram, GL_SHADER_STORAGE_BLOCK, name);
glShaderStorageBlockBinding(myprogram, index, mybindingpoint);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, mybuffer);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, mybindingpoint, mybuffer, 0, 48);
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, 48, &mydata);
mydata points to a std::vector containing 4 glm::vec3 objects.
Because I bound 48 bytes as the buffer range I expect lights[] to hold 48/(4*3) = 4 vec3s.
layout(std430) buffer light {
    vec3 lights[];
};
The element at index 1 in my std::vector holds the data x=1.0, y=1.0, z=1.0.
But viewing the output by doing
gl_FragColor = vec4(lights[1], 1.0);
I see yellow (x=1.0, y=1.0, z=0.0) pixels. This is not what I loaded into the buffer.
Can somebody tell me what I am doing wrong?
EDIT
I just changed the shader storage block to
layout(std430) buffer light {
    float lights[];
};
and output
gl_FragColor = vec4(lights[3],lights[4],lights[5],1.0);
and it works (white pixels).
If somebody can explain this, that would still be great.
It's because people don't take this simple advice: never use a vec3 in a UBO/SSBO.
The base alignment of a vec3 is 16 bytes. Always. Therefore, when it is arrayed, the array stride (the number of bytes from one element to the next) is always 16. Exactly the same as a vec4.
Yes, std430 layout is different from std140. But it's not that different. Specifically, it only prevents the base alignment and stride of array elements (and base alignment of structs) from being rounded up to that of a vec4. But since the base alignment of vec3 is always equal to that of a vec4, it changes nothing about them. It only affects scalars and vec2's.
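A sketch of the usual fix for the block in the question: use a vec4 array so the GPU-side stride matches what the CPU writes (the .w component is just padding):
layout(std430) buffer light {
    // 16-byte stride on both sides; .xyz carries the data, .w is padding
    vec4 lights[];
};
On the CPU side the matching storage would then be a std::vector<glm::vec4> with 4 elements (64 bytes), uploaded via mydata.data() so that the element storage, rather than the vector object itself, is copied.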

Shader storage buffer object with bytes

I am working on a compute shader where the output is written to an SSBO. Now, the consumer of this buffer is CUDA, which expects it to contain unsigned bytes. I currently can't find a way to write one byte per index into the SSBO. With a texture or image, the normalized-float-to-unsigned-byte conversion is handled by OpenGL; for example, I can attach a texture with internal format R8 and store a byte per entry. But nothing like this seems possible with an SSBO. Does it mean that, except for the bool data type, all the numerical storage types in an SSBO take at least 4 bytes per entry?
Practically speaking I would like to be able to do the following:
Compute shader:
#version 430 core
layout (local_size_x = 8, local_size_y = 8) in;

struct SSBOBlock
{
    byte mydata;
};

layout(std430, binding = BUFFER_OUTPUT) writeonly buffer bBuffer
{
    SSBOBlock Ouput[];
} Out;

void main()
{
    //..... Compute shader stuff...
    //.......
    Out.Ouput[globalIndex].mydata = val; // where val is a normalized float
}
The smallest type exposed on GPUs tends to be 32-bit for scalars. Even the boolean type you mentioned is actually 32-bit. The same often goes for languages like C: a boolean does not need more than 1 bit, but even so, bool is not synonymous with "give me the smallest data type available."
There are built-in packing functions you can use to pack and unpack data types, however, and I will show an example of how to use them below:
#version 430 core
layout (local_size_x = 8, local_size_y = 8) in;

struct SSBOBlock
{
    uint mydata;
};

layout(std430, binding = BUFFER_OUTPUT) writeonly buffer bBuffer
{
    SSBOBlock Ouput[];
} Out;

void main()
{
    //..... Compute shader stuff...
    //.......
    Out.Ouput[globalIndex].mydata = packUnorm4x8(val);
    // where val is a 4-component unsigned normalized vector to pack into globalIndex
}
Your sample shader shows an attempt to write a single scalar to a "byte" data type; that is not possible, and you are going to have to modify it somehow to work with indices that reference a packed group of 4 scalars. In the worst case, this might mean unpacking three values and then re-packing the entire thing just to write one scalar.
This built-in function is discussed in the extension specification for GL_ARB_shading_language_packing and is core in GL 4.2 and later.
Even if you were on an implementation that does not support that extension, it is explained in the text of the extension specification exactly what each does. The equivalent operation for packUnorm4x8 is:
uint fixed_val = round(clamp(float_val, 0, +1) * 255.0);
Some bit-shifts will be necessary to properly pack each component, but those are trivial.
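A sketch of what that manual packing might look like if the built-in were unavailable (the function name packUnorm4x8_manual is made up):
// Hand-written equivalent of packUnorm4x8: each component is clamped,
// scaled to [0, 255], rounded, and shifted into its own byte.
uint packUnorm4x8_manual(vec4 v)
{
    uvec4 b = uvec4(round(clamp(v, 0.0, 1.0) * 255.0));
    return b.x | (b.y << 8) | (b.z << 16) | (b.w << 24);
}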
I found a way to write unsigned byte data into a buffer in a compute shader: a buffer texture does the job. It is basically an image texture with a buffer object as its storage. This way I can specify the image format to be R8, which allows me to store byte-sized values at each index of the buffer.
GLuint _tbo_buffer,_tbo_tex;
glGenBuffers(1, &_tbo_buffer);
glBindBuffer(GL_TEXTURE_BUFFER, _tbo_buffer);
glBufferData(GL_TEXTURE_BUFFER, SCREEN_WIDTH * SCREEN_HEIGHT, NULL, GL_DYNAMIC_COPY);
glGenTextures(1, &_tbo_tex);
glBindTexture(GL_TEXTURE_BUFFER, _tbo_tex);
//attach the TBO to the texture:
glTexBuffer(GL_TEXTURE_BUFFER, GL_R8, _tbo_buffer);
glBindImageTexture(0, _tbo_tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_R8);
Compute shader:
#version 430 core
layout (local_size_x = 8, local_size_y = 8) in;

layout(binding = 0) uniform sampler2D TEX_IN;
layout(r8) writeonly uniform imageBuffer mybuffer;

void main(){
    vec2 texSize = vec2(textureSize(TEX_IN, 0));
    vec2 uv = vec2(gl_GlobalInvocationID.xy) / texSize;
    vec4 tex = texture(TEX_IN, uv);
    // nThreads holds the total number of invocations per dimension (e.g. a uniform set by the host)
    uint globalIndex = gl_GlobalInvocationID.y * nThreads.x + gl_GlobalInvocationID.x;
    // store only the r channel:
    imageStore(mybuffer, int(globalIndex), vec4(0.5, 0, 0, 0));
}
Then we can read it byte by byte on the CPU or map it to a CUDA buffer resource:
GLubyte* ptr = (GLubyte*)glMapBuffer(GL_TEXTURE_BUFFER, GL_READ_ONLY);

OpenGL - Calling glBindBufferBase with index = 1 breaks rendering (Pitch black)

There's an array of uniform blocks in my shader which is defined as such:
layout (std140) uniform LightSourceBlock
{
    int shadowMapID;
    int type;
    vec3 position;
    vec4 color;
    float dist;
    vec3 direction;
    float cutoffOuter;
    float cutoffInner;
    float attenuation;
} LightSources[12];
To be able to bind my buffer objects to each LightSource, I've bound each uniform block to a block binding index:
for(unsigned int i=0;i<12;i++)
glUniformBlockBinding(program,locLightSourceBlock[i],i); // locLightSourceBlock contains the locations of each element in LightSources[]
When rendering, I'm binding my buffers to the respective index using:
glBindBufferBase(GL_UNIFORM_BUFFER,i,buffer);
This works fine, as long as I only bind a single buffer to the binding index 0. As soon as there's more, everything is pitch black (Even things that use entirely different shaders). (glGetError returns no errors)
If I change the block indices range from 0-11 to 2-13 (Skipping index 1), everything works as it should. I figured if I use index 1, I'm overwriting something, but I don't have any other uniform blocks in my shader, and I'm not using glUniformBlockBinding or glBindBufferBase anywhere else in my code, so I'm not sure.
What could be causing such behavior? Is the index 1 reserved for something?
1) Don't use multiple blocks. Use one block with an array, something like this:
struct Light {
    ...
};

layout(std140, binding = 0) uniform lightBuffer {
    Light lights[42];
};
Skip glUniformBlockBinding and only call glBindBufferBase with the index specified in the shader.
2) Read up on alignment for std140 and std430. In short, block members are aligned so that they don't straddle 16-byte (vec4) boundaries, and a vec3 is aligned to 16 bytes. So in your case position starts at byte 16 (not 8), which results in a mismatch between CPU-side and GPU-side access. (Reorder variables or add padding.)
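For illustration, a CPU-side struct matching the std140 layout of the LightSourceBlock from the question might look like this (the padding member names are made up; the offsets follow the std140 rules):
#include <cstdint>

// Mirrors the std140 layout of LightSourceBlock; byte offsets in comments.
struct LightSourceStd140 {
    int32_t shadowMapID;   // offset  0
    int32_t type;          // offset  4
    float   _pad0[2];      // offset  8  (vec3 must start on a 16-byte boundary)
    float   position[3];   // offset 16
    float   _pad1;         // offset 28 (vec4 must start on a 16-byte boundary)
    float   color[4];      // offset 32
    float   dist;          // offset 48
    float   _pad2[3];      // offset 52 (vec3 must start on a 16-byte boundary)
    float   direction[3];  // offset 64
    float   cutoffOuter;   // offset 76 (a lone float may share the vec3's 16-byte slot)
    float   cutoffInner;   // offset 80
    float   attenuation;   // offset 84
};
static_assert(sizeof(LightSourceStd140) == 88, "std140 layout mismatch");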

How to interpret the meaning of glGetActiveUniformBlockiv with GL_UNIFORM_BLOCK_DATA_SIZE

Suppose I have the following vertex shader code:
#version 330
uniform mat4 ProjectionMatrix, CameraMatrix, SingleModelMatrix;
uniform uint SingleModel;

layout (std140) uniform ModelBlock {
    mat4 ModelMatrices[128];
};

void main(void) {
    ... something that uses ModelMatrices
}
If in my program I issue the following OpenGL call for a program object representing the shader above:
getUniformBlockParameter(GL_UNIFORM_BLOCK_DATA_SIZE)
I find out the following with an Intel HD Graphics 4000 card:
GLUniformBlock[
    name: ModelBlock,
    blockSize: 8192,
    buffer: java.nio.HeapByteBuffer[pos=0 lim=8192 cap=8192],
    metadata: {
        ModelMatrices=GLUniformBlockAttributeMetadata[
            name: ModelMatrices,
            offset: 0,
            arrayStride: 64,
            matrixStride: 16,
            matrixOrder: 0
        ]
    },
    uniforms: GLUniformInterface[
        uniforms: {
            ModelMatrices=GLUFloatMat4[
                name: ModelMatrices,
                idx: 2,
                loc: -1,
                size: 128,
                glType: 8b5c
            ]
        }
    ]
]
How should I interpret the blockSize parameter? The OpenGL 3 sdk docs state that:
If pname is GL_UNIFORM_BLOCK_DATA_SIZE, the implementation-dependent minimum total buffer object size, in basic machine units, required to hold all active uniforms in the uniform block identified by uniformBlockIndex is returned. It is neither guaranteed nor expected that a given implementation will arrange uniform values as tightly packed in a buffer object. The exception to this is the std140 uniform block layout, which guarantees specific packing behavior and does not require the application to query for offsets and strides. In this case the minimum size may still be queried, even though it is determined in advance based only on the uniform block declaration.
So in this case how should I measure that 8192 number in blockSize? In floats? In bytes?
If I do the numbers, a 4x4 mat4 uniform has 16 float components, which fit into 64 bytes, so at the very least I need 16 x 4 x 128 bytes to store 128 matrices, which is indeed 8192.
Why then is the hardware also asking for 64 (bytes?) for array stride and 16 (bytes?) of matrix stride, yet requesting only 8192 bytes? Shouldn't getUniformBlockParameter(GL_UNIFORM_BLOCK_DATA_SIZE) have requested more space for alignment/padding purposes?
The array stride is the byte offset from the start of one array element to the start of the next. The array elements in this case are mat4s, which are 64 bytes in size. So the array stride is as small as it can be.
The matrix stride is the byte offset from one column/row of matrix data to the next. Each mat4 has a column or row size of a vec4, which means it is 16 bytes in size. So again, the matrix columns/rows are tightly packed.
So 8KiB is the expected size of such storage.
That being said, this is entirely irrelevant. You shouldn't be bothering to query this stuff at all. By using std140 layout, you have already forced OpenGL to adopt a specific layout. One that notably requires mat4s to have an array stride of 64 bytes and to have a matrix stride of 16 bytes. You don't have to ask the implementation for this; it's required by the standard.
You only need to ask if you're using the shared or packed layouts.
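For completeness, a sketch of the queries you would need in the shared/packed case (assuming a linked program object named program and the block and uniform names from the question):
// Only needed for "shared" or "packed" layouts; with std140 these values are fixed by the spec.
GLuint blockIndex = glGetUniformBlockIndex(program, "ModelBlock");

GLint blockSize = 0;
glGetActiveUniformBlockiv(program, blockIndex, GL_UNIFORM_BLOCK_DATA_SIZE, &blockSize);

const GLchar* names[] = { "ModelMatrices" };
GLuint index = GL_INVALID_INDEX;
glGetUniformIndices(program, 1, names, &index);

GLint offset = 0, arrayStride = 0, matrixStride = 0;
glGetActiveUniformsiv(program, 1, &index, GL_UNIFORM_OFFSET, &offset);
glGetActiveUniformsiv(program, 1, &index, GL_UNIFORM_ARRAY_STRIDE, &arrayStride);
glGetActiveUniformsiv(program, 1, &index, GL_UNIFORM_MATRIX_STRIDE, &matrixStride);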