Shader storage block name issue - C++

Something weird is happening with my shader storage blocks.
I have 2 SSBs:
#version 450 core
out vec4 out_color;
layout (binding = 0, std430) buffer A_SSB
{
float a_data[];
};
layout (binding = 1, std430) buffer B_SSB
{
float b_data[];
};
void main()
{
a_data[0] = 0.0f;
a_data[1] = 1.0f;
a_data[2] = 2.0f;
a_data[3] = 3.0f;
b_data[0] = 90.0f;
b_data[1] = 81.0f;
b_data[2] = 72.0f;
b_data[3] = 63.0f;
out_color = vec4(0.0f, 0.8f, 1.0f, 1.0f);
}
This works fine, but if I swap the SSB names like this:
layout (binding = 0, std430) buffer B_SSB
{
float a_data[];
};
layout (binding = 1, std430) buffer A_SSB
{
float b_data[];
};
the SSB indices are swapped even though they are hardcoded: data that should be written to a_data ends up in b_data and vice versa.
Both SSBs are 250 MB large, and the maximum size is more than 2 GB. It seems that the indices are assigned alphabetically, but that shouldn't happen. I'm binding the buffers like this:
glCreateBuffers(1, &a_ssb);
glNamedBufferStorage(a_ssb, 7187400 * 9 * sizeof(float), nullptr, GL_MAP_READ_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, a_ssb);
glShaderStorageBlockBinding(test_prog, 0, 0);
glCreateBuffers(1, &b_ssb);
glNamedBufferStorage(b_ssb, 7187400 * 9 * sizeof(float), nullptr, GL_MAP_READ_BIT);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, b_ssb);
glShaderStorageBlockBinding(test_prog, 1, 1);
Is this a bug or my fault? I would also like to ask why I get the error "lvalue in array access too complex or possible array index out of bounds" when I assign the values in a for loop:
for(unsigned int i = 0; i < 4; ++i)
a_data[i] = float(i);

glShaderStorageBlockBinding(test_prog, 0, 0);
This is your problem.
You assigned the binding index in the shader. You do not need to assign it again.
Your problem comes from the fact that you assigned it incorrectly.
The second parameter to this function is the index of the block you are assigning a binding index to. The only way to get a correct index is to query it via Program Introspection APIs. The block index is the resource index, queried through this call:
auto block_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "A_SSB");
It just so happened, in your original code, that the shader compiler assigned A_SSB resource index 0 and B_SSB resource index 1. This assignment was probably made arbitrarily based on their names (alphabetically, as you observed). Thus, when you swapped the names, the resource indices didn't change: A_SSB was still resource index 0, but your shader now assigned it binding index 1. Which was fine...
Until your C++ code overrode that assignment with your glShaderStorageBlockBinding(test_prog, 0, 0). That assigned resource index 0 (A_SSB) to binding index 0.
You should either set the binding index in the shader or in C++ code. Not in both.
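To make the two options concrete, here is a minimal sketch (reusing test_prog, a_ssb and b_ssb from above; treat it as an illustration, not a drop-in fix):
// Option 1: the shader already declares layout(binding = 0/1), so just bind the buffers.
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, a_ssb);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, b_ssb);
// Option 2: assign the bindings from C++ instead; query each block's resource index first.
GLuint a_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "A_SSB");
GLuint b_index = glGetProgramResourceIndex(test_prog, GL_SHADER_STORAGE_BLOCK, "B_SSB");
glShaderStorageBlockBinding(test_prog, a_index, 0); // A_SSB -> binding point 0
glShaderStorageBlockBinding(test_prog, b_index, 1); // B_SSB -> binding point 1
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, a_ssb);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, b_ssb);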

Related

OpenGL how to get offset of array element in shared layout uniform block?

I have a shared layout uniform block in a shader:
layout(shared) uniform TestBlock
{
int test[5];
};
How do I get the offset of test[3]?
When I try to use glGetUniformIndices to get the index of test[3], it returns the same index as test[0].
So I cannot use glGetActiveUniformsiv to get the offset of test[3].
How, then, do I get the offset of test[3]?
(Note that I don't want to use layout std140.)
Arrays of basic types like ints are treated as a single value; you can't get the offset of an individual element in the array. You can, however, query the array stride: the number of bytes from one element in the array to the next. Then you can just do the multiplication yourself.
Using the new program introspection API:
auto ix = glGetProgramResourceIndex(prog, GL_UNIFORM, "test");
GLenum props[] = {GL_ARRAY_STRIDE, GL_OFFSET};
GLint values[2] = {};
glGetProgramResourceiv(prog, GL_UNIFORM, ix, 2, props, 2, NULL, values);
auto byteOffset = values[1] + (3 * values[0]);
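For example, once byteOffset is known, writing a value into test[3] could look roughly like this (ubo is assumed to be the buffer object backing TestBlock and is not part of the question):
GLint newValue = 42;
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
// byteOffset = GL_OFFSET of the array plus 3 * GL_ARRAY_STRIDE, as computed above.
glBufferSubData(GL_UNIFORM_BUFFER, byteOffset, sizeof(GLint), &newValue);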

Update SSBO in Compute shader

I am currently trying to update an SSBO linked/bound to a compute shader. Doing it this way, I only write the first 32 bytes into out_picture, because I only memcpy that many (sizeof(pstruct)).
Compute shader:
#version 440 core
struct Pstruct{
float picture[1920*1080*3];
float factor;
};
layout(std430, binding = 0) buffer Result{
float out_picture[];
};
layout(std430, binding = 1) buffer In_p1{
Pstruct in_p1;
};
layout(local_size_x = 1000) in;
void main() {
out_picture[gl_GlobalInvocationID.x] = out_picture[gl_GlobalInvocationID.x] +
in_p1.picture[gl_GlobalInvocationID.x] * in_p1.factor;
}
C++:
struct Pstruct{
std::vector<float> picture;
float factor;
};
Pstruct tmp;
tmp.factor = 1.0f;
for(int i = 0; i < getNUM_PIX(); i++){
tmp.picture.push_back(5.0f);
}
SSBO ssbo;
glGenBuffers(1, &ssbo.handle);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo.handle);
glBufferData(GL_SHADER_STORAGE_BUFFER, (getNUM_PIX() + 1) * sizeof(float), NULL, GL_DYNAMIC_DRAW);
...
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo.handle);
Pstruct* ptr = (Pstruct *) glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_WRITE_ONLY);
memcpy(ptr, &pstruct, sizeof(pstruct));
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
...
glUseProgram(program);
glDispatchCompute(getNUM_PIX() / getWORK_GROUP_SIZE(), 1, 1);
glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
How can I copy both my picture array and my float factor at the same time?
Do I have to split the memcpy call into one for the array and one for the float, and if so, how? I can copy the first part, but I am not allowed to add an offset to the ptr.
First of all,
float picture[1920*1080*3];
clearly should be either a texture (you're only reading from it anyway) or at least an image.
Second:
struct Pstruct{
std::vector<float> picture;
float factor;
};
This definition does not match the definition in your shader in any way. The std::vector object is just a small management object that internally owns the vector's data storage; memcpying it into a GL buffer and handing that to the GPU does not make sense at all.
The correct approach would be either to copy the contents of that vector separately into the appropriate places inside the buffer, or to use a struct definition on your client side which actually matches the one you're using in the shader (taking all the rules of std430 into account). But, as my first point already suggested, the correct solution here is most likely to use a texture or image object instead.
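A minimal sketch of the first option (copying the pieces separately), reusing the names from the question and assuming tmp.picture actually holds the full 1920*1080*3 floats the shader declares; the offsets follow std430's tight packing for a float array followed by a float and are worth double-checking with program introspection:
const size_t pictureBytes = tmp.picture.size() * sizeof(float);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo.handle);
glBufferData(GL_SHADER_STORAGE_BUFFER, pictureBytes + sizeof(float), nullptr, GL_DYNAMIC_DRAW);
// Copy the vector's contents (not the vector object itself)...
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, pictureBytes, tmp.picture.data());
// ...and then the factor, placed right after the array.
glBufferSubData(GL_SHADER_STORAGE_BUFFER, pictureBytes, sizeof(float), &tmp.factor);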

Find the maximum float in the array

I have a compute shader program which looks for the maximum value in a float array. It uses reduction (compare two values and save the bigger one to the output buffer).
Now I am not quite sure how to run this program from the Java code (using JogAmp). In the display() method I run the program once (each time with the halved array in the input SSBO, i.e. the result of the previous iteration) and finish when the result array has only one item: the maximum.
Is this the correct method? Creating and binding the input and output SSBOs in the display() method every time, running the shader program and then checking how many items were returned?
Java code:
FloatBuffer inBuffer = Buffers.newDirectFloatBuffer(array);
gl.glBindBuffer(GL3ES3.GL_SHADER_STORAGE_BUFFER, buffersNames.get(1));
gl.glBufferData(GL3ES3.GL_SHADER_STORAGE_BUFFER, itemsCount * Buffers.SIZEOF_FLOAT, inBuffer,
GL3ES3.GL_STREAM_DRAW);
gl.glBindBufferBase(GL3ES3.GL_SHADER_STORAGE_BUFFER, 1, buffersNames.get(1));
gl.glDispatchComputeGroupSizeARB(groupsCount, 1, 1, groupSize, 1, 1);
gl.glMemoryBarrier(GL3ES3.GL_SHADER_STORAGE_BARRIER_BIT);
ByteBuffer output = gl.glMapNamedBuffer(buffersNames.get(1), GL3ES3.GL_READ_ONLY);
Shader code:
#version 430
#extension GL_ARB_compute_variable_group_size : enable
layout (local_size_variable) in;
layout(std430, binding = 1) buffer MyData {
vec4 elements[];
} data;
void main() {
uint index = gl_GlobalInvocationID.x;
float n1 = data.elements[index].x;
float n2 = data.elements[index].y;
float n3 = data.elements[index].z;
float n4 = data.elements[index].w;
data.elements[index].x = max(max(n1, n2), max(n3, n4));
}
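For reference, the host-side loop described above could be structured roughly like this (a plain C++/OpenGL sketch rather than the asker's JogAmp code; it assumes a shader with a fixed local_size of groupSize instead of the variable-group-size extension, that each pass reduces the element count by 4, and that the shader compacts its output so each pass reads a contiguous array; ssbo, program, groupSize and itemsCount are assumptions):
glUseProgram(program);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 1, ssbo);
GLuint count = itemsCount; // number of floats still in play
while (count > 1) {
    GLuint vec4Count = (count + 3) / 4; // one invocation per vec4
    GLuint groups = (vec4Count + groupSize - 1) / groupSize;
    glDispatchCompute(groups, 1, 1);
    // Make this pass's writes visible to the next pass's reads.
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT);
    count = vec4Count;
}
// Barrier for CPU read-back, then fetch the single remaining value.
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
float maxValue = 0.0f;
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, sizeof(float), &maxValue);
The point is that the buffers only need to be created and bound once; each pass is just another dispatch plus a barrier, and the buffer is mapped or read back only after the final pass.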

Only first Compute Shader array element appears updated

I'm trying to send an array of integers to a compute shader, set an arbitrary value for each integer and then read the results back on the CPU/host. The problem is that only the first element of my array gets updated. My array is initialized with all elements = 5 on the CPU; then I try to set all the values to 2 in the compute shader:
C++ Code:
this->numOfElements = std::vector<int>(numOfVoxels, 5); //num of elements for each voxel, all initialized to 5
//Set the reset grid program as current program
glUseProgram(this->resetGridProgHandle);
//Binds and fill the buffer
glBindBuffer(GL_SHADER_STORAGE_BUFFER, this->counterBufferHandle);
glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(int) * numOfVoxels, this->numOfElements.data(), GL_DYNAMIC_DRAW);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, this->counterBufferHandle);
//Flag used in the buffer map function
GLint bufMask = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT;
//calc maximum size for workgroups
//glGetIntegerv(GL_MAX_COMPUTE_WORK_GROUP_SIZE, &result);
//Executes the compute shader
glDispatchCompute(32, 1, 1); //
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
//Gets a pointer to the returned data
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);
//Free the buffer mapping
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
Shader:
#version 430
layout (local_size_x = 32) in;
layout(binding = 0) buffer SSBO{
int counter[];
};
void main(){
counter[gl_WorkGroupID.x * gl_WorkGroupSize.x + gl_LocalInvocationID.x] = 2;
}
If I print returnArray[0] it's 2 (correct), but any index > 0 gives me 5, which is the initial value set on the host.
glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
//Gets a pointer to the returned data
int* returnArray = (int *)glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_WRITE);
The bit you use for glMemoryBarrier represents the way you want to read the data written by the shader. GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT says "I'm going to read this written data by using the buffer for vertex attribute arrays". In reality, you are going to read the buffer by mapping it.
So you should use the proper barrier bit:
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
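As a small illustration of how the bit follows from the way the data is consumed next (a generic example, not the asker's code):
glDispatchCompute(32, 1, 1);
// The next access maps or reads the buffer back on the CPU:
glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
// If the buffer were used as a vertex attribute source instead, it would be:
// glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);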
OK, I had this problem as well. I changed:
layout(binding = 0) buffer SSBO{
to:
layout(binding = 0, std430) buffer SSBO{
Without an explicit layout qualifier the block uses the default shared layout, in which the array stride is implementation-defined, so the data does not have to line up with the tightly packed int array the C++ side reads back; std430 guarantees that packing.

Why is DirectX skipping every second shader call?

I'm getting a bit frustrated trying to figure out why I have to call a DirectX 11 shader twice to see the desired result.
Here's my current state:
I have a 3D object built from a vertex buffer and an index buffer. This object is then instanced several times. Later there will be a lot more than just one object, but for now I'm testing with this one only. In my render routine, I iterate over all instances, change the world matrices (so that all object instances are put together and form "one big, whole object") and call the shader method to render the data to the screen.
Here's the code so far, that doesn't work:
m_pLevel->Simulate(0.1f);
std::list<CLevelElementInstance*>& lst = m_pLevel->GetInstances();
float x = -(*lst.begin())->GetPosition().x, y = -(*lst.begin())->GetPosition().y, z = -(*lst.begin())->GetPosition().z;
int i = 0;
for (std::list<CLevelElementInstance*>::iterator it = lst.begin(); it != lst.end(); it++)
{
// Extract base element from current instance
CLevelElement* elem = (*it)->GetBaseElement();
// Write vertex and index buffer to video memory
elem->Render(m_pDirect3D->GetDeviceContext());
// Call shader
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
++i;
}
My std::list consists of four 3D objects, which are all the same; they only differ in their position in 3D space. All of the objects are 8.0f x 8.0f x 8.0f, so for simplicity I just line them up (as can be seen in the shader render call, where I just add 8 units to the Z dimension).
The result is the following: I only see two elements rendered on the screen, and between them there's an empty space the size of an element.
At first, I thought I did some errors with the 3d math, but after a lot of time spent debugging my code, I could not find any errors.
And here is now the confusing part:
If I change the content of the for loop and add another call to the shader, I suddenly see all four elements, and they're all at their correct positions in 3D space:
m_pLevel->Simulate(0.1f);
std::list<CLevelElementInstance*>& lst = m_pLevel->GetInstances();
float x = -(*lst.begin())->GetPosition().x, y = -(*lst.begin())->GetPosition().y, z = -(*lst.begin())->GetPosition().z;
int i = 0;
for (std::list<CLevelElementInstance*>::iterator it = lst.begin(); it != lst.end(); it++)
{
// Extract base element from current instance
CLevelElement* elem = (*it)->GetBaseElement();
// Write vertex and index buffer to video memory
elem->Render(m_pDirect3D->GetDeviceContext());
// Call shader
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
// Call shader a second time - this seems to have no effect except to allow the next iteration to perform its shader rendering...
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), XMMatrixTranslation(x, y, z + (i * 8)), viewMatrix, projectionMatrix, elem->GetTexture());
++i;
}
Does anybody have an idea of what is going on here?
If it helps, here's the code of the shader:
bool CTextureShader::Render(ID3D11DeviceContext* _pDeviceContext, const int _IndexCount, XMMATRIX& _pWorldMatrix, XMMATRIX& _pViewMatrix, XMMATRIX& _pProjectionMatrix, ID3D11ShaderResourceView* _pTexture)
{
bool result = SetShaderParameters(_pDeviceContext, _pWorldMatrix, _pViewMatrix, _pProjectionMatrix, _pTexture);
if (!result)
return false;
RenderShader(_pDeviceContext, _IndexCount);
return true;
}
bool CTextureShader::SetShaderParameters(ID3D11DeviceContext* _pDeviceContext, XMMATRIX& _WorldMatrix, XMMATRIX& _ViewMatrix, XMMATRIX& _ProjectionMatrix, ID3D11ShaderResourceView* _pTexture)
{
HRESULT result;
D3D11_MAPPED_SUBRESOURCE mappedResource;
MatrixBufferType* dataPtr;
unsigned int bufferNumber;
_WorldMatrix = XMMatrixTranspose(_WorldMatrix);
_ViewMatrix = XMMatrixTranspose(_ViewMatrix);
_ProjectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
result = _pDeviceContext->Map(m_pMatrixBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(result))
return false;
dataPtr = (MatrixBufferType*)mappedResource.pData;
dataPtr->world = _WorldMatrix;
dataPtr->view = _ViewMatrix;
dataPtr->projection = _ProjectionMatrix;
_pDeviceContext->Unmap(m_pMatrixBuffer, 0);
bufferNumber = 0;
_pDeviceContext->VSSetConstantBuffers(bufferNumber, 1, &m_pMatrixBuffer);
_pDeviceContext->PSSetShaderResources(0, 1, &_pTexture);
return true;
}
void CTextureShader::RenderShader(ID3D11DeviceContext* _pDeviceContext, const int _IndexCount)
{
_pDeviceContext->IASetInputLayout(m_pLayout);
_pDeviceContext->VSSetShader(m_pVertexShader, NULL, 0);
_pDeviceContext->PSSetShader(m_pPixelShader, NULL, 0);
_pDeviceContext->PSSetSamplers(0, 1, &m_pSampleState);
_pDeviceContext->DrawIndexed(_IndexCount, 0, 0);
}
If it helps, I can also post the code from the shaders here.
Any help would be appreciated - I'm totally stuck here :-(
The problem is that you are transposing your matrices in place on every call, so they are only "right" every other call:
_WorldMatrix = XMMatrixTranspose(_WorldMatrix);
_ViewMatrix = XMMatrixTranspose(_ViewMatrix);
_ProjectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
Instead, you should be doing something like:
XMMATRIX worldMatrix = XMMatrixTranspose(_WorldMatrix);
XMMATRIX viewMatrix = XMMatrixTranspose(_ViewMatrix);
XMMATRIX projectionMatrix = XMMatrixTranspose(_ProjectionMatrix);
result = _pDeviceContext->Map(m_pMatrixBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(result))
return false;
dataPtr = (MatrixBufferType*)mappedResource.pData;
dataPtr->world = worldMatrix;
dataPtr->view = viewMatrix;
dataPtr->projection = projectionMatrix;
_pDeviceContext->Unmap(m_pMatrixBuffer, 0);
I wonder if the problem is that the temporary created by the call to XMMatrixTranslation(x, y, z + (i * 8)) is being passed into a function by reference, and then passed into another function by reference where it is modified.
My knowledge of the C++ spec isn't complete enough for me to tell whether that's undefined behaviour or not (but I know that there are tricky rules in this area; binding a temporary to a non-const reference is not allowed, for example). Regardless, it's close enough to being suspicious that even if it is well-defined C++ it still might be a dusty corner that could trip up a non-compliant compiler.
To rule out this possibility, try doing it like:
XMMATRIX worldMatrix = XMMatrixTranslation(x, y, z + (i * 8));
m_pTextureShader->Render(m_pDirect3D->GetDeviceContext(), elem->GetIndexCount(), worldMatrix, viewMatrix, projectionMatrix, elem->GetTexture());
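Another way to rule out both issues at the source (a hypothetical signature change, not code from the question) is to take the matrices by const reference, so that SetShaderParameters cannot transpose the caller's view and projection matrices in place:
bool CTextureShader::Render(ID3D11DeviceContext* _pDeviceContext, const int _IndexCount,
    const XMMATRIX& _WorldMatrix, const XMMATRIX& _ViewMatrix,
    const XMMATRIX& _ProjectionMatrix, ID3D11ShaderResourceView* _pTexture);
// SetShaderParameters then transposes into local copies, as shown in the answer above.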