GLSL - unable to access second index of a SSBO array for multiple lights - c++

In my application I add two lights. One at (0,0,2) and the second one at (2,0,0). Here's what I get (the x,y,z axes are represented respectively by the red, green & blue lines):
Notice how only the first light is working and the second is not. I made my application core-profile compliant to inspect the buffers with various tools like RenderDoc and NSight and both show me that the second light's data is present in the buffer (picture taken while running Nsight):
The positions seem to be correctly transferred to the GPU memory buffer. Here's the implementation of my fragment shader, which uses an SSBO to handle multiple lights in my application:
#version 430
struct Light {
vec3 position;
vec3 color;
float intensity;
float attenuation;
float radius;
};
layout (std140, binding = 0) uniform CameraInfo {
mat4 ProjectionView;
vec3 eye;
};
layout (std430, binding = 1) readonly buffer LightsData {
Light lights[];
};
uniform vec3 ambient_light_color;
uniform float ambient_light_intensity;
in vec3 ex_FragPos;
in vec4 ex_Color;
in vec3 ex_Normal;
out vec4 out_Color;
void main(void)
{
// Basic ambient light
vec3 ambient_light = ambient_light_color * ambient_light_intensity;
int i;
vec3 diffuse = vec3(0.0,0.0,0.0);
vec3 specular = vec3(0.0,0.0,0.0);
for (i = 0; i < lights.length(); ++i) {
Light wLight = lights[i];
// Basic diffuse light
vec3 norm = normalize(ex_Normal); // in this project the normals are all normalized anyway...
vec3 lightDir = normalize(wLight.position - ex_FragPos);
float diff = max(dot(norm, lightDir), 0.0);
diffuse += diff * wLight.color;
// Basic specular light
vec3 viewDir = normalize(eye - ex_FragPos);
vec3 reflectDir = reflect(-lightDir, norm);
float spec = pow(max(dot(viewDir, reflectDir), 0.0), 32);
specular += wLight.intensity * spec * wLight.color;
}
out_Color = ex_Color * vec4(specular + diffuse + ambient_light,1.0);
}
Note that I've read section 7.6.2.2 of the OpenGL 4.5 spec and, if I understood correctly, my alignment should follow the size of the biggest member of my struct, which is a vec3. My struct size is 36 bytes, so everything should be fine here. I also tried a different layout qualifier (e.g. std140) and adding some padding, but nothing fixes the issue with the second light. In my C++ code, I have these definitions to add the lights in my application:
light_module.h/.cc:
struct Light {
glm::f32vec3 position;
glm::f32vec3 color;
float intensity;
float attenuation;
float radius;
};
...
constexpr GLuint LIGHTS_SSBO_BINDING_POINT = 1U;
std::vector<Light> _Lights;
...
void AddLight(const Light &light) {
// Add to _Lights
_Lights.push_back(light);
UpdateSSBOBlockData(
LIGHTS_SSBO_BINDING_POINT, _Lights.size()* sizeof(Light),
static_cast<void*>(_Lights.data()), GL_DYNAMIC_DRAW);
}
shader_module.h/.cc:
using SSBOCapacity = GLuint;
using BindingPoint = GLuint;
using ID = GLuint;
std::map<BindingPoint, std::pair<ID, SSBOCapacity> > SSBO_list;
...
void UpdateSSBOBlockData(GLuint a_unBindingPoint,
                         GLuint a_unSSBOSize, void* a_pData, GLenum a_eUsage) {
    auto SSBO = SSBO_list.find(a_unBindingPoint);
    if (SSBO != SSBO_list.end()) {
        GLuint unSSBOID = SSBO->second.first;
        // Reallocate the buffer storage and upload the new contents in one call.
        glBindBuffer(GL_SHADER_STORAGE_BUFFER, unSSBOID);
        glBufferData(GL_SHADER_STORAGE_BUFFER, a_unSSBOSize, a_pData, a_eUsage);
        glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0); // unbind
    }
    else {
        // error handling...
    }
}
Basically, I'm trying to update/reallocate the SSBO size with glBufferData each time a light is added in my app.
Now, since I'm having issues processing the second light data, I changed my fragment shader code to only execute the second light in my SSBO array by forcing i = 1 and looping until i < 2, but I get the following errors:
(50) : error C1068: ... or possible array index out of bounds
(50) : error C5025: lvalue in field access too complex
(56) : error C1068: ... or possible array index out of bounds
(56) : error C5025: lvalue in field access too complex
Lines 50 and 56 refer to diffuse += diff * wLight.color; and specular += wLight.intensity * spec * wLight.color; respectively. Is there really an out of bounds access even if I add my lights before the first draw call? Why is the shader compiling correctly when I'm using lights.length() instead of 2?
Finally, I've added a simple if (i == 1) in my for-loop to see if lights.length() is equal to 2, but it doesn't go in it. Yet the initial size of my buffer is 0 and then I add a light that sets the buffer size to 36 bytes and we can see that the first light works fine. Why is the update/reallocate not working the second time?

So what I did was to add some padding at the end of the declaration of my struct on the C++ side only. The padding required was float[3] or 12 bytes, which sums up to 48 bytes. I'm still not sure why this is required, since the specifications state (as highlighted in this post)
If the member is a structure, the base alignment of the structure is N, where N is the largest base alignment value of any of its
members, and rounded up to the base alignment of a vec4. The
individual members of this sub-structure are then assigned offsets by
applying this set of rules recursively, where the base offset of the
first member of the sub-structure is equal to the aligned offset of
the structure. The structure may have padding at the end; the base
offset of the member following the sub-structure is rounded up to the
next multiple of the base alignment of the structure.
[...]
When using the std430 storage layout, shader storage blocks will be
laid out in buffer storage identically to uniform and shader storage
blocks using the std140 layout, except that the base alignment and
stride of arrays of scalars and vectors in rule 4 and of structures in
rule 9 are not rounded up a multiple of the base alignment of a vec4.
My guess is that structures such as vec3 and glm::f32vec3 defined by glm are recursively rounded up to vec4 when using std430 and therefore my struct must follow the alignment of a vec4. If anyone can confirm this, it would be interesting since the linked post above deals with vec4 directly and not vec3.
Picture with both lights working :
EDIT:
After more investigation, it turns out that the last 3 fields of the Light struct (intensity, attenuation and radius) were not usable. I fixed this by changing the position and color from glm::f32vec3 to glm::vec4 instead. More information can be found in a similar post. I also left a single float for padding, because of the alignment mentioned earlier.
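For reference, rule 3 of the same spec section gives a three-component vector a base alignment of four components (16 bytes for floats), which would explain both the 48-byte stride and the unusable trailing fields of the packed 36-byte struct. Below is a minimal sketch of a C++ mirror of the original GLSL Light (the version with vec3 members), using plain float arrays instead of glm types. The name LightStd430 is made up for this illustration, and the offsets are what the std430 rules imply rather than something verified against the original application.

```cpp
#include <cstddef>

// Hypothetical CPU-side mirror of the GLSL struct
//   struct Light { vec3 position; vec3 color;
//                  float intensity; float attenuation; float radius; };
// under std430. A vec3 is aligned to 16 bytes but only occupies 12,
// so the float that follows color sits in the remaining 4 bytes.
struct LightStd430 {
    alignas(16) float position[3];  // offset 0
    alignas(16) float color[3];     // offset 16
    float intensity;                // offset 28
    float attenuation;              // offset 32
    float radius;                   // offset 36
};                                  // padded to 48 bytes (the array stride)

static_assert(offsetof(LightStd430, position) == 0, "position offset");
static_assert(offsetof(LightStd430, color) == 16, "color offset");
static_assert(offsetof(LightStd430, intensity) == 28, "intensity offset");
static_assert(offsetof(LightStd430, attenuation) == 32, "attenuation offset");
static_assert(offsetof(LightStd430, radius) == 36, "radius offset");
static_assert(sizeof(LightStd430) == 48, "array stride");
```

With a struct like this, the size passed to glBufferData would be the number of lights times sizeof(LightStd430), i.e. 48 bytes per light.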

Related

Fragment shader failing when accessing higher array indexed input component

I'm trying to access an array of structs in a glsl fragment shader passed from a geometry shader, and I'm having odd failures with any attempt to access certain indices over an arbitrary number. I've already discovered you apparently can't access an array by variable in a fragment shader, but it fails even using an int literal.
I've reduced the shader down a simplified version that shows the problem. This receives a series of quads (triangle strips) from the geometry shader and renders red rectangles, with a smaller sub-rectangle in the middle being rendered yellow if the relative fragment falls into an arbitrary bounding box passed along from the vertex.
#version 400 core
struct SpriteLayer {
ivec2 texPos;
ivec2 offset;
ivec2 dim;
int texNum;
uint flags;
uint color;
};
in vec2 lmUV;
flat in int frag_numLayers;
flat in uint frag_totalColor;
in vec2 pixel;
flat in SpriteLayer spriteLayer[6];
out vec4 color;
uniform sampler2D texLightmap;
uniform sampler2D texBase[16];
uniform int numTextures;
uniform vec4 ambientLight;
uniform vec2 randomSeed;
void main() {
color = vec4(1.0, 0.0, 0.0, 1.0);
SpriteLayer lay = spriteLayer[1]; // works
//SpriteLayer lay = spriteLayer[5]; // fails if preceding line replaced with this
if (
(pixel.x >= lay.offset.x) &&
(pixel.x < lay.offset.x + lay.dim.x) &&
(pixel.y >= lay.offset.y) &&
(pixel.y < lay.offset.y + lay.dim.y)
) {
color.g = 1.0;
}
}
I checked GL_MAX_FRAGMENT_INPUT_COMPONENTS and my environment reports 128. Unless I'm very mistaken in how that works, I only count around 60 total input components here? Just to check I also tried padding the struct out, but no change. Despite being an array of 6 structs, when I access elements spriteLayer[0] through [3], the shader renders, but when I change the code to access spriteLayer[5] it fails, with no compilation error and no rendering whatsoever, with my program unable to set the uniforms. Trying to access spriteLayer[4] sometimes succeeds and sometimes fails depending on how I restructure the code, yet this quirk doesn't seem dependent on the number of input components at all (!?). Same problem with direct access to the array without the intermediary SpriteLayer variable. What's going on here?
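One thing that may be worth checking (this is a guess, not a confirmed diagnosis): GLSL assigns every scalar or vector input its own location, and a location is a four-component slot. If the driver does not pack several small inputs into one location, the inputs above cost far more than the tightly packed count of about 60. The arithmetic, as a small C++ sketch with made-up constant names:

```cpp
// SpriteLayer members: texPos, offset, dim (ivec2 each), texNum, flags, color.
constexpr int kMembersPerLayer    = 6;
constexpr int kComponentsPerLayer = 2 + 2 + 2 + 1 + 1 + 1;  // 9
constexpr int kLayerCount         = 6;

// Remaining inputs: lmUV (vec2), frag_numLayers, frag_totalColor, pixel (vec2).
constexpr int kOtherInputs     = 4;
constexpr int kOtherComponents = 2 + 1 + 1 + 2;  // 6

// Tightly packed component count, as estimated in the question.
constexpr int kPackedComponents =
    kLayerCount * kComponentsPerLayer + kOtherComponents;

// ASSUMPTION: one location (4 components) per scalar/vector input, no packing.
constexpr int kLocations        = kLayerCount * kMembersPerLayer + kOtherInputs;
constexpr int kPaddedComponents = 4 * kLocations;

// GL_MAX_FRAGMENT_INPUT_COMPONENTS as reported in the question.
constexpr int kReportedLimit = 128;

// Whole SpriteLayer structs that fit next to the other inputs under that assumption.
constexpr int kLayersThatFit =
    (kReportedLimit / 4 - kOtherInputs) / kMembersPerLayer;

static_assert(kPackedComponents == 60, "tightly packed count");
static_assert(kPaddedComponents == 160, "one location per input");
static_assert(kLayersThatFit == 4, "whole structs within the limit");
```

Under that assumption only four whole structs fit, which would be consistent with indices 0 through 3 working and index 5 failing. Whether the driver really budgets inputs this way is not established by anything in the question.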

How to instance draw with different transformations for multiple objects

I'm having a little problem with glDrawArraysInstanced().
Right now I'm trying to draw a chess board with pieces.
I have all the models loaded in properly.
I've tried drawing only the pawns with instanced drawing and it worked. I sent an array of translation vec3s to the shader through a uniform and indexed into it with gl_InstanceID.
The drawing is done with this for loop (an individual draw call for each model):
for (auto& i : this->models) {
i->draw(this->shaders[0], count);
}
which eventually leads to:
glDrawArraysInstanced(GL_TRIANGLES, 0, vertices.size(), count);
where the vertex shader is:
#version 460
layout(location = 0) in vec3 vertex_pos;
layout(location = 1) in vec2 vertex_texcoord;
layout(location = 2) in vec3 vertex_normal;
out vec3 vs_pos;
out vec2 vs_texcoord;
out vec3 vs_normal;
flat out int InstanceID;
uniform mat4 modelMatrix;
uniform mat4 viewMatrix;
uniform mat4 projectionMatrix;
uniform vec3 offsets[16];
void main(void){
vec3 offset = offsets[gl_InstanceID]; //saving transformation in the offset
InstanceID = gl_InstanceID; //unimportant
vs_pos = vec4(modelMatrix * vec4(vertex_pos + offset, 1.f)).xyz; //using the offset
vs_texcoord = vec2(vertex_texcoord.x,1.f-vertex_texcoord.y);
vs_normal = mat3(transpose(inverse(modelMatrix))) * vertex_normal;
gl_Position = projectionMatrix * viewMatrix * modelMatrix * vec4(vertex_pos + offset,1.f); //using the offset
}
Now my problem is that I don't know how to draw multiple objects this way and change their transformations, since gl_InstanceID starts from 0 on each draw call, so my array of transformations would be read from the beginning again (which would just draw the next pieces at the pawns' positions).
Any help will be appreciated.
You've got two problems. Or rather, you have one problem, but the natural solution will create a second problem for you.
The natural solution to your problem is to use one of the base-instance rendering functions, like glDrawElementsInstancedBaseInstance. These allow you to specify a starting instance for your instanced rendering calls.
This will precipitate a second problem: gl_InstanceID does not respect the base instance. It will always be on the range [0, instancecount). Only instance arrays respect the base instance. So instead of using a uniform to provide your per-instance data, you must use instance array rendering. This means storing the per-instance data in a buffer object (which you should have done anyway) and accessing it via a VS input whose VAO specifies that the particular attribute is instanced.
This also has the advantage of not restricting your instance count to uniform limitations.
OpenGL 4.6/ARB_shader_draw_parameters allows access to the gl_BaseInstance vertex shader input, which provides the baseinstance value specified by the draw command. So if you don't want to/can't use instanced arrays (for example, the amount of per-instance data is too big for the attribute limitations), you will have to rely on that extension/4.6 functionality. Recent desktop GL drivers offer this functionality, so if your hardware is decently new, you should be able to use it.
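As a rough sketch of the bookkeeping this implies: if the per-instance data for all models lives in one buffer, model after model, each draw call needs the index of its first record as the base instance. The helper name and the piece counts below are made up for illustration, and the GL call is only shown in a comment because it needs a live GL 4.2+ context.

```cpp
#include <cstddef>

// Hypothetical helper: index of the first per-instance record for model
// `modelIndex` when all models share one instance buffer, laid out in order.
constexpr unsigned baseInstanceFor(const unsigned* instanceCounts,
                                   std::size_t modelIndex) {
    unsigned base = 0;
    for (std::size_t i = 0; i < modelIndex; ++i) {
        base += instanceCounts[i];
    }
    return base;
}

// Illustrative instance counts for three models (say pawns, rooks, knights).
constexpr unsigned kInstanceCounts[] = {8, 2, 2};

static_assert(baseInstanceFor(kInstanceCounts, 0) == 0, "first model starts at 0");
static_assert(baseInstanceFor(kInstanceCounts, 2) == 10, "third model starts after 8 + 2");

// With the offset attribute marked as instanced via glVertexAttribDivisor(attrib, 1),
// model m would then be drawn roughly like this:
//   glDrawArraysInstancedBaseInstance(GL_TRIANGLES, 0, vertexCount,
//                                     kInstanceCounts[m],
//                                     baseInstanceFor(kInstanceCounts, m));
```

The shader would read the offset from a vertex input instead of indexing a uniform array with gl_InstanceID, so each model picks up its own slice of the buffer.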

Batched rendering with Uniform Buffer Object

I'm changing my basic renderer to make it a batched renderer to improve performance.
I've successfully changed my vertex data to draw all my meshes in a single batch, but when I want to handle multiple materials a problem arises.
Obviously I can't use normal uniforms since I'm batching a lot of meshes, so I thought about using Uniform Buffer Object to store all the materials data.
The problem is, how can I update this buffer?
If I setup my UBO like this:
layout(std140) uniform MATERIAL
{
vec4 Color[20];
float Specular[20];
float Roughness[20];
float Metallic[20];
float ReflectionIntensity[20];
};
then I don't think that, when I submit the data for a single material, I can use a starting offset calculated from the material index and submit the data like this:
struct Material {
float color[4];
float specular;
float roughness;
float metallic;
float reflectionIntensity;
};
int bufferStride = 32; // 16 (vec4) + 4 (float) + 4 (float) + 4 (float) + 4 (float)
int offset = bufferStride * updatingMaterialIndex;
Material m;
m.color...
glBufferSubData(GL_UNIFORM_BUFFER, offset, sizeof(Material), &m); // (target, offset, size, data)
because I presume OpenGL will have 20 vec4 followed by 80 float in memory.
So how could I do this? Do I have to calculate the offset of every element individually?
Also, how should I index the material used by a mesh? Should I pass the material index as a vertex attribute?
Your initial suggestion was a struct containing arrays. So why not instead do the opposite: an array containing structs?
struct Material
{
    vec4 Color;
    float Specular;
    float Roughness;
    float Metallic;
    float ReflectionIntensity;
};
layout(std140) uniform Materials
{
    Material mtls[20];
};
You would then build a similar array on the CPU.
Now to do this, you must make sure to follow std140's alignment rules. Particularly important is the alignment of structures and arrays. Since Material has a vec4 in it, its base alignment is 16. And the stride between elements of an array must be multiples of the base alignment of the array element.
Therefore, mtls[20] will be 32 * 20 bytes in size. Even if you removed a single float from Material, it will still be that size.
If you really wanted to keep the struct-of-array approach, you would simply need to use a corresponding data structure on the CPU.
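To make the array-of-structs variant concrete, here is a minimal sketch of the CPU side. It reuses the Material struct from the question; the helper materialOffset and the constant kMaxMaterials are names invented for this example, and the GL call is only shown in a comment since it needs a bound uniform buffer.

```cpp
#include <cstddef>

// CPU-side mirror of one element of `Material mtls[20]` under std140.
struct Material {
    float color[4];             // vec4, offset 0
    float specular;             // offset 16
    float roughness;            // offset 20
    float metallic;             // offset 24
    float reflectionIntensity;  // offset 28
};
static_assert(sizeof(Material) == 32, "must equal the std140 array stride");

constexpr std::size_t kMaxMaterials = 20;

// Byte offset of material `index` inside the uniform buffer.
constexpr std::size_t materialOffset(std::size_t index) {
    return index * sizeof(Material);
}

static_assert(kMaxMaterials * sizeof(Material) == 640, "32 * 20 bytes in total");

// With the UBO bound to GL_UNIFORM_BUFFER, one material could be updated with:
//   glBufferSubData(GL_UNIFORM_BUFFER, materialOffset(i), sizeof(Material), &m);
```

The static_assert is there so that adding or removing a member without keeping the size a multiple of 16 fails at compile time instead of silently shifting every later material.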

Simple curiosity about relation between texture mapping and shader program using Opengl/GLSL

I'm working on a small homemade 3D engine, more precisely on rendering optimization. So far I have developed a sorting algorithm whose goal is to gather as much geometry (meshes) as possible that shares the same material properties and the same shader program into batches. This way I minimize state changes (glBindXXX) and draw calls (glDrawXXX). So if I have a scene composed of 10 boxes that all share the same texture and need to be rendered with the same shader program (for example including ADS lighting), all the vertices of these meshes will be merged into a single VBO, the texture will be bound just once, and a single draw call will be needed.
Scene description:
- 10 meshes (boxes) mapped with 'texture_1'
Pseudo-code (render):
shaderProgram_1->Bind()
{
glActiveTexture(texture_1)
DrawCall(render 10 meshes with 'texture_1')
}
But now I want to be sure of one thing: let's assume our scene is still composed of the same 10 boxes, but this time 5 of them are mapped with a different texture (not multi-texturing, just simple texture mapping).
Scene description:
- 5 boxes with 'texture_1'
- 5 boxes with 'texture_2'
Pseudo-code (render):
shaderProgram_1->Bind()
{
glActiveTexture(texture_1)
DrawCall(render 5 meshes with 'texture_1')
}
shaderProgram_2->Bind()
{
glActiveTexture(texture_2)
DrawCall(render 5 meshes with 'texture_2')
}
And my fragment shader has a single sampler2D declaration (the goal of my shader program is to render geometry with simple texture mapping and ADS lighting):
uniform sampler2D ColorSampler;
I want to be sure it's not possible to draw this scene with a single draw call, as it was in my previous example (where 1 batch was enough). It was possible there because I used the same texture for the whole geometry. I think this time I will need 2 batches, hence 2 draw calls, and of course I will bind 'texture_1' or 'texture_2' before the corresponding draw call (one for the first 5 boxes and another for the other 5).
To sum up, if all the meshes are mapped with a simple texture (simple texture mapping):
5 with a red texture (texture_red)
5 with a blue texture (texture_blue)
Is it possible to render the scene with a single draw call? I don't think so, because my pseudo-code would look like this:
Pseudo-code:
shaderProgram->Bind()
{
glActiveTexture(texture_blue)
glActiveTexture(texture_red)
DrawCall(render 10 meshes)
}
I think it's impossible to differentiate the 2 textures when my fragment shader has to compute the pixel color using a single sampler2D uniform variable (simple texture mapping).
Here's my fragment shader:
#version 440
#define MAX_LIGHT_COUNT 1
/*
** Output color value.
*/
layout (location = 0) out vec4 FragColor;
/*
** Inputs.
*/
in vec3 Position;
in vec2 TexCoords;
in vec3 Normal;
/*
** Material uniforms.
*/
uniform MaterialBlock
{
vec3 Ka, Kd, Ks;
float Shininess;
} MaterialInfos;
uniform sampler2D ColorSampler;
struct Light
{
vec4 Position;
vec3 La, Ld, Ls;
float Kc, Kl, Kq;
};
uniform Light LightInfos[MAX_LIGHT_COUNT];
uniform uint LightCount;
/*
** Light attenuation factor.
*/
float getLightAttenuationFactor(vec3 lightDir, Light light)
{
float lightAtt = 0.0f;
float dist = 0.0f;
dist = length(lightDir);
lightAtt = 1.0f / (light.Kc + (light.Kl * dist) + (light.Kq * pow(dist, 2)));
return (lightAtt);
}
/*
** Basic phong shading.
*/
vec3 Basic_Phong_Shading(vec3 normalDir, vec3 lightDir, vec3 viewDir, int idx)
{
vec3 Specular = vec3(0.0f);
float lambertTerm = max(dot(lightDir, normalDir), 0.0f);
vec3 Ambient = LightInfos[idx].La * MaterialInfos.Ka;
vec3 Diffuse = LightInfos[idx].Ld * MaterialInfos.Kd * lambertTerm;
if (lambertTerm > 0.0f)
{
vec3 reflectDir = reflect(-lightDir, normalDir);
Specular = LightInfos[idx].Ls * MaterialInfos.Ks * pow(max(dot(reflectDir, viewDir), 0.0f), MaterialInfos.Shininess);
}
return (Ambient + Diffuse + Specular);
}
/*
** Fragment shader entry point.
*/
void main(void)
{
vec3 LightIntensity = vec3(0.0f);
vec4 texDiffuseColor = texture(ColorSampler, TexCoords);
vec3 normalDir = (gl_FrontFacing ? -Normal : Normal);
for (int idx = 0; idx < LightCount; idx++)
{
vec3 lightDir = vec3(LightInfos[idx].Position) - Position.xyz;
vec3 viewDir = -Position.xyz;
float lightAttenuationFactor = getLightAttenuationFactor(lightDir, LightInfos[idx]);
LightIntensity += Basic_Phong_Shading(
-normalize(normalDir), normalize(lightDir), normalize(viewDir), idx
) * lightAttenuationFactor;
}
FragColor = vec4(LightIntensity, 1.0f) * texDiffuseColor;
}
Do you agree with me?
It's possible if you either: (i) consider it to be a multitexturing problem where the function per fragment just picks between the two incoming fragments (ideally using mix with a coefficient of 0.0 or 1.0, not genuine branching); or (ii) composite your two textures into one texture (subject to your ability to wrap and clamp texture coordinates efficiently — watch out for those dependent reads — and maximum texture size constraints).
It's an open question as to whether either of these things would improve performance. Definitely go with (ii) if you can.
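For option (ii), the texture coordinates of each mesh have to be remapped into its sub-rectangle of the combined texture when the batch is built. A minimal sketch of that remapping, assuming an atlas made of equally sized tiles and coordinates that stay within [0, 1] (no wrapping); the UV struct and the atlasUV helper are made up for this example:

```cpp
struct UV {
    float u;
    float v;
};

// Hypothetical helper: maps a mesh's own [0,1] texture coordinates into the
// sub-rectangle of tile (tileX, tileY) in an atlas of tilesX by tilesY equal tiles.
constexpr UV atlasUV(UV local, int tileX, int tileY, int tilesX, int tilesY) {
    return UV{(static_cast<float>(tileX) + local.u) / static_cast<float>(tilesX),
              (static_cast<float>(tileY) + local.v) / static_cast<float>(tilesY)};
}

// Two textures side by side: texture_red in tile 0, texture_blue in tile 1.
// The centre of a blue box face lands three quarters of the way across the atlas.
static_assert(atlasUV(UV{0.5f, 0.5f}, 1, 0, 2, 1).u == 0.75f, "blue tile centre, u");
static_assert(atlasUV(UV{0.5f, 0.5f}, 1, 0, 2, 1).v == 0.5f, "blue tile centre, v");
```

With the coordinates baked into the vertex data like this, all 10 boxes can share one sampler2D and one draw call. Repeating textures and filtering across tile borders would need extra care that this sketch does not attempt.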

Can I pack both floats and ints into the same array buffer?

...because the floats seem to be coming out fine, but there's something wrong with the ints.
Essentially I have a struct called "BlockInstance" which holds a vec3 and an int. I've got an array of these BlockInstances which I buffer like so (translating from C# to C for clarity):
glBindBuffer(GL_ARRAY_BUFFER, bufferHandle);
glBufferData(GL_ARRAY_BUFFER, sizeof(BlockInstance)*numBlocks, blockData, GL_DYNAMIC_DRAW);
glVertexAttribPointer(3,3,GL_FLOAT,false,16,0);
glVertexAttribPointer(4,1,GL_INT,false,16,12);
glVertexAttribDivisor(3,1);
glVertexAttribDivisor(4,1);
And my vertex shader looks like this:
#version 330
layout (location = 0) in vec3 Position;
layout (location = 1) in vec2 TexCoord;
layout (location = 2) in vec3 Normal;
layout (location = 3) in vec3 Translation;
layout (location = 4) in int TexIndex;
uniform mat4 ProjectionMatrix;
out vec2 TexCoord0;
void main()
{
mat4 trans = mat4(
1,0,0,0,
0,1,0,0,
0,0,1,0,
Translation.x,Translation.y,Translation.z,1);
gl_Position = ProjectionMatrix * trans * vec4(Position, 1.0);
TexCoord0 = vec2(TexCoord.x+TexIndex,TexCoord.y)/16;
}
When I replace TexIndex on the last line of my GLSL shader with a constant like 0, 1, or 2, my textures come out fine, but if I leave it like it is, they come out all mangled, so there must be something wrong with the number, right? But I don't know what it's coming out as so it's hard to debug.
I've looked at my array of BlockInstances, and they're all set to 1,2, or 19 so I don't think my input is wrong...
What else could it be?
Note that I'm using a sprite map texture where each of the tiles is 16x16 px but my TexCoords are in the range 0-1, so I add a whole number to it to choose which tile, and then divide it by 16 (the map is also 16x16 tiles) to put it back into the proper range. The idea is I'll replace that last line with
TexCoord0 = vec2(TexCoord.x+(TexIndex%16),TexCoord.y+(TexIndex/16))/16;
-- GLSL does integer math, right? An int divided by an int will come out as whole number?
If I try this:
TexCoord0 = vec2(TexCoord.x+(TexIndex%16),TexCoord.y)/16;
The texture looks fine, but it's not using the right sprite. (Looks to be using the first sprite)
If I do this:
TexCoord0 = vec2(TexCoord.x+(TexIndex%16),TexCoord.y+(TexIndex/16))/16;
It comes out all white. This leads me to believe that TexIndex is coming out to be a very large number (bigger than 256 anyway) and that it's probably a multiple of 16.
layout (location = 4) in int TexIndex;
There's your problem.
glVertexAttribPointer is used to send data that will be converted to floating-point values. It's used to feed floating-point attributes. Passing integers is possible, but those integers are converted to floats, because that's what glVertexAttribPointer is for.
What you need is glVertexAttribIPointer (notice the I). This is used for providing signed and unsigned integer data.
So if you declare a vertex shader input as a float or some non-prefixed vec, you use glVertexAttribPointer to feed it. If you declare the input as int, uint, ivec or uvec, then you use glVertexAttribIPointer.
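Applied to the question's buffer, that could look roughly like the sketch below. The field names of BlockInstance are assumptions (the question only says it holds a vec3 and an int), and the GL calls are shown in comments because they need a context with the buffer bound to GL_ARRAY_BUFFER.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed CPU-side layout of one instance record: a vec3 followed by an int.
struct BlockInstance {
    float        translation[3];  // feeds `in vec3 Translation` (location 3)
    std::int32_t texIndex;        // feeds `in int TexIndex`     (location 4)
};
static_assert(sizeof(BlockInstance) == 16, "stride used in the question");
static_assert(offsetof(BlockInstance, texIndex) == 12, "offset used in the question");

// Float attribute: glVertexAttribPointer, which takes a `normalized` flag.
//   glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, sizeof(BlockInstance),
//       reinterpret_cast<const void*>(offsetof(BlockInstance, translation)));
//
// Integer attribute: glVertexAttribIPointer, which has no `normalized` flag.
//   glVertexAttribIPointer(4, 1, GL_INT, sizeof(BlockInstance),
//       reinterpret_cast<const void*>(offsetof(BlockInstance, texIndex)));
```

So mixing floats and ints in one array buffer is fine; only the call used to describe each attribute has to match the type declared in the shader.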