Can I pack both floats and ints into the same array buffer? - opengl

...because the floats seem to be coming out fine, but there's something wrong with the ints.
Essentially I have a struct called "BlockInstance" which holds a vec3 and an int. I've got an array of these BlockInstances which I buffer like so (translating from C# to C for clarity):
glBindBuffer(GL_ARRAY_BUFFER, bufferHandle);
glBufferData(GL_ARRAY_BUFFER, sizeof(BlockInstance)*numBlocks, blockData, GL_DYNAMIC_DRAW);
And my vertex shader looks like this:
#version 330
layout (location = 0) in vec3 Position;
layout (location = 1) in vec2 TexCoord;
layout (location = 2) in vec3 Normal;
layout (location = 3) in vec3 Translation;
layout (location = 4) in int TexIndex;
uniform mat4 ProjectionMatrix;
out vec2 TexCoord0;
void main()
mat4 trans = mat4(
gl_Position = ProjectionMatrix * trans * vec4(Position, 1.0);
TexCoord0 = vec2(TexCoord.x+TexIndex,TexCoord.y)/16;
When I replace TexIndex on the last line of my GLSL shader with a constant like 0, 1, or 2, my textures come out fine, but if I leave it like it is, they come out all mangled, so there must be something wrong with the number, right? But I don't know what it's coming out as so it's hard to debug.
I've looked at my array of BlockInstances, and they're all set to 1,2, or 19 so I don't think my input is wrong...
What else could it be?
Note that I'm using a sprite map texture where each of the tiles is 16x16 px but my TexCoords are in the range 0-1, so I add a whole number to it to choose which tile, and then divide it by 16 (the map is also 16x16 tiles) to put it back into the proper range. The idea is I'll replace that last line with
TexCoord0 = vec2(TexCoord.x+(TexIndex%16),TexCoord.y+(TexIndex/16))/16;
-- GLSL does integer math, right? An int divided by an int will come out as a whole number?
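To make the tile arithmetic concrete, here is a minimal C++ sketch of the same computation done on the CPU. The names `UV` and `atlas_uv` are hypothetical and not part of the original code; the sketch assumes a square atlas of 16x16 tiles and a non-negative tile index. As in GLSL 1.30 and later, `%` and `/` on two ints stay in integer math, so the division truncates.

```cpp
// Hypothetical helper, not from the original post: maps a 0..1 coordinate
// inside one tile to a coordinate in the whole atlas.
struct UV {
    float u;
    float v;
};

UV atlas_uv(float u, float v, int texIndex, int tilesPerSide = 16) {
    int column = texIndex % tilesPerSide;  // integer remainder picks the column
    int row    = texIndex / tilesPerSide;  // integer division truncates to the row
    // Shift into the chosen tile, then scale back into the 0..1 atlas range.
    return UV{ (u + column) / tilesPerSide, (v + row) / tilesPerSide };
}
```

For example, tile 19 lands in column 3 of row 1, so the tile's lower corner `(0, 0)` maps to `(3/16, 1/16)`.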
If I try this:
TexCoord0 = vec2(TexCoord.x+(TexIndex%16),TexCoord.y)/16;
The texture looks fine, but it's not using the right sprite. (Looks to be using the first sprite)
If I do this:
TexCoord0 = vec2(TexCoord.x+(TexIndex%16),TexCoord.y+(TexIndex/16))/16;
It comes out all white. This leads me to believe that TexIndex is coming out to be a very large number (bigger than 256 anyway) and that it's probably a multiple of 16.

layout (location = 4) in int TexIndex;
There's your problem.
glVertexAttribPointer feeds floating-point attributes. You can pass it integer data, but those integers are converted to floats before the shader sees them.
What you need is glVertexAttribIPointer (notice the I), which provides signed and unsigned integer data without conversion.
So if you declare a vertex shader input as a float or a non-prefixed vec, feed it with glVertexAttribPointer. If you declare the input as int, uint, ivec or uvec, use glVertexAttribIPointer.
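The following C++ sketch illustrates why the question's TexIndex looked like "a very large number" that is "probably a multiple of 16". Reading a float-fed attribute through an `in int` is undefined, so this is only an assumption about what many drivers do in practice: the integer is converted to a float, and the shader then sees that float's raw IEEE-754 bits. The names `float_bits` and `misread_index` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw 32-bit pattern of a float.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Assumption: the integer is converted to float on upload, and an integer
// shader input then observes the float's bit pattern instead of the value.
std::uint32_t misread_index(int index) {
    return float_bits(static_cast<float>(index));
}
```

Under that assumption, an index of 0 stays 0, while 1 turns into 0x3F800000 (1065353216) and 19 into 0x41980000. Both are far above 256 and divisible by 16, which matches the symptoms described in the question.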


GLSL - vertex shader and batching using uniform buffer object

I have the following vertex shader:
#version 450 core
layout (binding=2, std140) uniform MATRIX_BLOCK
{
    mat4 projection;
    mat4 view;
    mat4 model[128];
    mat4 mvp[128];
};
layout (location = 0) in vec3 aPos;
layout (location = 1) in vec2 aTexCoord;
layout (location = 3) in uint object_idx;
out vec2 TexCoord;
flat out uint instance_idx;
void main()
{
    gl_Position = mvp[object_idx] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
    TexCoord = aTexCoord;
    instance_idx = object_idx;
}
I'm using a uniform buffer to pass in 128 model and model-view-projection matrices, indexed by an object id. The object id is passed to the shader using a vertex attribute object_idx; basically every vertex, besides having x,y,z coordinates and u,v texture coordinates, also has an object id associated with it. The idea would be to be able to store the data for multiple objects in the same buffers but still use specific transformation matrices for each individual object. This is my (possibly stupid) attempt to batch together multiple objects to draw them with a single draw call, without having to rebind anything, using glDrawElements to render triangles.
However, it doesn't work. When I do
gl_Position = mvp[object_idx] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
then triangles with an object_idx of 0 get rendered just fine at the expected position, but triangles with vertices with object_idx's other than 0 don't appear anywhere. I thought I might have gotten the transformation matrices wrong, so for debugging, I reduced the possible objects to just 2 (0 and 1) and inverted the indexing using
gl_Position = mvp[1-object_idx] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
This resulted in all the triangles with object_idx = 0 being rendered at the expected position for mvp[1], but again, no triangles with object_idx = 1 appearing anywhere. So at least I know that the transformation matrices are correct. I then tried
gl_Position = mvp[0] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
and that renders all triangles (using object 0's transformation matrix) and
gl_Position = mvp[1] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
renders all of them, using object 1's transformation matrix.
So, obviously I don't understand something really fundamental about how vertex shaders or glDrawElements do their work.
So, my question:
Why don't all my triangles get rendered when I do a "dynamic" lookup of the mvp transformation matrix using object_idx, when to the best of my ability to check it all the data is passed into the vertex shader just as it's supposed to be?
I'm making an educated guess here:
layout (location = 3) in uint object_idx;
Using a uint attribute input requires setting up the attribute pointer with the function glVertexAttribIPointer() (note the extra I in there).
Using the standard glVertexAttribPointer function will always set up float attributes (and using type GL_INT there will convert from integer to float). Technically, when you read such a float attribute as uint in the shader, the result is undefined. It is quite likely that 0 stays 0, because the float and integer representations of zero are usually identical, but that works only by accident.
Apart from that issue, storing the object index per vertex is also quite inefficient. To effectively batch your draw calls, you should have a look at multi draw calls and gl_DrawID (originally from GL_ARB_shader_draw_parameters) features. You might also find the approaching zero driver overhead (AZDO) techniques useful.
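As a rough illustration of what the attribute setup has to describe, here is a C++ sketch of one possible interleaved vertex layout. The question does not show its real vertex format, so the `Vertex` struct, the `AttribLayout` record and the constant names are all assumptions. The stride and byte offsets computed here are the values that would be passed as the `stride` and `pointer` arguments of glVertexAttribPointer (float attributes) and glVertexAttribIPointer (the integer attribute).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical interleaved vertex: x,y,z, then u,v, then the object index.
struct Vertex {
    float         position[3];
    float         texCoord[2];
    std::uint32_t objectIdx;
};

// Hypothetical record of what each attribute call needs to know.
struct AttribLayout {
    int         components;  // number of components per vertex
    std::size_t offset;      // byte offset inside Vertex
    bool        isInteger;   // true: glVertexAttribIPointer, false: glVertexAttribPointer
};

constexpr std::size_t  kStride = sizeof(Vertex);
constexpr AttribLayout kPosition  {3, offsetof(Vertex, position),  false};  // location 0
constexpr AttribLayout kTexCoord  {2, offsetof(Vertex, texCoord),  false};  // location 1
constexpr AttribLayout kObjectIdx {1, offsetof(Vertex, objectIdx), true};   // location 3
```

With this layout the stride is 24 bytes and the object index sits at byte offset 20. Only that last attribute takes the integer path.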

GLSL - unable to access second index of a SSBO array for multiple lights

In my application I add two lights. One at (0,0,2) and the second one at (2,0,0). Here's what I get (the x,y,z axes are represented respectively by the red, green & blue lines):
Notice how only the first light is working and the second is not. I made my application core-profile compliant to inspect the buffers with various tools like RenderDoc and NSight and both show me that the second light's data is present in the buffer (picture taken while running Nsight):
The positions seem to be correctly transferred to the GPU memory buffer. Here's the implementation of my fragment shader that uses an SSBO to handle multiple lights in my application:
#version 430
struct Light {
    vec3 position;
    vec3 color;
    float intensity;
    float attenuation;
    float radius;
};
layout (std140, binding = 0) uniform CameraInfo {
    mat4 ProjectionView;
    vec3 eye;
};
layout (std430, binding = 1) readonly buffer LightsData {
    Light lights[];
};
uniform vec3 ambient_light_color;
uniform float ambient_light_intensity;
in vec3 ex_FragPos;
in vec4 ex_Color;
in vec3 ex_Normal;
out vec4 out_Color;
void main(void)
{
    // Basic ambient light
    vec3 ambient_light = ambient_light_color * ambient_light_intensity;
    int i;
    vec3 diffuse = vec3(0.0,0.0,0.0);
    vec3 specular = vec3(0.0,0.0,0.0);
    for (i = 0; i < lights.length(); ++i) {
        Light wLight = lights[i];
        // Basic diffuse light
        vec3 norm = normalize(ex_Normal); // in this project the normals are all normalized anyway...
        vec3 lightDir = normalize(wLight.position - ex_FragPos);
        float diff = max(dot(norm, lightDir), 0.0);
        diffuse += diff * wLight.color;
        // Basic specular light
        vec3 viewDir = normalize(eye - ex_FragPos);
        vec3 reflectDir = reflect(-lightDir, norm);
        float spec = pow(max(dot(viewDir, reflectDir), 0.0), 32);
        specular += wLight.intensity * spec * wLight.color;
    }
    out_Color = ex_Color * vec4(specular + diffuse + ambient_light,1.0);
}
Note that I've read the relevant section of the OpenGL 4.5 spec and that, if I understood correctly, my alignment should follow the size of the biggest member of my struct, which is a vec3, and my struct size is 36 bytes, so everything should be fine here. I also tried a different layout qualifier (e.g. std140) and adding some padding, but nothing fixes the issue with the second light. In my C++ code, I have these definitions to add the lights in my application:
struct Light {
    glm::f32vec3 position;
    glm::f32vec3 color;
    float intensity;
    float attenuation;
    float radius;
};
std::vector<Light> _Lights;
void AddLight(const Light &light) {
// Add to _Lights
LIGHTS_SSBO_BINDING_POINT, _Lights.size()* sizeof(Light),
static_cast<void*>(, GL_DYNAMIC_DRAW);
using SSBOCapacity = GLuint;
using BindingPoint = GLuint;
using ID = GLuint;
std::map<BindingPoint, std::pair<ID, SSBOCapacity> > SSBO_list;
void UpdateSSBOBlockData(GLuint a_unBindingPoint,
GLuint a_unSSBOSize, void* a_pData, GLenum a_eUsage) {
auto SSBO = SSBO_list.find(a_unBindingPoint);
if (SSBO != SSBO_list.end()) {
GLuint unSSBOID = SSBO->second.first;
glBufferData(GL_SHADER_STORAGE_BUFFER, a_unSSBOSize, a_pData, a_eUsage);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0); //unbind
// error handling...
Basically, I'm trying to update/reallocate the SSBO size with glBufferData each time a light is added in my app.
Now, since I'm having issues processing the second light data, I changed my fragment shader code to only execute the second light in my SSBO array by forcing i = 1 and looping until i < 2, but I get the following errors:
(50) : error C1068: ... or possible array index out of bounds
(50) : error C5025: lvalue in field access too complex
(56) : error C1068: ... or possible array index out of bounds
(56) : error C5025: lvalue in field access too complex
Lines 50 and 56 refer to diffuse += diff * wLight.color; and specular += wLight.intensity * spec * wLight.color; respectively. Is there really an out of bounds access even if I add my lights before the first draw call? Why is the shader compiling correctly when I'm using lights.length() instead of 2?
Finally, I've added a simple if (i == 1) in my for-loop to see if lights.length() is equal to 2, but it doesn't go in it. Yet the initial size of my buffer is 0 and then I add a light that sets the buffer size to 36 bytes and we can see that the first light works fine. Why is the update/reallocate not working the second time?
So what I did was to add some padding at the end of the declaration of my struct on the C++ side only. The padding required was float[3] or 12 bytes, which sums up to 48 bytes. I'm still not sure why this is required, since the specifications state (as highlighted in this post)
If the member is a structure, the base alignment of the structure is N, where N is the largest base alignment value of any of its
members, and rounded up to the base alignment of a vec4. The
individual members of this sub-structure are then assigned offsets by
applying this set of rules recursively, where the base offset of the
first member of the sub-structure is equal to the aligned offset of
the structure. The structure may have padding at the end; the base
offset of the member following the sub-structure is rounded up to the
next multiple of the base alignment of the structure.
When using the std430 storage layout, shader storage blocks will be
laid out in buffer storage identically to uniform and shader storage
blocks using the std140 layout, except that the base alignment and
stride of arrays of scalars and vectors in rule 4 and of structures in
rule 9 are not rounded up a multiple of the base alignment of a vec4.
My guess is that structures such as vec3 and glm::f32vec3 defined by glm are recursively rounded up to vec4 when using std430 and therefore my struct must follow the alignment of a vec4. If anyone can confirm this, it would be interesting since the linked post above deals with vec4 directly and not vec3.
Picture with both lights working :
After more investigation, it turns out that the last 3 fields of the Light struct (intensity, attenuation and radius) were not usable. I fixed this by changing the position and color from glm::f32vec3 to glm::vec4 instead. More information can be found in a similar post. I also left a single float for padding, because of the alignment mentioned earlier.
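For reference, here is a hedged C++ sketch of a CPU-side struct that mirrors the GLSL `Light` struct exactly as declared in the question (two vec3 members followed by three floats) under std430. It relies on the layout rule that a three-component vector has the base alignment of four components, i.e. 16 bytes for floats, in both std140 and std430. The struct name `LightStd430` and the `pad0` member are hypothetical, and plain float arrays stand in for the glm types so the sketch is self-contained.

```cpp
#include <cstddef>

// Hypothetical CPU mirror of the GLSL struct under std430.
struct alignas(16) LightStd430 {
    float position[3];  // vec3 at offset 0
    float pad0;         // the next vec3 must start on a 16-byte boundary
    float color[3];     // vec3 at offset 16
    float intensity;    // a float only needs 4-byte alignment: offset 28
    float attenuation;  // offset 32
    float radius;       // offset 36
    // alignas(16) rounds the size up from 40 to 48, the array stride.
};

static_assert(offsetof(LightStd430, color) == 16, "color must start at byte 16");
static_assert(offsetof(LightStd430, intensity) == 28, "intensity follows color directly");
static_assert(sizeof(LightStd430) == 48, "array stride is 48 bytes");
```

A tightly packed 36-byte struct disagrees with this layout from the `color` member onward. Padding only the end of the struct fixes the 48-byte stride, so each light's position lines up, but the later members still land at different offsets.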

Is glVertexAttribPointer used only for vertex positions, UVs, colors, and normals? Nothing else?

I want to incorporate a custom attribute that varies per vertex. In this case it is assigned to location=4 ... but nothing happens, the other four attributes vary properly except that one. At the bottom, I added a test to produce a specific color if it encounters the value '1' (which I know exists in the buffer, because I queried the buffer earlier). Attribute 4 is stuck at the first value of its array and never moves.
Am I missing a setting (something to be enabled, maybe)? Or is it that OpenGL only varies a handful of attributes and nothing else?
#version 330 //for openGL 3.3
//uniform variables stay constant for the whole glDraw call
uniform mat4 ProjViewModelMatrix;
uniform vec4 DefaultColor; //x=-1 signifies no default color
//non-uniform variables get fed per vertex from the buffers
layout (location=0) in vec3 coords; //feeding from attribute=0 of the main code
layout (location=1) in vec4 color; //per vertex color, feeding from attribute=1 of the main code
layout (location=2) in vec3 normals; //per vertex normals
layout (location=3) in vec2 UVcoord; //texture coordinates
layout (location=4) in int vertexTexUnit;//per vertex texture unit index
out vec4 thisColor;
out vec2 vertexUVcoord;
flat out int TexUnitIdx;
void main ()
{
    vertexUVcoord = UVcoord;
    if (DefaultColor.x==-1) {thisColor = color;} //If no default color is set, use per vertex colors
    else {thisColor = DefaultColor;}
    gl_Position = ProjViewModelMatrix * vec4(coords,1.0); //This outputs the position to the graphics card.
    if (vertexTexUnit==1) thisColor=vec4(1,1,0,1); //Never receives value of 1, but the buffer does contain such values
}
Because the vertexTexUnit attribute is an integer, you must use glVertexAttribIPointer() instead of glVertexAttribPointer().
You can use vertex attributes for whatever you want. OpenGL doesn't know or care what you're using them for.

Per-vertex value with Element buffer

Say that I have a vertex shader. Its input section looks like this (simplified):
layout(location = 0) in vec3 V_pos;
layout(location = 1) in vec3 V_norm;
layout(location = 2) in vec2 V_texcoord1;
layout(location = 3) in vec2 V_texcoord2;
layout(location = 4) in int V_texNum;
What I want is to have the first 4 inputs come from an element buffer, while the last will come from a regular buffer. E.g., in this example, each element has two uv pairs, and I want to be able to give certain faces different textures to sample from.
Can this be done? One other option would be to give the shader a huge uniform of integers containing the values for texNum, and access that with gl_VertexID. But, that seems like a really ugly way to do it.
I'm using OpenGL 3.3 (happy to use extensions though) and C++.

Is it faster to use texelFetch when rendering fonts?

I am writing some font drawing shaders in OpenGL 3.3. I will render my font into a texture atlas and then generate some display lists for some text I want to draw. I would like the rendering of text to consume the least amount of resources (CPU, GPU memory, GPU time). How can I accomplish this?
Looking at Freetype-gl, I noticed that the author generates 6 indices and 4 vertices per character.
Since I am using OpenGL 3.3, I have some additional freedom. My plan was to generate 1 vertex per character plus one integer "code" per character. The character code can be used in texelFetch operations to retrieve texture coördinates and character size information. A geometry shader turns the size information and vertex into a triangle strip.
Is texelFetch going to be slower than sending more vertices/texture coördinates? Is this worth doing, or is there a reason why it's not done in the font libraries I looked at?
Final code:
Vertex shader:
#version 330
uniform sampler2D font_atlas;
uniform sampler1D code_to_texture;
uniform mat4 projection;
uniform vec2 vertex_offset; // in view space.
uniform vec4 color;
uniform float gamma;
in vec2 vertex; // vertex in view space of each character adjusted for kerning, etc.
in int code;
out vec4 v_uv;
void main()
v_uv = texelFetch(
gl_Position = projection * vec4(vertex_offset + vertex, 0.0, 1.0);
Geometry shader:
#version 330
layout (points) in;
layout (triangle_strip, max_vertices = 4) out;
uniform sampler2D font_atlas;
uniform mat4 projection;
in vec4 v_uv[];
out vec2 g_uv;
void main()
vec4 pos = gl_in[0].gl_Position;
vec4 uv = v_uv[0];
vec2 size = vec2(textureSize(font_atlas, 0)) * ( - uv.xy);
vec2 pos_opposite = pos.xy + (mat2(projection) * size);
gl_Position = vec4(pos.xy, 0, 1);
g_uv = uv.xy;
gl_Position = vec4(pos.x, pos_opposite.y, 0, 1);
g_uv = uv.xw;
gl_Position = vec4(pos_opposite.x, pos.y, 0, 1);
g_uv = uv.zy;
gl_Position = vec4(pos_opposite.xy, 0, 1);
g_uv =;
Fragment shader:
#version 330
uniform sampler2D font_atlas;
uniform vec4 color;
uniform float gamma;
in vec2 g_uv;
layout (location = 0) out vec4 fragment_color;
void main()
{
    float a = texture(font_atlas, g_uv).r;
    fragment_color.rgb = color.rgb;
    fragment_color.a = color.a * pow(a, 1.0 / gamma);
}
I wouldn't expect a significant performance difference between your proposed method and storing the quad vertex positions and texture coordinates in a vertex buffer. On the one hand, your method requires a smaller vertex buffer and less work for the CPU. On the other hand, the texelFetch calls will hit more-or-less random locations and not make the best use of the cache. This last point may not be very significant, as I guess that texture won't be very large. Also, the execution model of geometry shaders means they can quickly become the bottleneck of the pipeline.
To answer "is this worth doing?" - I suspect not for performance reasons. Unfortunately you can't tell until you implement it and measure the performance. I think it's quite a cool idea though, so I don't think you'd be wasting your time trying it out.
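To put a rough number on the "smaller vertex buffer" point, here is a back-of-the-envelope C++ sketch comparing bytes per glyph for the two approaches. The vertex formats are assumptions, not taken from Freetype-gl or from the question: the quad approach is assumed to store a vec2 position and a vec2 texture coordinate per vertex with 16-bit indices, and the point approach a vec2 position plus one 32-bit code. All names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed quad format: 4 vertices of (vec2 position, vec2 uv) + 6 indices.
struct QuadVertex {
    float position[2];
    float uv[2];
};
constexpr std::size_t kQuadBytesPerGlyph =
    4 * sizeof(QuadVertex) + 6 * sizeof(std::uint16_t);

// Assumed point format: 1 vertex of (vec2 position, int code).
struct PointVertex {
    float        position[2];
    std::int32_t code;
};
constexpr std::size_t kPointBytesPerGlyph = sizeof(PointVertex);

// Total buffer size for a run of text with the given per-glyph cost.
constexpr std::size_t buffer_bytes(std::size_t glyphs, std::size_t bytesPerGlyph) {
    return glyphs * bytesPerGlyph;
}
```

Under these assumptions a glyph costs 76 bytes as an indexed quad and 12 bytes as a point. This only measures buffer size; it says nothing about the texelFetch or geometry shader cost, which still has to be measured.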
Maybe you can use an atomic counter to track the current position in the text.
Here is an interesting paper on memory bandwidth:
GPU perf...
You can cache the result in an FBO.
For really fast rendering, as you said, you can build a geometry shader that takes points as input, outputs quads, and samples a texture to get additional glyph info.
That appears to be the best solution...