How to establish glBindBufferRange() offset with Shader Storage Buffer and std430? - c++

I want to switch between ssbo data to draw things with different setup. To make it happen I need to use glBindBufferRange() with its suitable offset.
I've read that the offset needs to be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT for ubo, but things may be changed with ssbo since using std430 instead of std140.
I tried to do this the easiest way:
struct Color {
    float r, g, b, a;
};
struct V2 {
    float x, y;
};
struct Uniform {
    Color c1;
    Color c2;
    V2 v2;
    float r;
    float f;
    int t;
};
GLuint ssbo = 0;
std::vector<Uniform> uniform;
int main()
{
    //create window, context etc.
    glCreateBuffers(1, &ssbo);
    Uniform u;
    u.c1 = {255, 0, 255, 255 };
    u.c2 = {255, 0, 255, 255 };
    u.v2 = { 0.0f, 0.0f };
    u.r = 0.0f;
    u.f = 100.0f;
    u.t = 0;
    uniform.push_back(u);
    u.c1 = {255, 255, 0, 255 };
    u.c2 = {255, 255, 0, 255 };
    u.v2 = { 0.0f, 0.0f };
    u.r = 100.0f;
    u.f = 100.0f;
    u.t = 1;
    uniform.push_back(u);
    u.c1 = {255, 0, 0, 255 };
    u.c2 = {255, 0, 0, 255 };
    u.v2 = { 0.0f, 0.0f };
    u.r = 100.0f;
    u.f = 0.0f;
    u.t = 0;
    uniform.push_back(u);
    glNamedBufferData(ssbo, sizeof(Uniform) * uniform.size(), uniform.data(), GL_STREAM_DRAW);
    for(int i = 0; i < uniform.size(); ++i) {
        glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, ssbo, sizeof(Uniform) * i, sizeof(Uniform));
        //draw call for shape i (omitted)
    }
    //swap buffer etc.
    return 0;
}
#version 460 core
layout(location = 0) out vec4 f_color;
layout(std430, binding = 1) buffer Unif
{
    vec4 c1;
    vec4 c2;
    vec2 v2;
    float r;
    float f;
    int t;
};
void main()
{
    f_color = vec4(t, 0, 0, 1);
}
There are of course a VAO, a VBO, a vertex struct and so on, but they do not affect the SSBO.
I got a GL_INVALID_VALUE error from glBindBufferRange(), though. It must come from the offset, because my next attempt transfers the data, but in the wrong order.
My next attempt used a formula I found on the Internet:
int align = 4;
int ssboSize = sizeof(Uniform) + align - sizeof(Uniform) % align;
So, changing only glNamedBufferData and glBindBufferRange, it looks like this:
glNamedBufferData(ssbo, ssboSize * uniform.size(), uniform.data(), GL_STREAM_DRAW);
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, 1, ssbo, ssboSize * i, sizeof(Uniform));
That way, it almost worked. As you can see, the t values are 0, 1, 0, so OpenGL should draw 3 shapes with these colors:
vec4(0, 0, 0, 1);
vec4(1, 0, 0, 1);
vec4(0, 0, 0, 1);
but it draws them in the wrong order:
vec4(1, 0, 0, 1);
vec4(0, 0, 0, 1);
vec4(0, 0, 0, 1);
How can I make it transfer the data correctly?

The OpenGL spec (Version 4.6) states the following in section "6.1.1 Binding Buffer Objects to Indexed Target Points" regarding the error conditions for glBindBufferRange:
An INVALID_VALUE error is generated by BindBufferRange if buffer is
non-zero and offset or size do not respectively satisfy the constraints described for those parameters for the specified target, as described in section 6.7.1.
Section 6.7.1 "Indexed Buffer Object Limits and Binding Queries" states for SSBOs:
offset restriction: multiple of value of SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT
According to Table 23.64 "Implementation Dependent Aggregate Shader Limits":
256 [with the following footnote]: The value of SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT is the maximum allowed, not the minimum.
So if your offset is not a multiple of 256 (which it isn't), this code is simply not guaranteed to work at all. You can query the actual restriction of the implementation you are running on and adjust your buffer contents accordingly, but you must be prepared for it to be as high as 256 bytes.

I ended up using struct alignas(128) Uniform. I guess my next goal is to not use hardcoded align.
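One way to avoid the hardcoded alignment is to compute a per-element stride from the value returned by glGetIntegerv(GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT, ...). The following is only a sketch of that arithmetic: the helper names alignUp and elementOffset are made up for illustration, and 256 stands in for whatever the driver actually reports.

```cpp
#include <cstddef>

// Round `size` up to the next multiple of `alignment` (alignment must be > 0).
// Unlike `size + alignment - size % alignment`, this leaves `size` unchanged
// when it is already a multiple of `alignment`.
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment)
{
    return (size + alignment - 1) / alignment * alignment;
}

// Byte offset of element `index` when every element starts on an aligned boundary.
constexpr std::size_t elementOffset(std::size_t index, std::size_t elementSize,
                                    std::size_t alignment)
{
    return index * alignUp(elementSize, alignment);
}

// 52 is sizeof(Uniform) for the struct in the question (13 four-byte members);
// 256 is an assumed stand-in for the queried alignment.
static_assert(alignUp(52, 256) == 256, "one element per 256-byte slot");
static_assert(elementOffset(2, 52, 256) == 512, "third element starts at byte 512");
static_assert(alignUp(256, 256) == 256, "aligned sizes are left unchanged");
```

With a stride computed this way, the buffer would be allocated as stride * uniform.size() bytes, each element uploaded at its own elementOffset (for example with glNamedBufferSubData), and the same offset passed to glBindBufferRange.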


Multiplication in OpenGL vertex shader using column-major matrix does not draw triangle as expected

When I use a custom column major matrix in my code, and pass it to the vertex shader, the triangle is not drawn as expected, but when I use a row major matrix, it draws the triangle in its correct position.
I googled it and found some answers related to this question:
Like this and this, but I could not understand what I'm doing wrong.
If I'm not mistaken, a row-major matrix is:
{ 0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
Tx, Ty, Tz, w}
So, using this row-major matrix, the multiplication order should be: v' = v*M.
And a column-major matrix is:
{ 0, 4, 8, Tx,
1, 5, 9, Ty,
2, 6, 10, Tz,
3, 7, 11, w}
Using this column-major matrix, the multiplication order should be: v' = M*v.
Where Tx, Ty, and Tz hold the translation values for x, y and z, respectively.
Having said that, I will focus on what I think I'm having trouble with, in order to have a more compact question, but I will post an example code in the end, using GLFW and GLAD(<glad/gl.h>)
This is my vertex shader:
#version 330 core
layout (location = 0) in vec3 aPos;
uniform mat4 transform;
void main()
{
    gl_Position = transform * vec4(aPos, 1.0);
}
These are my Mat4 struct and its functions:
typedef struct Mat4
{
    float data[16];
} Mat4;
// Return Mat4 identity matrix
Mat4 mat4_identity()
{
    Mat4 m = {0};
    m.data[0] = 1.0f; m.data[5] = 1.0f; m.data[10] = 1.0f; m.data[15] = 1.0f;
    return m;
}
// Translate Mat4 using row-major order
Mat4 mat4_row_translation(Mat4 a, float x, float y, float z)
{
    Mat4 m = mat4_identity();
    m.data[12] += x; m.data[13] += y; m.data[14] += z;
    return m;
}
// Translate Mat4 using column-major order
Mat4 mat4_column_translation(Mat4 a, float x, float y, float z)
{
    Mat4 m = mat4_identity();
    m.data[3] += x; m.data[7] += y; m.data[11] += z;
    return m;
}
This is my update_triangle function where I translate the matrix:
Mat4 trans = mat4_identity();
trans = mat4_column_translation(trans, 0.5f, 0.5f, 0.0f);
unsigned int transformLoc = glGetUniformLocation(shader, "transform");
glUniformMatrix4fv(transformLoc, 1, GL_FALSE, trans.data);
Note that I'm passing GL_FALSE to glUniformMatrix4fv, which tells OpenGL that the matrix is already in column-major order.
However, when running the program, I do not get a triangle 0.5f up and 0.5f to the right. I get this:
[Image: weird triangle translation]
But when I use a row-major matrix and change the multiplication order in the vertex shader(v' = v*M), I get the result that I was expecting.
The vertex shader:
#version 330 core
layout (location = 0) in vec3 aPos;
uniform mat4 transform;
void main()
{
    gl_Position = vec4(aPos, 1.0) * transform;
}
The update_triangle function:
Mat4 trans = mat4_identity();
trans = mat4_row_translation(trans, 0.5f, 0.5f, 0.0f);
unsigned int transformLoc = glGetUniformLocation(shader, "transform");
glUniformMatrix4fv(transformLoc, 1, GL_TRUE, trans.data);
Note that I'm passing GL_TRUE to glUniformMatrix4fv, which tells OpenGL that the matrix is not in column-major order.
The result:
[Image: triangle drawn as expected]
Here is the code in a single file, it needs to be compiled with GLFW and glad/gl.c.
Comment[0] and Comment[1] are just there to show which lines to comment out together. For example, if you comment out a line with "// Comment[0]" in it, you need to comment out the other lines with "// Comment[0]" as well.
But in the vertex shader, both matrices need the same line to be drawn correctly (which is what I don't understand).
If you are on linux, you can compile with: g++ -o ex example.cpp gl.c -lglfw && ./ex
(You will need to download gl.c from Glad generator)
#include <glad/gl.h>
#include <GLFW/glfw3.h>
#include <stdio.h>
#include <stdlib.h>
// Mat4 structure
typedef struct Mat4
{
    float data[16];
} Mat4;
int c = 0;
// Return Mat4 identity matrix
Mat4 mat4_identity()
{
    Mat4 m = {0};
    m.data[0] = 1.0f; m.data[5] = 1.0f; m.data[10] = 1.0f; m.data[15] = 1.0f;
    return m;
}
// Translate Mat4 using row-major order
Mat4 mat4_row_translation(Mat4 a, float x, float y, float z)
{
    Mat4 m = mat4_identity();
    m.data[12] += x; m.data[13] += y; m.data[14] += z;
    return m;
}
// Translate Mat4 using column-major order
Mat4 mat4_column_translation(Mat4 a, float x, float y, float z)
{
    Mat4 m = mat4_identity();
    m.data[3] += x; m.data[7] += y; m.data[11] += z;
    return m;
}
GLFWwindow *glfw_window;
// Window functions
int init_glfw(const char *window_title, int x, int y, int width, int height);
void framebuffer_size_callback(GLFWwindow* window, int width, int height);
void processInput();
// Shader functions
static unsigned int compile_shader(unsigned int type, const char *source);
static unsigned int create_shader(const char *vertex_shader, const char *fragment_shader);
// Triangle functions
void init_triangle();
void draw_triangle();
void update_triangle();
unsigned int shader = -1;
unsigned int vao = -1;
unsigned int vbo = -1;
float vertices[] = {
    -0.5f, -0.5f, 0.0f, // left
     0.5f, -0.5f, 0.0f, // right
     0.0f,  0.5f, 0.0f  // top
};
const char *vshader = "#version 330 core\n"
    "layout (location = 0) in vec3 aPos;\n"
    "uniform mat4 transform;\n"
    "void main()\n"
    "{\n"
    // " gl_Position = vec4(aPos, 1.0) * transform;\n" // Comment [0] -> Inverted for column-major
    " gl_Position = transform * vec4(aPos, 1.0);\n" // Comment [1] -> Inverted for column-major
    "}\n";
const char *fshader = "#version 330 core\n"
    "out vec4 FragColor;\n"
    "void main()\n"
    "{\n"
    " FragColor = vec4(1.0f, 0.5f, 0.2f, 1.0f);\n"
    "}\n";
int main()
int result = init_glfw("LearnOpenGL", 0, 0, 800, 600);
if(result != 0)
return result;
while (!glfwWindowShouldClose(glfw_window))
// input
// Update triangle vertices
glClearColor(0.2f, 0.3f, 0.3f, 1.0f);
// Draw triangle example
// glfw: swap buffers and poll IO events (keys pressed/released, mouse moved etc.)
// glfw: terminate, clearing all previously allocated GLFW resources.
return 0;
// My confusion is here
void update_triangle()
{
    Mat4 trans = mat4_identity();
    trans = mat4_column_translation(trans, 0.5f, 0.5f, 0.0f); // Comment [0]
    // trans = mat4_row_translation(trans, 0.5f, 0.5f, 0.0f); // Comment [1]
    // Print Mat4
    if(c == 0)
    {
        // TODO: Remove this
        printf("==== Trans: ====\n");
        for(int i = 1; i <= 16; i++)
        {
            printf("%.2f, ", trans.data[i-1]);
            if(i % 4 == 0 && i != 0)
                printf("\n");
        }
        c++; // print only once
    }
    unsigned int transformLoc = glGetUniformLocation(shader, "transform");
    glUniformMatrix4fv(transformLoc, 1, GL_FALSE, trans.data); // Comment [0]
    // glUniformMatrix4fv(transformLoc, 1, GL_TRUE, trans.data); // Comment [1]
}
// Window functions
int init_glfw(const char *window_title, int x, int y, int width, int height)
// glfw: initialize and configure
// ------------------------------
#ifdef __APPLE__
// glfw window creation
// --------------------
glfw_window = glfwCreateWindow(width, height, window_title, NULL, NULL);
if (glfw_window == NULL)
printf("Failed to create GLFW window\n");
return -1;
glfwSetFramebufferSizeCallback(glfw_window, framebuffer_size_callback);
// glad: load all OpenGL function pointers
// ---------------------------------------
int version = gladLoadGL(glfwGetProcAddress);
printf("Current GL loaded: %d.%d\n", GLAD_VERSION_MAJOR(version), GLAD_VERSION_MINOR(version));
return 0;
void framebuffer_size_callback(GLFWwindow* window, int width, int height)
glViewport(0, 0, width, height);
void processInput()
if(glfwGetKey(glfw_window, GLFW_KEY_ESCAPE) == GLFW_PRESS)
glfwSetWindowShouldClose(glfw_window, true);
/* Default Compilation for Shader */
static unsigned int compile_shader(unsigned int type, const char *source)
unsigned int id = glCreateShader(type);
glShaderSource(id, 1, &source, NULL);
int result;
glGetShaderiv(id, GL_COMPILE_STATUS, &result);
int length;
glGetShaderiv(id, GL_INFO_LOG_LENGTH, &length);
char* msg = (char*) alloca(length * sizeof(char));
glGetShaderInfoLog(id, length, &length, msg);
printf("Vertex / Fragment Shader Failed:\n %s", msg);
return 0;
return id;
static unsigned int create_shader(const char *vertex_shader, const char *fragment_shader)
unsigned int program = glCreateProgram();
unsigned int vs = compile_shader(GL_VERTEX_SHADER, vertex_shader);
unsigned int fs = compile_shader(GL_FRAGMENT_SHADER, fragment_shader);
glAttachShader(program, vs);
glAttachShader(program, fs);
return program;
// Triangle functions
void init_triangle()
shader = create_shader(vshader, fshader);
printf("shader=%d", shader);
glGenVertexArrays(1, &vao);
printf("vao=%d", vao);
glGenBuffers(1, &vbo);
printf("vbo=%d\n", vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo); // Using this vbo
glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), NULL);
void draw_triangle()
glDrawArrays(GL_TRIANGLES, 0, 3);
This is my first question in this forum, so please let me know if there is anything missing.
So many people use row-major or transposed matrices, that they forget that matrices are not naturally oriented that way. So they see a translation matrix as this:
1 0 0 0
0 1 0 0
0 0 1 0
x y z 1
This is a transposed translation matrix. That is not what a normal translation matrix looks like. The translation goes in the 4th column, not the fourth row. Sometimes, you even see this in textbooks, which is utter garbage.
It's easy to know whether a matrix in an array is row or column-major. If it's row-major, then the translation is stored in the 3, 7, and 11th indices. If it's column-major, then the translation is stored in the 12, 13, and 14th indices. Zero-base indices of course.
Your confusion stems from believing that you're using column-major matrices when you're in fact using row-major ones.
The statement that row vs. column major is a notational convention only is entirely true. The mechanics of matrix multiplication and matrix/vector multiplication are the same regardless of the convention.
What changes is the meaning of the results.
A 4x4 matrix after all is just a 4x4 grid of numbers. It doesn't have to refer to a change of coordinate system. However, once you assign meaning to a particular matrix, you now need to know what is stored in it and how to use it.
Take the translation matrix I showed you above. That's a valid matrix. You could store that matrix in a float[16] in one of two ways:
float row_major_t[16] = {1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1};
float column_major_t[16] = {1, 0, 0, x, 0, 1, 0, y, 0, 0, 1, z, 0, 0, 0, 1};
However, I said that this translation matrix is wrong, because the translation is in the wrong place. I specifically said that it is transposed relative to the standard convention for how to build translation matrices, which ought to look like this:
1 0 0 x
0 1 0 y
0 0 1 z
0 0 0 1
Let's look at how these are stored:
float row_major[16] = {1, 0, 0, x, 0, 1, 0, y, 0, 0, 1, z, 0, 0, 0, 1};
float column_major[16] = {1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, x, y, z, 1};
Notice that column_major is exactly the same as row_major_t. So, if we take a proper translation matrix, and store it as column-major, it is the same as transposing that matrix and storing it as row-major.
That is what is meant by being only a notational convention. There are really two sets of conventions: memory storage and transposition. Memory storage is column vs row major, while transposition is normal vs. transposed.
If you have a matrix that was generated in row-major order, you can get the same effect by transposing the column-major equivalent of that matrix. And vice-versa.
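The relationship between storage order and transposition can be checked with a few lines of C++. This is a minimal sketch: the Mat4 alias and the three function names are made up for illustration and are not part of the question's code.

```cpp
#include <array>
#include <cstddef>

using Mat4 = std::array<float, 16>;

// The standard translation matrix (translation in the 4th column), stored row by row.
Mat4 translationRowMajor(float x, float y, float z)
{
    return {1.0f, 0.0f, 0.0f, x,
            0.0f, 1.0f, 0.0f, y,
            0.0f, 0.0f, 1.0f, z,
            0.0f, 0.0f, 0.0f, 1.0f};
}

// The same matrix, stored column by column: translation lands at indices 12, 13, 14.
Mat4 translationColumnMajor(float x, float y, float z)
{
    return {1.0f, 0.0f, 0.0f, 0.0f,
            0.0f, 1.0f, 0.0f, 0.0f,
            0.0f, 0.0f, 1.0f, 0.0f,
            x,    y,    z,    1.0f};
}

// Swap rows and columns of a flat 4x4 array.
Mat4 transpose(const Mat4& m)
{
    Mat4 t{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            t[c * 4 + r] = m[r * 4 + c];
    return t;
}
```

Here transpose(translationRowMajor(x, y, z)) produces exactly the array returned by translationColumnMajor(x, y, z), which is the sense in which changing the storage order and transposing are interchangeable.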
Matrix multiplication can only be done one way: given two matrices, in a specific order, you multiply certain values together and store the results. Now, A*B != B*A, but the actual source code for A*B is the same as the code for B*A. They both run the same code to compute the output.
The matrix multiplication code does not care whether the matrices happen to be stored in column-major or row-major order.
The same cannot be said for vector/matrix multiplication. And here's why.
Vector/matrix multiplication is a falsehood; it cannot be done. However, you can multiply a matrix by another matrix. So if you pretend a vector is a matrix, then you can effectively do vector/matrix multiplication, simply by doing matrix/matrix multiplication.
A 4D vector can be considered a column-vector or a row-vector. That is, a 4D vector can be thought of as a 4x1 matrix (remember: in matrix notation, the row count comes first) or a 1x4 matrix.
But here's the thing: Given two matrices A and B, A*B is only defined if the number of columns of A is the same as the number of rows of B. Therefore, if A is our 4x4 matrix, B must be a matrix with 4 rows in it. Therefore, you cannot perform A*x, where x is a row-vector. Similarly, you cannot perform x*A where x is a column-vector.
Because of this, most matrix math libraries make this assumption: if you multiply a vector times a matrix, you really mean to do the multiplication that actually works, not the one that makes no sense.
Let us define, for any 4D vector x, the following. C shall be the column-vector matrix form of x, and R shall be the row-vector matrix form of x. Given this, for any 4x4 matrix A, A*C represents matrix multiplying A by the column-vector x. And R*A represents matrix multiplying the row-vector x by A.
But if we look at this using strict matrix math, we see that these are not equivalent. R*A cannot be the same as A*C. This is because a row-vector is not the same thing as a column-vector. They're not the same matrix, so they do not produce the same results.
However, they are related in one way. It is true that R != C. However, it is also true that R = C^T, where ^T is the transpose operation. The two matrices are transposes of each other.
Here's a funny fact. Since vectors are treated as matrices, they too have a column vs. row-major storage question. The problem is that they both look the same. The array of floats is the same, so you can't tell the difference between R and C just by looking at the data. The only way to tell the difference is by how they are used.
If you have any two matrices A and B, and A is stored as row-major and B as column-major, multiplying them is completely meaningless. You get nonsense as a result. Well, not really. Mathematically, what you get is the equivalent of A^T*B or A*B^T, depending on which storage order the multiplication code assumes for both operands.
Therefore, matrix multiplication only makes sense if the two matrices (and remember: vector/matrix multiplication is just matrix multiplication) are stored in the same major ordering.
So, is a vector column-major or row-major? It is both and neither, as stated before. It is column major only when it is used as a column matrix, and it is row major when it is used as a row matrix.
Therefore, if you have a matrix A which is column major, x*A means... nothing. Well, again, it means x*A^T, but that's not what you really wanted. Similarly, A*x does transposed multiplication if A is row-major.
Therefore, the order of vector/matrix multiplication does change, depending on your major ordering of the data (and whether you're using transposed matrices).
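To make the last point concrete, here is a small sketch of the two products written out on flat arrays. The names Vec4, Mat4, mulMatVecColumnMajor and mulVecMatRowMajor are hypothetical; the first function mirrors what `transform * vec4(...)` computes in GLSL when the uniform was uploaded with GL_FALSE.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;

// M * v: v is a column vector and `m` is read as column-major,
// so element (row r, column c) lives at m[c * 4 + r].
Vec4 mulMatVecColumnMajor(const Mat4& m, const Vec4& v)
{
    Vec4 out{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}

// v * M: v is a row vector and `m` is read as row-major,
// so element (row r, column c) lives at m[r * 4 + c].
Vec4 mulVecMatRowMajor(const Vec4& v, const Mat4& m)
{
    Vec4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            out[c] += v[r] * m[r * 4 + c];
    return out;
}
```

For an array with the translation at indices 12, 13, 14, both functions give the same translated point. For an array with the translation (0.5, 0.5, 0) at indices 3, 7, 11, as built by mat4_column_translation in the question, mulMatVecColumnMajor leaves x and y untouched and instead changes w (the top vertex (0, 0.5, 0, 1) ends up with w = 1.25), which is consistent with the distorted triangle.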

How does one set an array in hlsl?

In glsl, array = int[8]( 0, 0, 0, 0, 0, 0, 0, 0 ); works fine, but in hlsl this doesn't seem to be the case. It doesn't seem to be mentioned in any guides how you do this. What exactly am I meant to do?
For example, like this:
int array[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
Ah, you mean array assignment. It seems like that is not possible for now. In addition to trying every possible sensible option, I cross-compiled simple glsl code to hlsl code using glslcc (which uses spirv-cross).
GLSL code:
#version 450
layout (location = SV_Target0) out vec4 fragColor;
void main()
{
    int array[4] = {0, 0, 0, 0};
    array = int[4]( 1, 0, 1, 0);
    fragColor = vec4(array[0], array[1], array[2], array[3]);
}
HLSL code:
static const int _13[4] = { 0, 0, 0, 0 };
static const int _15[4] = { 1, 0, 1, 0 };
static float4 fragColor;
struct SPIRV_Cross_Output
{
    float4 fragColor : SV_Target0;
};
void frag_main()
{
    int array[4] = _13;
    array = _15;
    fragColor = float4(float(array[0]), float(array[1]), float(array[2]), float(array[3]));
}
SPIRV_Cross_Output main()
{
    frag_main();
    SPIRV_Cross_Output stage_output;
    stage_output.fragColor = fragColor;
    return stage_output;
}
As you can see, in this case the equivalent hlsl code uses static const array and then assigns it since that kind of array assignment is allowed in HLSL (and makes a deep copy unlike in C/C++).

OpenGL textures smaller than 4x4 in non-RGBA format are rendered malformed [duplicate]

I have a very simple program that maps a dummy red texture to a quad.
Here is the texture definition in C++:
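struct DummyRGB8Texture2d
{
    uint8_t data[3*4];
    int width;
    int height;
};
DummyRGB8Texture2d myTexture
{
    {
        255, 0, 0, 255, 0, 0, // row 1
        255, 0, 0, 255, 0, 0  // row 2
    },
    2, // width
    2  // height
};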
This is how I setup the texture:
void SetupTexture()
{
    // allocate a texture on the default texture unit (GL_TEXTURE0):
    GL_CHECK(glCreateTextures(GL_TEXTURE_2D, 1, &m_texture));
    // allocate texture:
    GL_CHECK(glTextureStorage2D(m_texture, 1, GL_RGB8, myTexture.width, myTexture.height));
    GL_CHECK(glTextureParameteri(m_texture, GL_TEXTURE_WRAP_S, GL_REPEAT));
    GL_CHECK(glTextureParameteri(m_texture, GL_TEXTURE_WRAP_T, GL_REPEAT));
    GL_CHECK(glTextureParameteri(m_texture, GL_TEXTURE_MAG_FILTER, GL_NEAREST));
    GL_CHECK(glTextureParameteri(m_texture, GL_TEXTURE_MIN_FILTER, GL_NEAREST));
    // tell the shader that the sampler2d uniform uses the default texture unit (GL_TEXTURE0)
    GL_CHECK(glProgramUniform1i(m_program->Id(), /* location in shader */ 3, /* texture unit index */ 0));
    // bind the created texture to the specified target. this is necessary even in dsa
    GL_CHECK(glBindTexture(GL_TEXTURE_2D, m_texture));
}
This is how I draw the texture to the quad:
void Draw()
// load the texture to the GPU:
GL_CHECK(glTextureSubImage2D(m_texture, 0, 0, 0, myTexture.width, myTexture.height,
GL_CHECK(glDrawElements(GL_TRIANGLES, static_cast<GLsizei>(VideoQuadElementArray.size()), GL_UNSIGNED_INT, 0));
The Result:
I can't figure out why this texture won't appear red. Also, if I change the texture internal format to RGBA / RGBA8 and the texture data array to have another element in each row, I get a nice red texture.
In case it's relevant, here are my vertex attributes and my (very simple) shaders:
struct VideoQuadVertex
{
    glm::vec3 vertex;
    glm::vec2 uv;
};
std::array<VideoQuadVertex, 4> VideoQuadInterleavedArray
{
    /* vec3 */ VideoQuadVertex{ glm::vec3{ -0.25f, -0.25f, 0.5f }, /* vec2 */ glm::vec2{ 0.0f, 0.0f } },
    /* vec3 */ VideoQuadVertex{ glm::vec3{ 0.25f, -0.25f, 0.5f }, /* vec2 */ glm::vec2{ 1.0f, 0.0f } },
    /* vec3 */ VideoQuadVertex{ glm::vec3{ 0.25f, 0.25f, 0.5f }, /* vec2 */ glm::vec2{ 1.0f, 1.0f } },
    /* vec3 */ VideoQuadVertex{ glm::vec3{ -0.25f, 0.25f, 0.5f }, /* vec2 */ glm::vec2{ 0.0f, 1.0f } }
};
vertex setup:
void SetupVertexData()
// create a VAO to hold all node rendering states, no need for binding:
GL_CHECK(glCreateVertexArrays(1, &m_vao));
// create vertex buffer objects for data and indices and initialize them:
// allocate memory for interleaved vertex attributes and transfer them to the GPU:
GL_CHECK(glNamedBufferData(m_vbo[EVbo::Data], VideoQuadInterleavedArray.size() * sizeof(VideoQuadVertex), VideoQuadInterle
GL_CHECK(glVertexArrayAttribBinding(m_vao, 0, 0));
GL_CHECK(glVertexArrayVertexBuffer(m_vao, 0, m_vbo[EVbo::Data], 0, sizeof(VideoQuadVertex)));
// setup the indices array:
GL_CHECK(glNamedBufferData(m_vbo[EVbo::Element], VideoQuadElementArray.size() * sizeof(GLuint),
GL_CHECK(glVertexArrayElementBuffer(m_vao, m_vbo[EVbo::Element]));
// enable the relevant attributes for this VAO and
// specify their format and binding point:
// vertices:
GL_CHECK(glEnableVertexArrayAttrib(m_vao, 0 /* location in shader*/));
0, // attribute location
3, // number of components in each data member
GL_FLOAT, // type of each component
GL_FALSE, // should normalize
offsetof(VideoQuadVertex, vertex) // offset from the beginning of the buffer
// uvs:
GL_CHECK(glEnableVertexArrayAttrib(m_vao, 1 /* location in shader*/));
1, // attribute location
2, // number of components in each data member
GL_FLOAT, // type of each component
GL_FALSE, // should normalize
offsetof(VideoQuadVertex, uv) // offset from the beginning of the buffer
GL_CHECK(glVertexArrayAttribBinding(m_vao, 1, 0));
vertex shader:
layout(location = 0) in vec3 position;
layout(location = 1) in vec2 texture_coordinate;
out FragmentData
{
    vec2 uv;
} toFragment;
void main(void)
{
    toFragment.uv = texture_coordinate;
    gl_Position = vec4 (position, 1.0f);
}
fragment shader:
in FragmentData
{
    vec2 uv;
} data;
out vec4 color;
layout (location = 3) uniform sampler2D tex_object;
void main()
{
    color = texture(tex_object, data.uv);
}
GL_UNPACK_ALIGNMENT specifies the alignment requirements for the start of each pixel row in memory. By default GL_UNPACK_ALIGNMENT is set to 4.
This means each row of the texture is supposed to have a length of 4*N bytes.
You specify a 2*2 texture with the data: 255, 0, 0, 255, 0, 0, 255, 0, 0, 255, 0, 0
With GL_UNPACK_ALIGNMENT set to 4 this is interpreted as
          column 1      column 2      alignment
row 1:    255, 0, 0,    255, 0, 0,    255, 0,
row 2:    0, 255, 0,    0, undef, undef
So the texture is read as
          column 1      column 2
row 1:    red,          red,
row 2:    green,        RGB(0, ?, ?)
You have to call glPixelStorei(GL_UNPACK_ALIGNMENT, 1); before glTextureSubImage2D to read a tightly packed texture.
If you do not want to change GL_UNPACK_ALIGNMENT (the alignment remains set to 4), you must adjust the data as follows:
struct DummyRGB8Texture2d
{
    uint8_t data[8*2];
    int width;
    int height;
};
DummyRGB8Texture2d myTexture
{
    {
        255, 0, 0, 255, 0, 0, // row 1
        0, 0,                 // 2 bytes alignment
        255, 0, 0, 255, 0, 0, // row 2
        0, 0                  // 2 bytes alignment
    },
    2, // width
    2  // height
};
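The padding above follows from how the row stride is derived from GL_UNPACK_ALIGNMENT. As a rough sketch of that arithmetic for byte-sized components (unpackRowStride is a made-up helper name, and GL_UNPACK_ROW_LENGTH is assumed to be left at its default of 0):

```cpp
#include <cstddef>

// Number of bytes from the start of one pixel row to the start of the next
// when OpenGL unpacks client memory with the given GL_UNPACK_ALIGNMENT
// (1, 2, 4 or 8): the row's byte length rounded up to a multiple of it.
constexpr std::size_t unpackRowStride(std::size_t width, std::size_t bytesPerPixel,
                                      std::size_t alignment)
{
    return (width * bytesPerPixel + alignment - 1) / alignment * alignment;
}

static_assert(unpackRowStride(2, 3, 4) == 8, "2 RGB8 pixels: 6 bytes padded to 8");
static_assert(unpackRowStride(2, 3, 1) == 6, "alignment 1 means tightly packed");
static_assert(unpackRowStride(4, 3, 4) == 12, "4 RGB8 pixels already fill 12 bytes");
static_assert(unpackRowStride(2, 4, 4) == 8, "RGBA8 rows are always multiples of 4");
```

This also shows why the RGBA variant and RGB textures whose width is a multiple of 4 work with the default alignment: their rows already end on a 4-byte boundary.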
See further:
Stackoverflow question glPixelStorei(GL_UNPACK_ALIGNMENT, 1) Disadvantages?
Stackoverflow question OpenGL GL_UNPACK_ALIGNMENT
Khronos OpenGL Common Mistakes - Texture upload and pixel reads

Normals are not transfered to DirectX 11 shader correctly - random, time-dependent values?

Today I was trying to add normal maps to my DirectX 11 application.
Something went wrong, so I decided to output the normals instead of the color on scene objects to "see" where the problem lies.
What surprised me is that the normals' values change very fast (the colors blink each frame). I'm sure that I don't modify their values during program execution (the positions of the vertices stay stable, but the normals do not).
Here are two screens for some frames at t1 and t2:
My vertex structure:
struct MyVertex{//vertex structure
    MyVertex() : weightCount(0), normal(0,0,0){
        //textureCoordinates.x = 1;
        //textureCoordinates.y = 1;
    }
    MyVertex(float x, float y, float z, float u, float v, float nx, float ny, float nz)
        : position(x, y, z), textureCoordinates(u, v), normal(0,0,0), weightCount(0){
    }
    DirectX::XMFLOAT3 position;
    DirectX::XMFLOAT2 textureCoordinates;
    DirectX::XMFLOAT3 normal = DirectX::XMFLOAT3(1.0f, 0.0f, 0.0f);
    //will not be sent to shader (and used only by skinned models)
    int startWeightIndex;
    int weightCount; //=0 means that it's not skinned vertex
};
The corresponding vertex layout:
layout[0] = { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[1] = { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[2] = { "NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 20, D3D11_INPUT_PER_VERTEX_DATA, 0 };
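One thing that can be checked at compile time is whether the byte offsets in the input layout (0, 12, 20) match the struct. The sketch below uses hypothetical stand-ins (Float3, Float2, VertexLayoutProbe) so that it compiles without the DirectX headers; it assumes 4-byte float and int, and that XMFLOAT3 and XMFLOAT2 are plain groups of three and two floats.

```cpp
#include <cstddef>

// Hypothetical stand-ins for DirectX::XMFLOAT3 and DirectX::XMFLOAT2.
struct Float3 { float x, y, z; };
struct Float2 { float x, y; };

// Same member order as MyVertex in the question, without the constructors.
struct VertexLayoutProbe
{
    Float3 position;
    Float2 textureCoordinates;
    Float3 normal;
    int startWeightIndex; // not described in the input layout
    int weightCount;      // not described in the input layout
};

static_assert(offsetof(VertexLayoutProbe, position) == 0, "POSITION offset");
static_assert(offsetof(VertexLayoutProbe, textureCoordinates) == 12, "TEXCOORD offset");
static_assert(offsetof(VertexLayoutProbe, normal) == 20, "NORMAL offset");
static_assert(sizeof(VertexLayoutProbe) == 40, "size of one vertex in the buffer");
```

Under these assumptions the three offsets agree with the layout, and each vertex occupies 40 bytes, so the stride used when binding the vertex buffer would need to be sizeof(MyVertex) rather than the 32 bytes covered by the layout elements.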
Vertex buffer:
ZeroMemory(&bd, sizeof(bd));
bd.ByteWidth = sizeof(MyVertex) * structure->getVerticesCount();
bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = 0;
ZeroMemory(&InitData, sizeof(InitData));
InitData.pSysMem = structure->vertices;
if(device->CreateBuffer(&bd, &InitData, &buffers->vertexBuffer) != S_OK){
return false;
}
And the shader that outputs normals "as color" (of course, if I set output.normal to float3(1,1,1), the objects stay white):
struct Light
float3 diffuse;
float3 position;
float3 direction;
cbuffer cbPerObject : register(b0)
float4x4 WVP;
float4x4 World;
float4 difColor;
bool hasTexture;
bool hasNormMap;
cbuffer cbPerFrame : register(b1)
Light light;
Texture2D ObjTexture;
Texture2D ObjNormMap;
SamplerState ObjSamplerState;
TextureCube SkyMap;
struct VS_INPUT
float4 position : POSITION;
float2 tex : TEXCOORD;
float3 normal : NORMAL;
struct VS_OUTPUT
float4 Pos : SV_POSITION;
float4 worldPos : POSITION;
float3 normal : NORMAL;
float2 TexCoord : TEXCOORD;
float3 tangent : TANGENT;
VS_OUTPUT output;
//input.position.w = 1.0f;
output.Pos = mul(input.position, WVP);
output.worldPos = mul(input.position, World);
output.normal = input.normal;
output.tangent = mul(input.tangent, World);
output.TexCoord = input.tex;
return output;
float4 PS(VS_OUTPUT input) : SV_TARGET
return float4(input.normal, 1.0);
// Techniques
technique10 RENDER
pass P0
SetVertexShader( CompileShader( vs_4_0, VS() ) );
SetPixelShader( CompileShader( ps_4_0, PS() ) );
SetBlendState( SrcAlphaBlendingAdd, float4( 0.0f, 0.0f, 0.0f, 0.0f ), 0xFFFFFFFF );
Where have I made a mistake? Maybe there are other places in the code that can cause this strange behavior (some locking, buffers, dunno...)?
As 413X suggested, I've run the DirectX Diagnostic:
What is strange is that in the small preview the screen looks the same as in the program. But when I investigate that frame (screenshot), I get completely different colors:
Also, here's something strange: I pick the blue pixel and it says it's black (on the right):
edit 2:
As catflier requested, here is some additional code.
The rendering and buffers binding:
//set the object world matrix
DirectX::XMMATRIX objectWorldMatrix = DirectX::XMMatrixIdentity();
DirectX::XMMATRIX rotationMatrix = DirectX::XMMatrixRotationQuaternion(
DirectX::XMVectorSet(object->getOrientation().getX(), object->getOrientation().getY(), object->getOrientation().getZ(), object->getOrientation().getW())
DirectX::XMMATRIX scaleMatrix = (
? DirectX::XMMatrixScaling(object->getHalfSize().getX(), object->getHalfSize().getY(), object->getHalfSize().getZ())
: DirectX::XMMatrixScaling(1.0f, 1.0f, 1.0f)
DirectX::XMMATRIX translationMatrix = DirectX::XMMatrixTranslation(object->getPosition().getX(), object->getPosition().getY(), object->getPosition().getZ());
objectWorldMatrix = scaleMatrix * rotationMatrix * translationMatrix;
UINT stride = sizeof(MyVertex);
UINT offset = 0;
context->IASetVertexBuffers(0, 1, &buffers->vertexBuffer, &stride, &offset); //set vertex buffer
context->IASetIndexBuffer(buffers->indexBuffer, DXGI_FORMAT_R16_UINT, 0); //set index buffer
//set the constants per object
ConstantBufferStructure constantsPerObject;
//set matrices
DirectX::XMFLOAT4X4 view = myCamera->getView();
DirectX::XMMATRIX camView = XMLoadFloat4x4(&view);
DirectX::XMFLOAT4X4 projection = myCamera->getProjection();
DirectX::XMMATRIX camProjection = XMLoadFloat4x4(&projection);
DirectX::XMMATRIX worldViewProjectionMatrix = objectWorldMatrix * camView * camProjection;
constantsPerObject.worldViewProjection = XMMatrixTranspose(worldViewProjectionMatrix); = XMMatrixTranspose(objectWorldMatrix);
//draw object's non-transparent subsets
for(int i=0; i<structure->subsets.size(); i++){
setColorsAndTextures(structure->subsets[i], constantsPerObject, context); //custom method that insert data into constantsPerObject variable
//bind constants per object to constant buffer and send it to vertex and pixel shaders
context->UpdateSubresource(constantBuffer, 0, NULL, &constantsPerObject, 0, 0);
context->VSSetConstantBuffers(0, 1, &constantBuffer);
context->PSSetConstantBuffers(0, 1, &constantBuffer);
int start = structure->subsets[i]->getVertexIndexStart();
int count = structure->subsets[i]->getVertexIndexAmmount();
context->DrawIndexed(count, start, 0);
}
The rasterizer:
void RendererDX::initCull(ID3D11Device * device){
ZeroMemory(&cmdesc, sizeof(D3D11_RASTERIZER_DESC));
cmdesc.FillMode = D3D11_FILL_SOLID;
cmdesc.CullMode = D3D11_CULL_BACK;
cmdesc.FrontCounterClockwise = true;
cmdesc.FrontCounterClockwise = false;
cmdesc.CullMode = D3D11_CULL_NONE;
//cmdesc.FillMode = D3D11_FILL_WIREFRAME;
HRESULT hr = device->CreateRasterizerState(&cmdesc, &RSCullDefault);
}
edit 3:
The debugger output (there are some mismatches in semantics?):
D3D11 ERROR: ID3D11DeviceContext::DrawIndexed: Input Assembler - Vertex Shader linkage error: Signatures between stages are incompatible. The input stage requires Semantic/Index (NORMAL,0) as input, but it is not provided by the output stage. [ EXECUTION ERROR #342: DEVICE_SHADER_LINKAGE_SEMANTICNAME_NOT_FOUND]
D3D11 ERROR: ID3D11DeviceContext::DrawIndexed: Input Assembler - Vertex Shader linkage error: Signatures between stages are incompatible. Semantic 'TEXCOORD' is defined for mismatched hardware registers between the output stage and input stage. [ EXECUTION ERROR #343: DEVICE_SHADER_LINKAGE_REGISTERINDEX]
D3D11 ERROR: ID3D11DeviceContext::DrawIndexed: Input Assembler - Vertex Shader linkage error: Signatures between stages are incompatible. Semantic 'TEXCOORD' in each signature have different min precision levels, when they must bet identical. [ EXECUTION ERROR #3146050: DEVICE_SHADER_LINKAGE_MINPRECISION]
I am pretty sure your bytes are misaligned. A float is 4 bytes, I think, and a float4 is then 16 bytes, and it wants to be 16-byte aligned. So observe:
layout[0] = { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[1] = { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[2] = { "NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 20, D3D11_INPUT_PER_VERTEX_DATA, 0 };
The values 0, 12, 20 (AlignedByteOffset) are where each element starts. That means Position starts at 0 and Texcoord starts at the end of a float3, which gives you wrong results. Look inside the shader:
struct VS_INPUT
float4 position : POSITION;
float2 tex : TEXCOORD;
float3 normal : NORMAL;
And Normal starts at float3+float2. So generally, you want to align things more consistently, maybe even adding "padding" to fill the gaps so that all the variables stay 16-byte aligned.
But to keep it simpler for you, you want to switch that statement to:
layout[0] = { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 };
What happens now? Well, the thing aligns itself automagically, although it can be less optimal. But one thing about shaders: try to keep things 16-byte aligned.
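For reference, the hand-written offsets 0, 12 and 20 are the running sums of the element sizes when the elements are packed back to back. A minimal sketch of that calculation (C++14 or later); kElementSizes and packedOffset are hypothetical names used only for illustration, not Direct3D API:

```cpp
#include <array>
#include <cstddef>

// Byte sizes of the three declared formats:
// R32G32B32_FLOAT = 12, R32G32_FLOAT = 8, R32G32B32_FLOAT = 12.
constexpr std::array<std::size_t, 3> kElementSizes{12, 8, 12};

// Offset of element `index` when elements follow each other without gaps.
// Passing the element count returns the total declared size.
constexpr std::size_t packedOffset(std::size_t index)
{
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i)
        offset += kElementSizes[i];
    return offset;
}

static_assert(packedOffset(0) == 0, "POSITION");
static_assert(packedOffset(1) == 12, "TEXCOORD");
static_assert(packedOffset(2) == 20, "NORMAL");
static_assert(packedOffset(3) == 32, "total declared size");
```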
Your data structure on upload doesn't match your Input Layout declaration.
Your vertex data structure is:
struct MyVertex{//vertex structure
MyVertex() : weightCount(0), normal(0,0,0){
//textureCoordinates.x = 1;
//textureCoordinates.y = 1;
}
MyVertex(float x, float y, float z, float u, float v, float nx, float ny, float nz)
: position(x, y, z), textureCoordinates(u, v), normal(0,0,0), weightCount(0){
}
DirectX::XMFLOAT3 position;
DirectX::XMFLOAT2 textureCoordinates;
DirectX::XMFLOAT3 normal = DirectX::XMFLOAT3(1.0f, 0.0f, 0.0f);
//will not be sent to shader (and used only by skinned models)
int startWeightIndex;
int weightCount; //=0 means that it's not skinned vertex
};
startWeightIndex and weightCount will be copied into your vertex buffer (even if they do not contain anything useful).
If you check sizeof(MyVertex), you will have a size of 40.
Now let's look at your input layout declaration (whether you use automatic offset or not is irrelevant):
layout[0] = { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[1] = { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[2] = { "NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 20, D3D11_INPUT_PER_VERTEX_DATA, 0 };
From what you see here, you are declaring a data structure of (12+8+12) = 32 bytes, which of course does not match your vertex size.
So the first vertex will be fetched properly, but the next ones will start to use invalid data (as the Input Assembler doesn't know that your data structure is bigger than what you specified to it).
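The size mismatch described above can be checked at compile time. The following is a minimal sketch, assuming 4-byte float and int; Float3 and Float2 are hypothetical stand-ins for DirectX::XMFLOAT3 and DirectX::XMFLOAT2 so that it compiles without the DirectX headers, and VertexWithWeights mirrors MyVertex without its constructors:

```cpp
#include <cstddef>

// Hypothetical stand-ins for DirectX::XMFLOAT3 / DirectX::XMFLOAT2.
struct Float3 { float x, y, z; };
struct Float2 { float x, y; };

// Same member order as MyVertex in the question, constructors omitted.
struct VertexWithWeights
{
    Float3 position;
    Float2 textureCoordinates;
    Float3 normal;
    int startWeightIndex;
    int weightCount;
};

// Sum of the three declared element formats (12 + 8 + 12).
constexpr std::size_t kDeclaredBytes = 12 + 8 + 12;

static_assert(sizeof(VertexWithWeights) == 40, "struct as uploaded");
static_assert(sizeof(VertexWithWeights) - kDeclaredBytes == 8,
              "the two ints are not covered by the declaration");
```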
Two ways to fix it:
1/ Strip your vertex declaration
In that case you modify your vertex structure to match your input declaration (I removed constructors for brevity):
struct MyVertex
{//vertex structure
DirectX::XMFLOAT3 position;
DirectX::XMFLOAT2 textureCoordinates;
DirectX::XMFLOAT3 normal = DirectX::XMFLOAT3(1.0f, 0.0f, 0.0f);
};
Now your vertex structure exactly matches your declaration, so vertices will be fetched properly.
2/ Adapt your Input Layout declaration:
In that case you change your layout to make sure that all data contained in your buffer is declared, so it can be taken into account by the Input Assembler (see below)
Now your declaration becomes:
layout[0] = { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[1] = { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[2] = { "NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 20, D3D11_INPUT_PER_VERTEX_DATA, 0 };
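layout[3] = { "STARTWEIGHTINDEX", 0, DXGI_FORMAT_R32_SINT, 0, 32, D3D11_INPUT_PER_VERTEX_DATA, 0 };
layout[4] = { "WEIGHTCOUNT", 0, DXGI_FORMAT_R32_SINT, 0, 36, D3D11_INPUT_PER_VERTEX_DATA, 0 };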
So that means you inform the Input Assembler of all the data that your structure contains.
In that case, even if the data is not needed by your Vertex Shader, because you specified a full data declaration the Input Assembler will safely ignore STARTWEIGHTINDEX and WEIGHTCOUNT while still respecting your whole structure's padding.
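One way to keep a full declaration like this in sync with the structure is to compare each AlignedByteOffset against offsetof. A minimal sketch, again assuming 4-byte float and int; Float3, Float2 and FullVertex are hypothetical stand-ins for the DirectX types and MyVertex:

```cpp
#include <cstddef>

// Hypothetical stand-ins for DirectX::XMFLOAT3 / DirectX::XMFLOAT2.
struct Float3 { float x, y, z; };
struct Float2 { float x, y; };

// Mirrors MyVertex, including the two skinning fields.
struct FullVertex
{
    Float3 position;
    Float2 textureCoordinates;
    Float3 normal;
    int startWeightIndex;
    int weightCount;
};

// Each offset in the input layout should equal the member's real offset.
static_assert(offsetof(FullVertex, position) == 0, "POSITION");
static_assert(offsetof(FullVertex, textureCoordinates) == 12, "TEXCOORD");
static_assert(offsetof(FullVertex, normal) == 20, "NORMAL");
static_assert(offsetof(FullVertex, startWeightIndex) == 32, "STARTWEIGHTINDEX");
static_assert(offsetof(FullVertex, weightCount) == 36, "WEIGHTCOUNT");
```

If a member is later added or reordered, the corresponding assertion fails at compile time instead of producing garbage attributes at run time.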