How to instance draw with different transformations for multiple objects - opengl

I'm having a little problem with glDrawArraysInstanced().
Right now I'm trying to draw a chess board with pieces.
I have all the models loaded in properly.
I've tried drawing only the pawns with instanced drawing and it worked. I send an array of transformation vec3s to the shader through a uniform and index into the array with gl_InstanceID.
That is done with this for loop (an individual draw call for each model):
for (auto& i : this->models) {
i->draw(this->shaders[0], count);
}
which eventually leads to:
glDrawArraysInstanced(GL_TRIANGLES, 0, vertices.size(), count);
where the vertex shader is:
#version 460
layout(location = 0) in vec3 vertex_pos;
layout(location = 1) in vec2 vertex_texcoord;
layout(location = 2) in vec3 vertex_normal;
out vec3 vs_pos;
out vec2 vs_texcoord;
out vec3 vs_normal;
flat out int InstanceID;
uniform mat4 modelMatrix;
uniform mat4 viewMatrix;
uniform mat4 projectionMatrix;
uniform vec3 offsets[16];
void main(void){
vec3 offset = offsets[gl_InstanceID]; //saving transformation in the offset
InstanceID = gl_InstanceID; //unimportant
vs_pos = vec4(modelMatrix * vec4(vertex_pos + offset, 1.f)).xyz; //using the offset
vs_texcoord = vec2(vertex_texcoord.x,1.f-vertex_texcoord.y);
vs_normal = mat3(transpose(inverse(modelMatrix))) * vertex_normal;
gl_Position = projectionMatrix * viewMatrix * modelMatrix * vec4(vertex_pos + offset,1.f); //using the offset
}
Now my problem is that I don't know how to draw multiple objects this way and change their transformations, since gl_InstanceID starts from 0 on each draw call and thus my array of transformations would be read again from the beginning (which would just draw the next pieces at the pawns' positions).
Any help will be appreciated.

You've got two problems. Or rather, you have one problem, but the natural solution will create a second problem for you.
The natural solution to your problem is to use one of the base-instance rendering functions, like glDrawElementsInstancedBaseInstance (or glDrawArraysInstancedBaseInstance, the counterpart of the glDrawArraysInstanced call you are using). These allow you to specify a starting instance for your instanced rendering calls.
This will precipitate a second problem: gl_InstanceID does not respect the base instance. It will always be in the range [0, instancecount). Only instance arrays respect the base instance. So instead of using a uniform to provide your per-instance data, you must use instance array rendering. This means storing the per-instance data in a buffer object (which you should have done anyway) and accessing it via a VS input whose VAO specifies that the particular attribute is instanced.
This also has the advantage of not restricting your instance count to uniform limitations.
OpenGL 4.6/ARB_shader_draw_parameters allows access to the gl_BaseInstance vertex shader input, which provides the baseinstance value specified by the draw command. So if you don't want to/can't use instanced arrays (for example, the amount of per-instance data is too big for the attribute limitations), you will have to rely on that extension/4.6 functionality. Recent desktop GL drivers offer this functionality, so if your hardware is decently new, you should be able to use it.
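As a rough sketch of the bookkeeping this implies, the snippet below works out a base instance for each model, assuming all per-instance offsets are packed back to back in one buffer that is read through an instanced attribute. The names DrawRange and planDraws are hypothetical, and the GL calls appear only in comments because they need a live context.

```cpp
#include <cstddef>
#include <vector>

// One instanced draw per model; every model's per-instance offsets
// live in one shared buffer, packed one model after another.
struct DrawRange {
    unsigned instanceCount; // how many instances of this model to draw
    unsigned baseInstance;  // index of this model's first entry in the shared buffer
};

// Hypothetical helper: given the instance count of each model, compute
// where each model's slice of the shared per-instance buffer starts.
std::vector<DrawRange> planDraws(const std::vector<unsigned>& instancesPerModel) {
    std::vector<DrawRange> ranges;
    ranges.reserve(instancesPerModel.size());
    unsigned next = 0;
    for (unsigned n : instancesPerModel) {
        ranges.push_back({n, next});
        next += n; // the next model starts right after this one
    }
    return ranges;
}

// With a current GL 4.2+ context, each range r would then be drawn roughly as:
//   glVertexAttribDivisor(offsetAttrib, 1);  // once, at VAO setup (offsetAttrib is an assumed name)
//   glDrawArraysInstancedBaseInstance(GL_TRIANGLES, 0, vertexCount,
//                                     r.instanceCount, r.baseInstance);
// The shader reads the offset from an `in vec3` input instead of offsets[gl_InstanceID].
```

For example, planDraws({16, 4, 4}) for pawns, rooks and knights gives base instances 0, 16 and 20, so each piece type reads its own slice of the buffer.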

Related

Rendering breaks when using an int in GLSL

I'm rendering light spheres for a deferred renderer and I'm in the process of switching to instancing for better performance. I have the following vertex shader:
in vec3 position;
uniform int test_index;
uniform mat4 projectionMatrix;
uniform mat4 viewMatrix;
uniform mat4 modelMatrix[256];
void main(void) {
gl_Position = projectionMatrix * viewMatrix * modelMatrix[test_index] * vec4(position, 1.0);
}
I upload the matrices to the shader with
val location = glGetUniformLocation(programId, "modelMatrix[$i]")
glUniformMatrix4fv(location, false, buf)
When I use the int uniform in the index (Hardcoded to a 0 for debugging purposes), the sphere disappears, except when I clip into geometry (in which case it renders as a white circle). The same happens when I use gl_InstanceID as my index.
Weirdly I noticed that the problem also occurs when I pass an int from vertex to fragment shader and use it there for something completely different, regardless of what I use as the index.
The problem disappears instantly and rendering is completely fine when I hardcode modelMatrix[0] in the shader instead of modelMatrix[test_index].
I've got a different shader (for skeletal animation) which uploads a mat4 uniform array the exact same way, also being indexed with an int, but I've got no such problems there...
I don't really know what to make of this, so any advice on how I can debug this is much appreciated. I'm using OpenGL 3.3 on Kotlin+LWJGL
Edit: This probably has nothing to do with the uniform. The following also does not work:
int i = 0;
gl_Position = projectionMatrix * viewMatrix * modelMatrix[i] * vec4(position, 1.0f);
OpenGL has a limit on how many uniforms one can use. The same applies to attributes too (but that's not the problem here). An array of 256 matrices is very likely to exceed the allowed amount.
The reason why the code only breaks when using the int uniform is that GLSL compilers do a lot of optimization under the hood, for example removing unused uniforms. So if you hardcode the array index in the shader, the compiler will notice that only a single matrix is ever used and might remove all the others.
When you need more uniforms than what OpenGL allows for, you have to use a uniform buffer object (UBO) or a shader storage buffer (SSBO).
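To make the size argument concrete, here is a small back-of-the-envelope sketch. It assumes the limit in question is GL_MAX_VERTEX_UNIFORM_COMPONENTS and uses 1024, which to my knowledge is the minimum a GL 3.3 implementation must support; a driver may report more, so query the real value with glGetIntegerv. The function names are made up for illustration.

```cpp
// A mat4 uniform consumes 16 float components of default-block uniform storage.
constexpr int kComponentsPerMat4 = 16;

// Assumption: minimum required value of GL_MAX_VERTEX_UNIFORM_COMPONENTS in GL 3.3.
// The value a driver actually reports comes from
// glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS, &limit).
constexpr int kAssumedMinVertexUniformComponents = 1024;

// Components used by an array of `matrixCount` mat4 uniforms.
constexpr int mat4ArrayComponents(int matrixCount) {
    return matrixCount * kComponentsPerMat4;
}

// True if the shader's active uniforms fit within the given component limit.
constexpr bool fitsInDefaultBlock(int usedComponents, int limit) {
    return usedComponents <= limit;
}
```

The shader in the question uses modelMatrix[256] plus two more mat4s and an int, i.e. 256 * 16 + 2 * 16 + 1 = 4129 components, which is about four times that assumed minimum. With modelMatrix[0] hardcoded, only one array element stays active, so the total drops well under the limit.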

GLSL - vertex shader and batching using uniform buffer object

I have the following vertex shader:
#version 450 core
...
layout (binding=2, std140) uniform MATRIX_BLOCK
{
mat4 projection;
mat4 view;
mat4 model[128];
mat4 mvp[128];
};
layout (location = 0) in vec3 aPos;
layout (location = 1) in vec2 aTexCoord;
layout (location = 3) in uint object_idx;
out vec2 TexCoord;
flat out uint instance_idx;
void main()
{
gl_Position = mvp[object_idx] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
TexCoord = aTexCoord;
instance_idx = object_idx;
}
I'm using a uniform buffer to pass in 128 model and model-view-projection matrices, indexed by an object id. The object id is passed to the shader using a vertex attribute object_idx; basically every vertex, besides having x,y,z coordinates and u,v texture coordinates, also has an object id associated with it. The idea would be to be able to store the data for multiple objects in the same buffers but still use specific transformation matrices for each individual object. This is my (possibly stupid) attempt to batch together multiple objects to draw them with a single draw call, without having to rebind anything, using glDrawElements to render triangles.
However, it doesn't work. When I do
gl_Position = mvp[object_idx] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
then triangles with an object_idx of 0 get rendered just fine at the expected position, but triangles with vertices with object_idx's other than 0 don't appear anywhere. I thought I might have gotten the transformation matrices wrong, so for debugging, I reduced the possible objects to just 2 (0 and 1) and inverted the indexing using
gl_Position = mvp[1-object_idx] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
This resulted in all the triangles with object_idx = 0 being rendered at the expected position for mvp[1], but again, no triangles with object_idx = 1 appearing anywhere. So at least I know that the transformation matrices are correct. I then tried
gl_Position = mvp[0] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
and that renders all triangles (using object 0's transformation matrix) and
gl_Position = mvp[1] * vec4(aPos.x, aPos.y, aPos.z, 1.0);
renders all of them, using object 1's transformation matrix.
So, obviously I don't understand something really fundamental about how vertex shaders or glDrawElements do their work.
So, my question:
Why don't all my triangles get rendered when I do a "dynamic" lookup of the mvp transformation matrix using object_idx, when to the best of my ability to check it all the data is passed into the vertex shader just as it's supposed to be?
I'm making an educated guess here:
layout (location = 3) in uint object_idx;
Using a uint attribute input requires setting up the attribute pointer with the function glVertexAttribIPointer() (note the extra I in there).
Using the standard glVertexAttribPointer function always sets up float attributes (and using type GL_INT there converts the integers to floats). Technically, reading such a float attribute as a uint in the shader is undefined, but it is quite likely that 0 stays 0, because the float and integer representations of zero are usually identical; that only works by accident.
Apart from that issue, storing the object index per vertex is also quite inefficient. To effectively batch your draw calls, you should have a look at multi draw calls and gl_DrawID (originally from GL_ARB_shader_draw_parameters) features. You might also find the approaching zero driver overhead (AZDO) techniques useful.
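The "0 stays 0 by accident" effect can be illustrated on the CPU. This is only a sketch of the likely behaviour, not something the GL specification promises: it assumes 32-bit IEEE 754 floats and that the shader ends up seeing the raw bits of the converted float. The function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as an unsigned integer (memcpy avoids aliasing issues).
std::uint32_t bitsOf(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// What a `uint` shader input might plausibly contain if an integer index is
// uploaded through glVertexAttribPointer, which converts the value to float first.
std::uint32_t indexSeenThroughFloatPath(std::uint32_t objectIdx) {
    return bitsOf(static_cast<float>(objectIdx));
}

// The fix keeps the value an integer all the way to the shader, e.g.:
//   glVertexAttribIPointer(3, 1, GL_UNSIGNED_INT, stride, offsetPointer);
// (stride and offsetPointer depend on your vertex layout and are placeholders here.)
```

Under those assumptions index 0 maps to 0, while index 1 maps to 0x3F800000 (1065353216), far outside mvp[128], which would match triangles with object_idx = 1 not appearing anywhere.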

OpenGL instancing : how to debug missing per instance data

I am relatively familiar with instanced drawing and per instance data: I've implemented this in the past with success.
Now I am refactoring some old code, and I introduced a bug on how per instance data are supplied to shaders.
The relevant bits are the following:
I have a working render loop implemented using glMultiDrawElementsIndirect: if I ignore the per instance data everything draws as expected.
I have a VBO storing the world transforms of my objects. I used AMD's CodeXL to debug this: the buffer is correctly populated with data, and is bound when drawing a frame.
glBindBuffer(GL_ARRAY_BUFFER,batch.mTransformBuffer);
glBufferData(GL_ARRAY_BUFFER, sizeof(glm::mat4) * OBJ_NUM, &xforms, GL_DYNAMIC_DRAW);
The shader specifies the input location explicitly:
#version 450
layout(location = 0) in vec3 vertexPos;
layout(location = 1) in vec4 vertexCol;
//...
layout(location = 6)uniform mat4 ViewProj;
layout(location = 10)uniform mat4 Model;
The ViewProj matrix is equal for all instances and is set correctly using:
glUniformMatrix4fv(6, 1, GL_FALSE, &viewProjMat[0][0]);
Model is the per-instance world matrix, and it is wrong: it contains all zeros.
After binding the buffer and before drawing each frame, I am trying to setup the attribute pointers and divisors in such a way that every drawn instance will receive a different transform:
for (size_t i = 0; i < 4; ++i)
{
glEnableVertexAttribArray(10 + i);
glVertexAttribPointer(10 + i, 4, GL_FLOAT, GL_FALSE,
sizeof(GLfloat) * 16,
(const GLvoid*) (sizeof(GLfloat) * 4 * i));
glVertexAttribDivisor(10 + i, 1);
}
Now, I've looked at the code for a while and I really can't figure out what I am missing. CodeXL clearly shows that Model (location 10) isn't correctly filled. No OpenGL error is generated.
My question is: does anyone know under which circumstances the setup of per-instance data may fail silently? Or any suggestions on how to debug this issue further?
layout(location = 6)uniform mat4 ViewProj;
layout(location = 10)uniform mat4 Model;
These are uniforms, not input values. They don't get fed by attributes; they get fed by glUniform* calls. If you want Model to be an input value, then qualify it with in, not uniform.
Equally importantly, inputs and uniforms do not get the same locations. What I mean is that uniform locations have a different space from input locations. An input can have the same location index as a uniform, and they won't refer to the same thing. Input locations only refer to attribute indices; uniform locations refer to uniform locations.
Lastly, uniform locations don't work like input locations. With attributes, each vec4-equivalent uses a separate attribute index. With uniform locations, every basic type (anything that isn't a struct or an array) uses a single uniform location. So if ViewProj is a uniform, then it only takes up 1 location. But if Model is an input, then it takes up 4 attribute indices.
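To spell out the "4 attribute indices" point, here is a minimal sketch that computes which attribute index and byte offset each column of a per-instance mat4 would use. It assumes the shader declares Model as `layout(location = 10) in mat4 Model;` rather than as a uniform, and that the matrices are tightly packed floats. ColumnBinding and mat4ColumnBindings are illustrative names.

```cpp
#include <array>
#include <cstddef>

// One vec4 column of a mat4 vertex input.
struct ColumnBinding {
    unsigned attribIndex;   // attribute index the column occupies
    std::size_t byteOffset; // offset of the column within one matrix
};

// Tightly packed mat4: 16 floats between consecutive instances.
constexpr std::size_t kMat4Stride = sizeof(float) * 16;

// A mat4 input at location N occupies attribute indices N, N+1, N+2, N+3,
// one per column, each column being 4 floats further into the matrix.
std::array<ColumnBinding, 4> mat4ColumnBindings(unsigned firstAttribIndex) {
    std::array<ColumnBinding, 4> cols{};
    for (unsigned i = 0; i < 4; ++i) {
        cols[i] = {firstAttribIndex + i, sizeof(float) * 4 * i};
    }
    return cols;
}

// Each binding c would be fed to GL roughly like this (needs a context and a bound VBO):
//   glEnableVertexAttribArray(c.attribIndex);
//   glVertexAttribPointer(c.attribIndex, 4, GL_FLOAT, GL_FALSE,
//                         kMat4Stride, reinterpret_cast<const void*>(c.byteOffset));
//   glVertexAttribDivisor(c.attribIndex, 1);
```

This is the same arithmetic as the loop in the question, so that setup code looks reasonable on its own; what has to change is the shader declaration of Model.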

Is glVertexAttribPointer used only for vertices, UVs, colors, and normals? Nothing else?

I want to incorporate a custom attribute that varies per vertex. In this case it is assigned to location=4 ... but nothing happens, the other four attributes vary properly except that one. At the bottom, I added a test to produce a specific color if it encounters the value '1' (which I know exists in the buffer, because I queried the buffer earlier). Attribute 4 is stuck at the first value of its array and never moves.
Am I missing a setting (something to be enabled, maybe)? Or is it that OpenGL only varies a handful of attributes and nothing else?
#version 330 //for openGL 3.3
//uniform variables stay constant for the whole glDraw call
uniform mat4 ProjViewModelMatrix;
uniform vec4 DefaultColor; //x=-1 signifies no default color
//non-uniform variables get fed per vertex from the buffers
layout (location=0) in vec3 coords; //feeding from attribute=0 of the main code
layout (location=1) in vec4 color; //per vertex color, feeding from attribute=1 of the main code
layout (location=2) in vec3 normals; //per vertex normals
layout (location=3) in vec2 UVcoord; //texture coordinates
layout (location=4) in int vertexTexUnit;//per vertex texture unit index
//Output
out vec4 thisColor;
out vec2 vertexUVcoord;
flat out int TexUnitIdx;
void main ()
{
vertexUVcoord = UVcoord;
TexUnitIdx=vertexTexUnit;
if (DefaultColor.x==-1) {thisColor = color;} //If no default color is set, use per vertex colors
else {thisColor = DefaultColor;}
gl_Position = ProjViewModelMatrix * vec4(coords,1.0); //This outputs the position to the graphics card.
//TESTING
if (vertexTexUnit==1) thisColor=vec4(1,1,0,1); //Never receives value of 1, but the buffer does contain such values
}
Because the vertexTexUnit attribute is an integer, you must use glVertexAttribIPointer() instead of glVertexAttribPointer().
You can use vertex attributes for whatever you want. OpenGL doesn't know or care what you're using them for.

Efficient way to manage matrices within a graphic application using Texture Buffer Object(s) (OpenGL)

I'm developing a little 3D engine using OpenGL and GLSL. I currently use Texture Buffer Objects (TBOs) to store all my matrices (Proj, View, Model and Shadow matrices). I did some research on the best (meaning most efficient) way to handle matrices within a graphics engine, without any success. The goal is to store as many matrices as possible in as few TBOs as possible, with a minimum of state changes and a minimum of transfers between the client code and the GPU (glBufferSubData).
I propose 2 different methods (with their advantages and disadvantages):
Here's a scene example:
1 Camera (1 ProjMatrix, 1 ViewMatrix)
5 boxes (5 ModelMatrix)
Here's an example of a simple vertex shader I use:
#version 400
/*
** Vertex attributes.
*/
layout (location = 0) in vec4 VertexPosition;
layout (location = 1) in vec2 VertexTexture;
/*
** Uniform matrix buffer.
*/
uniform samplerBuffer matrixBuffer;
/*
** Matrix buffer offset.
*/
uniform int MatrixBufferOffset;
/*
** Output variables.
*/
out vec2 TexCoords;
/*
** Returns matrix4x4 from texture cache.
*/
mat4 Get_Matrix(int offset)
{
    return mat4(texelFetch(matrixBuffer, offset),
                texelFetch(matrixBuffer, offset + 1),
                texelFetch(matrixBuffer, offset + 2),
                texelFetch(matrixBuffer, offset + 3));
}
/*
** Vertex shader entry point.
*/
void main(void)
{
TexCoords = VertexTexture;
{
mat4 ModelViewProjMatrix = Get_Matrix(
MatrixBufferOffset);
gl_Position = ModelViewProjMatrix * VertexPosition;
}
}
1) The method I currently use: my vertex shader uses the ModelViewProjMatrix (needed for rasterization (gl_Position)), the ModelViewMatrix (for lighting calculations) and the ModelMatrix. So, to avoid useless calculations within the vertex shader, I've decided to store the ModelViewProjMatrix, the ModelViewMatrix and the ModelMatrix for each mesh node inline in the TBO, as follows:
TBO = {[ModelViewProj_Box1][ModelView_Box1][Model_Box1]|[ModelViewProj_Box2]...}
Advantages: I don't need to compute the product Proj * View * Model (ModelViewProj for example) for each vertex shader (the matrices are pre-calculated).
Disadvantages: if I move the camera I need to update all the ModelViewProj and ModelView matrices. So, a lot of data to update.
2) I thought about another way, which I think is more efficient: store the projection matrix once, the view matrix once, and then each box scene node's model matrix once, like this:
TBO = {[ProjMatrix][ViewMatrix][ModelMatrix_Box1][ModelMatrix_Box2]...}
So my vertex shader will look like this:
#version 400
/*
** Vertex attributes.
*/
layout (location = 0) in vec4 VertexPosition;
layout (location = 1) in vec2 VertexTexture;
/*
** Uniform matrix buffer.
*/
uniform samplerBuffer matrixBuffer;
/*
** Matrix buffer offset.
*/
uniform int MatrixBufferOffset;
/*
** Output variables.
*/
out vec2 TexCoords;
/*
** Returns matrix4x4 from texture cache.
*/
mat4 Get_Matrix(int offset)
{
    return mat4(texelFetch(matrixBuffer, offset),
                texelFetch(matrixBuffer, offset + 1),
                texelFetch(matrixBuffer, offset + 2),
                texelFetch(matrixBuffer, offset + 3));
}
/*
** Vertex shader entry point.
*/
void main(void)
{
TexCoords = VertexTexture;
{
mat4 ProjMatrix = Get_Matrix(MatrixBufferOffset);
mat4 ViewMatrix = Get_Matrix(MatrixBufferOffset + 4);
mat4 ModelMatrix = Get_Matrix(MatrixBufferOffset + 8);
gl_Position = ProjMatrix * ViewMatrix * ModelMatrix * VertexPosition;
}
}
Advantages: The TBO contains the exact number of matrices used. The update is highly targeted (if I move the camera I only update the view matrix, if I resize the window I only update the projection matrix, and if an object moves only its model matrix is updated).
Disadvantages: I need to compute the ModelViewProjMatrix for each vertex within the vertex shader. Plus, if the scene is composed of a huge number of objects, each with its own model matrix, I will probably need to create a new TBO. Consequently, I will lose the proj/view matrix information because I won't be connected to the right TBO, which brings us to my third method.
3) Store the projection and view matrices in one TBO and all the other model matrices in one or more other TBOs, as follows:
TBO_0 = {[ProjMatrix][ViewMatrix]}
TBO_1 = {[ModelMatrix_Box1][ModelMatrix_Box2]...}
What do you think of my 3 methods ? Which one is the best for you?
Thanks a lot in advance for your help!
Solution 3 is what most engines do, except they use uniform buffers (constant buffers) instead of texture buffers. Also, they don't generally group all the model matrices together in the same buffer; they usually group them by object type (because identical objects are drawn at once with instancing) and sometimes by frequency of update (objects that never move are in the same buffer so that it never needs to be updated).
Also glBufferSubData can be pretty slow; updating a buffer is often slower than just binding a different one, because of all the synchronization happening inside the driver. There is a very good book chapter about that, freely available on the Internet, called "OpenGL Insights: Asynchronous Buffer Transfers" (Google it to find it).
EDIT: The NVIDIA article you linked in the comments is very interesting. They recommend using glMultiDrawElements to make several draw calls at once (that's the main trick; everything else is there because of that decision). That can reduce the CPU work in the driver a lot, but it also means that it's a lot more complicated to provide all the data required to draw the objects: you have to build/update bigger buffers for the matrices and material values, and you also need to use something like bindless textures to be able to have different textures for each object. So, interesting, but more complicated.
And glMultiDrawElements is only important if you want to draw a lot of different objects. Their examples have 68000-98000 different meshes, which really is a lot. In a game, for example, you usually have lots of instances of the same objects, but only a few hundred different objects at most. In the end, it depends on what your 3D engine needs to render.
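For the uniform-buffer variant of solution 3, the CPU-side layout could look roughly like the sketch below. It assumes std140 blocks, where a mat4 occupies 64 bytes with no extra padding, and a hypothetical cap of 128 model matrices per buffer. The struct and function names are made up for illustration.

```cpp
#include <cstddef>

constexpr std::size_t kMaxModels = 128; // hypothetical per-buffer capacity

// 16 floats, column-major, matching a std140 mat4 (four vec4 columns).
struct Mat4 { float m[16]; };

// Would mirror: layout(std140) uniform CameraBlock { mat4 proj; mat4 view; };
struct CameraBlock {
    Mat4 proj;
    Mat4 view;
};

// Would mirror: layout(std140) uniform ModelBlock { mat4 model[128]; };
struct ModelBlock {
    Mat4 model[kMaxModels];
};

static_assert(sizeof(Mat4) == 64, "std140 mat4 is 64 bytes");
static_assert(sizeof(CameraBlock) == 128, "proj and view are packed back to back");

// Byte offset of one model matrix, for a targeted update such as
//   glBufferSubData(GL_UNIFORM_BUFFER, modelOffsetBytes(i), sizeof(Mat4), &matrix);
constexpr std::size_t modelOffsetBytes(std::size_t index) {
    return index * sizeof(Mat4);
}
```

With this split, moving the camera touches only the 64-byte view slot in the camera buffer, and a moving object touches only its own 64-byte slot. At 128 matrices the model block is 8192 bytes, which to my knowledge fits within the 16384-byte minimum that GL_MAX_UNIFORM_BLOCK_SIZE guarantees.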