I'm writing a small 3D OpenGL engine in C++ with Eclipse/Visual C++. In outline, my engine has several objects derived from a typical virtual GameObject class. The hierarchy has several levels, depending on whether an object has children and a parent. For example, the terrain object is at level 0, a tank loaded from Blender is at level 1, etc.
The question is what the best practice is for rendering each object with its corresponding shader. If I have a list of objects rendered by the same shader program, should I render all of those objects' VAOs between a single pair of calls, glUseProgram(program_id) ... glUseProgram(0), instead of changing the program for each object? I.e.:
for each object
glUseProgram(object.program)
...
glBindVertexArray(m_pVao->m_vaoHandle);
for (GLuint i = 0; i < (m_iNumIndex / 3); i++)
{
offset = i * 3;
glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT, &m_Index[offset]);
}
glBindVertexArray(0);
Or:
glUseProgram(object.program)
...
for each object in program.list
glBindVertexArray(object.m_pVao->m_vaoHandle);
for (GLuint i = 0; i < (m_iNumIndex / 3); i++)
{
offset = i * 3;
glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT, &m_Index[offset]);
}
glBindVertexArray(0);
Sorry for the pseudo-pseudocode.
The objects could be stored in a std::vector, and every shader would have a list of objects.
Do the most costly operations (binding programs, shaders, textures, etc.) as little as possible. So, if your design allows it, do binding outside of the loop and reuse the resource. That's one of the most basic performance optimizations you would do later anyway.
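As a rough illustration of the advice above, here is a minimal CPU-only sketch that sorts a draw list by program and counts how many program binds a renderer would issue if it only rebinds on change. RenderItem, countProgramBinds and sortByProgram are hypothetical names, not part of the question's engine, and the GL calls appear only as comments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical draw-list entry: which program and which VAO an object uses.
struct RenderItem {
    unsigned program;  // GL program name (non-zero for a valid program)
    unsigned vao;      // GL vertex array name
};

// Walks the list in order and counts how many times the program would have
// to be switched if the renderer only calls glUseProgram on a change.
std::size_t countProgramBinds(const std::vector<RenderItem>& items) {
    std::size_t binds = 0;
    unsigned current = 0;  // 0 means "no program bound yet"
    for (const RenderItem& item : items) {
        assert(item.program != 0);
        if (item.program != current) {
            current = item.program;  // glUseProgram(item.program) would go here
            ++binds;
        }
        // glBindVertexArray(item.vao); glDrawElements(...); would go here
    }
    return binds;
}

// Groups items that share a program so each program is bound once per frame.
// stable_sort keeps the original relative order inside each group.
void sortByProgram(std::vector<RenderItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const RenderItem& a, const RenderItem& b) {
                         return a.program < b.program;
                     });
}
```

With four objects that alternate between two programs, the unsorted list needs four binds and the sorted list needs two. Keeping one list of objects per shader, as suggested in the question, achieves the same grouping without sorting.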
As I understand VAOs/VBOs currently, a VAO retains all the attribute information that has been set up since it was bound, eg. the offset, stride, number of components, etc. of a given vertex attribute within a VBO.
What I seem to be unclear on is how VAOs and VBOs work together. A lot of the examples I have seen specify the vertex attributes with respect to the currently bound VBO, and when the VAO is bound the data in the VBO become accessible. One way I can see of using VAOs in this way would be to have one per object (where each object uses its own VBO), but I've read that this is poor performance-wise because of switching between many VAOs unnecessarily. I also would rather like to avoid having to store all my object data in one monolithic VBO because I will need to add and remove objects within my scene at any time - as a 3D editor, I feel the application would be much better suited to having each geometry object own its own buffer, rather than in some large, preallocated VBO. (Is this a correct assumption?)
My question therefore is whether one VAO can store vertex attribute configurations independently of the VBOs? Would I be able to configure a VAO to expect data in a certain format (eg. position, normal, UV) and then "swap in" different VBOs as I draw the different geometry objects, or is the format information essentially bound only to the VBO itself? If the latter, is it worth me using VAOs at all?
ARB_vertex_attrib_binding allows you to separate the VAO attribute format from the buffer binding.
https://www.opengl.org/wiki/Vertex_Specification#Separate_attribute_format
Internally, when you configure your VAO in the classic way, the currently bound vertex buffer is automatically associated with the attribute index. With ARB_vertex_attrib_binding, you get new GL functions that define attribute formats independently of the bound buffer, which can then be switched with the BindVertexBuffer function.
Here is some code in C# with OpenTK (full source: https://github.com/jpbruyere/GGL/tree/ottd/Tetra ):
The solution here is to build a VAO with all your meshes concatenated, keeping for each of them only:
BaseVertex = the vertex offset in the VAO
IndicesOffset = the offset in the element buffer (EBO index)
IndicesCount = the total index count of the model
protected void CreateVAOs()
{
//normal vao binding
vaoHandle = GL.GenVertexArray();
GL.BindVertexArray(vaoHandle);
GL.EnableVertexAttribArray(0);
GL.BindBuffer(BufferTarget.ArrayBuffer, positionVboHandle);
GL.VertexAttribPointer(0, 3, VertexAttribPointerType.Float, false, Vector3.SizeInBytes, 0);
// ... other attrib bindings come here
//ARB vertex attrib binding use for fast instance buffers switching
//note that I use 4 attrib indices to bind a matrix
GL.VertexBindingDivisor (instanceBufferIndex, 1);
for (int i = 0; i < 4; i++) {
GL.EnableVertexAttribArray (instanceBufferIndex + i);
GL.VertexAttribBinding (instanceBufferIndex+i, instanceBufferIndex);
GL.VertexAttribFormat(instanceBufferIndex+i, 4, VertexAttribType.Float, false, Vector4.SizeInBytes * i);
}
if (indices != null)
GL.BindBuffer(BufferTarget.ElementArrayBuffer, eboHandle);
GL.BindVertexArray(0);
}
Then I define instances of a mesh with just a matrix array for each. That's a normal buffer creation, but the buffer is not statically bound to the VAO.
instancesVboId = GL.GenBuffer ();
GL.BindBuffer (BufferTarget.ArrayBuffer, instancesVboId);
GL.BufferData<Matrix4> (BufferTarget.ArrayBuffer,
new IntPtr (modelMats.Length * Vector4.SizeInBytes * 4),
modelMats, BufferUsageHint.DynamicDraw);
GL.BindBuffer (BufferTarget.ArrayBuffer, 0);
To render such a VAO, I loop over my meshes, each with its own instance buffer:
public void Bind(){
GL.BindVertexArray(vaoHandle);
}
public void Render(PrimitiveType _primitiveType){
foreach (VAOItem item in Meshes) {
GL.ActiveTexture (TextureUnit.Texture1);
GL.BindTexture (TextureTarget.Texture2D, item.NormalMapTexture);
GL.ActiveTexture (TextureUnit.Texture0);
GL.BindTexture (TextureTarget.Texture2D, item.DiffuseTexture);
//Here I bind the Instance buffer with my matrices
//that's a fast switch without changing the vao config
GL.BindVertexBuffer (instanceBufferIndex, item.instancesVboId, IntPtr.Zero,Vector4.SizeInBytes * 4);
//here I draw instanced with base vertex
GL.DrawElementsInstancedBaseVertex(_primitiveType, item.IndicesCount,
DrawElementsType.UnsignedShort, new IntPtr(item.IndicesOffset*sizeof(ushort)),
item.modelMats.Length, item.BaseVertex);
}
}
The final VAO is bound only once.
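The bookkeeping behind BaseVertex, IndicesOffset and IndicesCount can be sketched without any GL calls. The following C++ sketch is an illustration only: Vertex, MeshRange, PackedMeshes and appendMesh are hypothetical names and are not taken from the linked repository. It concatenates meshes into one vertex array and one 16-bit index array and records the three values per mesh.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

// The three values kept per mesh, mirroring the list in the answer.
struct MeshRange {
    std::int32_t  baseVertex;     // first vertex of this mesh in the shared vertex array
    std::uint32_t indicesOffset;  // first index of this mesh in the shared index array
    std::uint32_t indicesCount;   // number of indices of this mesh
};

struct PackedMeshes {
    std::vector<Vertex>        vertices;  // would be uploaded to one VBO
    std::vector<std::uint16_t> indices;   // would be uploaded to one EBO
    std::vector<MeshRange>     ranges;
};

// Appends one mesh. Indices are stored unchanged (mesh-local); the draw call
// adds baseVertex to each index, so they never need rewriting.
void appendMesh(PackedMeshes& packed,
                const std::vector<Vertex>& verts,
                const std::vector<std::uint16_t>& idx) {
    assert(verts.size() <= 65536);  // mesh-local indices must fit in 16 bits
    MeshRange range;
    range.baseVertex    = static_cast<std::int32_t>(packed.vertices.size());
    range.indicesOffset = static_cast<std::uint32_t>(packed.indices.size());
    range.indicesCount  = static_cast<std::uint32_t>(idx.size());
    packed.vertices.insert(packed.vertices.end(), verts.begin(), verts.end());
    packed.indices.insert(packed.indices.end(), idx.begin(), idx.end());
    packed.ranges.push_back(range);
}
```

Each MeshRange then supplies the index count, the byte offset (indicesOffset * sizeof(uint16_t)) and the base vertex for a draw call such as glDrawElementsInstancedBaseVertex, as in the Render method above.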
I need a way to render multiple objects (not instances) using one draw call. I know how to do the geometry part: place the data into a single VBO/IBO and render with glDrawElements.
The question is: what is an efficient way to update uniform data without setting it for every single object using glUniform...?
How can I set up one buffer containing all the uniform data for dozens of objects, including MVP matrices, bind it, and render using a single draw call?
I tried to use UBOs, but it's not what I need at all.
For rendering instances, we just place the per-instance data, including matrices, in another VBO and set up the attribute divisor using glVertexAttribDivisor, but that only works for instances.
Is there a way to do what I want in OpenGL? If not, what can I do to overcome the overhead of setting uniform data for dozens of objects?
For example like this:
{
// setting up VBO
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(..., data_size);
// setup buffer
for(int i = 0; i < objects_num; i++)
glBufferSubData(...offset, size, &(objects[i]));
// the same for IBO
.........
// then set up some buffer that will store all uniforms, for every object
.........
glDrawElements(...);
}
Thanks in advance for helping.
If you're ok with requiring OpenGL 4.3 or higher, I believe you can render this with a single draw call using glMultiDrawElementsIndirect(). This allows you to essentially make multiple draw calls with a single API call. Each sub-call is defined by values in a struct of the form:
typedef struct {
GLuint count;
GLuint instanceCount;
GLuint firstIndex;
GLuint baseVertex;
GLuint baseInstance;
} DrawElementsIndirectCommand;
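Since you do not want to draw multiple instances of the same vertices, you use 1 for the instanceCount in each draw call. The key idea is that you can still use instancing by specifying a different baseInstance value for each one. Instanced attributes are fetched starting at baseInstance, so each object reads its own element of the instanced arrays, and you can use instanced attributes for the values (matrices, etc.) that you want to vary per object. Note that gl_InstanceID itself does not include baseInstance and stays 0 in each of these sub-draws, so the per-object values have to come from the instanced attributes rather than from indexing with gl_InstanceID.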
So if you currently have a rendering loop:
for (int k = 0; k < objectCount; ++k) {
// set uniforms for object k.
glDrawElements(GL_TRIANGLES, object[k].indexCount,
GL_UNSIGNED_INT, (const void*)(object[k].indexOffset * sizeof(GLuint)));
}
you would instead fill an array of the struct defined above with the arguments:
DrawElementsIndirectCommand cmds[objectCount];
for (int k = 0; k < objectCount; ++k) {
cmds[k].count = object[k].indexCount;
cmds[k].instanceCount = 1;
cmds[k].firstIndex = object[k].indexOffset;
cmds[k].baseVertex = 0;
cmds[k].baseInstance = k;
}
// Rest of setup.
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, objectCount, 0);
I didn't provide code for the full setup above. The key steps include:
Drop the cmds array into a buffer, and bind it as GL_DRAW_INDIRECT_BUFFER.
Store the per-object values in a VBO. Set up the corresponding vertex attributes, which includes specifying them as instanced by calling glVertexAttribDivisor() with a divisor of 1.
Set up the per-vertex attributes as usual.
Set up the index buffer as usual.
For this to work, the indices for all the objects will have to be in the same index buffer, and the values for each attribute will have to be in the same VBO across all objects.
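The command-filling step can be tried on the CPU without a GL context. The sketch below is only an illustration under stated assumptions: it redeclares the command struct with fixed-width integers instead of GL types, and ObjectRange and buildCommands are hypothetical names standing in for the object[k] fields used above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order as the struct above, using fixed-width types so that the
// sketch compiles without GL headers.
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::uint32_t baseVertex;
    std::uint32_t baseInstance;
};
// The GL reads these commands as tightly packed 32-bit values.
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "unexpected padding");

// Hypothetical per-object record: where the object's indices live in the
// shared index buffer (in units of indices, not bytes).
struct ObjectRange {
    std::uint32_t indexCount;
    std::uint32_t indexOffset;
};

// Builds one command per object. baseInstance = k selects element k of the
// instanced (divisor 1) attribute arrays for object k.
std::vector<DrawElementsIndirectCommand>
buildCommands(const std::vector<ObjectRange>& objects) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(objects.size());
    for (std::uint32_t k = 0; k < objects.size(); ++k) {
        assert(objects[k].indexCount % 3 == 0);  // GL_TRIANGLES needs whole triangles
        DrawElementsIndirectCommand cmd;
        cmd.count         = objects[k].indexCount;
        cmd.instanceCount = 1;
        cmd.firstIndex    = objects[k].indexOffset;
        cmd.baseVertex    = 0;  // indices already refer to the shared vertex array
        cmd.baseInstance  = k;
        cmds.push_back(cmd);
    }
    return cmds;
}
```

The resulting vector would then be uploaded with glBufferData to a buffer bound as GL_DRAW_INDIRECT_BUFFER, using cmds.size() * sizeof(DrawElementsIndirectCommand) as the size, before calling glMultiDrawElementsIndirect with a stride of 0 (tightly packed).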
I'm currently trying to teach myself some OpenGL using some Tutorials and LWJGL. Obviously I'm just at rendering cubes.
What I've done up until now, and what works, is that for each cube I do:
glUniformMatrix4(RenderProgram.ModelMatrixID, false,
renderobject.getTransformationBuffer());
glDrawElements(GL_TRIANGLES, renderobject.Model.countIndices(),
GL_UNSIGNED_INT, renderobject.Model.indexOffset);
Since that only gives me about 50-55 FPS with about 70k cubes, I decided trying instanced rendering, like so:
glDrawElementsInstanced(GL_TRIANGLES, Model.countIndices(),
GL_UNSIGNED_INT, 0, instanceCount);
Of course I've created another buffer for that beforehand, filling it with renderobject.getTransformationBuffer() of each cube and I'm binding this buffer before I try to draw instanced.
I also added it to my vertex shader as layout(location = 12) in mat4 mModel, and I've initialized the attrib pointers like so:
for (int i = 0; i < 4; i++) {
glEnableVertexAttribArray(12 + i);
glVertexAttribPointer(12 + i, 4, GL_FLOAT, false, Float.BYTES * 16,
Float.BYTES * 4 * i);
glVertexAttribDivisor(InstanceBufferID, 1);
}
I get no errors and while I don't see anything on screen, it's rendering and I see an FPS increase of about 350% so I think that I don't get the right model matrix in the shader.
Unfortunately I can't debug variable contents within the shader :) So I'm a little bit stumped as to what I might be missing or how I could unravel this... Also, obviously, Google didn't help me much either and SO just comes up with glDrawElements not working for people.
Edit: The accepted answer was the one error that could be determined from the code provided. However, I had another error in the code, which needed fixing before finally something was visible on the screen, which I'd like to share as well: I unbound the VAO before populating the VBO with the matrix data. As soon as I pushed that unbinding after loading the data into the VBO it worked!
Edit2: Interestingly, the performance increase is even more immense now that something IS rendered. With my blank screen I got around 170 FPS for around 70k cubes. Now that it renders correctly I'm getting around 350-400 FPS for around 270k cubes! I didn't expect that.
The first argument to glVertexAttribDivisor should be the index of the vertex attribute that you want to use as an instanced array and not InstanceBufferID.
This should thus become:
for (int i = 0; i < 4; i++) {
glEnableVertexAttribArray(12 + i);
glVertexAttribPointer(12 + i, 4, GL_FLOAT, false, Float.BYTES * 16,
Float.BYTES * 4 * i);
glVertexAttribDivisor(12 + i, 1);
}
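To make the numbers in that loop easier to check, here is a small C++ sketch that computes the same layout on the CPU: a mat4 attribute occupies four consecutive locations, one vec4 column each, and every one of those locations needs its own pointer and divisor. ColumnAttrib and mat4InstanceLayout are hypothetical names, and the limit of 16 locations is an assumption based on the minimum that GL_MAX_VERTEX_ATTRIBS guarantees.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Parameters for one column of an instanced mat4 attribute.
struct ColumnAttrib {
    unsigned    location;    // attribute index for glVertexAttribPointer / glVertexAttribDivisor
    int         components;  // always 4: one vec4 per column
    std::size_t stride;      // bytes from one matrix to the next
    std::size_t offset;      // byte offset of this column inside the matrix
    unsigned    divisor;     // 1 = advance once per instance
};

// Assumed limit; real implementations may report more via GL_MAX_VERTEX_ATTRIBS.
constexpr unsigned kAssumedMaxAttribs = 16;

std::array<ColumnAttrib, 4> mat4InstanceLayout(unsigned firstLocation) {
    assert(firstLocation + 4 <= kAssumedMaxAttribs);
    constexpr std::size_t kColumnBytes = 4 * sizeof(float);
    std::array<ColumnAttrib, 4> columns{};
    for (unsigned i = 0; i < 4; ++i) {
        // The divisor is set on the attribute location, not on a buffer id.
        columns[i] = {firstLocation + i, 4, 4 * kColumnBytes, i * kColumnBytes, 1u};
    }
    return columns;
}
```

For firstLocation = 12 this yields locations 12 to 15 with a stride of 64 bytes and offsets 0, 16, 32 and 48, matching the corrected loop above.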
I have an array of Vertex Array Objects, each containing a VBO reference, and an array of matrices of the same size, such as:
unsigned int vaoArray[128];
matrix_t matrixArray[128];
rather than
for (i = 0; i < 128; i++)
{
glBindVertexArray(vaoArray[i]);
glUniformMatrix4fv(U_MVP_MATRIX_SLOT, 1, GL_FALSE, &matrixArray[i]);
glDrawArrays(BGL_TRIANGLE_FAN, 0, 4);
}
Is there a way I can push the entire array of VAOs and matrices to the GPU at once? Maybe using the instancing extension somehow?
I can't combine them all in one VAO/VBO, because the combination can change (this is drawing text, with each character having its own VAO/VBO combo).
And yes, I realize this all involves using ES 2.0 extensions. That's OK.
BTW, All of the VAOs for each character are identical except for the VBO id, if that helps.
I'm working on a 3D project for the first time (actually, I'm programming a Bullet Physics integration in a Quartz Composer plug-in), and as I tried to optimize my rendering method, I began to use glDrawElements instead of the direct access to vertices by glVertex3d...
I'm very surprised by the result. I didn't check whether it is actually quicker, but I tried it on the very simple scene below. And, from my point of view, the rendering is really better in immediate mode.
The "draw elements" method keeps showing the edges of the triangles and a very ugly shadow on the cube.
I would really appreciate some information on this difference, and maybe a way to keep the quality with glDrawElements. I'm aware that it could really be a mistake of mine...
[Screenshot: immediate mode]
[Screenshot: glDrawElements]
The vertices, indices and normals are computed the same way in the two methods. Here are the two pieces of code.
Immediate mode
glBegin (GL_TRIANGLES);
int si=36;
for (int i=0;i<si;i+=3)
{
const btVector3& v1 = verticesArray[indicesArray[i]];
const btVector3& v2 = verticesArray[indicesArray[i+1]];
const btVector3& v3 = verticesArray[indicesArray[i+2]];
btVector3 normal = (v1-v3).cross(v1-v2);
normal.normalize ();
glNormal3f(-normal.getX(),-normal.getY(),-normal.getZ());
glVertex3f (v1.x(), v1.y(), v1.z());
glVertex3f (v2.x(), v2.y(), v2.z());
glVertex3f (v3.x(), v3.y(), v3.z());
}
glEnd();
glDrawElements
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glNormalPointer(GL_FLOAT, sizeof(btVector3), &(normalsArray[0].getX()));
glVertexPointer(3, GL_FLOAT, sizeof(btVector3), &(verticesArray[0].getX()));
glDrawElements(GL_TRIANGLES, indicesCount, GL_UNSIGNED_BYTE, indicesArray);
glDisableClientState(GL_NORMAL_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
Thank you.
EDIT
Here is the code for the vertices / indices / normals
GLubyte indicesArray[] = {
0,1,2,
3,2,1,
4,0,6,
6,0,2,
5,1,4,
4,1,0,
7,3,1,
7,1,5,
5,4,7,
7,4,6,
7,2,3,
7,6,2 };
btVector3 verticesArray[] = {
btVector3(halfExtent[0], halfExtent[1], halfExtent[2]),
btVector3(-halfExtent[0], halfExtent[1], halfExtent[2]),
btVector3(halfExtent[0], -halfExtent[1], halfExtent[2]),
btVector3(-halfExtent[0], -halfExtent[1], halfExtent[2]),
btVector3(halfExtent[0], halfExtent[1], -halfExtent[2]),
btVector3(-halfExtent[0], halfExtent[1], -halfExtent[2]),
btVector3(halfExtent[0], -halfExtent[1], -halfExtent[2]),
btVector3(-halfExtent[0], -halfExtent[1], -halfExtent[2])
};
indicesCount = sizeof(indicesArray);
verticesCount = sizeof(verticesArray);
btVector3 normalsArray[verticesCount];
int j = 0;
for (int i = 0; i < verticesCount * 3; i += 3)
{
const btVector3& v1 = verticesArray[indicesArray[i]];
const btVector3& v2 = verticesArray[indicesArray[i+1]];
const btVector3& v3 = verticesArray[indicesArray[i+2]];
btVector3 normal = (v1-v3).cross(v1-v2);
normal.normalize ();
normalsArray[j] = btVector3(-normal.getX(), -normal.getY(), -normal.getZ());
j++;
}
You can (and will) achieve the exact same results with immediate mode and vertex array based rendering. Your images suggest that you got your normals wrong. As you did not include the code with which you create your arrays, I can only guess what might be wrong. One thing I could imagine: you are using one normal per triangle, so in the normal array, you have to repeat that normal for each vertex.
You should be aware that a vertex in the GL is not just the position (which you specify via glVertex in immediate mode), but the set of all attributes like position, normals, texcoords and so on. So if you have a mesh where an end point is part of different triangles, this is only one vertex if all attributes are shared, not just the position. In your case, the normals are per triangle, so you will need different vertices (sharing position with some other vertices, but using a different normal) per triangle.
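One possible way to get such per-triangle vertices is to expand the indexed mesh so that every triangle corner becomes its own vertex carrying the face normal. The following C++ sketch illustrates the idea under stated assumptions: Vec3, FlatVertex and expandFlatShaded are hypothetical names, plain floats replace btVector3, and the normal uses the same formula and sign as the immediate-mode code in the question.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct FlatVertex { Vec3 position; Vec3 normal; };

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

static Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len == 0.0f) return v;  // degenerate triangle: leave the zero vector as is
    return {v.x / len, v.y / len, v.z / len};
}

// Turns an indexed triangle list with shared positions into an unindexed list
// where every corner has its own (position, face normal) pair.
std::vector<FlatVertex> expandFlatShaded(const std::vector<Vec3>& positions,
                                         const std::vector<std::uint8_t>& indices) {
    assert(indices.size() % 3 == 0);
    std::vector<FlatVertex> out;
    out.reserve(indices.size());
    for (std::size_t i = 0; i < indices.size(); i += 3) {
        const Vec3& v1 = positions[indices[i]];
        const Vec3& v2 = positions[indices[i + 1]];
        const Vec3& v3 = positions[indices[i + 2]];
        // Same normal as the question's code: -normalize((v1 - v3) x (v1 - v2)).
        Vec3 n = normalize(cross(sub(v1, v3), sub(v1, v2)));
        n = {-n.x, -n.y, -n.z};
        // The normal is repeated for all three corners of the triangle.
        out.push_back({v1, n});
        out.push_back({v2, n});
        out.push_back({v3, n});
    }
    return out;
}
```

For the cube in the question this produces 36 vertices from 8 positions and 36 indices. They could be drawn with glDrawArrays(GL_TRIANGLES, 0, 36), using sizeof(FlatVertex) as the stride for both the vertex and the normal pointer.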
"I began to use glDrawElements"
Good!
"instead of the direct access to vertices by glVertex3d..."
There's nothing "direct" about immediate mode. In fact it's as far away from the GPU as you can get (on modern GPU architectures).
"I'm very surprised by the result. I didn't check if it is actually quicker, but I tried on this very simple scene below. And, from my point of view, the rendering is really better with the direct access method."
Actually it's several orders of magnitude slower. Each and every glVertex call causes the overhead of a context switch. Also, a GPU needs larger batches of data to work efficiently, so glVertex calls first fill a buffer created ad hoc.
Your immediate-mode code segment must actually be understood as follows:
glNormal3f(-normal.getX(),-normal.getY(),-normal.getZ());
glVertex3f (v1.x(), v1.y(), v1.z());
// implicit copy of the glNormal supplied above
glVertex3f (v2.x(), v2.y(), v2.z());
// implicit copy of the glNormal supplied above
glVertex3f (v3.x(), v3.y(), v3.z());
The reason for this is that a vertex is not just a position, but the whole combination of its attributes. When working with vertex arrays, you must supply the full attribute vector for every vertex to form a valid vertex.