Rendering multiple objects with OpenGL shaders

In OpenGL with shaders, I want to render two objects which I have loaded as two meshes. Each object is represented by a set of vertex positions, a set of vertex colours, and a set of vertex indices for the triangles.
There are three ways I can think of to draw the two objects. Which is the best practice?
1) I concatenate the vertex positions of the two objects into one long array of vertices, and similar for the vertex colours and the vertex indices. I then create one vertex position buffer, one vertex colour buffer, and one index buffer. When rendering, I then make one call to glBindBuffer(...) and glDrawElements(...).
2) I concatenate the vertex positions of the two objects into one long array of vertices, and similar for the vertex colours. I then create one vertex position buffer, and one vertex colour buffer. When rendering, I then make two calls to glBindBuffer(...) and glDrawElements(...), one for each object.
3) I create two vertex position buffers, two vertex colour buffers, and two index buffers. When rendering, I then make two calls to glBindBuffer(...) and glDrawElements(...), one for each object.
Thanks!

The general rule for OpenGL optimization is to minimize the number of state changes. And binding buffers is a state change.
So, all other things being equal, #1 would probably be the fastest.
However, it is not always particularly useful. Oftentimes, different objects have different transforms relative to one another, so you'll usually need to change some state between rendering the objects; in that case, #1 as a single draw call is not an option.
Even so, you need not resort to option 2. You could use the same buffer setup as option 1 but issue two draw calls, each rendering from its own part of the arrays, without rebinding anything in between.
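For illustration, a minimal sketch of that approach, assuming a VAO (here called sharedVao) that already references the concatenated buffers; the index counts and the setUniformsFor*() helpers are hypothetical placeholders:

// Both meshes live in the same VBO/IBO; only uniforms change between draws.
glBindVertexArray(sharedVao);

setUniformsForObjectA();  // e.g. upload object A's model matrix
glDrawElements(GL_TRIANGLES, numIndicesA, GL_UNSIGNED_INT, (const void*)0);

setUniformsForObjectB();  // the only state change between the two draws
glDrawElements(GL_TRIANGLES, numIndicesB, GL_UNSIGNED_INT,
               (const void*)(numIndicesA * sizeof(GLuint)));

If object B's indices were stored relative to its own first vertex, glDrawElementsBaseVertex (GL 3.2) can apply the vertex offset at draw time instead.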
Equally importantly, with vertex arrays the issue is not so much binding buffers (though they aren't cheap). It's changing the vertex format that you use: the relative arrangement of attributes. If you use the separate attribute format API (glVertexAttribFormat and friends), you can easily change buffer bindings without touching the format. And usually you will only have maybe 5-6 vertex formats throughout your entire program.
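A hedged sketch of that API (GL 4.3 / ARB_vertex_attrib_binding), assuming two interleaved position+color buffers vboObjectA and vboObjectB: the format is specified once, and per object only the source buffer is rebound.

// Format: attribute 0 = vec3 position, attribute 1 = vec4 color, both
// interleaved in one buffer attached to binding point 0. Set up once per VAO.
glVertexAttribFormat(0, 3, GL_FLOAT, GL_FALSE, 0);
glVertexAttribFormat(1, 4, GL_FLOAT, GL_FALSE, 3 * sizeof(float));
glVertexAttribBinding(0, 0);
glVertexAttribBinding(1, 0);
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);

// Per object: swap only the buffer; the format stays untouched.
glBindVertexBuffer(0, vboObjectA, 0, 7 * sizeof(float));
// ... draw object A ...
glBindVertexBuffer(0, vboObjectB, 0, 7 * sizeof(float));
// ... draw object B ...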
The other issue is that you have not fully explored the array of possibilities here. For example, in all of your cases, the positions, colors, normals and other attributes each inhabit separate buffers. Well... why are they in separate buffers? Interleaving your vertex attributes generally gives better performance.
So really, the answer for best performance is "none of the above".

With only 2 objects being rendered, none of these options will bring you even close to any kind of performance bottleneck. If you're targeting a frame rate that matches the display refresh rate, typically around 60 Hz, you only need to render 120 objects per second.
Say you need a good handful of state setup calls per object; that gets you in the range of 1000 state-setting calls per second. While the performance characteristics are of course highly platform/vendor specific, a halfway decent driver will be able to handle a few million simple state-setting calls (like binding buffers, setting up vertex attributes, etc.) per second. So you're at least 3 or 4 orders of magnitude below a level where I would start to worry about throughput.
Now, if you had thousands of objects, things start to look a little different. There are two things I would recommend for sure:
Use interleaved attributes. This means that you store the position and color of one vertex sequentially, followed by the attributes of the next vertex. Say vertex k has 3 position components (xk, yk, zk) and 4 color components (rk, gk, bk, ak); the memory layout of the buffer is then:
x0 y0 z0 r0 g0 b0 a0 x1 y1 z1 r1 g1 b1 a1 x2 y2 z2 r2 g2 b2 a2 ...
Use VAOs (Vertex Array Objects) to set up the state. This will allow you to set the entire vertex attribute state for an object with a single glBindVertexArray() call.
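Putting both recommendations together, a minimal sketch (assuming a GL 3.3+ context and loader; vertices, indices and their counts are placeholders):

#include <cstddef>  // offsetof

struct Vertex { float x, y, z; float r, g, b, a; };  // interleaved layout

GLuint vao, vbo, ibo;
glGenVertexArrays(1, &vao);
glGenBuffers(1, &vbo);
glGenBuffers(1, &ibo);

glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(Vertex), vertices, GL_STATIC_DRAW);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);  // recorded in the VAO
glBufferData(GL_ELEMENT_ARRAY_BUFFER, indexCount * sizeof(GLuint), indices, GL_STATIC_DRAW);

// Same stride for both attributes; the offsets pick position vs. color.
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)0);
glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, r));
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
glBindVertexArray(0);

// Per object, at draw time: one bind plus one draw call.
glBindVertexArray(vao);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);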
Whether it's better to have separate VBOs for each object, or to share larger VBOs between objects, is hard to tell in general. Having a large number of small VBOs could certainly be harmful to performance, and sharing them may be beneficial if you can arrange that fairly easily. But I can also picture scenarios where having very large VBOs could have adverse effects. So you may have to try different options if you wanted to get the maximum performance in the case where you have a lot of objects. Unfortunately, the results might very well be platform dependent.

Related

Rendering meshes with multiple indices

I have some vertex data. Positions, normals, texture coordinates. I probably loaded it from a .obj file or some other format. Maybe I'm drawing a cube. But each piece of vertex data has its own index. Can I render this mesh data using OpenGL/Direct3D?
In the most general sense, no. OpenGL and Direct3D only allow one index per vertex; the index fetches from each stream of vertex data. Therefore, every unique combination of components must have its own separate index.
So if you have a cube, where each face has its own normal, you will need to replicate the position and normal data a lot. You will need 24 positions and 24 normals, even though the cube will only have 8 unique positions and 6 unique normals.
Your best bet is to simply accept that your data will be larger. A great many model formats use multiple indices; you will need to fix up this vertex data before you can render with it. Many mesh loading tools, such as Open Asset Importer, will perform this fixup for you.
It should also be noted that most meshes are not cubes. Most meshes are smooth across the vast majority of vertices, only occasionally having different normals/texture coordinates/etc. So while this often comes up for simple geometric shapes, real models rarely have substantial amounts of vertex duplication.
GL 3.x and D3D10
For D3D10/OpenGL 3.x-class hardware, it is possible to avoid performing fixup and use multiple indexed attributes directly. However, be advised that this will likely decrease rendering performance.
The following discussion will use the OpenGL terminology, but Direct3D v10 and above has equivalent functionality.
The idea is to manually access the different vertex attributes from the vertex shader. Instead of sending the vertex attributes directly, the attributes that are passed are actually the indices for that particular vertex. The vertex shader then uses the indices to access the actual attribute through one or more buffer textures.
Attributes can be stored in multiple buffer textures or all within one. If the latter is used, then the shader will need an offset to add to each index in order to find the corresponding attribute's start index in the buffer.
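A minimal GLSL vertex shader sketch of this technique (GL 3.3; all names are illustrative, and the integer index attributes must be fed with glVertexAttribIPointer):

#version 330
layout(location = 0) in int positionIndex;  // index into the position pool
layout(location = 1) in int normalIndex;    // index into the normal pool

uniform samplerBuffer positions;  // buffer texture of positions (e.g. RGBA32F)
uniform samplerBuffer normals;    // buffer texture of normals
uniform mat4 mvp;

out vec3 vNormal;

void main()
{
    vec3 pos = texelFetch(positions, positionIndex).xyz;
    vNormal  = texelFetch(normals, normalIndex).xyz;
    gl_Position = mvp * vec4(pos, 1.0);
}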
Regular vertex attributes can be compressed in many ways. Buffer textures have fewer means of compression, allowing only a relatively limited number of vertex formats (via the image formats they support).
Please note again that any of these techniques may decrease overall vertex processing performance. Therefore, they should only be used in the most memory-limited of circumstances, after all other options for compression or optimization have been exhausted.
OpenGL ES also provides buffer textures (core since ES 3.2). Higher OpenGL versions allow you to read buffer objects more directly via SSBOs rather than buffer textures, which might have better performance characteristics.
I found a way to reduce this sort of repetition that runs a bit contrary to some of the statements made in the other answer (though it doesn't specifically fit the question asked here). It does, however, address my question, which was marked as a duplicate of this one.
I just learned about interpolation qualifiers, specifically "flat". My understanding is that putting the flat qualifier on a vertex shader output causes only the provoking vertex to pass its value to the fragment shader.
This means for the situation described in this quote:
So if you have a cube, where each face has its own normal, you will need to replicate the position and normal data a lot. You will need 24 positions and 24 normals, even though the cube will only have 8 unique positions and 6 unique normals.
You can have 8 vertices, 6 of which carry the unique normals (the normal values of the other 2 are disregarded), so long as you carefully order your primitives' indices such that the "provoking vertex" of each face carries the normal data you want applied to the entire face.
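A small GLSL sketch of what this looks like (names are illustrative); with the flat qualifier the output is not interpolated, and every fragment of a triangle receives the provoking vertex's value:

#version 330
layout(location = 0) in vec3 position;
layout(location = 1) in vec3 normal;
uniform mat4 mvp;

flat out vec3 faceNormal;  // 'flat': the provoking vertex's value wins

void main()
{
    faceNormal = normal;   // only meaningful on the provoking vertex
    gl_Position = mvp * vec4(position, 1.0);
}

The matching fragment shader input must be declared flat as well, and which vertex of a primitive counts as provoking can be selected with glProvokingVertex() (GL 3.2).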

Draw a bunch of elements generated by CUDA/OpenCL?

I'm new to graphics programming, and need to add on a rendering backend for a demo we're creating. I'm hoping you guys can point me in the right direction.
Short version: Is there any way to send OpenGL an array of data for distinct elements, without having to issue a draw command for each element distinctly?
Long version: We have a CUDA program (will eventually be OpenCL) which calculates a bunch of data for a bunch of objects for us. We then need to render these objects using, e.g., OpenGL.
The CUDA kernel can generate our vertices, and using OpenGL interop it can shove these into an OpenGL VBO without having to transfer the data back to host memory. But the problem is we have a bunch (upwards of a million is our goal) of distinct objects. It seems like our best bet here is allocating one VBO and putting every object's vertices into it. Then we can call glDrawArrays with offsets and lengths of each element inside that VBO.
However, each object may have a variable number of vertices (though the total vertices in the scene can be bounded.) I'd like to avoid having to transfer a list of start indices and lengths from CUDA -> CPU every frame, especially given that these draw commands are going right back to the GPU.
Is there any way to pack a buffer with data such that we can issue only one call to OpenGL to render the buffer, and it can render a number of distinct elements from that buffer?
(Hopefully I've also given enough info to avoid a XY problem here.)
One way would be to get away from treating these as individual objects and instead make them a single large object drawn with a single draw call. The question is: what data distinguishes the objects from each other, i.e. what do you change between the individual calls to glDrawArrays/glDrawElements?
If it is something simple, like a color, it would probably be easier to supply it as an additional per-vertex attribute. This way you can render all objects as one single large object using a single draw call, with the individual sub-objects (which now exist only conceptually) colored correctly. The memory cost of the additional attribute may be well worth it.
If it is something a little more complex (like a texture), you may still be able to index it using an additional per-vertex attribute, being either an index into a texture array (as texture arrays should be supported on CUDA/OpenCL-able hardware) or a texture coordinate into a particular subregion of a single large texture (a so-called texture atlas).
But if the difference between those objects is something more complex, like a different shader, you may really need to render individual objects and make individual draw calls. But you still don't necessarily need a round-trip to the CPU. With the ARB_draw_indirect extension (core since GL 4.0; whether it is supported on GL 3 and thus CUDA/CL-capable hardware, I don't know) you can source the arguments to a glDrawArrays/glDrawElements call from an additional buffer (into which you can write with CUDA/CL like any other GL buffer). So you can assemble the offset/length information for each individual object on the GPU and store it in a single buffer. Then you do your glDrawArraysIndirect loop, offsetting into this single draw-indirect buffer (the offset between the individual objects now being constant).
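A hedged sketch of that setup (buffer and count names are hypothetical; the indirect buffer is assumed to have been filled by the CUDA/CL kernel):

// One command per object, written into a GL buffer by the GPU kernel.
// The field layout is fixed by ARB_draw_indirect.
typedef struct {
    GLuint count;          // number of vertices for this object
    GLuint instanceCount;  // 1 for plain, non-instanced drawing
    GLuint first;          // offset of the object's first vertex in the VBO
    GLuint baseInstance;   // must be 0 prior to GL 4.2
} DrawArraysIndirectCommand;

glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
for (GLsizei i = 0; i < objectCount; ++i)
    glDrawArraysIndirect(GL_TRIANGLES,
        (const void*)(i * sizeof(DrawArraysIndirectCommand)));

On GL 4.3 / ARB_multi_draw_indirect hardware, the loop itself collapses into a single glMultiDrawArraysIndirect call.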
But if the only reason for issuing multiple draw calls is that you want to render the objects as single GL_TRIANGLE_STRIPs or GL_TRIANGLE_FANs (or, god beware, GL_POLYGONs), you may want to reconsider and just use a bunch of GL_TRIANGLES, so that you can render all objects in a single draw call. The (possible) time and memory savings from triangle strips are likely to be outweighed by the overhead of many draw calls, especially when rendering many small strips. If you really want to use strips or fans, you can introduce degenerate triangles (by repeating vertices) to separate them from each other even when drawn with a single draw call. Or you may look into the primitive restart functionality (glPrimitiveRestartIndex) introduced with GL 3.1.
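For completeness, a sketch of the primitive restart variant (GL 3.1+); the sentinel value here is an arbitrary choice and must not collide with a real vertex index:

// All strips share one index buffer, separated by a sentinel index.
glEnable(GL_PRIMITIVE_RESTART);
glPrimitiveRestartIndex(0xFFFFFFFFu);  // restart marker between strips
// Index buffer contents: [strip A indices, 0xFFFFFFFF, strip B indices, ...]
glDrawElements(GL_TRIANGLE_STRIP, totalIndexCount, GL_UNSIGNED_INT, nullptr);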
Probably not optimal, but you could make a single glDrawArrays call on your whole buffer...
If you use GL_TRIANGLES, you can fill your buffer with zeroes and write only the needed vertices in your kernel. This way "empty" regions of your buffer will be drawn as zero-area triangles (i.e. degenerate triangles, which are not drawn at all).
If you use GL_TRIANGLE_STRIP, you can do the same, but you'll have to duplicate your first vertex in order to make a fake triangle between (0,0,0) and your mesh.
This can seem overkill, but:
- You'll have to be able to handle as many vertices anyway
- degenerate triangles use no fillrate, so they are almost free (the vertex shader is still computed, though)
A probably better solution would be to use glDrawElements instead: in your kernel, you also generate an index list for your whole buffer, which lets you skip regions of the buffer entirely.

Rendering a mesh in OpenGL as a series of subgroups?

I'm completing a wavefront object parser and I want to use it to construct generic mesh objects. My engine uses OpenGL 4 and shaders to draw everything in my engine.
My question is about how to ensure best rendering efficiency for rendering a mesh.
A wavefront .obj file normally has many object sub-groups specified.
A sub-group might be assigned a specific material (e.g. a shiny red colour).
So a mesh might be a fairly complex collection of sub-groups, each with their own material assigned.
My questions are -
Q. Do I need to draw each sub-group separately, e.g. with a call to glDrawElements for each sub-group? (So if I had 4 separate sub-groups, I'd have to make four glDrawElements calls, invoking the shader pipeline 4 times with 4 uniform changes (for the materials/textures).)
glDrawElements(GL_TRIANGLES, nNumIndicesInGroup, GL_UNSIGNED_INT, (const char*)NULL + firstIndexByteOffset); // byte offset of the sub-group's first index in the element buffer
If this is correct, then I'll have to calculate:
The indices in each sub-group (implying a separate index array and VAO for each sub-group)
The vertex offset of the start of the sub-group
This seems terribly inefficient, am I barking up the wrong tree?
Also, from the Wavefront obj wiki page:
Smooth shading across polygons is enabled by smoothing groups.
s 1
...
# Smooth shading can be disabled as well.
s off
...
Can anyone suggest what smooth shading values indicate? E.g. s1, s2, s4 etc.
Yes, you should draw each sub-group separately from the others; this is required as long as the state differs between sub-groups.
But you are taking too big a step.
To avoid multiple draw calls, you can introduce a vertex attribute holding an index used to access uniform array values (an array of materials, an array of textures). This way you need only one draw call, at the cost of one additional attribute and its management.
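A GLSL sketch of this idea (names and the 16-entry array size are assumptions; the integer attribute is fed with glVertexAttribIPointer):

#version 330
layout(location = 0) in vec3 position;
layout(location = 1) in int  materialIndex;  // which sub-group material to use

uniform vec4 materialDiffuse[16];  // one color per sub-group material
uniform mat4 mvp;

flat out vec4 vColor;  // constant per triangle; no blending across sub-groups

void main()
{
    vColor = materialDiffuse[materialIndex];
    gl_Position = mvp * vec4(position, 1.0);
}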
I would avoid the above approach, though. What if one sub-group is textured and another is not? How do you decide whether to texture or not? By introducing yet more attributes? It gets confusing.
The first point is that buffer object management is very flexible. Indeed, you could have a single element buffer object and a single vertex buffer object: using offsets and interleaving, you can satisfy every level of complexity. And on modern hardware, using vertex array objects, you can minimize the cost of the different buffer bindings.
The second point is that your software can group different sub-groups having the same uniform state, joining multiple draw calls into a single one. Remember that you can use the multi-draw entry point variants (glMultiDrawElements and friends), and there's also primitive restart, which can help in the case of strip primitives.
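For example (counts and offsets are illustrative), two sub-groups with identical uniform state sharing one element buffer collapse into a single call:

// Two sub-groups, one glMultiDrawElements call instead of two glDrawElements.
GLsizei counts[2] = { nIndicesGroup0, nIndicesGroup1 };
const GLvoid* offsets[2] = {
    (const GLvoid*)0,                                 // group 0 starts at 0
    (const GLvoid*)(nIndicesGroup0 * sizeof(GLuint))  // group 1 follows it
};
glMultiDrawElements(GL_TRIANGLES, counts, GL_UNSIGNED_INT, offsets, 2);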
Other considerations aren't useful at this stage, because you have to draw anyway, regardless of complexity. Later, once the rendering is correct, you can profile the application and the rendering and cut out the hot spots.
Smoothing groups identify sets of faces across which normals are smoothed (averaged at shared vertices); the value after s is simply a group identifier, and s off disables smoothing. Faces in the same group can share element-indexed vertices.
To go deeper on the subject, read one of the specifications that can be found by googling.

Using more than one index list in a single VAO

I'm probably going about this all wrong, but hey.
I am rendering a large number of wall segments (for argument's sake, let's say 200). Every segment is one unit high, even and straight with no diagonals. All changes in direction are a change of 90 degrees.
I am representing each one as a four pointed triangle fan, AKA a quad. Each vertex has a three dimensional texture coordinate associated with it, such as 0,0,0, 0,1,7 or 10,1,129.
This all works fine, but I can't help but think it could be so much better. For instance, every point is duplicated at least twice (every wall is a contiguous line of segments, and there are some three- and four-way intersections), and the starting corner texture coordinates (0,0,X and 0,1,X) are going to be duplicated for every wall with texture number X on it. This could be compressed even further by moving the third (layer) coordinate into a separate attribute and indexing the S and T coordinates separately.
The problem is, I can't seem to work out how to do this. VAOs only seem to allow one index, and taken as a lump, each position and texture coordinate form a unique snowflake never to be repeated. (Admittedly, this may not be true on certain corners, but that's a very edge case)
Is what I want to do possible, or am I going to have to stick with the (admittedly fine) method I currently use?
It depends on how much work you want to do.
OpenGL does not directly allow you to use multiple indices. But you can get the same effect.
The most direct way is to use a Buffer Texture to access an index list (using gl_VertexID), which you then use to access a second Buffer Texture containing your positions or texture coordinates. Basically, you'd be manually doing what OpenGL automatically does. This will naturally be slower per-vertex, as attributes are designed to be fast to access. You also lose some of the compression features, as Buffer Textures don't support as many formats.
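A GLSL sketch of that two-level fetch (GL 3.3; names are illustrative), using gl_VertexID to walk a separate index list for the texture coordinates:

#version 330
layout(location = 0) in vec3 position;

uniform isamplerBuffer texIndexList;  // per-vertex index, looked up by gl_VertexID
uniform samplerBuffer  texCoordPool;  // the de-duplicated coordinate pool
uniform mat4 mvp;

out vec3 vTexCoord;

void main()
{
    int idx   = texelFetch(texIndexList, gl_VertexID).r;
    vTexCoord = texelFetch(texCoordPool, idx).xyz;
    gl_Position = mvp * vec4(position, 1.0);
}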
each vertex and texture coordinate form a unique snowflake never to be repeated
A vertex is not just a position but the whole vector formed by position, texture coordinates and all the other attributes. It is what you referred to as a "snowflake".
And for only 200 walls, I'd not bother about memory consumption. It comes down to cache efficiency, and GPUs use vertices (meaning the whole vector of position and all other attributes) as the caching key. Any "optimization" like the ones you want to make will probably have a negative effect on performance.
But having some duplicated vertices doesn't hurt much if they're not too far apart in the primitive index list. Today's GPUs can hold roughly 30 to 1000 post-transform vertices (i.e. results of the vertex shader stage) in their cache, depending on the number of vertex attributes fed in and the number of varyings delivered to the fragment shader stage. So if a vertex (input key) has been cached, the vertex shader won't be executed again; the cached result is fed to fragment processing directly.
So the optimization you should really aim for is cache locality, i.e. batching things up in such a way that shared/duplicated vertices are sent to the GPU in quick succession.