Render multiple models in OpenGL with a single draw call - opengl

I built a 2D graphics engine and created a batching system for it, so if I have 1000 sprites with the same texture, I can draw them with a single call to OpenGL.
This is achieved by putting in a single vbo vertex array all the vertices of all the sprites with the same texture.
Instead of "print these vertices, print these vertices, print these vertices", I do "put all the vertices together, print", just to be very clear.
Easy enough, but now I'm trying to achieve the same thing in 3D, and I'm having a big problem.
The problem is that I'm using a Model View Projection matrix to place and render my models, which is the common approach to render a model in 3D space.
For each model on screen, I need to pass the MVP matrix to the shader, so that I can use it to transform each vertex to the correct position.
If I did the transformation outside the shader, it would be executed by the CPU, which is not a good idea, for obvious reasons.
But the problem lies there. I need to pass the matrix to the shader, but for each model the matrix is different.
So I cannot do what I did with 2D sprites, because every change to a shader uniform requires its own draw call.
I hope I've been clear. Maybe you have a good idea I didn't have, or you have already had the same problem. I know for a fact that there is a solution somewhere, because in engines like Unity you can use the same shader for multiple models and get away with one draw call.

There exists a feature exactly like what you're looking for, and it's called instancing. With instancing, you store n matrices (or whatever else you need) in a Uniform Buffer and call glDrawElementsInstanced to draw n copies. In the shader, you get an extra input gl_InstanceID, with which you index into the Uniform Buffer to fetch the matrix you need for that particular instance.
You can read more about instancing here: https://www.opengl.org/wiki/Vertex_Rendering#Instancing
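To make this concrete, here is a minimal sketch of the shader side plus a CPU-side helper. The names (`InstanceData`, `mvp`, `splitIntoBatches`) and the batch size of 256 are my own assumptions, not something from the question; 256 matrices take 16 KiB, which is the smallest uniform block size an implementation has to support.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Mat4 = std::array<float, 16>;  // one column-major 4x4 matrix

// Assumed batch size: 256 mat4s occupy 16 KiB of uniform block storage.
constexpr std::size_t kMaxInstancesPerDraw = 256;

// Vertex shader sketch: gl_InstanceID selects the matrix of the current instance.
const std::string kInstancedVertexShader = R"glsl(
#version 330 core
layout(location = 0) in vec3 position;
layout(std140) uniform InstanceData {
    mat4 mvp[256];
};
void main() {
    gl_Position = mvp[gl_InstanceID] * vec4(position, 1.0);
}
)glsl";

// Splits the per-instance matrices into chunks that each fit the uniform block.
std::vector<std::vector<Mat4>> splitIntoBatches(const std::vector<Mat4>& matrices) {
    std::vector<std::vector<Mat4>> batches;
    for (std::size_t start = 0; start < matrices.size(); start += kMaxInstancesPerDraw) {
        const std::size_t end = std::min(start + kMaxInstancesPerDraw, matrices.size());
        batches.emplace_back(matrices.begin() + static_cast<std::ptrdiff_t>(start),
                             matrices.begin() + static_cast<std::ptrdiff_t>(end));
        assert(batches.back().size() <= kMaxInstancesPerDraw);
    }
    return batches;
}
```

Each batch would then be uploaded into the uniform buffer (for example with glBufferSubData) and drawn with one glDrawElementsInstanced call whose instance count is the batch size.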

The answer depends on whether the vertex data for each item is identical or not. If it is, you can use instancing as in @orost's answer, using glDrawElementsInstanced, and gl_InstanceID within the vertex shader, and that method should be preferred.
However, if each 3D model requires different vertex data (which is frequently the case), you can still render them using a single draw call. To do this, you would add another stream into your vertex data with glVertexAttribPointer (and glEnableVertexAttribArray). This extra stream would contain the index of the matrix within the uniform buffer that vertex should use when rendering - so each mesh within the VBO would have an identical index in the extra stream. The uniform buffer contains the same data as in the instancing setup.
Note that this method may require some extra CPU processing if you need to redo the batching, for example when an object within a batch should no longer be rendered. If this happens frequently, you should measure whether batching the items is actually beneficial.
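As a rough sketch of the extra stream, assuming an interleaved layout: every vertex gets the index of its mesh appended, so all vertices of one mesh carry the same value. The `Vertex`, `BatchedVertex` and `buildBatch` names are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex {
    float x, y, z;
};

// Vertex as stored in the shared VBO: position plus the index of the
// matrix (in the uniform buffer) that this vertex should be transformed by.
struct BatchedVertex {
    float x, y, z;
    std::uint32_t objectIndex;
};

// Concatenates all meshes into one vertex array, tagging every vertex
// with the position of its mesh in the input list.
std::vector<BatchedVertex> buildBatch(const std::vector<std::vector<Vertex>>& meshes) {
    std::vector<BatchedVertex> batch;
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        for (const Vertex& v : meshes[i]) {
            batch.push_back({v.x, v.y, v.z, static_cast<std::uint32_t>(i)});
        }
    }
    return batch;
}
```

If the index is meant to arrive in the shader as an integer (`in uint objectIndex;`), the attribute would be set up with glVertexAttribIPointer rather than glVertexAttribPointer, since the latter converts the values to floating point.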

Besides instancing and adding another vertex attribute as some object ID, I'd like to also mention another strategy (which requires modern OpenGL, though):
The extension ARB_multi_draw_indirect (in core since GL 4.3) adds indirect drawing commands. These commands do source their parameters (number of vertices, starting index and so on) directly from another buffer object. With these functions, many different objects can be drawn with a single draw call.
However, as you still want some per-object state like transformation matrices, that feature alone is not enough. But in combination with ARB_shader_draw_parameters (in core since GL 4.6), you get the gl_DrawID parameter (named gl_DrawIDARB when used through the extension), which is incremented by one for each object in a single multi-draw indirect call. That way, you can index into some UBO, or TBO, or SSBO (or whatever) where you store per-object data.

Related

Efficiently transforming many different models in modern OpenGL

Suppose I want to render many different models, each with a different transformation matrix I want to be applied to their vertices. As far as I understand, the naive approach is to specify a matrix uniform in the vertex shader, the value of which is updated for each mesh during rendering.
It's obvious to me that this is a bad idea, due to the expense of many uniform updates and draw calls. So, what is the most efficient way to achieve this in modern OpenGL?
I've genuinely tried to find a straight, clear answer to this question. Most answers I find vaguely mention UBOs, or instance drawing (which afaik won't work unless you are drawing instances of the same mesh many times, which is not my goal).
With OpenGL 4.6 or with ARB_shader_draw_parameters, each draw in a multi-draw rendering command (functions of the form glMultiDraw*) is assigned a draw index, starting at 0 and ending at one less than the number of draws specified by that function. This index is provided to the Vertex Shader via the gl_DrawID input value. You can then use this index to fetch a matrix from any number of constructs: UBOs, SSBOs, buffer textures, etc.
This works for multi-draw indirect rendering as well. So in theory, you can have a compute shader operation generate a bunch of rendering commands, then render your entire scene with a single draw call (assuming that all of your objects live in the same vertex buffers and can use the same shader and other state). Or at the very least, a large portion of the scene.
Furthermore, this index is considered dynamically uniform, so you can also use it (or values derived from it and other dynamically uniform values) to index into arrays of textures, fetch a texture from an array of bindless textures, or the like.
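For the indirect variant, the command buffer is just an array of small structs. Below is a CPU-side sketch of filling it for meshes that are packed back to back in one vertex buffer and one index buffer. The struct layout is the one OpenGL defines for indirect indexed draws; `MeshRange` and `buildCommands` are hypothetical helpers, and a compute shader could write the same data instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Memory layout OpenGL expects for each entry of an indirect indexed draw.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // 1 for a plain, non-instanced draw
    std::uint32_t firstIndex;     // offset into the index buffer, in indices
    std::int32_t  baseVertex;     // added to every index that is fetched
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "commands must be tightly packed");

// Hypothetical description of one mesh inside the shared buffers.
struct MeshRange {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// Emits one command per mesh; command i is the draw that sees gl_DrawID == i.
std::vector<DrawElementsIndirectCommand> buildCommands(const std::vector<MeshRange>& meshes) {
    std::vector<DrawElementsIndirectCommand> commands;
    std::uint32_t firstIndex = 0;
    std::int32_t baseVertex = 0;
    for (const MeshRange& mesh : meshes) {
        commands.push_back({mesh.indexCount, 1u, firstIndex, baseVertex, 0u});
        firstIndex += mesh.indexCount;
        baseVertex += static_cast<std::int32_t>(mesh.vertexCount);
    }
    return commands;
}
```

The resulting array would be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and submitted with a single glMultiDrawElementsIndirect call, using a stride of 0 because the commands are tightly packed.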

OpenGL best practice for putting two different mesh in the same vertex VBO

After some searching, I found the advice that separate VAOs which share exactly the same shader attribute layout should be merged into one VAO, with all of their data put into one VBO, so that the objects can be drawn with only one draw call.
This makes perfect sense, but what about uniform variables? Say that I want to draw a tree and a ball. They have different vertex counts and different transforms, but share exactly the same shader program.
Until now, my program looked like this:
// generate VAOs, called only once
glGenVertexArrays(1, &treeVaoId);
// generate VBO and bind tree vertices
// enable the vertex attributes and set them up for the shader program
glGenVertexArrays(1, &ballVaoId);
// repeat for ball
// draw, called each frame
// give the tree's transform to the shader as a uniform
glDrawArrays(...); // first draw call
// repeat for ball
glDrawArrays(...); // second draw call
And after merging these vertices into one VAO and VBO, like:
glGenVertexArrays(1, &treeBallId);
// generate a VBO of sufficient size and upload the tree vertices, then the ball vertices after them
// enable the vertex attributes and set them up for the shader program
// to draw, I have to give the tree and the ball each their own transform uniform, but how?
glDrawArrays(...);
Sorry for the poor example, but the point is: is there a way to give different uniform values while drawing one VAO? Or is my approach totally wrong?
The purpose of batching is to improve performance by minimizing state changes between draw calls (batching reduces them to 0, since there is nothing between draw calls). However, there are degrees of performance improvement, and not all state changes are equal.
On the scale of the costs of state changes, changing program uniforms is the least expensive state change. That's not to say that it's meaningless, but you should consider how much effort you really want to spend compared to the results you get out of it. Especially if you're not pushing hardware as fast as possible.
VAO changes (non-buffer-only changes, that is) are among the more expensive state changes, so you gained a lot by eliminating them.
As the name suggests, uniform variables cannot be changed from one shader instance to another within a draw call. Even a multi-draw call. But that doesn't mean that there's nothing that can be done.
Multi-draw functionality allows you to issue multiple draw calls in a single function call. These individual draws can get their vertex data from different parts of the vertex and index buffers. What you need is a way to communicate to your vertex shader which draw call it is taking part in, so that it can index an array of some sort to extract that draw call's per-object data.
Semi-recent hardware has access to gl_DrawID, which is the index into a multi-draw command of the particular draw call being executed. The ARB_shader_draw_parameters extension is fairly widely implemented. You can use that index to fetch per-object data from a UBO or SSBO.
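For the tree-and-ball case this could look roughly like the sketch below: the two meshes sit back to back in one VBO, a multi-draw call gets one `first`/`count` pair per mesh, and the shader picks its transform by draw index. `buildRanges`, `PerObject` and `transform` are names I made up, and the shader assumes GLSL 4.60 with the transforms stored in an SSBO.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Argument arrays in the shape glMultiDrawArrays takes them
// (assuming GLint and GLsizei are 32-bit ints, as on common platforms).
struct MultiDrawRanges {
    std::vector<std::int32_t> first;  // first vertex of each draw
    std::vector<std::int32_t> count;  // vertex count of each draw
};

// Meshes are assumed to be stored one after another in a single VBO.
MultiDrawRanges buildRanges(const std::vector<std::int32_t>& vertexCounts) {
    MultiDrawRanges ranges;
    std::int32_t next = 0;
    for (std::int32_t n : vertexCounts) {
        assert(n >= 0);
        ranges.first.push_back(next);
        ranges.count.push_back(n);
        next += n;
    }
    return ranges;
}

// Vertex shader sketch: draw 0 (tree) reads transform[0], draw 1 (ball) reads transform[1].
const std::string kDrawIdVertexShader = R"glsl(
#version 460 core
layout(location = 0) in vec3 position;
layout(std430, binding = 0) readonly buffer PerObject {
    mat4 transform[];
};
void main() {
    gl_Position = transform[gl_DrawID] * vec4(position, 1.0);
}
)glsl";
```

With these arrays, one glMultiDrawArrays(GL_TRIANGLES, first, count, drawcount) call replaces the two glDrawArrays calls, and no uniform has to change in between.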

How to render multiple objects that can each have multiple shaders in OpenGL 3.3?

I'm trying to make a 3D renderer with OpenGL using C++. So far I have a Scene class that contains a list of Objects and Materials (I also have classes for those), and I have written my code so that an object can have multiple shaders (every shader can affect a group of vertices in an object). Now I'm trying to find a good way to send all that information to OpenGL.
I've seen people suggest taking everything that uses the same shader and rendering it at once, then doing the same for every other shader, if I understood correctly. But is that a good idea when the same shader can be used by different objects? If I merge every vertex that uses shader A, for example, won't it hurt that the group contains vertices of separate objects when I try to draw them at once? And if I instead take each object and split it according to its shaders, so that I draw shader group 1 of object 1, then shader group 2 of object 1, and so on for every object, won't that be too many draw calls?
What strategy do you recommend to accomplish that?
The first thing I recommend is that you stop thinking in terms of "objects", as far as the rendering process is concerned. When rendering, the only sensible grouping is drawing batches (of a certain primitive: points, lines, triangles) for which the same rendering steps (render pipeline) are executed. The modern rendering APIs that were released over the past months (Vulkan, DirectX 12 and Metal) make this explicit.
When rendering your scene the recommended strategy is to iterate over all your objects, split them into render pipeline groups and perform a single drawing batch call once for each primitive-by-pipeline group. The overall goal should be to minimize the total number of drawing calls made.
If you are using OpenGL 3.3, you are using Vertex Array Objects (VAOs) and Vertex Buffer Objects (VBOs). You have an object, a table for example, which can have three (or more, or fewer) VBOs: one for vertex data, one for normal data and one for texture coordinate data. You enclose the VBOs of that table inside one VAO. So every object has its own VAO stored in GPU memory.
When you want to render your objects or a part of them, you bind the shader you want to use and draw the VAOs you want to render with that shader. It may be important that you render the right objects in the right order and use the right shader (of course!) for each VAO.

How could I morph one 3D object into another using shaders?

For the latest Ludum Dare competition the theme was shapeshifting, so my idea involved simple morphing from one geometric shape to another.
So what I did was I made a few objects in Blender with the same vertex count. In OpenGL I made separate VAOs for each object and one additional VAO (with dynamic draw attributes) for the "morphing" object. Every frame, while the player is shapeshifting, I would upload interpolated vertex data, between the current object and the target object, into this extra VAO and then render. Otherwise just render the object's corresponding VAO.
Morphing looked like this:
(The vertices have a different ordering, so morphing is not "smooth")
Since I had little time I just made something quick and dirty but now I think this is not a great way of doing this process, because I have to upload a lot of data to the GPU every frame. And it doesn't look scalable either, if I ever wanted to draw multiple morphing objects at different morphing stages.
As a first step to improve this process I would like to move those interpolation calcs into the shaders.
I could perhaps store the data for all objects in a single VAO, in separate attributes, and then select which of the attributes to interpolate from.
But I was wondering: is there a way to somehow send multiple (two) objects/buffers into the shaders, along with an interpolation rate uniform, and then in the shaders I would do the interpolation?
You can create a buffer that holds several coordinates for each vertex. Just like normally you have coordinates, normals, texture coordinates you can have coordinate1, coordinate2, coordinate3 etc. Then in the shader you can have a uniform variable that says which to use.
With two it's of course easy since the uniform will be from zero to one and you just multiply the first coordinate with it and add the second multiplied with (1.0 - value).
Then just make sure you create the meshes from the same base shape and they will morph nicely.
Also if you use normals, make sure you have several normals and interpolate between them also.
The downside is that the more data you put through, the more skipping around in memory the shader has to do, so it might not be the prettiest solution if you have a lot of forms.
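Here is a CPU-side reference of the blend for the two-shape case, just to pin down the arithmetic the vertex shader would do per vertex. `Vec3`, `blend` and `morph` are illustrative names; the weight follows the description above, so 1.0 gives the first shape and 0.0 the second.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// first * weight + second * (1 - weight), component by component.
Vec3 blend(const Vec3& first, const Vec3& second, float weight) {
    const float rest = 1.0f - weight;
    return {first.x * weight + second.x * rest,
            first.y * weight + second.y * rest,
            first.z * weight + second.z * rest};
}

// Blends two meshes vertex by vertex; both must have the same vertex count
// and the same vertex ordering for the morph to look smooth.
std::vector<Vec3> morph(const std::vector<Vec3>& first,
                        const std::vector<Vec3>& second,
                        float weight) {
    assert(first.size() == second.size());
    std::vector<Vec3> result;
    result.reserve(first.size());
    for (std::size_t i = 0; i < first.size(); ++i) {
        result.push_back(blend(first[i], second[i], weight));
    }
    return result;
}
```

In GLSL the same thing is a single call on two vertex attributes, `mix(second, first, weight)`, with the weight passed as a uniform, so nothing but that one float needs to be uploaded per frame.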

Draw a bunch of elements generated by CUDA/OpenCL?

I'm new to graphics programming, and need to add on a rendering backend for a demo we're creating. I'm hoping you guys can point me in the right direction.
Short version: Is there any way to send OpenGL an array of data for distinct elements, without having to issue a draw command for each element distinctly?
Long version: We have a CUDA program (will eventually be OpenCL) which calculates a bunch of data for a bunch of objects for us. We then need to render these objects using, e.g., OpenGL.
The CUDA kernel can generate our vertices, and using OpenGL interop, it can shove these in an OpenGL VBO and not have to transfer the data back to host device memory. But the problem is we have a bunch (upwards of a million is our goal) distinct objects. It seems like our best bet here is allocating one VBO and putting every object's vertices into it. Then we can call glDrawArrays with offsets and lengths of each element inside that VBO.
However, each object may have a variable number of vertices (though the total vertices in the scene can be bounded.) I'd like to avoid having to transfer a list of start indices and lengths from CUDA -> CPU every frame, especially given that these draw commands are going right back to the GPU.
Is there any way to pack a buffer with data such that we can issue only one call to OpenGL to render the buffer, and it can render a number of distinct elements from that buffer?
(Hopefully I've also given enough info to avoid an XY problem here.)
One way would be to get away from understanding these as individual objects and making them a single large object drawn with a single draw call. The question is, what data is it that distinguishes the objects from each other, meaning what is it you change between the individual calls to glDrawArrays/glDrawElements?
If it is something simple, like a color, it would probably be easier to supply this as an additional per-vertex attribute. This way you can render all objects as one single large object using a single draw call, with the individual sub-objects (which really only exist conceptually now) colored correctly. The memory cost of the additional attribute may be well worth it.
If it is something a little more complex (like a texture), you may still be able to index it using an additional per-vertex attribute, being either an index into a texture array (as texture arrays should be supported on CUDA/OpenCL-able hardware) or a texture coordinate into a particular subregion of a single large texture (a so-called texture atlas).
But if the difference between those objects is something more complex, such as a different shader, you may really need to render individual objects and make individual draw calls. But you still don't necessarily need to make a round-trip to the CPU. With the ARB_draw_indirect extension (which is core since GL 4.0, I think, but may also be supported on GL 3 hardware, and thus on CUDA/CL hardware; I don't know) you can source the arguments to a glDrawArrays/glDrawElements call from an additional buffer (into which you can write with CUDA/CL like any other GL buffer). So you can assemble the offset and length information of each individual object on the GPU and store it in a single buffer. Then you run your glDrawArraysIndirect loop, offsetting into this single draw-indirect buffer (with the offset between the individual objects now being constant).
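To show what that draw-indirect buffer contains, here is a CPU-side sketch of the layout. In your setup the CUDA kernel would write these structs directly into the mapped GL buffer; `packCommands` and `commandOffset` are hypothetical helpers that only mirror that layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Memory layout OpenGL expects for one indirect non-indexed draw.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // number of vertices
    std::uint32_t instanceCount;  // 1 for a plain draw
    std::uint32_t first;          // first vertex in the VBO
    std::uint32_t baseInstance;   // reserved, must be 0, before GL 4.2
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "commands must be tightly packed");

// One command per object, with the objects stored back to back in the VBO.
std::vector<DrawArraysIndirectCommand> packCommands(const std::vector<std::uint32_t>& vertexCounts) {
    std::vector<DrawArraysIndirectCommand> commands;
    std::uint32_t first = 0;
    for (std::uint32_t n : vertexCounts) {
        commands.push_back({n, 1u, first, 0u});
        first += n;
    }
    return commands;
}

// Byte offset of command i inside the buffer; this is the constant stride
// that the glDrawArraysIndirect loop steps by.
std::size_t commandOffset(std::size_t i) {
    return i * sizeof(DrawArraysIndirectCommand);
}
```

The loop on the CPU then only needs the object count: each iteration passes `commandOffset(i)`, cast to a pointer, as the `indirect` argument of glDrawArraysIndirect while the buffer is bound to GL_DRAW_INDIRECT_BUFFER.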
But if the only reason for issuing multiple draw calls is that you want to render the objects as single GL_TRIANGLE_STRIPs or GL_TRIANGLE_FANs (or, god forbid, GL_POLYGONs), you may want to reconsider and just use a bunch of GL_TRIANGLES so that you can render all objects in a single draw call. The (possible) time and memory savings from using triangle strips are likely to be outweighed by the overhead of multiple draw calls, especially when rendering many small triangle strips. If you really want to use strips or fans, you may want to introduce degenerate triangles (by repeating vertices) to separate them from each other, even when drawn with a single draw call. Or you may look into the glPrimitiveRestartIndex function introduced with GL 3.1.
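A small sketch of the degenerate-triangle approach for indexed strips, under the assumption that each strip is given as its own index list (`joinStrips` is a made-up name). Besides repeating the last and first index, it adds one more repeat when needed so that every strip starts at an even position and keeps its winding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Concatenates several triangle strips into one index list that can be
// drawn as a single GL_TRIANGLE_STRIP.
std::vector<std::uint32_t> joinStrips(const std::vector<std::vector<std::uint32_t>>& strips) {
    std::vector<std::uint32_t> out;
    for (const std::vector<std::uint32_t>& strip : strips) {
        if (strip.empty()) {
            continue;
        }
        if (!out.empty()) {
            out.push_back(out.back());      // repeat the last index of the previous strip
            out.push_back(strip.front());   // repeat the first index of the next strip
            if (out.size() % 2 != 0) {
                out.push_back(strip.front());  // extra repeat keeps the winding order
            }
        }
        out.insert(out.end(), strip.begin(), strip.end());
    }
    return out;
}
```

All triangles that touch the repeated indices have zero area and are discarded. With primitive restart the bridge indices would be replaced by a single restart index between strips, set with glPrimitiveRestartIndex and enabled with glEnable(GL_PRIMITIVE_RESTART).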
Probably not optimal, but you could make a single glDrawArrays call on your whole buffer...
If you use GL_TRIANGLES, you can fill your buffer with zeroes, and write only the needed vertices in your kernel. This way "empty" regions of your buffer will be drawn as 0-area polygons ( = degenerate polygons -> not drawn at all )
If you use GL_TRIANGLE_STRIP, you can do the same, but you'll have to duplicate your first vertex in order to make a fake triangle between (0,0,0) and your mesh.
This may seem like overkill, but:
- You'll have to be able to handle as many vertices anyway
- degenerate triangles use no fillrate, so they are almost free (the vertex shader is still computed, though)
A probably better solution would be to use glDrawElements instead: in your kernel, you also generate an index list for your whole buffer, which can then completely skip the unused regions of your buffer.
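A CPU-side sketch of such an index list, assuming each object occupies a known range of the big vertex buffer and is drawn as GL_TRIANGLES. `LiveRange` and `buildIndexList` are illustrative names; in the real setup the kernel would write the indices straight into a GL index buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one used region of the big vertex buffer.
struct LiveRange {
    std::uint32_t firstVertex;
    std::uint32_t vertexCount;
};

// Emits indices only for the used regions, so unused gaps in the vertex
// buffer are never fetched or run through the vertex shader.
std::vector<std::uint32_t> buildIndexList(const std::vector<LiveRange>& ranges) {
    std::vector<std::uint32_t> indices;
    for (const LiveRange& range : ranges) {
        assert(range.vertexCount % 3 == 0);  // whole triangles only
        for (std::uint32_t i = 0; i < range.vertexCount; ++i) {
            indices.push_back(range.firstVertex + i);
        }
    }
    return indices;
}
```

The whole scene is then one glDrawElements(GL_TRIANGLES, ...) call whose count is the size of this list, and unlike the zero-filled approach the skipped vertices cost nothing.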