Cost of large buffer switch vs. small buffer switch - c++

I'm creating a tile-based renderer where each tile has a vertex model. However, from each vertex model only a small portion is rendered in one frame. These subsets change every frame.
What would be the fastest way to render this? I can think of the following options:
1. Make one draw call for every model. Every model is stored in full on the GPU, and the full VBO is bound for every draw call. Indices are then used to pick the small portion that is actually rendered.
2. Make one draw call with one VBO that is assembled every frame by copying the necessary (small) subset out of all the other VBOs (the data is copied within VRAM).
3. Make one draw call with one VBO, but recreate the VBO every frame from the (small) subset of CPU-side data using glBufferData.
Which do you think is fastest, or can you think of something faster?
One deciding factor is obviously if switching between larger VBOs is more expensive than switching between smaller VBOs.

It is a bad idea to make a lot of draw calls. In OpenGL, you will be CPU-bound with this method, so it is better to batch many models together.
Actually, I would go for the single-VBO approach: all static geometry lives inside one and only one VBO and one VAO. That does not mean you only have "one draw call", though; you should use glMultiDraw*Indirect.
The idea behind that is to use compute shaders to perform culling on the GPU, and to use something like the GL_ARB_indirect_parameters extension with your multi-draw indirect call.
Indirect Drawing
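A minimal C++ sketch of the CPU side of that idea, assuming all tiles' vertices and indices already live in one shared VBO/IBO: each visible sub-range becomes one record in the layout glMultiDrawElementsIndirect reads from GL_DRAW_INDIRECT_BUFFER. `TileMesh`, `VisibleRange` and `buildCommands` are hypothetical names; in the GPU-culling variant, a compute shader would write these same records instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same layout as one record in GL_DRAW_INDIRECT_BUFFER for
// glMultiDrawElementsIndirect (five 32-bit fields, 20 bytes).
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // 1 for a plain, non-instanced draw
    std::uint32_t firstIndex;     // offset into the shared index buffer, in indices
    std::int32_t  baseVertex;     // added to every index; locates the tile's vertices
    std::uint32_t baseInstance;   // free per-draw value (honored from GL 4.2 on); here the tile id
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must match the GL layout");

// Hypothetical: where one tile's mesh lives in the shared VBO/IBO.
struct TileMesh {
    std::int32_t  baseVertex;
    std::uint32_t firstIndex;
};

// Hypothetical per-frame selection: the visible index sub-range of one tile.
struct VisibleRange {
    std::uint32_t tile;         // index into the TileMesh table
    std::uint32_t indexOffset;  // relative to the tile's firstIndex
    std::uint32_t indexCount;
};

std::vector<DrawElementsIndirectCommand> buildCommands(
    const std::vector<TileMesh>& tiles, const std::vector<VisibleRange>& visible) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(visible.size());
    for (const VisibleRange& v : visible) {
        assert(v.tile < tiles.size());
        const TileMesh& t = tiles[v.tile];
        cmds.push_back({v.indexCount, 1u, t.firstIndex + v.indexOffset, t.baseVertex, v.tile});
    }
    return cmds;
}
```

The resulting array would be uploaded into a buffer bound to GL_DRAW_INDIRECT_BUFFER and drawn with `glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, count, 0)`, with the shared VAO bound once, so there is no per-tile VBO switch at all.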
For all dynamic geometry, you can use a persistently mapped buffer.
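A sketch of the bookkeeping a persistently mapped buffer usually needs, assuming it was created with glBufferStorage using GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT and mapped once with glMapBufferRange: the buffer is split into a few regions so the CPU writes one frame's dynamic geometry while the GPU may still be reading earlier frames. `PersistentRing` is a hypothetical helper; the fence waits it relies on are only noted in comments, and offsets are not aligned here.

```cpp
#include <cassert>
#include <cstddef>

// Hands out byte offsets into a buffer divided into regionCount regions,
// one region per frame in flight.
class PersistentRing {
public:
    PersistentRing(std::size_t regionSize, std::size_t regionCount)
        : regionSize_(regionSize), regionCount_(regionCount),
          current_(regionCount - 1) {  // so the first beginFrame() selects region 0
        assert(regionSize > 0 && regionCount > 0);
    }

    // Call once per frame. Real code would first wait (glClientWaitSync) on the
    // fence recorded the last time this region was used, and place a new fence
    // (glFenceSync) after this frame's draw calls.
    void beginFrame() {
        current_ = (current_ + 1) % regionCount_;
        used_ = 0;
    }

    // Reserves `bytes` in the current region and returns the offset into the
    // whole buffer: where to write through the mapped pointer, and the offset
    // to give the draw call or glBindVertexBuffer.
    std::size_t allocate(std::size_t bytes) {
        assert(used_ + bytes <= regionSize_ && "region overflow");
        const std::size_t offset = current_ * regionSize_ + used_;
        used_ += bytes;
        return offset;
    }

    std::size_t currentRegion() const { return current_; }

private:
    std::size_t regionSize_;
    std::size_t regionCount_;
    std::size_t current_;
    std::size_t used_ = 0;
};
```

Three regions is a common choice because it lets the CPU run up to two frames ahead before a fence wait can block.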
As for your question about switching VAOs/VBOs: changing the VAO, or calling glBindVertexBuffer, should not add much overhead.
But you should profile it; it can depend on your driver and hardware :)

Related

How many draw calls is acceptable in vulkan?

I've been working on a Vulkan renderer and am in a bit of a pickle. Currently I am using Vulkan to render 2D sprites, and I just imported a whole map to draw. The map is 40x40, so 1600 tiles. I cannot instance/batch these because there are moving objects in the scene and I may need to interject draw calls in between (some objects need to be rendered in front of others). However, when I render these 1600 sprites individually my CPU CHUGS, and it takes ~20 ms to accomplish JUST the sprites. This happens in a separate thread and does the following:
Start command buffer & render pass
For every sprite to draw
Set up translation matrix.
Fetch the material if it's not cached.
If the pipeline is not already bound in this command buffer, bind it.
Bind the descriptor set given by the material if it is not already bound.
Push the translation matrix to the pipeline using a push constant.
Draw.
End command buffer & render pass & submit.
My question, I guess, is: is 1600 too many? Should I try to find ways to batch this? Would it make more sense to spend these clock cycles building a big buffer on the GPU and draw only once? I figured that would be less efficient, since I only really submit once for all the recorded commands.
Yes, 1600 draw calls for this type of application is too many. It sounds like you could possibly use a single vkCmdDrawIndexedIndirect().
You would just need to create SSBOs for your per-sprite matrices and an array of texture samplers, and index into them per draw using gl_DrawIDARB in the shaders (don't forget to enable VK_KHR_SHADER_DRAW_PARAMETERS_EXTENSION_NAME).
Your CPU-side pre-draw preparation per frame would consist of setting the correct vertex/index buffer offsets within the VkDrawIndexedIndirectCommand structure, as well as setting up any required texture loads and populating your descriptors.
If draw order is a consideration for you, your application could track depth per sprite and then make sure they're set up for draw in the correct order.
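A C++ sketch of that CPU-side preparation, with a struct redeclared with the same fields as VkDrawIndexedIndirectCommand so it compiles without the Vulkan headers. It sorts hypothetical `Sprite` records back to front and fills a draw-index-to-matrix table that the shader could read using gl_DrawIDARB. Note that a drawCount greater than 1 in vkCmdDrawIndexedIndirect requires the multiDrawIndirect device feature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Same fields, order and types as VkDrawIndexedIndirectCommand.
struct DrawIndexedIndirectCommand {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  vertexOffset;
    std::uint32_t firstInstance;
};

// Hypothetical per-sprite data; every sprite is a 6-index quad.
struct Sprite {
    float depth;                // larger = farther away, drawn first
    std::uint32_t matrixSlot;   // index of the sprite's matrix in the SSBO
    std::int32_t vertexOffset;  // where the sprite's 4 vertices start in the VBO
};

// Fills one command per sprite in back-to-front order, plus a table mapping the
// draw index (gl_DrawIDARB in the shader) to the sprite's matrix slot.
void buildSpriteDraws(std::vector<Sprite> sprites,
                      std::vector<DrawIndexedIndirectCommand>& cmds,
                      std::vector<std::uint32_t>& drawToMatrix) {
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) { return a.depth > b.depth; });
    cmds.clear();
    drawToMatrix.clear();
    for (const Sprite& s : sprites) {
        assert(s.vertexOffset >= 0);
        // 6 indices, 1 instance, shared quad index list starting at index 0
        cmds.push_back({6u, 1u, 0u, s.vertexOffset, 0u});
        drawToMatrix.push_back(s.matrixSlot);
    }
}
```

Both vectors would be copied into their buffers once per frame, and the whole layer recorded with a single vkCmdDrawIndexedIndirect call; the command order is what gives the back-to-front result.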

Best way to convert OpenGL immediate mode rendering utility methods to using VBOs?

I've written for myself a small utility class containing useful methods for rendering lines, quads, cubes, etc. quickly and easily in OpenGL. Up until now, I've been using almost entirely immediate mode, so I could focus on learning other aspects of OpenGL. It seems prudent to switch over to using VBOs. However, I want to keep much of the same functionality I've been using, for instance my utility class. Is there a good method of converting these simple immediate mode calls to a versatile VBO system?
I am using LWJGL.
Having converted my own code from begin..end blocks and also taught others, this is what I recommend.
I'm assuming that your utility class is mostly static methods such as "draw a line from this point to that point".
First step is to have each individual drawing operation create a VBO for each attribute. Replace your glBegin..glEnd block with code that creates an array (actually a ByteBuffer) for each vertex attribute: coordinates, colors, tex coords, etc. After what used to be glEnd, copy the ByteBuffers to the VBOs with glBufferData. Then set up the attributes with chunks of glEnableClientState, glBindBuffer, glVertex|Color|whateverPointer calls. Call glDrawArrays to actually draw something, and finally restore client state and delete the VBOs.
Now, this is not good OpenGL code and is horribly inefficient and wasteful. But it should work, it's fairly straightforward to write, and you can change one method at a time.
And if you don't need to draw very much, well, modern GPUs are so fast that you may not care that it's inefficient.
Second step is to start re-using VBOs. Have your class create one VBO for each possible attribute at init time or first use. The drawing code still creates ByteBuffer data arrays and copies them over, but doesn't delete the VBOs.
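The answer targets LWJGL (Java); to keep one language for the examples, this is a C++ sketch of steps one and two: an `ImmediateBatch` class (hypothetical) that records glColor/glVertex-style calls into arrays that are reused between draws, ready to be copied into VBOs with glBufferData. The GL calls themselves are only indicated in comments.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Collects vertex attributes the way glColor/glVertex used to, but into
// CPU-side arrays. begin() uses clear(), which keeps the vectors' capacity,
// so the arrays are reused from one draw to the next.
class ImmediateBatch {
public:
    void begin() { positions_.clear(); colors_.clear(); }
    void color(float r, float g, float b) { current_ = {r, g, b}; }
    void vertex(float x, float y, float z) {
        positions_.insert(positions_.end(), {x, y, z});
        colors_.insert(colors_.end(), current_.begin(), current_.end());
    }
    // Vertex count to pass to glDrawArrays.
    int vertexCount() const { return static_cast<int>(positions_.size() / 3); }
    const std::vector<float>& positions() const { return positions_; }
    const std::vector<float>& colors() const { return colors_; }

private:
    std::vector<float> positions_;
    std::vector<float> colors_;
    std::array<float, 3> current_{1.0f, 1.0f, 1.0f};
};

// A former glBegin(GL_LINES)..glEnd utility, now filling the batch instead.
void drawLine(ImmediateBatch& batch, const float a[3], const float b[3]) {
    batch.begin();
    batch.vertex(a[0], a[1], a[2]);
    batch.vertex(b[0], b[1], b[2]);
    // Real code: glBindBuffer + glBufferData for each array into the VBOs
    // created at init time, set the attribute pointers, then
    // glDrawArrays(GL_LINES, 0, batch.vertexCount()).
}
```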
Third step, if you want to move to OpenGL 4 and are using shaders, would be to replace glVertexPointer with glVertexAttribPointer(0, ...), glColorPointer with glVertexAttribPointer(1, ...), etc., and glEnableClientState with glEnableVertexAttribArray. You should also create a Vertex Array Object along with the VBOs at init time. (You'll still have to enable/disable attribute arrays individually depending on whether each draw operation needs colors, tex coords, etc.)
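For step three, a sketch of the arguments each glVertexAttribPointer call would receive. It uses one interleaved VBO rather than one VBO per attribute, a common alternative to the per-attribute buffers described above. The `Vertex` struct and the attribute locations (0 = position, 1 = color, 2 = texture coordinate) are assumptions and must match the `layout(location = N)` qualifiers or glBindAttribLocation calls in your shaders.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interleaved vertex.
struct Vertex {
    float position[3];
    float color[4];
    float texCoord[2];
};

struct AttribSpec {
    unsigned location;   // index argument of glVertexAttribPointer
    int components;      // size argument (number of GL_FLOAT components)
    std::size_t offset;  // byte offset, passed as (const void*)offset
};

// The stride for every attribute is sizeof(Vertex).
constexpr AttribSpec kAttribs[] = {
    {0, 3, offsetof(Vertex, position)},
    {1, 4, offsetof(Vertex, color)},
    {2, 2, offsetof(Vertex, texCoord)},
};
```

At init time, with the VAO and VBO bound, each entry would be applied as `glEnableVertexAttribArray(a.location)` followed by `glVertexAttribPointer(a.location, a.components, GL_FLOAT, GL_FALSE, sizeof(Vertex), (const void*)a.offset)`; the VAO then remembers this setup.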
And the last step, which would require changes elsewhere in your program(s), would be to go for 3D "objects" rather than methods. Your utility class would no longer contain drawing methods. Instead you create a line, quad, or cube object and draw that. Each of these objects would (probably) have its own VBOs. This is more work, but it really pays off in the common case where a lot of your 3D geometry doesn't change from frame to frame. But again, you can start with the more "wasteful" approach of replacing each method call that draws a line from P1 to P2 with something like l = new Line3D(P1, P2); l.draw().
Hope this helps.

Draw a bunch of elements generated by CUDA/OpenCL?

I'm new to graphics programming, and need to add on a rendering backend for a demo we're creating. I'm hoping you guys can point me in the right direction.
Short version: Is there any way to send OpenGL an array of data for distinct elements, without having to issue a draw command for each element distinctly?
Long version: We have a CUDA program (will eventually be OpenCL) which calculates a bunch of data for a bunch of objects for us. We then need to render these objects using, e.g., OpenGL.
The CUDA kernel can generate our vertices, and using OpenGL interop it can shove these into an OpenGL VBO without having to transfer the data back to host memory. But the problem is that we have a bunch of distinct objects (upwards of a million is our goal). It seems like our best bet here is allocating one VBO and putting every object's vertices into it. Then we can call glDrawArrays with the offset and length of each element inside that VBO.
However, each object may have a variable number of vertices (though the total vertices in the scene can be bounded.) I'd like to avoid having to transfer a list of start indices and lengths from CUDA -> CPU every frame, especially given that these draw commands are going right back to the GPU.
Is there any way to pack a buffer with data such that we can issue only one call to OpenGL to render the buffer, and it can render a number of distinct elements from that buffer?
(Hopefully I've also given enough info to avoid an XY problem here.)
One way would be to get away from understanding these as individual objects and making them a single large object drawn with a single draw call. The question is, what data is it that distinguishes the objects from each other, meaning what is it you change between the individual calls to glDrawArrays/glDrawElements?
If it is something simple, like a color, it would probably be easier to supply it as an additional per-vertex attribute. This way you can render all objects as one single large object with a single draw call, with the individual sub-objects (which now only exist conceptually) colored correctly. The memory cost of the additional attribute may well be worth it.
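A C++ sketch of the per-vertex attribute approach, assuming each conceptual object is a GL_TRIANGLES list with a single color: merging copies the object's color into every one of its vertices, so one draw call covers all objects. `Object`, `MergedMesh` and `merge` are hypothetical names; in the CUDA/OpenCL setup the kernel would write both arrays directly into the interop VBOs instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical input: one triangle list and one color per conceptual object.
struct Object {
    std::vector<float> positions;  // x,y,z per vertex, GL_TRIANGLES order
    std::uint32_t rgba;            // packed color shared by all its vertices
};

struct MergedMesh {
    std::vector<float> positions;     // all objects concatenated
    std::vector<std::uint32_t> rgba;  // one entry per vertex: the extra attribute
};

// Flattens every object into one mesh so that a single
// glDrawArrays(GL_TRIANGLES, 0, vertexCount) draws them all; the per-vertex
// color replaces the per-object state change between draw calls.
MergedMesh merge(const std::vector<Object>& objects) {
    MergedMesh m;
    for (const Object& o : objects) {
        assert(o.positions.size() % 3 == 0);
        m.positions.insert(m.positions.end(), o.positions.begin(), o.positions.end());
        m.rgba.insert(m.rgba.end(), o.positions.size() / 3, o.rgba);
    }
    return m;
}
```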
If it is something a little more complex (like a texture), you may still be able to index it using an additional per-vertex attribute, being either an index into a texture array (as texture arrays should be supported on CUDA/OpenCL-able hardware) or a texture coordinate into a particular subregion of a single large texture (a so-called texture atlas).
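For the texture-atlas variant, the extra per-vertex data is just a remapped texture coordinate. A minimal sketch, assuming a hypothetical `AtlasRegion` rectangle in normalized coordinates for each object:

```cpp
#include <cassert>

// Hypothetical sub-rectangle of a texture atlas, in normalized [0,1] coordinates.
struct AtlasRegion { float u0, v0, u1, v1; };

// Maps an object's own texture coordinate (s, t in [0,1]) into its region of
// the shared atlas, so objects with different "textures" can share one draw call.
inline void toAtlas(const AtlasRegion& r, float s, float t, float& u, float& v) {
    assert(r.u1 >= r.u0 && r.v1 >= r.v0);
    u = r.u0 + s * (r.u1 - r.u0);
    v = r.v0 + t * (r.v1 - r.v0);
}
```

With a texture array, the equivalent would be to keep (s, t) unchanged and add the layer index as a third texture coordinate.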
But if the difference between those objects is something more complex, such as a different shader, you may really need to render individual objects with individual draw calls. But you still don't necessarily need to make a round trip to the CPU. With the ARB_draw_indirect extension (which is core since GL 4.0 and may also be supported on GL 3 hardware, and thus on CUDA/CL hardware; I don't know), you can source the arguments to a glDrawArrays/glDrawElements call from an additional buffer, which you can write with CUDA/CL like any other GL buffer. So you can assemble the offset/length information of each individual object on the GPU and store it in a single buffer. Then you run your glDrawArraysIndirect loop, offsetting into this single draw-indirect buffer (the offset between the individual objects is now constant).
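A CPU reference in C++ for the offset/length assembly, assuming each object reports its vertex count: an exclusive prefix sum over the counts gives each object's first vertex in the shared VBO, and each result is stored in the 16-byte record layout that glDrawArraysIndirect reads. On the GPU the same scan would run in CUDA/OpenCL; `buildArrayCommands` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Layout of one record in GL_DRAW_INDIRECT_BUFFER for glDrawArraysIndirect.
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // vertices in this object
    std::uint32_t instanceCount;  // 1 for a plain draw
    std::uint32_t first;          // first vertex of this object in the shared VBO
    std::uint32_t baseInstance;   // must be 0 before GL 4.2
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "must match the GL layout");

// Exclusive prefix sum: object i starts where objects 0..i-1 end.
std::vector<DrawArraysIndirectCommand> buildArrayCommands(
    const std::vector<std::uint32_t>& vertexCounts) {
    std::vector<DrawArraysIndirectCommand> cmds;
    cmds.reserve(vertexCounts.size());
    std::uint32_t first = 0;
    for (std::uint32_t n : vertexCounts) {
        cmds.push_back({n, 1u, first, 0u});
        first += n;  // running total of all vertices so far
    }
    return cmds;
}
```

On GL 4.3 (or with ARB_multi_draw_indirect), `glMultiDrawArraysIndirect(GL_TRIANGLES, nullptr, drawCount, 0)` can consume the whole buffer in one call instead of a glDrawArraysIndirect loop.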
But if the only reason for issuing multiple draw calls is that you want to render the objects as individual GL_TRIANGLE_STRIPs or GL_TRIANGLE_FANs (or, god forbid, GL_POLYGONs), you may want to reconsider and just use a bunch of GL_TRIANGLES so that you can render all objects in a single draw call. The (possible) time and memory savings from using triangle strips are likely to be outweighed by the overhead of multiple draw calls, especially when rendering many small triangle strips. If you really want to use strips, you may want to introduce degenerate triangles (by repeating vertices) to separate them from each other, even when drawn with a single draw call. Or you may look into the glPrimitiveRestartIndex function introduced with GL 3.1, which also works for fans.
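Both joining techniques for triangle strips, sketched in C++ on index lists (hypothetical helper names). The degenerate version inserts one extra duplicate when the output so far has odd length, so each strip's first triangle keeps its winding order; the restart version simply separates strips with a restart index.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Strip = std::vector<std::uint32_t>;

// Joins strips into one GL_TRIANGLE_STRIP index list using degenerate
// (zero-area) triangles made by repeating vertices.
std::vector<std::uint32_t> stitchWithDegenerates(const std::vector<Strip>& strips) {
    std::vector<std::uint32_t> out;
    for (const Strip& s : strips) {
        if (s.empty()) continue;
        if (!out.empty()) {
            const bool odd = out.size() % 2 == 1;
            const std::uint32_t last = out.back();
            out.push_back(last);          // repeat last vertex of the previous strip
            if (odd) out.push_back(last); // keep the next strip on an even position
            out.push_back(s.front());     // repeat first vertex of the next strip
        }
        out.insert(out.end(), s.begin(), s.end());
    }
    return out;
}

// Same join using primitive restart (GL 3.1+), enabled with
// glEnable(GL_PRIMITIVE_RESTART) and glPrimitiveRestartIndex(restart).
std::vector<std::uint32_t> joinWithRestart(const std::vector<Strip>& strips,
                                           std::uint32_t restart = 0xFFFFFFFFu) {
    std::vector<std::uint32_t> out;
    for (const Strip& s : strips) {
        if (s.empty()) continue;
        if (!out.empty()) out.push_back(restart);
        out.insert(out.end(), s.begin(), s.end());
    }
    return out;
}
```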
Probably not optimal, but you could make a single glDrawArrays call on your whole buffer...
If you use GL_TRIANGLES, you can fill your buffer with zeroes and write only the needed vertices in your kernel. This way the "empty" regions of your buffer will be drawn as zero-area polygons (i.e. degenerate polygons, which are not drawn at all).
If you use GL_TRIANGLE_STRIP, you can do the same, but you'll have to duplicate your first vertex in order to make a fake triangle between (0,0,0) and your mesh.
This may seem like overkill, but:
- You'll have to be able to handle as many vertices anyway
- degenerate triangles use no fill rate, so they are almost free (the vertex shader still runs for them, though)
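A C++ sketch of the zero-filled layout for GL_TRIANGLES, assuming a fixed per-object capacity `maxVerts` (hypothetical) and objects made only of whole triangles: each object writes only its own slot, and the rest of the slot stays at (0,0,0), so any leftover triangle has three identical vertices and covers no pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// objects[i] holds x,y,z floats for object i; the result holds
// objects.size() * maxVerts vertices, one fixed slot per object.
std::vector<float> packIntoSlots(const std::vector<std::vector<float>>& objects,
                                 std::size_t maxVerts) {
    assert(maxVerts % 3 == 0);  // slots never split a triangle
    std::vector<float> buffer(objects.size() * maxVerts * 3, 0.0f);  // zero-filled
    for (std::size_t i = 0; i < objects.size(); ++i) {
        const std::vector<float>& o = objects[i];
        assert(o.size() <= maxVerts * 3);
        // Whole triangles only: a partial triangle would mix real vertices with
        // zeros and could produce a visible sliver.
        assert(o.size() % 9 == 0);
        // In the CUDA/OpenCL version, each object's thread writes its own slot.
        std::copy(o.begin(), o.end(), buffer.begin() + i * maxVerts * 3);
    }
    return buffer;
}
```

The whole buffer is then drawn with `glDrawArrays(GL_TRIANGLES, 0, objectCount * maxVerts)`. The unused parts of each slot have to be re-zeroed every frame so stale vertices do not reappear.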
A probably better solution would be to use glDrawElements instead: in your kernel, you also generate an index list for your whole buffer, which can completely skip the unused regions of your buffer.
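A sketch of that index list, assuming each object owns a fixed slot of `maxVerts` vertices, i.e. object i occupies vertices [i * maxVerts, (i + 1) * maxVerts), and only the part it actually wrote is referenced. On the GPU, each object's write position in the index buffer would come from a prefix sum over `usedVerts`; `buildSlotIndices` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// usedVerts[i] = number of vertices object i actually wrote into its slot
// (a multiple of 3 for GL_TRIANGLES). Unused slot space is never referenced.
std::vector<std::uint32_t> buildSlotIndices(const std::vector<std::uint32_t>& usedVerts,
                                            std::uint32_t maxVerts) {
    std::vector<std::uint32_t> indices;
    for (std::uint32_t i = 0; i < usedVerts.size(); ++i) {
        assert(usedVerts[i] <= maxVerts && usedVerts[i] % 3 == 0);
        for (std::uint32_t v = 0; v < usedVerts[i]; ++v)
            indices.push_back(i * maxVerts + v);
    }
    return indices;
}
```

The draw call becomes `glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr)`, so the vertex shader runs only for vertices that are actually used.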

Static background w/ objects in openGL. Best way to "bake"?

I'm making a 2D game in OpenGL and I have a list of static objects. So far I'm looping through them and drawing them into the room; however, in some large rooms there are up to 2000 of them and speed is critical, so I'd like to find a way to "bake" them all together and never update them in the draw loop after that.
How can I do this and what's the best way in terms of performance, memory usage, gpu ram usage etc?
I'd prefer to use OpenGL 2, but I'm considering OpenGL 3+.
The simplest way is to move all the data of those objects to the GPU so that rendering commands fetch it directly from GPU memory. This can be done simply by using VBOs, or even display lists (in 'old' OpenGL 2.0 and earlier).
The display list solution will probably be the most efficient, because you can 'pack' all the commands inside it; with a VBO you can pack only the geometry data, and the materials need to be set up every frame.
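A minimal C++ sketch of "baking" at load time, assuming 2D objects that share one shader and texture (or an atlas) and are placed with a translation and uniform scale: every vertex is pre-transformed into world space once, and the result goes into one static VBO. `StaticObject` and `bake` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical static object: local-space triangles plus a placement.
struct StaticObject {
    std::vector<Vec2> localVerts;  // GL_TRIANGLES, local space
    Vec2 position;                 // world-space translation
    float scale;                   // uniform scale
};

// Transforms every vertex into world space and concatenates everything.
std::vector<Vec2> bake(const std::vector<StaticObject>& objects) {
    std::vector<Vec2> world;
    for (const StaticObject& o : objects) {
        assert(o.localVerts.size() % 3 == 0);
        for (const Vec2& v : o.localVerts)
            world.push_back({o.position.x + o.scale * v.x, o.position.y + o.scale * v.y});
    }
    return world;
}
```

The baked array would be uploaded once with glBufferData and GL_STATIC_DRAW; each frame the room is then a single glDrawArrays(GL_TRIANGLES, ...) call with only the camera transform changing.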
Related topic: instancing (but you will need GL 3+).
Another way is to render them to textures and display them as simple sprites. This technique is called 'impostors'; here is some info: True Impostors.
Another option: render the environment to a cube map. It could work for objects that are far away from the camera (like hills, trees, etc.), but in a room it could look strange.
First option: make a single mesh for the objects. For example, you may dynamically update the index array with the objects that are visible. In this case it is very important that the textures you use are in an atlas; if you can't share the shader and textures, this technique won't gain you much. You may also combine this method with grouping by material and texture, using a single draw call per group. For example, the first draw call renders 100 trees with one texture, then you render 600 apples on them, and after that 100 clouds.
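A sketch of the grouping step, assuming each visible item knows its material and its range in a shared index buffer: sorting by material and concatenating the ranges yields one index list, and thus one glDrawElements call, per material. Names are hypothetical. In a 2D game this only works if material order is also an acceptable draw order (trees, then apples, then clouds); the stable sort keeps the original order within each material.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical visible item: its material/texture id and its index range.
struct DrawItem {
    std::uint32_t material;
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
};

struct Batch {
    std::uint32_t material;
    std::vector<std::uint32_t> indices;  // index list for one draw call
};

std::vector<Batch> buildBatches(std::vector<DrawItem> items,
                                const std::vector<std::uint32_t>& meshIndices) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) { return a.material < b.material; });
    std::vector<Batch> batches;
    for (const DrawItem& it : items) {
        assert(it.firstIndex + it.indexCount <= meshIndices.size());
        if (batches.empty() || batches.back().material != it.material)
            batches.push_back({it.material, {}});  // start a new draw call
        auto begin = meshIndices.begin() + it.firstIndex;
        batches.back().indices.insert(batches.back().indices.end(), begin,
                                      begin + it.indexCount);
    }
    return batches;
}
```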
Another option, if your objects are static, is to render all of them into a texture using an FBO. This works if your objects are background-like; for example, you render 1000 random stars in space for your galaxy.

Better to create new VBOs or just swap the data? (OpenGL)

So in an OpenGL rendering application, is it usually better to create and maintain a vertex buffer throughout the life of the application and just swap out the data every frame with glBufferData, or is it better to delete the VBO and recreate it every frame?
Intuition tells me it's better to swap out the data, but a few sample programs I've seen do the latter, so I'm kind of confused.
I read Nvidia's whitepaper on VBOs, but as I'm a newbie to opengl, it didn't make a whole lot of sense.
Thanks in advance for any advice.
Since you're generating a whole new set of data each frame the documentation seems to indicate that GL_STREAM_DRAW is the Right Way to go about things.
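A sketch of the "keep one VBO and refill it" approach, with the GL calls only named in comments: a hypothetical `StreamVbo` tracks the buffer's capacity and reports whether a frame's upload needs new storage or can reuse the existing buffer object. Growing geometrically is a design choice of this sketch, not something the documentation prescribes.

```cpp
#include <cassert>
#include <cstddef>

// Decides, per frame, how to upload `bytes` of freshly generated vertex data
// into one long-lived VBO created with GL_STREAM_DRAW.
class StreamVbo {
public:
    enum class Action {
        Reallocate,  // glBufferData(GL_ARRAY_BUFFER, capacity(), data, GL_STREAM_DRAW)
        Refill       // glBufferSubData, optionally after glBufferData(..., nullptr, ...)
                     // with the same size to "orphan" the old storage
    };

    Action upload(std::size_t bytes) {
        if (bytes > capacity_) {
            // Grow geometrically so a slowly growing scene does not
            // reallocate every frame.
            capacity_ = bytes > capacity_ * 2 ? bytes : capacity_ * 2;
            ++reallocations_;
            return Action::Reallocate;
        }
        return Action::Refill;
    }

    std::size_t capacity() const { return capacity_; }
    int reallocations() const { return reallocations_; }

private:
    std::size_t capacity_ = 0;
    int reallocations_ = 0;
};
```

Compared with deleting and recreating the VBO every frame, the buffer name, and any VAO state that refers to it, stays valid the whole time.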
The significant thing about VBOs is that they are render data buffers stored in graphics memory rather than in the computer's main memory. That makes them very efficient when the data in them isn't updated (too) frequently, because every update means transferring (potentially large amounts of) data from main memory to graphics memory, which is slow.
So the ideal case is to put all required render data into VBOs once and then only manipulate them via OpenGL functions like matrix transformation or via shaders.
So you would e.g. put each mesh's world space coordinates and texture coordinates into VBOs and never directly touch them again; you'd use the modelview matrix, lighting functions and shaders to render them.
You can do more to optimize VBO usage, but that's the basics as I have understood them.
Find some good hints and more details here: How do I use OpenGL 3.x VBOs to render a dynamic world?