How many draw calls are acceptable in Vulkan? - c++

I've been working on a Vulkan renderer and am in a bit of a pickle. Currently I am using Vulkan to render 2D sprites, and I just imported a whole map to draw. The map is 40x40, so 1600 tiles. I cannot instance or batch these, as there are moving objects in the scene and I may need to interject draw calls in between (some objects need to be rendered in front of others). However, when I render these 1600 sprites individually my CPU chugs, and it takes ~20 ms to process just the sprites. This happens in a separate thread, which does the following:
1. Start the command buffer and render pass.
2. For every sprite to draw:
   - Set up the translation matrix.
   - Fetch the material if it's not cached.
   - If the pipeline is not already bound to this command buffer, bind it.
   - Bind the descriptor set given by the material if it is not already bound.
   - Push the translation matrix to the pipeline using a push constant.
   - Draw.
3. End the render pass and command buffer, then submit.
My question, I guess, is: is 1600 too many? Should I try to find ways to batch this? Would it make more sense to spend these clock cycles building one big buffer on the GPU and draw only once? I figured that was less efficient, since I only really submit once for all the commands given.

Yes, 1600 draw calls for this type of application is too many. It sounds like you could possibly use a single vkCmdDrawIndexedIndirect().
You would just need to create SSBOs holding your per-sprite matrices and texture indices (with the textures themselves bound as a descriptor array), and index into them per draw using gl_DrawIDARB in the shaders (don't forget to enable VK_KHR_SHADER_DRAW_PARAMETERS_EXTENSION_NAME).
Your CPU-side pre-draw preparation per frame would consist of setting the correct vertex/index buffer offsets within the VkDrawIndexedIndirectCommand structure, as well as setting up any required texture loads and populating your descriptors.
If draw order is a consideration for you, your application could track depth per sprite and then make sure they're set up for draw in the correct order.
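As a rough illustration of that CPU-side preparation, here is a minimal sketch in plain C++. It assumes every sprite reuses one shared unit quad (6 indices at offset 0) and that a larger depth means further from the camera. The `Sprite`, `PerDrawData`, `FrameData` and `buildFrame` names are hypothetical, and `IndirectCommand` only mirrors the field layout of VkDrawIndexedIndirectCommand so the sketch compiles without the Vulkan headers. Note that issuing more than one draw from a single vkCmdDrawIndexedIndirect requires the multiDrawIndirect device feature.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Mirrors the field layout of VkDrawIndexedIndirectCommand (local copy so the
// sketch builds without Vulkan headers).
struct IndirectCommand {
    std::uint32_t indexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  vertexOffset;
    std::uint32_t firstInstance;
};

// Hypothetical per-sprite record kept by the application.
struct Sprite {
    float depth;                 // assumption: larger value = further back
    float x, y;                  // translation
    std::uint32_t textureIndex;  // index into a texture descriptor array
};

// Hypothetical per-draw record that would be uploaded to an SSBO and read in
// the shader with gl_DrawIDARB as the array index.
struct PerDrawData {
    float x, y;
    std::uint32_t textureIndex;
    std::uint32_t pad;           // keeps the record at 16 bytes
};

struct FrameData {
    std::vector<IndirectCommand> commands;  // contents of the indirect buffer
    std::vector<PerDrawData> perDraw;       // contents of the SSBO
};

FrameData buildFrame(std::vector<Sprite> sprites) {
    // Back-to-front order; stable_sort keeps submission order for equal depth.
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) { return a.depth > b.depth; });

    FrameData frame;
    frame.commands.reserve(sprites.size());
    frame.perDraw.reserve(sprites.size());
    for (const Sprite& s : sprites) {
        // Assumption: one shared unit quad, 6 indices starting at index 0.
        frame.commands.push_back({6u, 1u, 0u, 0, 0u});
        frame.perDraw.push_back({s.x, s.y, s.textureIndex, 0u});
    }
    return frame;
}
```

Both vectors would then be copied into GPU buffers once per frame, and `commands.size()` passed as the draw count. Because command i and `perDraw[i]` are written together, the draw index stays in step with the sorted order.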

Related

Cost of large buffer switch vs. small buffer switch

I'm creating a tile-based renderer where each tile has a vertex model. However, from each vertex model only a small portion is rendered in one frame. These subsets change every frame.
What would be the fastest way to render this? I can think of the following options:
1. Make one draw call for every model. Every model is stored in full on the GPU. For every draw call, the full VBO is switched every time. Indices are then used to pick the appropriate small portion for the actual rendering.
2. Make one draw call with one VBO which gets assembled every frame by copying the necessary (small) subset of all the other VBOs (the data is copied within VRAM).
3. Make one draw call with one VBO, but the VBO is recreated every frame with the (small) subset from CPU data using glBufferData.
Which do you think is fastest, or can you think of something faster?
One deciding factor is obviously if switching between larger VBOs is more expensive than switching between smaller VBOs.
It is a bad idea to make a lot of draw calls. In OpenGL, you will be CPU-bound with that approach, so it is better to batch many models together.
Actually, I would go for this method: put all static geometry inside one and only one VBO and one VAO. That does not mean you only have "one draw call"; rather, you should use glMultiDraw*Indirect.
The idea behind that is to use compute shaders to perform culling on the GPU, and to use something like the GL_ARB_indirect_parameters extension with your multi-draw indirect call.
Indirect Drawing
For all dynamic geometry, you can use a persistent buffer.
To answer your question about changing the VAO/VBO: changing the VAO, or using glBindVertexBuffer, should not add much overhead.
But you should profile it, as it can depend on your driver and hardware :)
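To make the "one VBO, many indirect draws" idea concrete, here is a small sketch of the bookkeeping involved, written in plain C++ with no GL calls. All type and function names (`Model`, `ModelRange`, `PackedGeometry`, `VisibleSlice`, `packModels`, `buildCommands`) are hypothetical. `DrawCommand` loosely follows the layout OpenGL uses for indexed indirect draws, and the per-frame "visible slice" of each model is assumed to be a contiguous run of that model's indices.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

struct Model {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // relative to this model's own vertices
};

// Where one model ended up inside the shared buffers.
struct ModelRange {
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
    std::int32_t  baseVertex;
};

struct PackedGeometry {
    std::vector<Vertex> vertices;        // would become the single VBO
    std::vector<std::uint32_t> indices;  // would become the single index buffer
    std::vector<ModelRange> ranges;
};

// Done once at load time: append every model to the shared buffers.
PackedGeometry packModels(const std::vector<Model>& models) {
    PackedGeometry packed;
    for (const Model& m : models) {
        ModelRange r;
        r.firstIndex = static_cast<std::uint32_t>(packed.indices.size());
        r.indexCount = static_cast<std::uint32_t>(m.indices.size());
        r.baseVertex = static_cast<std::int32_t>(packed.vertices.size());
        packed.ranges.push_back(r);
        packed.vertices.insert(packed.vertices.end(), m.vertices.begin(), m.vertices.end());
        packed.indices.insert(packed.indices.end(), m.indices.begin(), m.indices.end());
    }
    return packed;
}

// Loosely follows the indexed indirect draw command layout.
struct DrawCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};

// Hypothetical description of what is visible this frame.
struct VisibleSlice {
    std::size_t   model;   // which model
    std::uint32_t offset;  // first index of the slice, relative to the model
    std::uint32_t count;   // number of indices in the slice
};

// Done every frame: only the small command list changes, not the geometry.
std::vector<DrawCommand> buildCommands(const PackedGeometry& g,
                                       const std::vector<VisibleSlice>& visible) {
    std::vector<DrawCommand> cmds;
    cmds.reserve(visible.size());
    for (const VisibleSlice& v : visible) {
        const ModelRange& r = g.ranges[v.model];
        cmds.push_back({v.count, 1u, r.firstIndex + v.offset, r.baseVertex, 0u});
    }
    return cmds;
}
```

The geometry is uploaded once; per frame, only the command list is rebuilt and uploaded, which is typically far smaller than the vertex data it refers to.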

Displaying a framebuffer in OpenGL

I've been learning a bit of OpenGL lately, and I just got to the Framebuffers.
So by my current understanding, if you have a framebuffer of your own, and you want to draw the color buffer onto the window, you'll need to first draw a quad, and then wrap the texture over it? Is that right? Or is there something like glDrawArrays(), glDrawElements() version for framebuffers?
It seems a bit... Odd (clunky? Hackish?) to me that you have to wrap a texture over a quad in order to draw the framebuffer. This doesn't have to be done with the default framebuffer. Or is that done behind your back?
Well. The main point of framebuffer objects is to render scenes to buffers that will not get displayed but rather reused somewhere, as a source of data for some other operation (shadow maps, High dynamic range processing, reflections, portals...).
If you want to display it, why do you use a custom framebuffer in the first place?
Now, as @CoffeeandCode comments, there is indeed a glBlitFramebuffer call that allows transferring pixels from one framebuffer to another. But before you go ahead and use that call, ask yourself why you need that extra step. It's not a free operation...

Sprite Batch concept

I would like to confirm the following: is it fine to use just one sprite batch and draw fonts and other animated sprites with it? If so, how many quads can be batched using just one sprite batch? Is that limit imposed by the DirectX API, or by the GPU?
Yes it is ok to use one sprite batch object for fonts and other sprites. In fact it is probably better that way.
The number of sprites that can be batched is up to the implementation. If you are using the SpriteBatch class in the DirectXTK, then it uses a growing array as you add sprites to it so there is no real limit to the number of sprites you can give it (except for memory). Internally it creates a vertex buffer that can handle 2048 sprites or 2048*4 vertices. This doesn't limit the amount of sprites that you can send to the SpriteBatch. It just means that if you queue up 3000 sprites for example, it will need to make at least two draw calls to render everything (more if you are using multiple textures).
So, the number of sprites that can be drawn in one call depends on the size of the vertex buffer that the implementation has created. The maximum size of a vertex buffer ultimately depends on how much memory is available.
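The arithmetic above can be sketched as a small counting function. This is not DirectXTK's actual code, only a model of the behaviour described: a new draw call starts whenever the texture changes or the current batch has reached the vertex buffer capacity (2048 sprites here). The function name `countDrawCalls` and the use of plain integers as texture IDs are assumptions for illustration, and any reordering by texture that a real sprite batch may do is ignored.

```cpp
#include <cstddef>
#include <vector>

// Counts how many draw calls a simple sprite batch would issue for a queue of
// sprites, given each sprite's texture ID in submission order.
std::size_t countDrawCalls(const std::vector<int>& textureIds,
                           std::size_t maxBatch = 2048) {
    std::size_t calls = 0;
    std::size_t inBatch = 0;  // sprites in the batch currently being filled
    for (std::size_t i = 0; i < textureIds.size(); ++i) {
        const bool textureChanged = i > 0 && textureIds[i] != textureIds[i - 1];
        if (inBatch == 0 || textureChanged || inBatch == maxBatch) {
            ++calls;       // start a new batch, i.e. one more draw call
            inBatch = 0;
        }
        ++inBatch;
    }
    return calls;
}
```

With 3000 sprites sharing one texture this yields two draw calls, matching the example in the answer; interleaving textures pushes the count up quickly, which is why sorting or atlasing textures matters.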

Mouse-picking using off-screen rendering?

I have a 3D scene with a lot of simple objects (possibly a huge number of them), so I think it's not a very good idea to use ray-tracing for picking objects with the mouse.
I'd like to do something like this:
1. Render all these objects into some OpenGL off-screen buffer, using a pointer to the current object instead of its color.
2. Render the same scene onto the screen, using the real colors.
3. When the user picks a point with (x,y) screen coordinates, take the value from the off-screen buffer (at the corresponding position) and get a pointer to the object.
Is it possible? If yes, what type of buffer can I choose for "drawing with pointers"?
I suppose you can render in two passes: in the first pass, render the data you need for picking to a buffer or texture, and in the second pass, render what is displayed. I am not really familiar with OGL, but in DirectX you can do it like this: http://www.two-kings.de/tutorials/dxgraphics/dxgraphics16.html. You could then find a way to analyse the texture. Keep in mind that you are rendering the data twice. That will not necessarily double your render time (as you do not need to apply all your shaders and effects), but it will increase it quite a lot. Also, each frame you are essentially sending at least 2 MB of data from the GPU to the CPU (if you go for 1 byte per pixel on a 2K monitor), and more if you have more than 256 objects on screen, since one byte can only distinguish 256 IDs.
Edit: Here is how to do the same with OGL, although I cannot verify that the tutorial is correct: http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-14-render-to-texture/ (There are also many more if you look around on Google)
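One practical detail: a raw pointer does not fit into a 32-bit RGBA pixel on a 64-bit platform, so a common workaround is to draw an integer object ID and look the object up afterwards. The sketch below shows one possible encoding, assuming an 8-bit-per-channel RGBA target and reserving ID 0 for "background". The names `encodeId`, `decodeId` and `pickedIndex` are hypothetical; an integer texture format such as GL_R32UI would avoid the packing step altogether.

```cpp
#include <array>
#include <cstdint>

using Rgba8 = std::array<std::uint8_t, 4>;

// Packs a 32-bit object ID into four 8-bit channels (R = lowest byte).
Rgba8 encodeId(std::uint32_t id) {
    return Rgba8{{static_cast<std::uint8_t>(id & 0xFFu),
                  static_cast<std::uint8_t>((id >> 8) & 0xFFu),
                  static_cast<std::uint8_t>((id >> 16) & 0xFFu),
                  static_cast<std::uint8_t>((id >> 24) & 0xFFu)}};
}

// Rebuilds the ID from a pixel read back from the off-screen buffer.
std::uint32_t decodeId(const Rgba8& c) {
    return static_cast<std::uint32_t>(c[0]) |
           (static_cast<std::uint32_t>(c[1]) << 8) |
           (static_cast<std::uint32_t>(c[2]) << 16) |
           (static_cast<std::uint32_t>(c[3]) << 24);
}

// Assumption: ID 0 means "nothing here"; object i is drawn with ID i + 1.
// Returns the index into the application's object list, or -1 for background.
std::int64_t pickedIndex(const Rgba8& pixel) {
    const std::uint32_t id = decodeId(pixel);
    return id == 0 ? -1 : static_cast<std::int64_t>(id) - 1;
}
```

Only the single pixel under the cursor needs to be read back on a click, which keeps the GPU-to-CPU transfer far below a full-frame copy.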

Static background w/ objects in openGL. Best way to "bake"?

I'm making a 2D game in openGL and I have a list of static objects. Thus far I'm looping through them and drawing them into the room, however in some large rooms there are up to 2000 of them and speed is critical so I'd like to find a way to "bake" them all together and never update them in the draw loop after that.
How can I do this and what's the best way in terms of performance, memory usage, gpu ram usage etc?
I'd prefer to use oGL 2, but I'm considering oGL 3+.
The simplest way is to move all the data of those objects to the GPU so that rendering commands will fetch memory directly from GPU memory. It can be done by simply using VBO or even DisplayList (in 'old' OpenGL 2.0 and before).
Probably the DisplayList solution will be the most efficient, because you can 'pack' all the commands inside it. With a VBO you can pack only the geometry data; the materials need to be set up every frame.
Related topic: instancing (but you will have to use GL 3+).
Another way is to render them to textures and display them as simple sprites. This technique is called 'impostors'; here is some info: True Impostors.
Another option: render the environment to a cube map. It could work for objects that are far away from the camera (like hills, trees, etc.), but in a room it could look strange.
First option: build a single mesh for the objects. For example, you may dynamically update the index array with the objects that are visible. In this case it is very important that the textures you use are in an atlas: if you can't share shaders and textures, there is not much benefit from this technique. You may also combine this with grouping by material and texture, using a single draw call per group. For example, the first draw call renders 100 trees with one texture, the next renders 600 apples on them, and a third renders 100 clouds.
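A minimal sketch of the single-mesh idea, in plain C++ with no GL calls: every static sprite becomes four vertices and six indices in one shared mesh, with texture coordinates pointing into an atlas. The types and the `bake` function are hypothetical, and the sketch assumes axis-aligned quads whose atlas rectangles are already known.

```cpp
#include <cstdint>
#include <vector>

struct Vertex { float x, y, u, v; };

// Sub-rectangle of the atlas texture, in normalized texture coordinates.
struct AtlasRect { float u0, v0, u1, v1; };

// Hypothetical description of one static object.
struct StaticSprite {
    float x, y, w, h;  // position and size in world units
    AtlasRect uv;      // where its image lives in the atlas
};

struct BakedMesh {
    std::vector<Vertex> vertices;        // would be uploaded once to a VBO
    std::vector<std::uint32_t> indices;  // would be uploaded once to an index buffer
};

// Run once when the room loads; the draw loop then issues a single draw.
BakedMesh bake(const std::vector<StaticSprite>& sprites) {
    BakedMesh mesh;
    mesh.vertices.reserve(sprites.size() * 4);
    mesh.indices.reserve(sprites.size() * 6);
    for (const StaticSprite& s : sprites) {
        const auto base = static_cast<std::uint32_t>(mesh.vertices.size());
        mesh.vertices.push_back({s.x,       s.y,       s.uv.u0, s.uv.v0});
        mesh.vertices.push_back({s.x + s.w, s.y,       s.uv.u1, s.uv.v0});
        mesh.vertices.push_back({s.x + s.w, s.y + s.h, s.uv.u1, s.uv.v1});
        mesh.vertices.push_back({s.x,       s.y + s.h, s.uv.u0, s.uv.v1});
        // Two triangles per quad, indices offset by this quad's first vertex.
        for (std::uint32_t i : {0u, 1u, 2u, 2u, 3u, 0u}) {
            mesh.indices.push_back(base + i);
        }
    }
    return mesh;
}
```

For 2000 static objects this gives 8000 vertices and 12000 indices, uploaded once. If several atlases or shaders are needed, one baked mesh per group keeps the number of draw calls equal to the number of groups.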
Another option: if your objects are static, you may render all of them into a texture using an FBO. This applies if your objects act as a background, for example if you render 1000 random stars in space for your galaxy.