Static background w/ objects in OpenGL. Best way to "bake"? - C++

I'm making a 2D game in OpenGL and I have a list of static objects. So far I've been looping through them and drawing them into the room, but in some large rooms there are up to 2000 of them, and since speed is critical I'd like to find a way to "bake" them all together and never update them in the draw loop after that.
How can I do this and what's the best way in terms of performance, memory usage, gpu ram usage etc?
I'd prefer to use OpenGL 2, but I'm considering OpenGL 3+.

The simplest way is to move all the data for those objects onto the GPU, so that the rendering commands fetch memory directly from GPU memory. It can be done simply with a VBO, or even a display list (in 'old' OpenGL 2.0 and earlier).
The display list solution will probably be the most efficient, because you can 'pack' all the commands inside it; with a VBO you can pack only the geometry data, and the materials still need to be set up every frame.
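For illustration, here is a minimal sketch of the VBO route, assuming GL 1.5+ buffer functions (loaded through your extension loader of choice), a hypothetical interleaved position/texcoord vertex format, and that all objects share one texture/material:

    #include <GL/gl.h>
    #include <vector>

    struct Vertex { GLfloat x, y, u, v; };

    GLuint bakeStaticObjects(const std::vector<Vertex>& allVerts)
    {
        GLuint vbo = 0;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        // GL_STATIC_DRAW hints "upload once, draw many times".
        glBufferData(GL_ARRAY_BUFFER, allVerts.size() * sizeof(Vertex),
                     allVerts.data(), GL_STATIC_DRAW);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
        return vbo;
    }

    // Per frame: one bind and one draw call, no per-object work at all.
    void drawBaked(GLuint vbo, GLsizei vertexCount)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glEnableClientState(GL_VERTEX_ARRAY);
        glEnableClientState(GL_TEXTURE_COORD_ARRAY);
        glVertexPointer(2, GL_FLOAT, sizeof(Vertex), (const void*)0);
        glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex),
                          (const void*)(2 * sizeof(GLfloat)));
        glDrawArrays(GL_TRIANGLES, 0, vertexCount);
        glDisableClientState(GL_TEXTURE_COORD_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
    }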
Related topic: instancing (but you will have to use GL 3+).
Another way is to render them to textures and display them as simple sprites. This technique is called 'impostors'; here is some info: True Impostors.
Another option: render the environment to a cube map. That could work for objects that are far away from the camera (like hills, trees, etc.), but in a room it could look strange.

First option: build a single mesh for the objects. For example, you may dynamically update an index array with the objects that are currently visible. It is very important in this case that the textures you use are packed into an atlas; if you cannot share the shader and textures, this technique gains you little. You can combine this with grouping by material and texture, using a single draw call per group: for example, the first draw call renders 100 trees with one texture, the next renders the 600 apples on them, and the next the 100 clouds.
Another option: if your objects are static, you can render all of them into a texture using an FBO. This applies when your objects act as a background; for example, you render 1000 random stars in space for your galaxy.
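As a hedged sketch of that FBO bake (GL 3.0 or EXT_framebuffer_object; drawAllStaticObjects() stands in for your existing per-object loop and runs exactly once):

    GLuint bakeToTexture(GLsizei w, GLsizei h)
    {
        GLuint tex = 0, fbo = 0;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

        glGenFramebuffers(1, &fbo);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_2D, tex, 0);

        glViewport(0, 0, w, h);
        glClear(GL_COLOR_BUFFER_BIT);
        drawAllStaticObjects();          // the one-time bake pass

        glBindFramebuffer(GL_FRAMEBUFFER, 0);
        glDeleteFramebuffers(1, &fbo);   // the texture keeps the baked result
        return tex;                      // draw it as one room-sized quad
    }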

Related

How many draw calls are acceptable in Vulkan?

I've been working on a Vulkan renderer and have run into a bit of a pickle. Currently I am using Vulkan to render 2D sprites, and I just imported a whole map to draw. The map is 40x40, with 1600 tiles. I cannot instance/batch these, as there are moving objects in the scene and I may need to interject draw calls in between (some objects need to be rendered in front of others). However, when I render these 1600 sprites individually my CPU CHUGS, and it takes ~20 ms to accomplish JUST the sprites. This happens in a separate thread and does the following:
Start command buffer & render pass
For every sprite to draw
Set up translation matrix.
Fetch the material if it's not cached
If the pipeline is not already bound in this command buffer, bind it.
Bind the descriptor set given by the material if not already bound.
Push translation matrix to pipeline using push constant.
Draw.
End command buffer & render pass & submit.
My question, I guess, is: is 1600 too many? Should I try to find ways to batch these? Would it make more sense to spend those clock cycles building one big buffer on the GPU and drawing it only once? I figured that was less efficient, since I already submit only once for all the commands.
Yes, 1600 draw calls for this type of application is too many. It sounds like you could possibly use a single vkCmdDrawIndexedIndirect().
You would just need to create SSBOs for your per-sprite matrices and texture samplers, and index into them per draw using gl_DrawIDARB in the shaders (don't forget to enable VK_KHR_SHADER_DRAW_PARAMETERS_EXTENSION_NAME).
Your CPU-side pre-draw preparation per frame would consist of setting the correct vertex/index buffer offsets within the VkDrawIndexedIndirectCommand structure, as well as setting up any required texture loads and populating your descriptors.
If draw order is a consideration for you, your application could track depth per sprite and then make sure they're set up for draw in the correct order.
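To illustrate, a sketch of what the collapsed submission could look like, assuming one shared quad mesh with six indices per sprite, the multiDrawIndirect device feature enabled, and placeholder names (cmdBuf, indirectBuffer, mappedIndirect) throughout:

    #include <cstring>
    #include <vector>
    #include <vulkan/vulkan.h>

    // mappedIndirect points into a host-visible buffer created with
    // VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT.
    void recordSprites(VkCommandBuffer cmdBuf, VkBuffer indirectBuffer,
                       void* mappedIndirect, uint32_t spriteCount)
    {
        std::vector<VkDrawIndexedIndirectCommand> cmds(spriteCount);
        for (uint32_t i = 0; i < spriteCount; ++i) {
            cmds[i].indexCount    = 6;  // two triangles per sprite quad
            cmds[i].instanceCount = 1;
            cmds[i].firstIndex    = 0;
            cmds[i].vertexOffset  = 0;
            cmds[i].firstInstance = i;  // lets the shader index per-sprite data
        }
        std::memcpy(mappedIndirect, cmds.data(),
                    cmds.size() * sizeof(VkDrawIndexedIndirectCommand));
        // One call replaces the 1600-iteration loop; the per-sprite matrix
        // comes from the SSBO, indexed by gl_DrawIDARB in the vertex shader.
        vkCmdDrawIndexedIndirect(cmdBuf, indirectBuffer, 0, spriteCount,
                                 sizeof(VkDrawIndexedIndirectCommand));
    }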

Cost of large buffer switch vs. small buffer switch

I'm creating a tile-based renderer where each tile has a vertex model. However, from each vertex model only a small portion is rendered in one frame. These subsets change every frame.
What would be the fastest way to render this? I can think of the following options:
Make one draw call for every model. Every model is stored in full on the GPU, and the full VBO is bound for each draw call. Indices are then used to pick the appropriate small portion for the actual rendering.
Make one draw call with one VBO, which gets assembled every frame by copying the necessary (small) subsets of all the other VBOs (the data is copied within VRAM).
Make one draw call with one VBO, but the VBO is recreated every frame from the (small) subset of CPU-side data using glBufferData.
Which do you think is fastest, or can you think of something faster?
One deciding factor is obviously whether switching between larger VBOs is more expensive than switching between smaller ones.
It is a bad idea to make a lot of draw calls. In OpenGL you will be CPU-bound with that method, so it is better to batch a lot of models together.
Actually, I would go for this approach: all static geometry inside one and only one VBO and one VAO. That does not mean you only make "one draw call"; however, you should use glMultiDraw*Indirect.
The idea buried in there is that you use compute shaders to perform culling on the GPU, combined with something like the ARB_indirect_parameters extension and your multi-indirect draw call. See: Indirect Drawing.
For all dynamic geometry, you can use a persistently mapped buffer.
To answer your question about changing VAOs/VBOs: changing the VAO, or using glBindVertexBuffer, should not add much overhead.
But you should profile it; it can depend on your driver / hardware :)
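For illustration, a sketch of the single-VBO/single-VAO route with glMultiDrawElementsIndirect (GL 4.3+). The Tile struct and buffer names are assumptions, and the commands are assembled on the CPU here, though as noted above a compute shader could write them instead:

    struct Tile { GLuint indexCount, firstIndex, baseVertex; }; // hypothetical

    struct DrawElementsIndirectCommand {
        GLuint count, instanceCount, firstIndex, baseVertex, baseInstance;
    };

    void drawVisibleTiles(GLuint vao, GLuint indirectBuf,
                          const std::vector<Tile>& visibleTiles)
    {
        std::vector<DrawElementsIndirectCommand> cmds;
        cmds.reserve(visibleTiles.size());
        for (const Tile& t : visibleTiles)   // this frame's visible subsets
            cmds.push_back({ t.indexCount, 1, t.firstIndex, t.baseVertex, 0 });

        glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuf);
        glBufferData(GL_DRAW_INDIRECT_BUFFER,
                     cmds.size() * sizeof(cmds[0]), cmds.data(),
                     GL_STREAM_DRAW);

        glBindVertexArray(vao);              // the single shared VAO
        glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                    NULL, (GLsizei)cmds.size(), 0);
    }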

Mouse-picking using off-screen rendering?

I have a 3D scene with a lot of simple objects (possibly a huge number of them), so I think it's not a very good idea to use ray tracing for picking objects with the mouse.
I'd like to do something like this:
render all these objects into some OpenGL off-screen buffer, writing a pointer to the current object instead of its color
render the same scene onto the screen, using real colors
when the user picks a point at (x,y) screen coordinates, I read the value from the off-screen buffer (at the corresponding position) and get back a pointer to the object
Is it possible? If yes, what type of buffer should I choose for "drawing with pointers"?
I suppose you can render in two passes: first to a buffer or a texture holding the data you need for picking, then a second pass with the data actually displayed. I am not really familiar with OGL, but in DirectX you can do it like this: http://www.two-kings.de/tutorials/dxgraphics/dxgraphics16.html. You could then find a way to analyse the texture. Keep in mind that you are rendering the scene twice, which will not necessarily double your render time (as you do not need to apply all your shaders and effects), but it will increase it quite a lot. Also, every frame you are essentially sending at least 2 MB of data (if you go for 1 byte per pixel on a 2K monitor) from GPU to CPU, and that might grow if you have more than 256 objects on screen.
Edit: here is how to do the same with OGL, although I cannot verify that the tutorial is correct: http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-14-render-to-texture/ (There are also many more if you look around on Google.)
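One practical variant: rather than a raw pointer, encode an integer object ID in the color and look the object up in a table; and since you only need the value when the user clicks, a one-pixel glReadPixels avoids the per-frame transfer mentioned above. A minimal sketch (pickFbo and the 24-bit ID scheme are assumptions):

    GLuint pickObjectAt(GLuint pickFbo, int mouseX, int mouseY, int fbHeight)
    {
        unsigned char px[3] = { 0, 0, 0 };
        glBindFramebuffer(GL_READ_FRAMEBUFFER, pickFbo);
        // OpenGL's origin is bottom-left; mouse coordinates are top-left.
        glReadPixels(mouseX, fbHeight - 1 - mouseY, 1, 1,
                     GL_RGB, GL_UNSIGNED_BYTE, px);
        glBindFramebuffer(GL_READ_FRAMEBUFFER, 0);
        // ID 0 can be reserved for "nothing was hit".
        return (GLuint)px[0] | ((GLuint)px[1] << 8) | ((GLuint)px[2] << 16);
    }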

Draw a bunch of elements generated by CUDA/OpenCL?

I'm new to graphics programming and need to add a rendering backend for a demo we're creating. I'm hoping you guys can point me in the right direction.
Short version: Is there any way to send OpenGL an array of data for distinct elements, without having to issue a draw command for each element distinctly?
Long version: We have a CUDA program (will eventually be OpenCL) which calculates a bunch of data for a bunch of objects for us. We then need to render these objects using, e.g., OpenGL.
The CUDA kernel can generate our vertices and, using OpenGL interop, shove these into an OpenGL VBO without having to transfer the data back to host memory. But the problem is we have a bunch of distinct objects (upwards of a million is our goal). It seems like our best bet here is allocating one VBO and putting every object's vertices into it. Then we can call glDrawArrays with the offsets and lengths of each element inside that VBO.
However, each object may have a variable number of vertices (though the total number of vertices in the scene can be bounded). I'd like to avoid having to transfer a list of start indices and lengths from CUDA to the CPU every frame, especially given that these draw commands are going right back to the GPU.
Is there any way to pack a buffer with data such that we can issue only one call to OpenGL to render the buffer, and it can render a number of distinct elements from that buffer?
(Hopefully I've also given enough info to avoid an XY problem here.)
One way would be to get away from treating these as individual objects and instead make them a single large object drawn with a single draw call. The question is: what data distinguishes the objects from each other, i.e. what do you change between the individual calls to glDrawArrays/glDrawElements?
If it is something simple, like a color, it would probably be easier to supply it as an additional per-vertex attribute. This way you can render all objects as one single large object with a single draw call, with the individual sub-objects (which now really only exist conceptually) colored correctly. The memory cost of the additional attribute may be well worth it.
If it is something a little more complex (like a texture), you may still be able to index it using an additional per-vertex attribute, either an index into a texture array (texture arrays should be supported on CUDA/OpenCL-capable hardware) or a texture coordinate into a particular subregion of a single large texture (a so-called texture atlas).
But if the difference between those objects is something more complex, such as a different shader, you may really need to render individual objects and make individual draw calls. But you still don't necessarily need to make a round-trip to the CPU. With the ARB_draw_indirect extension (core since GL 4.0, I think, but it may be supported on GL 3 hardware, and thus CUDA/CL hardware; I don't know) you can source the arguments to a glDrawArrays/glDrawElements call from an additional buffer (into which you can write with CUDA/CL like any other GL buffer). So you can assemble the offset/length information for each individual object on the GPU and store it in a single buffer. Then you do your glDrawArraysIndirect loop, offsetting into this single draw-indirect buffer (with the offset between the individual objects now being constant).
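A sketch of that idea, under the assumption that the CUDA/CL kernel has already written the command structs into the bound buffer (objectCount and indirectBuf are placeholders):

    struct DrawArraysIndirectCommand {
        GLuint count;         // number of vertices for this object
        GLuint instanceCount; // "primCount" in the original extension
        GLuint first;         // this object's offset into the big VBO
        GLuint baseInstance;  // must be zero before GL 4.2
    };

    void drawAllObjects(GLuint indirectBuf, GLuint objectCount)
    {
        glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuf); // kernel-filled
        for (GLuint i = 0; i < objectCount; ++i)
            glDrawArraysIndirect(GL_TRIANGLES,
                (const void*)(i * sizeof(DrawArraysIndirectCommand)));
    }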
But if the only reason for issuing multiple draw calls is that you want to render the objects as single GL_TRIANGLE_STRIPs or GL_TRIANGLE_FANs (or, god beware, GL_POLYGONs), you may want to reconsider and just use a bunch of GL_TRIANGLES, so that you can render all objects in a single draw call. The (possible) time and memory savings from triangle strips are likely to be outweighed by the overhead of multiple draw calls, especially when rendering many small strips. If you really want strips or fans, you can introduce degenerate triangles (by repeating vertices) to separate them from each other even when drawn with a single draw call, or look into the glPrimitiveRestartIndex function introduced with GL 3.1.
Probably not optimal, but you could make a single glDrawArrays call on your whole buffer...
If you use GL_TRIANGLES, you can fill your buffer with zeroes and write only the needed vertices in your kernel. That way, "empty" regions of your buffer will be drawn as zero-area polygons (i.e. degenerate polygons, which are not drawn at all).
If you use GL_TRIANGLE_STRIP, you can do the same, but you'll have to duplicate your first vertex in order to make a fake triangle between (0,0,0) and your mesh.
This can seem overkill, but:
- You'll have to be able to handle that many vertices anyway
- Degenerate triangles use no fill rate, so they are almost free (the vertex shader still runs, though)
A probably better solution would be to use glDrawElements instead: in your kernel, you also generate an index list for your whole buffer, which can completely skip regions of the buffer.
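A sketch of that variant (names are placeholders; the attribute pointers are assumed to be set up already, and the used index count must still be known, or at least conservatively bounded, on the CPU side):

    void drawGenerated(GLuint vertexVbo, GLuint indexVbo,
                       GLsizei usedIndexCount)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vertexVbo);        // CUDA-written
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexVbo); // CUDA-written
        glDrawElements(GL_TRIANGLES, usedIndexCount, GL_UNSIGNED_INT, NULL);
    }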

How do I use Vertex Buffer Objects to render many different circles?

I'm trying to write a game that deals with many circles (well, triangle fans, but you get the idea). Each circle will have an x position, a y position, and a mass property; every circle's mass will be different. Also, I want to color some groups of circles differently, while keeping a transparent circle center and fading to opaque along the perimeter of each circle.
I was told to use VBOs, and have been Googling all day. I would like a full example of how I would draw these circles, and an explanation of how VBOs work, while keeping that explanation simple.
I have not implemented VBOs myself yet, but from what I understand, they work similarly to texture objects. In fact, when reminding myself and explaining to others what VBOs are, I like to (incorrectly) call texture objects 'texture buffer objects' to reinforce the conceptual similarity.
(Not to be confused with buffer textures from the NVIDIA-specified extension GL_EXT_texture_buffer_object.)
So let's think: what are texture objects? They are objects you generate using glGenTextures(); glGenBuffersARB() does a similar thing, and the same analogy applies to glBindTexture() and glBindBufferARB().
(As of OpenGL 1.5, functions glGenBuffers() and glBindBuffer() have entered core OpenGL, so you can use them in place of the extension equivalents.)
But what exactly are these 'texture objects', and what do they do? Well, consider that you could actually call glTexImage2D() every frame to set up a texture. Texture objects only serve to reduce traffic between the GPU and main memory; instead of sending the entire pixel array, you send just the "OpenGL name" (that is, an integer identifier) of a texture object which you know to be static.
VBOs serve a similar purpose. Instead of sending the vertex array over and over, you upload it once using glBufferData() and then send just the "OpenGL name" of the object. They are great for static objects, and not so great for dynamic ones. In fact, many generic 3D engines such as Ogre3D let you specify whether a mesh is dynamic or static, quite probably in order to let you decide between VBOs and vertex arrays.
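To make the analogy concrete, a minimal sketch (GL 1.5 names; buildCircleFan() is a hypothetical helper that fills in x,y pairs for a fan-shaped circle):

    std::vector<GLfloat> verts = buildCircleFan(cx, cy, radius, 32);

    GLuint vbo = 0;
    glGenBuffers(1, &vbo);                // like glGenTextures
    glBindBuffer(GL_ARRAY_BUFFER, vbo);   // like glBindTexture
    glBufferData(GL_ARRAY_BUFFER,         // upload once, like glTexImage2D
                 verts.size() * sizeof(GLfloat), verts.data(),
                 GL_STATIC_DRAW);

    // Each frame afterwards: bind the name and draw; no vertex data is resent.
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, 0, (const void*)0);
    glDrawArrays(GL_TRIANGLE_FAN, 0, (GLsizei)(verts.size() / 2));
    glDisableClientState(GL_VERTEX_ARRAY);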
For your purposes, VBOs are not the right answer. You want numerous simple objects (by simple, I mean fewer than 200 vertices) that are continuously morphing and changing. Unless you intend to write a very smart and complex vertex shader, VBOs are not for you. You want vertex arrays, which you can easily manipulate on the CPU and update every frame without a special call to reupload the entire VBO to the graphics card (which may turn out slower than just sending the vertex arrays).
Here's a quite good, let's call it "man page", from NVIDIA about the VBO API. Read it for further info!
A good VBO tutorial.
What you're doing looks like particles; you might want to google "particle rendering", just in case.