Imagine I want to draw a pyramid made of triangles.
Should I create a VBO for each triangle or one containing all triangles?
Selecting a VBO into a context is a rather expensive state change, so using fewer VBOs is definitely advantageous.
I am creating a 2D game engine as a side project. I've been experimenting and researching, and have come across a lot of people suggesting that you batch (store multiple objects to draw) in the same VBO. For example, if my scene had a lot of trees I could put all the trees in the same VBO because they have the same memory footprint and then use a single glDrawArrays to draw all of them.
This is fine and it makes total sense... but then I started to wonder how I can send the different transforms for each tree? How do I get that to the shader? Or is this approach assuming I do the calculations on CPU and send the entire VBO each draw?
Here are two of the main questions I've been looking at:
OpenGL VAO best practices
OpenGL How Many VAOs
The term you're looking for is Instancing.
You create a VAO that contains the model of the tree, and then instead of passing the model matrix through the uniforms, you put it as an instanced vertex attribute in the VAO. Then each instance of your tree will get drawn with a different model matrix.
You can use compute shaders or transform feedback to update the model-view product and store it in the instance buffer that the VAO references, once per instance per frame (rather than calculating it for every vertex of every instance).
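As a rough sketch of what "an instanced vertex attribute" means for a model matrix: a mat4 attribute occupies four consecutive attribute locations, one vec4 column each, and each of them advances once per instance instead of once per vertex. The helper below only computes that layout on the CPU. The names InstanceAttrib and mat4InstanceLayout are hypothetical, and the sketch assumes tightly packed column-major float matrices in a separate instance VBO.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One vec4 column of a per-instance mat4 attribute (hypothetical helper type).
struct InstanceAttrib {
    unsigned location;   // attribute index for glVertexAttribPointer
    int components;      // 4 floats per column
    std::size_t stride;  // bytes from one instance's matrix to the next
    std::size_t offset;  // byte offset of this column inside the matrix
    unsigned divisor;    // 1 = advance once per instance
};

// Assumption: matrices are tightly packed, column-major floats.
std::array<InstanceAttrib, 4> mat4InstanceLayout(unsigned baseLocation) {
    // Conservative check: 16 attribute locations is the minimum GL guarantees.
    assert(baseLocation <= 12);
    const std::size_t columnBytes = 4 * sizeof(float);
    const std::size_t matrixBytes = 4 * columnBytes;
    std::array<InstanceAttrib, 4> layout{};
    for (unsigned column = 0; column < 4; ++column) {
        layout[column] = InstanceAttrib{baseLocation + column, 4, matrixBytes,
                                        column * columnBytes, 1u};
    }
    return layout;
}

// With the instance VBO bound to GL_ARRAY_BUFFER, each entry `a` would be
// applied roughly like this:
//   glVertexAttribPointer(a.location, a.components, GL_FLOAT, GL_FALSE,
//                         (GLsizei)a.stride, (const void*)a.offset);
//   glEnableVertexAttribArray(a.location);
//   glVertexAttribDivisor(a.location, a.divisor);
// and all trees drawn with one call:
//   glDrawArraysInstanced(GL_TRIANGLES, 0, vertexCount, instanceCount);
```

In the vertex shader the matrix would then be declared as a single mat4 input at the base location, and the per-tree transform arrives without any uniform updates between instances.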
I'm using OpenGL 4 and C++11.
Currently I make a whole bunch of individual calls to glDrawElements using separate VAOs with a separate VBO and an IBO.
I do this because the texture coords change for each quad, and my vertex data includes the texture coords. I understand that there's some redundant position information in this vertex data; it's always -1,-1,1,1 because I use a translation and a scale matrix in my vertex shader to position and scale the vertex data.
The VAO, VBO, IBO, position and scale matrix and texture ID are stored in an object. It's one object per quad.
Currently, some of the drawing would occur like this:
Draw a quad object via (glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT,0)). The bound VBO is just -1,-1,1,1 and the IBO draws me a quad. The bound VBO contains the texture coords of a common texture (same texture used to texture all drawn quads). Matrix transformations on shader position it.
Repeat with another quad object
glEnable(GL_SCISSOR_TEST) is called and the position information of the previous quad is used in a call to glScissor
Next quad object is drawn; only the parts of it visible from the previous quad are actually shown.
Draw another quad object
The performance I'm getting now is acceptable but I want it faster because I've only scratched the surface of what I have in mind. So I'm looking at optimizing. So far I've read that I should:
Remove the position information from my vertex data and just keep texture coords. Instead bind a single position VBO at the start of drawing quads so it's used by all of them.
But I'm unsure how this would work? Because I can only have one VBO active at any one time.
Would I then have to call glBufferSubData and update the texture coordinates prior to drawing each quad? Would this be better performance or worse (a call to glBindVertexArray for every object or a call to glBufferSubData?)
Would I still pass the position and scale as matrices to the shader, or would I take that opportunity to also update the position info of the vertices as well as the texture coords? Which would be faster?
Create one big VBO with or without an IBO and update the vertex data for the position (rather than use a transformation and scale matrix) of each quad within this. It seems like this would be difficult to manage.
Even if I did manage to do this, I would only have a single glDraw call, which sounds fast. Is this true? What sort of performance impact does a single glBindVertexArray call have compared to multiple calls?
I don't think there's any way to use this method to implement something like the glScissor call that I'm making now?
Another option I've read is instancing. So I draw the quad however many times I need it; which means I would pass the shader an array of translation matrices and an array of texture coords?
Would this be a lot faster?
I think I could do something like the glScissor test by passing an additional array of booleans which defines whether the current quad should be only drawn within the bounds of the previous one. However, I think this means that for each gl_InstanceID I would have to traverse all previous instances looking for true and false values, and it seems like it would be slow.
I'm trying to save time by not implementing all of these individually. Hopefully an expert can point me towards which is probably better. If anyone has an even better idea, please let me know.
You can have multiple VBOs attached to different attributes!
The following sequence binds two VBOs to attributes 0 and 1. Note that glBindBuffer() only sets the current GL_ARRAY_BUFFER binding; the actual assignment of a VBO to an attribute is made by glVertexAttribPointer().
glBindBuffer(GL_ARRAY_BUFFER,buf1);
glVertexAttribPointer(0, ...);
glEnableVertexAttribArray(0);
glBindBuffer(GL_ARRAY_BUFFER,buf2);
glVertexAttribPointer(1, ...);
glEnableVertexAttribArray(1);
The fastest way to provide quad positions and sizes is to use a texture and sample it inside the vertex shader. You'd need at least an RGBA texture (x, y, width, height) with 16 bits per channel. You can then update quad positions using glTexSubImage2D(), or even render them via an FBO.
Everything else will perform slower. If you want, we can elaborate on using uniforms, attributes in VBOs, or attributes without enabled arrays.
Putting all together:
use a single VBO and store the quad id in it (int) plus your texturing data
prepare an x,y,w,h texture and define a mapping from quad id to a texcoord in this texture, i.e. u=quad_id&0xFF, v=(quad_id>>8) (a 256x256 texture holds at most 65536 quads)
use the vertex shader to sample the displacement and size from that texture for the quad_id stored in the attribute (or derive the id as gl_VertexID/4 when drawing 4 indexed vertices per quad, or gl_VertexID/6 when drawing 6 unindexed vertices per quad)
fill the VBO and the texture
draw everything with a single glDrawArrays or glDrawElements call
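The quad id to texel mapping from the steps above can be sketched on the CPU side as follows. This is only an illustration: Texel, quadIdToTexel and texelChannelOffset are hypothetical names, and the sketch assumes a 256x256 RGBA texture whose CPU copy is a row-major array of 16-bit channels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Integer texel coordinates in the 256x256 attribute texture.
struct Texel {
    std::uint32_t u;
    std::uint32_t v;
};

// u = quad_id & 0xFF, v = quad_id >> 8, as described in the answer.
Texel quadIdToTexel(std::uint32_t quadId) {
    assert(quadId < 65536u);  // 256 * 256 texels at most
    return Texel{quadId & 0xFFu, quadId >> 8};
}

// Offset (in 16-bit channels) of a quad's x,y,w,h data inside a row-major
// CPU copy of the texture. Because rows are 256 texels wide, this is simply
// quadId * 4.
std::size_t texelChannelOffset(std::uint32_t quadId) {
    const Texel t = quadIdToTexel(quadId);
    return (static_cast<std::size_t>(t.v) * 256u + t.u) * 4u;
}

// Updating one quad would then upload a single texel at (t.u, t.v), e.g.
//   glTexSubImage2D(GL_TEXTURE_2D, 0, t.u, t.v, 1, 1,
//                   GL_RGBA, GL_UNSIGNED_SHORT, &cpuCopy[offset]);
```

In the vertex shader the same integer coordinates could be fed to texelFetch, which avoids any filtering or normalization concerns when reading the per-quad data.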
So I read somewhere that I should use as few VBOs as possible and render many objects from a single VBO.
I get the point, but isn't that less organized?
I planned on creating 3d shapes classes where each instance of the class has its own VBO that will be bound when it needs to be drawn. This comes in contrast to what I wrote earlier.
How do 3D applications (video games etc.) get this done?
Firstly, a VBO is just a collection of vertices stored in your video card's memory. You can use a single VBO with a model matrix to transform it, and you can also apply different shaders and textures on top of it.
This is best explained with a 2D game in mind where every graphic is basically a quad; 4 vertices.
Would you really need to flood your video card memory with repeated allocations for a VBO of 4 vertices?
Instead we can have a pointer to the VBO and model matrix that transforms it. The VBO is usually stored in a resource manager so it stays in memory for as long as we like.
So now we're reusing a single VBO for everything and applying different transformations and textures to it. It looks like there are lots and lots of VBOs, but actually we're being very efficient with our video card memory.
Here's some pseudo code:
class Sprite {
    mat4 transform;                        // per-sprite model matrix
    GLuint VBO = ResourceManager.quadVBO;  // handle to the shared quad VBO, not a copy
};
I have a lot of models that are generated on the CPU (around 100K of them, each around 100 triangles), and every model has its own VBO and IBO. If I draw each model with its own glDrawElements() call, it is quite slow. If I instead combine all the VBOs and IBOs, then whenever a model is deleted I need to update the VBO and almost all of the IBO, because removing points changes the index order, and then I need to upload all of it again, which is also slow. I am also not sure about instancing performance, and for picking I need to know which triangle belongs to which model. Is there any way to buffer everything and then have one draw function draw all the individual models, each with its own vertices and indices?
You can set a base vertex per mesh and pass it to the glDrawElementsBaseVertex call. This will still require a call per mesh which can be solved with glMultiDrawElementsBaseVertex with which you can combine them all into a single draw call.
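A minimal sketch of how the per-mesh arguments for glMultiDrawElementsBaseVertex might be assembled is shown below. MeshRange, MultiDrawParams and buildMultiDraw are hypothetical names, and the sketch assumes all meshes are packed back to back into one VBO and one IBO of 32-bit indices, with each mesh keeping indices that start at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of one mesh inside the shared buffers (hypothetical type).
struct MeshRange {
    std::size_t vertexCount;
    std::size_t indexCount;
};

// The three parallel arrays glMultiDrawElementsBaseVertex expects.
struct MultiDrawParams {
    std::vector<int> counts;                // index count per mesh (GLsizei)
    std::vector<const void*> indexOffsets;  // byte offset into the IBO per mesh
    std::vector<int> baseVertices;          // added to every index of the mesh (GLint)
};

MultiDrawParams buildMultiDraw(const std::vector<MeshRange>& meshes) {
    MultiDrawParams params;
    std::size_t firstIndex = 0;   // running position in the shared IBO
    std::size_t firstVertex = 0;  // running position in the shared VBO
    for (const MeshRange& mesh : meshes) {
        assert(mesh.indexCount % 3 == 0);  // whole triangles only
        params.counts.push_back(static_cast<int>(mesh.indexCount));
        params.indexOffsets.push_back(reinterpret_cast<const void*>(
            firstIndex * sizeof(std::uint32_t)));
        params.baseVertices.push_back(static_cast<int>(firstVertex));
        firstIndex += mesh.indexCount;
        firstVertex += mesh.vertexCount;
    }
    return params;
}

// With the shared VAO bound, everything would be drawn with one call:
//   glMultiDrawElementsBaseVertex(GL_TRIANGLES, p.counts.data(),
//       GL_UNSIGNED_INT, p.indexOffsets.data(),
//       (GLsizei)p.counts.size(), p.baseVertices.data());
```

Because every mesh keeps its own local indices, removing a mesh only changes the offsets and base vertices in these small arrays; the index values of the other meshes do not have to be rewritten.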
I'm making a small 2D game demo and from what I've read, it's better to use drawElements() to draw an indexed triangle list than using drawArrays() to draw an unindexed triangle list.
But it doesn't seem possible as far as I know to draw multiple elements that are not connected in a single draw call with drawElements().
So for my 2D game demo where I'm only ever going to draw squares made of two triangles, what would be the best approach so I don't end having one draw call per object?
Yes, it's better to use indices in many cases since you don't have to store or transfer duplicate vertices and you don't have to process duplicate vertices (vertex shader only needs to be run once per vertex). In the case of quads, you reduce 6 vertices to 4, plus a small amount of index data. Two thirds is quite a good improvement really, especially if your vertex data is more than just position.
In summary, glDrawElements results in
Less data (mostly), which means more GPU memory for other things
Faster updating if the data changes
Faster transfer to the GPU
Faster vertex processing (no duplicates)
Indexing can affect cache performance if the indices reference vertices that aren't near each other in memory. Modellers commonly produce meshes which are optimized with this in mind.
For multiple elements, if you're referring to GL_TRIANGLE_STRIP you could use glPrimitiveRestartIndex to draw multiple strips of triangles with the one glDrawElements call. In your case it's easy enough to use GL_TRIANGLES and reference 4 vertices with 6 indices for each quad. Your vertex array then needs to store all the vertices for all your quads. If they're moving you still need to send that data to the GPU every frame. You could position all the moving quads at the front of the array and only update the active ones. You could also store static vertex data in a separate array.
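The "4 vertices, 6 indices per quad" layout can be generated once for any number of quads, since the pattern never changes. The sketch below is one possible way to do it; buildQuadIndices is a hypothetical name and the corner order (0 to 3 going around the quad) is an assumption about how the vertex array is filled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds GL_TRIANGLES indices for `quadCount` independent quads.
// Assumption: quad q owns vertices 4q .. 4q+3, ordered around the quad.
std::vector<std::uint32_t> buildQuadIndices(std::size_t quadCount) {
    assert(quadCount <= 0xFFFFFFFFu / 4u);  // indices must fit in 32 bits
    std::vector<std::uint32_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::uint32_t base = static_cast<std::uint32_t>(q * 4);
        // Two triangles sharing the diagonal base -> base+2.
        indices.insert(indices.end(), {base, base + 1u, base + 2u,
                                       base + 2u, base + 3u, base});
    }
    return indices;
}

// After uploading the result to GL_ELEMENT_ARRAY_BUFFER, all quads can be
// drawn with one call:
//   glDrawElements(GL_TRIANGLES, (GLsizei)indices.size(), GL_UNSIGNED_INT, 0);
```

The index buffer is static even when the quads move, so only the vertex array needs to be re-uploaded each frame.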
The typical approach to drawing a 3D model is to provide a list of fixed vertices for the geometry and move the whole thing with the model matrix (as part of the model-view). The confusing part here is that the mesh data is so small that, as you say, the overhead of the draw calls may become quite prominent. I think you'll have to draw a LOT of quads before you get to the stage where it'll be a problem. However, if you do, instancing or some similar idea such as particle systems is where you should look.
Perhaps only go down the following track if the draw calls or data transfer becomes a problem as there's a lot involved. A good way of implementing particle systems entirely on the GPU is to store instance attributes such as position/colour in a texture. Each frame you use an FBO/render-to-texture to "ping-pong" this data between another texture and update the attributes in a fragment shader. To draw the particles, you can set up a static VBO which stores quads with the attribute-data texture coordinates for use in the vertex shader where the particle position can be read and applied. I'm sure there's a bunch of good tutorials/implementations to follow out there (please comment if you know of a good one).