How many VBOs do I use? - opengl

So I understand how to use a Vertex Buffer Object, and that it offers a large performance increase over immediate mode drawing. I will be drawing a lot of 2D quads (sprites), and I am wanting to know if I am supposed to create a VBO for each one, or create one VBO to hold all of the data?

You shouldn't use a new VBO for each sprite/quad. So putting them all in a single VBO would be the better solution in your case.
But in general I don't think this can be answered in one sentence.
Creating a new VBO for each Quad won't give you a real performance increase. If you do so, a lot of time will be wasted with glBindBuffer calls for switching the VBOs. However if you create VBOs that hold too much data you can run into other problems.
Small VBOs:
Are often easier to handle in your program code. You can use a new VBO for each object you render. This way you can manage the objects in your world very easily.
If VBOs are too small (only a few triangles) you don't gain much benefit. A lot of time will be lost with switching buffers (and maybe switching shaders/textures) between buffers
Large VBOs:
You can render tons of objects with a single glDrawArrays() call for best performance.
Depending on your data, it's possible that you create overhead for managing a lot of objects in a single VBO (what if you want to move one of these objects and rotate another object?).
If your VBOs are too large, it's possible that they can't be moved into VRAM.
The following links can help you:
Vertex Specification Best Practices

Use one (or a small number of) VBO(s) to hold all/most of your geometry.
Generally the fewer API calls it takes to render your scene, the better.
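
As a rough illustration of "one VBO for all sprites", here is a minimal CPU-side sketch that packs every quad into one vertex array and one index array. The names Sprite, Vertex, SpriteBatch and buildSpriteBatch are made up for this example, and the vertex layout (position plus texture coordinate) is an assumption. The result would be uploaded once with glBufferData and drawn with a single glDrawElements call; those GL calls are left out so the sketch runs without a context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sprite description; names are illustrative only.
struct Sprite {
    float x, y;           // bottom-left corner
    float width, height;
};

// Assumed vertex layout: 2D position followed by texture coordinate.
struct Vertex {
    float x, y;
    float u, v;
};

struct SpriteBatch {
    std::vector<Vertex> vertices;        // 4 per sprite
    std::vector<std::uint32_t> indices;  // 6 per sprite (two triangles)
};

// Packs all sprites into one vertex array and one index array, so the whole
// batch can live in a single VBO/IBO pair and be drawn with one call.
SpriteBatch buildSpriteBatch(const std::vector<Sprite>& sprites) {
    static const std::uint32_t kQuadIndices[6] = {0, 1, 2, 2, 3, 0};

    SpriteBatch batch;
    batch.vertices.reserve(sprites.size() * 4);
    batch.indices.reserve(sprites.size() * 6);

    for (const Sprite& s : sprites) {
        // Index of this quad's first vertex inside the shared array.
        const std::uint32_t base =
            static_cast<std::uint32_t>(batch.vertices.size());

        batch.vertices.push_back({s.x,           s.y,            0.0f, 0.0f});
        batch.vertices.push_back({s.x + s.width, s.y,            1.0f, 0.0f});
        batch.vertices.push_back({s.x + s.width, s.y + s.height, 1.0f, 1.0f});
        batch.vertices.push_back({s.x,           s.y + s.height, 0.0f, 1.0f});

        for (std::uint32_t i : kQuadIndices) {
            batch.indices.push_back(base + i);
        }
    }

    assert(batch.vertices.size() == sprites.size() * 4);
    assert(batch.indices.size() == sprites.size() * 6);
    return batch;
}
```

If the sprites move every frame, the same function can be rerun and the vertex array re-uploaded; the index array only changes when the number of sprites does.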

It also depends on what you want to do with those sprites.
Are they dynamic? Do you want to change only the centre of each quad, or modify all four points?
This is important because if your data is dynamic then, in the simplest approach, you will have to transfer it from the CPU to the GPU each frame. Maybe you could perform all of the computation on the GPU, for instance by using geometry shaders?
Also, for very simple quads/sprites you can use GL_POINT_SPRITE. With that you need to send only one vertex for the whole quad. The drawback is that it is hard to rotate the sprite.

Related

GLSL optimization by wrapping tris?

So I'm considering two different approaches to render objects in OpenGL.
First approach:
Use the TRIANGLE_STRIP setting with the idea of compressing the amount of vertex data written into the buffer every render. The drawback here is that normals have to be calculated on the shaders, instead of being provided at initialization.
Second approach:
Use the regular old triangle draw method, which allows most if not all vertex data, including normals, to be provided at initialization. The drawback here is that vertex data can get repetitive when writing to the buffer, especially when multiple triangles share a single vertex.
So my question is, which method is preferable? And is there another way that improves on both that I'm not thinking of? I'm new to the OpenGL scene, so take it easy on me. Thanks!

Multiple meshes in a vertex buffer object when not all of them are drawn every frame

So Let's say I have 100 different meshes that all use the same OpenGL shader. Reading OpenGL best practices apparently I should place them into the same vertex buffer object and draw them using glDrawElementsBaseVertex. Now my question is, if I only render a fraction of these meshes every frame, am I wasting resources by having all these meshes in the same vertex buffer object? What are the best practices for batching in this context?
Also are there any guidelines or ways I can determine how much should be placed into a single vertex buffer object?
"if I only render a fraction of these meshes every frame, am I wasting resources by having all these meshes in the same vertex buffer object?"
What resources could you possibly be wasting? The mere act of rendering doesn't use resources. And since you're going to render those other meshes sooner or later, it's better to have them in memory than to have to DMA them up.
Of course, this has to be balanced against the question of how much stuff you can fit into memory. It's a memory vs. performance tradeoff, and you have to decide for yourself and your application how appropriate it is to keep data you're not actively using around.
Common techniques for dealing with this include streaming. That is, what data is in memory depends on where you are in the scene. As you move through the world, new data for new areas is loaded in, overwriting data for old areas.
"Also are there any guidelines or ways I can determine how much should be placed into a single vertex buffer object?"
As much as you possibly can. The general rule of thumb is that the number of buffer objects you have should not vary with the number of objects you render.
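
To make the shared-buffer idea concrete, here is a small sketch of the bookkeeping needed to put several meshes into one vertex array and one index array while still drawing them individually with glDrawElementsBaseVertex. MeshData, MeshRange, PackedMeshes and packMeshes are hypothetical names, and position-only vertices with 32-bit indices are assumed. No GL calls are made; the draw call appears only in a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input mesh: xyz positions and mesh-local indices.
struct MeshData {
    std::vector<float> positions;        // 3 floats per vertex
    std::vector<std::uint32_t> indices;  // indices start at 0 for each mesh
};

// Where one mesh lives inside the shared buffers.
struct MeshRange {
    std::int32_t baseVertex;      // added to every index by the GL
    std::uint32_t indexCount;     // number of indices to draw
    std::size_t indexByteOffset;  // byte offset into the shared index buffer
};

struct PackedMeshes {
    std::vector<float> positions;
    std::vector<std::uint32_t> indices;
    std::vector<MeshRange> ranges;  // one entry per input mesh
};

PackedMeshes packMeshes(const std::vector<MeshData>& meshes) {
    PackedMeshes out;
    for (const MeshData& m : meshes) {
        assert(m.positions.size() % 3 == 0);

        MeshRange r;
        r.baseVertex = static_cast<std::int32_t>(out.positions.size() / 3);
        r.indexCount = static_cast<std::uint32_t>(m.indices.size());
        r.indexByteOffset = out.indices.size() * sizeof(std::uint32_t);
        out.ranges.push_back(r);

        // Indices are copied unchanged; baseVertex shifts them at draw time.
        out.positions.insert(out.positions.end(),
                             m.positions.begin(), m.positions.end());
        out.indices.insert(out.indices.end(),
                           m.indices.begin(), m.indices.end());
    }
    return out;
}

// Drawing mesh i from the shared buffers would then look roughly like:
//   const MeshRange& r = packed.ranges[i];
//   glDrawElementsBaseVertex(GL_TRIANGLES, r.indexCount, GL_UNSIGNED_INT,
//                            (void*)r.indexByteOffset, r.baseVertex);
```

Meshes that are not visible in a given frame simply have no draw call issued for their range; the buffers themselves stay bound.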

Should I sort by buffer use when rendering?

I'm designing the sorting part of my rendering engine. I know that changing the render target, shader program, texture bindings, and more are expensive and therefore one should sort the draw order based on them to reduce state changes. However, what about sorting based on what index buffer is bound, and which vertex buffers are used for attributes?
I'm confused about these because VAOs are mandatory and they encapsulate all of that state. So should I peek behind the scenes of vertex array objects (VAOs), see what state they set and sort based on it? Or should I just not care in what order VAOs are called?
This is what confuses me so much about vertex array objects. It makes sense to me to not be switching which buffers are in use over and over and yet VAOs just seem to force one to not care about that.
Is there a general vague or not agreed on order on which to sort stuff for rendering/game engines?
I know that binding a buffer simply changes some global state but surely it must be beneficial to the hardware to draw from the same buffer multiple times, maybe some small cache coherency?
While VAOs are mandated in GL 3.1 without GL_ARB_compatibility or core 3.2+, you do not have to use them the way they are intended... that is to say, you can bind a single VAO for the duration of your application's lifetime and continue to bind and unbind VBOs, etc. the traditional way if this somehow makes your life easier. Valve is famous for advocating doing this in their presentation on porting the Source engine from D3D to GL... I tend to disagree with them on some points though. A lot of things that they mention in their presentation make me cringe as someone who has years of experience with both D3D and OpenGL; they are making suggestions on how to port something to an API they have a minimal working knowledge of.
Getting back to your performance concern though, there can be validation overhead for changing bound resources frequently, so it is actually more than just "simply changing a global state." All GL commands have to do validation in order to determine if they need to set an error state. They will validate your input parameters (which is pretty trivial), as well as the state of any resource the command needs to use (this can be complicated).
Other types of GL objects like FBOs, textures and GLSL programs have more rigorous validation and more complicated memory dependencies than buffer objects and vertex arrays do. Swapping a vertex pointer should be cheaper in the grand scheme of things than most other kinds of object bindings, especially since a lot of stuff can be deferred by an implementation until you actually issue a glDrawElements (...) command.
Nevertheless, the best way to tackle this problem is just to increase reuse of vertex buffers. Object reuse is pretty high to begin with for vertex buffers, if you have 200 instances of the same opaque model in a scene you can potentially draw all 200 of them back-to-back and never have to change a vertex pointer. Materials tend to change far more frequently than actual vertex buffers, and so you would generally sort your drawing first and foremost by material (sub-sorted by associated states like opaque/translucent, texture(s), shader(s), etc.). You can add another level to batch sorting to draw all batches that share the same vertex data after they have been sorted by material. The ultimate goal is usually to minimize the number of draw commands necessary to complete your frame, and using priority/hierarchy-based sorting with emphasis on material often delivers the best results.
Furthermore, if you can fit multiple LODs of your model into a single vertex buffer, instead of swapping between different vertex buffers sometimes you can just draw different sets of indices or even just a different range of indices from a single index buffer. In a very similar way, texture swapping pressure can be alleviated by using packed texture atlases / sprite sheets instead of a single texture object for each texture.
You can definitely squeeze out some performance by reducing the number of changes to vertex array state, but the takeaway message here is that vertex array state is pretty cheap compared to a lot of other states that change frequently. If you can quickly implement a secondary sort to reduce vertex state changes then go for it, but I would not invest a lot of time in anything more sophisticated unless you know it is a bottleneck. Prioritize texture, shader and framebuffer state first as a general rule.
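
A minimal sketch of such a priority-based sort, using a packed 64-bit key so that the more expensive state sits in the higher bits. The DrawItem struct, the makeSortKey helper and the exact bit split (16 bits shader, 24 bits texture, 24 bits vertex array) are assumptions made for illustration; a real engine would pick its own fields and priorities.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Builds a key where shader changes dominate, then texture changes,
// and vertex array changes act only as a tie-breaker.
std::uint64_t makeSortKey(std::uint32_t shader,
                          std::uint32_t texture,
                          std::uint32_t vertexArray) {
    // The ids must fit into the bit ranges reserved for them.
    assert(shader < (1u << 16));
    assert(texture < (1u << 24));
    assert(vertexArray < (1u << 24));
    return (static_cast<std::uint64_t>(shader) << 48) |
           (static_cast<std::uint64_t>(texture) << 24) |
            static_cast<std::uint64_t>(vertexArray);
}

// Hypothetical draw record: the key plus whatever identifies the draw.
struct DrawItem {
    std::uint64_t key;
    std::uint32_t id;
};

// Stable sort keeps submission order among draws with identical state.
void sortDrawItems(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.key < b.key;
                     });
}
```

After sorting, the renderer walks the list and only rebinds a shader, texture or VAO when the corresponding part of the key differs from the previous item.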

Which is the most optimal and correct way to draw many different dynamic 3D models (they are animated and change every frame)?

I need to know how I can render many different 3D models whose geometry changes every frame (they are animated models). No models or textures are repeated.
I load all the models and create a "model" class object for each one.
What is the most optimal way to render them?
Use one VBO for each 3D model
Use a single VBO for all models (since they are all different, I do not see how this option is possible)
I work with OpenGL 3.x or higher, C++ on Windows.
TL;DR - there's no silver bullet when it comes to rendering performance.
Why is that? It depends on the complicated process that gets your data, converts it, pushes it to the GPU and then makes pixels on the screen flicker. So, instead of "one best way", a few guidelines have emerged that usually improve performance.
Keep all the necessary data on the GPU (because the closer to the screen, the shorter way electrons have to go :))
Send as little data to GPU between frames as possible
Don't sync needlessly between CPU and GPU (that's like trying to run two high speed trains on parallel tracks, but insisting on slowing them down to the point where you can pass something through the window every once in a while)
Now, it's obvious that if you want to have a model that will change, you can't have your cake and eat it. You have to make tradeoffs. Simply put, dynamic objects will never render as fast as static ones. So, what should you do?
Hint GPU about the data usage (GL_STREAM_DRAW or GL_DYNAMIC_DRAW) - these are only hints, but they help the driver choose a suitable memory arrangement.
Don't use interleaved buffers to mix static vertex attributes with dynamic ones - if you divide the memory, you can batch-update the geometry leaving texture coordinates intact, for example.
Try to do as much as you can purely on the GPU - with compute shaders and transform feedback, it might well be possible to store whole animation data as a buffer itself and calculate it on GPU, avoiding expensive syncs.
And last but not least, always carefully measure the impact of your change on performance. Going blindly won't help. Measure accurately and thoroughly (even stuff like shader compilation time might matter sometimes!). Then, even if you go by trial-and-error, there's a hope you'll get somewhere.
And to address one of your points in particular: whether it's one large VBO or a few smaller ones doesn't really matter, but a huge one might have problems fitting in memory. You can still update parts of it, and what matters most is the memory arrangement inside of it.
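
The advice about keeping static and dynamic attributes apart can be sketched on the CPU side like this. AnimatedMesh and translateMesh are hypothetical names, and the "animation" is just a translation standing in for whatever per-frame update the models really need. Positions and texture coordinates live in separate arrays (which would map to separate buffers or separate regions of one buffer), so only the position data has to be re-uploaded each frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Attributes split by update frequency instead of being interleaved.
struct AnimatedMesh {
    std::vector<float> positions;  // x, y, z per vertex; rewritten every frame
    std::vector<float> texCoords;  // u, v per vertex; uploaded once at load
};

// Stand-in for a per-frame animation step. Returns the number of bytes of
// dynamic data that would have to be sent to the GPU afterwards, e.g. with
// glBufferSubData on the position buffer. texCoords are never touched.
std::size_t translateMesh(AnimatedMesh& mesh, float dx, float dy, float dz) {
    assert(mesh.positions.size() % 3 == 0);
    for (std::size_t i = 0; i < mesh.positions.size(); i += 3) {
        mesh.positions[i + 0] += dx;
        mesh.positions[i + 1] += dy;
        mesh.positions[i + 2] += dz;
    }
    return mesh.positions.size() * sizeof(float);
}
```

With an interleaved layout the same update would touch memory that also holds the unchanged texture coordinates, so either the whole buffer is re-sent or many small strided writes are needed.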

OpenGL: recommended way of making lots of edits to VBO

This question comes in two (mostly) independent parts
My current setup is that I have a lot of Objects in gamespace. Each has a VBO assigned to it, which holds Vertex Attribute data for each vertex. If the Object wants to change its vertex data (position etc) it does so in an internal array and then calls glBufferSubDataARB to update the version on the GPU.
Now I understand that this is a horrible thing to do and so I am looking for alternatives. One that presents itself is to have some managing thing that has a large VBO in the beginning and Objects can request space from it, and edit points in it. This drops the overhead of loading VBOs but comes with a large energy/time expenditure in creating and debugging such a beast (basically an entire memory management system).
My question (part (a)) is if this is the "best" method for doing this, or if there is something better that I have not thought of.
Such a system should allow easy addition/removal of vertices and editing them, as fast as possible.
Part (b) is about some simple actions taken on every object, ie those of rotation and translation. At the moment I am moving each vertex (ouch), but this must have a better option. I am considering uploading rotation and translation matrices to my shader to do there. This seems fine, but I am slightly worried about the overhead of changing uniform variables. Would it ultimately be to my advantage to do this? How fast is changing uniform variables?
Last time I checked the preferred way to do buffer updates was orphaning.
Basically, whenever you want to update your buffered data, you call glBufferData on your buffer with a NULL data pointer, which tells the driver that the current content of the buffer can be discarded, and then you write your new data with glMapBuffer / glBufferSubData.
Hence:
Using a single big VBO for your static data is indeed a good idea. You must take care of the maximum allowed VBO size, and split your static data into multiple VBOs if necessary. But this is probably an over-optimization in most cases (i.e. "I wouldn't bother").
Data which is updated frequently should be grouped in the same VBO (with usage = GL_STREAM_DRAW), and you should use orphaning to update that.
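
For the "managing thing" described in the question, where objects request space inside one large VBO, the core is a range allocator over byte offsets. Below is a minimal first-fit sketch with merging of neighbouring free ranges. BufferRangeAllocator and kNoSpace are invented names, and alignment, defragmentation and the actual GL upload are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Returned by allocate() when no free range is large enough.
const std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Hands out byte ranges inside one large buffer of fixed capacity.
class BufferRangeAllocator {
public:
    explicit BufferRangeAllocator(std::size_t capacity) {
        if (capacity > 0) free_[0] = capacity;
    }

    // First-fit: returns the byte offset of the range, or kNoSpace.
    std::size_t allocate(std::size_t size) {
        if (size == 0) return kNoSpace;
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second >= size) {
                const std::size_t offset = it->first;
                const std::size_t remaining = it->second - size;
                free_.erase(it);
                if (remaining > 0) free_[offset + size] = remaining;
                return offset;
            }
        }
        return kNoSpace;
    }

    // Gives a range back and merges it with adjacent free ranges.
    void release(std::size_t offset, std::size_t size) {
        assert(size > 0);
        auto next = free_.lower_bound(offset);
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            assert(prev->first + prev->second <= offset);  // no overlap
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                size += prev->second;
                free_.erase(prev);
            }
        }
        if (next != free_.end()) {
            assert(offset + size <= next->first);  // no overlap, no double free
            if (offset + size == next->first) {
                size += next->second;
                free_.erase(next);
            }
        }
        free_[offset] = size;
    }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> size of free range
};
```

An object would keep the offset it was given and pass it to glBufferSubData (or a mapped pointer plus that offset) when its vertices change.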
Unfortunately, the actual performance of this stuff varies on different implementations. This guy made some tests on an actual game, it may be worth reading.
For the second part of your question, using uniforms is clearly the way to do it. Yes, there is some (small) overhead, but it is surely far cheaper than streaming all your data every frame.
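
As a sketch of the uniform approach, the object keeps its vertices untouched and only rebuilds one small matrix when it moves or rotates. The helper names makeModelMatrix and transformPoint are made up here, and a 2D rotation about the Z axis plus a translation is assumed. The matrix is stored column-major, so it could be handed to glUniformMatrix4fv with transpose set to GL_FALSE; transformPoint only mirrors on the CPU what the vertex shader would compute.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 4x4 matrix in column-major order, the layout OpenGL expects by default.
using Mat4 = std::array<float, 16>;

// Rotation about the Z axis followed by a translation in the XY plane.
Mat4 makeModelMatrix(float angleRadians, float tx, float ty) {
    const float c = std::cos(angleRadians);
    const float s = std::sin(angleRadians);
    return Mat4{{
         c,    s,    0.0f, 0.0f,   // column 0
        -s,    c,    0.0f, 0.0f,   // column 1
         0.0f, 0.0f, 1.0f, 0.0f,   // column 2
         tx,   ty,   0.0f, 1.0f    // column 3
    }};
}

// CPU reference for what the vertex shader would do with the uniform:
// model * vec4(x, y, 0, 1), keeping only the x and y of the result.
void transformPoint(const Mat4& m, float x, float y,
                    float& outX, float& outY) {
    assert(m[15] == 1.0f);  // an affine matrix is expected
    outX = m[0] * x + m[4] * y + m[12];
    outY = m[1] * x + m[5] * y + m[13];
}

// Per object and per frame, the upload would be roughly:
//   glUniformMatrix4fv(modelLocation, 1, GL_FALSE, model.data());
// where modelLocation comes from glGetUniformLocation.
```

That is 16 floats per object per frame, independent of how many vertices the object has.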