What are the fastest and the most flexible (applicable in most situations) ways to use VBOs?
I am developing an OpenGL application and I want it to get the best performance, so I need someone to answer these questions. I have read many questions and answers, but I guess there's too much information that I don't need, which messes up my brain...
How many VBOs should I use?
How should I create VBOs?
How should I update VBO data if the data size is not fixed?
How should I render VBOs?
How should I deal with data in VBOs that I don't want to render anymore?
How many VBOs should I use?
As few as possible. Switching VBOs comes with a small but measurable cost. In general you'll try to group similar data into VBOs. For example, in an FPS game all the different kinds of garbage lying on the street, small props, etc. will usually be located in the same VBO or in only a small number of VBOs.
It also comes down to draw batch sizes. glDraw… calls that render fewer than about 100 primitives are suboptimal (this has always been the case, even 15 years ago), so you want to batch at least 100 primitives where possible. But if each mesh has only, say, 20 triangles (low-polycount props for instancing or such) and sits in its own VBO, you can no longer batch more than that into one call.
How should I create VBOs?
glGenBuffers → glBindBuffer → glBufferData
UPDATE: You can pass a null pointer to the data parameter of glBufferData to initialize the buffer object without setting the data.
How should I update VBO data if the data size is not fixed?
Create VBOs with a coarser size granularity than your data size. Your operating system does this anyway for your host-side data; it's called paging. Also, if you want to use glMapBuffer, making the buffer object a multiple of the host page size is very nice to the whole system.
The usual page size on current systems is 4 KiB, so that's the VBO size granularity I'd choose. UPDATE: By the way, you can ask your operating system which page size it is using. That's OS dependent though; I'd ask another question for that.
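The rounding described above can be sketched in a few lines of C++. The helper name round_up_to_page and the 4096-byte default are assumptions for illustration only; the real page size should come from the operating system.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a requested byte count up to the next multiple
// of the page size. 4096 is only an assumed default; query the OS for the
// real value on your platform.
constexpr std::size_t round_up_to_page(std::size_t bytes,
                                       std::size_t page_size = 4096) {
    return (bytes + page_size - 1) / page_size * page_size;
}

// The result is the size one would pass to glBufferData when allocating.
static_assert(round_up_to_page(1) == 4096, "one byte still takes a full page");
```

Allocating in page multiples leaves some slack at the end of the buffer, so small growth in the data does not force a reallocation every time.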
Update the data using glBufferSubData, or map it with glMapBuffer, modify it in the host-side mapped memory, then call glUnmapBuffer.
If the data outgrows the buffer object, create a new, larger one and copy with glCopyBufferSubData. See the last section.
How should I render VBOs?
glBindBuffer → glDraw…
How should I deal with data in VBOs that I don't want to render anymore?
If the data consumes only a part of the VBO and shares it with other data and you're not running out of memory then, well, just don't access it. Ideally you keep around some index in which you keep track of which VBO has which parts of it available for what kind of task. This is very much like memory management, specifically a scheme known as object stacks (obstacks).
However, eventually it may make sense to compact an existing buffer object. For this you'd create a new buffer object and bind it as the write target (GL_COPY_WRITE_BUFFER), with the old buffer object bound as the read target (GL_COPY_READ_BUFFER). Then use glCopyBufferSubData to copy the contents into the new, tightened buffer object. Of course you will then have to update all references to the buffer object name (= OpenGL ID) and offsets.
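A rough sketch of how the copy plan for such a compaction could be computed on the CPU side. Range, CopyStep and plan_compaction are hypothetical names, and alignment requirements are ignored here; each resulting step maps to one glCopyBufferSubData call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A live region inside the old buffer object (byte offset and size).
struct Range {
    std::size_t offset;
    std::size_t size;
};

// One copy step. Each step corresponds to one call of
// glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER,
//                     read_offset, write_offset, size).
struct CopyStep {
    std::size_t read_offset;
    std::size_t write_offset;
    std::size_t size;
};

// Hypothetical helper: pack the live ranges back to back in the new buffer.
// Returns the copy steps; total_size receives the size the new buffer needs.
std::vector<CopyStep> plan_compaction(const std::vector<Range>& live,
                                      std::size_t& total_size) {
    std::vector<CopyStep> steps;
    steps.reserve(live.size());
    std::size_t cursor = 0;
    for (const Range& r : live) {
        steps.push_back({r.offset, cursor, r.size});
        cursor += r.size;
    }
    total_size = cursor;
    return steps;
}
```

The write_offset of each step is also the new offset that every reference to that range has to be updated to.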
For this reason it makes sense to write a thin abstraction layer on top of OpenGL buffer objects that keeps track of the actual typed data within the structureless blobs OpenGL buffer objects are.
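As one possible shape for the bookkeeping part of such a layer, here is a minimal first-fit range allocator. BufferRangeAllocator and kNoSpace are hypothetical names; the class never touches OpenGL and only hands out byte offsets into a single buffer object.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Sentinel returned when no free block is large enough.
constexpr std::size_t kNoSpace = static_cast<std::size_t>(-1);

// Hypothetical bookkeeping class: tracks which byte ranges of ONE buffer
// object are free.
class BufferRangeAllocator {
public:
    explicit BufferRangeAllocator(std::size_t capacity) {
        if (capacity > 0) free_[0] = capacity;
    }

    // First-fit allocation. Returns the byte offset, or kNoSpace.
    std::size_t allocate(std::size_t size) {
        if (size == 0) return kNoSpace;
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < size) continue;
            const std::size_t offset = it->first;
            const std::size_t remaining = it->second - size;
            free_.erase(it);
            if (remaining > 0) free_[offset + size] = remaining;
            return offset;
        }
        return kNoSpace;
    }

    // Returns a range to the free list and merges it with its neighbours.
    void release(std::size_t offset, std::size_t size) {
        assert(size > 0);
        auto it = free_.emplace(offset, size).first;
        auto next = std::next(it);
        if (next != free_.end() && it->first + it->second == next->first) {
            it->second += next->second;
            free_.erase(next);
        }
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;
                free_.erase(it);
            }
        }
    }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> size of free block
};
```

Released data stays in GPU memory until it is overwritten; the allocator merely stops handing out draw ranges that point at it, which is the "just don't access it" approach.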
How many VBOs should I use?
As many as you need. It sounds silly, but that's the way it is.
How should I update VBO data if the data size is not fixed?
Overwrite and render the same VBO with different data and lengths.
How should I create VBOs?
How should I render VBOs?
See a VBO tutorial.
How should I deal with data in VBOs that I don't want to render anymore?
Create a new VBO and copy the data into it, or render only the parts of the VBO that are still needed.
To render only parts, see glVertexPointer and glTexCoordPointer (just calculate the new pointer and new size based on the offset).
Edit 1:
Using a single VBO for everything feels wrong, because you have to manage the allocation of new vertex positions/texture coordinates yourself, which gets really messy really fast.
It is better to group small props into VBOs and batch the drawing commands.
Can I add data to a buffer with glBufferSubData (adding 100 elements to a buffer with size x)?
No, it's not possible, because the description says it "updates a subset of a buffer object's data store", and a subset is a smaller set inside a set.
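The rule can be written down as a small pre-check. fits_in_buffer is a hypothetical helper, not part of OpenGL; it mirrors the condition under which glBufferSubData accepts a range (otherwise it raises GL_INVALID_VALUE and never grows the buffer).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pre-check: the written range [offset, offset + size) must lie
// completely inside the existing data store of the buffer object.
constexpr bool fits_in_buffer(std::size_t offset, std::size_t size,
                              std::size_t buffer_size) {
    // Written this way to avoid overflow in offset + size.
    return offset <= buffer_size && size <= buffer_size - offset;
}
```

Appending 100 elements to a full buffer means writing at offset == buffer_size, which fails this check; the buffer has to be reallocated with glBufferData (or a new, larger buffer created) first.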
Edit 2
Another good tutorial is Learning Modern 3D Graphics Programming, but it's not VBO specific.
In one of my previous questions (How to instance draw with different transformations for multiple objects) I asked how to instance draw with different transformations, one person answered that the proper way to do it is using instanced arrays.
This led me to tutorials where they send transformation data through a VAO, which is exactly what I was looking for.
But a new problem arose. Since my objects are dynamic (I want to move them), how do I update the buffer with their transformations?
Most of the tutorials I've seen only render instanced objects once, and thus they have no need to update the buffer. In fact, I wouldn't even know how to update a buffer to begin with, as I declare the VAO with the mesh at the beginning and it is not changed during the runtime of the program.
What I think I should be doing: store the transformations on the CPU side in some array; when I do something that changes a specific transformation, I update this array and then update the transformation buffer.
Probably the actual question:
How do I update a buffer during the runtime of the program?
Just use glBufferSubData to update the corresponding data on your GPU, from which your shaders get the transformations per instance (e.g. the vertex buffer for which the divisor is set to 1 with glVertexAttribDivisor(idx, 1)).
The other possibility would be to use glMapBuffer or glMapBufferRange to update that buffer.
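A minimal sketch of the CPU-side array with dirty-range tracking, so that only the changed span needs to be uploaded. InstanceTransforms and its methods are hypothetical names; the byte offset and size it reports are what one would hand to glBufferSubData for the per-instance buffer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // one 4x4 transform per instance

// Hypothetical CPU-side mirror of the per-instance transform buffer.
class InstanceTransforms {
public:
    explicit InstanceTransforms(std::size_t count)
        : data_(count), first_dirty_(count), last_dirty_(0) {}

    void set(std::size_t index, const Mat4& m) {
        assert(index < data_.size());
        data_[index] = m;
        first_dirty_ = std::min(first_dirty_, index);
        last_dirty_ = std::max(last_dirty_, index + 1);
    }

    bool dirty() const { return first_dirty_ < last_dirty_; }

    // Offset and size (in bytes) of the span changed since the last flush.
    // With dirty_data() as the pointer, these are the arguments for
    // glBufferSubData(GL_ARRAY_BUFFER, offset, size, data).
    std::size_t dirty_offset_bytes() const {
        return dirty() ? first_dirty_ * sizeof(Mat4) : 0;
    }
    std::size_t dirty_size_bytes() const {
        return dirty() ? (last_dirty_ - first_dirty_) * sizeof(Mat4) : 0;
    }
    const Mat4* dirty_data() const { return data_.data() + first_dirty_; }

    // Call after the upload to reset the tracked span.
    void mark_flushed() {
        first_dirty_ = data_.size();
        last_dirty_ = 0;
    }

private:
    std::vector<Mat4> data_;
    std::size_t first_dirty_;
    std::size_t last_dirty_;
};
```

This uploads one contiguous span covering all changes, which may include untouched instances in between; whether that beats several smaller uploads depends on the driver and the access pattern.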
So let's say I have 100 different meshes that all use the same OpenGL shader. Reading OpenGL best practices, apparently I should place them into the same vertex buffer object and draw them using glDrawElementsBaseVertex. Now my question is: if I only render a fraction of these meshes every frame, am I wasting resources by having all these meshes in the same vertex buffer object? What are the best practices for batching in this context?
Also are there any guidelines or ways I can determine how much should be placed into a single vertex buffer object?
if I only render a fraction of these meshes every frame, am I wasting resources by having all these meshes in the same vertex buffer object?
What resources could you possibly be wasting? The mere act of rendering doesn't use resources. And since you're going to render those other meshes sooner or later, it's better to have them in memory than to have to DMA them up.
Of course, this has to be balanced against the question of how much stuff you can fit into memory. It's a memory vs. performance tradeoff, and you have to decide for yourself and your application how appropriate it is to keep data you're not actively using around.
Common techniques for dealing with this include streaming. That is, what data is in memory depends on where you are in the scene. As you move through the world, new data for new areas is loaded in, overwriting data for old areas.
Also are there any guidelines or ways I can determine how much should be placed into a single vertex buffer object?
As much as you possibly can. The general rule of thumb is that the number of buffer objects you have should not vary with the number of objects you render.
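To make the shared-buffer layout concrete, here is a sketch of how the per-mesh draw parameters could be computed when many meshes are packed into one vertex buffer and one index buffer. MeshSizes, MeshSlot and pack_meshes are hypothetical names, and 32-bit indices are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Input: how many vertices and indices each mesh has.
struct MeshSizes {
    std::uint32_t vertex_count;
    std::uint32_t index_count;
};

// Output: where one mesh lives inside the shared buffers.
struct MeshSlot {
    std::uint32_t base_vertex;  // basevertex argument of the draw call
    std::uint32_t first_index;  // element offset into the index buffer
    std::uint32_t index_count;  // count argument of the draw call
};

// Lays the meshes out back to back. Drawing slot s would then look like:
//   glDrawElementsBaseVertex(GL_TRIANGLES, s.index_count, GL_UNSIGNED_INT,
//       (void*)(s.first_index * sizeof(std::uint32_t)), s.base_vertex);
std::vector<MeshSlot> pack_meshes(const std::vector<MeshSizes>& meshes) {
    std::vector<MeshSlot> slots;
    slots.reserve(meshes.size());
    std::uint32_t vertex_cursor = 0;
    std::uint32_t index_cursor = 0;
    for (const MeshSizes& m : meshes) {
        slots.push_back({vertex_cursor, index_cursor, m.index_count});
        vertex_cursor += m.vertex_count;
        index_cursor += m.index_count;
    }
    return slots;
}
```

Because base_vertex is added by the GPU, each mesh keeps its own zero-based indices, and meshes that are not drawn in a frame cost nothing beyond the memory they occupy.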
I am working on a little project in C++ using OpenGL. I want to be able to render multiple 2D rectangles either in some color or with texture as efficiently as possible.
However, I am doing this in modern C++, so I am writing many wrappers over OpenGL API so that I have better logical structure, RAII, ...
In this spirit I wanted to create a class Rectangle, which would have a method draw() and this method would activate corresponding OpenGL context and call glDrawArrays(). This works fine, but then I realized, that if I wanted to render more rectangles, I would have to cycle through many instances. They would each switch context and I don't think this is an effective solution.
After a bit of thinking, my solution was to create a Renderer object, which would hold a single VAO for all Rectangles, the associated program, and a huge buffer in which I would hold all coordinates for my objects (a Rectangle instance would then be like a smarter pointer into this buffer), and then draw them all at once. Of course, it would add a lot of work managing the buffer itself (adding/removing rectangles). Would it be better?
Also, do you have any tips on what else I should focus on?
In general you want to minimize the number of drawing calls. Placing a lot of geometry data in a single buffer object and batching it all up into a single glDraw… call is definitely the way to go. The best way to go about this is to not think of things drawn using OpenGL as individual objects (there is no spoon) but just as patches of colour, which by happenstance appear to be rectangles, boxes, spheres, spoons…
A way to implement this using C++ idioms is to have a buffer object class, which internally manages chunks of data contained inside a buffer object. Keep in mind that buffer objects by themselves are pretty much formless; they get meaning only when used as a source for vertex attributes. Your individual rectangles would then each allocate from such a buffer object instance; the instance in return could deliver the rectangle attributes' indices relative to a certain internal offset (what's passed to glVertexAttribPointer). It actually kind of makes sense to have a buffer object class, an object attribute view class (which manages the attribute pointers), and the actual geometry classes, which operate on attribute views.
In preparation for the actual drawing call, the geometry instances would just emit the indices of their respective vertices; concatenate them and use the result for a glDrawElements call.
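The "emit and concatenate" step might look roughly like this. RectangleView and build_index_list are hypothetical names, and the sketch assumes each rectangle's four vertices are stored consecutively, in order around its perimeter, in the shared vertex buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical rectangle handle: it only remembers where its four vertices
// start inside the shared vertex buffer.
struct RectangleView {
    std::uint32_t first_vertex;

    // Appends two triangles (six indices) covering the rectangle.
    void emit_indices(std::vector<std::uint32_t>& out) const {
        const std::uint32_t v = first_vertex;
        const std::uint32_t quad[6] = {v, v + 1, v + 2, v + 2, v + 3, v};
        out.insert(out.end(), quad, quad + 6);
    }
};

// Concatenates the indices of all rectangles into one list, which would be
// uploaded to the element array buffer and drawn with a single
// glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_INT, ...) call.
std::vector<std::uint32_t> build_index_list(
        const std::vector<RectangleView>& rects) {
    std::vector<std::uint32_t> indices;
    indices.reserve(rects.size() * 6);
    for (const RectangleView& r : rects) r.emit_indices(indices);
    return indices;
}
```

Hiding a rectangle then simply means leaving it out of the list for that frame; its vertex data can stay where it is.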
If your geometry is not changing, then the best way is to create a VAO with one VBO for the geometry of the rectangle, one VBO with the transformations for the multiple rectangles you may draw, and one VBO for texture coordinates, and use instancing. You remove a lot of traffic from CPU to GPU this way. Also try to cache uniforms, and don't set their values on every rendering call if the values are not changing. Try a tool like gltrace and see if you can reduce unnecessary state changes. Collect as much data as possible and only then issue the rendering call.
I did a lot of research on how to gather vertex data into groups, commonly called batches.
For me, these are the two most interesting articles on the subject:
https://www.opengl.org/wiki/Vertex_Specification_Best_Practices
http://www.gamedev.net/page/resources/_/technical/opengl/opengl-batch-rendering-r3900
The first article explains the best practices for manipulating VBOs (max size, format, etc.).
The second presents a simple example of how to manage vertex memory using batches. According to the author, each batch HAS TO contain an instance of a VBO (plus a VAO), and he insists strongly that the maximum size of a VBO should be between 1 MB (1,000,000 bytes) and 4 MB (4,000,000 bytes). The first article advises the same thing. I quote: "1MB to 4MB is a nice size according to one nVidia document. The driver can do memory management more easily. It should be the same case for all other implementations as well like ATI/AMD, Intel, SiS."
I have several questions:
1) Is the maximum byte size mentioned above an absolute rule? Is it so bad to allocate a VBO larger than 4 MB (for example 10 MB)?
2) What should we do about meshes with a total vertex byte size larger than 4 MB? Do I need to split the geometry into several batches?
3) Does a batch necessarily own a unique VBO, or can several batches be stored in a single VBO? (These are two different approaches, but the first one seems to be the right choice.) Do you agree?
According to the author, each batch handles a unique VBO with a maximum size between 1 and 4 MB, and the whole VBO HAS TO contain only vertex data sharing the same material and transformation information. So if I have to batch another mesh with a different material (so the vertices can't be merged with existing batches), I have to create a new batch with a NEW VBO instantiated.
So according to the author, my second method is not correct: it's not advised to store several batches in a single VBO.
Is the maximum byte size mentioned above an absolute rule? Is it so bad to allocate a VBO larger than 4 MB (for example 10 MB)?
No.
That was a (very) old piece of info that is not necessarily valid on modern hardware.
The issue that led to the 4MB suggestion was about the driver being able to manage memory. If you allocated more memory than the GPU had, it would need to page some in and out. If you use smaller chunks for your buffer objects, the driver is more easily able to pick whole buffers to page out (because they're not in use at present).
However, this does not matter so much. The best thing you can do for performance is to avoid exceeding memory limits entirely. Paging things in and out hurts performance.
So don't worry about it. Note that I have removed this "advice" from the Wiki.
So according to the author, my second method is not correct: it's not advised to store several batches in a single VBO.
I think you're confusing the word "batch" with "mesh". But that's perfectly understandable; the author of that document you read doesn't seem to recognize the difference either.
For the purposes of this discussion, a "mesh" is a thing that is rendered with a single rendering command, which is conceptually separate from other things you would render. Meshes get rendered with certain state.
A "batch" refers to one or more meshes that could have been rendered with separate rendering commands. However, in order to improve performance, you use techniques to allow them all to be rendered with the same rendering command. That's all a batch is.
"Batching" is the process of taking a sequence of meshes and making it possible to render them as a batch. Instanced rendering is one form of batching; each instance is a separate "mesh", but you are rendering lots of them with one rendering call. Each instance uses its instance index to fetch its per-instance state data.
Batching takes many forms beyond instanced rendering. Batching often happens at the level of the artist. While the modeller/texture artist may want to split a character into separate pieces, each with their own textures and materials, the graphics programmer tells them to keep them as a single mesh that can be rendered with the same textures/materials.
With better hardware, the rules for batching can be reduced. With array textures, you can give each mesh a particular ID, which it uses to pick which array layer it uses when fetching textures. This allows the artists to give such characters more texture variety without breaking the batch into multiple rendering calls. Ubershaders are another form, where the shader uses that ID to decide how to do lighting rather than (or in addition to) texture fetching.
The kind of batching that the person you're citing is talking about is... well, very confused.
What do you think about that?
Well, quite frankly I think the person from your second link should be ignored. The very first line of his code: class Batch sealed is not valid C++. It's some C++/CX Microsoft invention, which is fine in that context. But he's trying to pass this off as pure C++; that's not fine.
I'm also not particularly impressed by his code quality. He contradicts himself a lot. For example, he talks about the importance of being able to allocate reasonable-sized chunks of memory, so that the driver can more freely move things around. But his GuiVertex class is horribly bloated. It uses a full 16 bytes, four floats, just for colors. 4 bytes (as normalized unsigned integers) would have been sufficient. Similarly, his texture coordinates are floats, when shorts (as unsigned normalized integers) would have been fine for his use case. Those two changes alone would cut his per-vertex cost down from 32 bytes to 16 (assuming two-component texture coordinates); that's a 2:1 reduction.
4MB goes a lot longer when you use reasonably sized vertex data. And the best part? The OpenGL Wiki page he linked to tells you to do exactly this. But he doesn't do it.
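To illustrate the kind of saving meant here, the sketch below compares an all-float vertex with a packed one. Both struct layouts are assumptions made up for this example (two-component positions and texture coordinates), not the cited author's actual code, so the exact byte counts depend on the real layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed "bloated" layout: every attribute stored as 32-bit floats.
struct FatVertex {
    float position[2];  //  8 bytes
    float color[4];     // 16 bytes
    float texcoord[2];  //  8 bytes
};

// Packed layout: colours and texture coordinates as normalized integers
// (GL_UNSIGNED_BYTE and GL_UNSIGNED_SHORT with normalized = GL_TRUE).
struct PackedVertex {
    float position[2];          // 8 bytes
    std::uint8_t color[4];      // 4 bytes
    std::uint16_t texcoord[2];  // 4 bytes
};

static_assert(sizeof(FatVertex) == 32, "assumed layout without padding");
static_assert(sizeof(PackedVertex) == 16, "assumed layout without padding");

// Converts a value in [0, 1] to an unsigned normalized integer.
inline std::uint8_t to_unorm8(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return static_cast<std::uint8_t>(v * 255.0f + 0.5f);
}
inline std::uint16_t to_unorm16(float v) {
    if (v < 0.0f) v = 0.0f;
    if (v > 1.0f) v = 1.0f;
    return static_cast<std::uint16_t>(v * 65535.0f + 0.5f);
}

// How many vertices of a given size fit into a buffer of buffer_bytes.
constexpr std::size_t vertices_per_buffer(std::size_t buffer_bytes,
                                          std::size_t vertex_bytes) {
    return buffer_bytes / vertex_bytes;
}
```

Under these assumptions a 4 MiB buffer holds 262144 packed vertices instead of 131072 fat ones.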
Not to mention, he has apparently written this batch manager for a GUI (as alluded to by his GuiVertex type). Yet GUIs are probably the least batch-friendly rendering scenario in game development. You're frequently having to change state like bound textures, the current program (which reads from the texture or not), blending modes, the scissor box, etc.
Now with modern GPUs, there are certainly ways to make GUI renderers a lot more batch-friendly. But he never talks about them. He doesn't mention techniques to use gl_ClipDistance as a way to do scissor boxes with per-vertex data. He doesn't talk about ubershader usage, nor does his vertex format provide an ID that would allow such a thing.
As previously stated, batching is all about not having state changes between objects. But he focuses entirely on vertex state. He doesn't talk about textures, programs, etc. He doesn't talk about techniques to allow multiple objects to be part of the same batch while still having separate transforms.
His class can't really be used for batching of anything that couldn't have just been a single mesh.
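Since batching is about avoiding state changes between objects, a simple way to see how batch-friendly a draw list is: count the runs of consecutive items that share the same state. DrawItem and both functions below are hypothetical, and the state is reduced to a program ID and a texture ID for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

struct DrawItem {
    std::uint32_t program;  // hypothetical ID of the shader program needed
    std::uint32_t texture;  // hypothetical ID of the texture needed
    std::uint32_t mesh;     // which mesh to draw
};

inline bool same_state(const DrawItem& a, const DrawItem& b) {
    return a.program == b.program && a.texture == b.texture;
}

// Number of runs of consecutive items sharing program and texture. Every
// run boundary is a state change, so each run is a candidate for one batch.
std::size_t count_state_groups(const std::vector<DrawItem>& items) {
    std::size_t groups = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || !same_state(items[i - 1], items[i])) ++groups;
    }
    return groups;
}

// Reorders the items so that equal state becomes adjacent.
void sort_by_state(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return std::tie(a.program, a.texture) <
                                std::tie(b.program, b.texture);
                     });
}
```

Sorting like this is only valid when draw order does not matter; blended GUI elements usually have to keep their order, which is part of why GUIs batch poorly without techniques such as array textures or ubershaders.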
This question comes in two (mostly) independent parts
My current setup is that I have a lot of Objects in gamespace. Each has a VBO assigned to it, which holds vertex attribute data for each vertex. If the Object wants to change its vertex data (position etc.), it does so in an internal array and then calls glBufferSubDataARB to update the version on the GPU.
Now I understand that this is a horrible thing to do and so I am looking for alternatives. One that presents itself is to have some managing thing that has a large VBO in the beginning and Objects can request space from it, and edit points in it. This drops the overhead of loading VBOs but comes with a large energy/time expenditure in creating and debugging such a beast (basically an entire memory management system).
My question (part (a)) is if this is the "best" method for doing this, or if there is something better that I have not thought of.
Such a system should allow easy addition/removal of vertices and editing them, as fast as possible.
Part (b) is about some simple actions taken on every object, i.e. rotation and translation. At the moment I am moving each vertex (ouch), but there must be a better option. I am considering uploading rotation and translation matrices to my shader and doing it there. This seems fine, but I am slightly worried about the overhead of changing uniform variables. Would it ultimately be to my advantage to do this? How fast is changing uniform variables?
Last time I checked the preferred way to do buffer updates was orphaning.
Basically, whenever you want to update your buffered data, you call glBufferData on your buffer with a null data pointer (and the same size and usage), which invalidates ("orphans") the current content of the buffer, and then you write your new data with glMapBuffer / glBufferSubData.
Hence:
Using a single big VBO for your static data is indeed a good idea. You must take care of the maximum allowed VBO size, and split your static data into multiple VBOs if necessary. But this is probably an over-optimization in most cases (i.e. "I wouldn't bother").
Data which is updated frequently should be grouped in the same VBO (with usage = GL_STREAM_DRAW), and you should use orphaning to update it.
Unfortunately, the actual performance of this stuff varies on different implementations. This guy made some tests on an actual game, it may be worth reading.
For the second part of your question, using uniforms is obviously the way to do it. Yes, there is some (small) overhead, but it's surely 1000 times better than streaming all your data every frame.
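As a sketch of what would be uploaded per object instead of rewritten vertices: a single 4x4 matrix combining a rotation about the z axis with a translation. make_transform and apply are hypothetical helpers; the matrix is column-major, so it could be passed to glUniformMatrix4fv(location, 1, GL_FALSE, m.data()) as is.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;  // column-major: m[column * 4 + row]

// Builds translate(tx, ty) * rotateZ(angle): rotate about the origin first,
// then translate.
Mat4 make_transform(float angle_radians, float tx, float ty) {
    const float c = std::cos(angle_radians);
    const float s = std::sin(angle_radians);
    return Mat4{{
        c,    s,    0.0f, 0.0f,  // column 0
        -s,   c,    0.0f, 0.0f,  // column 1
        0.0f, 0.0f, 1.0f, 0.0f,  // column 2
        tx,   ty,   0.0f, 1.0f   // column 3
    }};
}

// What the vertex shader would compute for a 2D point: M * (x, y, 0, 1).
std::array<float, 2> apply(const Mat4& m, float x, float y) {
    return {{m[0] * x + m[4] * y + m[12],
             m[1] * x + m[5] * y + m[13]}};
}
```

The vertex data in the VBO stays untouched; moving or rotating an object changes 16 floats per object rather than every vertex.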