I'm drawing simple 3D shapes and I was wondering in the long run is it better to only use 1 buffer to store all the data of your vertices?
Right now I have arrays of vertex data (positions and colors, per vertex) and I am pushing them to their own separate buffers.
But if I use stride and offset, I could join them into one array but that would become messier and harder to manage.
What is the "traditional" way of doing this?
It feels much cleaner and organized to have separate buffers for each piece of data, but I would imagine it's less efficient.
Is the efficiency increase worth putting it all into a single buffer?
The answer to this is highly usage-dependent.
If all your vertex attributes are highly volatile or highly static, you would probably benefit from interleaving and keeping them all together, as mentioned in the comments.
However, separating the data can yield better performance if one attribute is far more volatile than others. For example, if you have a mesh where you're often changing the vertex positions, but never the texture coordinates, you might benefit from keeping them separate: you would only need to re-upload the positions to the video card, instead of the whole set of attributes. An example of this might be a CPU-driven cloth simulation.
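As a rough illustration of the cloth example, the sketch below compares how many bytes would have to be re-sent each frame when only the positions change. The attribute sizes and function names are assumptions made up for this example, not measurements from any particular driver.

```cpp
#include <cstddef>

// Hypothetical per-vertex attribute sizes: a vec3 position and a vec2
// texture coordinate, both stored as floats.
constexpr std::size_t kPositionBytes = 3 * sizeof(float);
constexpr std::size_t kTexCoordBytes = 2 * sizeof(float);

// Interleaved: positions and texture coordinates share one block, so
// refreshing every position in practice means re-sending the whole block.
constexpr std::size_t interleavedUploadBytes(std::size_t vertexCount) {
    return vertexCount * (kPositionBytes + kTexCoordBytes);
}

// Separate buffers: only the position buffer has to be re-sent.
constexpr std::size_t separateUploadBytes(std::size_t vertexCount) {
    return vertexCount * kPositionBytes;
}
```

Whether the smaller upload actually translates into a faster frame still depends on the driver, so treat the numbers as a reason to profile rather than as a result.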
It is also hardware and implementation dependent. Interleaving isn't helpful everywhere, but I've never heard of it having a negative impact. If you can use it, you probably should.
However, since you can't properly interleave if you split the attributes, you're essentially comparing the performance impacts of two unknowns. Will interleaving help on your target hardware/drivers? Will your data benefit from being split? The first there's no real answer to. The second is between you and your data.
Personally, I would suggest just using interleaved single blocks of vertex attributes unless you have a highly specialized need. It cuts the complexity, as opposed to needing to have potentially different systems mixed together in the same back end.
On the other hand, setting up interleaving is a rather complex task as far as memory addressing goes in C++. If you're not designing an entire graphics engine from scratch, I really doubt it's worth the effort for you. But again, that's up to you and your application.
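For a simple position-plus-color vertex like the one in the question, the addressing can be kept manageable by letting a struct describe the layout. This is only a minimal sketch: the Vec3 and Vertex types and the interleave helper are hypothetical names, and the constants are the values you would hand to glVertexAttribPointer as stride and offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One interleaved vertex: position immediately followed by color.
struct Vertex {
    Vec3 position;
    Vec3 color;
};

// Stride and per-attribute byte offsets derived from the struct itself,
// so they stay correct if the struct changes.
constexpr std::size_t kStride = sizeof(Vertex);
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);
constexpr std::size_t kColorOffset = offsetof(Vertex, color);

// Merge two separate per-vertex arrays into one interleaved array.
std::vector<Vertex> interleave(const std::vector<Vec3>& positions,
                               const std::vector<Vec3>& colors) {
    assert(positions.size() == colors.size());
    std::vector<Vertex> out;
    out.reserve(positions.size());
    for (std::size_t i = 0; i < positions.size(); ++i) {
        out.push_back(Vertex{positions[i], colors[i]});
    }
    return out;
}
```

The resulting vector can be uploaded with a single glBufferData call, and the two attributes differ only in the offset they use.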
In theory, though, merely grouping together the data you were going to upload to the video card regardless should have little impact. It might be slightly more efficient to group all the attributes together due to reducing the number of calls, but that's again going to be highly driver-dependent.
Unfortunately, I think the simple answer to your question boils down to: "it depends" and "no one really knows".
Let's imagine that I have two squares. First I generate the VAO and VBO, then bind them, and so on. My goal is to check for collisions between the two objects every frame. For that, I have to know the exact vertices on both the CPU and the GPU side, so I store every single vertex twice. If I work with a large amount of data, the mirroring doesn't seem efficient, not to mention the logistics of keeping the data consistent. Is there a better way to do this? Or is it totally OK to keep the vertices in an array after the glBufferData call?
There needs to be more information on your part. Is this something you plan on instancing? Are you sure there's a bottleneck on your bandwidth?
If this is something that you can simulate on the GPU, then just do it all on the GPU so you keep the memory on that side and not incur the transfer penalty from CPU to GPU.
If you need it to be on the CPU side for your collision detection, then you have a few options:
Update the ones that change. If all of them are changing you should ignore this option, but you could map the buffer and update it and try to flush it only after you've updated what ranges are needed.
Send a displacement. If you end up having a ton of data, you may be able to get away with just sending a rotation and central position to cut down on updating "every vertex", and might be able to exploit a Geometry Shader... however I've read that these can be problematic for performance so you should consider it but be ready to profile.
You can possibly stream data if you have to update all of them, see this wonderful resource.
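For the first option above (update only the ones that change), one way to keep the CPU copy and the GPU copy consistent is to record which span of vertices was touched and upload just that span. The class below is a hypothetical sketch of such bookkeeping; it only computes the byte offset and size you might pass to glBufferSubData or a mapped-range flush, and makes no GL calls itself.

```cpp
#include <algorithm>
#include <cstddef>

// Tracks the smallest contiguous span of vertices modified since the
// last upload. Untouched vertices inside the span are re-sent as well.
class DirtyRange {
public:
    void markDirty(std::size_t vertexIndex) {
        if (empty_) {
            first_ = last_ = vertexIndex;
            empty_ = false;
        } else {
            first_ = std::min(first_, vertexIndex);
            last_ = std::max(last_, vertexIndex);
        }
    }

    bool empty() const { return empty_; }

    // Byte offset of the first dirty vertex in the buffer.
    std::size_t byteOffset(std::size_t vertexSize) const {
        return empty_ ? 0 : first_ * vertexSize;
    }

    // Number of bytes covering every dirty vertex.
    std::size_t byteSize(std::size_t vertexSize) const {
        return empty_ ? 0 : (last_ - first_ + 1) * vertexSize;
    }

    // Call after the upload has been issued.
    void clear() { empty_ = true; }

private:
    std::size_t first_ = 0;
    std::size_t last_ = 0;
    bool empty_ = true;
};
```

A single span is the simplest choice; if the changes are scattered across the buffer, several smaller spans or a full re-upload may be cheaper, which is again something to profile.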
You need to define your problem domain a bit more because I'm not sure exactly what the bounds on the problem are. The above are some ways of tackling these problems, but the best solution can only be given to you if you are more specific with what you want when you talk about the large case.
You also have to understand that asking for massive data manipulation and fast transfer tend to be topics that fight each other, and you'll have to be smarter about what you plan on doing depending on exactly how much data you are talking about here.
I'd like to answer with something more concrete but I'm just shooting in the dark because I don't know exactly what the limit of your data is and what hardware you're working with.
Are there any modeling formats that directly support Vertex Buffer Objects?
Currently my game engine has been using Wavefront Models, but I have always been using them with immediate mode and display lists. This works, but I wanted to upgrade my entire system to modern OpenGL, including Shaders. I know that I can use immediate mode and display lists with Shaders, but like most aspiring developers, I want my game to be the best it can be. After asking the question linked above, I quickly came to the realization that Wavefront Models simply don't support Vertex Buffers; this is mainly due to the fact of how the model is indexed. In order for a Vertex Buffer Object to be used, Vertices, Texture Coordinates, and the Normal arrays all need to be equal in length.
I can achieve this by writing my own converter, which I have done. Essentially I unroll the indexing and create the associated arrays. I don't even need to use glDrawElements then; I can just use glDrawArrays, which I'm perfectly fine doing. The only problem is that I am actually duplicating data; the arrays become massive (especially with large models), and this just seems wrong to me. Certainly there has to be a modern way of initializing a model into a Vertex Buffer without completely unrolling the indexing. So I have two questions.
1. Are there any modern model formats/concepts that directly support Vertex Buffer Objects?
2. Is this already an industry standard? Do most game engines unroll the indexing (and inflate the arrays, also called unpacking) at runtime to create the game world assets?
The primary concern with storage formats is space efficiency. When reading from storage media you're limited, by and large, by I/O bandwidth. So any CPU cycles you can invest to reduce the total amount of data to be read from storage will hugely benefit asset loading times. Just to give you the general idea: even the fastest SSDs you can buy at the time of writing won't get over 5 GiB/s (believe me, I tried sourcing something that can saturate 8 lanes of PCIe 3 for my work). Your typical CPU memory bandwidth is at least one order of magnitude above that. GPUs have even more memory bandwidth. Lower-level caches are faster still.
So what I'm trying to tell you: That index unrolling overhead? It's mostly an inconvenience for you, the developer, but probably shaves off some time from loading the assets.
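If the duplication from a full unroll bothers you, a common middle ground (not something this answer prescribes, so take it as an assumption) is to unroll only the distinct combinations of position/texcoord/normal indices and emit a single index buffer for glDrawElements. A minimal sketch, with hypothetical type and function names:

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <vector>

// One OBJ face corner: separate indices into the position, texture
// coordinate and normal lists.
using Corner = std::array<int, 3>;

struct Remapped {
    std::vector<Corner> uniqueCorners;   // one entry per output vertex
    std::vector<std::uint32_t> indices;  // one entry per face corner
};

// Assign one output vertex to each distinct corner triple and build the
// single index list that refers to those output vertices.
Remapped buildIndexBuffer(const std::vector<Corner>& corners) {
    Remapped result;
    std::map<Corner, std::uint32_t> seen;
    for (const Corner& corner : corners) {
        auto it = seen.find(corner);
        if (it == seen.end()) {
            const auto newIndex =
                static_cast<std::uint32_t>(result.uniqueCorners.size());
            it = seen.emplace(corner, newIndex).first;
            result.uniqueCorners.push_back(corner);
        }
        result.indices.push_back(it->second);
    }
    return result;
}
```

Each entry of uniqueCorners then says which position, texture coordinate and normal to copy into the equal-length attribute arrays, so shared corners are stored once instead of once per triangle.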
(suggested edit): Of course, storing numbers in their text representation is not going to help with space efficiency; depending on the choice of base, a single digit represents between 3 and 5 bits (let's say 4 bits). That same text character, however, consumes 8 bits, so you have about 100% overhead there. The lowest-hanging fruit is storing the data in a binary format.
But why stop there? How about applying compression on the data? There are a number of compressed asset formats. But one particularly well developed one is OpenCTM, although it would make some sense to add one of the recently developed compression algorithms to it. I'm thinking of Zstandard here, which compresses data ridiculously well and at the same time is obscenely fast at decompression.
Suppose I have many meshes I'd like to render. I have two choices:
Bake transforms and colors for each mesh into a VBO and render with a single draw call.
Use glUniform for transforms and colors and use many draw calls (but still a single VBO)
Assuming the scene changes very little between frames, which method tends to be better?
There are more than those two choices. At least one more comes to mind:
...
....
Use attributes for transforms and colors and use many draw calls.
Choice 3 is similar to choice 2, but setting attributes (using calls like glVertexAttrib4f) is mostly faster than setting uniforms. The efficiency of setting uniforms is highly platform dependent. But they're generally not intended to be modified very frequently. They are called uniform for a reason. :)
That being said, choice 1 might be the best for your use case where the transforms/colors change rarely. If you're not doing this yet, you could try keeping the attributes that are modified in a separate VBO (with usage GL_DYNAMIC_DRAW), and the attributes that remain constant in their own VBO (with usage GL_STATIC_DRAW). Then make the necessary updates to the dynamic buffer with glBufferSubData.
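To make choice 1 concrete, here is a minimal CPU-side sketch of baking per-mesh data into one vertex array that could be uploaded to a single VBO. It assumes the per-mesh transform is only a translation to keep the example short; a real implementation would apply a full matrix. All type and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

struct BakedVertex {
    Vec3 position;  // already transformed
    Vec3 color;     // copied from the owning mesh
};

// Hypothetical mesh description: local-space positions plus the
// translation and color that would otherwise be set with glUniform.
struct Mesh {
    std::vector<Vec3> localPositions;
    Vec3 translation;
    Vec3 color;
};

// Bake every mesh into one array suitable for a single draw call.
std::vector<BakedVertex> bake(const std::vector<Mesh>& meshes) {
    std::vector<BakedVertex> out;
    for (const Mesh& mesh : meshes) {
        for (const Vec3& p : mesh.localPositions) {
            const Vec3 moved{p.x + mesh.translation.x,
                             p.y + mesh.translation.y,
                             p.z + mesh.translation.z};
            out.push_back(BakedVertex{moved, mesh.color});
        }
    }
    return out;
}
```

Note how the color is repeated for every vertex of a mesh; that redundancy is the price of collapsing everything into one draw call.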
The reality is that there are no simple rules to predict what is going to perform best. It will depend on the size of your data and draw calls, how frequent and large the data changes are, and also very much on the platform you run on. If you want to be confident that you're using the most efficient solution, you need to implement all of them, and start benchmarking.
Generally, option 1 (minimize number of draw calls) is the best advice. There are a couple of caveats:
I have seen performance fall off a cliff when using very large VBOs on at least one mobile device (assuming relevant for opengl-es tag). The explanation (from the vendor) involved internal buffers exceeding a certain size.
If putting all the information which would otherwise be conveyed with uniforms into vertex attributes significantly increases the size of the vertex buffer, the price you pay (in perhaps costly memory reads) of reading redundant information (because it doesn't really vary per vertex) might negate the savings of using fewer draw calls.
As always the best (but tiresome) advice is to test (I know this is particularly hard developing for mobile where there are many potential implementations your code could be running on). Try to keep your pipeline/toolchain flexible enough that you can easily try out and compare different options.
I need to know how I can render many different 3D models whose geometry changes every frame (they are animated models) and which don't share models or textures.
I load all the models and create an object of a model class for each one.
What is the most optimal way to render them?
To use 1 VBO for each 3D model
To use a single VBO for all models (since they are all different, I don't see how this option is possible)
I work with OpenGL 3.x or higher, C++ on Windows.
TL;DR - there's no silver bullet when it comes to rendering performance
Why is that? It comes down to the complicated process that gets your data, converts it, pushes it to the GPU and then makes pixels on the screen flicker. So, instead of "one best way", a few guidelines have emerged that usually improve performance.
Keep all the necessary data on the GPU (because the closer to the screen, the shorter way electrons have to go :))
Send as little data to GPU between frames as possible
Don't sync needlessly between CPU and GPU (that's like trying to run two high speed trains on parallel tracks, but insisting on slowing them down to the point where you can pass something through the window every once in a while)
Now, it's obvious that if you want to have a model that will change, you can't have your cake and eat it too. You have to make tradeoffs. Simply put, dynamic objects will never render as fast as static ones. So, what should you do?
Hint to the GPU how the data will be used (GL_STREAM_DRAW or GL_DYNAMIC_DRAW) - it is only a hint, but it gives the driver a chance to choose a suitable memory arrangement.
Don't use interleaved buffers to mix static vertex attributes with dynamic ones - if you divide the memory, you can batch-update the geometry leaving texture coordinates intact, for example.
Try to do as much as you can purely on the GPU - with compute shaders and transform feedback, it might well be possible to store whole animation data as a buffer itself and calculate it on GPU, avoiding expensive syncs.
And last but not least, always carefully measure the impact of your change on performance. Going blindly won't help. Measure accurately and thoroughly (even stuff like shader compilation time might matter sometimes!). Then, even if you go by trial-and-error, there's a hope you'll get somewhere.
And to address one of your points in particular: whether it's one large VBO or a few smaller ones doesn't really matter, but a huge one might have problems fitting in memory. You can still update parts of it, and what matters most is the memory arrangement inside it.
What are some common guidelines in choosing vertex buffer type? When should we use interleaved buffers for vertex data, and when separate ones? When should we use an index array and when direct vertex data?
I'm searching for some common guidelines - I know some cases where one or the other fits better, but not all cases are easily solvable. What should one have in mind when choosing the vertex buffer format when aiming for performance?
Links to web resources on the topic are also welcome.
First of all, you can find some useful information on the OpenGL wiki. Second of all, if in doubt, profile. There are some rules of thumb about this, but experience can vary based on the data set, hardware, drivers, and so on.
Indexed versus direct rendering
I would almost always by default use the indexed method for vertex buffers. The main reason for this is the so called post-transform cache. It's a cache kept after the vertex processing stage of your graphics pipeline. Essentially it means that if you use a vertex multiple times you have a good chance of hitting this cache and being able to skip the vertex computation. There is one condition to even hit this cache and that is that you need to use indexed buffers, it won't work without them as the index is a part of this cache's key.
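To get a feel for why index order matters for this cache, you can simulate it on the CPU. The sketch below assumes a simple FIFO cache keyed on the index; real GPUs use different sizes and replacement schemes, so the function (a hypothetical helper) only gives a rough estimate of how many vertices would be run through the vertex shader.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts cache misses, i.e. vertices that would have to be transformed,
// for a given index list and an assumed FIFO cache size.
std::size_t countVertexShaderRuns(const std::vector<std::uint32_t>& indices,
                                  std::size_t cacheSize) {
    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end()) {
            continue;  // hit: the already transformed vertex is reused
        }
        ++misses;
        cache.push_back(index);
        if (cache.size() > cacheSize) {
            cache.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}
```

For a quad drawn as two triangles with indices 0, 1, 2, 2, 1, 3, a cache of any reasonable size transforms 4 vertices, while a cache size of 0 (which behaves like non-indexed drawing) transforms all 6.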
Also, you will likely save storage: an index can be small (1 or 2 bytes) and lets you reuse a full vertex specification. Suppose that a vertex and all its attributes total about 30 bytes of data and you share this vertex over, let's say, 2 polygons. With indexed rendering (2-byte indices) this will cost you 2*index_size+attribute_size = 34 bytes. With non-indexed rendering this will cost you 60 bytes. Often your vertices will be shared more than twice.
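The arithmetic above generalizes to any number of uses. A small sketch (function names are made up for this example) that reproduces the 34 versus 60 byte figures:

```cpp
#include <cstddef>

// Indexed: the attributes are stored once, plus one index per use.
constexpr std::size_t indexedBytes(std::size_t attributeBytes,
                                   std::size_t indexBytes,
                                   std::size_t uses) {
    return attributeBytes + uses * indexBytes;
}

// Non-indexed: the full vertex is repeated for every use.
constexpr std::size_t nonIndexedBytes(std::size_t attributeBytes,
                                      std::size_t uses) {
    return attributeBytes * uses;
}

// The example from the text: 30-byte vertex, 2-byte indices, 2 uses.
static_assert(indexedBytes(30, 2, 2) == 34, "indexed cost");
static_assert(nonIndexedBytes(30, 2) == 60, "non-indexed cost");
```

With a single use the indexed form is the more expensive one (32 versus 30 bytes here), which is the "only overhead" case for unshared vertices.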
Is index-based rendering always better? No, there might be scenarios where it's worse. For very simple applications it might not be worth the code overhead to set up an index-based data model. Also, when your attributes are not shared over polygons (e.g. normal per-polygon instead of per-vertex) there is likely no vertex-sharing at all and IBO's won't give a benefit, only overhead.
Next to that, while it enables the post-transform cache, it does make generic memory cache performance worse. Because you access the attributes in a relatively random order, you might have quite a few more cache misses, and memory prefetching (if this is done on the GPU) won't work well. So it might be (but measure) that if you have enough memory and your vertex shader is extremely simple, the non-indexed version outperforms the indexed version.
Interleaving vs non-interleaving vs buffer per-attribute
This story is a bit more subtle and I think it comes down to weighing some properties of your attributes.
Interleaved might be better because all attributes will be close together and likely sit in a few memory cache lines (maybe even a single one). Obviously, this can mean better performance. However, combined with index-based rendering your memory access is quite random anyway and the benefit might be smaller than you'd expect.
Know which attributes are static and which are dynamic. If you have 5 attributes of which 2 are completely static, 1 changes every 15 minutes and 2 every 10 seconds, consider putting them in 2 or 3 separate buffers. You don't want to re-upload all 5 attributes every time those 2 most frequent change.
Consider that attributes should be aligned on 4 bytes. So you might want to take interleaving even one step further from time to time. Suppose you have a vec3 1-byte attribute and some scalar 1-byte attribute, naively this will need 8 bytes. You might gain a lot by putting them together in a single vec4, which should reduce usage to 4 bytes.
Play with buffer size, a too large buffer or too many small buffers may impact performance. But this is likely very dependent on the hardware, driver and OpenGL implementation.
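The alignment point above can be checked with two small structs. This is a sketch with hypothetical field names: the first layout pads each 1-byte-component attribute to 4 bytes on its own, the second stores the scalar in the otherwise wasted fourth component.

```cpp
#include <cstdint>

// Naive layout: a 3-component byte attribute and a 1-byte scalar, each
// padded out to a 4-byte boundary separately.
struct NaiveVertex {
    std::uint8_t direction[3];
    std::uint8_t directionPadding;
    std::uint8_t weight;
    std::uint8_t weightPadding[3];
};

// Packed layout: xyz holds the 3-component attribute, w holds the scalar,
// so the shader would read a single 4-component attribute.
struct PackedVertex {
    std::uint8_t directionAndWeight[4];
};

static_assert(sizeof(NaiveVertex) == 8, "two attributes, 4 bytes each");
static_assert(sizeof(PackedVertex) == 4, "one combined attribute");
```

The shader then has to split the combined attribute back into its two logical parts, which is usually a trivial swizzle.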
Indexed vs Direct
Let's see what you get by indexing. Every repeating vertex, that is, a vertex with "smooth" break will cost you less. Every singular "edge" vertex will cost you more. For data that's based on real world and is relatively dense, one vertex will belong to many triangles, and thus indexes will speed it up. For procedurally generated arbitrary data, direct mode will usually be better.
Indexed buffers also add additional complications to the code.
Interleaved vs Separate
The main difference here is actually based on a question "will I want to update only one component?". If the answer is yes, then you shouldn't interleave, because any update will be extremely costly. If it's no, using interleaved buffers should improve locality of reference and generally be faster on most of the hardware.