Reuse vertex attribute buffer as index buffer?


Can I use a VBO which I initialise like this:
GLuint bufferID;
glGenBuffers(1,&bufferID);
glBindBuffer(GL_ARRAY_BUFFER,bufferID);
glBufferData(GL_ARRAY_BUFFER,nBytes,indexData,GL_DYNAMIC_DRAW);
as an index buffer, like this:
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,bufferID);
/* ... set up vertex attributes, NOT using bufferID in the process ... */
glDrawElements(...);
I would like to use the buffer mostly as an attribute buffer and occasionally as an index buffer (but never at the same time).

There is nothing in the GL which prevents you from doing such things, your code above is legal GL. You can bind every buffer to every buffer binding target (you can even bind the same buffer to different targets at the same time, so it is even OK if attributes and index data come from the same buffer). However, the GL implementation might do some optimizations based on the observed behavior of the application, so you might end up with sub-optimal performance if you suddenly change the usage of an existing buffer object with such an approach, or use it for two things at once.
Update
The ARB_vertex_buffer_object extension spec, which introduced the concept of buffer objects to OpenGL, mentions this topic in the "Issues" section:
Should this extension include support for allowing vertex indices to be stored in buffer objects?
RESOLVED: YES. It is easily and cleanly added with just the
addition of a binding point for the index buffer object. Since
our approach of overloading pointers works for any pointer in GL,
no additional APIs need be defined, unlike in the various
*_element_array extensions.
Note that it is expected that implementations may have different
memory type requirements for efficient storage of indices and
vertices. For example, some systems may prefer indices in AGP
memory and vertices in video memory, or vice versa; or, on
systems where DMA of index data is not supported, index data must
be stored in (cacheable) system memory for acceptable
performance. As a result, applications are strongly urged to
put their models' vertex and index data in separate buffers, to
assist drivers in choosing the most efficient locations.
The reasoning that some implementations might prefer to keep index buffers in system RAM seems quite outdated, though.

While completely legal, it's sometimes discouraged to have attribute data and index data in the same buffer. I suspect that this is mostly based on a paragraph in the spec document (e.g. page 49 of the OpenGL 3.3 spec, at the end of the section "2.9.7 Array Indices in Buffer Objects"):
In some cases performance will be optimized by storing indices and array data in separate buffer objects, and by creating those buffer objects with the corresponding binding points.
While it seems plausible that it could be harmful to performance, I would be very interested to see benchmark results on actual platforms showing it. Attribute data and index data are used at the same time, and with the same access operations (CPU write, or blit from temporary storage, for filling the buffer with data, GPU read during rendering). So I can't think of a very good reason why they would need to be treated differently.
The only difference I can think of is that the index data is always read sequentially, while the attribute data is read out of order during indexed rendering. So it might be possible to apply different caching attributes for performance tuning the access in both cases.

Related

How to suballocate buffers in Vulkan

A recommended approach for memory management in Vulkan is sub-allocation of buffers, for instance see the image below.
I'm trying to implement "the good" approach. I have a system in place that can tell me where within a Memory Allocation is available, so I can bind a sub area of a single large buffer.
However, I can't find the mechanism to do this, or am just misunderstanding what is happening, as the bind functions take a buffer as input, and an offset. I can't see how to specify the size of the binding other than through the existing buffer.
So I have a few questions I guess:
are the dotted rectangles in the image below just bindings, or are they additional buffers?
if they are bindings, how do I tell Vulkan (ideally using VMA) to use that subsection of the buffer?
if they are additional buffers, how do I create them?
if neither, what are they?
I have read up on a few custom allocators, but they seem to follow the "bad" approach, returning offsets into large allocations for binding, so still plenty of buffers but lower allocation counts.
To be clear, I am not using custom allocator callbacks other than through VMA; the "system" to which I refer above sits on top of the VMA calls.
Any pointers much appreciated!

are the dotted rectangles in the image below just bindings, or are they additional buffers?
They represent the actual data. So the "Index" block is the range of storage that contains vertex indices.
if they are bindings, how do I tell Vulkan (ideally using VMA) to use that subsection of the buffer?
That depends on the particular nature of how you're using that VkBuffer as a resource. Generally speaking, every function that uses a VkBuffer as a resource takes a byte offset that represents where to start reading from. Many such functions also take a size which coupled with the offset represents the full quantity of data that can be read through that particular resource.
For example, vkCmdBindVertexBuffers takes an array of VkBuffers, and for each VkBuffer it also takes a byte offset that represents the starting point for that vertex buffer. VkDescriptorBufferInfo, the structure that represents a buffer used by a descriptor, takes a VkBuffer, a byte offset, and a size.
The vertex buffer (and index buffer) bindings don't have a size, but they don't need one. Their effective size is defined by the rendering command used with them (and the index data being read by it). If you render using 100 32-bit indices, then the expectation is that the index buffer's size, minus the starting offset, should be at least 400 bytes. If it isn't, undefined behavior results.
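To make the "offset into one big buffer" idea concrete, here is a minimal sketch of a bump-style sub-allocator that only computes aligned byte offsets. It deliberately uses no Vulkan headers; SubRange, alignUp and BufferSubAllocator are hypothetical names invented for this illustration, and the alignment values you pass in are assumed to come from your own device limits or memory requirements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one block ("Vertex", "Index", ...) inside a
// single large buffer. Not a Vulkan type.
struct SubRange {
    std::uint64_t offset; // byte offset to hand to the bind/descriptor call
    std::uint64_t size;   // bytes reserved for this block
};

// Round `value` up to the next multiple of `alignment` (must be a power of two).
inline std::uint64_t alignUp(std::uint64_t value, std::uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Minimal bump sub-allocator: hands out aligned ranges from one buffer and
// never frees. A real allocator would also track and recycle free ranges.
class BufferSubAllocator {
public:
    explicit BufferSubAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // Returns true and fills `out` if the range fits, false otherwise.
    bool allocate(std::uint64_t size, std::uint64_t alignment, SubRange& out) {
        const std::uint64_t start = alignUp(cursor_, alignment);
        if (start > capacity_ || size > capacity_ - start) {
            return false;
        }
        out = SubRange{start, size};
        cursor_ = start + size;
        return true;
    }

private:
    std::uint64_t capacity_;
    std::uint64_t cursor_ = 0;
};
```

The resulting offset is what you would pass as the byte offset to vkCmdBindVertexBuffers or vkCmdBindIndexBuffer, or as the offset and range members of VkDescriptorBufferInfo; no extra VkBuffer object is created per block.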

Deriving the `VkMemoryRequirements`

Is there a way to get the right values for a VkMemoryRequirements structure without having to allocate a buffer first and without using vkGetBufferMemoryRequirements?
Is it supported/compliant?
Motivation - Short version
I have an application that does the following, and everything works as expected.
VkMemoryRequirements memReq;
vkGetBufferMemoryRequirements(application.shell->device, uniformBuffer, &memReq);
int memType = application.shell->findMemoryType(memReq.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
Internally, findMemoryType loops over the memory types and checks that they have the required property flags.
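For readers who have not seen such a helper, the loop described above can be sketched roughly as follows. This is not the code from my application: MemoryType and MemoryProperties are stand-in structs that mimic the shape of the Vulkan memory properties structure so the snippet compiles without Vulkan headers, and passing the properties as an explicit parameter is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the Vulkan structures (hypothetical, for illustration only).
struct MemoryType {
    std::uint32_t propertyFlags;
};
struct MemoryProperties {
    std::uint32_t memoryTypeCount;
    MemoryType memoryTypes[32];
};

// Returns the index of the first memory type that is allowed by `typeBits`
// (bit i set means type i is allowed) and has all `required` property flags,
// or -1 if no such type exists.
inline int findMemoryType(const MemoryProperties& props,
                          std::uint32_t typeBits,
                          std::uint32_t required) {
    assert(props.memoryTypeCount <= 32);
    for (std::uint32_t i = 0; i < props.memoryTypeCount; ++i) {
        const bool allowed = (typeBits & (1u << i)) != 0;
        const bool hasFlags =
            (props.memoryTypes[i].propertyFlags & required) == required;
        if (allowed && hasFlags) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```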
If I replace the call to vkGetBufferMemoryRequirements with hardcoded values (which are not portable, are specific to my system, and were obtained through debugging), everything still works and I don't get any validation errors.
VkMemoryRequirements memReq = { 768, 256, 1665 };
int memType = application.shell->findMemoryType(memReq.memoryTypeBits, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
The above code is IMHO neat because it lets you pre-allocate memory before you actually need it.
Motivation - Long version
In Vulkan you create buffers which initially are not backed by device memory and at a later stage you allocate the memory and bind it to the buffer using vkBindBufferMemory:
VkResult vkBindBufferMemory(
VkDevice device,
VkBuffer buffer,
VkDeviceMemory memory,
VkDeviceSize memoryOffset);
Its Vulkan spec states that:
memory must have been allocated using one of the memory types allowed
in the memoryTypeBits member of the VkMemoryRequirements structure
returned from a call to vkGetBufferMemoryRequirements with buffer
Which implies that before allocating the memory for a buffer, you should have already created the buffer.
I have a feeling that in some circumstances it would be useful to pre-allocate a chunk of memory before you actually need it; in most of the OpenGL flavors I have experience with this was not possible, but Vulkan should not suffer from this limitation, right?
Is there a (more or less automatic) way to get the memory requirements before creating the first buffer?
Is it supported/compliant?
Obviously, when you do allocate the memory for the first buffer you can allocate a little more so that when you need a second buffer you can bind it to another range in the same chunk. But my understanding is that, to comply with the spec, you would still need to call vkGetBufferMemoryRequirements on the second buffer, even if it is exactly the same type and the same size as the first one.

This question already recognizes that the answer is "no"; you just seem to want to do an end-run around what you already know. Which you can't.
The code you showed with the hard-coded values works because you already know the answer. It's not that Vulkan requires you to ask the question; Vulkan requires you to provide buffers that use the answer.
However, since "the answer" is implementation-specific, it changes depending on the hardware. It could change when you install a new driver. Indeed, it could change even depending on which extensions or Vulkan features you activate when creating the VkDevice.
That having been said:
Which implies that before allocating the memory for a buffer, you should have already created the buffer.
Incorrect. It requires that you have the answer and have selected memory and byte offsets appropriate to that answer. But Vulkan is specifically loose about what "the answer" actually means.
Vulkan has specific guarantees in place which allow you to know the answer for a particular buffer/image without necessarily having asked about that specific VkBuffer/Image object. The details are kind of complicated, but for buffers they are pretty lax.
The basic idea is that you can create a test VkBuffer/Image and ask about its memory properties. You can then use that answer to know the properties of the buffers you intend to use which are "similar" to it. At the very least, Vulkan guarantees that two identical buffers/images (formats, usage flags, sizes, etc.) will always produce identical memory properties.
But Vulkan also offers a few other guarantees. There are basically 3 things that the memory properties tell you:
The memory types that this object can be bound to.
The alignment requirement for the offset for the memory object.
The byte size the object will take up in memory.
For the size, you get only the most basic guarantee: equivalent buffer/images will produce equivalent sizes.
For the alignment, images are as strict as sizes: only equivalent images are guaranteed to produce equivalent alignment. But for buffers, things are more lax. If the test buffer differs only in usage flags, and the final buffer uses a subset of the usage flags, then the alignment for the final buffer will not be more restrictive than the test buffer. So you can use the alignment from the test buffer.
For the memory types, things are even more loose. For images, the only things that matter are:
Tiling
Certain memory flags (sparse/split-instance binding)
Whether the image format is color or depth/stencil
If the image format is depth/stencil, then the formats must match
External memory
Transient allocation usage
If all of these are the same for two VkImage objects, then the standard guarantees that all such images will support the same set of memory types.
For buffers, things are even more lax. For non-sparse buffers, if your test buffer differs from the final buffer only by usage flags, then if the final one has a subset of the usage flags of the test buffer, then the set of memory types it supports must include all of the ones from the test buffer. The final buffer could support more, but it must support at least those of such a test buffer.
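The subset rule for buffers can be expressed as a small bitmask check. The sketch below is only an illustration under stated assumptions: CachedRequirements and canReuseForBuffer are hypothetical names, the usage values are plain bitmasks standing in for buffer usage flags, and the test and final buffers are assumed to be non-sparse and otherwise identical. It covers only alignment and memory types; it says nothing about size, which is guaranteed only for equivalent buffers.

```cpp
#include <cassert>
#include <cstdint>

// What was learned from querying one "test" buffer (hypothetical struct).
struct CachedRequirements {
    std::uint32_t testUsage;      // usage flags the test buffer was created with
    std::uint64_t alignment;      // alignment reported for the test buffer
    std::uint32_t memoryTypeBits; // memory types reported for the test buffer
};

// True if every usage bit of the final buffer is also set on the test buffer.
inline bool usageIsSubset(std::uint32_t finalUsage, std::uint32_t testUsage) {
    return (finalUsage & ~testUsage) == 0;
}

// True if the cached alignment and memory types may be used for a buffer
// with `finalUsage`, assuming it differs from the test buffer only in usage.
inline bool canReuseForBuffer(const CachedRequirements& cached,
                              std::uint32_t finalUsage) {
    assert(cached.testUsage != 0); // a buffer always has at least one usage bit
    return finalUsage != 0 && usageIsSubset(finalUsage, cached.testUsage);
}
```

A natural way to use this is to create the test buffer with the union of all usage flags you ever intend to use, so that every later buffer is a subset of it.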
Oh, and linear images and buffers must always be able to be used in at least one mappable, coherent memory type. Of course, this requires that you have created a valid VkDevice/Image with those usage and flags fields, so if the device doesn't allow (for example) linear images to be used as textures, then that gets stopped well before asking about memory properties.

OpenGL Texture and Object Streaming

I have a need to stream a texture (essentially a camera feed).
With object streaming, the following scenarios seem to arise:
Is the new object's data store larger, smaller or same size as the old one?
Subset of or whole texture being updated?
Are we streaming a buffer object or texture object (any difference?)
Here are the following approaches I have come across:
Allocate object data store (either BufferData for buffers or TexImage2D for textures) and then each frame, update subset of data with BufferSubData or TexSubImage2D
Nullify/invalidate the object after the last call (eg. draw) that uses the object either with:
Nullify: glTexSubImage2D( ..., NULL), glBufferSubData( ..., NULL)
Invalidate: glInvalidateBufferData(), glMapBufferRange with the GL_MAP_INVALIDATE_BUFFER_BIT, glDeleteTextures ?
Simply reinvoke BufferData or TexImage2D with the new data
Manually implement object multi-buffering / buffer ping-ponging.
Most immediately, my problem scenario is: entire texture being replaced with a new one of the same size. How do I implement this? Will (1) implicitly synchronize? Does (2) avoid the synchronization? Will (3) synchronize, or will a new data store for the object be allocated, where our update can be uploaded without waiting for all drawing using the old object state to finish? This passage from the Red Book V4.3 makes me believe so:
Data can also be copied between buffer objects using the
glCopyBufferSubData() function. Rather than assembling chunks of data
in one large buffer object using glBufferSubData(), it is possible to
upload the data into separate buffers using glBufferData() and then
copy from those buffers into the larger buffer using
glCopyBufferSubData(). Depending on the OpenGL implementation, it may
be able to overlap these copies because each time you call
glBufferData() on a buffer object, it invalidates whatever contents
may have been there before. Therefore, OpenGL can sometimes just
allocate a whole new data store for your data, even though a copy
operation from the previous store has not completed yet. It will then
release the old storage at a later opportunity.
But if so, why the need for (2)[nullify/invalidates]?
Also, please discuss the above approaches, and others, and their effectiveness for the various scenarios, while keeping in mind at least the following issues:
Whether implicit synchronization to object (ie. synchronizing our update with OpenGL's usage) occurs
Memory usage
Speed
I've read http://www.opengl.org/wiki/Buffer_Object_Streaming but it doesn't offer conclusive information.

Let me try to answer at least a few of the questions you raised.
The scenarios you talk about can have a great impact on the performance of the different approaches, especially when considering the first point about the dynamic size of the buffer. In your scenario of video streaming, the size will rarely change, so a more expensive "re-configuration" of the data structures you use might be possible. If the size changes every frame or every few frames, this is typically not feasible. However, if a reasonable maximum size limit can be enforced, just using buffers/textures with the maximum size might be a good strategy. With neither buffers nor textures do you have to use all the space there is (although there are some smaller issues when you do this with textures, like wrap modes).
3.Are we streaming a buffer object or texture object (any difference?)
Well, the only way to efficiently stream image data to or from the GL is to use pixel buffer objects (PBOs). So you always have to deal with buffer objects in the first place, no matter if vertex data, image data or whatever data is to be transferred. The buffer is just the source for some glTex*Image() call in the texture case, and of course you'll need a texture object for that.
Let's come to your approaches:
In approach (1), you use the "Sub" variant of the update commands. In that case, (parts of or the whole) storage of the existing object is updated. This is likely to trigger an implicit synchronization if old data is still in use. The GL has basically only two options: wait for all operations (potentially) depending on that data to complete, or make an intermediate copy of the new data and let the client go on. Neither option is good from a performance point of view.
In approach (2), you have some misconception. The "Sub" variants of the update commands will never invalidate/orphan your buffers. The "non-sub" glBufferData() will create a completely new storage for the object, and using it with NULL as data pointer will leave that storage uninitialized. Internally, the GL implementation might re-use some memory which was in use for earlier buffer storage. So if you follow this scheme, there is some probability that you effectively end up using a ring-buffer of the same memory areas if you always use the same buffer size.
The other methods for invalidation you mentioned also allow you to invalidate parts of the buffer and give you more fine-grained control over what is happening.
Approach (3) is basically the same as (2) with the glBufferData() orphaning, but you just specify the new data directly at this stage.
Approach (4) is the one I actually would recommend, as it is the one which gives the application the most control over what is happening, without having to rely on the GL implementation's specific internal workings.
Without taking synchronization into account, the "sub" variant of the update commands is more efficient, even if the whole data storage is to be changed, not just some part. That is because the "non-sub" variants of the commands basically recreate the storage and introduce some overhead with this. By manually managing the ring buffers, you can avoid any of that overhead, and you don't have to rely on the GL to be clever, by just using the "sub" variants of the update functions. At the same time, you can avoid implicit synchronization by only updating buffers which aren't in use by the GL any more. This scheme can also nicely be extended into a multi-threaded scenario. You can have one (or several) extra threads with separate (but shared) GL contexts to fill the buffers for you, and just pass the buffer handles to the draw thread as soon as the update is complete. You can also just map the buffers in the draw thread and let them be filled by worker threads (without the need for additional GL contexts at all).
OpenGL 4.4 introduced GL_ARB_buffer_storage and with it came the GL_MAP_PERSISTENT_BIT for glMapBufferRange. That will allow you to keep all of the buffers mapped while they are used by the GL, so it allows you to avoid the overhead of mapping the buffers into the address space again and again. You then will have no implicit synchronization at all, but you have to synchronize the operations manually. OpenGL's synchronization objects (see GL_ARB_sync) might help you with that, but the main burden of synchronization is on your application's logic itself. When streaming videos to the GL, just avoid re-using the buffer which was the source for the glTexSubImage() call immediately and try to delay its re-use as long as possible. You are of course also trading throughput for latency. If you need to minimize latency, you might have to tweak this logic a bit.
Comparing the approaches for "memory usage" is really hard. There are a lot of implementation-specific details to consider here. A GL implementation might keep some old buffer memory around for some time to fulfill recreation requests of the same size. Also, a GL implementation might make shadow copies of any data at any time. The approaches which don't orphan and recreate storage all the time in principle expose more control over the memory which is in use.
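The bookkeeping side of the manual ring-buffer scheme can be sketched without any GL calls. The class below is a hypothetical illustration, not a complete streaming implementation: BufferRing, acquire and markUsed are made-up names, frames are assumed to be numbered from 1, and the caller is assumed to learn the last completed frame by some means of its own (for example from sync objects).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for a slot that has never been used by the GPU.
constexpr std::uint64_t kNeverUsed = ~std::uint64_t{0};

// Tracks which slot of a ring of buffers (e.g. PBOs) is safe to overwrite.
class BufferRing {
public:
    explicit BufferRing(std::size_t slots) : lastUse_(slots, kNeverUsed) {
        assert(slots > 0);
    }

    // Returns the next slot in ring order if the frame that last read from it
    // has completed, or -1 if the caller must wait or skip this update.
    int acquire(std::uint64_t completedFrame) {
        const std::uint64_t used = lastUse_[next_];
        if (used != kNeverUsed && used > completedFrame) {
            return -1;
        }
        const int slot = static_cast<int>(next_);
        next_ = (next_ + 1) % lastUse_.size();
        return slot;
    }

    // Records that `slot` was the source of an upload submitted in `frame`.
    void markUsed(int slot, std::uint64_t frame) {
        assert(slot >= 0 && static_cast<std::size_t>(slot) < lastUse_.size());
        lastUse_[static_cast<std::size_t>(slot)] = frame;
    }

private:
    std::vector<std::uint64_t> lastUse_;
    std::size_t next_ = 0;
};
```

With more slots in the ring, acquire() is less likely to return -1, at the cost of more memory and more latency between filling a buffer and it being overwritten again.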
"Speed" itself is also not a very useful metric. You basically have to balance throughput and latency here, according to the requirements of your application.

What is the purpose of OpenGL texture buffer objects?

We use buffer objects for reducing copy operations from CPU-GPU and for texture buffer objects we can change target from vertex to texture in buffer objects. Is there any other advantage here of texture buffer objects? Also, it does not allow filtering, is there any disadvantage of this?

A buffer texture is similar to a 1D-texture but has a backing buffer store that's not part of the texture object (in contrast to any other texture object) but realized with an actual buffer object bound to TEXTURE_BUFFER. Using a buffer texture has several implications and, AFAIK, one use-case that can't be mapped to any other type of texture.
Note that a buffer texture is not a buffer object - a buffer texture is merely associated with a buffer object using glTexBuffer.
By comparison, buffer textures can be huge. Table 23.53 and following of the core OpenGL 4.4 spec defines a minimum maximum (i.e. the minimal value that implementations must provide) number of texels MAX_TEXTURE_BUFFER_SIZE. The potential number of texels being stored in your buffer object is computed as follows (as found in GL_ARB_texture_buffer_object):
floor(<buffer_size> / (<components> * sizeof(<base_type>)))
The resulting value clamped to MAX_TEXTURE_BUFFER_SIZE is the number of addressable texels.
Example:
You have a buffer object storing 4MiB of data. What you want is a buffer texture for addressing RGBA texels, so you choose an internal format RGBA8. The addressable number of texels is then
floor(4MiB / (4 * sizeof(UNSIGNED_BYTE))) == 1024^2 texels == 2^20 texels
If your implementation supports this number, you can address the full range of values in your buffer object. The above isn't too impressive and can simply be achieved with any other texture on current implementations. However, the machine on which I'm writing this answer supports 2^28 == 268435456 texels.
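The texel computation above is easy to check in code. The following sketch is just the formula plus the clamp; addressableTexels is a hypothetical helper name, and the limit is passed in as a parameter because the real value has to be queried from your implementation.

```cpp
#include <cassert>
#include <cstdint>

// Number of texels addressable through a buffer texture:
// floor(bufferBytes / (components * baseTypeBytes)), clamped to the
// implementation's MAX_TEXTURE_BUFFER_SIZE (supplied by the caller).
inline std::uint64_t addressableTexels(std::uint64_t bufferBytes,
                                       std::uint32_t components,
                                       std::uint32_t baseTypeBytes,
                                       std::uint64_t maxTextureBufferSize) {
    assert(components > 0 && baseTypeBytes > 0);
    const std::uint64_t texelBytes =
        static_cast<std::uint64_t>(components) * baseTypeBytes;
    const std::uint64_t texels = bufferBytes / texelBytes; // integer floor
    return texels < maxTextureBufferSize ? texels : maxTextureBufferSize;
}
```

For the 4MiB RGBA8 example, addressableTexels(4 * 1024 * 1024, 4, 1, limit) yields 2^20 as long as the limit is at least that large; with a limit of 65536 the result is clamped to 65536.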
With OpenGL 4.4 (and 4.3 and possibly with earlier 4.x versions), the MAX_TEXTURE_SIZE is 2 ^ 16 texels per 1D-texture, so a buffer texture can still be 4 times as large. On my local machine I can allocate a 2GiB buffer texture (even larger actually), but only a 1GiB 1D-texture when using RGBA32F texels.
A use-case for buffer textures is random (and atomic, if desired) read-/write-access (the latter via image load/store) to a large data store inside a shader. Yes, you can do random read-access on arrays of uniforms inside one or multiple blocks, but it gets very tedious if you have to process a lot of data and have to work with multiple blocks. Even then, the maximum combined size of all uniform components (where a single float component has a size of 4 bytes) in all uniform blocks for a single stage,
MAX_(stage)_UNIFORM_BLOCKS *
MAX_UNIFORM_BLOCK_SIZE +
MAX_(stage)_UNIFORM_COMPONENTS * 4
isn't really a lot of space to work with in a shader stage (depending on how large your implementation allows the above number to be).
An important difference between textures and buffer textures is that the data store, as a regular buffer object, can be used in operations where a texture simply does not work. The extension mentions:
The use of a buffer object to provide storage allows the texture data to
be specified in a number of different ways: via buffer object loads
(BufferData), direct CPU writes (MapBuffer), framebuffer readbacks
(EXT_pixel_buffer_object extension). A buffer object can also be loaded
by transform feedback (NV_transform_feedback extension), which captures
selected transformed attributes of vertices processed by the GL. Several
of these mechanisms do not require an extra data copy, which would be
required when using conventional TexImage-like entry points.
An implication of using buffer textures is that look-ups inside a shader can only be done via texelFetch. Buffer textures also aren't mip-mapped and, as you already mentioned, during fetches there is no filtering.
Addendum:
Since OpenGL 4.3, we have what is called a Shader Storage Buffer. These too provide random (atomic) read-/write-access to a large data store but don't need to be accessed with texelFetch() or image load/store functions as is the case for buffer textures. Using buffer textures also implies having to deal with gvec4 return values, both with texelFetch() and imageLoad() / imageStore(). This becomes very tedious as soon as you want to work with structures (or arrays thereof) and you don't want to think of some stupid packing scheme using multiple instances of vec4 or using multiple buffer textures to achieve something similar. With a buffer accessed as shader storage, you can simply index into the data store and pull one or more instances of some struct {} directly from the buffer.
Also, since they are very similar to uniform blocks, using them should be fairly straight forward - if you know how to use uniform buffers, you don't have a long way to go learn how to use shader storage buffers.
It's also absolutely worth browsing the Issues section of the corresponding ARB extension.
Performance Implications
Daniel Rakos did some performance analysis years ago, both as a comparison of uniform buffers and buffer textures, and also on a little more general note based on information from AMD's OpenCL programming guide. There is now a very recent version, specifically targeting OpenCL optimization on AMD platforms.
There are many factors influencing performance:
access patterns and resulting caching behavior
cache line sizes and memory layout
what kind of memory is accessed (registers, local, global, L1/L2 etc.) and its respective memory bandwidth
how well memory fetching latency is hidden by doing something else in the meantime
what kind of hardware you're on, i.e. a dedicated graphics card with dedicated memory or some unified memory architecture
etc., etc.
As always when worrying about performance: implement something that works and see if that solution is fast enough for your needs. Otherwise, implement two or more approaches to solving the problem, profile them and compare.
Also, vendor specific guides can offer a great deal of insight. The above mentioned OpenCL user and optimization guides provide a high-level architectural perspective and specific hints on how to optimize your CL kernels - stuff that's also relevant when developing shaders.

One use case I have found is to store per-primitive attributes (accessed in the fragment shader with the help of gl_PrimitiveID) while still maintaining unique vertices in the indexed mesh.

Buffer drawing in OpenGL

In this question I'm interested in buffer-drawing in OpenGL, specifically in the tradeoff of using one buffer per data set vs one buffer for more than one data set.
Context:
Consider a data set of N vertices each represented by a set of attributes (e.g. color, texture, normals).
Each attribute is represented by a type (e.g. GLfloat, GLint) and a number of components (2, 3, 4). We want to draw this data. Schematically,
(non-interleaved representation)
data set
<-------------->
a_1 a_2 a_3
<---><---><---->
a_i = attribute; e.g. a_2 = (3 GLfloats representing color, thus 3*N GLfloats)
We want to map this into the GL state, using glBufferSubData.
Problem
When mapping, we have to keep track of the data in our memory because glBufferSubData requires a start and size. This sounds to me like an allocation problem: we want to allocate memory and keep track of its position. Since we want fast access to it, we would like the data to be in the same memory position, e.g. with a std::vector<char>. Schematically,
data set 1 data set 2
<------------><-------------->
(both have same buffer id)
We commit to the gl state as:
// id is bound to one std::vector<char>, "data".
glBindBuffer(target, id);
// for each data_set (AFTER calling glBindBuffer).
// for each attribute
// "start": the start point of the attribute.
// "size": (sizeof*components of the attribute)*N.
glBufferSubData(target, start, size, &data[start]);
(non-interleaved for the sake of the code).
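To make "start" and "size" concrete, here is a small sketch that computes them for each attribute of one non-interleaved data set. Attribute, Range and layoutAttributes are hypothetical names used only for this illustration, and the optional base parameter is an assumption standing for where the data set begins inside the shared buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One vertex attribute: size of a single component in bytes and the number
// of components per vertex (e.g. 4-byte floats, 3 components for a color).
struct Attribute {
    std::size_t componentSize;
    std::size_t components;
};

// Byte range of one attribute block inside the buffer.
struct Range {
    std::size_t start;
    std::size_t size;
};

// Lays out the attributes of one data set back to back (non-interleaved),
// starting at byte offset `base`, and returns the range of each attribute.
inline std::vector<Range> layoutAttributes(const std::vector<Attribute>& attrs,
                                           std::size_t vertexCount,
                                           std::size_t base = 0) {
    std::vector<Range> ranges;
    ranges.reserve(attrs.size());
    std::size_t cursor = base;
    for (const Attribute& a : attrs) {
        assert(a.componentSize > 0 && a.components > 0);
        const std::size_t size = a.componentSize * a.components * vertexCount;
        ranges.push_back(Range{cursor, size});
        cursor += size;
    }
    return ranges;
}
```

Each returned range supplies the offset and size arguments for one glBufferSubData call; a second data set in the same buffer would use a base equal to the end of the previous one.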
the problem arises when we want to add or remove vertices, e.g. when LOD changes. Because each data set must be a chunk, for instance to allow interleaved drawing (even in non-interleaved, each attribute is a chunk), we will end up with fragmentation in our std::vector<char>.
On the other hand, we can also set one chunk per buffer: instead of assigning chunks to the same buffer, we assign each chunk, now a std::vector<char>, to a different buffer. Schematically,
data set 1 (buffer id1)
<------------>
data set 2 (buffer id2)
<-------------->
We commit data to the gl state as:
// for each data_set (BEFORE calling glBindBuffer).
// "data" is the std::vector<char> of this data_set.
// id is now bound to the specific std::vector<char>
glBindBuffer(target, id);
// for each attribute
// "start": the start point of the attribute.
// "size": (sizeof*components of the attribute)*N.
glBufferSubData(target, start, size, &data[start]);
Questions
I'm learning this, so, before any of the below: is this reasoning correct?
Assuming yes,
Is it a problem to have an arbitrary number of buffers?
Is "glBindBuffer" expected to scale with the number of buffers?
What are the major points to take into consideration in this decision?

It is not quite clear whether you are asking about performance trade-offs, but I will answer in that vein.
Is it a problem to have an arbitrary number of buffers?
It is a problem that dates back to the dark medieval times when pipelines were fixed-function, and it remains today for backward-compatibility reasons. glBind* is considered one of the performance bottlenecks in modern OpenGL drivers, caused by poor locality of reference and cache misses. Simply put, the cache is cold and the CPU spends a large part of its time in the driver waiting for data to arrive from main memory. There is nothing driver implementers can do about it with the current API. Read Nvidia's short article about it and their bindless extension proposals.
2. Is "glBindBuffer" expected to scale with the number of buffers?
Certainly: the more objects (buffers in your case), the more bind calls and the more time lost in the driver. But merged, huge resource objects are less manageable.
3. What are the major points to take into consideration in this decision?
Only one. Profiling results ;)
"Premature optimization is the root of all evil", so try to stay as objective as possible and trust only the numbers. When the numbers go bad, we can think about:
"Huge", "all in one" resources:
fewer bind calls
fewer context changes
harder to manage and debug, need some additional code infrastructure (to update resource data for example)
resizing (reallocation) is very slow
Separate resources:
more bind calls, losing time in the driver
more context changes
easier to manage, less error-prone
easy to resize, allocate, reallocate
In the end, we have a trade-off between performance and complexity, and different behavior when updating data. To settle on one approach or the other, you must:
decide whether you would like to keep things simple and manageable, or add complexity and gain additional FPS (profile in a graphics profiler to know how much; is it worth it?)
know how often you resize/reallocate buffers (trace API calls in graphics debuggers).
Hope it helps somehow ;)
If you like theoretical discussions like this, you will probably be interested in another one, about interleaving (a DirectX one)
