I'm working on a program that uploads a model's new data to the graphics card using OpenGL, switches to rendering the new model, and then deletes the old data to free up space for other uses.
From my understanding I shouldn't be creating/releasing buffers on the fly as it can lead to memory thrashing.
Is it bad to call glBufferData frequently to add new data to the graphics card? Does this count as creating/releasing buffers?
If you call glBufferData with the same size and usage parameters it was previously called with, this effectively invalidates, or "orphans", the buffer. Doing anything else, such as changing the size or usage, effectively creates a new buffer.
If you aren't streaming data (uploading new data every frame or so), invalidation is not especially useful. If you're no longer using the buffer, and you haven't used it in a while, just leave it there if you're going to need buffer storage again.
And if your models use different sizes, preallocate a large buffer object and have different models use different regions from that one allocation.
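As a rough sketch of that idea, the C below tracks where each model's data lives inside one large buffer object. It rests on assumptions: BufferArena, arena_alloc and align_up are hypothetical names, the buffer is assumed to have been allocated once (for example with glBufferData(GL_ARRAY_BUFFER, capacity, NULL, GL_STATIC_DRAW)), each model's data would then be uploaded with glBufferSubData at the returned offset, and the 16-byte alignment used in the example is an arbitrary choice. Only offsets are computed here; no OpenGL calls are made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one large buffer object allocated once.
   Only byte offsets are tracked; the actual uploads would be glBufferSubData calls. */
typedef struct {
    size_t capacity; /* total bytes reserved in the buffer object */
    size_t used;     /* bytes handed out so far */
} BufferArena;

/* Round 'value' up to a multiple of 'alignment' (must be a power of two). */
static size_t align_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Reserve 'size' bytes for one model. On success, stores the byte offset the
   model's data should be uploaded to and returns 1; returns 0 if it does not fit. */
static int arena_alloc(BufferArena *arena, size_t size, size_t alignment, size_t *offset_out)
{
    size_t offset = align_up(arena->used, alignment);
    if (offset > arena->capacity || size > arena->capacity - offset)
        return 0;
    *offset_out = offset;
    arena->used = offset + size;
    return 1;
}
```

This is a simple bump allocator: it never frees individual regions, so models that come and go would need a free list or periodic compaction on top of it.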
I am working with PointCloud data that I need to render using OpenGL. I get a new vector of data points every frame. I want to be able to cache the data previously sent to OpenGL and send only the newest frame's data. How can I do so?
I did some searching and found this idea here:
// Bind the old buffer to `GL_COPY_READ_BUFFER`
glBindBuffer (GL_COPY_READ_BUFFER, old_buffer);
// Allocate data for a new buffer
glGenBuffers (1, &new_buffer);
glBindBuffer (GL_COPY_WRITE_BUFFER, new_buffer);
glBufferData (GL_COPY_WRITE_BUFFER, ...);
// Copy `old_buffer_size`-bytes of data from `GL_COPY_READ_BUFFER`
// to `GL_COPY_WRITE_BUFFER` beginning at 0.
glCopyBufferSubData (GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0, old_buffer_size);
But it looks like this ends up sending both the previous and the new data into the new buffer, instead of caching the old data and sending only the latest. So I am not sure it's the best way. Please correct me if I am wrong or suggest an alternative.
So you store some data in your CPU memory, and you append more data to this storage. Then you want to send only the appended data to GPU, not the whole buffer.
Your code example is irrelevant to this task, as glCopyBufferSubData copies data from one location in GPU memory to another location in GPU memory.
You need a combination of glBufferData and glBufferSubData. glBufferData allocates memory on the GPU and optionally initializes it. glBufferSubData writes data to an already allocated GPU buffer. You may think of glBufferData as C's malloc or C++'s new, while glBufferSubData is like a special version of C's memcpy or C++'s std::copy. More precisely, glBufferSubData is a memcpy from CPU to GPU, and glCopyBufferSubData is a memcpy from GPU to GPU.
How do you combine them? The same way as in C. Call glBufferData once at initialization time (when the program starts), and call glBufferSubData when you need to append data. Be sure to allocate enough space! A buffer allocated by glBufferData does not grow, just as a malloc'ed buffer doesn't. Writing past the end of the buffer with glBufferSubData won't grow it either: the call fails with GL_INVALID_VALUE and writes nothing.
Try to predict space requirement for your buffer, and call glBufferData only if your data does not fit into the buffer.
Remember that calling glBufferData on a buffer that already has storage will release the existing storage and allocate new storage.
glBufferSubData will not reallocate your buffer, but will overwrite data which is already there.
Let me illustrate it with C translation:
glGenBuffers(..., buf); // float* buf;
glBindBuffer(buf); // Tell opengl that we will use buf pointer, no analog in C.
glBufferData(/*non-null pointer*/); // buf = malloc(/*..*/); memcpy(to_gpu, from_cpu);
glBufferData(/*another non-null pointer*/); // free(buf); buf = malloc(/*..*/); memcpy(to_gpu, from_cpu);
glBufferSubData(...); // memcpy(to_gpu, from_cpu);
Idiomatic approach
What you need is:
glGenBuffers(..., buf); // float* buf;
glBindBuffer(buf); // Tell opengl that we will use buf pointer, no analog in C.
// Initialization
glBufferData(/*non-null pointer*/); // buf = malloc(/*..*/); memcpy(to_gpu, from_cpu);
// Hot loop
while (needToRender) {
if(needToAppend) {
if (dataDoesNotFit) glBufferData(...); // Reallocate, same buffer name
else glBufferSubData(...); // memcpy(to_gpu, from_cpu);
}
}
Here we reallocate memory only occasionally, when we need to append something and buffer is too small.
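The decision made in the hot loop above can be sketched as plain C bookkeeping, without any OpenGL calls so it runs anywhere. GpuBufferState, AppendAction and plan_append are hypothetical names, and growing the capacity by doubling is an assumption of this sketch (the answer only says to reallocate when the data doesn't fit); doubling keeps reallocations rare when data keeps arriving.

```c
#include <assert.h>
#include <stddef.h>

/* What the render loop should do for one append (hypothetical helper). */
typedef enum { UPLOAD_SUB_DATA, REALLOCATE } AppendAction;

typedef struct {
    size_t capacity; /* bytes currently allocated with glBufferData */
    size_t used;     /* bytes already holding valid data */
} GpuBufferState;

/* Decide how to append 'bytes' of new data and update the bookkeeping.
   - If it fits, a single glBufferSubData at offset 'used' is enough.
   - Otherwise capacity grows by doubling (an assumption of this sketch) and
     the caller must reallocate and re-upload the whole CPU-side array. */
static AppendAction plan_append(GpuBufferState *s, size_t bytes)
{
    assert(s != NULL);
    size_t needed = s->used + bytes;
    AppendAction action = UPLOAD_SUB_DATA;
    if (needed > s->capacity) {
        size_t new_capacity = s->capacity ? s->capacity : 1;
        while (new_capacity < needed)
            new_capacity *= 2;
        s->capacity = new_capacity;
        action = REALLOCATE;
    }
    s->used = needed;
    return action;
}
```

On REALLOCATE, one way to act on the result is glBufferData(GL_ARRAY_BUFFER, s.capacity, NULL, GL_DYNAMIC_DRAW) followed by glBufferSubData(GL_ARRAY_BUFFER, 0, s.used, cpu_data), since the whole data set is still on the CPU and passing NULL avoids reading past the end of cpu_data. On UPLOAD_SUB_DATA, only the new bytes are uploaded, at the value 'used' had before the call.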
Other approaches
I suggested reallocating with glBufferData because you already have all the data in a single buffer on the CPU. If not (i.e. you have one chunk of data on the GPU and another chunk on the CPU, but not together), you could use glCopyBufferSubData for the reallocation:
glBufferData(/*alloc new_gpu_buffer*/);
glCopyBufferSubData(/*from old_gpu_buffer to new_gpu_buffer*/);
glDeleteBuffers(/*old_gpu_buffer*/);
glBufferSubData(/*from cpu_buffer to new_gpu_buffer*/); // Add some new data from CPU.
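In the spirit of the answer's C translation, the same four steps can be written with plain C memory functions. grow_and_append is a hypothetical helper, and the mapping of each call to its OpenGL counterpart in the comments is only an analogy, since the GL calls operate on GPU-side storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Plain-C analogue of the GPU-side reallocation:
   malloc  ~ glBufferData        (allocate the new store, no data yet)
   memcpy  ~ glCopyBufferSubData (old store -> new store)
   free    ~ glDeleteBuffers     (release the old store)
   memcpy  ~ glBufferSubData     (append new data from the CPU)
   Returns the new block, or NULL on allocation failure (old block untouched). */
static unsigned char *grow_and_append(unsigned char *old_buf, size_t old_size,
                                      const unsigned char *new_data, size_t new_size)
{
    assert(new_data != NULL || new_size == 0);
    unsigned char *new_buf = malloc(old_size + new_size);
    if (new_buf == NULL)
        return NULL;
    if (old_size > 0)
        memcpy(new_buf, old_buf, old_size);
    free(old_buf);
    if (new_size > 0)
        memcpy(new_buf + old_size, new_data, new_size);
    return new_buf;
}
```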
Another approach for updating GPU data is mapping it into the CPU's address space, so you access GPU memory through a pointer. This is likely to be slow (it can block on the buffer and stall the pipeline) and is useful only in special cases. Use it only if you know what you're doing.
Since OpenGL is an API focused on drawing things (ignoring compute shaders for the moment), and drawing a scene normally starts from an empty canvas, you'll have to retain the complete backlog of point cloud data for the whole span of time you want to be able to redraw.
Assuming that redrawing the whole set might take some time for large amounts of point cloud data, some form of caching might seem reasonable. But let's do some back-of-the-envelope calculations first:
Typical GPUs these days are perfectly capable of performing full vertex setup at a rate well over 10^9 vertices / second (already 20 years ago GPUs were able to do something on the order of 20·10^6 vertices / second). Your typical computer display has fewer than 10·10^6 pixels. So because of the pigeonhole principle, if you were to draw more than 10·10^6 points, you'd either produce serious overdraw or fill up most of the pixels; in practice it's going to be somewhere in between.
But as we've already seen, GPUs are more than capable of drawing that many points at interactive framerates. And drawing any more of them will likely fill up your screen or occlude data.
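The arithmetic above can be checked in a few lines of C. The 60 frames per second and the 3840×2160 display are assumptions chosen for illustration; only the 10^9 vertices/second figure comes from the answer.

```c
#include <assert.h>

/* Back-of-the-envelope helpers; the refresh rate and display size passed in
   are illustrative assumptions, not figures from the answer. */
static double vertices_per_frame(double vertices_per_second, double frames_per_second)
{
    assert(frames_per_second > 0.0);
    return vertices_per_second / frames_per_second;
}

static long pixel_count(long width, long height)
{
    return width * height;
}
```

With those numbers, vertices_per_frame(1e9, 60.0) is about 16.7·10^6, roughly twice the 8,294,400 pixels that pixel_count(3840, 2160) gives for such a display.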
Some form of data retirement is required if you want the whole thing to remain readable. And for any point cloud size that is still readable, your GPU will be able to redraw the whole thing just fine.
Considering the need for data retirement, I suggest you allocate a large buffer that can hold the whole set of points over their lifetime, before eviction, and use it as a circular (round-robin) buffer. Keep an offset at which you overwrite old data with new data as it arrives (using glBufferSubData); at the end of the buffer you may have to split this into two calls. Pass the latest write index as a uniform so you can fade out points by their age, and then submit a single glDrawElements call to draw the whole content of that buffer in one go.
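The split write at the edge of the circular buffer can be sketched as pure bookkeeping in C. PointRing, SubDataRange and ring_plan_write are hypothetical names; the sketch assumes the buffer object was allocated once with its full capacity and that a single frame's data never exceeds that capacity. Each returned range would correspond to one glBufferSubData(GL_ARRAY_BUFFER, range.gpu_offset, range.size, src + range.src_offset) call.

```c
#include <assert.h>
#include <stddef.h>

/* One planned glBufferSubData call: destination byte offset in the GPU
   buffer, byte count, and where the bytes start in the incoming CPU data. */
typedef struct {
    size_t gpu_offset;
    size_t size;
    size_t src_offset;
} SubDataRange;

typedef struct {
    size_t capacity;  /* bytes in the buffer object, allocated once */
    size_t write_pos; /* next byte to overwrite (the oldest data lives here) */
} PointRing;

/* Plan the writes for 'bytes' of new point data and advance write_pos.
   Returns 1, or 2 when the write wraps past the end of the buffer. */
static int ring_plan_write(PointRing *ring, size_t bytes, SubDataRange out[2])
{
    assert(bytes <= ring->capacity); /* assumption: one frame fits in the ring */
    size_t first = ring->capacity - ring->write_pos;
    int count;
    if (bytes <= first) {
        out[0] = (SubDataRange){ ring->write_pos, bytes, 0 };
        count = 1;
    } else {
        out[0] = (SubDataRange){ ring->write_pos, first, 0 };
        out[1] = (SubDataRange){ 0, bytes - first, first };
        count = 2;
    }
    ring->write_pos = (ring->write_pos + bytes) % ring->capacity;
    return count;
}
```

Dividing write_pos by the size of one point gives the index just past the newest point, which is one way to obtain the write index the answer suggests passing as a uniform for fading points by age.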
Are there any benefits of having separate vertex buffers for static and dynamic objects in a DirectX 11 application? My approach is to have the vertices of all objects in a scene stored in the same vertex buffer.
However, I will only have to re-map a small number of objects (1 to 5) of the whole collection (up to 200 objects). The majority of objects are static and will not be transformed in any way. What is the best approach for doing this?
Mapping a big vertex buffer with discard forces the driver to allocate new memory every frame. Up to ~4 frames can be in flight, so there can be 4 copies of that buffer. This can lead to memory overcommitment and stuttering. For example, AMD advises discarding only vertex buffers of up to 4 MB (GCN Performance Tweets). Besides, every time you will needlessly have to copy the static data into the new vertex buffer.
Mapping with no-overwrite should work better. It requires managing the memory manually so that you don't overwrite data that is still in flight. I'm not sure about the performance implications, but this certainly isn't the recommended path.
The best approach is to simplify the driver's work by providing as many hints as possible. Create static vertex buffers with the immutable usage flag (D3D11_USAGE_IMMUTABLE), long-lived ones with the default flag (D3D11_USAGE_DEFAULT), and dynamic ones with the dynamic flag (D3D11_USAGE_DYNAMIC). See vendor guides such as GCN Performance Tweets or Don’t Throw it all Away: Efficient Buffer Management for additional details.
I'm just wondering when (and maybe how) to clear the data from a VBO. Do you always have to clear it before rewriting the data? Why clear it?
Clearing the buffer (i.e. setting each byte to 0) isn't too useful. Invalidating the buffer is.
Invalidating a section of a buffer means that the contents of that section become invalid, and you must write new content to that section before using it. This allows the OpenGL implementation to avoid waiting until the buffer object is no longer being used in order to upload data to it by giving you a completely 'new' buffer to write to (under the same name). This technique is called buffer orphaning.
To invalidate a buffer, you can call glBufferData with the same size and usage hint but a NULL data pointer, use glMapBufferRange with GL_MAP_INVALIDATE_BUFFER_BIT, or call glInvalidateBufferData if your implementation supports it (it is core since OpenGL 4.3 and also available through ARB_invalidate_subdata).
The OpenGL Wiki article for Buffer Object Streaming covers this in more detail, and also offers several other solutions.
To directly answer your question, it is not required that you invalidate or clear a buffer before updating it. You can call glBufferSubData whenever you want to update whatever contents you want. However, doing so without invalidation may cause a pipeline stall as OpenGL waits for the buffer to finish being used before safely updating it.
I have a need to stream a texture (essentially a camera feed).
With object streaming, the following scenarios seem to arise:
1. Is the new object's data store larger than, smaller than, or the same size as the old one?
2. Is a subset of the texture or the whole texture being updated?
3. Are we streaming a buffer object or a texture object (any difference?)
Here are the approaches I have come across:
1. Allocate the object's data store (BufferData for buffers or TexImage2D for textures) once, and then each frame update a subset of the data with BufferSubData or TexSubImage2D.
2. Nullify/invalidate the object after the last call (e.g. draw) that uses it, either with:
   Nullify: glTexSubImage2D( ..., NULL), glBufferSubData( ..., NULL)
   Invalidate: glInvalidateBufferData(), glMapBufferRange with GL_MAP_INVALIDATE_BUFFER_BIT, glDeleteTextures?
3. Simply re-invoke BufferData or TexImage2D with the new data.
4. Manually implement object multi-buffering / buffer ping-ponging.
Most immediately, my problem scenario is: the entire texture is replaced with a new one of the same size. How do I implement this? Will (1) implicitly synchronize? Does (2) avoid the synchronization? Will (3) synchronize, or will a new data store be allocated for the object, so that our update can be uploaded without waiting for all drawing that uses the old object state to finish? This passage from the Red Book (v4.3) makes me believe so:
Data can also be copied between buffer objects using the glCopyBufferSubData() function. Rather than assembling chunks of data in one large buffer object using glBufferSubData(), it is possible to upload the data into separate buffers using glBufferData() and then copy from those buffers into the larger buffer using glCopyBufferSubData(). Depending on the OpenGL implementation, it may be able to overlap these copies because each time you call glBufferData() on a buffer object, it invalidates whatever contents may have been there before. Therefore, OpenGL can sometimes just allocate a whole new data store for your data, even though a copy operation from the previous store has not completed yet. It will then release the old storage at a later opportunity.
But if so, why the need for (2) (nullify/invalidate)?
Also, please discuss the above approaches, and others, and their effectiveness in the various scenarios, keeping in mind at least the following issues:
Whether implicit synchronization on the object (i.e. synchronizing our update with OpenGL's usage) occurs
Memory usage
Speed
I've read http://www.opengl.org/wiki/Buffer_Object_Streaming but it doesn't offer conclusive information.
Let me try to answer at least a few of the questions you raised.
The scenarios you describe can have a great impact on the performance of the different approaches, especially the first point, about the dynamic size of the buffer. In your video-streaming scenario the size will rarely change, so a more expensive "re-configuration" of the data structures you use might be acceptable. If the size changes every frame or every few frames, this is typically not feasible. However, if a reasonable maximum size can be enforced, just using buffers/textures of the maximum size might be a good strategy. With neither buffers nor textures do you have to use all of the available space (although there are some minor issues when you do this with textures, such as wrap modes).
3. Are we streaming a buffer object or texture object (any difference?)
Well, the only way to efficiently stream image data to or from the GL is to use pixel buffer objects (PBOs). So you always deal with buffer objects in the first place, no matter whether vertex data, image data or any other data is being transferred. In the texture case, the buffer is just the source for some glTex*Image() call, and of course you'll need a texture object for that.
Let's come to your approaches:
In approach (1), you use the "Sub" variants of the update commands. In that case, (part of or the whole) storage of the existing object is updated. This is likely to trigger an implicit synchronization if the old data is still in use. The GL basically has only two options: wait for all operations (potentially) depending on that data to complete, or make an intermediate copy of the new data and let the client continue. Neither option is good from a performance point of view.
In approach (2), you have a misconception. The "Sub" variants of the update commands will never invalidate/orphan your buffers. The "non-sub" glBufferData() will create completely new storage for the object, and calling it with NULL as the data pointer will leave that storage uninitialized. Internally, the GL implementation might reuse memory that was used for earlier buffer storage. So if you follow this scheme and always use the same buffer size, there is some probability that you effectively end up using a ring buffer of the same memory areas.
The other invalidation methods you mentioned also let you invalidate parts of the buffer and give you more fine-grained control over what happens.
Approach (3) is basically the same as (2) with glBufferData() orphaning, except that you specify the new data directly at this stage.
Approach (4) is the one I would actually recommend, as it gives the application the most control over what is happening, without having to rely on the GL implementation's specific internal workings.
Without taking synchronization into account, the "sub" variants of the update commands are more efficient, even if the whole data store is to be changed and not just some part. That is because the "non-sub" variants basically recreate the storage, which introduces some overhead. By managing the ring buffers manually, you can avoid all of that overhead and don't have to rely on the GL to be clever, simply by using the "sub" variants of the update functions. At the same time, you can avoid implicit synchronization by only updating buffers that are no longer in use by the GL. This scheme can also be nicely extended to a multi-threaded scenario. You can have one (or several) extra threads with separate (but shared) GL contexts fill the buffers for you, and pass the buffer handles to the draw thread as soon as the update is complete. You can also just map the buffers in the draw thread and let them be filled by worker threads (without the need for additional GL contexts at all).
OpenGL 4.4 introduced GL_ARB_buffer_storage, and with it the GL_MAP_PERSISTENT_BIT flag for glBufferStorage and glMapBufferRange. That allows you to keep all of the buffers mapped while they are used by the GL, so you avoid the overhead of mapping the buffers into the address space again and again. You will then have no implicit synchronization at all, but you have to synchronize the operations manually. OpenGL's synchronization objects (see GL_ARB_sync) might help you with that, but the main burden of synchronization lies on your application's logic itself. When streaming video to the GL, just avoid reusing the buffer that was the source for the glTexSubImage*() call immediately, and try to delay its reuse as long as possible. You are, of course, also trading off throughput against latency. If you need to minimize latency, you might have to tweak this logic a bit.
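Approach (4) combined with manual, fence-based synchronization might look roughly like this on the CPU side. This is a sketch under assumptions: UploadRing, ring_acquire and RING_SIZE are hypothetical, three buffers is an arbitrary choice, and the "completed frame" number stands in for what fence objects (glFenceSync after a frame's draws, then glClientWaitSync or glGetSynciv before reuse) would tell you in real code.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 3 /* assumption: triple buffering */

/* Hypothetical CPU-side bookkeeping for manual buffer ping-ponging.
   In real code, "completed_frame" would be derived from fence objects;
   here it is just a number supplied by the caller. */
typedef struct {
    uint64_t last_used_frame[RING_SIZE]; /* frame whose draws last read each buffer */
    int next;                            /* round-robin cursor */
} UploadRing;

/* Return the index of the buffer that may be overwritten for 'frame'
   (e.g. the PBO feeding glTexSubImage2D), or -1 if the GPU may still be
   reading it, in which case the caller must wait or skip this update. */
static int ring_acquire(UploadRing *ring, uint64_t frame, uint64_t completed_frame)
{
    assert(frame > completed_frame); /* the frame being prepared is ahead of the GPU */
    int idx = ring->next;
    if (ring->last_used_frame[idx] > completed_frame)
        return -1; /* still in flight: writing now would force implicit sync */
    ring->last_used_frame[idx] = frame;
    ring->next = (idx + 1) % RING_SIZE;
    return idx;
}
```

When ring_acquire returns -1, the caller can either wait on the oldest fence or drop the incoming video frame; which one is preferable depends on whether latency or smoothness matters more for the application.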
Comparing the approaches for "memory usage" is really hard. There are a lot of implementation-specific details to consider here. A GL implementation might keep some old buffer memory around for a while to fulfill recreation requests of the same size. Also, a GL implementation might make shadow copies of any data at any time. In principle, the approaches that don't orphan and recreate storage all the time give you more control over the memory in use.
"Speed" itself is also not a very useful metric. You basically have to balance throughput and latency here, according to the requirements of your application.
I have just started learning about vertex buffer objects in C++. I am reading a book about OpenGL that says VBO rendering is more efficient than other forms of rendering because the data is stored on the GPU instead of on the heap. However, I am confused about how this can be if you still have to load an array of data from the heap to the GPU. Every few seconds I update the vertex data of my program, which means I then have to call glBufferData() to upload the new state. I don't see how this is more efficient than just rendering the array normally. I was wondering if I am calling glBufferData() more often than necessary, or if there is a better way to update the vertex data directly on the GPU.
Well, glBufferData (...) does more than you think. True, it supplies data to a VBO, but the more important point is that it allocates memory on the server side (the GPU, for all intents and purposes) for vertex storage.
In your example, the number of vertices, and therefore the size required to store them, does not seem to change when you refresh your data. What you should actually do is call glBufferSubData (...) to update the data without reallocating space for it. Coupled with an appropriate usage hint (e.g. GL_DYNAMIC_DRAW), this can be much more efficient than copying from client to server every time something is drawn.
Think of glBufferData (...) as a combination of malloc (...) and memcpy (...). glBufferSubData (...), on the other hand, is just memcpy (...). To this end, you can even map VBOs into your application's address space with glMapBuffer (...) and glUnmapBuffer (...), which are analogous to mmap (...) and munmap (...), without having to allocate storage on both the client and the server.
You should try to avoid modifying your vertex data every few frames. Vertex/fragment shaders are specifically there to allow you to modify your geometry on the fly, with some limitations of course.
However, in the simplest case (if you don't care about maximizing your performance), it is entirely possible to rewrite the buffer on every frame, and it should still beat calling glBegin..glEnd for every object.