The title says everything, but just to be clear I'll add some extra words.
In this case, resize means:
getting more storage space at the end of the old VBO
keeping the old data at the front
(hopefully without copying at all, but at least not on the CPU side, i.e. the driver should handle it)
EDIT
To explain some more details and justify my question:
I will store data of (beforehand) unknown size in the VBO, but I only know an upper limit that is a very rough estimate (10-100x the actual size, or even more in unusual conditions).
Of course I know how much data I have stored once I am done with it, so it would be nice to store data until I find my VBO too small, resize it, and then go on storing.
Here is why I don't want to copy (especially not on the CPU side):
I am doing all of this on the GPU to get interactive frame rates. When I have to copy, it is very slow, or even impossible because there is not enough space. Worst of all is copying the data via the CPU: passing everything over the bus into a new memory region of sufficient size, then calling glBufferData on the VBO with the new size and the new memory region as source. That would be the performance killer.
WORKAROUND
I circumvented the problem with an exact estimation of the needed space. But I will leave this question unanswered for a week to see if someone has another hint on this, as I am not very happy with the solution.
I think you won't get around a copy, because the only way to resize a buffer is to call glBufferData, and there is IMO no way to tell the driver to keep the old data.
What you can probably do, to at least avoid copying to the CPU and back again, is to create some kind of auxiliary VBO for this purpose and copy directly from the VBO into the auxiliary VBO (using the ARB_copy_buffer extension), resize the VBO, and copy its contents back.
But I think the best way is just to allocate a larger buffer beforehand, so the resize is not necessary; of course, in this case you need to know approximately how much extra storage you need.
Revisiting this question after some years, the landscape has changed a bit with newer versions and extensions.
GPU Side Copying
The extension mentioned in Christian Rau's answer has been core since OpenGL 3.1, which allows us to copy contents (via glCopyBufferSubData) from one VBO to another. Hopefully, the driver does this on the GPU side!
Using this function, we can create a larger buffer and copy the leading data over. This has the disadvantage of temporarily doubling the memory requirements, because we need both buffers at once.
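As a rough sketch, growing a VBO this way could look like the following (assuming OpenGL 3.1+; the function and variable names are mine, and error handling is omitted):

```cpp
// Grow a VBO, keeping its first usedSize bytes (a sketch, not the one true way).
GLuint GrowBuffer(GLuint oldVbo, GLsizeiptr usedSize, GLsizeiptr newSize)
{
    GLuint newVbo;
    glGenBuffers(1, &newVbo);

    // Allocate the larger store without initial data.
    glBindBuffer(GL_COPY_WRITE_BUFFER, newVbo);
    glBufferData(GL_COPY_WRITE_BUFFER, newSize, NULL, GL_DYNAMIC_DRAW);

    // Copy the leading data over; the driver can do this GPU-side.
    glBindBuffer(GL_COPY_READ_BUFFER, oldVbo);
    glCopyBufferSubData(GL_COPY_READ_BUFFER, GL_COPY_WRITE_BUFFER, 0, 0, usedSize);

    glDeleteBuffers(1, &oldVbo);
    return newVbo;
}
```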
True resizing
The good news is: With sparse buffers an even better solution is on the horizon.
Given this extension we can allocate a virtual buffer with more than enough space for our data without ever paying for the unneeded space. We only have to "commit" the regions of memory we physically want to store data in. This means we can "grow" the VBO by committing new pages at the end of it.
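A minimal sketch of how that could look with the extension (variable names and sizes are illustrative):

```cpp
// Query the page granularity required by ARB_sparse_buffer.
GLint pageSize;
glGetIntegerv(GL_SPARSE_BUFFER_PAGE_SIZE_ARB, &pageSize);

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);

// Reserve a generous virtual range; no physical memory is committed yet.
const GLsizeiptr virtualSize = GLsizeiptr(1) << 30; // e.g. a 1 GiB upper bound
glBufferStorage(GL_ARRAY_BUFFER, virtualSize, NULL,
                GL_DYNAMIC_STORAGE_BIT | GL_SPARSE_STORAGE_BIT_ARB);

// Commit physical pages only for the region we write to now; to "grow"
// the VBO later, commit further page-aligned ranges at its end.
glBufferPageCommitmentARB(GL_ARRAY_BUFFER, 0, 16 * GLsizeiptr(pageSize), GL_TRUE);
```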
The bad news is: As of the current OpenGL version (4.5), this is still an extension and not yet core, so adopting it might not be possible. You should also note that there are some details in the specification that are not yet worked out; for example, mapping of sparse buffers is disallowed in the current extension, but support might be added in future versions.
I would be keen to hear about the availability of this extension if you have any data on that.
Assuming you have support for a recent OpenGL standard, an alternative to VBOs might be to store your data in textures (again, assuming you have enough memory on your card). Copying data between old and new textures would take place on the card and would not require transfers over the bus.
Exactly how you achieve this depends on what your code is doing. But in principle, you use the texture data to overwrite dummy vertex data in your drawing calls, or perhaps use instancing. It would require a lot of thought and rework.
Related
Let's imagine that I have two squares. First I generate the VAO and VBO, then bind them, and so on... My goal is to check for collisions between the two objects every frame. In this case, I have to know the exact vertices on both the CPU and the GPU side, so I store every single vertex twice. If I work with a large amount of data, this mirroring does not seem efficient, not to mention the logistics of keeping the data consistent. Is there a better way to do this? Or is it totally OK to keep the vertices in an array after the glBufferData call?
There needs to be more information on your part. Is this something you plan on instancing? Are you sure there's a bottleneck on your bandwidth?
If this is something that you can simulate on the GPU, then just do it all on the GPU so you keep the memory on that side and not incur the transfer penalty from CPU to GPU.
If you need it to be on the CPU side for your collision detection, then you have a few options:
Update only the ones that change. If all of them are changing you should ignore this option, but otherwise you can map the buffer, update it, and flush only the ranges that actually changed (see the sketch after this list).
Send a displacement. If you end up having a ton of data, you may be able to get away with just sending a rotation and a central position, to cut down on updating "every vertex"; you might also be able to exploit a geometry shader. However, I've read that these can be problematic for performance, so consider it but be ready to profile.
You can possibly stream data if you have to update all of them; see this wonderful resource.
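For the first option, a minimal sketch of mapping with explicit flushing (assuming GL 3.0+; vbo, bufferSize, newData, dirtyOffset and dirtySize are placeholders):

```cpp
// Map the whole buffer, but promise to flush changed ranges explicitly.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
char* ptr = (char*)glMapBufferRange(GL_ARRAY_BUFFER, 0, bufferSize,
                                    GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT);

// Update only the vertices that actually changed.
memcpy(ptr + dirtyOffset, newData, dirtySize);

// Tell the driver which region changed, then unmap.
glFlushMappedBufferRange(GL_ARRAY_BUFFER, dirtyOffset, dirtySize);
glUnmapBuffer(GL_ARRAY_BUFFER);
```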
You need to define your problem domain a bit more, because I'm not sure exactly what the bounds of the problem are. The above are some ways of tackling it, but the best solution can only be given if you are more specific about what you want in the large-data case.
You also have to understand that massive data manipulation and fast transfer are goals that tend to fight each other; you'll have to be smarter about what you plan on doing, depending on exactly how much data you are talking about.
I'd like to answer with something more concrete but I'm just shooting in the dark because I don't know exactly what the limit of your data is and what hardware you're working with.
this is my first question, so if you have any suggestions on how to improve the question, feel free to tell me :)
So my problem is this: I have an object that changes each frame, and this results in a varying number of vertices that I plan to send through my pipeline. Now, I obviously can't change the size of my vertex buffer on the fly, so what is the best way to approach this?
Here are some of the ideas I had:
Make a vertex buffer of size n, and simply pass the vertices in bunches of n. Downsides: some vertices go through the VS multiple times, causing a small decrease in performance (probably not noticeable), and Map/Unmap has to be called several times per frame, which could be problematic.
Make a huge vertex buffer that will hold enough vertices that my object will never exceed the size of the buffer. Downsides: since I don't know in advance how many vertices my object will have, it might be hard to predict the size of this buffer. In addition, the buffer might take up too much space as a result.
Each frame, create a new vertex buffer of the correct size, and call IASetVertexBuffer. Downsides: This will probably have a huge performance issue...
I'd appreciate any ideas your guys have, or suggestions on which one of these three to use! :)
1/ This is not ideal, since, as you said, you need to Map/Unmap several times and do draw calls in between. You trade performance for lower memory usage, and on a modern card (counting in gigabytes of memory) memory is unlikely to be an issue (as mentioned in the comments).
2/ As opposed to 1, you need a single Map/Unmap and a single draw (you can also specify the vertex count in DeviceContext->Draw to make sure you only draw the relevant part of your buffer). This will be your best choice performance-wise, and it should not be too hard to define some form of maximum (even 1 million polygons is not that much memory, and you'll have quite a hard time getting your CPU to feed that amount of data every frame).
3/ I don't see any real benefit in your use case (recreating resources is common when you do async loading of immutable resources, so it doesn't really apply here).
So go for 2; if memory one day becomes an issue, it's quite easy to move back to 1, but I doubt that will ever happen.
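A minimal sketch of option 2 in D3D11 (assuming the buffer was created with D3D11_USAGE_DYNAMIC and D3D11_CPU_ACCESS_WRITE; Vertex, vertices and vertexCount are placeholders):

```cpp
// Refill the big dynamic buffer once per frame with WRITE_DISCARD...
D3D11_MAPPED_SUBRESOURCE mapped;
context->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
memcpy(mapped.pData, vertices, vertexCount * sizeof(Vertex));
context->Unmap(vertexBuffer, 0);

// ...then issue a single draw covering only the relevant part of the buffer.
UINT stride = sizeof(Vertex), offset = 0;
context->IASetVertexBuffers(0, 1, &vertexBuffer, &stride, &offset);
context->Draw(vertexCount, 0);
```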
This question comes in two (mostly) independent parts
My current setup is that I have a lot of Objects in game space. Each has a VBO assigned to it, which holds the vertex attribute data for each vertex. If an Object wants to change its vertex data (position etc.), it does so in an internal array and then calls glBufferSubDataARB to update the version on the GPU.
Now I understand that this is a horrible thing to do, and so I am looking for alternatives. One that presents itself is to have some manager that owns a large VBO from the beginning; Objects can request space from it and edit points in it. This drops the overhead of binding many VBOs, but comes with a large energy/time expenditure in creating and debugging such a beast (basically an entire memory management system).
My question (part (a)) is if this is the "best" method for doing this, or if there is something better that I have not thought of.
Such a system should allow easy addition/removal of vertices and editing them, as fast as possible.
Part (b) is about some simple actions taken on every object, i.e. rotation and translation. At the moment I am moving each vertex (ouch), but there must be a better option. I am considering uploading rotation and translation matrices to my shader and doing it there. This seems fine, but I am slightly worried about the overhead of changing uniform variables. Would it ultimately be to my advantage to do this? How fast is changing uniform variables?
Last time I checked the preferred way to do buffer updates was orphaning.
Basically, whenever you want to update your buffered data, you re-specify the buffer's store by calling glBufferData with a NULL data pointer (keeping the same size and usage). This "orphans" the current content of the buffer; you then write your new data with glMapBuffer / glBufferSubData.
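A minimal sketch of the idiom (vbo, size and newData are placeholders):

```cpp
glBindBuffer(GL_ARRAY_BUFFER, vbo);
// Re-specify the store with NULL: the old storage is "orphaned", and the
// driver can hand us fresh memory while the GPU may still read the old copy.
glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);
// Now fill the fresh storage.
glBufferSubData(GL_ARRAY_BUFFER, 0, size, newData);
```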
Hence:
Using a single big VBO for your static data is indeed a good idea. You must take care of the maximum allowed VBO size and split your static data into multiple VBOs if necessary, but this is probably an over-optimization in most cases (i.e. "I wouldn't bother").
Data which is updated frequently should be grouped in the same VBO (with usage = GL_STREAM_DRAW), and you should use orphaning to update it.
Unfortunately, the actual performance of this stuff varies between implementations. This guy ran some tests in an actual game; it may be worth reading.
For the second part of your question, using uniforms is obviously the way to do it. Yes, there is some (small) overhead, but it's surely a thousand times better than streaming all your data at every frame.
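For illustration, uploading a per-object transform as a uniform could look like this (the uniform name u_model and the surrounding variables are placeholders):

```cpp
// Upload the object's model matrix; the vertex data itself stays untouched.
glUseProgram(program);
GLint loc = glGetUniformLocation(program, "u_model");
glUniformMatrix4fv(loc, 1, GL_FALSE, modelMatrix); // 16 floats, column-major
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
```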
I'm asking this question because I don't want to spend time writing some code that duplicates functionalities of the OpenGL drivers.
Can the OpenGL driver/server hold more data than the video card? Say I have enough video RAM to hold 10 textures. Can I ask OpenGL to allocate 15 textures without getting a GL_OUT_OF_MEMORY error?
If I can rely on the driver to cleverly move textures/buffers/objects between 'normal' RAM and video RAM when needed, then I don't really need to Gen/Delete these objects myself. I become limited by 'normal' RAM, which is often plentiful compared to video RAM.
The approach "memory is abundant so I don't need to delete" is bad, and the approach "memory is abundant, so I'll never get out of memory errors" is flawed.
OpenGL memory management is obscure, both for technical reasons (see t.niese's comment above) and for ideological reasons ("you don't need to know, you don't want to know"). There do exist vendor extensions (such as ATI_meminfo) that let you query some non-authoritative numbers (non-authoritative insofar as they could change the next millisecond, and they do not take effects like fragmentation into account).
Generally, your assumption that you can use more memory than there is GPU memory is correct.
However, you are usually not able to use all available memory. More likely, there is a limit well below "all available RAM" due to constraints on what memory regions (and how large a region) the driver can allocate, lock, and DMA to/from. And even though you can normally use more memory than will fit on the GPU (even if you used it exclusively), this does not mean careless allocations can't and won't eventually fail.
Usually, but not necessarily, you consume as much system memory as GPU memory, too (the driver does that secretly, without you knowing). Since the driver swaps resources in and out as needed, it needs to maintain a copy. Sometimes it is necessary to keep 2 or 3 copies (e.g. when streaming, or for ARB_copy_buffer operations). Sometimes mapping a buffer object means yet another copy in a specially allocated block, and sometimes you're allowed to write straight into the driver's memory.
On the other hand, PCIe 2.0 (and PCIe 3.0 even more so) is fast enough to stream vertices from main memory, so you do not even strictly need GPU memory (other than a small buffer). Some drivers will stream dynamic geometry right away from system memory.
Some GPUs do not even have separate system and GPU memory (Intel Sandy Bridge or AMD Fusion).
Also, you should note that deleting objects does not necessarily delete them (at least not immediately). Usually, with very few exceptions, deleting an OpenGL object is merely a tentative delete that prevents you from further referencing the object. The driver will keep the object valid for as long as it needs it.
On the other hand, you really should delete what you do not need any more, and you should delete early. For example, you should delete a shader immediately after attaching it to the program object. This ensures that you do not leak resources, and it is guaranteed to work. Deleting and re-specifying the in-use vertex or pixel buffer when streaming (by calling glBufferData(..., NULL)) is a well-known idiom. This only affects your view of the object, and it allows the driver to continue using the old storage in parallel for as long as it needs to.
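For illustration, the shader case looks like this (a sketch; compilation and error checks omitted):

```cpp
GLuint vs = glCreateShader(GL_VERTEX_SHADER);   // ...source + compile...
GLuint fs = glCreateShader(GL_FRAGMENT_SHADER); // ...source + compile...
GLuint prog = glCreateProgram();
glAttachShader(prog, vs);
glAttachShader(prog, fs);
glDeleteShader(vs); // only flags them for deletion; the driver keeps
glDeleteShader(fs); // them valid for as long as the program needs them
glLinkProgram(prog);
```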
Some additional information to my comment that did not fit in there.
There are different reasons why this is not part of OpenGL.
It isn't an easy task for the system/driver to guess which resources are and will be required. The driver could certainly build an internal heuristic for whether a resource is required often or rarely (much like a CPU predicts branches and speculatively executes code based on that guess). But the GPU will not know (without knowing the application code) which resource will be required next. It also has no knowledge of where the geometry is placed in the scene (because you do this yourself, with the model and view matrices you pass to your shader).
If you e.g. have a game where you can walk through a scene, you normally won't render the parts that are out of view. So the GPU could conclude that these resources are no longer required; but if you turn around, all these textures and geometry are required again and need to be moved from system memory to GPU memory, which could result in really bad performance. The game engine, on the other hand, has in-depth knowledge of the scene (through octrees or similar techniques) and of the possible paths that can be walked, so it knows which resources can be removed from the GPU, which should be moved to the GPU while playing, and where it is necessary to display a loading screen.
If you look at the evolution of OpenGL and which features have become deprecated, you will see that it moves in the direction of removing everything except the features that are truly required and best done by the graphics card, driver, and system. Everything else is up to the user to implement on their own to get the best performance. (For example, you create your projection matrix yourself and pass it to the shader, so OpenGL does not even know where an object is placed in the scene.)
Here's my TL;DR answer, I recommend reading Daemon's and t.niese's answers as well:
Can the OpenGL driver/server hold more data than the video card?
Yes
Say, I have enough video RAM to hold 10 textures. Can I ask OpenGL to allocate 15 textures without getting a GL_OUT_OF_MEMORY error?
Yes. Depending on the driver/GPU combination it might even be possible to allocate a single texture that exceeds the GPU's memory, and actually use it for rendering. At my current occupation I exploit that fact to extract slices of arbitrary orientation and geometry from large volumetric datasets, using shaders to apply filters to the voxel data in situ. This works well, though not at interactive frame rates.
I've started to use Pixel Buffer Objects and while I understand how to use them and the gist of what they're doing, I really don't know what's going on under the hood. I'm aware that the OpenGL spec allows for leeway in regards to the exact implementation, but that's still beyond me.
As far as I understand, a buffer object typically resides server-side in GRAM, though this apparently may vary depending on target and usage. This makes perfect sense, and would explain why OpenGL calls on BOs operate so fast. But in which cases would it reside in AGP or system memory? (Side question: does PCI-e have an equivalent of AGP memory?)
Also, glMapBuffer() returns a pointer to a block of memory of the BO so the data may be read/written/changed. But how is this done? The manipulation takes place client-side, so the data still has to travel from server to client somehow. If it does, how is it better than glReadPixels()?
PBOs are clearly better than glReadPixels(), as the performance difference shows; I just don't understand how.
I haven't used FBOs yet, but I've heard they're better to use. Is this true? If so, why?
I can't tell you in which memory the buffer object will be allocated. Actually, you mostly answered that question yourself, so you can hope that a good driver will do it this way.
glMapBuffer can be implemented the same way as memory-mapped files. Remember the difference between physical memory and virtual address space: when you write to a memory location, the address is mapped through a page table to a physical location. If the required page is marked as swapped out, an interrupt occurs and the system loads the required page from swap into RAM. This mechanism can be used to map files and other resources (like GPU memory) into your process's virtual address space. When you call glMapBuffer, the system allocates some address range (not memory, just addresses) and prepares the relevant page table entries. When you try to read/write at these addresses, the system loads/sends the data to/from the GPU. Of course this would be slow, so some buffering is done along the way.
If you constantly transfer data between the CPU and GPU, I doubt that PBOs will be faster. They are faster when you perform many manipulations on the GPU (like loading from the frame buffer, changing a few texels with the CPU, and using it as a texture again on the GPU). They can also be faster in the case of an integrated graphics processor or AGP memory, because then glMapBuffer can map the addresses directly to physical memory, effectively eliminating one copy operation.
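For illustration, the classic asynchronous readback pattern with a PBO looks roughly like this (pbo, width and height are placeholders):

```cpp
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
// With a pack PBO bound, glReadPixels returns immediately; the copy
// happens into the buffer object without a round trip to the client.
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);

// ...do other work while the transfer completes...

void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY); // may block here
// ...read or modify the pixels on the CPU...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
```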
Are FBOs better? For what? They are better when you need to render to a texture. That's again because they eliminate one data copy operation.