OpenGL error handling best practices - opengl

Checking OpenGL error state after OpenGL calls in debug builds can be an invaluable tool for finding bugs in OpenGL code. But what about errors like running out of memory when allocating textures or other OpenGL resources? What are the best practices on handling or avoiding errors like these?
An OpenGL resource allocation failure would probably be fatal in most cases, so should a program just try to allocate a reasonable amount of resources and hope for the best? What kinds of approaches are used in real-world projects on different platforms, e.g., on PC and on mobile platforms?
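(For reference, the per-call checking mentioned in the question is often wrapped in a debug-only macro along these lines; this is just a rough sketch, and on GL 4.3+ or with KHR_debug a glDebugMessageCallback is usually the nicer option:)

#include <cstdio>
#include <cstdlib>

// Debug-build helper: wraps a GL call and aborts on error; compiles down to
// the bare call in release builds.
#ifndef NDEBUG
#define GL_CHECK(stmt) do {                                              \
        stmt;                                                            \
        GLenum err = glGetError();                                       \
        if (err != GL_NO_ERROR) {                                        \
            std::fprintf(stderr, "GL error 0x%04X after '%s' (%s:%d)\n", \
                         err, #stmt, __FILE__, __LINE__);                \
            std::abort();                                                \
        }                                                                \
    } while (0)
#else
#define GL_CHECK(stmt) stmt
#endif

// Usage: GL_CHECK(glTexImage2D(...));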

Running out of memory when allocating textures and vertex buffers is on the rare side these days. By the time you run into this sort of situation you should already know that you are approaching the limits of your system requirements, and you should have a resource manager smart enough to deal with it.
In the PC spectrum, the amount of available memory is becoming less relevant and harder to define. Textures are becoming virtualized resources, where portions of them are only fetched and stored in local (GPU) memory when a specific sub-region is referenced in a shader (Sparse Textures in OpenGL 4.4 terms, or Tiled Resources in D3D 11.2 terms). You may also hear this feature referred to as Partially Resident Textures, and that is the term I like to use most often.
Since Partially Resident Textures (PRT) are an architectural trend on DX 11.2+ PC hardware and a key feature of the Xbox One / PS4, the amount of available memory will be less and less of an application-terminating event. It will be more of a performance hitch when page faults have to be serviced (e.g. memory for part of a texture is referenced for the first time), and care will have to be taken to try and minimize thrashing. This is really not much different from the situation 10 years ago, except that instead of a texture being either completely resident or completely non-resident, now individual tiles in a texture atlas or individual mipmap levels may have different states. The way that memory faults are handled can actually open up doors for more efficient procedurally generated content and streaming from optical or network-based storage.
Having said that, virtualizing memory resources is not the most efficient way to approach real-time and/or embedded applications. Extra hardware is usually needed to handle memory mapping, and extra latency is introduced when a memory fetch for a non-resident resource is issued. In the mobile domain I doubt PRTs are going to change a whole lot; there you will still benefit from lower-level memory management and things like proxy textures before texture allocation; unfortunately, OpenGL ES does not even support proxy textures.
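For what it's worth, on desktop GL a proxy texture lets you ask whether an allocation of a given size and format would be accepted before committing to it. A minimal sketch (the RGBA8 format and the helper name are just illustrative):

// Probe whether an RGBA8 texture of the requested size would be accepted.
// GL_PROXY_TEXTURE_2D performs the full consistency check but allocates nothing.
bool textureAllocationLikelySucceeds(GLsizei width, GLsizei height)
{
    glTexImage2D(GL_PROXY_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    GLint probedWidth = 0;
    glGetTexLevelParameteriv(GL_PROXY_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &probedWidth);

    // The implementation reports 0 for the proxy's dimensions if it cannot
    // support the requested texture; this is a hint, not a guarantee.
    return probedWidth != 0;
}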
Your resource manager should be designed to keep a running tab of the memory allocated for all types of resources. It will not be completely accurate, because OpenGL hides a lot of details from you, but it will give you the big picture. You will be able to see immediately that switching from an RGBA16F render buffer to an RGBA8 one saves you X-many bytes of memory, or that eliminating one vertex attribute from one of your vertex buffers changes the storage requirements, for instance. You can insert your own checks when allocating resources and handle them as assertion failures, etc. at run time. Better to define and monitor your own thresholds than to have OpenGL complain only AFTER it cannot satisfy a memory request.
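A minimal sketch of that kind of bookkeeping (the class name, byte estimates, and budget are illustrative; the driver's real usage will differ):

#include <cassert>
#include <cstddef>

// Rough running tab of the GPU memory we have asked for. The numbers are only
// estimates -- the driver may pad, tile, or duplicate allocations -- but they
// are enough to spot big-picture changes (e.g. RGBA16F -> RGBA8).
class GpuMemoryBudget
{
public:
    explicit GpuMemoryBudget(std::size_t budgetBytes) : budget_(budgetBytes) {}

    // Call before glTexImage2D / glBufferData and treat overruns as a bug in
    // debug builds rather than waiting for GL_OUT_OF_MEMORY.
    void noteAllocation(std::size_t bytes)
    {
        used_ += bytes;
        assert(used_ <= budget_ && "exceeded self-imposed GPU memory budget");
    }

    void noteRelease(std::size_t bytes) { used_ -= bytes; }

    static std::size_t estimateTexture2D(std::size_t w, std::size_t h,
                                         std::size_t bytesPerTexel, bool mipmapped)
    {
        std::size_t bytes = w * h * bytesPerTexel;
        return mipmapped ? bytes + bytes / 3 : bytes;  // full mip chain adds ~1/3
    }

private:
    std::size_t budget_;
    std::size_t used_ = 0;
};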

There's no "one size fits all" approach for this. It all depends on the application and how critical it is. The general rule is: wherever possible, fail gracefully and safely.
In the case of a game, a preferable course of action would be to save a snapshot of the current game state (it's a good idea to add autosave spots right before and right after critical points), terminate the game process, and show the user an understandable reason for the failure; and if there's a save game, assure them that their progress is not lost.
In the case of a medical diagnostics system, inform the user that the graphics display has become corrupt and that they must not use what is currently visible on screen for any further diagnostic purposes.
In the case of a flight controller display, a medical treatment system, or similar applications where total failure is not an option, your system must be built in such a way that, for any partial failure, the failing part is isolated and there are enough redundancies and backups that operation can continue normally.
Flight controller displays, for example, are not fed by a single computer; each display has (IIRC) three independently operating computers producing identical output. Their programming differs, so that a programming failure in one of the computers will create an inconsistency with the other two. Each computer feeds its internal state into an arbiter which makes sure that all computers agree on their data. The display signal itself is fed through a further independent comparing arbiter which compares the display output and would likewise disable the offending system's output in case of failure.

Related

What can Vulkan do specifically that OpenGL 4.6+ cannot?

I'm looking into whether it's better for me to stay with OpenGL or consider a Vulkan migration for intensive bottlenecked rendering.
However, I don't want to make the jump without being informed about it. I was looking up what benefits Vulkan offers me, but even with a lot of googling I wasn't able to find out exactly what gives the performance boosts. People throw around phrases like "OpenGL is slow, Vulkan is way faster!" or "Low power consumption!" and say nothing more on the subject.
Because of this, it makes it difficult for me to evaluate whether or not the problems I face are something Vulkan can help me with, or if my problems are due to volume and computation (and Vulkan would in such a case not help me much).
I'm assuming Vulkan does not magically make things in the pipeline faster (as in, shading triangles is going to take approximately the same time in OpenGL and Vulkan for the same buffers, uniforms, and shader). I'm assuming all the things with OpenGL that cause grief (e.g. framebuffer and shader program changes) are going to be equally as painful in either API.
There are a few things off the top of my head that I think Vulkan offers, based on reading through countless things online (and I'm guessing this is certainly not all of the advantages, nor am I sure these are even true):
Texture rendering without [much? any?] binding (or rather, a better version of 'bindless textures'). I noticed a significant performance boost when I switched to bindless textures, so this might not even be worth mentioning as a point if bindless textures effectively already do this, and I'm not sure whether Vulkan adds anything here
Reduced CPU/GPU communication by composing some kind of command list that you can execute on the GPU without needing to send much data
Being able to interface in a multithreaded way that OpenGL can't somehow
However I don't know exactly what cases people run into in the real world that demand these, and how OpenGL limits these. All the examples so far online say "you can run faster!" but I haven't seen how people have been using it to run faster.
Where can I find information that answers this question? Or do you know some tangible examples that would answer this for me? Maybe a better question would be where are the typical pain points that people have with OpenGL (or D3D) that caused Vulkan to become a thing in the first place?
An example of answer that would not be satisfying would be a response like
You can multithread and submit things to Vulkan quicker.
but a response that would be more satisfying would be something like
In Vulkan you can multithread your submissions to the GPU. In OpenGL you can't do this because you rely on the implementation to do the appropriate locking and placing fences on your behalf which may end up creating a bottleneck. A quick example of this would be [short example here of a case where OpenGL doesn't cut it for situation X] and in Vulkan it is solved by [action Y].
The last paragraph above may not be accurate whatsoever, but I was trying to give an example of what I'd be looking for without trying to write something egregiously wrong.
Vulkan really has four main advantages in terms of run-time behavior:
Lower CPU load
Predictable CPU load
Better memory interfaces
Predictable memory load
Specifically, lower GPU load isn't one of the advantages; the same content using the same GPU features will have very similar GPU performance with both APIs.
In my opinion it also has many advantages in terms of developer usability - the programming model is a lot cleaner than OpenGL's, but there is a steeper learning curve to get to the "something working correctly" stage.
Let's look at each of the advantages in more detail:
Lower CPU load
The lower CPU load in Vulkan comes from multiple areas, but the main ones are:
The API encourages up-front construction of descriptors, so you're not rebuilding state on a draw-by-draw basis (see the sketch after this list).
The API is asynchronous and can therefore move some responsibilities, such as tracking resource dependencies, to the application. A naive application implementation here will be just as slow as OpenGL, but the application has more scope to apply high level algorithmic optimizations because it can know how resources are used and how they relate to the scene structure.
The API moves error checking out to layer drivers, so the release drivers are as lean as possible.
The API encourages multithreading, which is always a great win (especially on mobile where e.g. four threads running slowly will consume a lot less energy than one thread running fast).
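As a rough illustration of the first point above, a Vulkan application typically builds its descriptor set layout, pool, and sets once at load time and then only binds them while recording draws. A minimal sketch, with error handling omitted and handles such as device, cmd, and pipelineLayout assumed to exist already:

// Built once at load time: layout, pool, and a descriptor set for a single
// combined image sampler at binding 0.
VkDescriptorSetLayoutBinding binding{};
binding.binding         = 0;
binding.descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
binding.descriptorCount = 1;
binding.stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.bindingCount = 1;
layoutInfo.pBindings    = &binding;
VkDescriptorSetLayout setLayout;
vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &setLayout);

VkDescriptorPoolSize poolSize{VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER, 1};
VkDescriptorPoolCreateInfo poolInfo{};
poolInfo.sType         = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
poolInfo.maxSets       = 1;
poolInfo.poolSizeCount = 1;
poolInfo.pPoolSizes    = &poolSize;
VkDescriptorPool pool;
vkCreateDescriptorPool(device, &poolInfo, nullptr, &pool);

VkDescriptorSetAllocateInfo allocInfo{};
allocInfo.sType              = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
allocInfo.descriptorPool     = pool;
allocInfo.descriptorSetCount = 1;
allocInfo.pSetLayouts        = &setLayout;
VkDescriptorSet set;
vkAllocateDescriptorSets(device, &allocInfo, &set);
// ... vkUpdateDescriptorSets(...) to point the set at the actual texture ...

// Per draw, at command-recording time: no state rebuilding, just a bind.
vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout,
                        0, 1, &set, 0, nullptr);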
Predictable CPU load
OpenGL drivers do various kinds of "magic", either for performance (specializing shaders based on state only known late at draw time), or to maintain the synchronous rendering illusion (creating resource ghosts on the fly to avoid stalling the pipeline when the application modifies a resource which is still referenced by a pending command).
The Vulkan design philosophy is "no magic". You get what you ask for, when you ask for it. Hopefully this means no random slowdowns because the driver is doing something you didn't expect in the background. The downside is that the application takes on the responsibility for doing the right thing ;)
Better memory interfaces
Many parts of the OpenGL design are based on distinct CPU and GPU memory pools which require a programming model that gives the driver enough information to keep them in sync. Most modern hardware can do better with hardware-backed coherency protocols, so Vulkan enables a model where you can just map a buffer once, modify it ad hoc, and be guaranteed that the "other process" will see the changes. No more "map" / "unmap" / "invalidate" overhead (provided the platform supports coherent buffers, of course; it's still not universal).
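A minimal sketch of that map-once pattern, assuming the buffer's memory was allocated from a HOST_VISIBLE | HOST_COHERENT memory type and that device, memory, and the CPU-side source data already exist (error handling omitted):

#include <cstring>

// Map once at startup; with a HOST_COHERENT memory type the GPU sees the
// writes without explicit vkFlushMappedMemoryRanges / invalidate calls.
void* mapped = nullptr;
vkMapMemory(device, memory, 0, VK_WHOLE_SIZE, 0, &mapped);

// Every frame: write straight into the mapped pointer -- no map/unmap churn.
// (Synchronizing against frames still in flight remains the app's job, e.g.
// by writing into a per-frame sub-range of the buffer.)
std::memcpy(mapped, cpuSideUniformData, uniformDataSizeBytes);  // hypothetical source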
Secondly Vulkan separates the concept of the memory allocation and how that memory is used (the memory view). This allows the same memory to be recycled for different things in the frame pipeline, reducing the amount of intermediate storage you need allocated.
Predictable memory load
Related to the "no magic" comment for CPU performance, Vulkan won't generate random resources (e.g. ghosted textures) on the fly to hide application problems. No more random fluctuations in resource memory footprint, but again the application has to take on the responsibility to do the right thing.
This is at risk of being opinion based. I suppose I will just reiterate the Vulkan advantages that are written on the box, and hopefully uncontested.
You can disable validation in Vulkan. It obviously uses less CPU (or battery/power/noise) that way. In some cases this can be significant.
OpenGL has poorly defined multi-threading. Vulkan has well-defined multi-threading in the specification, meaning you do not immediately lose your mind trying to code with multiple threads, and you also get better performance where a single thread would otherwise be a CPU bottleneck.
Vulkan is more explicit; it does not (or tries not to) expose big magic black boxes. That means e.g. you can do something about micro-stutter and hitching, and other micro-optimizations.
Vulkan has a cleaner interface to windowing systems. No more odd contexts and default framebuffers. Vulkan does not even require a window to draw (or rather, it can achieve that without weird hacks).
Vulkan is a cleaner and more conventional API. For me that means it is easier to learn (despite the other things) and more satisfying to use.
Vulkan takes shaders as binary intermediate code (SPIR-V), while OpenGL historically did not. That should mean faster loading and compilation of shader code (see the sketch after this list).
Vulkan treats mobile GPUs as first-class citizens. No more separate ES.
Vulkan has open-source, conventional (GitHub) public trackers, meaning you can improve the ecosystem without jumping through hoops. E.g. you can improve/implement a validation check for an error that often trips you up, or you can improve the specification so it makes sense to people who are not insiders.
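As an illustration of the binary-shader point above, loading a precompiled SPIR-V blob into Vulkan is only a few lines; a sketch where the loadFile helper is hypothetical and error handling is omitted:

#include <cstdint>
#include <vector>

// Shaders are compiled offline, e.g.:
//   glslangValidator -V shader.frag -o shader.frag.spv
// and handed to the driver as a SPIR-V binary blob at run time.
std::vector<uint32_t> spirv = loadFile("shader.frag.spv");  // hypothetical helper

VkShaderModuleCreateInfo info{};
info.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
info.codeSize = spirv.size() * sizeof(uint32_t);  // size in bytes
info.pCode    = spirv.data();

VkShaderModule module;
vkCreateShaderModule(device, &info, nullptr, &module);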

Does OpenGL take care of GPU memory fragmentation?

So basically, whenever I create buffer objects, OpenGL allocates some memory on the GPU.
Consider scenario 1 where I generate 2 uniform buffers for 2 uniform variables.
Now consider scenario 2 where I create a single buffer and enclose the 2 uniform variables inside an interface block.
My understanding is that for scenario 1, two separate regions of memory get allocated, while for scenario 2, one big contiguous block of memory gets allocated. If so, then scenario 1 might be susceptible to memory fragmentation, and if this happens, is it managed by OpenGL or something else, OR should we keep this in mind before writing performance-critical code?
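For illustration, the two scenarios might look roughly like this (the names, sizes, and usage hints are made up):

// Scenario 1: two separate uniform buffers, one per uniform variable.
GLuint ubos[2];
glGenBuffers(2, ubos);
glBindBuffer(GL_UNIFORM_BUFFER, ubos[0]);
glBufferData(GL_UNIFORM_BUFFER, sizeof(float) * 16, nullptr, GL_DYNAMIC_DRAW); // e.g. a mat4
glBindBuffer(GL_UNIFORM_BUFFER, ubos[1]);
glBufferData(GL_UNIFORM_BUFFER, sizeof(float) * 4,  nullptr, GL_DYNAMIC_DRAW); // e.g. a vec4

// Scenario 2: one buffer backing a single interface block, e.g. in GLSL:
//   layout(std140) uniform PerFrame { mat4 viewProj; vec4 lightDir; };
GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferData(GL_UNIFORM_BUFFER, sizeof(float) * 20, nullptr, GL_DYNAMIC_DRAW); // 80 bytes under std140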
Actually I have to fix that for you. It's
So basically, whenever I create buffer objects, OpenGL allocates some memory.
You don't know – and it's invalid to make assumptions about – where this memory is located. You just get the assurance that it's there (somewhere) and that you can make use of it.
managed by OpenGL or something
Yes. In fact, any reasonable OpenGL implementation has to move data around on a regular basis. Think about it: on a modern computer system several applications use the GPU in parallel, and no process (usually) cares about or respects the inner workings of the other processes that coinhabit the same machine. Yet the user (naturally) expects that all processes will "just work" regardless of the situation.
The GPU drivers do a lot of data pushing in the background, moving stuff between the system memory, the GPU memory or even swap space on storage devices without processes noticing any of that.
OR should we keep this in mind before writing performance critical code?
Average-joe-programmer will get the best performance by just using the OpenGL API in a straightforward way, without trying to outsmart the implementation. Every OpenGL implementation (= combination of GPU model + driver version) has "fast paths"; however, short of having access to intimately detailed knowledge about the GPU and driver, those are very difficult to hit deliberately.
Usually only the GPU makers themselves have this knowledge; if you're an AAA game studio, you usually have a few GPU vendor engineers on quick dial who will come for a visit to your office and do their voodoo; most people visiting this site probably don't.

DirectX 11 Registers

I'm somewhat confused about registers in DirectX 11. Let me give an example of the situation: Assume you have 3 models. They each have a texture that is mapped to register t0. Models 1 and 3 use the same texture, and model 2 uses a different texture. When drawing model 1, I set the texture resource view to register 0 and draw the model. Then I do the same for models 2 and 3, but use the same resource view for model 3. When I set the texture for model 2, does the GPU replace the texture in GPU memory with a different one, or does it keep that texture memory around until space is needed and just move some pointers around? I would like to minimize data transfer to the GPU and I'm wondering if I should handle situations like these myself or whether DX handles it for me. Btw, I am NOT using the Effects 11 framework.
Thanks in advance.
In general, you should assume that once you have created a resource on the GPU (e.g. using CreateTexture2D), that memory is reserved and resident for that resource for use by the 3D pipeline. Note that this is independent of data transfer to the GPU, which is also explicit via Map/Unmap or UpdateSubresource.
There are some cases where the OS will swap memory in and out, but usually this should be avoided if possible. For example, if you create a bunch of large textures but never access them, eventually the video memory manager will page them out to system memory for other tasks (e.g. watching Netflix / browsing the internet on another display). You can also run into real problems if you overcommit video memory (using more than what is available on the system). This used to be impossible (you would just get E_OUTOFMEMORY) but now the memory manager tries to make it work by paging things to system memory or even disk. This is something you should really strive to avoid since if you ever bind and use a paged-out resource, you'll get a glitch waiting for the memory manager to page it back in for use.
Note that the above really just applies to discrete GPU configs. On integrated systems e.g. from Intel or AMD, you get unified memory which has completely different characteristics. But in general you should target discrete configs first, since there are more performance cliffs you have to worry about if you screwed something up, and they would be unlikely to show up on integrated.
Going back to your original question, changing SRVs between draw calls is not that expensive - it's more than a pointer swap, but nowhere near the cost of transferring the entire texture across the bus. You should feel free to swap SRVs at the same frequency as your draw calls and expect no adverse performance impact.
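A sketch of the pattern being discussed, assuming the SRVs, the device context, and a DrawModel helper already exist (all names are illustrative):

// Bind the shared texture's SRV to slot t0 for models 1 and 3, switching to
// the second SRV for model 2. The SRV change is cheap; the texture data
// itself stays resident on the GPU and is not re-uploaded.
context->PSSetShaderResources(0, 1, &srvShared);   // register t0
DrawModel(context, model1);                        // hypothetical helper
context->PSSetShaderResources(0, 1, &srvModel2);
DrawModel(context, model2);
context->PSSetShaderResources(0, 1, &srvShared);
DrawModel(context, model3);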
I think what you're asking depends a lot on the hardware and driver implementation. That's also why the DX11 documentation doesn't make any claims about how memory is managed for graphics resources. I can't really give you any valid sources, but I believe it's safe to assume that textures/buffers for which you've created a view will reside in the GPU's memory (especially the ones you're accessing more frequently). The graphics driver will do a lot of access optimization. However, it's good practice to change the state of the graphics pipeline as little as possible.
You can read an in-depth discussion of the graphics pipeline here

Why does OpenGL give handles to objects instead of pointers?

The OpenGL tradition is to let the user manipulate OpenGL objects using an unsigned int handle. Why not just give a pointer instead? What are the advantages of unique IDs over pointers?
TL;DR: OpenGL IDs don't map bijectively to memory locations. A single OpenGL ID may refer to multiple memory locations at the same time. Also OpenGL has been designed to work for distributed rendering architectures (like X11) as well, and given an indirect context programs running on different machines may use the same OpenGL context.
OpenGL has been designed as an architecture- and display-system-agnostic API. When OpenGL was first developed, this happened in the light of client-server display architectures (like X11). If you look into the OpenGL specification, even that of modern OpenGL 4, it refers to clients and servers.
However, in a client/server architecture pointers make no sense. For one, the address space of the server is not accessible to the clients without jumping through some hoops. And even if you set up a shared memory mapping, the addresses of objects are not the same for client and server. Add to this that on architectures like X11 a single indirect OpenGL context can be used by multiple clients, which may even run on different machines. Pointers simply don't work for that.
Last but not least, the OpenGL object model is highly abstract and the OpenGL drawing model is asynchronous. Say I do the following:
GLuint id;
glGenTextures(1, &id);
glBindTexture(GL_TEXTURE_2D, id);
glTexStorage2D(GL_TEXTURE_2D, …);
glTexSubImage2D(GL_TEXTURE_2D, …, image_a);
draw_something();
glTexSubImage2D(GL_TEXTURE_2D, …, image_b);
draw_something_b();
When the end of this little snippet has been reached, nothing at all may actually have been drawn yet, because no synchronization point has been reached (glFinish, glReadPixels, a buffer swap). Note the two calls to glTexSubImage, which happen on the same id. When the pixels are finally put to the framebuffer, there are two different images to be sourced from a single texture ID, because OpenGL guarantees you that things will appear as if they were drawn synchronously. So at the end of a drawing batch a single object ID may refer to a whole collection of different data sets at different locations in memory.
My first consideration - having pointers would make programmers wonder if they can operate on them in a pointer-arithmetic way, e.g. by pointing into the middle of a texture to update it, or something like that. Maybe even crazier things, such as patching shader code on the fly. That all sounds like a whole new cool degree of freedom, until you think of the additional complications caused by tampering with the highly efficient and optimized "black-box" way the GPU operates.
For example - consider the inner workings of GPU memory allocation. Just like with an OS - the pointers you get from the OS are not the real "physical" ones; the OS memory manager can move things around behind the scenes while keeping the pointers the same (e.g. swapping to HDD). IDs are just the same in that regard - the GPU can optimize and pack entities with even more freedom, while keeping up the nice facade of them being available just like that.
Another example - OpenGL is not actually the same across manufacturers. In fact, OpenGL is just a description of an API, where each vendor makes their own implementation the way it works best for them. For example, there's no rule on how to store texture mipmaps - aligned, interleaved, or whatever. Having pointers to a texture would lure developers into tampering with mipmaps, which would cause a lot of trouble supporting the various implementations, or force all implementations to become strictly unified, which again is a bad idea for performance.
The OpenGL device (GPU) may have its own memory with its own address space, independent of the host (CPU) memory system. (Think of a discrete video card with its own onboard RAM.) The host can't (directly) access that memory, so it's not possible to have a pointer to it.
It's best to think of the GPU as a whole separate computer; it's actually possible to do OpenGL over a network, with a program running on one computer rendering graphics on the video card in another. When you set up your textures and buffers, you're basically uploading data to the GL device for its own internal use.

Confusion regarding memory management in OpenGL

I'm asking this question because I don't want to spend time writing some code that duplicates functionalities of the OpenGL drivers.
Can the OpenGL driver/server hold more data than the video card? Say, I have enough video RAM to hold 10 textures. Can I ask OpenGL to allocate 15 textures without getting a GL_OUT_OF_MEMORY error?
If I can rely on the driver to cleverly send the textures/buffers/objects from the 'normal' RAM to the video RAM when needed then I don't really need to Gen/Delete these objects myself. I become limited by the 'normal' RAM which is often plentiful when compared to the video RAM.
The approach "memory is abundant so I don't need to delete" is bad, and the approach "memory is abundant, so I'll never get out of memory errors" is flawed.
OpenGL memory management is obscure, both for technical reasons (see t.niese's comment above) and for ideological reasons ("you don't need to know, you don't want to know"). There do exist vendor extensions (such as ATI_meminfo) that let you query some non-authoritative numbers (non-authoritative insofar as they could change the next millisecond, and they do not take effects like fragmentation into account).
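If the ATI_meminfo extension happens to be present, the query is just a glGetIntegerv; a sketch (check for the extension first, and treat the numbers as momentary, non-authoritative hints):

#include <cstdio>

// Each GL_ATI_meminfo query returns four values in kilobytes; index 0 is the
// total free memory reported for that resource pool.
GLint texFreeKb[4] = {0};
glGetIntegerv(GL_TEXTURE_FREE_MEMORY_ATI, texFreeKb);
std::printf("~%d MB reported free for textures (snapshot only)\n", texFreeKb[0] / 1024);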
Generally, for the most part, your assumption that you can use more memory than there is GPU memory is correct.
However, you are usually not able to use all available memory. More likely, there is a limit well below "all available RAM" due to constraints on which memory regions (and how large a region) the driver can allocate, lock, and DMA to/from. And even though you can normally use more memory than will fit on the GPU (even if you used it exclusively), this does not mean careless allocations can't and won't eventually fail.
Usually, but not necessarily, you consume as much system memory as GPU memory, too (without knowing, the driver does that secretly). Since the driver swaps resources in and out as needed, it needs to maintain a copy. Sometimes, it is necessary to keep 2 or 3 copies (e.g. when streaming or for ARB_copy_buffer operations). Sometimes, mapping a buffer object is yet another copy in a specially allocated block, and sometimes you're allowed to write straight into the driver's memory.
On the other hand, PCIe 2.0 (and PCIe 3.0 even more so) is fast enough to stream vertices from main memory, so you do not even strictly need GPU memory (other than a small buffer). Some drivers will stream dynamic geometry right away from system memory.
Some GPUs do not even have separate system and GPU memory (Intel Sandy Bridge or AMD Fusion).
Also, you should note that deleting objects does not necessarily delete them (at least not immediately). Usually, with very few exceptions, deleting an OpenGL object is merely a tentative delete which prevents you from further referencing the object. The driver will keep the object valid for as long as it needs to.
On the other hand, you really should delete what you do not need any more, and you should delete early. For example, you should delete a shader immediately after attaching it to the program object. This ensures that you do not leak resources, and it is guaranteed to work. Deleting and re-specifying the in-use vertex or pixel buffer when streaming (by calling glBufferData(..., NULL)) is a well-known idiom. This only affects your view of the object, and it allows the driver to continue using the old object in parallel for as long as it needs to.
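A sketch of that orphaning idiom for a streaming vertex buffer (the buffer name, sizes, and usage hint are illustrative):

// Orphan the in-use buffer: the driver keeps the old storage alive for any
// pending draws and hands us fresh storage to fill, avoiding a stall.
glBindBuffer(GL_ARRAY_BUFFER, streamVbo);
glBufferData(GL_ARRAY_BUFFER, streamSizeBytes, NULL, GL_STREAM_DRAW); // orphan
glBufferSubData(GL_ARRAY_BUFFER, 0, newDataBytes, newVertexData);     // refill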
Some additional information to go along with my comment that did not fit in there.
There are different reasons why this is not part of OpenGL.
It isn't an easy task for the system/driver to guess which resources are and will be required. The driver could certainly build an internal heuristic for whether a resource will be required often or rarely (much like a CPU does for if statements, speculatively executing certain code paths based on such a guess). But the GPU will not know (without knowing the application code) which resource will be required next. It has no knowledge of where the geometry is placed in the scene, either (because you do that yourself with the model and view matrices you pass to your shader).
If, for example, you have a game where you can walk through a scene, you normally won't render the parts that are out of view. So the GPU could conclude that these resources are not required anymore, but if you turn around then all those textures and that geometry are required again and need to be moved from system memory to GPU memory, which could result in really bad performance. The game engine itself, however, has - because of its use of octrees (or similar techniques) and the possible paths that can be walked - in-depth knowledge of the scene: which resources could be removed from the GPU, which ones should be moved to the GPU while playing, and where it would be necessary to display a loading screen.
If you look at the evolution of OpenGL and which features have become deprecated, you will see that it is heading in the direction of removing everything except the really required features that are best done by the graphics card, driver, and system. Everything else is up to the user to implement to get the best performance (e.g. you create your projection matrix yourself and pass it to the shader, so OpenGL does not even know where an object is placed in the scene).
Here's my TL;DR answer, I recommend reading Daemon's and t.niese's answers as well:
Can the OpenGL driver/server hold more data than the video card?
Yes
Say, I have enough video RAM to hold 10 textures. Can I ask OpenGL to allocate 15 textures without getting a GL_OUT_OF_MEMORY error?
Yes. Depending on the driver / GPU combination it might even be possible to allocate a single texture that exceeds the GPU's memory, and actually use it for rendering. At my current occupation I exploit that fact to extract slices of arbitrary orientation and geometry from large volumetric datasets, using shaders to apply filters on the voxel data in situ. It works well, but not at interactive frame rates.