SDL / OpenGL: Implementing a "Loading thread" - c++

I'm currently trying to implement a "loading thread" for a very basic game engine. It takes care of loading e.g. textures or audio while the main thread keeps rendering a proper loading message/screen until the operation is finished, or even renders regular game scenes while smaller objects are loaded in the background.
Now, I'm by no means an OpenGL expert, but while implementing such a loading mechanism I quickly found out that OpenGL really doesn't like its rendering context being accessed from a thread other than the one it was created on. I googled around and the solution seems to be:
"Create a second rendering context on the thread and share it with the context of the main thread"
The problem with this is that I use SDL to take care of my window management and context creation, and as far as I can tell from inspecting the API there is no way to tell SDL to share two contexts with each other :(
I came to the conclusion that the best solutions for my case are:
Approach A) Alter the SDL library to support context sharing with the platform specific functions (wglShareLists() and glXCreateContext() I assume)
Approach B) Let the loading thread only load the data into memory, process it into an OpenGL-friendly format, and pass it to the main thread, which e.g. takes care of uploading the texture to the graphics adapter. This last step, of course, only applies to data that needs a valid OpenGL context.
The first solution is the least efficient one, I guess. I don't really want to meddle with SDL, and besides that I read that context sharing is not a high-performance operation. So my next take would be the second approach.
EDIT: Regarding the "high-performance operation": I read the article wrong; context sharing actually isn't that performance intensive. The article merely suggested shifting the CPU-intensive operations to the second thread with a second context. Sorry for that.
After all this introduction, I would really appreciate it if anyone could give me some hints and comments on the following questions:
1) Is there any way to share contexts with SDL and would it be any good anyway to do so?
2) Is there any other more "elegant" way to load my data in the background that I may have missed or didn't think about?
3) Can my intention of going with approach B be considered a good choice? There would still be a slight overhead from the OpenGL operations on my main thread, which blocks rendering - or is it so small that it can be ignored?

Is there any way to share contexts with SDL
No.
Yes!
You have to get the current context, using platform-specific calls. From there, you can create a new context and make it shared, also with platform-specific calls.
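On Windows, for example, a minimal sketch of that idea might look like the following (my illustration, not from this answer; it has to run on the main thread while SDL's context is current, and all error handling is omitted):

// Hypothetical Windows-only sketch: grab the context SDL created, then build a
// second, shared context for the loading thread.
#include <windows.h>

HGLRC create_shared_loading_context()
{
    HDC   dc          = wglGetCurrentDC();       // DC of the window SDL created
    HGLRC mainContext = wglGetCurrentContext();  // the context SDL created
    HGLRC loadContext = wglCreateContext(dc);    // second context on the same DC
    wglShareLists(mainContext, loadContext);     // share textures, buffers, display lists
    return loadContext;                          // loading thread: wglMakeCurrent(dc, loadContext)
}

The GLX path is analogous: glXCreateContext takes the context to share with as a parameter.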
Is there any other more "elegant" way to load my data in the background that I may have missed or didn't think about?
Not really. You enumerated the options quite well: hack SDL to get the data you need, or load data inefficiently.
However, you can load the data into mapped buffer objects and transfer the data to OpenGL. You can only do the mapping/unmapping on the OpenGL thread, but the pointer you get when you map can be used on any thread. So map a buffer, pass it to the worker thread. It loads data into the mapped memory, and flips a switch. The GL thread unmaps the pointer (the worker thread should forget about the pointer now) and uploads the texture data.
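A minimal sketch of that pattern, assuming a pixel unpack buffer (PBO) and a simple atomic flag as the "switch"; the helper decode_image_into() and the size/texture parameters are hypothetical placeholders:

// Sketch only: all GL calls stay on the GL thread, the worker thread only ever
// touches the mapped pointer.
#include <GL/glew.h>   // any GL function loader will do; GLEW is just an assumption here
#include <atomic>
#include <thread>

void decode_image_into(void* dst);     // hypothetical helper: read + decode a file into memory

std::atomic<bool> loaderDone{false};   // the "switch" the worker flips
GLuint pbo = 0;

// GL thread: create and map a pixel unpack buffer, hand the pointer to a worker.
void begin_async_load(size_t imageSize)
{
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, imageSize, nullptr, GL_STREAM_DRAW);
    void* ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);

    std::thread([ptr] {
        decode_image_into(ptr);        // worker writes into the mapped memory
        loaderDone = true;
    }).detach();
}

// GL thread, once per frame: when the worker is done, unmap and upload.
void finish_async_load(GLuint texture, int width, int height)
{
    if (!loaderDone.exchange(false)) return;      // handle the completion exactly once
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);        // the worker must not touch ptr anymore
    glBindTexture(GL_TEXTURE_2D, texture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height,
                 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);   // data comes from the bound PBO
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    glDeleteBuffers(1, &pbo);
}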
Can my intention of going with approach B be considered a good choice?
Define "good"? There's no way to answer this without knowing more about your problem domain.

Related

Asynchronous rendering model in Vulkan

Recently I've been reading https://github.com/ARM-software/vulkan_best_practice_for_mobile_developers/blob/master/samples/vulkan_basics.md, and it says:
OpenGL ES uses a synchronous rendering model, which means that an API call must behave as if all earlier API calls have already been processed. In reality no modern GPU works this way, rendering workloads are processed asynchronously and the synchronous model is an elaborate illusion maintained by the device driver. To maintain this illusion the driver must track which resources are read or written by each rendering operation in the queue, ensure that workloads run in a legal order to avoid rendering corruption, and ensure that API calls which need a data resource block and wait until that resource is safely available.
Vulkan uses an asynchronous rendering model, reflecting how the modern GPUs work. Applications queue rendering commands into a queue, use explicit scheduling dependencies to control workload execution order, and use explicit synchronization primitives to align dependent CPU and GPU processing.
The impact of these changes is to significantly reduce the CPU overhead of the graphics drivers, at the expense of requiring the application to handle dependency management and synchronization.
Could someone help explain why an asynchronous rendering model reduces CPU overhead? After all, in Vulkan you still have to track state yourself.
Could someone help explain why an asynchronous rendering model reduces CPU overhead?
First of all, let's get back to the original statement you are referring to, emphasis mine:
The impact of these changes is to significantly reduce the CPU overhead of the graphics drivers, [...]
So the claim here is that the driver itself will consume less CPU time, which is easy to see: it can forward your requests more directly, "as-is".
However, one overall goal of a low-level rendering API like Vulkan is also a potentially reduced CPU overhead in general, not only in the driver.
Consider the following example: You have a draw call which renders to a texture. And then you have another draw call which samples from this texture.
To get the implicit synchronization right, the driver has to track the usage of this texture, both as render target and as source for texture sampling operations.
It doesn't know in advance whether the next draw call will need any resources which are still to be written to by previous draw calls. It has to track every such potential conflict, whether or not it can actually occur in your application, and it must be extremely conservative in its decisions. You might have a texture bound to a framebuffer for a draw call while knowing that, with the actual uniform values you set for this shader, the texture is not modified. But the GPU driver can't know that. If it can't rule out - with absolute certainty - that a resource is modified, it has to assume it is.
However, your application is much more likely to know such details. If you have several render passes, and the second pass depends on the texture rendered to in the first, you can (and must) add the proper synchronization primitives - but the GPU driver doesn't need to care why any synchronization is necessary at all, and it doesn't need to track any resource usage to find out - it can just do as it is told. And your application doesn't need to track its own resource usage in many cases either: the fact that a synchronization might be required at some point is simply inherent in the usage as you coded it. There might still be cases where you need to track your own resource usage to find out, though, especially if you write some intermediate layer such as a higher-level graphics library, where you know less and less about the structure of the rendering - then you end up in a position similar to that of a GL driver (unless you want to push all the burden of synchronization onto the users of your library, like Vulkan does).
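To illustrate the render-to-texture example with explicit synchronization (my sketch, not part of the original answer; 'cmd' and 'colorImage' are assumed to exist), the application records a single image barrier and the driver does no tracking of its own:

#include <vulkan/vulkan.h>

// Sketch: explicit dependency between "pass 1 writes colorImage" and
// "pass 2 samples colorImage".
void insert_render_to_sample_barrier(VkCommandBuffer cmd, VkImage colorImage)
{
    VkImageMemoryBarrier barrier{};
    barrier.sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT;
    barrier.dstAccessMask       = VK_ACCESS_SHADER_READ_BIT;
    barrier.oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    barrier.newLayout           = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image               = colorImage;
    barrier.subresourceRange    = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,  // after pass 1's color writes
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,          // before pass 2's texture reads
                         0, 0, nullptr, 0, nullptr, 1, &barrier);
}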

What can Vulkan do specifically that OpenGL 4.6+ cannot?

I'm looking into whether it's better for me to stay with OpenGL or consider a Vulkan migration for intensive bottlenecked rendering.
However, I don't want to make the jump without being informed about it. I was looking up what benefits Vulkan offers me, but even with a lot of googling I wasn't able to find out exactly what gives the performance boosts. People throw around phrases like "OpenGL is slow, Vulkan is way faster!" or "Low power consumption!" and say nothing more on the subject.
Because of this, it makes it difficult for me to evaluate whether or not the problems I face are something Vulkan can help me with, or if my problems are due to volume and computation (and Vulkan would in such a case not help me much).
I'm assuming Vulkan does not magically make things in the pipeline faster (as in, shading triangles is going to take approximately the same time in OpenGL and Vulkan for the same buffers, uniforms, and shaders). I'm assuming all the things that cause grief in OpenGL (e.g. framebuffer and shader program changes) are going to be equally painful in either API.
There are a few things off the top of my head that I think Vulkan offers, based on reading through countless things online (and I'm guessing this is certainly not all of the advantages, and I'm not sure whether these are even true):
Texture rendering without [much? any?] binding (or rather a better version of 'bindless textures'). I noticed a significant performance boost when I switched to bindless textures, but this might not even be worth mentioning if bindless textures already effectively do this, so I'm not sure whether Vulkan adds anything here
Reduced CPU/GPU communication by composing some kind of command list that you can execute on the GPU without needing to send much data
Being able to interface in a multithreaded way that OpenGL can't somehow
However I don't know exactly what cases people run into in the real world that demand these, and how OpenGL limits these. All the examples so far online say "you can run faster!" but I haven't seen how people have been using it to run faster.
Where can I find information that answers this question? Or do you know some tangible examples that would answer this for me? Maybe a better question would be where are the typical pain points that people have with OpenGL (or D3D) that caused Vulkan to become a thing in the first place?
An example of answer that would not be satisfying would be a response like
You can multithread and submit things to Vulkan quicker.
but a response that would be more satisfying would be something like
In Vulkan you can multithread your submissions to the GPU. In OpenGL you can't do this because you rely on the implementation to do the appropriate locking and placing fences on your behalf which may end up creating a bottleneck. A quick example of this would be [short example here of a case where OpenGL doesn't cut it for situation X] and in Vulkan it is solved by [action Y].
The last paragraph above may not be accurate whatsoever, but I was trying to give an example of what I'd be looking for without trying to write something egregiously wrong.
Vulkan really has four main advantages in terms of run-time behavior:
Lower CPU load
Predictable CPU load
Better memory interfaces
Predictable memory load
Specifically lower GPU load isn't one of the advantages; the same content using the same GPU features will have very similar GPU performance with both of the APIs.
In my opinion it also has many advantages in terms of developer usability - the programmer's model is a lot cleaner than OpenGL, but there is a steeper learning curve to get to the "something working correctly" stage.
Let's look at each of the advantages in more detail:
Lower CPU load
The lower CPU load in Vulkan comes from multiple areas, but the main ones are:
The API encourages up-front construction of descriptors, so you're not rebuilding state on a draw-by-draw basis.
The API is asynchronous and can therefore move some responsibilities, such as tracking resource dependencies, to the application. A naive application implementation here will be just as slow as OpenGL, but the application has more scope to apply high level algorithmic optimizations because it can know how resources are used and how they relate to the scene structure.
The API moves error checking out to layer drivers, so the release drivers are as lean as possible.
The API encourages multithreading, which is always a great win (especially on mobile where e.g. four threads running slowly will consume a lot less energy than one thread running fast).
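As a rough illustration of the multithreading point (my own sketch, not part of this answer): each worker thread records into its own command pool and secondary command buffer, and the main thread later stitches them into the primary command buffer with vkCmdExecuteCommands. The device, queue family, and inheritance info are assumed to exist already; error handling and cleanup are omitted.

#include <vulkan/vulkan.h>

// Sketch: per-thread recording. VkCommandPool is externally synchronized, so
// each thread simply owns its own pool and never shares it.
VkCommandBuffer record_scene_chunk(VkDevice device, uint32_t queueFamily,
                                   const VkCommandBufferInheritanceInfo* inheritance)
{
    VkCommandPoolCreateInfo poolInfo{};
    poolInfo.sType            = VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO;
    poolInfo.queueFamilyIndex = queueFamily;
    VkCommandPool pool;
    vkCreateCommandPool(device, &poolInfo, nullptr, &pool);

    VkCommandBufferAllocateInfo allocInfo{};
    allocInfo.sType              = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
    allocInfo.commandPool        = pool;
    allocInfo.level              = VK_COMMAND_BUFFER_LEVEL_SECONDARY;
    allocInfo.commandBufferCount = 1;
    VkCommandBuffer cmd;
    vkAllocateCommandBuffers(device, &allocInfo, &cmd);

    VkCommandBufferBeginInfo beginInfo{};
    beginInfo.sType            = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    beginInfo.flags            = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
    beginInfo.pInheritanceInfo = inheritance;
    vkBeginCommandBuffer(cmd, &beginInfo);
    // ... record the draw calls for this thread's slice of the scene ...
    vkEndCommandBuffer(cmd);
    return cmd;   // main thread: vkCmdExecuteCommands(primary, 1, &cmd) inside the render pass
}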
Predictable CPU load
OpenGL drivers do various kinds of "magic", either for performance (specializing shaders based on state only known late at draw time), or to maintain the synchronous rendering illusion (creating resource ghosts on the fly to avoid stalling the pipeline when the application modifies a resource which is still referenced by a pending command).
The Vulkan design philosophy is "no magic". You get what you ask for, when you ask for it. Hopefully this means no random slowdowns because the driver is doing something you didn't expect in the background. The downside is that the application takes on the responsibility for doing the right thing ;)
Better memory interfaces
Many parts of the OpenGL design assume distinct CPU and GPU memory pools, which requires a programming model that gives the driver enough information to keep them in sync. Most modern hardware can do better with hardware-backed coherency protocols, so Vulkan enables a model where you can just map a buffer once, modify it ad hoc, and be guaranteed that the "other process" will see the changes. No more "map" / "unmap" / "invalidate" overhead (provided the platform supports coherent buffers, of course; it's still not universal).
Secondly Vulkan separates the concept of the memory allocation and how that memory is used (the memory view). This allows the same memory to be recycled for different things in the frame pipeline, reducing the amount of intermediate storage you need allocated.
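A tiny sketch of the "map once" idea (my illustration; it assumes the allocation came from a HOST_VISIBLE | HOST_COHERENT memory type):

#include <vulkan/vulkan.h>

// Sketch: map once at startup and keep writing through the pointer for the
// allocation's lifetime; no per-update map/unmap/flush on a coherent heap.
void* map_persistently(VkDevice device, VkDeviceMemory memory)
{
    void* mapped = nullptr;
    vkMapMemory(device, memory, 0, VK_WHOLE_SIZE, 0, &mapped);
    return mapped;   // e.g. memcpy new uniform data here every frame, synchronized
                     // against GPU reads only by the usual fences/semaphores
}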
Predictable memory load
Related to the "no magic" comment for CPU performance, Vulkan won't generate random resources (e.g. ghosted textures) on the fly to hide application problems. No more random fluctuations in resource memory footprint, but again the application has to take on the responsibility to do the right thing.
This is at risk of being opinion based. I suppose I will just reiterate the Vulkan advantages that are written on the box, and hopefully uncontested.
You can disable validation in Vulkan. It obviously uses less CPU (or battery/power/noise) that way. In some cases this can be significant.
OpenGL does have poorly defined multi-threading. Vulkan has well-defined multi-threading in the specification. Meaning you do not immediately lose your mind trying to code with multiple threads, and you get better performance where a single thread would otherwise be a CPU bottleneck.
Vulkan is more explicit; it does not (or tries not to) expose big magic black boxes. That means e.g. you can do something about micro-stutter and hitching, and apply other micro-optimizations.
Vulkan has a cleaner interface to windowing systems. No more odd contexts and default framebuffers. Vulkan does not even require a window to draw to (or it can achieve that without weird hacks).
Vulkan is a cleaner and more conventional API. For me that means it is easier to learn (despite the other things) and more satisfying to use.
Vulkan takes shaders as binary intermediate code (SPIR-V), while OpenGL traditionally did not. That should mean faster compilation of such code; a small sketch of loading such a shader follows below.
Vulkan treats mobile GPUs as first-class citizens. No more ES.
Vulkan has open-source tooling and conventional (GitHub) public trackers. Meaning you can improve the ecosystem without jumping through hoops. E.g. you can improve or implement a validation check for an error that often trips you up, or you can improve the specification so it makes sense for people who are not insiders.
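For the SPIR-V point above, a minimal sketch (my own, assuming the SPIR-V blob was compiled offline, e.g. with glslangValidator, and that 'device' already exists):

#include <vulkan/vulkan.h>
#include <cstdint>
#include <vector>

// Sketch: wrap a pre-compiled SPIR-V blob in a shader module; no source-level
// GLSL compilation happens inside the driver at this point.
VkShaderModule make_shader(VkDevice device, const std::vector<uint32_t>& spirv)
{
    VkShaderModuleCreateInfo info{};
    info.sType    = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    info.codeSize = spirv.size() * sizeof(uint32_t);   // size in bytes
    info.pCode    = spirv.data();
    VkShaderModule module = VK_NULL_HANDLE;
    vkCreateShaderModule(device, &info, nullptr, &module);
    return module;
}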

Is it possible to preserve all the state at once in OpenGL?

If we have several OpenGL contexts, each in its own process, the driver somehow virtualises the device, so that each program thinks it exclusively runs the GPU. That is, if one program calls glEnable, the other one will never notice that.
This could otherwise be done with a ton of glGet calls to save state, and their counterparts to restore it afterwards. Obviously, the driver does it more efficiently. However, in userspace we need to track which changes we made to the state and handle them selectively. Maybe it's just me missing something, but I thought it would be nice, for one, to adjust the viewport for a framebuffer and then just undo those changes back to whatever the state was before.
Maybe there is a way of achieving the effect of a context switch yet within a single program?
Maybe there is a way of achieving the effect of a context switch yet within a single program?
You may create as many OpenGL contexts in a single process as you like and switch between them. Also, with modern GPUs the state of the OpenGL context bears little resemblance to what's actually happening on the GPU.
For pre-core-profile OpenGL there's glPushAttrib()/glPopAttrib(), which lets you store off some GL state.
You're probably better off writing your own client-side state shadowing though.
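For the viewport example from the question, a compatibility-profile sketch might look like this (my illustration; 'fbo', 'fboWidth', and 'fboHeight' are placeholders):

glPushAttrib(GL_VIEWPORT_BIT);            // saves the current viewport (and depth range)
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glViewport(0, 0, fboWidth, fboHeight);
// ... draw into the FBO ...
glBindFramebuffer(GL_FRAMEBUFFER, 0);
glPopAttrib();                            // viewport is back to its previous value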
The state machine (and the command queue, discussed below) is unique to each context. It is much, much higher-level than you are thinking; the state is generally wrapped up nicely in user mode.
As for context-switching in a single process, be aware that each render context in GL is unsynchronized. An implicit flush is generated during a context switch in order to help alleviate this problem. As long as a context is only used by a single thread, this is generally adequate but probably going to negatively impact performance.

Why does OpenGL give handles to objects instead of pointers?

The OpenGL tradition is to let the user manipulate OpenGL objects using an unsigned int handle. Why not just give a pointer instead? What are the advantages of unique IDs over pointers?
TL;DR: OpenGL IDs don't map bijectively to memory locations. A single OpenGL ID may refer to multiple memory locations at the same time. Also OpenGL has been designed to work for distributed rendering architectures (like X11) as well, and given an indirect context programs running on different machines may use the same OpenGL context.
OpenGL has been designed as an architecture- and display-system-agnostic API. When OpenGL was first developed, this happened in light of client-server display architectures (like X11). If you look into the OpenGL specification, even that of modern OpenGL 4, it refers to clients and servers.
However, in a client/server architecture pointers make no sense. For one, the address space of the server is not accessible to the clients without jumping through some hoops. And even if you set up a shared memory mapping, the addresses of objects are not the same for client and server. Add to this that on architectures like X11 a single indirect OpenGL context can be used by multiple clients, which may even run on different machines. Pointers simply don't work for that.
Last but not least, the OpenGL object model is highly abstract and the OpenGL drawing model is asynchronous. Say I do the following:
GLuint id;
glGenTextures(1, &id);
glBindTexture(GL_TEXTURE_2D, id);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);     // width/height: placeholders
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, image_a);
draw_something();
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, image_b);
draw_something_b();
When the end of this little snippet is reached, nothing at all may actually have been drawn yet, because no synchronization point has been reached (glFinish, glReadPixels, a buffer swap). Note the two calls to glTexSubImage, which happen on the same id. When the pixels are finally put into the framebuffer, there are two different images to be sourced from a single texture ID, because OpenGL guarantees that things will appear as if they had been drawn synchronously. So at the end of a drawing batch a single object ID may refer to a whole collection of different data sets with different locations in memory.
My first consideration: having pointers would make programmers wonder whether they can operate on them with pointer arithmetic, e.g. by pointing into the middle of a texture to update it, or something like that - maybe even crazier things, such as patching shader code on the fly. That all sounds like a whole new cool degree of freedom, until you think of the additional complications caused by tampering with the highly efficient and optimized "black-box" way the GPU operates.
For example, consider the inner workings of GPU memory allocation. It's just like with an OS: the pointers you get from the OS are not the real "physical" ones; the OS memory manager can move things around behind the scenes while keeping the pointers the same (e.g. by swapping to disk). IDs are just the same: the GPU can optimize and pack entities with even more freedom, while keeping up the nice facade of them being readily available.
Another example: OpenGL is not actually the same across manufacturers. In fact, OpenGL is just a description of an API, and each vendor makes its own implementation the way that works best for it. For example, there's no rule on how to store texture mipmaps - aligned, interleaved, or whatever. Having pointers into a texture would lure developers into tampering with mipmaps, which would either cause a lot of trouble supporting the various implementations or force all implementations to become strictly unified, which again is a bad idea for performance.
The OpenGL device (GPU) may have its own memory with its own address space, independent of the host (CPU) memory system. (Think of a discrete video card with its own onboard RAM.) The host can't (directly) access that memory, so it's not possible to have a pointer to it.
It's best to think of the GPU as a whole separate computer; it's actually possible to do OpenGL over a network, with a program running on one computer rendering graphics on the video card in another. When you set up your textures and buffers, you're basically uploading data to the GL device for its own internal use.

OpenGL multithreading/shared context and glGenBuffers

If I plan to use multithreading in OpenGL, should I have separate buffers (from glGenBuffers) for each context?
I do not know much about OpenGL multithreading yet (for now I work in a "single" thread). I need to know whether I can share buffers already pushed to video memory (with glBufferData/glBufferSubData), or whether I have to keep a copy of the buffer for the other thread.
You do not want to use several contexts with several threads. You really don't.
While this sounds like a good idea, in practice multi-context multi-thread is complicated, troublesome, and badly supported on the driver side, and it only marginally improves (possibly even reduces!) performance.
What you really want is to have only one thread talk to OpenGL (with one context, obviously), map a buffer, and pass the memory pointer to another thread, preferably using 3 buffers (3 sub-buffers of a 3x sized buffer) with immutable storage and persistent mapping, if that is available.
That, and doing indirect render calls, where a second thread feeds the buffers the indirect call reads from.
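A rough sketch of that persistent-mapping setup (my own illustration, assuming GL 4.4 / ARB_buffer_storage; the buffer size, triple-buffer indexing, and fencing are simplified placeholders):

// Sketch: the GL thread creates one persistently mapped, triple-sized buffer and
// hands section pointers to a producer thread; the GL thread draws from the
// section the producer finished last.
GLuint buf;
const GLsizeiptr sectionSize = 1 << 20;   // placeholder size per sub-buffer
const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;

glGenBuffers(1, &buf);
glBindBuffer(GL_ARRAY_BUFFER, buf);
glBufferStorage(GL_ARRAY_BUFFER, 3 * sectionSize, nullptr, flags);   // immutable storage
char* base = static_cast<char*>(
    glMapBufferRange(GL_ARRAY_BUFFER, 0, 3 * sectionSize, flags));   // mapped once, stays mapped

// The worker thread writes into base + (frame % 3) * sectionSize while the GL thread
// draws from the previously filled section; a glFenceSync/glClientWaitSync pair per
// section keeps the CPU from overwriting a section the GPU is still reading.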
Further info on the persistent mapping topic: See in particular slides 22-25 of this GDC2014 presentation, which is basically a remake of Cass Everitt's 2013 SIGGRAPH talk.
See also Everitt's original talk: Beyond porting.
VAOs aren't shared, so you'll need to generate a new VAO for each object per context, or else the behavior will become unpredictable and incorrect upon deletion/creation of a new one. This can be a major source of errors. VBOs can be shared, so you just need one VBO per object.