OpenGL: multithreaded glFlushMappedBufferRange? - c++

I know that multithreaded OpenGL is a delicate topic, and I am not trying to render from multiple threads here. Nor am I trying to create multiple contexts and share objects via share lists. I have a single context and issue draw commands and GL state changes only from the main thread.
However, I am dynamically updating parts of a VBO every frame. I only write to the VBO; I never need to read it back on the CPU side. I use glMapBufferRange so I can compute the changed data in place and avoid the additional copy that the blocking glBufferSubData would make.
It works, and now I would like to multithread the data update (since a lot of vertices have to be updated at a steady 90 fps) and use a persistently mapped buffer (via GL_MAP_PERSISTENT_BIT). This requires issuing glFlushMappedBufferRange whenever a worker thread finishes updating its part of the mapped buffer.
Is it fine to call glFlushMappedBufferRange from a separate thread? The ranges the different threads operate on do not overlap. Is there any overhead or implicit synchronisation involved in doing so?

No, you need to call glFlushMappedBufferRange from the thread that does the OpenGL work.
To get around this you have two options:
acquire the OpenGL context and make it current in the worker thread, which means the OpenGL thread has to relinquish the context first for this to work.
push the finished range into a thread-safe queue and let the OpenGL thread pop the ranges from it and call glFlushMappedBufferRange, as sketched below.
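A minimal sketch of the queue option, assuming a GL 4.4 / ARB_buffer_storage context for the persistent mapping (FlushRange and gPendingFlushes are illustrative names):

#include <mutex>
#include <queue>

struct FlushRange { GLintptr offset; GLsizeiptr length; };
std::mutex gFlushMutex;
std::queue<FlushRange> gPendingFlushes;

// Setup (GL thread): immutable storage, persistently mapped for writing,
// with explicit flushes.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferStorage(GL_ARRAY_BUFFER, size, nullptr,
                GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT);
void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                             GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                             GL_MAP_FLUSH_EXPLICIT_BIT);

// Worker thread: write its (non-overlapping) vertices through ptr,
// then enqueue the dirty range. No GL calls are made here.
{
    std::lock_guard<std::mutex> lock(gFlushMutex);
    gPendingFlushes.push({offset, length});
}

// GL thread, once per frame before issuing the draws:
glBindBuffer(GL_ARRAY_BUFFER, vbo);
{
    std::lock_guard<std::mutex> lock(gFlushMutex);
    while (!gPendingFlushes.empty()) {
        FlushRange r = gPendingFlushes.front();
        gPendingFlushes.pop();
        glFlushMappedBufferRange(GL_ARRAY_BUFFER, r.offset, r.length);
    }
}

Since the ranges never overlap, a plain mutex-guarded queue is enough; only the flush itself has to happen on the GL thread.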

Related

Multithreaded loading of data to the GPU in OpenGL

I am attempting to write a multithreaded "load from disk" algorithm for my C++/OpenGL game engine.
My current situation is as follows:
Thread #0 (main thread): core engine functions, rendering work.
Thread #1: spatial processing (physics, etc.; not really relevant to the problem).
Thread #2: background loading from disk.
The algorithm loads Entity details from an XML file on disk, which contains graphics information, such as model and texture files for rendering.
The idea is that the engine should be capable of loading entities from disk, and then from memory to the GPU without blocking the main thread. However, at the moment, it can only load into main memory.
Whenever I load data on thread 2, I have to notify thread 0 that the load from disk is complete. Then the engine code running on thread 0 makes the necessary GL calls to send data from memory to the GPU, and sends the entities to the renderer.
I am aware that making OpenGL calls against the same context from multiple threads is undefined behaviour and will typically crash the program.
I am also aware that it is possible to have a shared GL context, and that on each thread where you want to make GL calls you must first make a context current (synchronise), then make the calls.
As I understand it, making a GL context current will make all other contexts on other threads inactive, which leads us back to the undefined behaviour/crash situation.
I would prefer that the VAO/VBO/texture object population be done on thread 2. I think this could be achieved by creating a context on thread 2 and making it the currently active context, but I am unsure how this would affect rendering. Would I have to stop rendering while this is being done? If so, I don't see any benefit, as it might as well be done on the main thread.
Is there a way to create and populate buffer objects on thread 2, whilst not interfering with rendering operations on thread 0?
To Clarify:
Thread 2 will never perform rendering, only loading of data onto the GPU.
Thread 0 only works with already populated buffers. It will only be given data that already exists on the GPU.

Using OpenGL commands from a different thread

I have two threads: one main thread (OpenGL) for 3D rendering and one thread for logic. How should I connect the threads if I want to create a box mesh in the rendering thread when the order comes from the logic thread?
In this case the logic thread would have to use OpenGL commands, which is not possible because every OpenGL command should only be executed in the main thread. I know that I cannot share an OpenGL context across different threads (and that it seems to be a bad idea anyway), so how should I solve this problem? Does some general-purpose design pattern exist for this, or something else? Thanks.
You could implement a draw-command queue. Each draw command contains whatever is needed to make the required OpenGL calls. Each frame, the rendering thread empties the current queue and processes the commands; any other thread prepares its own commands and enqueues them at any time for the next frame.
Very primitive draw commands can be implemented as a class hierarchy with a virtual Draw method, as sketched below. Of course this is not a small change at all, but modern engines adopt this approach, in much more advanced versions. It can be efficient if the subsystems that submitted their command objects re-use them in the next frame, including their buffers: each submodule constantly prepares and updates its draw command but submits it only when it should be rendered, based on some logic.
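A bare-bones sketch of what that could look like; the class names and the box-mesh payload here are made up for illustration:

#include <memory>
#include <mutex>
#include <queue>

struct DrawCommand {
    virtual ~DrawCommand() = default;
    virtual void Draw() = 0; // only ever called on the rendering thread
};

struct CreateBoxMeshCommand : DrawCommand {
    float width = 1, height = 1, depth = 1;
    void Draw() override {
        // Safe to make OpenGL calls here: glGenBuffers, glBufferData, ...
    }
};

class CommandQueue {
    std::mutex mMutex;
    std::queue<std::unique_ptr<DrawCommand>> mCommands;
public:
    void Push(std::unique_ptr<DrawCommand> cmd) { // callable from any thread
        std::lock_guard<std::mutex> lock(mMutex);
        mCommands.push(std::move(cmd));
    }
    void DrainAndDraw() { // rendering thread, once per frame
        std::lock_guard<std::mutex> lock(mMutex);
        while (!mCommands.empty()) {
            mCommands.front()->Draw();
            mCommands.pop();
        }
    }
};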
There are various ways to approach this. One is to implement a command queue with the logic thread being a command producer and the rendering thread the consumer.
Another approach is to make use of an auxiliary OpenGL context that is set up to share data with the primary OpenGL context. Both contexts can be made current at the same time in different threads, and from OpenGL 3.x core onward a context can be made current without a drawable. You can then use the auxiliary context to load new data, map buffers and so on.
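A rough sketch of the auxiliary-context approach, here using GLFW for context creation (an assumption; any windowing API with share groups works the same way):

// Main thread: create the primary window plus a hidden window whose
// context shares objects with it.
GLFWwindow* mainWindow = glfwCreateWindow(1280, 720, "Game", nullptr, nullptr);
glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
GLFWwindow* auxContext = glfwCreateWindow(1, 1, "loader", nullptr, mainWindow);

// Worker thread: make the auxiliary context current and upload.
glfwMakeContextCurrent(auxContext);
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);
GLsync ready = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush(); // submit the fence so it can actually signal

// Rendering thread: before first drawing with vbo, wait on the fence.
// This is a GPU-side wait and does not stall the CPU.
glWaitSync(ready, 0, GL_TIMEOUT_IGNORED);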

Synchronising fences and texture unloading

My OpenGL application displays a 3D view where images are loaded on demand based on what is visible. The load operation sends a load request to the texture-loader thread, which then does this:
...
// <open the image and read it into an OpenGL texture: tex>
// Is this lock needed if this is one of multiple loader threads that each
// pass textures to the main thread? I.e. is glClientWaitSync() thread-safe?
{
    std::lock_guard<std::mutex> lk(mOpenGLClientWaitSyncMutex);
    // We need to wait on the fence (mSync, created with
    // glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0) after the upload)
    // before alerting the primary thread that the texture is ready.
    // The second argument is a flags bitfield and the third a timeout
    // in nanoseconds (here, up to 1 second).
    glClientWaitSync(mSync, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000UL);
    // Pass the texture to the MAIN THREAD
    textureLoaderRequest->mContentViewer->setTexture(tex, requestFrameIndex);
}
...
Note: a shared OpenGL context is created for each loader thread and associated with the main one.
I have the following areas of uncertainty:
a) OpenGL Fences.
I must admit I am still chewing over the subject of OpenGL fences. At the moment there is only one texture-loader thread plus the main thread. If I had multiple texture-loader threads, would I need to synchronise their glClientWaitSync() calls? Or am I being daft here, and would the glClientWaitSync() in each loader thread already provide the necessary guard for multiple threads handing textures to the main thread? I did some reading, added the above lock_guard, did a bit of testing, and it didn't seem to affect anything (but that doesn't usually mean much, hence my question).
b) Texture Unloading.
My goal is to have another thread that performs the unloads, to keep the UI smooth while not delaying the IO-bound loader thread(s). Because I create the texture on a loader thread, what sort of context arrangement is required so that the unloader thread can unload it? At the moment they are all 'attached' to the main context, so my question is whether the sibling contexts need some sort of "direct" sharing arrangement, or whether they already share objects through their association with the main OpenGL context. How would you tackle such a problem?
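For what it's worth, a sketch of one possible arrangement, assuming all loader/unloader contexts are created in the same share group as the main context; texture names live in that shared namespace, so a sibling context can delete them without any extra "direct" arrangement:

#include <vector>

// Unloader thread, with its own shared context current:
void unloadTextures(const std::vector<GLuint>& retired) {
    glDeleteTextures(static_cast<GLsizei>(retired.size()), retired.data());
    glFlush(); // make sure the deletions are actually submitted
}
// The underlying storage is freed once no context in the share group
// still has the texture bound or attached.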

ARB_sync and proper testing

I came across the concept of Sync Objects, and decided to test them out. They seem to work as expected, but my current test cases are limited.
What would be a proper test to ensure that these sync objects are performing as intended as a means to synchronize the CPU rendering thread with the GPU?
An example use-case for this would be for video capture programs which "hook" into the OpenGL context of a video game, or some other application using OpenGL.
Your example use-case seems fishy to me.
FRAPS is an example of a program that "hooks" into an OpenGL application to capture video, and it does it very differently. Rather than force a CPU-GPU synchronization, FRAPS inserts an asynchronous pixel-buffer read immediately before SwapBuffers (...) is called. It then tries to read the results back the next time SwapBuffers (...) is called, instead of stalling while the result becomes available the first time around. Latency does not matter to FRAPS.
However, even without the async PBO read, there would be no reason for FRAPS to use a sync object: glReadPixels (...) and commands like it implicitly wait for all pending commands to finish before reading the results and returning control to the CPU. That would really hurt performance, but GL would do the synchronization automatically.
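For reference, that one-frame-behind readback pattern looks roughly like this (a sketch with illustrative names and formats, not FRAPS's actual code):

GLuint pbo[2];
glGenBuffers(2, pbo);
for (int i = 0; i < 2; ++i) {
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
    glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);
}

// Immediately before each SwapBuffers (...), on frame n:
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[n & 1]);
glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, nullptr); // async into the PBO

// Collect frame n-1, which has had a whole frame to complete:
if (n > 0) {
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[(n + 1) & 1]);
    if (void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY)) {
        // hand `pixels` to the video encoder here
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    }
}
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);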
The simplest use-case for sync objects is two or more render contexts running simultaneously.
In OpenGL you can share certain resources (including sync objects) across contexts, but the command stream for each context is completely separate and no synchronization of any sort is enforced. Thus, if you were to upload data to a vertex buffer in one context and use it in another, you would insert a fence sync in the producer (upload context) and wait for it to be signaled in the consumer (draw context). This will ensure that the draw command does not occur until the upload is finished - if the commands were all issued from the same context, GL would actually guarantee this without the use of a sync object.
The example I just gave does not require CPU-GPU synchronization (only GPU-GPU), but you can use glClientWaitSync (...) to block the calling thread until the upload is finished in situations where CPU-GPU synchronization makes sense.
Here is some pseudo-code to evaluate the effectiveness of a sync object:
Thread 1:

glBindBuffer (GL_ARRAY_BUFFER, vbo);
glBufferSubData (GL_ARRAY_BUFFER, 0, 4096*4096, foo); // Upload a 16 MiB buffer
GLsync ready = glFenceSync (GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush (); // Submit the fence to the GPU; otherwise it may never signal

Thread 0:

glBindBuffer (GL_ARRAY_BUFFER, vbo);
// Try with and without synchronization
if (sync) {
    // Wait up to 1 second for the upload to finish
    glClientWaitSync (ready, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000UL);
}
// Ordinarily mapping a buffer would wait for everything else to finish;
// we need to eliminate that behavior (GL_MAP_UNSYNCHRONIZED_BIT) for this test.
// GL_MAP_WRITE_BIT is required: a mapping must request read or write access,
// and GL_MAP_UNSYNCHRONIZED_BIT cannot be combined with GL_MAP_READ_BIT.
void* bar = glMapBufferRange (GL_ARRAY_BUFFER, 0, 4096*4096,
                              GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
// When `sync` is true and the sync object is working, bar should be identical to foo

A way of generating chunks

I'm making a game and I'm currently working on the map generation.
The map is generated procedurally with some algorithms; there are no problems with this.
The problem is that my map can be huge, so I've thought about cutting it into chunks.
My chunks are OK; they're 512*512 pixels each. The only problem is that I have to generate a texture for each one (actually a RenderTexture from SFML), which takes around 0.5ms, so the game freezes each time I generate a chunk.
I've thought about a way to fix this: I've made a kind of thread pool with a factory. I just have to send a task to it and it creates the chunk.
Now that it's all implemented, it raises OpenGL warnings like:
"An internal OpenGL call failed in RenderTarget.cpp (219) : GL_INVALID_OPERATION, the specified operation is not allowed in the current state".
I don't know if this is the right way of dealing with chunks. I've also thought about saving the chunks to images/files, but I fear it would take too much time to save and load them.
Do you know a better way to deal with this kind of "infinite" map?
It is an invalid operation because each thread that makes GL calls must have a context bound. More importantly, all of the GL window-system APIs enforce a strict 1:1 mapping between threads and contexts: no thread may have more than one context bound, and no context may be bound to more than one thread. What you would need to do is use shared contexts (one context for drawing and one for each worker thread); things like buffer objects and textures will be shared between all shared contexts, but the state machine and container objects like FBOs and VAOs will not.
Are you using tiled rendering for this map, or is this just one giant texture?
If you do not need to update individual sub-regions of your "chunk" images, you can simply create new textures in your worker threads. The worker threads can create new textures and give them data while the drawing thread goes about its business; only after a worker thread finishes would you actually try to draw using one of the chunks. This may increase the overall latency between the time a chunk starts loading and when it eventually appears in the finished scene, but you should get a more consistent framerate.
If you need to use a single texture for this, I would suggest you double buffer it: have one texture that you use in the drawing thread and another that your worker threads issue glTexSubImage2D (...) on, as sketched below. When the worker thread(s) finish updating their regions of the texture, you swap the texture used for drawing with the one used for updating. This reduces the amount of synchronization required, but again increases the latency before an update eventually appears on screen.
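A minimal sketch of that double buffering, assuming the worker threads have shared contexts and both textures were created with the same size and format:

#include <atomic>

GLuint tex[2];            // one drawn from, one being updated
std::atomic<int> front{0};

// Worker thread (shared context current), updating its region of the back texture:
int back = 1 - front.load();
glBindTexture(GL_TEXTURE_2D, tex[back]);
glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h,
                GL_RGBA, GL_UNSIGNED_BYTE, chunkPixels);
GLsync done = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush(); // submit the update and the fence

// Drawing thread: once every worker's fence has signaled, swap buffers.
front.store(1 - front.load());
glBindTexture(GL_TEXTURE_2D, tex[front.load()]); // draw with the new front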
Things to try:
make your chunks smaller
generate the chunks in a separate thread, but pass them to the GPU from the main thread
pass the data to the GPU a small piece at a time, spread over a second or two (see the sketch below)
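For the last point, a sketch of streaming one 512*512 chunk a few rows per frame (kRowsPerFrame is an arbitrary tuning knob):

#include <algorithm>

const int kRowsPerFrame = 32;
int rowsUploaded = 0;

// Called once per frame on the GL thread; returns true when the chunk is done.
bool uploadSlice(GLuint tex, const unsigned char* pixels) {
    if (rowsUploaded >= 512) return true;
    int rows = std::min(kRowsPerFrame, 512 - rowsUploaded);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0,
                    0, rowsUploaded,   // x, y offset into the texture
                    512, rows,         // width and height of this slice
                    GL_RGBA, GL_UNSIGNED_BYTE,
                    pixels + rowsUploaded * 512 * 4);
    rowsUploaded += rows;
    return false;
}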