C++ OpenGL multithreading with a buffer resource

I have an OpenGL program that needs to periodically update the textures. But at the same time I want the program to be responsive (specifically, to continue running the draw/display code) while it's updating these textures.
But this seems impossible. If I make thread1 run the draw/display code and thread2 do the texture moving, they will conflict over the resource GL_ARRAY_BUFFER_ARB, because thread2 has to hold that resource to move some textures over to a VBO. And thread1 needs GL_ARRAY_BUFFER_ARB for the draw/display code, because that code uses different VBOs.
For example, this code:
glBindBufferARB(GL_ARRAY_BUFFER_ARB, tVboId);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, numVertices*2*sizeof(GLfloat), texCoords, GL_DYNAMIC_DRAW_ARB);
glBindBufferARB(GL_ARRAY_BUFFER_ARB,0);
will move some texture data (the texture coordinates in texCoords) over, but it takes a while. During that time the display code is supposed to run many times, but instead it crashes because GL_ARRAY_BUFFER_ARB is in use.
I thought I could use something like GL_ARRAY_BUFFER_ARB2, but I don't think there is such a thing.

Use PBOs; they allow you to do asynchronous transfers. Read more here.

This should help: http://hacksoflife.blogspot.com/2008/02/creating-opengl-objects-in-second.html

Related

OpenGL: multithreaded glFlushMappedBufferRange?

I know that multithreaded OpenGL is a delicate topic, and I am not trying to render from multiple threads here. I am also not trying to create multiple contexts and share objects with share lists. I have a single context, and I issue draw commands and GL state changes only from the main thread.
However, I am dynamically updating parts of a VBO in every frame. I only write to the VBO; I do not need to read it on the CPU side. I use glMapBufferRange so I can compute the changed data on the fly and don't need an additional copy (which would be created by the blocking glBufferSubData).
It works, and now I would like to multithread the data update (since it needs to update a lot of vertices at a steady 90 fps) and use a persistently mapped buffer (using GL_MAP_PERSISTENT_BIT). This will require issuing glFlushMappedBufferRange whenever a worker thread finishes updating its part of the mapped buffer.
Is it fine to call glFlushMappedBufferRange on a separate thread? The ranges the different threads operate on do not overlap. Is there any overhead or implicit synchronisation involved in doing so?
No, you need to call glFlushMappedBufferRange on the thread that has the OpenGL context current.
To overcome this, you have two options:
1. Make the OpenGL context current in the worker thread. This means the OpenGL thread first has to release the context (make it non-current) for this to work.
2. Push the relevant range into a thread-safe queue, and let the OpenGL thread pop each range from it and call glFlushMappedBufferRange.
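Here is a minimal sketch of the second option, using only the C++ standard library. It assumes the buffer was mapped with GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_FLUSH_EXPLICIT_BIT, because glFlushMappedBufferRange is only valid for explicit-flush mappings. It also assumes that offsets are relative to the start of the mapped range. The names FlushRange, RangeQueue, splitRanges, workerUpdate and flushPending are hypothetical. The `flush` callback stands in for the real glFlushMappedBufferRange call on the GL thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One dirty byte range, relative to the start of the mapped range.
struct FlushRange {
    std::ptrdiff_t offset;  // would be passed as GLintptr
    std::ptrdiff_t length;  // would be passed as GLsizeiptr
};

// Mutex-protected queue: worker threads push, the GL thread drains.
class RangeQueue {
public:
    void push(FlushRange r) {
        std::lock_guard<std::mutex> lock(mutex_);
        ranges_.push(r);
    }
    // Take everything queued so far; never waits for new work.
    std::vector<FlushRange> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<FlushRange> out;
        while (!ranges_.empty()) {
            out.push_back(ranges_.front());
            ranges_.pop();
        }
        return out;
    }
private:
    std::mutex mutex_;
    std::queue<FlushRange> ranges_;
};

// Split [0, size) into at most `parts` contiguous, non-overlapping ranges.
std::vector<FlushRange> splitRanges(std::ptrdiff_t size, int parts) {
    assert(size > 0 && parts > 0);
    std::vector<FlushRange> out;
    const std::ptrdiff_t chunk = (size + parts - 1) / parts;
    for (std::ptrdiff_t off = 0; off < size; off += chunk)
        out.push_back({off, std::min(chunk, size - off)});
    return out;
}

// Worker thread: write its own slice of the mapped memory, then report it.
// It makes no GL calls.
void workerUpdate(unsigned char* mapped, FlushRange r, unsigned char value,
                  RangeQueue& q) {
    std::fill(mapped + r.offset, mapped + r.offset + r.length, value);
    q.push(r);
}

// GL thread, once per frame with the context current. `flush` stands in for
//   glFlushMappedBufferRange(GL_ARRAY_BUFFER, r.offset, r.length);
std::size_t flushPending(RangeQueue& q,
                         const std::function<void(const FlushRange&)>& flush) {
    std::vector<FlushRange> ranges = q.drain();
    for (const FlushRange& r : ranges) flush(r);
    return ranges.size();
}
```

The mutex does more than keep the GL call on the context's thread. It also provides the memory ordering this pattern needs: a worker's writes to the mapped memory happen-before the GL thread pops that range and flushes it.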

ARB_sync and proper testing

I came across the concept of Sync Objects, and decided to test them out. They seem to work as expected, but my current test cases are limited.
What would be a proper test to ensure that these sync objects are performing as intended as a means to synchronize the CPU rendering thread with the GPU?
An example use-case for this would be for video capture programs which "hook" into the OpenGL context of a video game, or some other application using OpenGL.
Your example use-case seems fishy to me.
FRAPS is an example of a program that "hooks" into an OpenGL application to capture video, and it does so very differently. Rather than force a CPU-GPU synchronization, FRAPS inserts an asynchronous pixel buffer read immediately before SwapBuffers (...) is called. It then tries to read the results back the next time SwapBuffers (...) is called, instead of stalling until the result becomes available the first time around. Latency does not matter for FRAPS.
However, even without the async PBO read, there would be no reason for FRAPS to use a sync object. glReadPixels (...) and commands like it will implicitly wait for all pending commands to finish before reading the results and returning control to the CPU. It would really hurt performance, but GL would automatically do the synchronization.
The simplest use-case for sync objects is two or more render contexts running simultaneously.
In OpenGL you can share certain resources (including sync objects) across contexts, but the command stream for each context is completely separate and no synchronization of any sort is enforced. Thus, if you were to upload data to a vertex buffer in one context and use it in another, you would insert a fence sync in the producer (upload context) and wait for it to be signaled in the consumer (draw context). This will ensure that the draw command does not occur until the upload is finished - if the commands were all issued from the same context, GL would actually guarantee this without the use of a sync object.
The example I just gave does not require CPU-GPU synchronization (only GPU-GPU), but you can use glClientWaitSync (...) to block your calling thread until the upload is finished if you had a situation where CPU-GPU made sense.
Here is some pseudo-code to evaluate the effectiveness of a sync object:
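Thread 1 (upload context):
glBindBuffer (GL_ARRAY_BUFFER, vbo);
glBufferSubData (GL_ARRAY_BUFFER, 0, 4096*4096, foo); // Upload a 16 MiB buffer
GLsync ready =
  glFenceSync (GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
// Flush this context so the fence is actually submitted: GL_SYNC_FLUSH_COMMANDS_BIT
// in Thread 0 only flushes Thread 0's own context.
glFlush ();
Thread 0 (draw context):
glBindBuffer (GL_ARRAY_BUFFER, vbo);
// Try with and without synchronization
if (sync) {
  // Wait up to 1 second (the timeout is in nanoseconds) for the upload to finish;
  // GL_ALREADY_SIGNALED or GL_CONDITION_SATISFIED means it completed
  glClientWaitSync (ready, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000ull);
}
// Mapping only waits for commands issued in this context, so without the sync object
// nothing orders this read after Thread 1's upload. (GL_MAP_UNSYNCHRONIZED_BIT cannot
// be combined with GL_MAP_READ_BIT, and a read needs GL_MAP_READ_BIT.)
void* bar =
  glMapBufferRange (GL_ARRAY_BUFFER, 0, 4096*4096, GL_MAP_READ_BIT);
// When `sync` is true and the sync object is working, bar should be identical to foo
glUnmapBuffer (GL_ARRAY_BUFFER);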

How to cancel a blocking OpenGL call

I'm investigating GPGPU programming with OpenGL + GLSL. One problem is that if you have a shader that takes a long time to finish, it seems to be impossible to cancel it.
After setting up everything, I issue the final glReadPixels call which blocks until all pixels have been rendered to a framebuffer. Depending on the shader, this could take a long time, even seconds. Is there a way to cancel the call (from another thread) or even query the progress? What happens if you set up an infinite loop in a shader?
Instead of a plain glReadPixels into client memory, you could use pixel buffer objects (PBOs), which do not block. A plain glReadPixels waits (in your main thread) for the results, but a glReadPixels into a bound PBO returns immediately and the copy continues in the background. Somewhere later in the code, you can check whether the data in the PBO is available.
http://www.songho.ca/opengl/gl_pbo.html
http://www.opengl.org/wiki/Pixel_Buffer_Object
If you need more advanced calculations, you may want to use OpenCL, which will give you more flexibility.
What happens if you set up an infinite loop in a shader?
I think you will get a crash of the video driver.

Asynchronous readback from opengl front buffer using multiple PBO's

I am developing an application that needs to read back the whole frame from the front buffer of an OpenGL application. I can hijack the application's OpenGL library and insert my code on SwapBuffers. At the moment I am successfully using a simple but excruciatingly slow glReadPixels command without PBOs.
Now I have read about using multiple PBOs to speed things up. While I think I've found enough resources to actually program that (it isn't that hard), I still have some operational questions. I would do something like this:
1. Create a series (e.g. 3) of PBOs.
2. Use glReadPixels in my SwapBuffers override to read data from the front buffer into a PBO (this should be fast and non-blocking, right?).
3. Create a separate thread to call glMapBufferARB, once per PBO after a glReadPixels, because this will block until the pixels are in client memory.
4. Process the data from step 3.
My main concern is of course steps 2 and 3. I read that glReadPixels into a PBO is non-blocking. Will that be an issue if I issue new OpenGL commands very quickly afterwards? Will those OpenGL commands block, or will they continue (my guess)? If they continue, I guess only SwapBuffers can be a problem. Will it stall? Will glReadPixels from the front buffer be many times faster than swapping (roughly every 15-30 ms)? Or, in the worst case, will SwapBuffers execute while glReadPixels is still reading data into the PBO? My current guess is that the read works roughly like this: copy FRONT_BUFFER -> some generic place in VRAM, then copy VRAM -> RAM. But I have no idea which of those two copies is the real bottleneck. I also don't know how either affects the normal OpenGL command stream.
Then there is step 3: is it wise to do this asynchronously in a thread separate from the normal OpenGL logic? At the moment I think not. It seems you have to restore buffer operations to normal after doing this, and I can't insert synchronization objects into the original code to temporarily block those. So I think my best option is to define a fixed SwapBuffers delay before reading the PBOs out. For example, I could call glReadPixels on PBO i%3 and glMapBufferARB on PBO (i+1)%3 in the same thread, which gives a delay of 2 frames. Also, when I call glMapBufferARB to use the data in client memory, will that be the bottleneck, or will the (asynchronous) glReadPixels be?
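The slot arithmetic for a three-PBO ring can be checked without any GL calls. At frame i, (i+2)%3 is the same slot as (i-1)%3, which is the PBO filled one frame earlier. To map the PBO filled two frames earlier, map (i+1)%3 instead. The following sketch checks this; kNumPbos, readSlot, mapSlot and sourceFrame are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Three PBOs used as a ring, as in the question (assumption: exactly 3).
constexpr std::size_t kNumPbos = 3;

// Slot that receives this frame's glReadPixels (bound as GL_PIXEL_PACK_BUFFER).
std::size_t readSlot(std::size_t frame) { return frame % kNumPbos; }

// Slot to map this frame: the oldest one, filled kNumPbos - 1 frames ago.
// With three PBOs this is (frame + 1) % 3, which equals (frame - 2) mod 3.
std::size_t mapSlot(std::size_t frame) {
    assert(frame >= kNumPbos - 1);  // nothing to map during the first frames
    return (frame + 1) % kNumPbos;
}

// Frame whose pixels are in the PBO mapped at `frame`.
std::size_t sourceFrame(std::size_t frame) { return frame - (kNumPbos - 1); }
```

The slot mapped at frame i becomes the glReadPixels target at frame i+1. It therefore has to be unmapped (glUnmapBufferARB) before the next frame's read is issued.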
Finally, if you have better ideas for speeding up frame readback from the GPU in OpenGL, please tell me, because this is a painful bottleneck in my current system.
I hope my question is clear enough. I know the answer is probably somewhere on the internet, but I mostly found results that used PBOs to keep buffers in video memory and do the processing there. I really need to read back the front buffer to RAM, and I can't find a clear explanation of the performance in that case. I need one, because I cannot rely on "it's faster"; I have to explain why it's faster.
Are you sure you want to read from the front buffer? You do not own this buffer, and depending on your OS it might be destroyed, e.g., by another window on top of it.
For your use case, people typically do
draw N
start PBO read N from back buffer
draw N+1
start PBO read N+1
sync PBO read N
process N
...
from a single thread.
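The single-threaded ordering above can be written as a loop. In this sketch, the hooks (FrameHooks, runCapture) are hypothetical stand-ins for the real GL calls, and two PBOs alternate by frame parity. It shows only the ordering, not the GL setup.

```cpp
#include <cassert>
#include <functional>

// Hypothetical hooks standing in for the real GL work:
//   draw(n)       - render frame n into the back buffer
//   startRead(n)  - glReadPixels from GL_BACK into PBO n % 2 (returns immediately)
//   finishRead(n) - map PBO n % 2, process frame n, unmap
struct FrameHooks {
    std::function<void(int)> draw;
    std::function<void(int)> startRead;
    std::function<void(int)> finishRead;
};

// Frame n's readback is collected only after frame n + 1 has been drawn and
// its own read started, so the copy has a whole frame of GPU work to finish.
void runCapture(int frameCount, const FrameHooks& h) {
    assert(frameCount >= 0);
    for (int n = 0; n < frameCount; ++n) {
        h.draw(n);
        h.startRead(n);
        if (n > 0) h.finishRead(n - 1);
        // SwapBuffers for frame n would follow here.
    }
    if (frameCount > 0) h.finishRead(frameCount - 1);  // drain the last read
}
```

By the time finishRead(n) maps PBO n % 2, frame n + 1 is already reading into the other PBO. This makes a stall on the map less likely than mapping immediately after the read.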

What is a good way to load textures dynamically in OpenGL?

Currently I am loading an image into memory on a second thread. Then, during the display loop (if a texture load is required), I load the texture.
I discovered that I could not load the texture on the second thread because OpenGL didn't like that. Perhaps this is possible and I did something wrong, so please correct me if it actually is possible.
On the other hand, if my failure was valid - how do I load a texture without disrupting the rendering loop? Currently the textures take around 1 second to load from memory, and although this isn't a major issue, it can be slightly irritating for the user.
You can load a texture from disk to memory on any thread you like, using any tool you wish for reading the files.
However, when you upload it to OpenGL, that has to happen on the thread where the OpenGL context is current, which is normally the rendering thread. That said, this discussion suggests that using a PBO in a second thread is an option and can speed up the process.
You can certainly load the texture from disk into RAM in any number of threads you like, but OpenGL won't upload to VRAM in multiple threads for the reason mentioned in Reed's answer.
Given that loading from disk is the slowest part, that's the bit you'll probably want to thread. The loading thread(s) build up a queue of textures to be uploaded, and the thread that owns the GL context consumes that queue (mind how the various threads access the queue, however). You could also consider a non-threaded approach: upload N textures per frame, where N is small enough not to slow rendering down too much.
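Here is a sketch of the queue described above. It assumes the loader threads have already decoded each file into RGBA8 pixels. PendingTexture, UploadQueue and takeForFrame are hypothetical names. The per-frame budget implements the "N textures per frame" idea, and the trailing comment shows where glTexImage2D would be called on the GL thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Decoded pixels waiting for upload; produced by loader threads off the GL thread.
struct PendingTexture {
    std::string name;                 // hypothetical identifier
    int width = 0, height = 0;
    std::vector<unsigned char> rgba;  // width * height * 4 bytes
};

class UploadQueue {
public:
    // Called from loader threads after the file has been read and decoded.
    void push(PendingTexture t) {
        assert(t.rgba.size() == static_cast<std::size_t>(t.width) * t.height * 4);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(t));
    }
    // Called once per frame on the thread that owns the GL context.
    // Pops at most `budget` textures, so a burst of loads cannot stall one frame.
    std::vector<PendingTexture> takeForFrame(std::size_t budget) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<PendingTexture> out;
        while (!pending_.empty() && out.size() < budget) {
            out.push_back(std::move(pending_.front()));
            pending_.pop_front();
        }
        return out;
    }
private:
    std::mutex mutex_;
    std::deque<PendingTexture> pending_;
};

// Render thread, once per frame (GL context current), e.g.:
//   for (PendingTexture& t : uploads.takeForFrame(2)) {
//       glBindTexture(GL_TEXTURE_2D, textureFor(t.name));  // textureFor is hypothetical
//       glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, t.width, t.height, 0,
//                    GL_RGBA, GL_UNSIGNED_BYTE, t.rgba.data());
//   }
```

The lock is held only while moving already-decoded textures in and out of the deque. The slow disk reads and decoding never block the render thread.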