Using fence sync objects in OpenGL

I am trying to look for scenarios where sync objects can be used in OpenGL. My understanding is that a sync object, once put in the GL command stream (using glFenceSync()), will be signaled after all the preceding GL commands have been executed and completed.
If sync objects are synchronization primitives, why can't we MANUALLY signal them? Where exactly can this functionality help a GL programmer?
Is the following scenario a correct one ?
Thread 1 :
Load model
Draw()
glFenceSync()
Thread 2 :
glWaitSync();
ReadPixels
Use data for subsequent operation.
Does this mean that I can't launch Thread 2 until glFenceSync() has been called in Thread 1?

Fences are not so much meant to synchronize threads as to let you know when asynchronous operations have finished. For example, if you do a glReadPixels into a pixel buffer object (PBO), you might want to know that the read has completed before you even attempt to read from or map the PBO into client address space.
If you do a glReadPixels with a PBO as the target, the call returns immediately, but the data transfer may indeed take some time. That's where fences come in handy.
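For illustration, a minimal sketch of that readback pattern, assuming a current GL 3.2+ context; pbo, width and height are placeholder names and error handling is omitted:

// Asynchronous glReadPixels into a PBO, with a fence to know when the
// transfer has actually finished.
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glBufferData(GL_PIXEL_PACK_BUFFER, width * height * 4, nullptr, GL_STREAM_READ);

// Returns immediately; the actual transfer happens asynchronously.
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

// Place the fence right after the read command.
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// ... do other CPU work here ...

// Before mapping the PBO, wait until the transfer is complete.
while (glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000) == GL_TIMEOUT_EXPIRED) {
    // still copying; we could keep doing useful work instead of spinning
}
glDeleteSync(fence);

void* pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
// ... use pixels ...
glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);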

Related

OpenGL client-server model, synchronization and 'deferred' loading in OpenGL 3.3

I heard of PBOs in OpenGL; they are pretty neat for texture loading / uploading. Using a PBO involves using synchronisation fences, as they are used in this PBO tutorial. I've tried this technique and it's a great replacement for glTexImage and glGetTexImage in the case of big images. Now I want to apply the same approach to other loading / uploading routines (and possibly some others).
If I understand the OpenGL client-server model correctly, it works as follows:
(Italics are the things I am not sure of)
The client (my program) 'asks' the OGL context to place new commands into the OGL command queue. It does this simply by calling the gl* functions in the order the client wants them to execute;
If command flushing is enabled (it is by default), commands are immediately flushed to the GPU (e.g. via PCI). Otherwise they are placed in some buffer and flushed later when needed (a call to glFlush does this);
The GPU (server) receives commands from the OGL context, executes them in the desired order and changes object / context state;
The GPU sends back a reply that the context (client) asked for;
The context replies to the client with the data received from the server.
Fences may be used to indicate whether the GPU is done executing previous commands.
(2) implies that command execution is not necessarily done instantly. One can, for example, block command flushing by calling glWaitSync, place new commands into the queue and then call glFlush. Commands will be flushed to the GPU and executed asynchronously (independently from the client). While the GPU is busy executing the given commands, the CPU can focus on doing other stuff (e.g. sending info to a remote TCP server, receiving input from the user, or pretty much anything else). When the CPU needs to perform something with the OGL context again, it can wait until the GPU is done with the previous job by calling glClientWaitSync, place new commands in the queue, and the cycle repeats.
Based on all of the above, in the case of that PBO tutorial, the OGL context receives data from the program, buffers it and then sends it to the GPU. Sending large amounts of data takes time, hence a fence is used to know when the transfer is complete.
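As a hedged illustration of the pattern from that tutorial (not its exact code), a streaming texture upload through a PBO might look roughly like this; tex is assumed to already have storage (e.g. from an earlier glTexImage2D), and width, height and imageData are assumed to exist:

#include <cstring>  // std::memcpy

// Streaming a texture upload through a PBO, with a fence to detect when the
// GPU-side copy has completed.
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * 4, nullptr, GL_STREAM_DRAW);

// Copy the image into the PBO; this is a client-side copy into driver memory.
void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
std::memcpy(dst, imageData, width * height * 4);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

// Kick off the actual transfer into the texture; the call returns immediately.
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

// Fence after the upload command, then flush so it actually reaches the GPU.
GLsync uploadDone = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush();

// Later (e.g. once per frame): poll without blocking, timeout 0.
GLenum r = glClientWaitSync(uploadDone, 0, 0);
bool finished = (r == GL_ALREADY_SIGNALED || r == GL_CONDITION_SATISFIED);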
However, the Khronos wiki says that only rendering functions are asynchronous. I understand that: rendering also takes time. But then why does the PBO example above work? And it's not like the image upload to the GPU is instant; the fence is not signaled instantly. Surely, the time it takes to finish uploading depends on how big the image is (I tested it with different image sizes).
Another example: I send source code for a shader with glShaderSource and then call glCompileShader. Then I immediately check GL_COMPILE_STATUS with glGetShaderiv. If the shader is not yet compiled when I call glGetShaderiv (it simply did not have enough time to compile), is it possible that GL_COMPILE_STATUS will state that the shader is not compiled? Or is it guaranteed that GL_COMPILE_STATUS is only returned after the compilation? Or is the compilation performed on the CPU, so it does not need to communicate with the GPU (i.e. compilation does not place any commands in the GPU queue)? It has never actually happened to me that shader compilation failed due to time limits; it has only failed because of bad shader code.
The questions are:
Is my understanding of the OGL client-server model correct or does it need some adjustments?
If not all functions can be executed asynchronously, what are those functions exactly?
Why does wiki say that only render actions may be performed asynchronously?
If command execution may indeed be 'deferred' (not really the right word for it...) with glWaitSync, for example, how can I upload and compile shaders the same way images are uploaded in the PBO example? Or how can I perform VBO / EBO uploads the same way? Or UBO? Or TBO? Non-buffer objects? Is it just uploading and then waiting for a fence to be signaled?
In case it matters, I use OGL with the latest GLFW GitHub release, the latest GLAD (configuration) and C++ (MinGW x64 11.2.0).
UPD: I found this answer that touches on the subject of my question. However, I must specify that my question is not about where and how OGL functions are executed; it's about how to control their flow, i.e. control when they are executed, to let the GPU and CPU work asynchronously if that is even possible (it seems to be, if I understand the wiki page right).

When is glFlush called too often?

I have an application that issues about 100 draw calls per frame, each with an individual VBO. The VBOs are uploaded via glBufferData in a separate thread that has GL context resource sharing. The render thread tests the buffer upload state via glClientWaitSync.
Now my question:
According to the documentation, glClientWaitSync with GL_SYNC_FLUSH_COMMANDS_BIT causes a flush on every call, right? This would mean that for every not-yet-finished glBufferData in the upload thread I would have dozens of flushes in the render thread, right? What impact on performance would it have if, in the worst case, I thus practically issue a flush before every draw call?
The behavior of GL_SYNC_FLUSH_COMMANDS_BIT has changed from its original specification.
In the original, the use of that bit was the equivalent of issuing a glFlush before doing the wait.
However, GL 4.5 changed the wording. Now, it is the equivalent of having performed a flush immediately after you submitted that sync object. That is, instead of doing a flush relative to the current stream, it works as if you had flushed after submitting the sync. Thus, repeated use does not mean repeatedly flushing.
You can get the equivalent behavior of course by manually issuing a flush after you submit the sync object, then not using GL_SYNC_FLUSH_COMMANDS_BIT when you wait for it.
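A hedged sketch of that alternative, mapped onto the upload/render threads from the question (variable names are assumptions):

// Upload thread (shared context): upload the buffer, then fence and flush once.
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);
GLsync uploadFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush(); // one explicit flush, right after submitting the sync

// Render thread: poll without GL_SYNC_FLUSH_COMMANDS_BIT, so no extra flushes.
GLenum status = glClientWaitSync(uploadFence, 0, 0); // timeout 0 = just check
if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
    // the VBO upload is complete; safe to draw with it
}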

Which to use for OpenGL client side waiting: glGetSynciv vs. glClientWaitSync?

I am unclear, from the OpenGL specification on sync objects, whether to use glGetSynciv or glClientWaitSync when I want to check whether a sync object is signaled without waiting. How do the following two commands compare in terms of behavior and performance:
GLint syncStatus;
glGetSynciv(*sync, GL_SYNC_STATUS, sizeof(GLint), NULL, &syncStatus);
bool finished = syncStatus == GL_SIGNALED;
vs
bool finished = glClientWaitSync(*sync, 0 /*flags*/, 0 /*timeout*/) == GL_ALREADY_SIGNALED;
Some details to the questions:
Does glGetSynciv perform a roundtrip to the GL server?
Is any method preferred in terms of driver support / bugs?
Could either method deadlock or not return immediately?
Some context:
This is for a video player, which is streaming images from a physical source to the GPU for rendering.
One thread is streaming / continuously uploading textures and another thread renders them once they have finished uploading. Each render frame we check whether the next texture has finished uploading. If it has, we start rendering this new texture; otherwise we continue using the old texture.
The decision is client side only and I do not want to wait at all, but quickly continue to render the correct texture.
Both methods have examples of people using them for the purpose of not waiting, but none seem to discuss the merits of using one or the other.
Quoting the Red Book,
void glGetSynciv(GLsync sync, GLenum pname, GLsizei bufSize, GLsizei *length, GLint *values);
Retrieves the properties of a sync object. sync specifies a handle to the sync object from which to read the property specified by pname. bufSize is the size in bytes of the buffer whose address is given in values. length is the address of an integer variable that will receive the number of bytes written into values.
While for glClientWaitSync:
GLenum glClientWaitSync(GLsync sync, GLbitfield flags, GLuint64 timeout);
Causes the client to wait for the sync object to become signaled.
glClientWaitSync() will wait at most timeout nanoseconds for the object to become signaled before generating a timeout. The flags parameter may be used to control the flushing behavior of the command. Specifying GL_SYNC_FLUSH_COMMANDS_BIT is equivalent to calling glFlush() before executing the wait.
So, basically, glGetSynciv() is used to know whether the fence object has become signaled, and glClientWaitSync() is used to wait until the fence object has become signaled.
If you only want to know whether a fence object has become signaled, I would suggest using glGetSynciv().
Obviously glClientWaitSync() should take longer to execute than glGetSynciv(), but I'm guessing.
Hope I helped you.
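For the streaming scenario described in the question, a non-blocking per-frame check with glGetSynciv() might look like this (uploadSync, newTexture and currentTexture are assumed names, not from the question):

// Poll the fence once per frame; never blocks.
GLint status = GL_UNSIGNALED;
glGetSynciv(uploadSync, GL_SYNC_STATUS, sizeof(status), nullptr, &status);
if (status == GL_SIGNALED) {
    glDeleteSync(uploadSync);        // done with this fence
    currentTexture = newTexture;     // switch to the freshly uploaded texture
}
// render with currentTexture either way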

Multithreaded Rendering on OpenGL

I have a multithreaded application in which I'm trying to render from different threads. First I tried to use the same rendering context for all threads, but I was getting NULL current contexts on the other threads. I've read on the internet that a context can only be current on one thread at a time.
So I decided to do something different. I create a window, get the HDC from it and create the first RC. After that, I share this HDC between threads, and in every new thread I create I obtain a new RC from the same HDC and make it current for that thread. Every time I do this, the RC returned is different (usually the previous value + 1). I make an assertion to check that wglGetCurrentContext() returns an RC, and it looks like it returns the one that was just created. But after performing the rendering, I get no output, and if I call GetLastError() I obtain error 6 (invalid handle??).
So does this mean that, even though every new call to wglCreateContext() gives me a new value, all these different values are somehow the same "connection channel" to the OpenGL calls?
Does this mean that I will always have to release the previous rendering context on one thread and activate it on the new one? Do I really have to do this synchronization all the time, or is there another way to work around this problem?
I have a multithreaded application, in which I'm trying to render with different threads.
DON'T!!!
You will gain nothing from trying to multithread your renderer. Basically you're running into one large race condition and the driver will just be busy synchronizing the threads to somehow make sense of it.
To get the best rendering performance, keep all OpenGL operations in a single thread. All the parallelization happens for free on the GPU.
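A minimal sketch of that single-GL-thread approach (the queue and all names are my own assumptions, not part of the answer): worker threads only prepare data; every gl* call stays on the render thread.

#include <mutex>
#include <queue>
#include <vector>

struct PendingUpload {
    GLuint texture;                      // created earlier on the GL thread
    int width, height;
    std::vector<unsigned char> pixels;   // RGBA data prepared by the worker
};

std::mutex queueMutex;
std::queue<PendingUpload> uploads;

// Worker thread: decode / prepare pixel data, no GL calls at all.
void produce(PendingUpload&& job) {
    std::lock_guard<std::mutex> lock(queueMutex);
    uploads.push(std::move(job));
}

// Render thread, once per frame before drawing: drain the queue and upload.
void drainUploads() {
    std::lock_guard<std::mutex> lock(queueMutex);
    while (!uploads.empty()) {
        PendingUpload& job = uploads.front();
        glBindTexture(GL_TEXTURE_2D, job.texture);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, job.width, job.height,
                        GL_RGBA, GL_UNSIGNED_BYTE, job.pixels.data());
        uploads.pop();
    }
}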
I suggest reading the following wiki article from the Khronos Group.
In simple words, it depends a lot on what you mean by multithreading in regard to OpenGL. If you have one thread doing the rendering and one (or more) doing other jobs (e.g. AI, physics, game logic, etc.), that is perfectly fine.
If you wish to have multiple threads messing with OpenGL, you cannot; or rather, you could, but it will really give you more trouble than advantage.
Try reading the following FAQ on parallel OpenGL usage to get a better idea of this concept:
http://www.equalizergraphics.com/documentation/parallelOpenGLFAQ.html
In some cases it may make sense to use multiple rendering contexts in different threads. I have used such a design to load image data from filesystem and push this data into a texture.
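A hedged sketch of such a design on Windows, using the WGL calls from the question; all names are assumptions and error handling is omitted:

// Main thread, at startup: create two contexts on the same HDC and share them.
HGLRC mainRC   = wglCreateContext(hdc);
HGLRC loaderRC = wglCreateContext(hdc);
wglShareLists(mainRC, loaderRC);      // textures/buffers become visible to both
wglMakeCurrent(hdc, mainRC);

// Loader thread: make the second context current and upload there.
wglMakeCurrent(hdc, loaderRC);
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
GLsync done = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush();
// hand 'tex' and 'done' over to the render thread (e.g. via a queue)

// Render thread: only sample 'tex' after 'done' reports signaled
// (poll with glClientWaitSync(done, 0, 0) or glGetSynciv).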
OpenGL on Mac OS X can run its engine on a separate thread (the multithreaded GL engine); to enable it:
#include <OpenGL/OpenGL.h>
CGLError err = 0;
CGLContextObj ctx = CGLGetCurrentContext();
// Enable the multi-threading
err = CGLEnable( ctx, kCGLCEMPEngine);
if (err != kCGLNoError) {
    // Multi-threaded execution is possibly not available
    // Insert your code to take appropriate action
}
See:
Concurrency and OpenGL - Apple Developer
And:
Technical Q&A QA1612: OpenGL ES multithreading and ...
https://www.imaginationtech.com/blog/understanding-opengl-es-multi-thread-multi-window-rendering/
When shouldn’t I use multi-threaded rendering?
When you’re not CPU limited or load times are not a concern.
So, if you are CPU limited, move CPU-heavy jobs, such as codec work, texture uploads or calculations, to separate threads.

Multiple processing in WinCE 6.0 or DMA implementation

In my application I want to do tasks in parallel: one thread does the calculations and another draws the data on screen. But while drawing the data, the processor gets tied up and during that time it is not able to process the data of the other thread. I am running both threads at above-normal priority. Is there any way in which I can do the drawing in parallel, so that the measurement thread can do its calculations at full speed without being affected by the drawing thread? I heard from someone that DMA could solve the problem, but I have no idea how to implement it on the WinCE 6.0 platform.
Please provide any pointers.
Mukesh
No idea how DMA would "solve" this issue - you're using a single processor core, which can only execute one set of instructions at a time. DMA won't change that.
The problem you're having sounds like you're using the processor at just about full capacity, so you're not seeing much time sharing between your threads. There are generally two ways to approach this (both are sketched in code after the list):
1) adjust the priority of your more important thread to have it get more time from the scheduler to do its work.
or
2) adjust the thread quantum for your threads to force the scheduler to swap between threads more frequently.
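A hedged sketch of both options on WinCE (hMeasureThread and hDrawThread are assumed thread handles from CreateThread; check that CeSetThreadQuantum is available in your platform build):

#include <windows.h>

void TuneScheduling(HANDLE hMeasureThread, HANDLE hDrawThread)
{
    // Option 1: give the measurement thread more scheduler time.
    SetThreadPriority(hMeasureThread, THREAD_PRIORITY_HIGHEST);

    // Option 2: shorten the drawing thread's quantum (typically 100 ms by
    // default on CE) so the scheduler switches between threads more often.
    CeSetThreadQuantum(hDrawThread, 25);   // 25 ms time slice
}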