When is glFlush called too often? - c++

I have an application that issues about 100 drawcalls per frame, each with an individual VBO. The VBOs are uploaded via glBufferData in a separate thread has has gl context resource sharing. The render thread tests buffer upload state via glClientWaitSync.
Now my question:
According to the documentation glClientWaitSync and GL_SYNC_FLUSH_COMMANDS_BIT cause a flush at every call, right? This would mean that for every not yet finished glBufferData in the upload thread I would have dozens of flushes in the render thread right? What impact on performance would it have if thus, in the worst case, I practically issue a flush before every drawcall?

The behavior of GL_SYNC_FLUSH_COMMANDS_BIT has changed from its original specification.
In the original, the use of that bit was the equivalent of issuing a glFlush before doing the wait.
However, GL 4.5 changed the wording. Now, it is the equivalent of having performed a flush immediately after you submitted that sync object. That is, instead of doing a flush relative to the current stream, it works as if you had flushed after submitting the sync. Thus, repeated use does not mean repeatedly flushing.
You can get the equivalent behavior of course by manually issuing a flush after you submit the sync object, then not using GL_SYNC_FLUSH_COMMANDS_BIT when you wait for it.

Related

OpenGL client—server model, synchronization and 'deferred' loading in OpenGL 3.3

I heard of PBOs in OpenGL, they are pretty neat in texture loading / uploading. Using PBO involves using synchronisation fences, like they are used in this PBO tutorial. I've tried this technique and it's a great replacement of glTexImage and glGetTexImage in case of big images. Now I want to apply the same approach to other loading / uploading routines (and possibly some other).
If I understand OpenGL client—server model correctly, it works as follows:
(Italics are the things I am not sure of)
Client (my program) 'asks' OGL context to place new commands into OGL command queue. It is done by simply calling the gl* functions in the order that client wants them to execute;
If command flushing is enabled (it is by default), commands are immidiately flushed to the GPU (e. g. via PCI). Otherwise they are placed in some buffer and flushed afterwards when needed (call to glFlush does this);
GPU (server) receives commands from OGL context and executes them in desired order and changes object / context state;
GPU sends back a reply that context (client) asked for;
Context replies to client with the data received from server.
Fences may be used to indicate whether GPU is done executing previous commands.
(2) implies that command execution is not necessarily done instantly. One can, for example, block command flushing by calling glWaitSync, place new commands into queue and then call glFlush. Commands will be flushed to GPU and executed asynchronously (independently from client). When GPU is busy executing given commands, CPU can focus on doing other stuff (e. g. sending info to a remote TCP server, or receiving input from user, or pretty much anything else). When CPU needs to perform something with OGL context again, it can wait until GPU is done with previous job by calling glClientWaitSync and place new commands in the queue, and the cycle repeats.
Based on all of the above, in case of that PBO tutorial, OGL context receives data from the program, buffers it and then sends it to GPU. Sending large amounts of data takes time, hence fence is used to know when sending is complete.
However, Khronos wiki says that only rendering functions are asynchronous. I understand it: rendering also takes time. But then why does the PBO example above work? And it's not like the image upload to GPU is instant, fence is not signaled instantly. Surely, the time it takes to finish uploading depends on how big is the image (I tested it with different image sizes).
Another example: I send a source code for a shader with glShaderSource and then do glCompileShader. Then I instantly check the GL_COMPILE_STATUS with glGetShaderiv. If the shader is not yet compiled when I do glGetShaderiv (it simply did not have enough time to compile), is it possible that GL_COMPILE_STATUS will state that shader is not compiled? Or is it guaranteed that the GL_COMPILE_STATUS is only returned after the compilation? Or is the compilation performed on CPU and does not need to communicate with GPU? (i. e. compilation does not place any commands in GPU queue). It has never really happened to me that shader compilation failed due to time limits, it failed only because of bad shader code.
The questions are:
Is my understanding of OGL client—server model correct or does it need some adjustments?
If not all functions can be executed asynchronously, what are those functions exactly?
Why does wiki say that only render actions may be performed asynchronously?
If command execution may indeed be 'deferred' (not really the right word for it...) with glWaitSync for example, how can I upload and compile shaders the same way images are uploaded in PBO example? Or how can I perform VBO / EBO upload the same way? Or UBO? Or TBO? Non-buffer objects? Is it just uploading and then waiting for fence to be signaled?
In case it matters, I use OGL with latest GLFW github release, latest GLAD (configuration) and C++ (MinGWx64 11.2.0).
UPD: I found this answer that touches the subject of my question. However, I must specify that my question is not about where and how OGL functions are executed, it's about how to control the flow of them, i. e. control when they are executed, to perform asynchronous work of GPU and CPU if it is even possible (it seems to be, if I understand wiki page right).

Which to use for OpenGL client side waiting: glGetSynciv vs. glClientWaitSync?

I am unclear from the OpenGL Specification on Sync objects, whether to use glGetSynciv or glClientWaitSync in case I want to check for signalling of a sync object without waiting. How do the following two commands compare in terms of behavior and performance:
GLint syncStatus;
glGetSynciv(*sync, GL_SYNC_STATUS, sizeof(GLint), NULL, &syncStatus);
bool finished = syncStatus == GL_SIGNALED;
vs
bool finished = glClientWaitSync(*sync, 0 /*flags*/, 0 /*timeout*/) == ALREADY_SIGNALED;
Some details to the questions:
Does glGetSynciv perform a roundtrip to the GL server?
Is any method preferred in terms of driver support / bugs?
Could either method deadlock or not return immediately?
Some context:
This is for a video player, which is streaming images from a physical source to the GPU for rendering.
One thread is streaming / continuously uploading textures and another thread renders them once they are finished uploading. Each render frame we are checking if the next texture has finished uploading. If it has, then we start rendering this new texture, otherwise continue to using the old texture.
The decision is client side only and I do not want to wait at all, but quickly continue to render the correct texture.
Both methods have examples of people using them for the purpose of not waiting, but none seem to discuss the merits of using one or the other.
Quoting the Red Book,
void glGetSynciv(GLsync sync, GLenum pname, GLsizei bufSize, GLsizei *lenght, GLint *values);
Retrieves the properties of a sync object. sync specifies a handle to the sync object from wich to read the property specified by pname. bufSize is the size in bytes of the buffer whose address is given in values. lenght is the address of an integer variable that will receive the number of bytes written into values
While for glClientWaitSync:
GLenum glClientWaitSync(GLsync sync, GLbitfields flags, GLuint64 timeout);
Causes the client to wait for the sync object to become signaled.
glClientWaitSync() will wait at most timeout nanoseconds for the object to become signaled before generating a timeout. The flags parameter may be used to control flushing behavior of the command. Specifying GL_SYNC_FLUSH_COMMANDS_BIT is equivalent to calling glFlush() before executing wait.
So, basically glGetSynciv() is used to know if the fence object has become signaled and glClientWaitSync() is used to wait until the fence object has become signaled.
If you only want to know if a fence object has become signaled, I would suggest using glGetSynciv().
Obviously glClientWaitSync() should take longer to execute then glGetSynciv(), but I'm guessing.
Hope i helped you.

OpenGL, measuring rendering time on gpu

I have some big performance issues here
So I would like to take some measurements on the gpu side.
By reading this thread I wrote this code around my draw functions, including the gl error check and the swapBuffers() (auto swapping is indeed disabled)
gl4.glBeginQuery(GL4.GL_TIME_ELAPSED, queryId[0]);
{
draw(gl4);
checkGlError(gl4);
glad.swapBuffers();
}
gl4.glEndQuery(GL4.GL_TIME_ELAPSED);
gl4.glGetQueryObjectiv(queryId[0], GL4.GL_QUERY_RESULT, frameGpuTime, 0);
And since OpenGL rendering commands are supposed to be asynchronous ( the driver can buffer up to X commands before sending them all together in one batch), my question regards essentially if:
the code above is correct
I am right assuming that at the begin of a new frame all the previous GL commands (from the previous frame) have been sent, executed and terminated on the gpu
I am right assuming that when I get query result with glGetQueryObjectiv and GL_QUERY_RESULT all the GL commands so far have been terminated? That is OpenGL will wait until the result become available (from the thread)?
Yes, when you query the timer it will block until the data is available, ie until the GPU is finished with everything that happened between beginning and ending the query. To avoid synchronising with the GPU, you can use GL_QUERY_RESULT_AVAILABLE to check if the results are already available and only then read them then. That might require less straightforward code to keep tabs on open queries and periodically checking them, but it will have the least performance impact. Waiting for the value every time is a sure way to kill your performance.
Edit: To address your second question, swapping the buffer doesn't necessarily mean it will block until the operation succeeds. You may see that behaviour, but it's just as likely that it is just an implicit glFlush and the command buffer is not empty yet. Which is also the more wanted behaviour because ideally you want to start with your next frame right away and keep the CPUs command buffer filled. Check the implementations documentation for more info though, as that is implementation defined.
Edit 2: Checking for errors might end up being an implicit synchronization by the way, so you will probably see the command buffer emptying when you wait for error checking in the command stream.

Windows Audio / WaveInAddBuffer() blocks

My application records audio samples from a microphone connected to my PC. So I chose the Windows WaveInXXX API to do the job.
After reading the documentation I decided to avoid using the callback mechanism with WaveInProc to save me the hassle synchronizing the threads. The whole application is pretty big and I thought this would make debugging simpler. When the application requests a block of samples, I just iterate over my buffer queue, take one out, copy the data, unprepare it, prepare it and add it back to the buffer queue. Basic program structure looks like this, I hope it makes the basic program flow clear:
WaveInOpen()
WaveInStart()
FunctionAddingPreparedBuffersToTheQueue()
while(someConditionThatEventuallyBecomesFalse)
if(NextBufferInQueueIsMarkedDone)
GetDataFromBuffer()
UnpreparePrepareHeaderAndAddBuffer()
else
WaitForAShortTime()
WaveInStop()
WaveInClose()
Now the problem appears: After some time (and I am unable to reproduce the exact condition), WaveInAddBuffer() causes a deadlock although it's in the same thread as all the rest. The header for the buffer that shall be added when the deadlock happens is prepared and dwFlags == WHDR_PREPARED == 2.
Any ideas what could cause this deadlock?
I have not seen such a problem, but a guess might be something like fragmentation related to all the unprepare/prepare cycles. They are not necessary. You can do the prepare once for each buffer and then unprepare when finished recording. (Prepare locks the buffer into physical memory.)

Using fence sync objects in OpenGL

I am trying to look for scenarios where Sync Objects can be used in OpenGL. My understanding is that a sync object once put in GL command stream ( using glFenceSync() ) will be signaled after all the GL commands are executed and realized.
If the sync objects are synchronization primitives why can't we MANUALLY signal them ? Where exactly this functionality can help GL programmer ?
Is the following scenario a correct one ?
Thread 1 :
Load model
Draw()
glFenceSync()
Thread 2 :
glWaitSync();
ReadPixels
Use data for subsequent operation.
Does this mean that I can't launch thread 2 unless glFenceSync() is called in Thread 1 ?
Fences are not so much meant to synchronize threads, but to know, when asynchronus operations are finished. For example if you do a glReadPixels into a pixel buffer object (PBO), you might want to know, that the read has been completed, before you even attempt to read from or map the PBO to client address space.
If you do a glReadPixels with a PBO being the target, the call will return immediately, but the data transfer may indeed take some time. That's where fences come in handy.