So when you call OpenGL functions, like glDraw* or glBufferData, does it cause the calling thread to stop and wait for GL to finish the calls?
If not, how does GL handle an important call like glDraw* being followed immediately by a change to a setting that affects draw calls?
No, they (mostly) do not. The majority of GL functions are buffered when called and actually executed later. This means that you cannot think of the CPU and the GPU as two processors working in lockstep. Usually, the CPU issues a batch of GL functions that get buffered, and the GPU executes them once they are delivered to it. As a consequence, you cannot reliably measure how long a specific GL function took to execute just by comparing the time before and after the call.
If you want to do that, you first need to call glFinish() so that it waits for all previously buffered GL calls to execute. Then you can start timing, issue the calls that you want to benchmark, call glFinish() again to make sure those calls have executed as well, and stop the timer.
On the other hand, I said "mostly". Functions that read data back NEED to synchronize with the GPU to return real results, so in that case they DO wait and block the calling thread.
edit: I think the explanation itself answers your second question, but just in case: because all calls are buffered in order, a draw call is executed with the settings that were in effect when it was issued, and a setting changed afterwards only affects successive calls.
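To make the buffering idea concrete, here is a minimal toy model in plain C. It is only a sketch: ToyContext and the toy_* functions are hypothetical names invented for illustration, not OpenGL API, and a real driver is far more involved. It shows that a "draw" returns before anything is executed, that each draw keeps the state it was issued with, and that a glFinish-like call is the point where the work is known to be done.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a buffering GL implementation. Hypothetical names, not OpenGL API. */
enum { TOY_MAX_DRAWS = 16 };

typedef struct {
    int current_color;               /* the "setting" the client may change at any time */
    int queued_color[TOY_MAX_DRAWS]; /* state snapshot taken when each draw was issued */
    size_t queued;                   /* draws issued so far */
    int drawn_color[TOY_MAX_DRAWS];  /* what each draw actually rendered */
    size_t executed;                 /* draws the "GPU" has completed */
} ToyContext;

void toy_init(ToyContext *ctx) {
    assert(ctx != NULL);
    memset(ctx, 0, sizeof *ctx);
}

/* Changing a setting only affects draws issued after this point. */
void toy_set_color(ToyContext *ctx, int color) {
    assert(ctx != NULL);
    ctx->current_color = color;
}

/* Like a draw call: records the command plus the current state and returns
 * at once. Returns 0 on success, -1 if the toy queue is full. */
int toy_draw(ToyContext *ctx) {
    assert(ctx != NULL);
    if (ctx->queued == TOY_MAX_DRAWS) {
        return -1;
    }
    ctx->queued_color[ctx->queued++] = ctx->current_color;
    return 0;
}

/* Like glFinish: returns only after every queued draw has executed, in order. */
void toy_finish(ToyContext *ctx) {
    assert(ctx != NULL);
    while (ctx->executed < ctx->queued) {
        ctx->drawn_color[ctx->executed] = ctx->queued_color[ctx->executed];
        ctx->executed++;
    }
}
```

In this model, calling toy_draw, then toy_set_color, then toy_draw again leaves executed at 0 until toy_finish runs, and the first draw still renders with the old color. Timing code placed around toy_draw alone would therefore measure only the cost of queueing.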
It strictly depends on the OpenGL call in question and the OpenGL state. When you make OpenGL calls, the implementation first queues them up internally and then executes them asynchronously to the calling program's execution. One important concept in OpenGL is the synchronization point: an operation in the work queue that requires the OpenGL call to block until certain conditions are met.
OpenGL objects (textures, buffer objects, etc.) are purely abstract, and by specification the handle of an object in the client program always refers to the data the object holds at the time the OpenGL function referring to it is called. Take this sequence, for example:
glBindTexture(GL_TEXTURE_2D, texID);
glTexImage2D(..., image_1);
draw_textured_quad();
glTexImage2D(..., image_2);
draw_textured_quad();
The first draw_textured_quad may return long before anything has been drawn. However, by making the calls, OpenGL creates an internal reference to the data currently held by the texture. So when glTexImage2D is called a second time, which may happen before the first quad has been drawn, OpenGL must internally create a second texture object that becomes texture texID and is used by the second call of draw_textured_quad. If glTexSubImage2D were called instead, it would even have to make a modified copy of the data.
OpenGL calls will only block if the result of the call modifies client-side memory and depends on data generated by previous OpenGL calls. In other words, when you make OpenGL calls, the implementation internally builds a dependency tree to keep track of what depends on what. When a synchronization point must block, it blocks at least until all of its dependencies are met.
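The texture sequence above can be sketched as a toy model in plain C. This is only an illustration under simplifying assumptions: ToyTexture and the toy_* functions are hypothetical names, an int stands in for the image data, and a real implementation tracks references and frees old storage. The point is that re-specifying the image gives the texture name new internal storage, so a draw that was queued earlier still sees the data it referred to when it was called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not OpenGL API. */
enum { TOY_MAX_VERSIONS = 8, TOY_MAX_DRAWS = 8 };

typedef struct {
    int storage[TOY_MAX_VERSIONS];   /* one internal copy per (re)specification */
    int version_count;               /* 0 means no image data specified yet */
    int draw_version[TOY_MAX_DRAWS]; /* storage index each queued draw refers to */
    int draw_count;
} ToyTexture;

void toy_texture_init(ToyTexture *tex) {
    assert(tex != NULL);
    memset(tex, 0, sizeof *tex);
}

/* Like glTexImage2D: gives the texture new storage instead of overwriting
 * data that queued draws may still need. Returns 0, or -1 if out of slots. */
int toy_tex_image(ToyTexture *tex, int image) {
    assert(tex != NULL);
    if (tex->version_count == TOY_MAX_VERSIONS) {
        return -1;
    }
    tex->storage[tex->version_count++] = image;
    return 0;
}

/* Like draw_textured_quad: returns at once, remembering which storage is
 * current. Returns the index of the queued draw, or -1 on failure. */
int toy_draw_textured_quad(ToyTexture *tex) {
    assert(tex != NULL);
    if (tex->version_count == 0 || tex->draw_count == TOY_MAX_DRAWS) {
        return -1;
    }
    tex->draw_version[tex->draw_count] = tex->version_count - 1;
    return tex->draw_count++;
}

/* What a queued draw samples when the "GPU" finally executes it. */
int toy_execute_draw(const ToyTexture *tex, int draw_index) {
    assert(tex != NULL);
    assert(draw_index >= 0 && draw_index < tex->draw_count);
    return tex->storage[tex->draw_version[draw_index]];
}
```

Even if both draws are executed only after the second toy_tex_image call, the first one samples image_1 and the second one samples image_2, which is the behaviour the specification requires.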
I'm reading this document
and I have a question about this sentence:
While OpenGL explicitly requires that commands are completed in order,
that does not mean that two (or more) commands cannot be concurrently
executing. As such, it is possible for shader invocations from one
command to be executing in tandem with shader invocations from other
commands.
Does this mean that, for example, when I issue two consecutive glDrawArrays calls, it is possible for the second call to start being processed before the first one has finished?
My first idea was that the OpenGL calls merely map to internal commands of the gpu and that the OpenGL call returns immediately without those commands completed, thus enabling the second OpenGL call to issue its own internal commands. The internal commands created by the OpenGL calls can then be parallelized.
What it says is that the exact order in which the commands are executed, and any concurrency, is left to the judgement of the implementation. The only constraint is that the final result must look exactly as if all the commands had been executed one after another, in the very order in which the client program called them.
EDIT: Certain OpenGL calls cause an implicit or explicit synchronization. Reading back pixels for example or waiting for a synchronization event.
I googled the question and got the following from this link:
clEnqueueAcquireGLObjects
Acquire OpenCL memory objects that have been created from OpenGL objects.
These objects need to be acquired before they can be used by any OpenCL commands queued to a command-queue.
I really don't understand why these objects need to be acquired. In my opinion, the reason for acquiring them is NOT OpenGL/OpenCL synchronization, because that synchronization can already be achieved with glFinish and clFinish.
I mean, if clEnqueueAcquireGLObjects/clEnqueueReleaseGLObjects are used, then glFinish/clFinish are redundant, and vice-versa.
I mean, if clEnqueueAcquireGLObjects/clEnqueueReleaseGLObjects are used, then glFinish/clFinish are redundant, and vice-versa.
You're thinking about this in entirely the wrong way.
glFinish causes OpenGL to perform a full CPU synchronization, such that the implementation will have completed all commands afterwards. clFinish does something similar for OpenCL.
The fact that you called one or the other has absolutely no effect on what a different system does. OpenGL has no idea that OpenCL exists, and vice-versa. glFinish has nothing to do with clFinish and vice-versa. So while OpenGL may have finished making some modification to an object, OpenCL has no idea that these modifications took place.
The purpose of acquiring and releasing OpenGL objects is for OpenCL and OpenGL to talk to one another. When objects are acquired, OpenCL tells OpenGL, "Hey, see these objects? They're mine now, so give them to me." This means that the OpenGL/OpenCL driver will do whatever mechanics are necessary to transfer access control over those objects to OpenCL.
For example, if an object has been paged out of GPU memory, OpenCL acquiring it may need to make it resident again. OpenCL and OpenGL have two separate sets of records that refer to this memory; by acquiring the object, you synchronize the OpenCL data with changes made by OpenGL. And so forth.
Notice that these mechanics have nothing at all to do with synchronizing GPU operations. They are about making the objects accessible to OpenCL.
If your OpenCL implementation doesn't have cl_khr_gl_event, then you must use OpenGL's synchronization mechanism to ensure that those objects are no longer in use before you acquire them. The two functions aren't redundant; they're doing different things to ensure the integrity of the system.
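Here is a toy state machine in plain C that separates the two concerns, as a rough sketch only. SharedObject and the toy_* functions are hypothetical names, not OpenGL or OpenCL API, and the model assumes an implementation without cl_khr_gl_event. Where the real APIs would give you undefined behaviour, the model returns -1 so the mistake is visible. "Finish" clears pending work; "acquire" and "release" change who owns the object; you need both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not OpenGL or OpenCL API. */
typedef enum { OWNER_GL, OWNER_CL } Owner;

typedef struct {
    Owner owner;    /* which API currently has access to the object */
    int gl_pending; /* 1 while GL commands touching the object are unfinished */
    int cl_pending; /* 1 while CL commands touching the object are unfinished */
    int value;      /* stand-in for the object's contents */
} SharedObject;

void toy_shared_init(SharedObject *obj) {
    assert(obj != NULL);
    obj->owner = OWNER_GL;
    obj->gl_pending = 0;
    obj->cl_pending = 0;
    obj->value = 0;
}

/* GL writes to the object; the command is considered in flight afterwards. */
int toy_gl_write(SharedObject *obj, int value) {
    if (obj->owner != OWNER_GL || obj->cl_pending) return -1;
    obj->value = value;
    obj->gl_pending = 1;
    return 0;
}

/* Models glFinish: GL work is complete, but ownership is unchanged. */
void toy_gl_finish(SharedObject *obj) { obj->gl_pending = 0; }

/* Models clEnqueueAcquireGLObjects: only valid once GL is done with the object. */
int toy_cl_acquire(SharedObject *obj) {
    if (obj->owner != OWNER_GL || obj->gl_pending) return -1;
    obj->owner = OWNER_CL;
    return 0;
}

/* A CL kernel that doubles the contents; requires the object to be acquired. */
int toy_cl_kernel(SharedObject *obj) {
    if (obj->owner != OWNER_CL) return -1;
    obj->value *= 2;
    obj->cl_pending = 1;
    return 0;
}

/* Models clEnqueueReleaseGLObjects: hands access back to GL. */
int toy_cl_release(SharedObject *obj) {
    if (obj->owner != OWNER_CL) return -1;
    obj->owner = OWNER_GL;
    return 0;
}

/* Models clFinish: CL work is complete, but ownership is unchanged. */
void toy_cl_finish(SharedObject *obj) { obj->cl_pending = 0; }

/* GL reads the object; needs ownership and no CL work in flight. */
int toy_gl_read(const SharedObject *obj, int *out) {
    if (obj->owner != OWNER_GL || obj->cl_pending) return -1;
    *out = obj->value;
    return 0;
}
```

The only sequence that succeeds in this model is write, GL finish, acquire, kernel, release, CL finish, read. Skipping the finish or skipping the acquire each fails on its own, which is the sense in which the two mechanisms are not redundant.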
If I call glDrawElements with the draw target being the back buffer, and then I call glReadPixels, is it guaranteed that I will read what was drawn?
In other words, is glDrawElements a blocking call?
Note: I am observing a weird issue here that may be caused by glDrawElements not being blocking...
In other words, is glDrawElements a blocking call?
That's not how OpenGL works.
The OpenGL memory model is built on the "as if" rule. Certain exceptions aside, everything in OpenGL will function as if all of the commands you issued have already completed. In effect, everything will work as if every command blocked until it completed.
However, this does not mean that the OpenGL implementation actually works this way. It just has to do everything to make it appear to work that way.
Therefore, glDrawElements is generally not a blocking call; however, glReadPixels (when reading to client memory) is a blocking call. Because the results of a pixel transfer directly to client memory must be available when glReadPixels has returned, the implementation must check to see if there are any outstanding rendering commands going to the framebuffer being read from. If there are, then it must block until those rendering commands have completed. Then it can execute the read and store the data in your client memory.
If you were reading to a buffer object, there would be no need for glReadPixels to block. No memory accessible to the client will be modified by the function, since you're reading into a buffer object. So the driver can issue the readback asynchronously. However, if you issue some command that depends on the contents of this buffer (like mapping it for reading or using glGetBufferSubData), then the OpenGL implementation must stall until the reading operation is done.
In short, OpenGL tries to delay blocking for as long as possible. Your job, to ensure performance, is to help OpenGL to do so by not forcing an implicit synchronization unless absolutely necessary. Sync objects can help with this.
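The difference between the two readback paths can be sketched with a toy command queue in plain C. Everything here is a hypothetical model (ToyGpu and the toy_* functions are made-up names, not OpenGL API), and it simplifies by draining the whole queue on a stall, whereas a real driver only has to wait for the commands the result depends on. Reading to client memory stalls right away; reading into a buffer object just queues a copy, and the stall is deferred until the buffer is mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model, not OpenGL API. */
enum { TOY_MAX_CMDS = 32 };

typedef enum { CMD_DRAW, CMD_COPY_TO_BUFFER } ToyCmd;

typedef struct {
    ToyCmd queue[TOY_MAX_CMDS];
    size_t count;    /* commands issued but not yet executed */
    int framebuffer; /* incremented by every executed draw */
    int buffer;      /* contents of the toy buffer object */
    int stalls;      /* how often the client had to wait for the "GPU" */
} ToyGpu;

void toy_gpu_init(ToyGpu *gpu) {
    assert(gpu != NULL);
    memset(gpu, 0, sizeof *gpu);
}

static int toy_enqueue(ToyGpu *gpu, ToyCmd cmd) {
    if (gpu->count == TOY_MAX_CMDS) return -1;
    gpu->queue[gpu->count++] = cmd;
    return 0;
}

/* Executes all queued commands in order; counts a stall if there was work. */
static void toy_drain(ToyGpu *gpu) {
    if (gpu->count > 0) gpu->stalls++;
    for (size_t i = 0; i < gpu->count; ++i) {
        if (gpu->queue[i] == CMD_DRAW) {
            gpu->framebuffer++;
        } else {
            gpu->buffer = gpu->framebuffer;
        }
    }
    gpu->count = 0;
}

/* Like glDrawElements: queued, does not block. */
int toy_draw(ToyGpu *gpu) { return toy_enqueue(gpu, CMD_DRAW); }

/* Like glReadPixels into client memory: must wait for outstanding rendering. */
int toy_read_to_client(ToyGpu *gpu) {
    toy_drain(gpu);
    return gpu->framebuffer;
}

/* Like glReadPixels into a bound pixel buffer object: queued, does not block. */
int toy_read_to_buffer(ToyGpu *gpu) { return toy_enqueue(gpu, CMD_COPY_TO_BUFFER); }

/* Like mapping the buffer for reading: now the result is needed, so wait. */
int toy_map_buffer(ToyGpu *gpu) {
    toy_drain(gpu);
    return gpu->buffer;
}
```

In this model the buffer read captures the framebuffer as it was when the read was issued, even if more draws are queued before the buffer is mapped. The client can do other work between toy_read_to_buffer and toy_map_buffer, which is the window that sync objects let you exploit in real code.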
I want to do parallel rendering with 2 GPUs. This requires reading pixels back from GPU1 and then drawing them on GPU2.
I created one window on each screen, each screen connected to its own GPU, with one thread associated with each window.
However, the readpixel+drawpixel step is a bottleneck, so I am considering an asynchronous PBO approach: 2 PBOs for reading back and 2 PBOs for drawing, used alternately.
My question is:
Can the pointer returned from glMapBufferARB be used in another thread and with a different GPU?
If not, I must copy the data to main memory and then copy it to the other GPU, and the bottleneck will be the CPU->GPU copy. Is there a better idea?
Yes, the pointer from glMapBuffer can be used by any thread, even one without a GL context. Just remember to synchronize the threads, and don't call glUnmapBuffer before the other thread has finished its work with the pointer.
I previously ran into the problem that I wanted to blend color values in an image unit by doing something like:
vec4 texelCol = imageLoad(myImage, myTexel);
imageStore(myImage, myTexel, texelCol+newCol);
In a scenario where multiple fragments can have the same value for 'myTexel', this apparently isn't possible, because the imageLoad and imageStore commands cannot be made atomic as a pair and other shader invocations could change the texel color in between.
Now someone told me that people work around this problem by creating semaphores using the atomic commands on uint textures. The shader would somehow wait in a while loop before accessing the texel and, as soon as it is free, atomically write into the integer texture to block other fragment shader invocations, process the color texel, and when finished atomically free the integer texel again.
But I can't get my head around how this could really work or what such code would look like.
Is it really possible to do this? Can a GLSL fragment shader be made to wait in a while loop? If it's possible, can someone give an example?
Basically, you're just implementing a spinlock. Only instead of one lock variable, you have an entire texture's worth of locks.
Logically, what you're doing makes sense. But as far as OpenGL is concerned, this won't actually work.
See, the OpenGL shader execution model states that invocations execute in an order which is largely undefined relative to one another. But spinlocks only work if there is a guarantee of forward progress among the various threads. Basically, spinlocks require that the thread which is spinning not be able to starve the execution system from executing the thread that it is waiting on.
OpenGL provides no such guarantee. Which means that it is entirely possible for one thread to lock a pixel, then stop executing (for whatever reason), while another thread comes along and blocks on that pixel. The blocked thread never stops executing, and the thread that owns the lock never restarts execution.
How might this happen in a real system? Well, let's say you have a fragment shader invocation group executing on some fragments from a triangle. They all lock their pixels. But then they diverge in execution due to a conditional branch within the locking region. Divergence of execution can mean that some of those invocations get transferred to a different execution unit. If there are none available at the moment, then they effectively pause until one becomes available.
Now, let's say that some other fragment shader invocation group comes along and gets assigned an execution unit before the divergent group. If that group tries to spinlock on pixels from the divergent group, it is essentially starving the divergent group of execution time, waiting on an event that will never happen.
Now obviously, in real GPUs there is more than one execution unit, but you can imagine that with lots of invocation groups out there, it is entirely possible for such a scenario to occasionally jam up the works.
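The starvation scenario can be reproduced with a tiny scheduling simulation in plain C. This is a deliberately crude sketch, not how any real GPU schedules work: toy_simulate, its parameters, and the single execution unit are all assumptions made for illustration. One invocation holds the per-pixel lock and still has work to do; another spins on the same lock. With a fair scheduler the holder gets to run and eventually unlocks. Without forward-progress guarantees the unit keeps running the spinner and the holder never runs again.

```c
#include <assert.h>

/* Hypothetical toy model of one execution unit shared by two invocations. */
typedef struct {
    int lock_held;         /* 1 while the per-pixel lock is taken */
    int holder_steps_left; /* work the lock holder must finish before unlocking */
} ToyPixel;

/* One time slice for the lock holder: do work, unlock when finished. */
static void toy_holder_step(ToyPixel *pixel) {
    if (pixel->holder_steps_left > 0 && --pixel->holder_steps_left == 0) {
        pixel->lock_held = 0;
    }
}

/* One iteration of the spin loop: returns 1 if the lock was acquired. */
static int toy_spinner_step(ToyPixel *pixel) {
    if (!pixel->lock_held) {
        pixel->lock_held = 1;
        return 1;
    }
    return 0;
}

/* Returns the step at which the spinner acquired the lock, or -1 if it was
 * still spinning when the step budget ran out.
 * fair != 0: time slices alternate between holder and spinner.
 * fair == 0: the unit never reschedules the holder (no forward progress). */
int toy_simulate(int fair, int holder_work, int max_steps) {
    ToyPixel pixel;
    assert(max_steps >= 0);
    pixel.holder_steps_left = holder_work;
    pixel.lock_held = holder_work > 0;

    for (int step = 0; step < max_steps; ++step) {
        if (fair && step % 2 == 0) {
            toy_holder_step(&pixel);
            continue;
        }
        if (toy_spinner_step(&pixel)) {
            return step;
        }
    }
    return -1;
}
```

With fair scheduling and three units of holder work, the spinner gets the lock after a handful of steps. With the unfair variant it never does, no matter how large the step budget is, which is the hang described above.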