I came across the concept of Sync Objects, and decided to test them out. They seem to work as expected, but my current test cases are limited.
What would be a proper test to ensure that these sync objects are performing as intended as a means to synchronize the CPU rendering thread with the GPU?
An example use-case for this would be for video capture programs which "hook" into the OpenGL context of a video game, or some other application using OpenGL.
Your example use-case seems fishy to me.
FRAPS is an example of a program that "hooks" into an OpenGL application to capture video, and it does this very differently. Rather than force a CPU-GPU synchronization, FRAPS inserts an asynchronous pixel buffer read immediately before SwapBuffers (...) is called. It then tries to read the results back the next time SwapBuffers (...) is called, instead of stalling the first time around while the result becomes available. Latency does not matter for FRAPS.
However, even without the async PBO read, there would be no reason for FRAPS to use a sync object. glReadPixels (...) and commands like it will implicitly wait for all pending commands to finish before reading the results and returning control to the CPU. It would really hurt performance, but GL would automatically do the synchronization.
The simplest use-case for sync objects is two or more render contexts running simultaneously.
In OpenGL you can share certain resources (including sync objects) across contexts, but the command stream for each context is completely separate and no synchronization of any sort is enforced. Thus, if you were to upload data to a vertex buffer in one context and use it in another, you would insert a fence sync in the producer (upload context) and wait for it to be signaled in the consumer (draw context). This will ensure that the draw command does not occur until the upload is finished - if the commands were all issued from the same context, GL would actually guarantee this without the use of a sync object.
The example I just gave does not require CPU-GPU synchronization (only GPU-GPU), but you can use glClientWaitSync (...) to block your calling thread until the upload is finished if you had a situation where CPU-GPU made sense.
Here is some pseudo-code to evaluate the effectiveness of a sync object:
Thread 1 (upload/producer context):

glBindBuffer (GL_ARRAY_BUFFER, vbo);
glBufferSubData (GL_ARRAY_BUFFER, 0, 4096*4096, foo); // Upload a 16 MiB buffer

GLsync ready = glFenceSync (GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

Thread 0 (draw/consumer context):

glBindBuffer (GL_ARRAY_BUFFER, vbo);

// Try with and without synchronization
if (sync) {
  // Wait up to 1 second (timeout is in nanoseconds) for the upload to finish
  glClientWaitSync (ready, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000UL);
}

// Ordinarily mapping a buffer would wait for everything else to finish;
// we need to eliminate that behavior (GL_MAP_UNSYNCHRONIZED_BIT) for this test.
// (A real read-back would also need GL_MAP_READ_BIT, which core GL does not allow
// together with GL_MAP_UNSYNCHRONIZED_BIT - treat this strictly as pseudo-code.)
void* bar = glMapBufferRange (GL_ARRAY_BUFFER, 0, 4096*4096, GL_MAP_UNSYNCHRONIZED_BIT);

// When `sync` is true and the sync object is working, bar should be identical to foo
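For the purely GPU-GPU case mentioned earlier, the wait can stay on the server side instead. A minimal sketch, assuming the same ready fence shared between the two contexts and a draw call in the consumer (vertexCount is a placeholder):

// Producer context, right after the upload:
GLsync ready = glFenceSync (GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
glFlush (); // make sure the fence command is actually submitted

// Consumer context, before the draw that reads the buffer:
glWaitSync (ready, 0, GL_TIMEOUT_IGNORED);   // the GPU waits, the CPU never blocks
glDrawArrays (GL_TRIANGLES, 0, vertexCount); // will not execute until the upload has finished
// glDeleteSync (ready) once both contexts are done with it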
With round-robin buffering you usually have a few buffers and cycle between them. How do you manage GLFW callbacks in this situation?
Let's suppose you have 3 buffers. You send draw commands with a specified viewport into the first one, but while the CPU is working on the second one it receives a callback for a window resize, for example. The server may still be rendering whatever you sent with the previous viewport size, causing some "artifacts". That is just one example, but the same will happen for literally everything, right? An easy fix would be to process the callbacks (the last ones received) only after rendering the last buffer, and to block the client until the server has processed all the commands. Is that correct (which would imply a frame of delay per buffer)? Is there something else that could be done?
OpenGL's internal state machine takes care of all of that. All OpenGL commands are queued up in a command queue and executed in order. A call to glViewport – and any other OpenGL command for that matter – affects only the outcome of the commands that follow it, and nothing that comes before.
There's no need to implement custom round robin buffering.
This even covers things like textures and buffer objects (with the notable exception of persistently mapped buffer objects). I.e. if you do the following sequence of operations
glDrawElements(…); // (1)
glTexSubImage2D(GL_TEXTURE_2D, …);
glDrawElements(…); // (2)
The OpenGL rendering model mandates that glDrawElements (1) uses the texture data of the bound texture object as it was before the call to glTexSubImage2D and that glDrawElements (2) must use the data that has been uploaded between (1) and (2).
Yes, this involves tracking the contents, implicit data copies and a lot of other unpleasant things. Yes, this also likely implies that you're hitting a slow path.
I'm writing a DirectX application with two threads:
Producer thread grabs desktop frames with DirectX (as in the Desktop Duplication DirectX sample)
IDXGIResource* DesktopResource = nullptr;
ID3D11Texture2D *m_AcquiredDesktopImage = nullptr;
HRESULT hr = m_DeskDupl->AcquireNextFrame(500, &FrameInfo, &DesktopResource);
hr = DesktopResource->QueryInterface(__uuidof(ID3D11Texture2D), reinterpret_cast<void **>(&m_AcquiredDesktopImage));
DesktopResource->Release();
// The texture pointer I'm interested in is m_AcquiredDesktopImage
Consumer thread performs image processing operations on the GPU.
To avoid copies I'd like to keep everything on the GPU as much as possible. From ReleaseFrame's documentation I gather that I should call ReleaseFrame on the desktop duplication interface as soon as I'm done processing the frame.
My question: should I copy the m_AcquiredDesktopImage texture into another one, call ReleaseFrame as soon as the copy is finished, and return that new texture to the consumer thread for processing? Or can I just get away with returning the m_AcquiredDesktopImage texture pointer to the consumer thread? Is this a copy of the framebuffer texture, or is it the framebuffer texture itself, meaning I might generate a data race by returning it?
Which one is the correct way to handle a producer of grabbed frames and a consumer of GPU textures?
...should I copy the m_AcquiredDesktopImage texture into another one, call ReleaseFrame as soon as the copy is finished, and return that new texture to the consumer thread for processing, or...
Yes, this is the way. You got your texture, you are finished with it and you release it because the data is no longer valid after the release.
...can I just get away with returning the m_AcquiredDesktopImage texture pointer to the consumer thread? Is this a copy of the framebuffer texture, or is it the framebuffer texture itself, meaning I might generate a data race by returning it?
The API keeps updating this texture. You are promised that, between a successful return from AcquireNextFrame and your ReleaseFrame call, the API does not touch the texture and you are free to use it. If you cannot finish using it between those two calls (which is your case; after all, you created a consumer thread to run asynchronously with the capture), you copy the data and call ReleaseFrame. Once you have released it, the API resumes updating the texture.
Any attempt to use the texture after ReleaseFrame results in concurrent access: your reads racing against the API's further updates.
The MSDN documentation on ReleaseFrame is a little convoluted. It specifically states you need to release the current frame before processing the next one, and that the surface state is "invalid" after release, which would indicate it is either not a copy, or not a copy that your process owns (which would yield the same effective result). It also states you should delay the call to ReleaseFrame until right before you call AcquireNextFrame for performance reasons, which can make for some interesting timing issues, especially with the threading model you're using.
I think you'd be better off making a copy (so ReleaseFrame from the previous capture, AcquireNextFrame, CopyResource). Unless you're using fences you don't have any guarantee that the GPU has finished consuming the resource before your producer thread calls ReleaseFrame, which could give you undefined results. And if you are using fences, and the AcquireNextFrame call is delayed until the GPU has finished consuming the previous frame's data, you'll introduce stalls and lose a lot of the benefit of the CPU being able to run ahead of the GPU.
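As a rough illustration of that pattern (not the canonical sample code; deskDupl, device, context and frameCopy are stand-ins for whatever your capture class holds):

#include <d3d11.h>
#include <dxgi1_2.h>

HRESULT CaptureFrameCopy(IDXGIOutputDuplication* deskDupl,
                         ID3D11Device* device,
                         ID3D11DeviceContext* context,
                         ID3D11Texture2D** frameCopy)
{
    // Release the frame from the previous capture right before acquiring the
    // next one, as the documentation recommends. The very first call has
    // nothing to release and simply fails, which we ignore.
    deskDupl->ReleaseFrame();

    DXGI_OUTDUPL_FRAME_INFO frameInfo = {};
    IDXGIResource* desktopResource = nullptr;
    HRESULT hr = deskDupl->AcquireNextFrame(500, &frameInfo, &desktopResource);
    if (FAILED(hr))
        return hr; // includes DXGI_ERROR_WAIT_TIMEOUT, which the caller can retry

    ID3D11Texture2D* acquired = nullptr;
    hr = desktopResource->QueryInterface(__uuidof(ID3D11Texture2D),
                                         reinterpret_cast<void**>(&acquired));
    desktopResource->Release();
    if (FAILED(hr))
        return hr;

    // Create (once) a texture this process owns, matching the acquired frame;
    // adjust BindFlags/MiscFlags to whatever your processing needs.
    if (*frameCopy == nullptr)
    {
        D3D11_TEXTURE2D_DESC desc = {};
        acquired->GetDesc(&desc);
        hr = device->CreateTexture2D(&desc, nullptr, frameCopy);
    }
    if (SUCCEEDED(hr))
    {
        // GPU-side copy; the consumer thread only ever sees *frameCopy,
        // never the duplication API's own surface.
        context->CopyResource(*frameCopy, acquired);
    }

    acquired->Release();
    // The acquired frame stays held until the next call releases it.
    return hr;
}

In a real producer/consumer setup you would rotate between a couple of copy textures (or fence them) so the consumer is never reading the one currently being overwritten.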
I'm curious why you're going with this threading model, when the work is done on the GPU. I suspect it makes life a little more complicated. Although making a copy of the texture would remove a lot of those complications.
I know that multi threaded OpenGL is a delicate topic and I am not trying here to render from multiple threads. I also do not try to create multiple contexts and share objects with share lists. I have a single context and I issue draw commands and gl state changes only from the main thread.
However, I am dynamically updating parts of a VBO in every frame. I only write to the VBO, I do not need to read it on the CPU side. I use glMapBufferRange so I can compute the changed data on the fly and don't need an additional copy (which would be created by the blocking glBufferSubData).
It works, and now I would like to multithread the data update (since it needs to update a lot of vertices at a steady 90 fps) and use a persistently mapped buffer (using GL_MAP_PERSISTENT_BIT). This requires issuing glFlushMappedBufferRange whenever a worker thread finishes updating its part of the mapped buffer.
Is it fine to call glFlushMappedBufferRange from a separate thread? The ranges the different threads operate on do not overlap. Is there any overhead or implicit synchronisation involved in doing so?
No, you need to call glFlushMappedBufferRange on the thread that owns the OpenGL context.
To work around this you have two options:
Acquire the OpenGL context and make it current in the worker thread, which means the OpenGL thread has to relinquish the context first for this to work.
Push the relevant range into a thread-safe queue and let the OpenGL thread pop each range from it and call glFlushMappedBufferRange, as in the sketch below.
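A minimal sketch of the second option, assuming a persistently mapped VBO; the FlushRange struct and the g_* names are made up for illustration:

#include <GL/glew.h> // or whichever loader/headers you already use
#include <mutex>
#include <queue>

struct FlushRange { GLintptr offset; GLsizeiptr length; };

std::mutex             g_rangeMutex;
std::queue<FlushRange> g_pendingRanges;

// Worker thread: writes into its slice of the persistently mapped pointer,
// then only records the range. No GL calls are made here.
void OnWorkerFinished(GLintptr offset, GLsizeiptr length)
{
    std::lock_guard<std::mutex> lock(g_rangeMutex);
    g_pendingRanges.push({ offset, length });
}

// OpenGL thread: drain the queue once per frame, with the VBO bound, before
// issuing the draws that read from it.
void FlushPendingRanges(GLenum target) // e.g. GL_ARRAY_BUFFER
{
    std::lock_guard<std::mutex> lock(g_rangeMutex);
    while (!g_pendingRanges.empty())
    {
        const FlushRange r = g_pendingRanges.front();
        g_pendingRanges.pop();
        glFlushMappedBufferRange(target, r.offset, r.length);
    }
}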
When I call glReadPixels from another thread, it doesn't return any data. I read somewhere that I need to create a new context in the calling thread and copy the memory over. How exactly do I do this?
This is the glReadPixels code I use:
pixels = new BYTE[ 3 * width * height];
glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels);
image = FreeImage_ConvertFromRawBits(pixels, width, height, 3 * width, 24, 0xFF0000, 0x00FF00, 0x0000FF, false);
FreeImage_Save(FIF_PNG, image, pngpath.c_str() , 0);
Alternatively, I read in this thread a suggestion to use another piece of code (see the end), but I don't understand what origX, origY, srcOrigX and srcOrigY are.
You can create shared contexts, and this will work as you intended. See wglShareLists (the name is chosen badly, it shares more than just lists). Or, use WGL_ARB_create_context, which directly supports sharing contexts too (you have tagged the question "windows", but similar functionality exists for non-WGL too).
However, it is much, much easier to use a pixel buffer object instead; that will have the same net effect as multithreading (the transfer runs asynchronously without blocking the render thread), and it is far less complex.
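A minimal sketch of the PBO approach, assuming an RGB readback with two PBOs ping-ponged across frames (the save callback and the buffer count are placeholders):

GLuint pbo[2];
int    frameIndex = 0;

void InitReadbackPBOs(int width, int height)
{
    glGenBuffers(2, pbo);
    for (int i = 0; i < 2; ++i)
    {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[i]);
        glBufferData(GL_PIXEL_PACK_BUFFER, 3 * width * height, nullptr, GL_STREAM_READ);
    }
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
}

// Call once per frame, after rendering and before SwapBuffers.
void AsyncReadback(int width, int height, void (*save)(const void*, int, int))
{
    const int writeIdx = frameIndex % 2;        // PBO receiving this frame
    const int readIdx  = (frameIndex + 1) % 2;  // PBO filled last frame

    glPixelStorei(GL_PACK_ALIGNMENT, 1);

    // Start an asynchronous transfer into the "write" PBO; with a pack buffer
    // bound, the last argument is an offset, not a client pointer.
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIdx]);
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, nullptr);

    // Map the PBO that was filled last frame; by now the transfer is usually
    // done, so the map does not stall. (Skip the very first frame - it is empty.)
    if (frameIndex > 0)
    {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIdx]);
        if (const void* data = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY))
        {
            save(data, width, height); // hand off to the image-saving code
            glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        }
    }

    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    ++frameIndex;
}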
You have a couple of options.
You can pipeline glReadPixels with the rendering thread. In that case the returned data should be stored in a buffer that is handed to a thread dedicated to saving pictures. This can be done easily with a buffer queue, a mutex and a semaphore: the rendering thread gets the data using glReadPixels, locks the mutex, enqueues the (system memory) pixel buffer, unlocks the mutex and signals the semaphore; the worker thread (blocked on the semaphore) wakes up, locks the mutex, dequeues the pixel buffer, unlocks the mutex and saves the image. A sketch of this approach follows.
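Something along those lines, with std::condition_variable standing in for the semaphore and SaveImage as a placeholder for the FreeImage code from the question:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>
// plus your usual OpenGL headers

void SaveImage(const unsigned char* rgb, int width, int height); // provided elsewhere

std::mutex                              g_mutex;
std::condition_variable                 g_frameReady;
std::queue<std::vector<unsigned char>>  g_frames;

// Rendering thread: read back, hand the pixels over, keep rendering.
void EnqueueFrame(int width, int height)
{
    std::vector<unsigned char> pixels(3 * width * height);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels.data());

    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_frames.push(std::move(pixels));
    }
    g_frameReady.notify_one();
}

// Worker thread: no OpenGL calls here, only dequeue and save.
void SaveWorker(int width, int height)
{
    for (;;)
    {
        std::vector<unsigned char> pixels;
        {
            std::unique_lock<std::mutex> lock(g_mutex);
            g_frameReady.wait(lock, [] { return !g_frames.empty(); });
            pixels = std::move(g_frames.front());
            g_frames.pop();
        }
        SaveImage(pixels.data(), width, height);
    }
}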
Otherwise, you can copy the current framebuffer into a texture or a pixel buffer object. In this case you need two different threads, each with its own OpenGL context current (via MakeCurrent) and sharing its object space with the other (as suggested by user771921). When the first rendering thread calls glReadPixels (or glCopyPixels), it notifies the second thread about the operation (using a semaphore, for example); the second thread then maps the pixel buffer object (or fetches the texture data).
This method has the advantage of letting the driver pipeline the first thread's read operation, but it effectively doubles the memory copies by introducing an additional staging buffer. Moreover, the glReadPixels operation is only flushed when the second thread maps the buffer, which (most probably) happens just after the second thread is signalled.
I would suggest the first option, since it is much cleaner and simpler. The second is overcomplicated, and I doubt you would gain much from it: the image-saving operation is a lot slower than glReadPixels.
Even if glReadPixels is not pipelined, does your frame rate really drop? Don't optimize before you can profile.
The example you have linked uses GDI functions, which are not OpenGL related. I think that code would cause a repaint event on the form and then capture the window's client-area contents. It seems much slower than glReadPixels, although I haven't actually profiled it.
Well, using OpenGL in a multithreaded program is a bad idea, especially if you use OpenGL functions in a thread that has no context.
Apart from that, there is nothing wrong with your code example.
I have an OpenGL program that needs to periodically update the textures. But at the same time I want the program to be responsive (specifically, to continue running the draw/display code) while it's updating these textures.
But this seems impossible: if I make thread 1 do the draw/display code and thread 2 do the texture moving, they will conflict over the GL_ARRAY_BUFFER_ARB binding, because thread 2 has to keep that binding to move the texture data over to a VBO, and I need GL_ARRAY_BUFFER_ARB for the draw/display code in thread 1, which uses different VBOs.
For example, this code
glBindBufferARB(GL_ARRAY_BUFFER_ARB, tVboId);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, numVertices*2*sizeof(GLfloat), texCoords, GL_DYNAMIC_DRAW_ARB);
glBindBufferARB(GL_ARRAY_BUFFER_ARB,0);
will move some texture data over, but it takes a while. During that time the display code is supposed to run many times, but instead it crashes, because GL_ARRAY_BUFFER_ARB is in use.
I thought I could do something like GL_ARRAY_BUFFER_ARB2 but there is no such thing, I think.
Use PBOs; they allow you to do asynchronous transfers. Read more here.
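A rough sketch of what that looks like for texture updates, staging the data in a GL_PIXEL_UNPACK_BUFFER (the RGBA format, sizes and names here are assumptions):

GLuint uploadPbo = 0;

void InitUploadPBO(size_t maxBytes)
{
    glGenBuffers(1, &uploadPbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, uploadPbo);
    glBufferData(GL_PIXEL_UNPACK_BUFFER, maxBytes, nullptr, GL_STREAM_DRAW);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}

void UpdateTextureAsync(GLuint texture, int width, int height, const void* pixels)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, uploadPbo);

    // Orphan the previous contents, then copy the new data into the PBO.
    glBufferData(GL_PIXEL_UNPACK_BUFFER, 4 * width * height, nullptr, GL_STREAM_DRAW);
    glBufferSubData(GL_PIXEL_UNPACK_BUFFER, 0, 4 * width * height, pixels);

    // With an unpack PBO bound, the last argument is an offset into the PBO,
    // not a client pointer, so the actual transfer can proceed asynchronously
    // while the display code keeps running.
    glBindTexture(GL_TEXTURE_2D, texture);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}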
This should help: http://hacksoflife.blogspot.com/2008/02/creating-opengl-objects-in-second.html