I have two threads: One main-thread (opengl) for 3d-rendering and one thread for logic. How should I connect the threads if I want to create a box-mesh in the rendering thread, if the order comes from the logic-thread?
In this case the logic-thread would use opengl-commands, which is not possible because every opengl-command should only be exectued in the main-thread. I know that I can not share opengl context over different threads (which seems to be a bad idea), so how should I solve this problem? Do there exist some general purpose design pattern or something else? Thanks.
You could implement a draw commands queue. Each draw command contains whatever is needed to make the required OpenGL calls. Each frame the rendering thread empties the current queue and processes the commands. Any other thread prepares its own commands and enqueues them at any time to the queue for the next frame.
Very primitive draw commands can be implemented as a class hierarchy with virtual Draw method. Of course this is not a small change at all but modern engines adopt this approach, of course much more advanced version of it. It can be efficient if the subsystems which submitted their command objects re-use them in the next frame, including their buffers. So each submodule constantly prepares and updates the draw command but submits it only when it should be rendered based on some logic.
There are various ways to approach this. One is to implement a command queue with the logic thread being a command producer and the rendering thread the consumer.
Another approach is to make use of an auxiliary OpenGL context, which is setup to share the primary OpenGL context data. You can have both contexts made current at the same time in different threads. And for OpenGL-3.x core onward, you can make current a context without a drawable. You can then use the auxiliary context to load new data, map buffers and so on.
Related
I don't properly understand how to parallelize work on separate threads in Vulkan.
In order to begin issuing vkCmd*s, you need to begin a render pass. The call to begin render pass needs a reference to a framebuffer. However, vkAcquireNextImageKHR() is not guaranteed to return image indexes in a round robin way. So, in a triple-buffering setup, if the current image index is 0, I can't just bind framebuffer 1 and start issuing draw calls for the next frame, because the next call to vkAcquireNextImageKHR() might return image index 2.
What is a proper way to record commands without having to specify the framebuffer to use ahead of time?
You have one or more render passes that you want to execute per-frame. And each one has one or more subpasses, into which you want to pour work. So your main rendering thread will generate one or more secondary command buffers for those subpasses, and it will pass that sequence of secondary CBs off to the submission thread.
The submissions thread will create the primary CB that gets rendered. It begins/ends render passes, and into each subpass, it executes the secondary CB(s) created on the rendering thread for that particular subpass.
So each thread is creating its own command buffers. The submission thread is the one that deals with the VkFramebuffer object, since it begins the render passes. It also is the one that acquires the swapchain images and so forth. The render thread is the one making the secondary CBs that do all of the real work.
Yes, you'll still be doing some CB building on the submission thread, but it ought to be pretty minimalistic overall. This also serves to abstract away the details of the render targets from your rendering thread, so that code dealing with the swapchain can be localized to the submission thread. This gives you more flexibility.
For example, if you want to triple buffer, and the swapchain doesn't actually allow that, then your submission thread can create its own extra images, then copy from its internal images into the real swapchain. The rendering thread's code does not have to be disturbed at all to allow this.
You can use multiple threads to generate draw commands for the same renderpass using secondary command buffers. And you can generate work for different renderpasses in the same frame in parallel -- only the very last pass (usually a postprocess pass) depends on the specific swapchain image, all your shadow passes, gbuffer/shading/lighting passes, and all but the last postprocess pass don't. It's not required, but it's often a good idea to not even call vkAcquireNextImageKHR until you're ready to start generating the final renderpass, after you've already generated many of the prior passes.
First, to be clear:
In order to begin issuing vkCmd*s, you need to begin a render pass.
That is not necessarily true. In command buffers You can record multiple different commands, all of which begin with vkCmd. Only some of these commands need to recorded inside a render pass - the ones that are connected with drawing. There are some commands, which cannot be called inside a render pass (like for example dispatching compute shaders). But this is just a side note to sort things out.
Next thing - mentioned triple buffering. In Vulkan the way images are displayed depends on the supported present mode. Different hardware vendors, or even different driver versions, may offer different present modes, so on one hardware You may get present mode that is most similar to triple buffering (MAILBOX), but on other You may not get it. And present mode impacts the way presentation engine allows You to acquire images from a swapchain, and then displays them on screen. But as You noted, You cannot depend on the order of returned images, so You shouldn't design Your application to behave as if You always have the same behavior on all platforms.
But to answer Your question - the easiest, naive, way is to call vkAcquireNextImageKHR() at the beginning of a frame, record command buffers that use an image returned by it, submit those command buffers and present the image. You can create framebuffers on demand, just before You need to use it inside a command buffer: You create a framebuffer that uses appropriate image (the one associated with index returned by the vkAcquireNextImageKHR() function) and after command buffers are submitted and when they stop using it, You destroy it. Such behavior is presented in the Vulkan Cookbook: here and here.
More appropriate way would be to prepare framebuffers for all available swapchain images and take appropriate framebuffer during a frame. But You need to remember to recreate them when You recreate swapchain.
More advanced scenarios would postpone swapchain acquiring until it is really needed. vkAcquireNextImageKHR() function call may block Your application (wait until image is available) so it should be called as late as possible when You prepare a frame. That's why You should record command buffers that don't need to reference swapchain images first (for example those that render geometry into a G-buffer in deferred shading algorithms). After that when You want to display image on screen (like for example some postprocessing technique) You just take the approach describe above: acquire an image, prepare appropriate command buffer(s) and present the image.
You can also pre-record command buffers that reference particular swapchain images. If You know that the source of Your images will always be the same (like the mentioned G-buffer), You can have a set of command buffers that always perform some postprocess/copy-like operations from this data to all swapchain images - one command buffer per swapchain image. Then, during the frame, if all of Your data is set, You acquire an image, check which pre-recorded command buffer is appropriate and submit the one associated with acquired image.
There are multiple ways to achieve what You want, all of them depend on many factors - performance, platform, specific goal You want to achieve, type of operations You perform in Your application, synchronization mechanisms You implemented and many other things. You need to figure out what best suits You. But in the end - You need to reference a swapchain image in command buffers if You want to display image on screen. I'd suggest starting with the easiest option first and then, when You get used to it, You can improve Your implementation for higher performance, flexibility, easier code maintenance etc.
You can call vkAcquireNextImageKHR in any thread. As long as you make sure the access to the swapchain, semaphore and fence you pass to it is synchronized.
There is nothing else restricting you from calling it in any thread, including the recording thread.
You are also allowed to have multiple images acquired at a time. Assuming you have created enough. In other words acquiring the next image before you present the current one is allowed.
I know that multi threaded OpenGL is a delicate topic and I am not trying here to render from multiple threads. I also do not try to create multiple contexts and share objects with share lists. I have a single context and I issue draw commands and gl state changes only from the main thread.
However, I am dynamically updating parts of a VBO in every frame. I only write to the VBO, I do not need to read it on the CPU side. I use glMapBufferRange so I can compute the changed data on the fly and don't need an additional copy (which would be created by the blocking glBufferSubData).
It works and now I would like to multi thread the the data update (since it needs to update a lot of vertices at steady 90 fps) and use a persistently mapped buffer (using GL_MAP_PERSISTENT_BIT). This will require to issue glFlushMappedBufferRange whenever a worker thread finished updating parts of the mapped buffer.
Is it fine to call glFlushMappedBufferRange on a separate thread? The Ranges the different threads operate on do not overlap. Is there an overhead or implicit synchronisation involved in doing so?
No you need to call glFlushMappedBufferRange in the thread that does the openGL stuff.
To overcome this you have 2 options:
get the openGL context and make it current in the worker thread. Which means the openGL thread has to relinquish the context for it to work.
push the relevant range into a thread-safe queue and let the openGL thread pop each range from it and call glFlushMappedBufferRange.
I'm making a game and I'm actually on the generation of the map.
The map is generated procedurally with some algorithms. There's no problems with this.
The problem is that my map can be huge. So I've thought about cutting the map in chunks.
My chunks are ok, they're 512*512 pixels each, but the only problem is : I have to generate a texture (actually a RenderTexture from the SFML). It takes around 0.5ms to generate so it makes the game to freeze each time I generate a chunk.
I've thought about a way to fix this : I've made a kind of a threadpool with a factory. I just have to send a task to it and it creates the chunk.
Now that it's all implemented, it raises opengl warnings like :
"An internal OpenGL call failed in RenderTarget.cpp (219) : GL_INVALID_OPERATION, the specified operation is not allowed in the current state".
I don't know if this is the good way of dealing with chunks. I've also thought about saving the chunks into images / files, but I fear that it take too much time to save / load them.
Do you know a better way to deal with this kind of "infinite" maps ?
It is an invalid operation because you must have a context bound to each thread. More importantly, all of the GL window system APIs enforce a strict 1:1 mapping between threads and contexts... no thread may have more than one context bound and no context may be bound to more than one thread. What you would need to do is use shared contexts (one context for drawing and one for each worker thread), things like buffer objects and textures will be shared between all shared contexts but the state machine and container objects like FBOs and VAOs will not.
Are you using tiled rendering for this map, or is this just one giant texture?
If you do not need to update individual sub-regions of your "chunk" images you can simply create new textures in your worker threads. The worker threads can create new textures and give them data while the drawing thread goes about its business. Only after a worker thread finishes would you actually try to draw using one of the chunks. This may increase the overall latency between the time a chunk starts loading and eventually appears in the finished scene but you should get a more consistent framerate.
If you need to use a single texture for this, I would suggest you double buffer your texture. Have one that you use in the drawing thread and another one that your worker threads issue glTexSubImage2D (...) on. When the worker thread(s) finish updating their regions of the texture you can swap the texture you use for drawing and updating. This will reduce the amount of synchronization required, but again increases the latency before an update eventually appears on screen.
things to try:
make your chunks smaller
generate the chunks in a separate thread, but pass to the gpu from the main thread
pass to the gpu a small piece at a time, taking a second or two
I'm interested in getting threading into the small engine I'm working on in my spare time, but I'm curious over what the best approuch is. I'm curious about the recommended way to sync the physics thread with the rest of the engine, similar to ThisGuy. I'm working with the Bullet Physics SDK, which already use the data copy method he was describing, but I was wondering, once bullet goes through one simulation then syncs the data back to the other threads, won't it result in something like vertical sync, where the rendering thread, half way through processing data suddenly starts using a newer and different set of information?
Is this something which the viewer will be able to notice? What if an explosion of some sort appears with the object that is meant to be destroyed?
If this is an issue, what is then is the best way to solve it?
Lock the physics thread so it can't do anything until the rendering thread (And basically every other thread) has gone through its frame? That seems like it would waste some CPU time. Or is the preferable method to triple buffer, copy the physics data to a second location, continue the physics simulation then copy that data to the rendering thread once its ready?
What approaches do you guys recommend?
The easiest and probably most used variant is to run physic, render, ai, ... threads in parallel and syncronise them after each of them has finished with a frame/timestep.
This is not the fastest solution, but the one with the fewest problems.
Writing back the data to the rendering thread while this is running, leads to massive syncronisation problems (e.g. you have to lock each vector/matrix while updating it).
To make the paralellisation efficent, you have to minimize the amount of data to syncronize, e.g. only write data to the render thread, that can possible be rendered.
When not synronizing after each frame, you can probably get the effect, that the physic/ai uses all the cpu power producing 60fps, while the renderer only renders 10fps, which in most cases is not, what you want.
A double buffering would also increase performance, but you still need to syncronize your threads. A problem is ai and physic or similar threads, because they possible want modify the same data
I want to make an opengl application that shows some 3d graphics and a command-line. I would like to make them separate threads, because they both are heavy processes. I thought that I could approach this with 2 different viewports, but I would like to know how to handle the threads in opengl.
According to what I've been reading, Opengl is asynchronous, and calling its functions from different threads can be very problematic. Is there a way that I could use to approach this problem? Ideally, I would like to draw the command line on top of the 3d graphics with some transparecy effect... (this is impossible with viewports I guess)
Is important that the solution is portable.
Thanks!
Its possible you can achieve what you want to do using Overlays.
Overlays are a somewhat dated feature but it should still be supported in most setups. Basically an overlay is a separate GL Context which is rendered in the same window as another layer, drawing on top of whatever was drawn on the windows with its original context.
You can read about it here.
I think, rather than trying to create two threads to draw to the screen, you need to use the MVC pattern and make your model thread-safe. Then you can have one thread that grabs the necessary info from the model each frame and throws it on screen, and then two other threads, one for the graphics and one for the command-line, which manage only the model.
So for instance, you have a Simulation class that has your 3D graphics stuff, and then a CommandLine class that has your command line. Each of these classes does not use OpenGL at all; only manages the data, such as where things are in 3d-space, and in the case of the command-line a queue of the lines on-screen. Then the opengl thread can query thread-safe functions of these classes each frame, to get the necessary info. So for example, it grabs the positions of the 3d things, draws them on-screen, then grabs the lines to display on the command line and draws them.
You can't do it with 2 viewports.
Each OpenGL context must be used from the same thread in which it was created.
Two separate window handles, each with their own context, can be setup and used from two separate threads safely, but not 2 viewports in one window.
A good place to look at ideas for this is OpenSceneGraph. It does a lot of work with threading to help try to speed up handling and maintain a constant, high framerate with very large scenes.
I've successfully embedded it and used it from 2 separate threads, but OSG has many checks in place to help with this sort of scenario.