Dealing with the catch-22 of object lifetimes in Vulkan's device, surface, and swapchain in C++?

Background:
In order to display anything on screen at all, you need to enable a "KHR" (Khronos extension) extension for presentation surfaces.
A surface, as far as I understand, is an abstraction of the window/place where images are displayed, returned by your windowing library.
In Vulkan you have a VkSurface (returned by your windowing library, e.g. GLFW), which has certain properties.
These properties are needed to know whether a device is compatible with it. In other words, before a VkDevice is created (the logical view of the GPU that you actually submit commands to), it needs to know about the surface if you intend to present to it, specifically so the device can be created with presentation queues that support that surface and its properties.
Once the device is created, you can create the swapchain, which is basically a set of buffers/attachments you actually render to.
Swapchains, however, have a 1:1 relationship with surfaces: there can only ever be a single swapchain per surface at most.
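For reference, the raw Vulkan calls behind that "device must know about the surface" step look roughly like this (a sketch, not my wrapper API; physical_device, surface and queue_family_count are assumed to already exist):
#include <vulkan/vulkan.h>
// Sketch: find a queue family that can present to the surface before creating the device.
uint32_t present_family = UINT32_MAX;
for (uint32_t i = 0; i < queue_family_count; ++i) {
    VkBool32 supported = VK_FALSE;
    vkGetPhysicalDeviceSurfaceSupportKHR(physical_device, i, surface, &supported);
    if (supported) { present_family = i; break; }  // this family can present to the surface
}
// present_family then goes into a VkDeviceQueueCreateInfo when the VkDevice is created,
// and VK_KHR_swapchain must be in the device's enabled extension list.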
Problem:
This is where I start running into issues. In my code-base, I codify this relationship in a member variable. A surface has a swapchain, which guarantees that you as the programmer can't accidentally create multiple swapchains per surface if you use my wrapper.
But if we use this abstraction, the following happens:
my::Surface surface = window.create_surface(...); //VkSurface wrapper
auto queue_family = physical_device.find_queue_family_that_matches(surface,...);
auto queue_create_list = {{queue_family, priority},...};
my::Device device = physical_device.create_device(...,queue_create_list,...);
my::SwapchainBuilder swapchain_builder(device);
swapchain_builder.builder_pattern_x(...).builder_pattern_x(...)....;
surface.create_swapchain(swapchain_builder);
...
render loop{
}
...
//end of program
return 0;
//ERROR! device no longer exists to destroy swapchain!
}
Because the surface is created before the device, and the swapchain is a member of the surface, the device is destroyed before the swapchain when everything goes out of scope (destruction runs in reverse declaration order).
The "solution" I came up with in the meantime was this:
my::Device device; //device default constructible, but becomes a VK_NULL_HANDLE underneath
my::Surface surface = ...;
...
device = physical_device.create_device(...,queue_create_list,...);
...
surface.create_swapchain(swapchain_builder);
And this certainly works. The surface is destroyed before the device is, and thus so is the swapchain. But it leaves a bad taste in my mouth.
The whole reason I made the swapchain a member was to eliminate bugs caused by multiple swapchains being created, by eliminating the option for the bug to exist in the first place, and to remove the need for the user to think about the Vulkan spec by encoding that requirement into my wrapper itself.
But now the user has to remember to default-initialize the device first... or they will get an esoteric error (not as clear as the one I show here) unless they use validation layers.
Question:
Is there some way to encode this object relationship at compile time without runtime declaration-order issues? Is there maybe a better way to codify a 1:1 relationship in this scenario, such that the surface object could exist on its own and RAII destruction order would handle the rest?

Swapchains, however, have a 1:1 relationship with surfaces: there can only ever be a single swapchain per surface at most.
That is not true. From the specification:
A native window cannot be associated with more than one non-retired swapchain at a time.
You can create multiple swapchains for a surface. However, when you create a new one, you have to provide the old one, and the old one becomes "retired". Images you previously acquired from the retired swapchain can still be presented, but you cannot acquire any more images from it.
This moves nicely into the next point: the user needs to be able to recreate the swapchain for a surface.
Swapchains can become invalid, perhaps due to the user resizing the window or other things. When this happens, the user needs to recreate them. Whether you retire the old one or not, you're going to have to call the function that creates one.
So if you want your surface class to store a swapchain, your API needs a way for the user to create a swapchain.
In short, your goal is wrong; users need the function you're trying to get rid of.
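For illustration, a rough sketch of what that recreation looks like in raw Vulkan (names such as image_count, surface_format, extent, present_mode and old_swapchain are placeholders for values chosen elsewhere):
#include <vulkan/vulkan.h>
// Sketch of recreating a swapchain for the same surface.
VkSwapchainCreateInfoKHR info = {};
info.sType            = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
info.surface          = surface;
info.minImageCount    = image_count;
info.imageFormat      = surface_format.format;
info.imageColorSpace  = surface_format.colorSpace;
info.imageExtent      = extent;
info.imageArrayLayers = 1;
info.imageUsage       = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
info.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
info.preTransform     = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
info.compositeAlpha   = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
info.presentMode      = present_mode;
info.clipped          = VK_TRUE;
info.oldSwapchain     = old_swapchain;   // the existing swapchain; it becomes "retired" on success

VkSwapchainKHR new_swapchain = VK_NULL_HANDLE;
vkCreateSwapchainKHR(device, &info, nullptr, &new_swapchain);

// Images already acquired from old_swapchain may still be presented, but no new
// ones can be acquired from it. Destroy it once nothing uses it any more
// (a vkDeviceWaitIdle() is the blunt but simple way to guarantee that).
vkDeviceWaitIdle(device);
vkDestroySwapchainKHR(device, old_swapchain, nullptr);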

Related

Vulkan: How to record command buffers in separate thread?

I don't properly understand how to parallelize work on separate threads in Vulkan.
In order to begin issuing vkCmd*s, you need to begin a render pass. The call to begin render pass needs a reference to a framebuffer. However, vkAcquireNextImageKHR() is not guaranteed to return image indexes in a round robin way. So, in a triple-buffering setup, if the current image index is 0, I can't just bind framebuffer 1 and start issuing draw calls for the next frame, because the next call to vkAcquireNextImageKHR() might return image index 2.
What is a proper way to record commands without having to specify the framebuffer to use ahead of time?
You have one or more render passes that you want to execute per-frame. And each one has one or more subpasses, into which you want to pour work. So your main rendering thread will generate one or more secondary command buffers for those subpasses, and it will pass that sequence of secondary CBs off to the submission thread.
The submission thread will create the primary CB that actually gets submitted. It begins/ends render passes, and into each subpass it executes the secondary CB(s) created on the rendering thread for that particular subpass.
So each thread is creating its own command buffers. The submission thread is the one that deals with the VkFramebuffer object, since it begins the render passes. It also is the one that acquires the swapchain images and so forth. The render thread is the one making the secondary CBs that do all of the real work.
Yes, you'll still be doing some CB building on the submission thread, but it ought to be pretty minimalistic overall. This also serves to abstract away the details of the render targets from your rendering thread, so that code dealing with the swapchain can be localized to the submission thread. This gives you more flexibility.
For example, if you want to triple buffer, and the swapchain doesn't actually allow that, then your submission thread can create its own extra images, then copy from its internal images into the real swapchain. The rendering thread's code does not have to be disturbed at all to allow this.
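To illustrate that split, here is a rough sketch (render_pass, framebuffers, render_area, the command buffers and the acquired image index are assumed to exist; names are illustrative, and this is not a complete frame loop):
#include <vulkan/vulkan.h>
// Rendering thread: record the real work into a secondary command buffer.
VkCommandBufferInheritanceInfo inherit = {};
inherit.sType       = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO;
inherit.renderPass  = render_pass;
inherit.subpass     = 0;
inherit.framebuffer = VK_NULL_HANDLE;   // the secondary CB does not need to know the framebuffer

VkCommandBufferBeginInfo begin = {};
begin.sType            = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
begin.flags            = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT;
begin.pInheritanceInfo = &inherit;
vkBeginCommandBuffer(secondary_cb, &begin);
// ... vkCmdBindPipeline, vkCmdDraw, etc. ...
vkEndCommandBuffer(secondary_cb);

// Submission thread: begin the render pass against the acquired image's framebuffer
// and execute the secondary command buffer inside it.
VkRenderPassBeginInfo rp = {};
rp.sType       = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO;
rp.renderPass  = render_pass;
rp.framebuffer = framebuffers[acquired_image_index];
rp.renderArea  = render_area;            // clear values omitted for brevity
vkCmdBeginRenderPass(primary_cb, &rp, VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
vkCmdExecuteCommands(primary_cb, 1, &secondary_cb);
vkCmdEndRenderPass(primary_cb);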
You can use multiple threads to generate draw commands for the same renderpass using secondary command buffers. And you can generate work for different renderpasses in the same frame in parallel -- only the very last pass (usually a postprocess pass) depends on the specific swapchain image, all your shadow passes, gbuffer/shading/lighting passes, and all but the last postprocess pass don't. It's not required, but it's often a good idea to not even call vkAcquireNextImageKHR until you're ready to start generating the final renderpass, after you've already generated many of the prior passes.
First, to be clear:
In order to begin issuing vkCmd*s, you need to begin a render pass.
That is not necessarily true. In command buffers you can record many different commands, all of which begin with vkCmd. Only some of them need to be recorded inside a render pass, namely the ones connected with drawing. There are also commands that cannot be called inside a render pass (for example, dispatching compute shaders). But this is just a side note to sort things out.
Next, the triple buffering you mentioned. In Vulkan, the way images are displayed depends on the supported present mode. Different hardware vendors, or even different driver versions, may offer different present modes, so on one piece of hardware you may get the present mode most similar to triple buffering (MAILBOX), but on another you may not. The present mode affects how the presentation engine lets you acquire images from a swapchain and then displays them on screen. But as you noted, you cannot depend on the order of returned images, so you shouldn't design your application as if you always had the same behavior on all platforms.
But to answer your question: the easiest, naive way is to call vkAcquireNextImageKHR() at the beginning of a frame, record command buffers that use the image it returns, submit those command buffers, and present the image. You can create framebuffers on demand, just before you need one inside a command buffer: you create a framebuffer that uses the appropriate image (the one associated with the index returned by vkAcquireNextImageKHR()), and once the command buffers are submitted and stop using it, you destroy it. Such behavior is presented in the Vulkan Cookbook: here and here.
A more appropriate way is to prepare framebuffers for all available swapchain images up front and pick the appropriate one during the frame. But you need to remember to recreate them when you recreate the swapchain.
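For illustration, a minimal sketch of that approach (device, render_pass, extent and one VkImageView per swapchain image are assumed to already exist):
#include <vector>
#include <vulkan/vulkan.h>
// One framebuffer per swapchain image, created up front.
std::vector<VkFramebuffer> framebuffers(swapchain_image_views.size());
for (size_t i = 0; i < swapchain_image_views.size(); ++i) {
    VkFramebufferCreateInfo fb = {};
    fb.sType           = VK_STRUCTURE_TYPE_FRAMEBUFFER_CREATE_INFO;
    fb.renderPass      = render_pass;
    fb.attachmentCount = 1;
    fb.pAttachments    = &swapchain_image_views[i];
    fb.width           = extent.width;
    fb.height          = extent.height;
    fb.layers          = 1;
    vkCreateFramebuffer(device, &fb, nullptr, &framebuffers[i]);
}
// During a frame, use framebuffers[image_index] for the index returned by
// vkAcquireNextImageKHR(); destroy and rebuild these whenever the swapchain is recreated.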
More advanced scenarios postpone acquiring the swapchain image until it is really needed. The vkAcquireNextImageKHR() call may block your application (waiting until an image is available), so it should be called as late as possible when preparing a frame. That's why you should first record the command buffers that don't need to reference swapchain images (for example those that render geometry into a G-buffer in deferred shading algorithms). After that, when you want to display an image on screen (for example with some postprocessing technique), you take the approach described above: acquire an image, prepare the appropriate command buffer(s), and present the image.
You can also pre-record command buffers that reference particular swapchain images. If you know that the source of your images will always be the same (like the mentioned G-buffer), you can have a set of command buffers that always perform some postprocess/copy-like operation from that data to each swapchain image, one command buffer per swapchain image. Then, during the frame, once all of your data is ready, you acquire an image, check which pre-recorded command buffer is appropriate, and submit the one associated with the acquired image.
There are multiple ways to achieve what you want, and they all depend on many factors: performance, platform, the specific goal you want to achieve, the type of operations your application performs, the synchronization mechanisms you have implemented, and many other things. You need to figure out what suits you best. But in the end, you need to reference a swapchain image in your command buffers if you want to display an image on screen. I'd suggest starting with the easiest option first and then, once you are used to it, improving your implementation for higher performance, flexibility, easier code maintenance, etc.
You can call vkAcquireNextImageKHR in any thread, as long as you make sure access to the swapchain, semaphore, and fence you pass to it is synchronized.
There is nothing else restricting you from calling it in any thread, including the recording thread.
You are also allowed to have multiple images acquired at a time, assuming the swapchain was created with enough images. In other words, acquiring the next image before you present the current one is allowed.

How should I allocate/populate/update memory on GPU for different type of scene objects?

I'm trying to write my first DX12 app; I have no previous experience with DX11. I would like to display some rigid and some soft objects, without textures for now. So I need to place into GPU memory some vertex/index buffers which I will never change later, and some which I will change. And the scene per se isn't static, so new objects can appear and others can vanish.
How should I allocate/populate/update memory on the GPU for this? I would like a high-level overview that is easy to read and understand, not real code. I hope the question isn't too broad.
You said you are new to DirectX, so I strongly recommend you stay away from DX12 and stick with DX11. DX12 is only useful for people who are already experts (with a capital E) and for projects that have to push very far or hit edge cases needing a DX12 feature that is not possible in DX11.
But anyway, on DX12, as an example, to initialize a buffer you have to create instances of ID3D12Resource. You will need two: one in an upload heap and one in the default heap. You fill the first one on the CPU using Map. Then you need to use a command list to copy into the second one. Of course, you have to manage the resource state of your resource with barriers (copy destination, shader resource, ...). Then you need to execute the command list on the command queue. You also need a fence to wait for GPU completion before you can destroy the resource in the upload heap.
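For illustration, a condensed sketch of that DX12 upload path (device, cmd_list, cpu_data and byte_size are placeholders; error handling, command-list submission and the fence wait are only hinted at in the trailing comment):
#include <d3d12.h>
#include <cstring>
// Upload-heap staging buffer -> default-heap buffer copy.
D3D12_HEAP_PROPERTIES upload_heap = {};  upload_heap.Type  = D3D12_HEAP_TYPE_UPLOAD;
D3D12_HEAP_PROPERTIES default_heap = {}; default_heap.Type = D3D12_HEAP_TYPE_DEFAULT;

D3D12_RESOURCE_DESC desc = {};
desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
desc.Width            = byte_size;
desc.Height           = 1;
desc.DepthOrArraySize = 1;
desc.MipLevels        = 1;
desc.Format           = DXGI_FORMAT_UNKNOWN;
desc.SampleDesc.Count = 1;
desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

ID3D12Resource* staging = nullptr;   // CPU-visible upload buffer
ID3D12Resource* buffer  = nullptr;   // GPU-local buffer the shaders will use
device->CreateCommittedResource(&upload_heap, D3D12_HEAP_FLAG_NONE, &desc,
    D3D12_RESOURCE_STATE_GENERIC_READ, nullptr, IID_PPV_ARGS(&staging));
device->CreateCommittedResource(&default_heap, D3D12_HEAP_FLAG_NONE, &desc,
    D3D12_RESOURCE_STATE_COPY_DEST, nullptr, IID_PPV_ARGS(&buffer));

// Fill the staging buffer on the CPU.
void* mapped = nullptr;
D3D12_RANGE no_read = {0, 0};               // we won't read from the mapping
staging->Map(0, &no_read, &mapped);
std::memcpy(mapped, cpu_data, byte_size);
staging->Unmap(0, nullptr);

// Record the GPU copy and the state transition on a command list.
cmd_list->CopyBufferRegion(buffer, 0, staging, 0, byte_size);
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource   = buffer;
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST;
barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER;
cmd_list->ResourceBarrier(1, &barrier);
// Close and execute cmd_list on the queue, signal a fence, wait for it,
// and only then Release() the staging resource.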
On DX11, you call ID3D11Device::CreateBuffer, providing a description struct with an SRV binding flag and the pointer to the CPU data you want to put in it... done.
It is slightly more complex for textures, as you have to deal with memory layout. So, as I said above, you should focus on DX11; it is not a step down at all, both APIs have their roles.
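For comparison, a minimal sketch of the DX11 path (device and vertices are assumed to exist; pick the bind flags that match how you will use the buffer):
#include <d3d11.h>
// One call creates and fills the buffer.
D3D11_BUFFER_DESC desc = {};
desc.ByteWidth = sizeof(vertices);
desc.Usage     = D3D11_USAGE_IMMUTABLE;        // never updated after creation
desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;     // or D3D11_BIND_SHADER_RESOURCE for an SRV

D3D11_SUBRESOURCE_DATA init = {};
init.pSysMem = vertices;                       // the CPU data to copy in

ID3D11Buffer* buffer = nullptr;
HRESULT hr = device->CreateBuffer(&desc, &init, &buffer);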

Controlling Access To IDirect3DDevice9

So I am writing a resource manager for my in-house game engine and am stuck on something. Well, not really stuck, but I feel like there should be a better way to do this. The issue: I have a resource manager class that consists of an LRU, a hash table (for quick resource lookup), and another hash table that contains resource controllers (which dictate how various files are loaded and how different resources are destroyed).
For the rendering part I have encapsulated the actual IDirect3DDevice object in a renderer class that sits in a SceneManager class. The scene manager determines which objects are actually visible, etc., through use of an octree, and then renders them using the renderer object.
The problem is that, say, the resource controller for a .jpg file to be loaded as a texture needs access to the device. This means I have to give the controller a pointer to the renderer, which then needs a function that returns the texture created by the device. However, that gives too much functionality to the renderer; theoretically it shouldn't care about anything but drawing. This happens because, upon creation of a texture, you either have to call the device's member function or pass the device into one of the D3DX texture functions. The same problem exists for other resources such as meshes, because they need access to the renderer to create vertex and index buffers.
In addition, once the resource controller has access to the renderer, it could potentially call any of the draw functions if so inclined, which is totally unnecessary. Does anyone have a workaround for a problem like this, or is this just a result of Microsoft giving too much functionality to the device object in D3D9?

Beating the state machine

I'm working on a plugin for a scripting language that allows the user to access the OpenGL 1.1 command set. On top of that, all functions of the scripting language's own gfx command set are transparently redirected to appropriate OpenGL calls. Normally, the user should use either the OpenGL command set or the scripting language's inbuilt gfx command set which basically contains just your typical 2D drawing commands like DrawLine(), DrawRectangle(), DrawPolygon(), etc.
Under certain conditions, however, the user might want to mix calls to the OpenGL and the inbuilt gfx command sets. This leads to the problem that my OpenGL implementations of inbuilt commands like DrawLine(), DrawRectangle(), DrawPolygon(), etc. have to be able to deal with whatever state the OpenGL state machine might currently be in.
Therefore, my idea was to first save all state information on the stack, then prepare a clean OpenGL context needed for my implementations of commands like DrawLine(), etc. and then restore the original state. E.g. something like this:
glPushAttrib(GL_ALL_ATTRIB_BITS);
glPushClientAttrib(GL_CLIENT_ALL_ATTRIB_BITS);
glPushMatrix();
....prepare OpenGL context for my needs.... --> problem: see below #2
....do drawing....
glPopMatrix();
glPopClientAttrib();
glPopAttrib();
Doing it like this, however, leads to several problems:
1. glPushAttrib() doesn't push all attributes: e.g. the pixel pack and unpack state, render mode state, and select and feedback state are not pushed. Also, extension states are not saved. Extension states are not important, as my plugin is not designed to support extensions. Saving and restoring the other information (pixel pack and unpack) could probably be done manually using glGet().
2. Big problem: how should I prepare the OpenGL context after having saved all state information? I could save a copy of a "clean" state on the stack right after OpenGL's initialization and then try to pop this state, but for that I'd need a function to move data inside the stack, i.e. a function to copy or move a saved state from the back to the top of the stack so that I can pop it. But I didn't see a function that can accomplish this...
3. It's probably going to be very slow, but this is something I could live with, because the user is not supposed to mix OpenGL and inbuilt gfx calls. If he does so nevertheless, he will have to live with very poor performance.
After these introductory considerations, I'd finally like to present my question: is it possible to "beat" the OpenGL state machine somehow? By "beating" I mean the following: is it possible to completely save all current state information, then restore the default state, prepare it for my needs and do the drawing, and then finally restore the complete previous state again so that everything is exactly as it was before? For example, an OpenGL-based version of the scripting language's DrawLine() command would then do something like this:
1. Save all current state information
2. Restore default state, set up a 2D projection matrix
3. Draw the line
4. Restore all saved state information so that the state is exactly the same as before
Is that possible somehow? It doesn't matter if it's very slow, as long as it is 100% guaranteed to leave the state exactly as it was before.
You can simply use different contexts, especially if you do not care about performance. Just keep one context for your internal gfx operations and another one the user might mess with, and bind the appropriate one to your window (and thread).
The way you describe it, it looks like you never want to share objects with the user's GL stuff, so simple "unshared" contexts will do fine. All you seem to want to share is the framebuffer, and the GL framebuffer (including back and front color buffers, depth buffer, stencil, etc.) is part of the drawable/window, not the context, so you get access to it with any context once you make that context current. Changing contexts mid-frame is not a problem.
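For illustration, a rough WGL sketch of the two-context approach (hdc is the window's device context; pixel format setup and error handling are omitted):
#include <windows.h>
// Two unshared contexts on the same window/HDC: one for the user's GL calls,
// one for the plugin's internal gfx commands.
HGLRC user_context     = wglCreateContext(hdc);   // the user scripts against this one
HGLRC internal_context = wglCreateContext(hdc);   // the plugin's own; the user never touches it

// Inside an internal command such as DrawLine():
HGLRC previous = wglGetCurrentContext();          // remember whatever the user had bound
wglMakeCurrent(hdc, internal_context);            // the user's state is untouched by anything we do
// ... set up a 2D projection and issue the drawing calls ...
wglMakeCurrent(hdc, previous);                    // hand control back to the user's context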

Understanding Device Contexts

As a relative newcomer to MFC, I see Device Contexts (DCs) a lot. I vaguely understand that they have something to do with drawing, but the specifics are not well explained anywhere I can find. What does creating a "compatible Device Context" mean, and why is it important? What does SelectObject do, and how do I make a DC compatible first?
A Device Context is just a place that drawing occurs, so if you have two different DC's, you're drawing in two different places. Kind of like a file handle.
Device Contexts can refer to real estate on screen, or to bitmaps that just reside in memory, and probably other places too; those are just the two I can think of at the moment.
Compatible contexts are ones that have the same underlying pixel organization, by which is meant number of bits per pixel, bytes per pixel, color organization and so forth. Memory bitmap device contexts can have any organization you want, but your screen contexts are going to be related (eventually) to buffers on your graphics card, which will (depending on mode, etc) have a very specific pixel organization.
Having compatible contexts means it's efficient to transfer image data between them, because little or no translation of the data is required. At the other extreme, if you have a 256-color, palettized, 8-bit bitmap and you try to blit it to a screen that has 8 bits each of RGBA per pixel, every last pixel will require significant massaging as it is copied, so copying incompatible bitmaps is much slower. According to the Win32 SDK documentation, at least BitBlt() and StretchBlt() "convert the source color format to match the destination format", so it can be done.
Investigate CreateCompatibleDC() and CreateCompatibleBitmap() as starting points for how to create drawing contexts that are compatible with already existing ones.
SelectObject() controls which resources are currently active within the device context. A context has a current pen, brush, font, and bitmap. These make a lot of the other GDI calls simpler by allowing you to specify fewer parameters. For instance, you don't have to specify the font when you use TextOut(), but if you want to change the font, that's where SelectObject() comes in. If you feed SelectObject() a handle to a font, the return value is a handle to the font that was previously in effect, and subsequent operations use the new font. Behavior is the same for the other kinds of resources: pens, brushes, etc.
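For illustration, a small sketch of that select/draw/restore pattern (hdc is assumed to be a valid device context handle):
#include <windows.h>
// Select a font, draw with it, then put the old one back.
HFONT font = CreateFont(20, 0, 0, 0, FW_NORMAL, FALSE, FALSE, FALSE,
                        DEFAULT_CHARSET, OUT_DEFAULT_PRECIS, CLIP_DEFAULT_PRECIS,
                        DEFAULT_QUALITY, DEFAULT_PITCH | FF_SWISS, TEXT("Arial"));
HFONT old_font = (HFONT)SelectObject(hdc, font);  // returns the font that was in effect
TextOut(hdc, 10, 10, TEXT("Hello"), 5);           // drawn with the newly selected font
SelectObject(hdc, old_font);                      // restore the previous font
DeleteObject(font);                               // only delete objects not currently selected into a DC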
(Old question but this is shown when googling...)
I'm afraid that, for beginners, the selected answer can be a bit misleading.
Please keep in mind that MFC wraps the Win32 API, so we need to look at it from the Win32 level to better understand what's going on.
To understand why there is a Device Context, we should understand GDI (Graphics Device Interface).
Then why does GDI exist? For device independence. To achieve this, Microsoft made graphic objects (Brush, Pen, ...), each of which wraps and abstracts away the device-dependency issues.
Now we don't have to care about different devices, and that's the whole point of GDI.
So we need to hold graphic objects (Brush, Pen, Bitmap, ...) in some data structure, and that's the Device Context.
Then what is the SelectObject function?
Literally, it enables the DC to "select" a graphic object. That is, we use SelectObject to change the graphic object handle stored in the DC to another graphic object handle that we want to use.
Then what is a compatible device context?
A compatible device context (a memory device context) uses memory rather than a device.
From MSDN (emphasis mine):
To enable applications to place output in memory rather than sending it to an actual device, use a special device context for bitmap operations called a memory device context. A memory DC enables the system to treat a portion of memory as a virtual device. It is an array of bits in memory that an application can use temporarily to store the color data for bitmaps created on a normal drawing surface. Because the bitmap is compatible with the device, a memory DC is also sometimes referred to as a compatible device context.
The memory DC stores bitmap images for a particular device. An application can create a memory DC by calling the CreateCompatibleDC function.
A compatible DC can be used, for example, to reduce flickering: we can compose the bitmap in memory and show it once, instead of drawing to the screen every time the image changes.
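For illustration, a minimal Win32 sketch of that technique, composing into a compatible (memory) DC and blitting once (hwnd, width and height are assumed to exist):
#include <windows.h>
// Flicker-free drawing: draw into a memory DC, then copy the result in one step.
HDC screen_dc = GetDC(hwnd);
HDC mem_dc = CreateCompatibleDC(screen_dc);                     // memory DC, compatible with the screen
HBITMAP bmp = CreateCompatibleBitmap(screen_dc, width, height); // backing bitmap with the screen's pixel format
HBITMAP old_bmp = (HBITMAP)SelectObject(mem_dc, bmp);           // remember what was selected before

// ... draw into mem_dc with the usual GDI calls (Rectangle, TextOut, ...) ...

BitBlt(screen_dc, 0, 0, width, height, mem_dc, 0, 0, SRCCOPY);  // show the finished image at once

SelectObject(mem_dc, old_bmp);  // restore the DC before deleting our bitmap
DeleteObject(bmp);
DeleteDC(mem_dc);
ReleaseDC(hwnd, screen_dc);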
The following MSDN docs would be helpful to newbies (including me):
Device Contexts, from the MFC viewpoint.
Device Contexts, from the Win32 viewpoint, and the section that follows it.