I'm trying to get ImGui working in my engine but having some trouble "overlaying" it over my cube mesh. I split the two into separate command buffers like
std::array<VkCommandBuffer, 2> cmdbuffers = { commandBuffers[imageIndex], imguicmdbuffers[imageIndex] };
And then in my queue submit info I put the command buffer count to 2 and pass it the data like so
submitInfo.commandBufferCount = 2;
submitInfo.pCommandBuffers = cmdbuffers.data();
But what happens now is that it only renders ImGui, or if I switch the order in the array it only renders the cube, never both. Is it because they share the same render pass? I changed the VkRenderPassBeginInfo clear color to double-check, and indeed it either clears yellow and draws ImGui or clears red and draws the cube. I've tried setting the clear alpha to 0, but that doesn't work and seems like a hack anyway. I feel like I lack an understanding of how the command buffers are submitted and executed and how that's tied to render passes/framebuffers, so what's up?
Given the following statements (that is, assuming they are accurate):
they share the same render pass
in my queue submit info I put the command buffer count to 2
VkRenderPassBeginInfo clear color
Certain things about the nature of your rendering become apparent (things you didn't directly state or provide code for). First, you are submitting two separate command buffers directly to the queue. Only primary command buffers can be submitted to the queue.
Second, by the nature of render passes, a render pass instance cannot span primary command buffers. So you must have two render pass instances.
Third, you specify that you can change the clear color of the image when you begin the render pass instance. Ergo, the render pass must specify that the image gets cleared as its load-op.
From all of this, I conclude that you are beginning the same VkRenderPass twice. A render pass that, as previously deduced, is set to clear the image at the beginning of the render pass instance. Which will dutifully happen both times, the second of which will wipe out everything that was rendered to that image beforehand.
Basically, you have two rendering operations, using a render pass that's set to destroy the data created by any previous rendering operation to the images it uses. That's not going to work.
You have a few ways of resolving this.
My preferred way would be to start employing secondary command buffers. I don't know if ImGui can be given a CB to record its data into. But if it can, I would suggest making it record its data into a secondary CB. You can then execute that secondary CB into the appropriate subpass of your renderpass. And thus, you don't submit two primary CBs; you only submit one.
Alternatively, you can make a new VkRenderPass which doesn't clear the previous image; it should load the image data instead. Your second rendering operation would use that render pass, while your initial one would retain the clear load-op.
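For reference, the second pass's color attachment might be declared roughly like this (a sketch; `swapchainFormat` and the layout choices are assumptions, not taken from the question's code). The first render pass would then set its finalLayout to VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL instead of PRESENT_SRC_KHR so the two passes chain correctly:

```cpp
VkAttachmentDescription color = {};
color.format        = swapchainFormat;                // same format as the first pass
color.samples       = VK_SAMPLE_COUNT_1_BIT;
color.loadOp        = VK_ATTACHMENT_LOAD_OP_LOAD;     // keep the cube already rendered
color.storeOp       = VK_ATTACHMENT_STORE_OP_STORE;
// initialLayout must match whatever layout the first pass left the image in:
color.initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
color.finalLayout   = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
```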
Worst-case scenario, you can have the second operation render to an entirely different image, and then merge it with the primary image in a third rendering operation.
Related
I'm currently trying to get into Vulkan, and I've mostly followed this well-known Vulkan tutorial, all the while trying to integrate it into a framework I built around OpenGL. I'm at the point where I can successfully render an object on the screen and have the object move around by passing a transformation matrix to a uniform buffer linked to my shader code.
In this tutorial the author is focusing on drawing one object to the screen, which is a good starting point, but I would like to have end code that would look like this:
drawRect(position1, size1, color1);
drawRect(position2, size2, color2);
...
My first try at implementing something like this ended up with me submitting the command buffer, which is created and recorded only once at the beginning, once for each object I wanted to render, making sure to update the uniform data in between each command buffer submission. This didn't work, however, and after some debugging with RenderDoc, I realized it was because starting a render pass clears the screen.
If I understand my situation correctly, the only way to achieve what I want would involve re-creating the command buffers every frame:
Record n, the number of times we want to draw something on the screen;
At the end of a frame, allocate n uniform buffers, and fill them with the corresponding data;
Create n descriptor sets to be able to link these uniform buffers with my shader;
Record the command buffer by repeating n times the process of binding a descriptor set using vkCmdBindDescriptorSets and drawing the requested data using vkCmdDrawIndexed.
This seems like a lot of work to do every frame. Is this how I should handle a dynamic number of draw calls? Or is there some concept I don't know about/got wrong?
Generally, command buffers actually are re-recorded every frame, and Vulkan lets you multithread that recording by using a separate command pool per thread.
Indirect draws exist: you store data about draw commands (index count, instance count, etc.) in a separate buffer, and the driver reads that data from the buffer when the commands execute. vkCmdDraw*Indirect requires you to specify the number of draw commands at recording time; vkCmdDraw*IndirectCount lets you store the number of draw commands in a buffer as well.
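The per-draw record the GPU reads has a layout fixed by the spec; a sketch of recording one indirect draw (`cmd`, `indirectBuffer`, and `drawCount` are assumed names, not from the question):

```cpp
// Each draw record in 'indirectBuffer' has this spec-defined layout:
// struct VkDrawIndexedIndirectCommand {
//     uint32_t indexCount, instanceCount, firstIndex;
//     int32_t  vertexOffset;
//     uint32_t firstInstance;
// };

// Recorded once; 'drawCount' records are read from the buffer at execution time:
vkCmdDrawIndexedIndirect(cmd, indirectBuffer, /*offset*/ 0,
                         drawCount, sizeof(VkDrawIndexedIndirectCommand));
```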
Also, I don't see a reason why you would have to re-create uniform buffers and descriptor sets each frame. In fact, as far as I know, Vulkan encourages you to pre-bake the things you can, and descriptor sets are a tool for that.
Is it possible to make the rendering happen in the secondary command buffers? For example, there are 3 primary buffers and they call secondary buffers, which in turn render? I want to make a simple Manager that allows you to add and remove new objects on the screen.
Yes. And when using secondary command buffers and the VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS flag for your renderpass (see VkSubpassContents) you actually have to put all rendering commands into secondary command buffers that are called from within the primary command buffer using vkCmdExecuteCommands.
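In outline, it might look like this sketch (assuming `primaryCmd`, `renderPassInfo`, and a vector `secondaryCmds` of secondary command buffers recorded with VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT and matching inheritance info; the names are made up):

```cpp
vkCmdBeginRenderPass(primaryCmd, &renderPassInfo,
                     VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
// With this subpass contents, no draw commands may be recorded directly into
// the primary; all rendering work comes from the executed secondaries:
vkCmdExecuteCommands(primaryCmd, static_cast<uint32_t>(secondaryCmds.size()),
                     secondaryCmds.data());
vkCmdEndRenderPass(primaryCmd);
```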
I am making a simple STG engine with OpenGL (to be exact, with LWJGL3). In this game there can be several different types of items (called bullets) in one frame, and each type can have 10-20 instances. I hope to find an efficient way to render them.
I have read some books about modern OpenGL and found a method called "instanced rendering", but it seems to only work with identical instances. Should I use a for-loop to draw all items directly in my case?
Another question is about memory. Should I create a VBO for each frame, since the number of items is always changing?
Not the easiest question to answer but I'll try my best anyways.
An important property of OpenGL is that the OpenGL context is always bound to a single thread, so every OpenGL method has to be called from that thread. A common way of dealing with this is queuing.
Example:
We are using Model-View-Controller architecture.
We have 3 threads; One to read input, one to handle received messages and one to render the scene.
Here OpenGL context is bound to rendering thread.
The first thread receives a message "Add model to position x". The first thread has no time to handle the message, because another message might come right after and we don't want to delay it, so we just hand this message to the second thread by adding it to the second thread's queue.
The second thread reads the message and performs as much of the required work as it can before the OpenGL context is needed. For example, it reads the Wavefront (.obj) file and builds arrays from the parsed data.
The second thread then queues this data for our OpenGL thread to handle. The OpenGL thread generates the VBOs and VAO and stores the data there.
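The queuing scheme above can be sketched with standard C++ threads (all names here are made up for the example; in a real engine the "GL" thread would be the one owning the context and calling glGen*/glBufferData):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Blocking queue: producer threads push messages toward the thread that
// owns the GL context; only that thread would ever touch OpenGL.
template <typename T>
class WorkQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    T pop() {  // blocks until an item is available
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
};

// Worker thread "parses model files"; the "GL" thread turns results into buffers.
std::vector<std::string> runPipeline(const std::vector<std::string>& messages) {
    WorkQueue<std::string> workerQueue, glQueue;
    std::vector<std::string> uploaded;

    std::thread worker([&] {
        for (;;) {
            std::string msg = workerQueue.pop();
            if (msg == "quit") { glQueue.push("quit"); break; }
            glQueue.push("parsed:" + msg);  // e.g. .obj data turned into arrays
        }
    });
    std::thread glThread([&] {  // stands in for the thread owning the GL context
        for (;;) {
            std::string msg = glQueue.pop();
            if (msg == "quit") break;
            uploaded.push_back(msg + ":vbo");  // here we'd create the VAO/VBOs
        }
    });

    for (const auto& m : messages) workerQueue.push(m);
    workerQueue.push("quit");
    worker.join();
    glThread.join();
    return uploaded;
}
```

The single-consumer queues keep message order, which matters if later messages depend on earlier ones (e.g. "move model" after "add model").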
Back to your question
OpenGL-generated objects stay in the context's memory until they are manually deleted or the context is destroyed. It works kind of like C, where you have to manually allocate memory and free it once it's no longer used. So you should not create new objects each frame; reuse the data that stays unchanged. Also, when you have multiple objects that use the same model or texture, you should load that model only once and apply all object-specific differences in shaders.
Example:
You have an environment with 10 rocks that all share the same rock model.
You load the data, store it in VBOs and attach those VBOs into a VAO. So now you have a VAO defining a rock.
You generate 10 rock entities that each have a position, rotation and scale. When rendering, you first bind the shader, then bind the model and texture, then loop through the rock entities; for each one you upload that entity's position, rotation and scale (usually combined into a transformationMatrix) and render.
bind shader
load values to shader's uniform variables that don't change between entities.
bind model and texture (as those stay the same for each rock)
for(each rock in rocks){
load values to shader's uniform variables that do change between each rock, like the transformation.
render
}
unbind shader
Note: you don't need to unbind/bind the shader each frame if you only use one shader. The same goes for VAOs and every other OpenGL object, so bindings also persist across rendering cycles.
Hope this helps you get started. Although I would recommend a tutorial that has a bit more context to it.
I have read some books about modern OpenGL and found a method called "instanced rendering", but it seems to only work with identical instances. Should I use a for-loop to draw all items directly in my case?
Another question is about memory. Should I create a VBO for each frame, since the number of items is always changing?
These both depend on the number of bullets you plan on having. If you think you will have fewer than a thousand bullets, you can almost certainly upload all of them to a VBO each frame and your end users will not notice. If you plan on some obscene amount, then don't do this.
I would say you should upload everything each frame because it's the simplest thing to do right now, and if you start noticing performance issues then look into instancing or some other method. By the time you get to "later" you should be more comfortable with OpenGL and can find ways to optimize that won't be over your head (not saying it is over your head right now, but more experience only helps make it less complex later on).
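The "just re-upload each frame" approach often uses buffer orphaning; a sketch (assumes a `Bullet` vertex struct, a `bullets` array, and a pre-created `bulletVbo`, none of which are from the question):

```cpp
glBindBuffer(GL_ARRAY_BUFFER, bulletVbo);
// Re-specify the store with no data first ("orphaning") so the driver can hand
// back fresh memory instead of stalling on a copy the GPU is still reading:
glBufferData(GL_ARRAY_BUFFER, kMaxBullets * sizeof(Bullet), nullptr, GL_STREAM_DRAW);
glBufferSubData(GL_ARRAY_BUFFER, 0, bullets.size() * sizeof(Bullet), bullets.data());
```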
Culling bullets not on the screen either should be on your radar.
If you plan on having a ridiculous amount of bullets on screen, then you should say so and we can talk about more advanced methods, however my guess is that if you ever reach that limit on today's hardware then you have a large ambitious game with a zoomed out camera and a significant amount of entities on screen, or you are zoomed up and likely have a mess on your screen anyways.
20 objects is nothing. Your program will be plenty fast no matter how you draw them.
When you have 10000 objects, then you'll want to ask for an efficient way.
Until then, draw them whichever way is most convenient. This probably means a separate draw call per object.
I don't properly understand how to parallelize work on separate threads in Vulkan.
In order to begin issuing vkCmd*s, you need to begin a render pass. The call to begin render pass needs a reference to a framebuffer. However, vkAcquireNextImageKHR() is not guaranteed to return image indexes in a round robin way. So, in a triple-buffering setup, if the current image index is 0, I can't just bind framebuffer 1 and start issuing draw calls for the next frame, because the next call to vkAcquireNextImageKHR() might return image index 2.
What is a proper way to record commands without having to specify the framebuffer to use ahead of time?
You have one or more render passes that you want to execute per-frame. And each one has one or more subpasses, into which you want to pour work. So your main rendering thread will generate one or more secondary command buffers for those subpasses, and it will pass that sequence of secondary CBs off to the submission thread.
The submissions thread will create the primary CB that gets rendered. It begins/ends render passes, and into each subpass, it executes the secondary CB(s) created on the rendering thread for that particular subpass.
So each thread is creating its own command buffers. The submission thread is the one that deals with the VkFramebuffer object, since it begins the render passes. It also is the one that acquires the swapchain images and so forth. The render thread is the one making the secondary CBs that do all of the real work.
Yes, you'll still be doing some CB building on the submission thread, but it ought to be pretty minimalistic overall. This also serves to abstract away the details of the render targets from your rendering thread, so that code dealing with the swapchain can be localized to the submission thread. This gives you more flexibility.
For example, if you want to triple buffer, and the swapchain doesn't actually allow that, then your submission thread can create its own extra images, then copy from its internal images into the real swapchain. The rendering thread's code does not have to be disturbed at all to allow this.
You can use multiple threads to generate draw commands for the same renderpass using secondary command buffers. And you can generate work for different renderpasses in the same frame in parallel -- only the very last pass (usually a postprocess pass) depends on the specific swapchain image, all your shadow passes, gbuffer/shading/lighting passes, and all but the last postprocess pass don't. It's not required, but it's often a good idea to not even call vkAcquireNextImageKHR until you're ready to start generating the final renderpass, after you've already generated many of the prior passes.
First, to be clear:
In order to begin issuing vkCmd*s, you need to begin a render pass.
That is not necessarily true. In command buffers You can record many different commands, all of which begin with vkCmd. Only some of these commands need to be recorded inside a render pass - the ones connected with drawing. There are also commands which cannot be called inside a render pass (for example, dispatching compute shaders). But this is just a side note to sort things out.
Next, the mentioned triple buffering. In Vulkan the way images are displayed depends on the supported present mode. Different hardware vendors, or even different driver versions, may offer different present modes, so on some hardware You may get a present mode most similar to triple buffering (MAILBOX), but on other hardware You may not. And the present mode impacts the way the presentation engine allows You to acquire images from a swapchain and then displays them on screen. But, as You noted, You cannot depend on the order of returned images, so You shouldn't design Your application as if You always got the same behavior on all platforms.
But to answer Your question - the easiest, naive, way is to call vkAcquireNextImageKHR() at the beginning of a frame, record command buffers that use an image returned by it, submit those command buffers and present the image. You can create framebuffers on demand, just before You need to use it inside a command buffer: You create a framebuffer that uses appropriate image (the one associated with index returned by the vkAcquireNextImageKHR() function) and after command buffers are submitted and when they stop using it, You destroy it. Such behavior is presented in the Vulkan Cookbook: here and here.
More appropriate way would be to prepare framebuffers for all available swapchain images and take appropriate framebuffer during a frame. But You need to remember to recreate them when You recreate swapchain.
More advanced scenarios would postpone swapchain acquiring until it is really needed. The vkAcquireNextImageKHR() function call may block Your application (waiting until an image is available), so it should be called as late as possible when You prepare a frame. That's why You should first record command buffers that don't need to reference swapchain images (for example those that render geometry into a G-buffer in deferred shading algorithms). After that, when You want to display an image on screen (for example with some postprocessing technique), You just take the approach described above: acquire an image, prepare the appropriate command buffer(s) and present the image.
You can also pre-record command buffers that reference particular swapchain images. If You know that the source of Your images will always be the same (like the mentioned G-buffer), You can have a set of command buffers that always perform some postprocess/copy-like operations from this data to all swapchain images - one command buffer per swapchain image. Then, during the frame, if all of Your data is set, You acquire an image, check which pre-recorded command buffer is appropriate and submit the one associated with acquired image.
There are multiple ways to achieve what You want, all of them depend on many factors - performance, platform, specific goal You want to achieve, type of operations You perform in Your application, synchronization mechanisms You implemented and many other things. You need to figure out what best suits You. But in the end - You need to reference a swapchain image in command buffers if You want to display image on screen. I'd suggest starting with the easiest option first and then, when You get used to it, You can improve Your implementation for higher performance, flexibility, easier code maintenance etc.
You can call vkAcquireNextImageKHR in any thread. As long as you make sure the access to the swapchain, semaphore and fence you pass to it is synchronized.
There is nothing else restricting you from calling it in any thread, including the recording thread.
You are also allowed to have multiple images acquired at a time. Assuming you have created enough. In other words acquiring the next image before you present the current one is allowed.
I am looking in this demo for rendering a scene in vulkan using depth peeling Order Independent Transparency
Blog: https://matthewwellings.com/blog/depth-peeling-order-independent-transparency-in-vulkan/
Code: https://github.com/openforeveryone/VulkanDepthPeel
I have modified the code so that I am able to save the final render to an output image (PNG) before presenting it to the surface.
Once the primary command buffer, consisting of secondary command buffers responsible for the drawing operations, is submitted to the queue for execution and rendering has finished, I use vkCmdCopyImageToBuffer to copy the data from the current swapchain image (the copy is done after an image barrier to make sure rendering is completed first) to a VkBuffer, then map the buffer memory to an unsigned char pointer and write this information to the PNG file. But the output I see in the PNG differs from the one rendered to the window: the boxes are almost entirely transparent, with some RGB information, as you can see in the image below.
My guess is that this might be due to this particular demo involving multiple subpasses and me not copying the data properly. The only thing bothering me is that since I am copying directly from the swapchain image just before finally presenting to the surface, I should have the final color data in the image, and hence the PNG and the render should match.
Rendered Frame:
Saved Frame:
Let me know if I have missed explaining any details, any help is appreciated. Thanks!
You have alpha value 41 in the saved image.
If I just rewrite it to 255 then the images are identical.
You are probably using VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR with the swapchain, which does that automatically. But a typical image viewer will treat the alpha as premultiplied, hence the perceived (brighter) difference in the image.
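If you want the saved PNG to match the screen, you can overwrite the alpha channel before encoding. A minimal sketch over a tightly packed RGBA8 buffer (the function name is made up; it assumes the mapped data has already been copied into a contiguous vector):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The presentation engine ignores alpha under VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,
// but a PNG viewer does not, so force every texel opaque before writing the file.
void forceOpaqueAlpha(std::vector<std::uint8_t>& rgba8) {
    for (std::size_t i = 3; i < rgba8.size(); i += 4)
        rgba8[i] = 255;  // overwrite the alpha byte of each RGBA texel
}
```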