How to handle a dynamic number of draw calls in Vulkan - C++

I'm currently trying to get into Vulkan. I've mostly followed this well-known Vulkan tutorial while trying to integrate it into a framework I built around OpenGL. I'm at the point where I can successfully render an object on the screen and move it around by passing a transformation matrix to a uniform buffer linked to my shader code.
In this tutorial the author focuses on drawing one object to the screen, which is a good starting point, but I would like my final code to look like this:
drawRect(position1, size1, color1);
drawRect(position2, size2, color2);
...
My first attempt submitted the command buffer (which is created and recorded only once, at the beginning) once for each object I wanted to render, updating the uniform data between submissions. This didn't work, however, and after some debugging with RenderDoc I realized it was because beginning a render pass clears the screen.
If I understand my situation correctly, the only way to achieve what I want would involve re-creating the command buffers every frame:
Record n, the number of times we want to draw something on the screen;
At the end of a frame, allocate n uniform buffers, and fill them with the corresponding data;
Create n descriptor sets to be able to link these uniform buffers with my shader;
Record the command buffer by repeating n times the process of binding a descriptor set using vkCmdBindDescriptorSets and drawing the requested data using vkCmdDrawIndexed.
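A minimal sketch of the per-frame bookkeeping behind the steps above, assuming a hypothetical `RectDrawList` that the `drawRect` calls feed into: it is cleared at the start of each frame, collects n draws, and is walked once when the command buffer is re-recorded. `RectDraw`, `RectDrawList` and `recordAll` are illustrative names, not part of Vulkan or the tutorial; the callback stands in for the descriptor binding and `vkCmdDrawIndexed` call of each draw.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-object data; the layout is an assumption.
struct RectDraw {
    std::array<float, 2> position;
    std::array<float, 2> size;
    std::array<float, 4> color;
};

// Collects drawRect() calls during a frame so the command buffer can be
// re-recorded once, at the end of the frame, with exactly n draws.
class RectDrawList {
public:
    // Start of frame: forget last frame's draws (vector capacity is kept).
    void beginFrame() { draws_.clear(); }

    void drawRect(std::array<float, 2> position, std::array<float, 2> size,
                  std::array<float, 4> color) {
        assert(size[0] >= 0.0f && size[1] >= 0.0f);
        draws_.push_back(RectDraw{position, size, color});
    }

    // n, the number of draws to record this frame.
    std::size_t count() const { return draws_.size(); }

    // Visits draws in submission order; `record` stands in for whatever binds
    // the per-object data and issues the draw command for draw i.
    template <typename RecordFn>
    void recordAll(RecordFn&& record) const {
        for (std::size_t i = 0; i < draws_.size(); ++i) {
            record(i, draws_[i]);
        }
    }

private:
    std::vector<RectDraw> draws_;
};
```

Because `clear()` keeps the vector's capacity, a steady number of draws per frame stops allocating after the first few frames.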
This seems like a lot of work to do every frame. Is this how I should handle a dynamic number of draw calls? Or is there some concept I don't know about or got wrong?

Command buffers are generally re-recorded every frame, and Vulkan lets you record them from multiple threads by giving each thread its own command pool (a command pool must not be used from two threads at the same time).
Indirect draws exist: you store the parameters of draw commands (index count, instance count, etc.) in a separate buffer, and the GPU reads them from that buffer when the commands execute. vkCmdDraw*Indirect requires you to specify the number of draw commands at recording time; vkCmdDraw*IndirectCount lets you store the number of draw commands in a buffer as well.
Also, I don't see why you would have to re-create uniform buffers and descriptor sets each frame. In fact, as far as I know, Vulkan encourages you to pre-bake whatever you can, and descriptor sets are a tool for that.
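As a sketch of filling such a buffer on the CPU, the struct below copies the field order of Vulkan's `VkDrawIndexedIndirectCommand` (indexCount, instanceCount, firstIndex, vertexOffset, firstInstance) so it compiles without the SDK; with the Vulkan headers you would use the real struct. `MeshRange` and `buildIndirectCommands` are hypothetical helpers. The resulting array would be copied into a buffer created with `VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT` and consumed by `vkCmdDrawIndexedIndirect(cmd, buffer, 0, drawCount, sizeof(VkDrawIndexedIndirectCommand))`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the field order of VkDrawIndexedIndirectCommand (assumption: used
// here only so the sketch builds without the Vulkan headers).
struct DrawIndexedIndirectCommand {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};
static_assert(sizeof(DrawIndexedIndirectCommand) == 20, "five tightly packed 4-byte fields");

// Hypothetical: where one object's mesh lives in shared vertex/index buffers.
struct MeshRange {
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
};

// Builds one indirect command per object; firstInstance = object index so the
// shader can fetch per-object data through gl_InstanceIndex.
std::vector<DrawIndexedIndirectCommand> buildIndirectCommands(
        const std::vector<MeshRange>& objects) {
    std::vector<DrawIndexedIndirectCommand> cmds;
    cmds.reserve(objects.size());
    for (uint32_t i = 0; i < objects.size(); ++i) {
        const MeshRange& m = objects[i];
        assert(m.indexCount > 0);
        cmds.push_back({m.indexCount, 1u, m.firstIndex, m.vertexOffset, i});
    }
    return cmds;
}
```

Using `firstInstance` as the object index is one common way to reach per-object data from the shader; a non-zero `firstInstance` in indirect draws assumes the `drawIndirectFirstInstance` feature is enabled.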
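One way to avoid a descriptor set per object, swapped in here rather than taken from the answer, is a single uniform buffer bound through a `VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` descriptor: the set is created once, and each draw passes a different offset in the `pDynamicOffsets` argument of `vkCmdBindDescriptorSets`. Those offsets must be multiples of `VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment`. The sketch shows only the CPU-side packing; `alignUp` and `packDynamicUniforms` are hypothetical helpers, and the bytes in `out` would be copied into the mapped uniform buffer for the current frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Rounds value up to a multiple of alignment (Vulkan reports
// minUniformBufferOffsetAlignment as a power of two).
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Packs per-object uniform blocks into one byte array, each at an aligned
// offset. The returned offsets are the per-draw dynamic offsets.
template <typename Block>
std::vector<uint32_t> packDynamicUniforms(const std::vector<Block>& blocks,
                                          std::size_t minAlignment,
                                          std::vector<unsigned char>& out) {
    static_assert(std::is_trivially_copyable_v<Block>, "blocks are copied as raw bytes");
    const std::size_t stride = alignUp(sizeof(Block), minAlignment);
    out.assign(stride * blocks.size(), 0);  // padding bytes stay zero
    std::vector<uint32_t> offsets;
    offsets.reserve(blocks.size());
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        std::memcpy(out.data() + i * stride, &blocks[i], sizeof(Block));
        offsets.push_back(static_cast<uint32_t>(i * stride));
    }
    return offsets;
}
```

The uniform buffer itself can then be allocated once, sized for an assumed maximum number of objects per frame, with one copy per frame in flight so the CPU does not overwrite data the GPU is still reading.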

Related

How to stop clearing between command buffers?

I'm trying to get ImGui working in my engine, but I'm having trouble "overlaying" it on my cube mesh. I split the two into separate command buffers like this:
std::array<VkCommandBuffer, 2> cmdbuffers = { commandBuffers[imageIndex], imguicmdbuffers[imageIndex] };
Then, in my queue submit info, I set the command buffer count to 2 and pass in the data like so:
submitInfo.commandBufferCount = 2;
submitInfo.pCommandBuffers = cmdbuffers.data();
But now it only renders ImGui, or, if I switch the order in the array, only the cube, never both. Is it because they share the same render pass? I changed the VkRenderPassBeginInfo clear color to double-check, and indeed it either clears yellow and draws ImGui or clears red and draws the cube. I've tried setting the clear alpha to 0, but that doesn't work and seems like a hack anyway. I feel like I lack understanding of how command buffers are submitted and executed and how that ties into render passes and framebuffers, so what's going on?
Given the following statements (that is, assuming they are accurate):
they share the same render pass
in my queue submit info I put the command buffer count to 2
VkRenderPassBeginInfo clear color
Certain things about the nature of your rendering become apparent (things you didn't directly state or provide code for). First, you are submitting two separate command buffers directly to the queue. Only primary command buffers can be submitted to the queue.
Second, by the nature of render passes, a render pass instance cannot span primary command buffers. So you must have two render pass instances.
Third, you specify that you can change the clear color of the image when you begin the render pass instance. Ergo, the render pass must specify that the image gets cleared as its load-op.
From all of this, I conclude that you are beginning the same VkRenderPass twice: a render pass that, as deduced above, is set to clear the image at the beginning of each render pass instance. That clear will dutifully happen both times, and the second one wipes out everything that was rendered to that image beforehand.
Basically, you have two rendering operations, using a render pass that's set to destroy the data created by any previous rendering operation to the images it uses. That's not going to work.
You have a few ways of resolving this.
My preferred way would be to start employing secondary command buffers. I don't know if ImGui can be given a CB to record its data into. But if it can, I would suggest making it record its data into a secondary CB. You can then execute that secondary CB into the appropriate subpass of your renderpass. And thus, you don't submit two primary CBs; you only submit one.
Alternatively, you can make a new VkRenderPass which doesn't clear the previous image; it should load the image data instead. Your second rendering operation would use that render pass, while your initial one would retain the clear load-op.
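To make the load-op behavior concrete, here is a toy CPU simulation, not Vulkan code: `Image`, `LoadOp` and `runRenderPass` are hypothetical stand-ins for an attachment, `VkAttachmentLoadOp` and beginning a render pass instance. Beginning a clearing pass twice wipes the cube, while a second pass whose load-op is "load" keeps it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for the two VkAttachmentLoadOp values that matter here.
enum class LoadOp { Clear, Load };

// Toy attachment: one packed color value per pixel.
struct Image {
    std::vector<uint32_t> pixels;
};

// Simulates beginning a render pass instance on `image`, then running `draw`.
template <typename DrawFn>
void runRenderPass(Image& image, LoadOp loadOp, uint32_t clearColor, DrawFn&& draw) {
    assert(!image.pixels.empty());
    if (loadOp == LoadOp::Clear) {
        // Everything rendered earlier is overwritten.
        for (auto& p : image.pixels) p = clearColor;
    }
    // LoadOp::Load keeps the previous contents untouched.
    draw(image);
}
```

In the real second `VkRenderPass`, the color `VkAttachmentDescription` would use `VK_ATTACHMENT_LOAD_OP_LOAD`, and its `initialLayout` should typically match the layout the first pass leaves the image in (for example `VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL`) rather than `VK_IMAGE_LAYOUT_UNDEFINED`, since `UNDEFINED` allows the previous contents to be discarded.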
Worst-case scenario, you can have the second operation render to an entirely different image, and then merge it with the primary image in a third rendering operation.

How to render multiple different items in an efficient way with OpenGL

I am making a simple STG engine with OpenGL (to be exact, with LWJGL3). In this game there can be several different types of items (called bullets) in one frame, and each type can have 10-20 instances. I hope to find an efficient way to render them.
I have read some books about modern OpenGL and found a method called "Instanced Rendering", but it seems to work only with identical instances. Should I use a for-loop to draw all items directly in my case?
Another question is about memory. Should I create a VBO each frame, since the number of items is always changing?
Not the easiest question to answer but I'll try my best anyways.
An important property of OpenGL is that a context can be current on only one thread at a time, so every OpenGL call has to be made from the thread where that context is current. A common way of dealing with this is queuing.
Example:
We are using Model-View-Controller architecture.
We have 3 threads; One to read input, one to handle received messages and one to render the scene.
Here OpenGL context is bound to rendering thread.
The first thread receives a message "Add model to position x". The first thread has no time to handle the message, because another message might arrive right after it and we don't want to delay that one. So we just hand this message to the second thread by adding it to the second thread's queue.
The second thread reads the message and does as much of the required work as it can before the OpenGL context is needed, such as reading the Wavefront (.obj) file and building arrays from its data.
The second thread then queues this data for the OpenGL thread, which generates the VBOs and VAO and stores the data in them.
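The queuing pattern described above can be sketched in C++ as a mutex-protected list of closures that any thread may push to and that only the rendering thread drains, once per frame. `RenderTaskQueue` is a hypothetical name; in LWJGL3 the same idea would map onto a concurrent queue from `java.util.concurrent`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Work that must run on the thread where the OpenGL context is current.
class RenderTaskQueue {
public:
    // Safe to call from any thread (e.g. the loader thread after parsing an .obj).
    void push(std::function<void()> task) {
        assert(task);
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
    }

    // Call from the rendering thread only, once per frame. The pending list is
    // swapped out under the lock, then the tasks run without holding it.
    std::size_t drain() {
        std::vector<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(tasks_);
        }
        for (auto& task : pending) task();  // e.g. glGenBuffers / glBufferData calls
        return pending.size();
    }

private:
    std::mutex mutex_;
    std::vector<std::function<void()>> tasks_;
};
```

Tasks pushed while `drain()` is running land in the fresh list and run on the next frame, so producers are only blocked for the duration of a swap.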
Back to your question
OpenGL objects stay in the context's memory until they are deleted manually or the context is destroyed. This works much like memory in C, where you have to allocate memory manually and free it once it is no longer used. So you should not create new objects every frame; reuse the data that stays unchanged. Also, when multiple objects use the same model or texture, load that model once and apply all object-specific differences in the shaders.
Example:
You have an environment with 10 rocks that all share the same rock model.
You load the data, store it in VBOs and attach those VBOs into a VAO. So now you have a VAO defining a rock.
You generate 10 rock entities that each have a position, rotation and scale. When rendering, you first bind the shader, then bind the model and texture, then loop through the rock entities; for each one you upload that entity's position, rotation and scale (usually combined into a transformation matrix) and render.
bind shader
load values to shader's uniform variables that don't change between entities.
bind model and texture (as those stay the same for each rock)
for(each rock in rocks){
load values to shader's uniform variables that do change between each rock, like the transformation.
render
}
unbind shader
Note: you don't need to unbind and rebind the shader each frame if you only use one shader. The same goes for VAOs and every other OpenGL object: bindings persist across rendering cycles.
I hope this helps you get started, although I would recommend a tutorial that gives a bit more context.
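The per-rock "transformation" in the pseudocode is usually a 4x4 model matrix built from position, rotation and scale. A minimal sketch, assuming rotation about the Z axis only and uniform scale to keep it short; `Mat4`, `makeTransform` and `transformPoint` are illustrative names. The matrix is stored column-major, which is what `glUniformMatrix4fv(location, 1, GL_FALSE, m.data())` expects inside the loop.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 4x4 matrix, column-major: element (row, col) is stored at [col * 4 + row].
using Mat4 = std::array<float, 16>;

// translation * rotationZ * scale: scale first, then rotate about Z,
// then move the entity to its position.
Mat4 makeTransform(float px, float py, float pz, float angleZ, float scale) {
    assert(scale > 0.0f);
    const float c = std::cos(angleZ);
    const float s = std::sin(angleZ);
    Mat4 m{};           // all zeros
    m[0] = c * scale;   // column 0
    m[1] = s * scale;
    m[4] = -s * scale;  // column 1
    m[5] = c * scale;
    m[10] = scale;      // column 2
    m[12] = px;         // column 3: translation
    m[13] = py;
    m[14] = pz;
    m[15] = 1.0f;
    return m;
}

// Applies m to the point (x, y, z, 1) and returns the transformed x, y, z.
std::array<float, 3> transformPoint(const Mat4& m, float x, float y, float z) {
    return {m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14]};
}
```

Each rock only stores its own position, angle and scale; the shared VAO and texture stay bound for the whole loop.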
I have read some books about modern OpenGL and found a method called "Instanced Rendering", but it seems to work only with identical instances. Should I use a for-loop to draw all items directly in my case?
Another question is about memory. Should I create a VBO each frame, since the number of items is always changing?
Both of these depend on how many bullets you plan to have. If you expect fewer than a thousand bullets, you can almost certainly write all of them into a VBO and upload it each frame without your end users noticing. If you plan on some obscene amount, then don't do this.
I would say that you should write everything each frame because it's the simplest to do right now, and if you start noticing performance issues then you need to look into instancing or some other method. When you get to "later" you should be more comfortable with OpenGL and find out ways to optimize it that won't be over your head (not saying it is over your head right now, but more experience can only help make it less complex later on).
Culling bullets that are not on the screen should also be on your radar.
If you plan on having a ridiculous number of bullets on screen, say so and we can talk about more advanced methods. However, my guess is that if you ever reach that limit on today's hardware, you either have a large, ambitious game with a zoomed-out camera and a significant number of entities on screen, or you are zoomed in and likely have a mess on your screen anyway.
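A sketch combining the suggestions in this answer with the instancing the question asks about: every frame, visible bullets are grouped by type into fresh per-instance arrays, so each type (one model and texture) can be drawn with a single instanced call even though the bullets differ in position and rotation. `Bullet`, `Viewport` and `buildInstanceArrays` are hypothetical; each array would be uploaded with `glBufferData` or `glBufferSubData` and drawn with `glDrawArraysInstanced` using `size() / 3` instances, with the per-instance attributes set up via `glVertexAttribDivisor(attrib, 1)`.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical bullet record.
struct Bullet {
    int type;     // selects the model/texture
    float x, y;   // world position
    float angle;  // rotation in radians
};

// Visible area in world units (assumes a 2D playfield).
struct Viewport {
    float minX, minY, maxX, maxY;
};

// Per-instance data: x, y, angle for every visible bullet of one type.
using InstanceData = std::vector<float>;

// Rebuilds the per-type instance arrays from scratch each frame, skipping
// bullets whose center lies outside the viewport expanded by `margin`.
std::unordered_map<int, InstanceData> buildInstanceArrays(
        const std::vector<Bullet>& bullets, const Viewport& view, float margin) {
    assert(margin >= 0.0f);
    std::unordered_map<int, InstanceData> byType;
    for (const Bullet& b : bullets) {
        const bool visible = b.x >= view.minX - margin && b.x <= view.maxX + margin &&
                             b.y >= view.minY - margin && b.y <= view.maxY + margin;
        if (!visible) continue;  // culled: never uploaded
        InstanceData& data = byType[b.type];
        data.push_back(b.x);
        data.push_back(b.y);
        data.push_back(b.angle);
    }
    return byType;
}
```

The culling test only checks bullet centers against a margin-expanded viewport, which assumes the margin is at least as large as the biggest bullet.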
20 objects is nothing. Your program will be plenty fast no matter how you draw them.
When you have 10000 objects, then you'll want to ask for an efficient way.
Until then, draw them whichever way is most convenient. This probably means a separate draw call per object.

CPU/GPU Shared Buffer in Direct3D12

I have no experience with Direct3D, so I may just be looking in the wrong places. However, I would like to convert a program I have written in OpenGL (using FreeGLUT) to a Windows IoT compatible UWP app (running Direct3D 12, 'cause it's cool). I'm trying to port my program to a Raspberry Pi 3 and I don't want to convert to Linux.
Through the examples provided by Microsoft I have figured out most of what I believe I need to know to get started, but I can't figure out how to share a dynamic data buffer between the CPU and GPU.
What I want to know is how to:
Create a CPU/GPU shared circular buffer
Read and Draw with the GPU
Write / Replace sections with the CPU
Quick semi-pseudo code:
while (!buffer.inUse()){ //while the GPU is not using the buffer
updateBuffer(buffer.id, data, start, end); //insert data into buffer
drawToScreen(buffer.id); //draw using vertex data in buffer
}
This was previously done in OpenGL by simply using glBegin()/glEnd() and glVertex3f() for each value in an array when it wasn't being written to.
Update: I basically want a Direct3D 12 equivalent of editing an OpenGL VBO with glBufferSubData(), if that makes more sense.
Update 2: I found that I can get away with discarding the vertex buffer every frame and uploading a new one to the GPU. There's a fair amount of overhead, as one would expect when transferring 10,000 - 200,000 doubles every frame. So I'm trying to find a way to use constant buffers to send the 5-10 updated vertices to the shader, so the shader can copy them from the constant buffer into the vertex buffer and I don't have to use Map/Unmap every frame. This way my circular buffer on the CPU is independent of the buffer used on the GPU, but both will share the same information through periodic updates. I'll do some more looking and post another, more specific question on shaders if I don't find a solution.
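A common D3D12 approach to a CPU/GPU shared circular buffer, offered as an assumption rather than something the question settled on, is one buffer in an upload heap that stays mapped for the application's lifetime and is sub-allocated as a ring: the CPU writes each frame's data at the ring's head and tags the region with that frame's fence value, and the space is reclaimed once the GPU reports that fence as completed (`ID3D12Fence::GetCompletedValue`). If the data should live in a default heap, the GPU can copy from the ring with `ID3D12GraphicsCommandList::CopyBufferRegion`, roughly the role `glBufferSubData` plays. The sketch tracks offsets only; `RingAllocator` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// CPU-side bookkeeping for a ring buffer whose storage the GPU also reads.
class RingAllocator {
public:
    explicit RingAllocator(uint64_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Reserves `size` contiguous bytes for the frame signaled with `fenceValue`.
    // Returns the byte offset, or nothing if the GPU still owns the space.
    std::optional<uint64_t> allocate(uint64_t size, uint64_t fenceValue) {
        assert(inFlight_.empty() || fenceValue >= inFlight_.back().fence);
        if (size == 0 || size > capacity_) return std::nullopt;
        if (used_ == 0) head_ = 0;  // nothing in flight: restart at the beginning
        uint64_t offset = head_;
        uint64_t padding = 0;
        if (offset + size > capacity_) {  // would run past the end: skip the tail, wrap to 0
            padding = capacity_ - offset;
            offset = 0;
        }
        if (used_ + padding + size > capacity_) return std::nullopt;
        used_ += padding + size;
        head_ = (offset + size) % capacity_;
        inFlight_.push_back(Block{fenceValue, padding + size});
        return offset;
    }

    // Frees every block whose fence the GPU has completed (oldest first).
    void retire(uint64_t completedFence) {
        while (!inFlight_.empty() && inFlight_.front().fence <= completedFence) {
            used_ -= inFlight_.front().bytes;
            inFlight_.pop_front();
        }
    }

    uint64_t used() const { return used_; }

private:
    struct Block { uint64_t fence; uint64_t bytes; };
    uint64_t capacity_;
    uint64_t head_ = 0;
    uint64_t used_ = 0;
    std::deque<Block> inFlight_;
};
```

When `allocate` returns nothing, the CPU has caught up with the GPU and has to wait on the fence (or retire completed frames) before writing more.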

DirectX Adding Multiple Meshes to a Single Vertex Buffer

I'm fairly new to DirectX. I have what I think should be a pretty simple question, but I can't seem to find an answer to it anywhere.
Basically, I'd like to know how to add vertices from multiple meshes to a single vertex buffer. This would only happen once per mesh as the program is initialized, so I believe I want DEFAULT usage.
Is it possible to add each mesh to the buffer individually, or do I need to collect them all in a single array and pass them all at once? Default or Dynamic? Map/Unmap or UpdateSubresource? Thanks
For now I am using an index buffer and drawing once per object (horrible I know) but I am planning on switching to instancing as soon as I figure this out.
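For the "collect them all first" option, a sketch of merging meshes once at load time, assuming Direct3D 11: each mesh's vertices and indices are appended to shared arrays, and the offsets are recorded so each mesh can still be drawn on its own. `Mesh`, `SubMesh` and `mergeMeshes` are hypothetical names. The merged arrays could initialize one vertex buffer and one index buffer through `D3D11_SUBRESOURCE_DATA` (with `D3D11_USAGE_DEFAULT`, or `D3D11_USAGE_IMMUTABLE` if they never change), and a mesh is then drawn with `DrawIndexed(indexCount, startIndex, baseVertex)`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float x, y, z; };

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<uint32_t> indices;  // local to this mesh, starting at 0
};

// Where one mesh ended up in the shared buffers; maps onto the IndexCount,
// StartIndexLocation and BaseVertexLocation arguments of DrawIndexed.
struct SubMesh {
    uint32_t indexCount;
    uint32_t startIndex;
    int32_t baseVertex;
};

// Appends every mesh to one vertex array and one index array.
std::vector<SubMesh> mergeMeshes(const std::vector<Mesh>& meshes,
                                 std::vector<Vertex>& allVertices,
                                 std::vector<uint32_t>& allIndices) {
    std::vector<SubMesh> parts;
    parts.reserve(meshes.size());
    for (const Mesh& m : meshes) {
        assert(!m.indices.empty());
        parts.push_back({static_cast<uint32_t>(m.indices.size()),
                         static_cast<uint32_t>(allIndices.size()),
                         static_cast<int32_t>(allVertices.size())});
        allVertices.insert(allVertices.end(), m.vertices.begin(), m.vertices.end());
        allIndices.insert(allIndices.end(), m.indices.begin(), m.indices.end());
    }
    return parts;
}
```

Because each mesh keeps its local indices and the base vertex is applied at draw time, meshes can be merged without rewriting their index data.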

When using Direct3D, what should be processed in code and what should be processed in HLSL?

I am very new to 3D programming, namely with DirectX. I have been trying to follow tutorials on how to do basic things, and I have been looking at the samples provided by Microsoft. One of the big questions I have had is how to tell which calculations should be done in the actual game code and which should be done in HLSL. I have not been able to understand what should be done where, because it looks to me like you could put almost all calculations in your shader file, or put them all in the executable code and send only the bare minimum to the pixel and vertex shaders. How can one tell what code should go where? If you need an example, I'll try to find one.
"Code" - CPU code
"HLSL" - GPU code
Basically, you want everything that is pure graphics to happen on the GPU. That is, when the information about what you want to render has been sent to the GPU, it should take over and use that information to generate the final image.
You want the CPU to tell the GPU "this is what I want to render, and here is everything you need to make it happen", and then to tell it "this is how you render it".
Some examples (not a complete or final list in any way):
CPU:
Anything dealing with window opening/closing/resizing
User input from mouse, keyboard
Reading and setting configuration
Generating and updating view matrices
Application logic
Setting up and initializing rendering (textures, buffers etc)
Generating vertex data (position, texture coordinates etc)
Creating graphic entities (triangles, textures, colors etc)
Handling animation (timestepping, swapping buffers)
Sending updated data to the GPU for each frame
GPU:
Use the view matrices to put things on the right place on the screen
Interpolate from vertex data to fragment data
Shading (usually, this is the most complicated part)
Calculate and write final pixel color
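A small illustration of that split, with both halves emulated in C++: the CPU combines the matrices once per frame into a constant-buffer struct, and `vertexShader` stands in for what the HLSL vertex shader does for every vertex. `Mat4`, `multiply`, `vertexShader` and `PerFrameConstants` are hypothetical names; the matrices here are row-major and multiply column vectors, and a real shader must use the same convention in its `mul()` calls. The `static_assert` reflects the Direct3D 11 rule that a constant buffer's size must be a multiple of 16 bytes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Mat4 = std::array<float, 16>;  // row-major: element (row, col) at [row * 4 + col]
using Vec4 = std::array<float, 4>;

// CPU, once per frame: combine matrices (e.g. projection * view).
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t col = 0; col < 4; ++col)
            for (std::size_t k = 0; k < 4; ++k)
                c[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return c;
}

// CPU: the data sent to the GPU each frame.
struct PerFrameConstants {
    Mat4 viewProj;
};
static_assert(sizeof(PerFrameConstants) % 16 == 0,
              "D3D11 constant buffer sizes must be a multiple of 16 bytes");

// GPU: the per-vertex work a vertex shader does, emulated on the CPU here.
Vec4 vertexShader(const PerFrameConstants& cb, const Vec4& position) {
    Vec4 out{};
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t k = 0; k < 4; ++k)
            out[row] += cb.viewProj[row * 4 + k] * position[k];
    return out;
}
```

The CPU does one matrix product per frame; the GPU repeats the cheap per-vertex product for every vertex, which is exactly the kind of massively parallel work it is built for.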