DirectX Adding Multiple Meshes to a Single Vertex Buffer - c++

I'm fairly new to DirectX. I have what I think should be a pretty simple question, but I can't seem to find an answer to it anywhere.
Basically, I'd like to know how to add vertices from multiple meshes to a single vertex buffer. This would only happen once per mesh as the program is initialized, so I believe I want DEFAULT usage.
Is it possible to add each mesh to the buffer individually, or do I need to collect them all in a single array and pass them all at once? Default or Dynamic? Map/Unmap or UpdateSubresource? Thanks.
For now I am using an index buffer and drawing once per object (horrible, I know), but I am planning on switching to instancing as soon as I figure this out.
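One possible approach, sketched here under assumptions, is to collect all meshes into one CPU-side array first, remember where each mesh starts, and then create the GPU buffer once from the combined array. The sketch below covers only the CPU-side packing; `Vertex`, `Mesh`, `MeshRange` and `packMeshes` are hypothetical names, and no Direct3D calls are made. In Direct3D 11 the three recorded values would map onto the `IndexCount`, `StartIndexLocation` and `BaseVertexLocation` parameters of `ID3D11DeviceContext::DrawIndexed`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; the real one depends on your input layout.
struct Vertex { float x, y, z; };

// Hypothetical CPU-side mesh as loaded from disk.
struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

// Where one mesh lives inside the shared vertex/index arrays.
struct MeshRange {
    std::uint32_t indexCount;  // number of indices to draw
    std::uint32_t startIndex;  // first index in the shared index array
    std::int32_t  baseVertex;  // added to every index when drawing
};

struct PackedMeshes {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
    std::vector<MeshRange> ranges;  // one entry per input mesh, same order
};

// Appends every mesh to the shared arrays and records its range.
// Indices are copied unchanged; baseVertex compensates at draw time.
PackedMeshes packMeshes(const std::vector<Mesh>& meshes) {
    PackedMeshes out;
    for (const Mesh& m : meshes) {
        MeshRange r;
        r.indexCount = static_cast<std::uint32_t>(m.indices.size());
        r.startIndex = static_cast<std::uint32_t>(out.indices.size());
        r.baseVertex = static_cast<std::int32_t>(out.vertices.size());
        out.ranges.push_back(r);
        out.vertices.insert(out.vertices.end(), m.vertices.begin(), m.vertices.end());
        out.indices.insert(out.indices.end(), m.indices.begin(), m.indices.end());
    }
    return out;
}
```

Because the data never changes after initialization, the combined arrays could be passed as initial data when the buffers are created, which avoids both Map/Unmap and UpdateSubresource for this case.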

Related

How to render multiple different items in an efficient way with OpenGL

I am making a simple STG engine with OpenGL (to be exact, with LWJGL3). In this game, there can be several different types of items (called bullets) in one frame, and each type can have 10-20 instances. I hope to find an efficient way to render them.
I have read some books about modern OpenGL and found a method called "Instanced Rendering", but it seems to work only with identical instances. Should I use a for-loop to draw all items directly in my case?
Another question is about memory. Should I create a VBO for each frame, since the number of items is always changing?
Not the easiest question to answer but I'll try my best anyways.
An important property of OpenGL is that a context can be current on only one thread at a time, so in practice every OpenGL call has to be made from the thread that owns the context. A common way of dealing with this is queuing.
Example:
We are using Model-View-Controller architecture.
We have 3 threads; One to read input, one to handle received messages and one to render the scene.
Here OpenGL context is bound to rendering thread.
The first thread receives a message "Add model to position x". The first thread has no time to handle the message, because another message might arrive right after it and we don't want to delay that one. So it simply hands the message to the second thread by adding it to the second thread's queue.
The second thread reads the message and performs the required tasks as far as it can before an OpenGL context is required. For example, it reads the Wavefront (.obj) file and creates arrays from the data.
The second thread then queues this data for the OpenGL thread to handle. The OpenGL thread generates the VBOs and VAO and stores the data in them.
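The hand-off to the OpenGL thread described above could be sketched roughly like this. It is a minimal sketch, assuming tasks are plain callables; `RenderTaskQueue`, `push` and `drain` are hypothetical names, and the actual OpenGL calls would live inside the queued tasks.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical queue of work that must run on the thread owning the GL context.
class RenderTaskQueue {
public:
    // Safe to call from any thread (e.g. the message-handling thread).
    void push(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Call only from the render thread, e.g. once at the start of each frame.
    // Runs all queued tasks and returns how many were executed.
    std::size_t drain() {
        std::queue<std::function<void()>> pending;
        {
            // Hold the lock only while swapping, not while running tasks.
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(pending, tasks_);
        }
        std::size_t executed = 0;
        while (!pending.empty()) {
            pending.front()();  // a task would create VBOs/VAOs here
            pending.pop();
            ++executed;
        }
        return executed;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};
```

Swapping the queue out under the lock keeps the producer threads from being blocked while the render thread does the (potentially slow) uploads.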
Back to your question
OpenGL-generated objects stay in the context's memory until they are manually deleted or the context is destroyed. It works somewhat like C, where you have to allocate memory manually and free it once it is no longer used. So you should not create new objects for each frame, but reuse the data that stays unchanged. Also, when you have multiple objects that use the same model or texture, you should load that model only once and apply all object-specific differences in shaders.
Example:
You have an environment with 10 rocks that all share the same rock model.
You load the data, store it in VBOs and attach those VBOs into a VAO. So now you have a VAO defining a rock.
You generate 10 rock entities that all have position, rotation and scale. When rendering, you first bind the shader, then bind the model and texture, then loop through the stone entities and for each stone entity you bind that entity's position, rotation and scale (usually stored in a transformationMatrix) and render.
bind shader
load values to shader's uniform variables that don't change between entities.
bind model and texture (as those stay the same for each rock)
for(each rock in rocks){
    load values to shader's uniform variables that do change between each rock, like the transformation.
    render
}
unbind shader
Note: You don't need to unbind and rebind the shader each frame if you only use one shader. The same goes for VAOs and every other OpenGL object: a binding persists across rendering cycles until you change it.
Hope this helps you get started, although I would recommend following a tutorial that gives a bit more context.
I have read some books about modern OpenGL and find a method called "Instanced Rendering", but it seems only to work with same instances. Should I use for-loop to draw all items directly for my case?
Another question is about memory. Should I create an VBO for each frame, since the number of items is always changing?
Both of these depend on the number of bullets you plan on having. If you think you will have fewer than a thousand bullets, you can almost certainly write all of them into a VBO and upload it each frame without your end users noticing. If you plan on some obscene number, then don't do this.
I would say that you should write everything each frame because it's the simplest to do right now, and if you start noticing performance issues then you need to look into instancing or some other method. When you get to "later" you should be more comfortable with OpenGL and find out ways to optimize it that won't be over your head (not saying it is over your head right now, but more experience can only help make it less complex later on).
Culling bullets that are off screen should also be on your radar.
If you plan on having a ridiculous number of bullets on screen, then you should say so and we can talk about more advanced methods. However, my guess is that if you ever reach that limit on today's hardware, then either you have a large, ambitious game with a zoomed-out camera and a significant number of entities on screen, or you are zoomed in and likely have a mess on your screen anyway.
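The "write everything each frame" approach, combined with simple off-screen culling, could look roughly like the following CPU-side sketch. `Bullet`, `BulletVertex` and `buildFrameData` are hypothetical names, and the screen is assumed to be an axis-aligned rectangle from (0, 0) to (width, height). The resulting array would then be uploaded into a VBO that was created once (for example with `glBufferData` or `glBufferSubData`), rather than into a new VBO per frame.

```cpp
#include <vector>

// Hypothetical game-side bullet.
struct Bullet { float x, y; int type; };

// Hypothetical per-bullet data sent to the GPU each frame.
struct BulletVertex { float x, y; float type; };

// Rebuilds the per-frame array, skipping bullets outside the visible
// rectangle [0, width] x [0, height] (an assumption of this sketch).
void buildFrameData(const std::vector<Bullet>& bullets,
                    float width, float height,
                    std::vector<BulletVertex>& out) {
    out.clear();  // keeps capacity, so no reallocation in the steady state
    for (const Bullet& b : bullets) {
        if (b.x < 0.0f || b.x > width || b.y < 0.0f || b.y > height) {
            continue;  // off screen: do not upload or draw
        }
        out.push_back({b.x, b.y, static_cast<float>(b.type)});
    }
}
```

Reusing the same output vector every frame means the varying bullet count only changes how much of the buffer is filled, not how many buffers exist.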
20 objects is nothing. Your program will be plenty fast no matter how you draw them.
When you have 10000 objects, then you'll want to ask for an efficient way.
Until then, draw them whichever way is most convenient. This probably means a separate draw call per object.

Can't understand the concept of merge-instancing

I was reading slides from a presentation about "merge-instancing" (the presentation is by Emil Persson; link: www.humus.name/Articles/Persson_GraphicsGemsForGames.pptx, from slide 19).
I can't understand what's going on. I know instancing only from OpenGL, and I thought it could only draw the same mesh multiple times. Can somebody explain? Does it work differently with DirectX?
Instancing: You upload a mesh to the GPU and activate its buffers whenever you want to render it. Data is not duplicated.
Merging: You want to create a mesh from multiple smaller meshes (such as the building complex in the example), so you either:
(a) Draw each complex using instancing, which means multiple draw calls per complex, or
(b) Merge the instances into a single mesh, which replicates the vertices and other data for each complex but lets you render the whole complex with a single draw call.
Instance-Merging: You create the complex by referencing the vertices of the instances that take part in it. You then use the vertex index to work out where to fetch the data for each instance. This way you get the advantage of instancing (each mesh is uploaded to the GPU once) and the benefit of merging (you draw the whole complex with a single draw call).
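To make the lookup idea more concrete, here is a CPU-side emulation of what a shader might do per vertex. This is only a sketch of my reading of the technique, not code from the presentation: it assumes every instance covers the same fixed number of indices (`indicesPerInstance`), and `InstanceInfo`, `Fetch` and `resolveFetches` are hypothetical names. A real instance record would also carry a transform.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-instance record: where this instance's indices start
// in the shared index buffer.
struct InstanceInfo { std::uint32_t indexOffset; };

// Result of the lookup for one vertex of the single big draw call.
struct Fetch {
    std::uint32_t instance;  // which instance record to use
    std::uint32_t vertex;    // which shared vertex to read
};

// Emulates the per-vertex lookup: the running vertex number selects the
// instance, and the remainder selects the index within that instance's mesh.
std::vector<Fetch> resolveFetches(const std::vector<std::uint32_t>& indexBuffer,
                                  const std::vector<InstanceInfo>& instances,
                                  std::uint32_t indicesPerInstance) {
    std::vector<Fetch> fetches;
    const std::uint32_t total =
        static_cast<std::uint32_t>(instances.size()) * indicesPerInstance;
    for (std::uint32_t v = 0; v < total; ++v) {
        const std::uint32_t instance = v / indicesPerInstance;
        const std::uint32_t sub = v % indicesPerInstance;
        const std::uint32_t index =
            indexBuffer[instances[instance].indexOffset + sub];
        fetches.push_back({instance, index});
    }
    return fetches;
}
```

Note that two instances may point at the same `indexOffset`, which is how the same mesh data gets reused without being duplicated, while different offsets let different meshes appear in the same draw.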

How should you efficiently batch complex meshes?

What is the best way to render complex meshes? I wrote different solutions below and wonder what is your opinion about them.
Let's take an example: how to render the 'Crytek-Sponza' mesh?
PS: I do not use Ubershader but only separate shaders
If you download the mesh on the following link:
http://graphics.cs.williams.edu/data/meshes.xml
and load it in Blender, you'll see that the whole mesh is composed of about 400 sub-meshes, each with its own materials/textures.
A dummy renderer (version 1) will render each of the 400 sub-meshes separately! That means (to simplify the situation) 400 draw calls, each with its own material/texture binding. Very bad for performance. Very slow!
pseudo-code version_1:
foreach mesh in meshList //400 iterations :(!
    mesh->BindVBO();
    Material material = mesh->GetMaterial();
    Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
    bsdf->Bind();
    bsdf->SetMaterial(material);
    bsdf->SetTexture(material->GetTexture()); //Bind texture
    mesh->Render();
Now, if we look at the materials being loaded, we notice that the Sponza is in reality composed of ONLY (if I remember correctly :)) 25 different materials!
So a smarter solution (version 2) would be to gather all the vertex/index data into batches (25 in our example) and store the VBO/IBO not in the sub-mesh classes but in a new class called Batch.
pseudo-code version_2:
foreach batch in batchList //25 iterations :)!
    batch->BindVBO();
    Material material = batch->GetMaterial();
    Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
    bsdf->Bind();
    bsdf->SetMaterial(material);
    bsdf->SetTexture(material->GetTexture()); //Bind texture
    batch->Render();
In this case each VBO contains data that share exactly the same texture/material settings!
That's so much better! But I think 25 VBOs to render the Sponza is still too many! The problem is the number of buffer bindings needed to render it! I think a good solution would be to allocate a new VBO only when the current one is 'full' (for example, let's assume that the maximum size of a VBO (a value defined as an attribute of the VBO class) is 4MB or 8MB).
pseudo-code version_3:
foreach vbo in vboList //for example 5 VBOs (depends on the maxVBOSize)
    vbo->Bind();
    BatchList batchList = vbo->GetBatchList();
    foreach batch in batchList
        Material material = batch->GetMaterial();
        Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
        bsdf->Bind();
        bsdf->SetMaterial(material);
        bsdf->SetTexture(material->GetTexture()); //Bind texture
        batch->Render();
In this case a VBO does not necessarily contain only data that shares exactly the same texture/material settings! It depends on the sub-mesh loading order!
So OK, there are fewer VBO/IBO bindings but not necessarily fewer draw calls! (Do you agree with this statement?) But in general I think version 3 is better than the previous one! What do you think?
Another optimization would be to store all the textures (or groups of textures) of the Sponza model in texture array(s)! But if you download the Sponza package you will see that the textures all have different sizes! So I think they can't be bound together because of their format differences.
But if it is possible, version 4 of the renderer would need far fewer than 25 texture bindings for the whole mesh! Do you think it's possible?
So, in your opinion, what is the best way to render the Sponza mesh? Do you have another suggestion?
You are focused on the wrong things. In two ways.
First, there's no reason you can't stick all of the mesh's vertex data into a single buffer object. Note that this has nothing to do with batching. Remember: batching is about the number of draw calls, not the number of buffers you use. You can render 400 draw calls out of the same buffer.
This "maximum size" that you seem to want to have is a fiction, based on nothing from the real world. You can have it if you want. Just don't expect it to make your code faster.
So when rendering this mesh, there is no reason to be switching buffers at all.
Second, batching is not really about the number of draw calls (in OpenGL). It's really about the cost of the state changes between draw calls.
This video clearly spells out (about 31 minutes in), the relative cost of different state changes. Issuing two draw calls with no state changes between them is cheap (relatively speaking). But different kinds of state changes have different costs.
The cost of changing buffer bindings is quite small (assuming you're using separate vertex formats, so that changing buffers doesn't mean changing vertex formats). The cost of changing programs and even texture bindings is far greater. So even if you had to make multiple buffer objects (which again, you don't have to), that's not going to be the primary bottleneck.
So if performance is your goal, you'd be better off focusing on the expensive state changes, not the cheap ones. Make a single shader that can handle all of the material settings for the entire mesh, so that you only need to change uniforms between draw calls. Use array textures so that you only have one texture binding call; this turns a texture bind into a uniform setting, which is a much cheaper state change.
There are even fancier things you can do, involving base instance counts and the like. But that's overkill for a trivial example like this.
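As a rough illustration of focusing on the expensive state changes, the draw list can be sorted so that draws sharing a program (and then a texture) end up next to each other, and the number of resulting binds can be counted. This is a sketch under assumptions: `DrawItem`, `sortByState` and `countChanges` are hypothetical names, the integer ids merely stand in for real OpenGL object names, and the sort order (program first, then texture) is a choice made for this example.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical draw record; all draws read from the same buffer,
// so only the index range differs.
struct DrawItem { int program; int texture; int firstIndex; int indexCount; };

// Groups draws by program first, then by texture.
void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return std::tie(a.program, a.texture) <
                                std::tie(b.program, b.texture);
                     });
}

struct StateChanges { int programs = 0; int textures = 0; };

// Counts how many binds a submission loop would issue if it rebinds
// only when the required state differs from the currently bound one.
StateChanges countChanges(const std::vector<DrawItem>& items) {
    StateChanges c;
    int program = -1, texture = -1;  // -1 means nothing bound yet
    for (const DrawItem& d : items) {
        if (d.program != program) { program = d.program; ++c.programs; }
        if (d.texture != texture) { texture = d.texture; ++c.textures; }
    }
    return c;
}
```

The number of draw calls is the same before and after sorting; only the number of state changes between them goes down, which is the point the answer makes.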

opengl - rendering design (beginner)

I have a simulation program where I want to render about 500 - 1000 objects (rather small, max 50 triangles) in an animation (let's say 500 timesteps) or interactively (altering one object means recalculating all other objects in the worst case).
What would be the best approach for such a rendering task?
I was thinking of VBOs and using glBufferSubData to update the objects for each timestep. Or is there some other method?
Also, as there are about 20 types of objects should I use 20 different VBOs so I can set up the attributes accordingly?
If you're doing keyframe animation (one set of vertices per frame), then either upload them all as separate VBOs and change which one you bind, or upload them all in a single VBO and change the attributes. I doubt that there would be much of a performance difference between these two solutions.
I would avoid glBufferSubData, since OpenGL should be able to manage all your memory for you. If this were a significantly larger set of data, I would suggest this method as you could stream the vertices you needed from disk to avoid having it all in memory at once, but with a small set of data this isn't an issue.
If you're doing bone-based animation, the glBufferSubData method is basically the only way to do it if you're skinning on the CPU. A vertex shader that does skinning (on the GPU) will perform much better than CPU skinning; just store your frames in a mat3x4 uniform.
For such a small number of objects, you probably should select the very easiest way to do it and optimize only if you really have to...
And by easy, I mean conceptually easiest for you.
You can use a single VBO with different offsets into it if you like; there's no need to use several.

What is the most efficient way to manage a large set of lines in OpenGL?

I am working on a simple CAD program which uses OpenGL to handle on-screen rendering. Every shape drawn on the screen is constructed entirely out of simple line segments, so even a simple drawing ends up processing thousands of individual lines.
What is the best way to communicate changes in this collection of lines between my application and OpenGL? Is there a way to update only a certain subset of the lines in the OpenGL buffers?
I'm looking for a conceptual answer here. No need to get into the actual source code, just some recommendations on data structure and communication.
You can use a simple approach such as a display list (glNewList/glEndList).
The other option, which is slightly more complicated, is to use Vertex Buffer Objects (VBOs, GL_ARB_vertex_buffer_object). They have the advantage that they can be changed dynamically, whereas a display list cannot.
Both basically batch all your data/transformations up and then execute them on the GPU (assuming you are using hardware acceleration), resulting in higher performance.
Vertex Buffer Objects are probably what you want. Once you load the original data set in, you can make modifications to existing chunks with glBufferSubData().
If you add extra line segments and overflow the size of your buffer, you'll of course have to make a new buffer, but this is no different than having to allocate a new, larger memory chunk in C when something grows.
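The bookkeeping for "update only the changed chunk, and grow when full" could be sketched on the CPU side like this. It is a minimal sketch: `LineStore`, `Upload` and `flush` are hypothetical names, the doubling growth policy is an assumption, and the actual upload (for example `glBufferSubData` for a partial update, or `glBufferData` for a reallocation) is left to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Line { float x0, y0, x1, y1; };

// Describes what the render thread should upload this frame.
struct Upload {
    bool reallocate;    // true: allocate a larger buffer and upload everything
    std::size_t first;  // first line to upload
    std::size_t count;  // number of lines to upload (0 = nothing to do)
};

constexpr std::size_t kNoDirty = static_cast<std::size_t>(-1);

// CPU-side mirror of the line buffer that remembers which lines changed.
class LineStore {
public:
    // Sets line i, growing the CPU array if needed, and marks it dirty.
    void set(std::size_t i, const Line& l) {
        if (i >= lines_.size()) lines_.resize(i + 1);
        lines_[i] = l;
        dirtyBegin_ = std::min(dirtyBegin_, i);
        dirtyEnd_ = std::max(dirtyEnd_, i + 1);
    }

    // Returns the pending upload and clears the dirty range.
    Upload flush() {
        Upload u{false, 0, 0};
        if (lines_.size() > gpuCapacity_) {
            // Buffer overflowed: grow (at least doubling) and re-upload all.
            gpuCapacity_ = std::max(lines_.size(), gpuCapacity_ * 2);
            u = {true, 0, lines_.size()};
        } else if (dirtyBegin_ < dirtyEnd_) {
            u = {false, dirtyBegin_, dirtyEnd_ - dirtyBegin_};
        }
        dirtyBegin_ = kNoDirty;
        dirtyEnd_ = 0;
        return u;
    }

    std::size_t gpuCapacity() const { return gpuCapacity_; }
    const std::vector<Line>& lines() const { return lines_; }

private:
    std::vector<Line> lines_;
    std::size_t gpuCapacity_ = 0;  // lines the GPU buffer can currently hold
    std::size_t dirtyBegin_ = kNoDirty;
    std::size_t dirtyEnd_ = 0;
};
```

Tracking a single contiguous dirty range is the simplest option; if edits are scattered far apart, it uploads more than strictly necessary, which may or may not matter for a few thousand lines.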
EDIT: A couple of notes on display lists, and why not to use them:
In OpenGL 3.0, display lists are deprecated, so using them isn't forward-compatible past 3.0 (2.1 implementations will be around for a while, of course, so depending on your target audience this might not be a problem).
Whenever you change anything, you have to rebuild the entire display list, which defeats the entire purpose of display lists if things are changed often.
Not sure if you're already doing this, but it's worth mentioning you should try to use GL_LINE_STRIP instead of individual GL_LINES if possible to reduce the amount of vertex data being sent to the card.
My suggestion is to try using a scene graph: some kind of hierarchical data structure for the lines/curves. If you have huge models, performance will suffer if you keep a plain list of lines. With a graph/tree structure you can easily check which items are visible by using bounding volumes. A scene graph also lets you apply transformations easily and reuse geometry.
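A visibility check with bounding volumes on such a tree might look like the sketch below. It assumes 2D axis-aligned boxes and that each node's box encloses its own lines and all of its children; `Box`, `Node`, `overlaps` and `collectVisible` are hypothetical names, and the recursive `std::vector<Node>` member relies on C++17 support for vectors of incomplete types.

```cpp
#include <vector>

// 2D axis-aligned bounding box.
struct Box { float minX, minY, maxX, maxY; };

inline bool overlaps(const Box& a, const Box& b) {
    return a.minX <= b.maxX && a.maxX >= b.minX &&
           a.minY <= b.maxY && a.maxY >= b.minY;
}

// Hypothetical scene-graph node. The bounds are assumed to enclose the
// node's own lines and everything below it.
struct Node {
    Box bounds;
    std::vector<int> lineIds;    // ids of lines stored directly in this node
    std::vector<Node> children;
};

// Appends the ids of all lines in nodes whose bounds touch the view.
// A node that misses the view is skipped together with its whole subtree.
void collectVisible(const Node& node, const Box& view, std::vector<int>& out) {
    if (!overlaps(node.bounds, view)) return;
    out.insert(out.end(), node.lineIds.begin(), node.lineIds.end());
    for (const Node& child : node.children) {
        collectVisible(child, view, out);
    }
}
```

The test here is per node, not per line, so lines inside a partially visible node are still submitted; the gain comes from rejecting large off-screen groups with one comparison.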