In my program I have a texture which is used several times in different situations. In each situation I need to apply a certain set of parameters.
I want to avoid having to create an additional buffer and essentially creating a copy of the texture for every time I need to use it for something else, so I'd like to know if there's a better way?
This is what sampler objects are for (available in core since version 3.3, or using ARB_sampler_objects). Sampler objects separate the texture image from its parameters, so you can use one texture with several parameter sets. That functionality was created with exactly your problem in mind.
Quote from ARB_sampler_objects extension spec:
In unextended OpenGL textures are considered to be sets of image data (mip-chains, arrays, cube-map face sets, etc.) and sampling state (sampling mode, mip-mapping state, coordinate wrapping and clamping rules, etc.) combined into a single object. It is typical for an application to use many textures with a limited set of sampling states that are the same between them. In order to use textures in this way, an application must generate and configure many texture names, adding overhead both to applications and to implementations. Furthermore, should an application wish to sample from a texture in more than one way (with and without mip-mapping, for example) it must either modify the state of the texture or create two textures, each with a copy of the same image data. This can introduce runtime and memory costs to the application.
Related
I'm working on my own D3D12 wrapper, and I'm in the beginning 'schema' phase.
I understand the purpose of the PSO in a very simple rendering pipeline, but say I've multiple objects, meshes, models, whatever terminology works best, and I would like to use a different pixel shader for each,
for clarification, I would make multiple PSOs for each of these objects correct?
Sorry for the simple question, it's just a clarification I really need, thank you.
You need a distinct Pipeline State Object for every unique combination of all states:
VS, PS, GS, etc. Shader Objects
Blend, Depth, and Raster state
Render Target format
Number of render targets (MRT vs. 1)
Sample count (MSAA vs. not)
In practice that means at least one PSO per unique material in your entire scene.
I am in the middle of rendering different textures on multiple meshes of a model, but I do not have much clues about the procedures. Someone suggested for each mesh, create its own descriptor sets and call vkCmdBindDescriptorSets() and vkCmdDrawIndexed() for rendering like this:
// Pipeline with descriptor set layout that matches the shared descriptor sets
vkCmdBindPipeline(...pipelines.mesh...);
...
// Mesh A
vkCmdBindDescriptorSets(...&meshA.descriptorSet... );
vkCmdDrawIndexed(...);
// Mesh B
vkCmdBindDescriptorSets(...&meshB.descriptorSet... );
vkCmdDrawIndexed(...);
However, the above approach is quite different from the chopper sample and vulkan's samples that makes me have no idea where to start the change. I really appreciate any help to guide me to a correct direction.
Cheers
You have a conceptual object which is made of multiple meshes which have different texturing needs. The general ways to deal with this are:
Change descriptor sets between parts of the object. Painful, but it works on all Vulkan-capable hardware.
Employ array textures. Each individual mesh fetches its data from a particular layer in the array texture. Of course, this restricts you to having each sub-mesh use textures of the same size. But it works on all Vulkan-capable hardware (up to 128 array elements, minimum). The array layer for a particular mesh can be provided as a push-constant, or a base instance if that's available.
Note that if you manage to be able to do it by base instance, then you can render the entire object with a multi-draw indirect command. Though it's not clear that a short multi-draw indirect would be faster than just baking a short sequence of drawing commands into a command buffer.
Employ sampler arrays, as Sascha Willems suggests. Presumably, the array index for the sub-mesh is provided as a push-constant or a multi-draw's draw index. The problem is that, regardless of how that array index is provided, it will have to be a dynamically uniform expression. And Vulkan implementations are not required to allow you to index a sampler array with a dynamically uniform expression. The base requirement is just a constant expression.
This limits you to hardware that supports the shaderSampledImageArrayDynamicIndexing feature. So you have to ask for that, and if it's not available, then you've got to work around that with #1 or #2. Or just don't run on that hardware. But the last one means that you can't run on any mobile hardware, since most of them don't support this feature as of yet.
Note that I am not saying you shouldn't use this method. I just want you to be aware that there are costs. There's a lot of hardware out there that can't do this. So you need to plan for that.
The person that suggested the above code fragment was me I guess ;)
This is only one way of doing it. You don't necessarily have to create one descriptor set per mesh or per texture. If your mesh e.g. uses 4 different textures, you could bind all of them at once to different binding points and select them in the shader.
And if you a take a look at NVIDIA's chopper sample, they do it pretty much the same way only with some more abstraction.
The example also sets up descriptor sets for the textures used :
VkDescriptorSet *textureDescriptors = m_renderer->getTextureDescriptorSets();
binds them a few lines later :
VkDescriptorSet sets[3] = { sceneDescriptor, textureDescriptors[0], m_transform_descriptor_set };
vkCmdBindDescriptorSets(m_draw_command[inCommandIndex], VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3, sets, 0, NULL);
and then renders the mesh with the bound descriptor sets :
vkCmdDrawIndexedIndirect(m_draw_command[inCommandIndex], sceneIndirectBuffer, 0, inCount, sizeof(VkDrawIndexedIndirectCommand));
vkCmdDraw(m_draw_command[inCommandIndex], 1, 1, 0, 0);
If you take a look at initDescriptorSets you can see that they also create separate descriptor sets for the cubemap, the terrain, etc.
The LunarG examples should work similar, though if I'm not mistaken they never use more than one texture?
What is the best way to render complex meshes? I wrote different solutions below and wonder what is your opinion about them.
Let's take an example: how to render the 'Crytek-Sponza' mesh?
PS: I do not use Ubershader but only separate shaders
If you download the mesh on the following link:
http://graphics.cs.williams.edu/data/meshes.xml
and load it in Blender you'll see that the whole mesh is composed by about 400 sub-meshes with their own materials/textures respectively.
A dummy renderer (version 1) will render each of the 400 sub-mesh separately! It means (to simplify the situation) 400 draw calls with for each of them a binding to a material/texture. Very bad for performance. Very slow!
pseudo-code version_1:
foreach mesh in meshList //400 iterations :(!
mesh->BindVBO();
Material material = mesh->GetMaterial();
Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
bsdf->Bind();
bsdf->SetMaterial(material);
bsdf->SetTexture(material->GetTexture()); //Bind texture
mesh->Render();
Now, if we take care of the materials being loaded we can notice that the Sponza is composed in reality of ONLY (if I have a good memory :)) 25 different materials!
So a smarter solution (version 2) should be to gather all the vertex/index data in batches (25 in our example) and not store VBO/IBO into sub-meshes classes but into a new class called Batch.
pseudo-code version_2:
foreach batch in batchList //25 iterations :)!
batch->BindVBO();
Material material = batch->GetMaterial();
Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
bsdf->Bind();
bsdf->SetMaterial(material);
bsdf->SetTexture(material->GetTexture()); //Bind texture
batch->Render();
In this case each VBO contains data that share exactly the same texture/material settings!
It's so much better! Now I think 25 VBO for render the sponza is too much! The problem is the number of Buffer bindings to render the sponza! I think a good solution should be to allocate a new VBO if the first one if 'full' (for example let's assume that the maximum size of a VBO (value defined in the VBO class as attribute) is 4MB or 8MB).
pseudo-code version_3:
foreach vbo in vboList //for example 5 VBOs (depends on the maxVBOSize)
vbo->Bind();
BatchList batchList = vbo->GetBatchList();
foreach batch in batchList
Material material = batch->GetMaterial();
Shader bsdf = ShaderManager::GetBSDFByMaterial(material);
bsdf->Bind();
bsdf->SetMaterial(material);
bsdf->SetTexture(material->GetTexture()); //Bind texture
batch->Render();
In this case each VBO does not contain necessary data that share exactly the same texture/material settings! It depends of the sub-mesh loading order!
So OK, there are less VBO/IBO bindings but not necessary less draw calls! (are you OK by this affirmation ?). But in a general manner I think this version 3 is better than the previous one! What do you think about this ?
Another optimization should be to store all the textures (or group of textures) of the sponza model in array(s) of textures! But if you download the sponza package you will see that all texture has different sizes! So I think they can't be bound together because of their format differences.
But if it's possible, the version 4 of the renderer should use only less texture bindings rather than 25 bindings for the whole mesh! Do you think it's possible ?
So, according to you, what is the best way to render the sponza mesh ? Have you another suggestion ?
You are focused on the wrong things. In two ways.
First, there's no reason you can't stick all of the mesh's vertex data into a single buffer object. Note that this has nothing to do with batching. Remember: batching is about the number of draw calls, not the number of buffers you use. You can render 400 draw calls out of the same buffer.
This "maximum size" that you seem to want to have is a fiction, based on nothing from the real world. You can have it if you want. Just don't expect it to make your code faster.
So when rendering this mesh, there is no reason to be switching buffers at all.
Second, batching is not really about the number of draw calls (in OpenGL). It's really about the cost of the state changes between draw calls.
This video clearly spells out (about 31 minutes in), the relative cost of different state changes. Issuing two draw calls with no state changes between them is cheap (relatively speaking). But different kinds of state changes have different costs.
The cost of changing buffer bindings is quite small (assuming you're using separate vertex formats, so that changing buffers doesn't mean changing vertex formats). The cost of changing programs and even texture bindings is far greater. So even if you had to make multiple buffer objects (which again, you don't have to), that's not going to be the primary bottleneck.
So if performance is your goal, you'd be better off focusing on the expensive state changes, not the cheap ones. Making a single shader that can handle all of the material settings for the entire mesh, so that you only need to change uniforms between them. Use array textures so that you only have one texture binding call. This will turn a texture bind into a uniform setting, which is a much cheaper state change.
There are even fancier things you can do, involving base instance counts and the like. But that's overkill for a trivial example like this.
In short: What is the "preferred" way to wrap OpenGL's buffers, shaders and/or matrices required for a more high level "model" object?
I am trying to write this tiny graphics engine in C++ built on core OpenGL 3.3 and I would like to implement an as clean as possible solution to wrapping a higher level "model" object, which would contain its vertex buffer, global position/rotation, textures (and also a shader maybe?) and potentially other information.
I have looked into this open source engine, called GamePlay3D and don't quite agree with many aspects of its solution to this problem. Is there any good resource that discusses this topic for modern OpenGL? Or is there some simple and clean way to do this?
That depends a lot on what you want to be able to do with your engine. Also note that these concepts are the same with DirectX (or any other graphic API), so don't focus too much your search on OpenGL. Here are a few points that are very common in a 3D engine (names can differ):
Mesh:
A mesh contains submeshes, each submesh contains a vertex buffer and an index buffer. The idea being that each submesh will use a different material (for example, in the mesh of a character, there could be a submesh for the body and one for the clothes.)
Instance:
An instance (or mesh instance) references a mesh, a list of materials (one for each submesh in the mesh), and contains the "per instance" shader uniforms (world matrix etc.), usually grouped in a uniform buffer.
Material: (This part changes a lot depending on the complexity of the engine). A basic version would contain some textures, some render states (blend state, depth state), a shader program, and some shader uniforms that are common to all instances (for example a color, but that could also be in the instance depending on what you want to do.)
More complex versions usually separates the materials in passes (or sometimes techniques that contain passes) that contain everything that's in the previous paragraph. You can check Ogre3D documentation for more info about that and to take a look at one possible implementation. There's also a very good article called Designing a Data-Driven Renderer in GPU PRO 3 that describes an even more flexible system based on the same idea (but also more complex).
Scene: (I call it a scene here, but it could really be called anything). It provides the shader parameters and textures from the environment (lighting values, environment maps, this kind of things).
And I thinks that's it for the basics. With that in mind, you should be able to find your way around the code of any open-source 3D engine if you want the implementation details.
This is in addition to Jerem's excellent answer.
At a low level, there is no such thing as a "model", there is only buffer data and the code used to process it. At a high level, the concept of a "model" will differ from application to application. A chess game would have a static mesh for each chess piece, with shared textures and materials, but a first-person shooter could have complicated models with multiple parts, swappable skins, hit boxes, rigging, animations, et cetera.
Case study: chess
For chess, there are six pieces and two colors. Let's over-engineer the graphics engine to show how it could be done if you needed to draw, say, thousands of simultaneous chess games in the same screen, instead of just one game. Here is how you might do it.
Store all models in one big buffer. This buffer has all of the vertex and index data for all six models clumped together. This means that you never have to switch buffers / VAOs when you're drawing pieces. Also, this buffer never changes, except when the user goes into settings and chooses a different style for the chess pieces.
Create another buffer containing the current location of each piece in the game, the color of each piece, and a reference to the model for that piece. This buffer is updated every frame.
Load the necessary textures. Maybe the normals would be in one texture, and the diffuse map would be an array texture with one layer for white and another for black. The textures are designed so you don't have to change them while you're drawing chess pieces.
To draw all the pieces, you just have to update one buffer, and then call glMultiDrawElementsIndirect()... once per frame, and it draws all of the chess pieces. If that's not available, you can fall back to glDrawElements() or something else.
Analysis
You can see how this kind of design won't work for everything.
What if you have to stream new models into memory, and remove old ones?
What if the models have different size textures?
What if the models are more complex, with animations or forward kinematics?
What about translucent models?
What about hit boxes and physics data?
What about different LODs?
The problem here is that your solution, and even the very concept of what a "model" is, will be very different depending on what your needs are.
Currently, My app is using a large amount of memory after loading textures (~200Mb)
I am loading the textures into a char buffer, passing it along to OpenGL and then killing the buffer.
It would seem that this memory is used by OpenGL, which is doing its own texture management internally.
What measures could I take to reduce this?
Is it possible to prevent OpenGL from managing textures internally?
One typical solution is to keep track of which textures you are needing at a given position of your camera or time-frame, and only load those when you need (opposed to load every single texture at the loading the app). You will have to have a "manager" which controls the loading-unloading and bounding of the respective texture number (e.g. a container which associates a string, name of the texture, with an integer) assigned by the glBindTexture)
Other option is to reduce the overall quality/size of the textures you are using.
It would seem that this memory is used by OpenGL,
Yes
which is doing its own texture management internally.
No, not texture management. It just need to keep the data somewhere. On modern systems the GPU is shared by several processes running simultanously. And not all of the data may fit into fast GPU memory. So the OpenGL implementation must be able to swap data out. The GPU fast memory is not storage, it's just another cache level. Just like the system memory is cache for system storage.
Also GPUs may crash and modern drivers reset them in situ, without the user noticing. For this they need a full copy of the data as well.
Is it possible to prevent OpenGL from managing textures internally?
No, because this would either be tedious to do, or break things. But what you can do, is loading only the textures you really need for drawing a given scene.
If you look through my writings about OpenGL, you'll notice that for years I tell people not to writing silly things like "initGL" functions. Put everything into your drawing code. You'll go through a drawing scheduling phase anyway (you must sort translucent objects far-to-near, frustum culling, etc.). That gives you the opportunity to check which textures you need, and to load them. You can even go as far and load only lower resolution mipmap levels so that when a scene is initially shown it has low detail, and load the higher resolution mipmaps in the background; this of course requires appropriate setting of minimum and maximum mip levels to be set as either texture or sampler parameter.