Vulkan - when should I create a new pipeline?


So I want to render two independent meshes in Vulkan. I'm dabbling in textures and the 1st mesh uses 4 of them while the 2nd uses 5. I'm doing indexed draws.
Each mesh has its own uniform buffer and sampler array packed into separate descriptor sets for simplicity, each one with a binding for the UBO and another binding for the samplers. The following code is run for each mesh, where descriptorSet is the descriptor set associated to a single mesh. filepaths is the vector of image paths that mesh in particular uses.
std::vector<VkWriteDescriptorSet> descriptorWrites;
descriptorWrites.resize(2);
VkDescriptorBufferInfo bufferInfo = {};
bufferInfo.buffer = buffers[i];
bufferInfo.offset = 0;
bufferInfo.range = sizeof(UniformBufferObject);
descriptorWrites[0].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[0].dstSet = descriptorSet;
descriptorWrites[0].dstBinding = 0;
descriptorWrites[0].dstArrayElement = 0;
descriptorWrites[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC;
descriptorWrites[0].descriptorCount = 1;
descriptorWrites[0].pBufferInfo = &bufferInfo;
std::vector<VkDescriptorImageInfo> imageInfos;
imageInfos.resize(filepaths.size());
for (size_t j = 0; j < filepaths.size(); j++) {
imageInfos[j].imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
imageInfos[j].imageView = imageViews[j];
imageInfos[j].sampler = samplers[j];
}
descriptorWrites[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[1].dstSet = descriptorSet;
descriptorWrites[1].dstBinding = 1;
descriptorWrites[1].dstArrayElement = 0;
descriptorWrites[1].descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
descriptorWrites[1].descriptorCount = imageInfos.size();
descriptorWrites[1].pImageInfo = imageInfos.data();
vkUpdateDescriptorSets(devicesHandler->device, descriptorWrites.size(), descriptorWrites.data(), 0, nullptr);
So in order to tell Vulkan how these descriptor sets are laid out I need of course two descriptor set layouts i.e. one per mesh, which differ in the binding for the samplers due to the different size of filepaths:
// <Stuff for binding 0 for UBO here>
// ...
VkDescriptorSetLayoutBinding layoutBinding = {};
layoutBinding.binding = 1;
layoutBinding.descriptorCount = filepaths.size();
layoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
layoutBinding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;
Now, when I create the pipeline I need to provide the pipeline layout. I'm doing it as follows, where layouts is a vector holding the descriptor set layouts of the meshes:
VkPipelineLayoutCreateInfo pipelineLayoutInfo = {};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = layouts.size();
pipelineLayoutInfo.pSetLayouts = layouts.data();
Finally, before rendering I bind the appropriate descriptor set.
Naively, I would think this is the way to define the pipeline layout (simply taking all the involved layouts and passing them in pSetLayouts), but it's not working. The error I get is:
descriptorSet #0 being bound is not compatible with overlapping descriptorSetLayout at index 0 of pipelineLayout 0x6e due to: DescriptorSetLayout 87 has 5 descriptors, but DescriptorSetLayout 88, which comes from pipelineLayout, has 6 descriptors.. The Vulkan spec states: Each element of pDescriptorSets must have been allocated with a VKDescriptorSetLayout that matches (is the same as, or identically defined as) the VkDescriptorSetLayout at set n in layout, where n is the sum of firstSet and the index into pDescriptorSets.
I also noticed that if I reduce the number of textures used from 5 to 4 in the second mesh so they match the 4 from the first mesh, then it works. So I'm wondering if I need to create a pipeline for every possible configuration of the layouts? That is, one pipeline with setLayoutCount set to 4 and another set to 5, and bind the corresponding one when I'm going to draw one mesh or the other? Is that stupid? Am I missing something?
Worth noting is that if I render each mesh alone everything runs smoothly. The problem arises when I put both of them in the scene.
Also, I know buffers should be allocated consecutively and taking into account alignments and that what I'm doing there is a bad practice - but I am just not dealing with that yet.

Passing multiple set layouts to the pipeline means that you want the pipeline to be able to access all the bindings in both sets simultaneously, e.g. the shaders have access to two UBOs at (set=0, binding=0) and (set=1, binding=0), four textures at (set=0, binding=1), and five textures at (set=1, binding=1).
Then when you bind the set for the second mesh as the only set, you get the incompatibility because it has a different layout (5 textures) than the pipeline expects for set 0 (4 textures).
So yes, when you have different descriptor set layouts, you need different pipeline layouts and therefore different pipelines. If you use the pipeline cache, much of the compilation may actually be reused between the two pipelines.
If you're trying to use the same pipeline for both meshes, then presumably the code in your shader that accesses the fifth texture is conditional, based on a uniform or something? The alternative is to bind a dummy texture when drawing the 4-texture mesh; since it won't be accessed, it doesn't matter what its contents are, it can be 1x1, etc. Then you can use the same 5-texture set layout and same pipeline for both meshes.
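The dummy-texture approach can be sketched as follows. This is a minimal illustration, not the asker's actual code: ImageInfo is a hypothetical stand-in for VkDescriptorImageInfo so that the snippet compiles without the Vulkan headers, and padImageInfos is an assumed helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VkDescriptorImageInfo, so the sketch compiles
// without the Vulkan headers. Real code would use the Vulkan struct directly.
struct ImageInfo {
    int sampler;    // stand-in for VkSampler
    int imageView;  // stand-in for VkImageView
};

// Pads a mesh's texture list up to slotCount entries with a dummy texture,
// so every mesh can be written into a descriptor set that uses the same
// fixed-size layout (and therefore the same pipeline).
std::vector<ImageInfo> padImageInfos(std::vector<ImageInfo> infos,
                                     std::size_t slotCount,
                                     const ImageInfo& dummy) {
    assert(infos.size() <= slotCount && "mesh uses more textures than the layout provides");
    infos.resize(slotCount, dummy);  // existing entries are kept, the rest become dummy
    return infos;
}
```

With this, both meshes would use descriptorCount = 5 in the layout binding and in the descriptor write, and the 4-texture mesh would simply carry the dummy in its last slot.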

Related

How to draw a terrain model efficiently from Esri Grid (osg)?

I have many Esri Grid files (https://en.wikipedia.org/wiki/Esri_grid#ASCII) and I would like to render them in 3D without losing precision, I am using OpenSceneGraph.
The problem is this grids are around 1000x1000 (or more) points, so when I extract the vertices, then compute the triangles to create the geometry, I end up having millions of them and the interaction with the scene is impossible (frame rate drops to 0).
I've tried several approaches:
Triangle list
Basically, as I read the file, I fill an array with 3 vertices per triangle (this leads to duplication):
osg::ref_ptr<osg::Geode> l_pGeodeSurface = new osg::Geode;
osg::ref_ptr<osg::Geometry> l_pGeometrySurface = new osg::Geometry;
osg::ref_ptr<osg::Vec3Array> l_pvTrianglePoints = new osg::Vec3Array;
osg::ref_ptr<osg::Vec3Array> l_pvOriginalPoints = new osg::Vec3Array;
... // Read the file and fill l_pvOriginalPoints
for(*triangle inside the file*)
{
... // Compute correct triangle indices (l_iP1, l_iP2, l_iP3)
// Push triangle vertices inside the array
l_pvTrianglePoints->push_back(l_pvOriginalPoints->at(l_iP1));
l_pvTrianglePoints->push_back(l_pvOriginalPoints->at(l_iP2));
l_pvTrianglePoints->push_back(l_pvOriginalPoints->at(l_iP3));
}
l_pGeometrySurface->setVertexArray(l_pvTrianglePoints);
// DrawArrays takes (mode, first, count); a fourth argument would be the instance count
l_pGeometrySurface->addPrimitiveSet(new osg::DrawArrays(GL_TRIANGLES, 0, l_pvTrianglePoints->size()));
Indexed triangle list
Same as before, but the array contains every vertex just once and I create a second array of indices (basically I tell osg how to build the triangles, with no duplication):
osg::ref_ptr<osg::Geode> l_pGeodeSurface = new osg::Geode;
osg::ref_ptr<osg::Geometry> l_pGeometrySurface = new osg::Geometry;
osg::ref_ptr<osg::DrawElementsUInt> l_pIndices = new osg::DrawElementsUInt(osg::PrimitiveSet::TRIANGLES, *number of indices*);
osg::ref_ptr<osg::Vec3Array> l_pvOriginalPoints = new osg::Vec3Array;
... // Read the file and fill l_pvOriginalPoints
// Advance by 3 so that each iteration writes one whole triangle
for(unsigned int i = 0; i + 2 < *number of indices*; i += 3)
{
... // Compute correct triangle indices (l_iP1, l_iP2, l_iP3)
// Push vertices indices inside the array
l_pIndices->at(i) = l_iP1;
l_pIndices->at(i+1) = l_iP2;
l_pIndices->at(i+2) = l_iP3;
}
l_pGeometrySurface->setVertexArray(l_pvOriginalPoints );
l_pGeometrySurface->addPrimitiveSet(l_pIndices.get());
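For a regular grid such as an Esri ASCII grid, the "compute correct triangle indices" step can be written down directly. The following is only a sketch under assumptions not stated in the question: vertices are stored row by row, each cell is split into two triangles, and buildGridIndices is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a triangle index list for a regular grid of cols x rows vertices
// stored row by row. Each grid cell is split into two triangles.
std::vector<std::uint32_t> buildGridIndices(std::uint32_t cols, std::uint32_t rows) {
    assert(cols >= 2 && rows >= 2 && "a grid needs at least one cell");
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(cols - 1) * (rows - 1) * 6);
    for (std::uint32_t r = 0; r + 1 < rows; ++r) {
        for (std::uint32_t c = 0; c + 1 < cols; ++c) {
            const std::uint32_t topLeft = r * cols + c;
            const std::uint32_t topRight = topLeft + 1;
            const std::uint32_t bottomLeft = topLeft + cols;
            const std::uint32_t bottomRight = bottomLeft + 1;
            // First triangle of the cell
            indices.push_back(topLeft);
            indices.push_back(bottomLeft);
            indices.push_back(topRight);
            // Second triangle of the cell
            indices.push_back(topRight);
            indices.push_back(bottomLeft);
            indices.push_back(bottomRight);
        }
    }
    return indices;
}
```

The resulting values could then be appended to the osg::DrawElementsUInt with push_back. Whether the winding order matches the grid's orientation depends on how the rows are laid out in the file, so it may need to be flipped.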
Instancing
This was a bit of an experiment, since I've never used shaders. I thought I could instance a single triangle, then manipulate its coordinates in a vertex shader for every triangle in my scene, using transformation matrices (passing the matrices as a uniform array, one per triangle). I ended up with too many uniforms even with a 20x20 grid.
I used these links as a reference:
https://learnopengl.com/Advanced-OpenGL/Instancing,
https://books.google.it/books?id=x_RkEBIJeFQC&pg=PT265&lpg=PT265&dq=osg+instanced+geometry&source=bl&ots=M8ii8zn8w7&sig=ACfU3U0_92Z5EGCyOgbfGweny4KIUfqU8w&hl=en&sa=X&ved=2ahUKEwj-7JD0nq7qAhUXxMQBHcLaAiUQ6AEwAnoECAkQAQ#v=onepage&q=osg%20instanced%20geometry&f=false
None of the above solved my issue. What else can I try? Am I missing something in terms of rendering techniques? I thought it was a fairly simple task, but I'm kind of stuck.

I feel like you should consider taking a step back. If you're visualizing GIS-based terrain data, osgEarth is really designed for doing this and has fairly efficient LOD tools for large terrains. Do you need the data always represented at maximum full LOD or are you looking for dynamic LOD to improve frame rate?
Depending on your goals and requirements you might want to look at some more advanced terrain rendering techniques, like heightfield tracing, etc. If the terrain is always static, you can precompute quadtrees and signed distance functions and trace against the heightfield.

Compute to Graphics Dependencies

I am doing a Marching cube algorithm in a Compute shader. The vertices generated by the compute stage will be input to the vertex stage.
Compute -> Vertices -> Render
There is no way of knowing how many vertices that the compute stage will output, so I need a storage buffer looking something like this:
layout(set = 1, binding = 0) buffer Count{
int value;
} count;
layout(set = 2, binding = 0) buffer Mesh {
vec4 vertices[1<<15];
} mesh;
The vertices do not need a roundtrip to the CPU, but the count is a variable used by the vkCmdDraw command. So I need to put the count buffer in host visible memory, map that memory and do a memcpy after the compute stage. Is this a good way of solving this problem or is there some other way where I don't have to read back data to the CPU?

Well, this is exactly what vkCmdDrawIndirect is for. The vertex count is stored in a VkBuffer as part of a VkDrawIndirectCommand that the compute shader can write directly, which makes the CPU round-trip unnecessary.
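As a rough sketch of what the indirect buffer would hold: the struct below mirrors the field order of VkDrawIndirectCommand so that it compiles without the Vulkan headers, and the two helper functions are hypothetical CPU stand-ins for the initial buffer contents and for what the compute shader would do with an atomic add on vertexCount.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the field order of VkDrawIndirectCommand (four 32-bit values).
struct DrawIndirectCommand {
    std::uint32_t vertexCount;
    std::uint32_t instanceCount;
    std::uint32_t firstVertex;
    std::uint32_t firstInstance;
};
static_assert(sizeof(DrawIndirectCommand) == 16, "must match the GPU-side layout");

// Contents of the indirect buffer before the compute dispatch:
// no vertices yet, one instance, no offsets.
DrawIndirectCommand makeInitialCommand() {
    return DrawIndirectCommand{0u, 1u, 0u, 0u};
}

// CPU stand-in for the shader side: reserve `count` vertices in the mesh
// buffer and return the index of the first reserved slot. In the compute
// shader this would be an atomicAdd on the vertexCount field.
std::uint32_t reserveVertices(DrawIndirectCommand& cmd, std::uint32_t count,
                              std::uint32_t capacity) {
    assert(cmd.vertexCount + count <= capacity && "vertex buffer overflow");
    const std::uint32_t first = cmd.vertexCount;
    cmd.vertexCount += count;
    return first;
}
```

On the Vulkan side, the buffer would need VK_BUFFER_USAGE_INDIRECT_BUFFER_BIT in addition to storage buffer usage, and a barrier between the compute write and the indirect read (VK_ACCESS_INDIRECT_COMMAND_READ_BIT at VK_PIPELINE_STAGE_DRAW_INDIRECT_BIT) before vkCmdDrawIndirect consumes it.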

Can you modify a uniform from within the shader? If so, how?

So I wanted to store all my meshes in one large VBO. The problem is, how do you have just one draw call, but let every mesh have its own model-to-world matrix?
My idea was to submit an array of matrices to a uniform before drawing. In the VBO I would make the color of every first vertex of a mesh negative (so I'd be using the sign bit to check whether a vertex was the first of a mesh).
Okay, so I can detect when a new mesh has started and I have an array of matrices ready and probably a uniform called 'index'. But how do I increase this index by one every time I encounter a new mesh?
Can you modify a uniform from within the shader? If so, how?

Can you modify a uniform from within the shader?
If you could, it wouldn't be uniform anymore, would it?
Furthermore, what you're wanting to do cannot be done even with Image Load/Store or SSBOs, both of which allow shaders to write data. It won't work because vertex shader invocations are not required to be executed sequentially. Many happen at the same time, and there's no way for any shader invocation to know that it will happen "after" the "first vertex" in a mesh.
The simplest way to deal with this is the obvious solution. Render each mesh individually, but set the uniforms for each mesh before each draw call. Without changing buffers between draws, of course. Uniform changes, while not exactly cheap, aren't the most expensive state changes that exist.
There are more complicated drawing methods that could allow you more performance. But that form is adequate for most needs. You've already done the hard part: you removed the need for any state change (textures, buffers, vertex formats, etc) except uniform state.

There are two approaches to minimizing draw calls: instancing and batching. The first (instancing) allows you to draw multiple copies of the same mesh in one draw call, but it depends on the API (it is available from OpenGL 3.1). Batching is similar to instancing but allows you to draw different meshes. Both approaches have a restriction: the meshes must share the same materials and shaders.
If you want to draw different meshes from one VBO, then instancing is not an option. Batching requires keeping all meshes in one 'big' VBO with the world transform already applied. That is not a problem with static meshes, but it is inconvenient with animated ones. Here is some pseudocode for a batching implementation:
struct SGeometry
{
uint64_t offsetVB;
uint64_t offsetIB;
uint64_t sizeVB;
uint64_t sizeIB;
glm::mat4 oldTransform;
glm::mat4 transform;
};
std::vector<SGeometry> cachedGeometries;
...
void CommitInstances()
{
uint64_t vertexOffset = 0;
uint64_t indexOffset = 0;
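for (auto* instance : allInstances)
{
// Append this mesh's vertices to the shared VBO (pseudocode helper)
Copy(instance->Vertexes(), VBO);
for (uint64_t i = 0; i < instance->Indices().size(); ++i)
{
auto index = instance->Indices()[i];
// Rebase the index so it points at this mesh's vertices in the shared VBO
index += vertexOffset;
IBO[indexOffset + i] = index;
}
cachedGeometries.push_back({vertexOffset, indexOffset, instance->Vertexes().size(), instance->Indices().size()});
vertexOffset += instance->Vertexes().size();
indexOffset += instance->Indices().size();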
}
Commit(VBO);
Commit(IBO);
}
void ApplyTransform(glm::mat4 modelMatrix, uint64_t instanceId)
{
SGeometry& geom = cachedGeometries[instanceId];
glm::mat4 inverseOldTransform = glm::inverse(geom.oldTransform);
VertexStream& stream = VBO->GetStream(Position, geom.offsetVB);
for (uint64_t i = 0; i < geom.sizeVB; ++i)
{
glm::vec3 pos = stream.Get(i);
// We need to revert the previous absolute transformation before applying the new one
pos = glm::vec3(inverseOldTransform * glm::vec4(pos, 1.0f));
pos = glm::vec3(modelMatrix * glm::vec4(pos, 1.0f));
stream.Set(i, pos);
}
// .. Apply normal transformation
// Remember the transform so that the next update can revert it
geom.oldTransform = modelMatrix;
}
GPU Gems 2 has a good article about geometry instancing http://www.amazon.com/GPU-Gems-Programming-High-Performance-General-Purpose/dp/0321335597

Sum of absolute difference of 2 geometries within a shader in unity

I am trying to compute a sum of absolute differences within my shader and write the single result back to a uniform float in Unity.
In the shader I have 2 geometries with the same number of vertices that map one to one.
// subtract vertices
float norm = 10;
float error=infereCrater.vertex.y-v.vertex.y;
error = error*error*norm;
o.debugColor = float3(error,1-error ,0.0f);
//////
o.posWorld =mul(_Object2World,v.vertex);
o.normalWorld = normalize(mul(float4(v.normal,0.0),_World2Object).xyz);
o.tangentWorld = normalize(mul(float4(v.tangent,0.0),_World2Object).xyz);
o.binormalWorld = cross(o.normalWorld,o.tangentWorld);
o.tex = v.texcoord;
o.pos = mul(UNITY_MATRIX_MVP,v.vertex);
TRANSFER_VERTEX_TO_FRAGMENT(o);
return o;
}
I am able to calculate the error for each individual vertex and change the color of the surface based on the difference.
I hit a roadblock: I don't know how to sync all the threads and start adding up the values.
Is there a way to call another vertex shader after the first one is done?
How can the vertex shader read the values of the vertices adjacent to it? (I don't think it's possible, because they are in the thread's local memory.)
Or is it possible to have a global array to store the difference values, copy it to the CPU (which I don't want because of latency) and add them up on the CPU?
I don't want to use a compute shader because I am not on Windows.

OpenGL render multiple objects using single VBO and update object's matrices using another VBO

So, I need a way to render multiple objects (not instances) using one draw call. Actually, I know how to do this: just place the data into a single VBO/IBO and render using glDrawElements.
The question is: what is an efficient way to update uniform data without setting it up for every single object using glUniform...?
How can I set up one buffer containing all the uniform data of dozens of objects, including MVP matrices, bind it and render using a single draw call?
I tried to use UBOs, but it's not what I need at all.
For rendering instances we just place the uniform data, including matrices, in another VBO and set up the attribute divisor using glVertexAttribDivisor, but that only works for instances.
Is there a way to do what I want in OpenGL? If not, what can I do to overcome the overhead of setting uniform data for dozens of objects?
For example like this:
{
// setting up VBO
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(..., data_size);
// setup buffer
for(int i = 0; i < objects_num; i++)
glBufferSubData(...offset, size, &(objects[i]));
// the same for IBO
.........
// then set up some buffer that will store all uniforms, for every object
.........
glDrawElements(...);
}
Thanks in advance for helping.

If you're ok with requiring OpenGL 4.3 or higher, I believe you can render this with a single draw call using glMultiDrawElementsIndirect(). This allows you to essentially make multiple draw calls with a single API call. Each sub-call is defined by values in a struct of the form:
typedef struct {
GLuint count;
GLuint instanceCount;
GLuint firstIndex;
GLuint baseVertex;
GLuint baseInstance;
} DrawElementsIndirectCommand;
Since you do not want to draw multiple instances of the same vertices, you use 1 for the instanceCount in each draw call. The key idea is that you can still use instancing by specifying a different baseInstance value for each one. Instanced attributes are fetched at baseInstance plus the instance index, so each object reads its own entry, and you can use instanced attributes for the values (matrices, etc.) that you want to vary per object. Note that gl_InstanceID itself does not include baseInstance, so it will be 0 in every sub-draw.
So if you currently have a rendering loop:
for (int k = 0; k < objectCount; ++k) {
// set uniforms for object k.
glDrawElements(GL_TRIANGLES, object[k].indexCount,
GL_UNSIGNED_INT, object[k].indexOffset * sizeof(GLuint));
}
you would instead fill an array of the struct defined above with the arguments:
std::vector<DrawElementsIndirectCommand> cmds(objectCount);
for (int k = 0; k < objectCount; ++k) {
cmds[k].count = object[k].indexCount;
cmds[k].instanceCount = 1;
cmds[k].firstIndex = object[k].indexOffset;
cmds[k].baseVertex = 0;
cmds[k].baseInstance = k;
}
// Rest of setup.
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, objectCount, 0);
I didn't provide code for the full setup above. The key steps include:
Drop the cmds array into a buffer, and bind it as GL_DRAW_INDIRECT_BUFFER.
Store the per-object values in a VBO. Set up the corresponding vertex attributes, which includes specifying them as instanced with glVertexAttribDivisor(index, 1).
Set up the per-vertex attributes as usual.
Set up the index buffer as usual.
For this to work, the indices for all the objects will have to be in the same index buffer, and the values for each attribute will have to be in the same VBO across all objects.
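When several meshes are packed into shared buffers, the firstIndex and baseVertex fields can be derived from running totals. The sketch below assumes each mesh keeps its own zero-based indices; MeshRange and buildCommands are hypothetical names, and the command struct repeats the one shown above with fixed-width types so that it compiles without the OpenGL headers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order as the DrawElementsIndirectCommand struct shown above.
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::uint32_t baseVertex;
    std::uint32_t baseInstance;
};

// Hypothetical per-mesh description: how many vertices and indices it has.
struct MeshRange {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// Builds one sub-draw per mesh. firstIndex and baseVertex are running totals,
// so meshes can be appended to the shared buffers without rewriting indices.
std::vector<DrawElementsIndirectCommand> buildCommands(const std::vector<MeshRange>& meshes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(meshes.size());
    std::uint32_t firstIndex = 0;
    std::uint32_t baseVertex = 0;
    for (std::uint32_t k = 0; k < meshes.size(); ++k) {
        assert(meshes[k].indexCount % 3 == 0 && "triangle lists need 3 indices per triangle");
        cmds.push_back({meshes[k].indexCount, 1u, firstIndex, baseVertex, k});
        firstIndex += meshes[k].indexCount;
        baseVertex += meshes[k].vertexCount;
    }
    return cmds;
}
```

The resulting vector would be uploaded to the buffer bound as GL_DRAW_INDIRECT_BUFFER, with baseInstance k selecting the k-th entry of the instanced per-object attributes.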
