glMultiDrawElementsIndirect custom drawID with multiple instances per DrawElementsIndirectCommand

glMultiDrawElementsIndirect custom drawID with multiple instances per DrawElementsIndirectCommand - opengl

If I setup a custom ascending integer drawID vertex buffer stream for per instance data with:
glVertexAttribDivisor(drawIDVertexStreamIdx, 1)
Using glMultiDrawElementsIndirect() given:
struct DrawElementsIndirectCommand
{
uint count;
uint instanceCount;
uint firstIndex;
uint baseVertex;
uint baseInstance;
};
When setting the instanceCount to more than one, I am confused looking at old notes and online about what exactly happens regards the drawID passed to the shader?
If say there are two DrawElementsIndirectCommand records invoked from one glMultiDrawElementsIndirect(), with the first record having an instanceCount of 3, and the second an instanceCount of say 1, what do the instances actually see in the shader? (Assuming the drawID vertex stream contains 0,1,2,3 etc)
Are they supposed to see 0,1,2 for the first DrawElementsIndirectCommand record instances, and 3 for the second DrawElementsIndirectCommand record instance?
All the examples I can find online seem to specifically set instanceCount to one and rely on multiple DrawElementsIndirectCommand records which makes me now doubt this understanding is correct?
const int CustomDrawIDIdx = 1;
const int VertexCount = 4;
DrawElementsIndirectCommand drawCallRecords[2] =
{
{ VertexCount, 3, 0, 0, 0 },
{ VertexCount, 1, 0, 0, 0 },
};
...
// Attempt to set up custom drawID from a vertex attribute, where the vertex stream for it is a sequence of integers 0, 1, 2 etc
glVertexAttribIPointer(CustomDrawIDIdx, 1, GL_UNSIGNED_INT, 0, NULL);
glVertexAttribDivisor(CustomDrawIDIdx, 1);
...
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, &drawCallRecords, 2, 0);
In the vertex shader:
layout (location = 1) in uint customDrawID;
void main()
{
bool match = (customDrawID== gl_InstanceID);
...
}
So this should cause 4 actual draw calls, as actual draw calls caused by glMultiDrawElementsIndirect() are determined by the number DrawElementsIndirectCommand records and their contained instance counts.
For each draw call gl_InstanceID should start from zero and count up for each instance, but go back to zero after each DrawElementsIndirectCommand record is processed?
So gl_InstanceID should do (0, 1, 2) for drawCallRecords[0], and then (0) for drawCallRecords[1]? What does the customDrawID do?
I am also curious if ARB_shader_draw_parameters on nVidia (GTX1070+) is still not recommended compared to using a custom drawID from a vertex stream?
*** UPDATE to reflect the answer by the very patient and helpful Nicol Bolas:
So given:
DrawElementsIndirectCommand drawCallRecords[2] =
{
{ VertexCount, 3, 0, 0, 0 },
{ VertexCount, 1, 0, 0, 3 /*baseInstance will push us along in customDrawID vertex stream*/ },
};
Then customDrawID will then do (0, 1, 2) and (3) across all instances in the multidrawindirect.
Which means that each drawn instance across the TWO draw calls, and the 3 + 1 total instances drawn (3 instances of one 'object', 1 instance of another 'object'), in one multidrawindirect call could reference completely unique transformation matrices for example. And in essence that emulates the functionality of gl_DrawID as long as you keep bumping the baseInstance like that (exclusive sum style) for each DrawElementsIndirectCommand record.
The baseInstance in each DrawElementsIndirectCommand record will push the offset into the customDrawID vertex stream, giving a customDrawID that is unique to each instance across all objects drawn.

So this should cause 4 actual draw calls, as actual draw calls caused by glMultiDrawElementsIndirect() are determined by the number DrawElementsIndirectCommand records and their contained instance counts.
No. Instances and "draw calls" are not the same thing.
A single draw call is defined as if by a call to glDraw*InstancedBaseVertexBaseInstance; that's what happens when the system reads a single entry from the array of draw data. That single draw call includes all instances. Instancing is a thing that happens within a draw call.
This is also why per-instance values are not guaranteed to be dynamically uniform expressions.
Individual draws within a multi-draw command are, gl_DrawID aside, completely separate from one another. They do not interact. The values your shader gets for instanced arrays or gl_InstanceID would be no different from issuing each draw call separately.
Your "custom drawID" is not a draw ID at all; it is an instanced array value. Therefore, it follows the rules of instancing, and cares nothing for which draw call it is in.
bool match = (customDrawID== gl_InstanceID);
No.
Even if the actual array you use to feed customDrawID is just a zero-based integer index, instances arrays and gl_InstanceID don't work the same way.
gl_InstanceID ignores the base instance. Instance arrays do not. As such, the instance value fetched from any instance array will always be offset first by the base instance.
So if you want the customDrawID for a specific draw call to start from a particular value, then you set baseInstance to be the instance index of that particular value. So given a zero-based integer index array, if you want a particular draw call to have its first instance receive the customDrawID value of "3", you set baseInstance to 3.

Related

What is the relationship with index in glBindBufferBase() and stream number in geometry shader in OpenGL?

I want to transform data from geometry shaders to feedback buffer,so I set the names of variables like this:
char *Varying[] = {"lColor","lPos","gl_NextBuffer","rColor","rPos"};
then bind two buffers vbos[2] to transformFeedback object Tfb,vbos[0] to bind point 0, and vbos[1] to bind point 2:
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, vbos[0]);
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 2, vbos[1]);
and set stream number like this:
layout(stream=0) out vec4 lColor;
layout(stream=0) out vec4 lPos;
layout(stream = 2) out rVert
{
vec4 rColor;
vec4 rPos;
};
So I thought lColor and lPos are transformed to vbos[0],rColor and rPos are transformed to vbos[1],but nothing came out,unless I change vbos[1] to bind point 1,and stream number of rvert to 1.
I think there is three variables about indexes for binding geometry shader output and feedback buffers,first is index in glBindBufferBase(target,index,id),second is sequences of names in glTransformFeedbackVaryings(program,count,names,buffermode),third is stream number of variables in geometry shader,what is the relationship of these three parameters?
And I found that no matter what parameters I set the glDrawTransformFeedbackStream ,picture didn't change,Why?

The geometry shader layout qualifier stream has no direct relationship to the TF buffer binding index assignment. There is one caveat here: variables from two different streams can't be assigned to the same buffer binding index. But aside from that, the two have no relationship.
In geometry shader transform feedback, streams are written independently of each other. When you output a vertex/primitive, you say which stream the GS invocation is writing. Only the output variables assigned to that stream are written, and the system keeps track of how much stuff has been written to each stream.
But how the output variables map to feedback buffers is entirely separate (aside from the aforementioned caveat).
In your example, glTransformFeedbackVaryings with {"lColor","lPos","gl_NextBuffer","rColor","rPos"} does the following. It starts with buffer index 0. It assigns lColor and lPos to buffer index 0. gl_NextBuffer causes the system to increment the value of the current buffer index. That value is 0, so incrementing it makes it 1. It then assigns rColor and rPos to buffer 1.
That's why your code doesn't work.
You can skip buffer indices in the TF buffer bindings. To do that, you have to use gl_NextBuffer twice, since each use increments the current buffer index.
Or if your GL version is high enough, you can just assign the buffer bindings and offsets directly in your shader:
layout(stream=0, xfb_offset = 0, xfb_buffer = 0) out vec4 lColor;
layout(stream=0, xfb_offset = 16, xfb_buffer = 0) out vec4 lPos;
layout(stream = 2, xfb_offset = 0, xfb_buffer = 2) out rVert
{
vec4 rColor;
vec4 rPos;
};

Vulkan - when should I create a new pipeline?

So I want to render two independent meshes in Vulkan. I'm dabbling in textures and the 1st mesh uses 4 of them while the 2nd uses 5. I'm doing indexed draws.
Each mesh has its own uniform buffer and sampler array packed into separate descriptor sets for simplicity, each one with a binding for the UBO and another binding for the samplers. The following code is run for each mesh, where descriptorSet is the descriptor set associated to a single mesh. filepaths is the vector of image paths that mesh in particular uses.
std::vector<VkWriteDescriptorSet> descriptorWrites;
descriptorWrites.resize(2);
VkDescriptorBufferInfo bufferInfo = {};
bufferInfo.buffer = buffers[i];
bufferInfo.offset = 0;
bufferInfo.range = sizeof(UniformBufferObject);
descriptorWrites[0].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[0].dstSet = descriptorSet;
descriptorWrites[0].dstBinding = 0;
descriptorWrites[0].dstArrayElement = 0;
descriptorWrites[0].descriptorType = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC;
descriptorWrites[0].descriptorCount = 1;
descriptorWrites[0].pBufferInfo = &bufferInfo;
std::vector<VkDescriptorImageInfo> imageInfos;
imageInfos.resize(filepaths.size());
for (size_t j = 0; j < filepaths.size(); j++) {
imageInfos[j].imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;
imageInfos[j].imageView = imageViews[j];
imageInfos[j].sampler = samplers[j];
}
descriptorWrites[1].sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
descriptorWrites[1].dstSet = descriptorSet;
descriptorWrites[1].dstBinding = 1;
descriptorWrites[1].dstArrayElement = 0;
descriptorWrites[1].descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
descriptorWrites[1].descriptorCount = imageInfos.size();
descriptorWrites[1].pImageInfo = imageInfos.data();
vkUpdateDescriptorSets(devicesHandler->device, descriptorWrites.size(), descriptorWrites.data(), 0, nullptr);
So in order to tell Vulkan how these descriptor sets are laid out I need of course two descriptor set layouts i.e. one per mesh, which differ in the binding for the samplers due to the different size of filepaths:
// <Stuff for binding 0 for UBO here>
// ...
VkDescriptorSetLayoutBinding layoutBinding = {};
layoutBinding.binding = 1;
layoutBinding.descriptorCount = filepaths.size();
layoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
layoutBinding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;
Now, when I create the pipeline I need to provide the pipeline layout. I'm doing it as follows, where layouts are the descriptor set layouts of the meshes stuffed into a vector.:
VkPipelineLayoutCreateInfo pipelineLayoutInfo = {};
pipelineLayoutInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
pipelineLayoutInfo.setLayoutCount = layouts.size();
pipelineLayoutInfo.pSetLayouts = layouts.data();
Finally before rendering I bind the aproppriate descriptor set.
Naively I would think that way to define the pipeline layout this is the way to go (simply taking all the involved layouts and passing them on pSetLayouts) but it's not working. The error I get is:
descriptorSet #0 being bound is not compatible with overlapping descriptorSetLayout at index 0 of pipelineLayout 0x6e due to: DescriptorSetLayout 87 has 5 descriptors, but DescriptorSetLayout 88, which comes from pipelineLayout, has 6 descriptors.. The Vulkan spec states: Each element of pDescriptorSets must have been allocated with a VKDescriptorSetLayout that matches (is the same as, or identically defined as) the VkDescriptorSetLayout at set n in layout, where n is the sum of firstSet and the index into pDescriptorSets.
I also noticed that if I reduce the number of textures used from 5 to 4 in the second mesh so they match the 4 from the first mesh, then it works. So I'm wondering if I need to create a pipeline for every possible configuration of the layouts? That is, one pipeline with setLayoutCount set to 4 and another set to 5, and bind the corresponding one when I'm going to draw one mesh or the other? Is that stupid? Am I missing something?
Worth noting is that if I render each mesh alone everything runs smoothly. The problem arises when I put both of them in the scene.
Also, I know buffers should be allocated consecutively and taking into account alignments and that what I'm doing there is a bad practice - but I am just not dealing with that yet.

Passing multiple set layouts to the pipeline means that you want the pipeline to be able to access all the bindings in both sets simultaneously, e.g. the shaders have access to two UBOs at (set=0, binding=0) and (set=1, binding=0), four textures at (set=0, binding=1), and five textures as (set=1, binding=1).
Then when you bind the set for the second mesh as the only set, you get the incompatibility because it has a different layout (5 textures) than the pipeline expects for set 0 (4 textures).
So yes, when you have different descriptor set layouts, you need different pipelines. If you use the pipeline cache, much of the compilation may actually be reused between the two pipelines.
If you're trying to use the same pipeline for both meshes, then presumably the code in your shader that accesses the fifth texture is conditional, based on a uniform or something? The alternative is to bind a dummy texture when drawing the 4-texture mesh; since it won't be accessed, it doesn't matter what its contents are, it can be 1x1, etc. Then you can use the same 5-texture set layout and same pipeline for both meshes.

OpenGL - Is vertex attribute state bound to specific VBOs?

As I understand VAOs/VBOs currently, a VAO retains all the attribute information that has been set up since it was bound, eg. the offset, stride, number of components, etc. of a given vertex attribute within a VBO.
What I seem to be unclear on is how VAOs and VBOs work together. A lot of the examples I have seen specify the vertex attributes with respect to the currently bound VBO, and when the VAO is bound the data in the VBO become accessible. One way I can see of using VAOs in this way would be to have one per object (where each object uses its own VBO), but I've read that this is poor performance-wise because of switching between many VAOs unnecessarily. I also would rather like to avoid having to store all my object data in one monolithic VBO because I will need to add and remove objects within my scene at any time - as a 3D editor, I feel the application would be much better suited to having each geometry object own its own buffer, rather than in some large, preallocated VBO. (Is this a correct assumption?)
My question therefore is whether one VAO can store vertex attribute configurations independently of the VBOs? Would I be able to configure a VAO to expect data in a certain format (eg. position, normal, UV) and then "swap in" different VBOs as I draw the different geometry objects, or is the format information essentially bound only to the VBO itself? If the latter, is it worth me using VAOs at all?

ARB_vertex_attrib_binding allows you to separate Vao attribute format and buffer binding.
https://www.opengl.org/wiki/Vertex_Specification#Separate_attribute_format
Internally, when you configure your Vao, Vertex buffer is automatically associated with attribute index. With ARB_vertex_attrib_binding, you have new gl functions to define Attribute formats independently from the bound buffer, which may be switched with VertexBuffer functions.
Here some piece of code in c# with openTK: (full surce: https://github.com/jpbruyere/GGL/tree/ottd/Tetra )
The solution here is to build a VAO with all your meshes concatenated, keeping for each of them only
BaseVertex = the vertice offset in the VAO
IndicesOffset = the offset in the Element buffer (ebo index)
IndicesCount = and the total indice count of the model
protected void CreateVAOs()
{
//normal vao binding
vaoHandle = GL.GenVertexArray();
GL.BindVertexArray(vaoHandle);
GL.EnableVertexAttribArray(0);
GL.BindBuffer(BufferTarget.ArrayBuffer, positionVboHandle);
GL.VertexAttribPointer(0, 3, VertexAttribPointerType.Float, true, Vector3.SizeInBytes, 0);
... other attrib bindings come here
//ARB vertex attrib binding use for fast instance buffers switching
//note that I use 4 attrib indices to bind a matrix
GL.VertexBindingDivisor (instanceBufferIndex, 1);
for (int i = 0; i < 4; i++) {
GL.EnableVertexAttribArray (instanceBufferIndex + i);
GL.VertexAttribBinding (instanceBufferIndex+i, instanceBufferIndex);
GL.VertexAttribFormat(instanceBufferIndex+i, 4, VertexAttribType.Float, false, Vector4.SizeInBytes * i);
}
if (indices != null)
GL.BindBuffer(BufferTarget.ElementArrayBuffer, eboHandle);
GL.BindVertexArray(0);
}
Then, I define Instances of mesh with just a Matrix array for each, that's a normal buffer creation, but not staticaly bound to the vao.
instancesVboId = GL.GenBuffer ();
GL.BindBuffer (BufferTarget.ArrayBuffer, instancesVboId);
GL.BufferData<Matrix4> (BufferTarget.ArrayBuffer,
new IntPtr (modelMats.Length * Vector4.SizeInBytes * 4),
modelMats, BufferUsageHint.DynamicDraw);
GL.BindBuffer (BufferTarget.ArrayBuffer, 0);
To render such vao, I loop inside my instance array:
public void Bind(){
GL.BindVertexArray(vaoHandle);
}
public void Render(PrimitiveType _primitiveType){
foreach (VAOItem item in Meshes) {
GL.ActiveTexture (TextureUnit.Texture1);
GL.BindTexture (TextureTarget.Texture2D, item.NormalMapTexture);
GL.ActiveTexture (TextureUnit.Texture0);
GL.BindTexture (TextureTarget.Texture2D, item.DiffuseTexture);
//Here I bind the Instance buffer with my matrices
//that's a fast switch without changing vao confing
GL.BindVertexBuffer (instanceBufferIndex, item.instancesVboId, IntPtr.Zero,Vector4.SizeInBytes * 4);
//here I draw instanced with base vertex
GL.DrawElementsInstancedBaseVertex(_primitiveType, item.IndicesCount,
DrawElementsType.UnsignedShort, new IntPtr(item.IndicesOffset*sizeof(ushort)),
item.modelMats.Length, item.BaseVertex);
}
}
The final VAO is bound only once.

OpenGL render multiple objects using single VBO and updata object's matrices using another VBO

So, I need the way to render multiple objects(not instances) using one draw call. Actually I know how to do this, just to place data into single vbo/ibo and render, using glDrawElements.
The question is: what is efficient way to update uniform data without setting it up for every single object, using glUniform...?
How can I setup one buffer containing all uniform data of dozens of objects, include MVP matrices, bind it and perform render using single draw call?
I tried to use UBOs, but it's not what I need at all.
For rendering instances we just place uniform data, including matrices, at another VBO and set up attribute divisor using glVertexAttribDivisor, but it only works for instances.
Is there a way to do that I want in OpenGL? If not, what can I do to overcome overheads of setting uniform data for dozens of objects?
For example like this:
{
// setting up VBO
glGenBuffers(1, &vbo);
glBindBuffer(vbo);
glBufferData(..., data_size);
// setup buffer
for(int i = 0; i < objects_num; i++)
glBufferSubData(...offset, size, &(objects[i]));
// the same for IBO
.........
// when setup some buffer, that will store all uniforms, for every object
.........
glDrawElements(...);
}
Thanks in advance for helping.

If you're ok with requiring OpenGL 4.3 or higher, I believe you can render this with a single draw call using glMultiDrawElementsIndirect(). This allows you to essentially make multiple draw calls with a single API call. Each sub-call is defined by values in a struct of the form:
typedef struct {
GLuint count;
GLuint instanceCount;
GLuint firstIndex;
GLuint baseVertex;
GLuint baseInstance;
} DrawElementsIndirectCommand;
Since you do not want to draw multiple instances of the same vertices, you use 1 for the instanceCount in each draw call. The key idea is that you can still use instancing by specifying a different baseInstance value for each one. So each object will have a different gl_InstanceID value, and you can use instanced attributes for the values (matrices, etc) that you want to vary per object.
So if you currently have a rendering loop:
for (int k = 0; k < objectCount; ++k) {
// set uniforms for object k.
glDrawElements(GL_TRIANGLES, object[k].indexCount,
GL_UNSIGNED_INT, object[k].indexOffset * sizeof(GLuint));
}
you would instead fill an array of the struct defined above with the arguments:
DrawElementsIndirectCommand cmds[objectCount];
for (int k = 0; k < objectCount; ++k) {
cmds[k].count = object[k].indexCount;
cmds[k].instanceCount = 1;
cmds[k].firstIndex = object[k].indexOffset;
cmds[k].baseVertex = 0;
cmds[k].baseInstance = k;
}
// Rest of setup.
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, objectCount, 0);
I didn't provide code for the full setup above. The key steps include:
Drop the cmds array into a buffer, and bind it as GL_DRAW_INDIRECT_BUFFER.
Store the per-object values in a VBO. Set up the corresponding vertex attributes, which includes specifying them as instanced with glVertexAttribDivisor(1).
Set up the per-vertex attributes as usual.
Set up the index buffer as usual.
For this to work, the indices for all the objects will have to be in the same index buffer, and the values for each attribute will have to be in the same VBO across all objects.

Generating Smooth Normals from active Vertex Array

I'm attempting to hack and modify several rendering features of an old opengl fixed pipeline game, by hooking into OpenGl calls, and my current mission is to implement shader lighting. I've already created an appropriate shader program that lights most of my objects correctly, but this game's terrain is drawn with no normal data provided.
The game calls:
void glVertexPointer(GLint size, GLenum type, GLsizei stride, const GLvoid * pointer);
and
void glDrawElements(GLenum mode, GLsizei count, GLenum type, const GLvoid * indices);`
to define and draw the terrain, thus I have these functions both hooked, and I hope to loop through the given vertex array at the pointer, and calculate normals for each surface, on either every DrawElements call or VertexPointer call, but I'm having trouble coming up with an approach to do so - specifically, how to read, iterate over, and understand the data at the pointer. In this case, the usual parameters for the glVertexPointer calls are size = 3, type = GL_float, stride = 16, pointer = some pointer. Hooking glVertexPointer, I don't know how I could iterate through the pointer and grab all the vertices for the mesh, considering I don't know the total count of all the vertices, nor do I understand how the data is structured at the pointer given the stride - and similarly how i should structure the normal array
Would it be a better idea to try to calculate the normals in drawelements for each specified index in the indice array?

Depending on your vertex array building procedure, indices would be the only relevant information for building your normals.
Difining normal average for one vertex is simple if you add a normal field in your vertex array, and sum all the normal calculations parsing your indices array.
You have than to divide each normal sum by the number of repetition in indices, count that you can save in a temporary array following vertex indices (incremented each time a normal is added to the vertex)
so to be more clear:
Vertex[vertexCount]: {Pos,Normal}
normalCount[vertexCount]: int count
Indices[indecesCount]: int vertexIndex
You may have 6 normals per vertex so add a temporary array of normal array to averrage those for each vertex:
NormalTemp[vertexCount][6] {x,y,z}
than parsing your indice array (if it's triangle):
for i=0 to indicesCount step 3
for each triangle top (t from 0 to 2)
NormalTemp[indices[i + t]][normalCount[indices[i+t]]+1] = normal calculation with cross product of vectors ending with other summits or this triangle
normalCount[indices[i+t]]++
than you have to divide your sums by the count
for i=0 to vertexCount step 1
for j=0 to NormalCount[i] step 1
sum += NormalTemp[i][j]
normal[i] = sum / normacount[i]

While I like and have voted up the j-p's answer I would still like to point out that you could get away with calculating one normal per face and just using for all 3 vertices. It would be faster, and easier, and sometimes even more accurate.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js