I have many cases in my application where I issue draw calls using the same shader with different uniform values, and I thought about instancing them. However, the draw calls have a varying number of triangles in my case.
As far as I understand DrawIndexedInstanced, it only allows drawing multiple instances with the same number of triangles/indices, so I guess I can't use it.
I thought that DrawIndexedInstancedIndirect might help, but that basically only seems to execute multiple calls to DrawIndexedInstanced.
Is there a way in DirectX 11 to draw instanced with a different number of triangles for each instance, or will I have to stay with normal draw calls?
As stated in the documentation, instanced drawing is about
[...] reusing the same geometry to draw multiple objects in a scene.
It improves performance by reusing the vertex data instead of swapping it, which does not seem to be the case for your data, where the vertex sources are different for each draw call.
So you'll have to stick to individual draw calls, but to improve performance you could order them so that draws sharing state are issued back to back. Each state change has a certain cost when it is submitted to the GPU. If you keep the shader bound while it is used for all of those draw calls, you save some work by issuing every draw call that uses the same shader and uniform values one after another and only switching when needed.
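A minimal sketch of that ordering idea, assuming a hypothetical DrawItem record (the struct and function names are illustrative and not part of any Direct3D API). It sorts queued draws by shader and counts how many shader binds a given submission order would need:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for one queued draw; field names are illustrative.
struct DrawItem {
    std::uint32_t shaderId;    // which shader the draw needs
    std::uint32_t indexCount;  // number of indices (varies per mesh)
    std::uint32_t firstIndex;  // offset of the mesh in a shared index buffer
};

// Counts how often the shader would have to be (re)bound if the items
// were submitted in the given order.
std::size_t countShaderSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].shaderId != items[i - 1].shaderId) ++switches;
    }
    return switches;
}

// Groups draws that share a shader so each shader is bound only once.
// stable_sort keeps the original relative order inside each group.
void sortByShader(std::vector<DrawItem>& items) {
    const auto byShader = [](const DrawItem& a, const DrawItem& b) {
        return a.shaderId < b.shaderId;
    };
    std::stable_sort(items.begin(), items.end(), byShader);
    // Debug-only sanity check: the list is now grouped by shader.
    assert(std::is_sorted(items.begin(), items.end(), byShader));
}
```

Each sorted item would still be submitted with its own draw call; only the number of shader switches between them goes down.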
Suppose I want to render many different models, each with a different transformation matrix I want to be applied to their vertices. As far as I understand, the naive approach is to specify a matrix uniform in the vertex shader, the value of which is updated for each mesh during rendering.
It's obvious to me that this is a bad idea, due to the expense of many uniform updates and draw calls. So, what is the most efficient way to achieve this in modern OpenGL?
I've genuinely tried to find a straight, clear answer to this question. Most answers I find vaguely mention UBOs, or instance drawing (which afaik won't work unless you are drawing instances of the same mesh many times, which is not my goal).
With OpenGL 4.6 or with ARB_shader_draw_parameters, each draw in a multi-draw rendering command (functions of the form glMultiDraw*) is assigned a draw index from 0 to the number of draw calls specified by that function. This index is provided to the Vertex Shader via the gl_DrawID input value. You can then use this index to fetch a matrix from any number of constructs: UBOs, SSBOs, buffer textures, etc.
This works for multi-draw indirect rendering as well. So in theory, you can have a compute shader operation generate a bunch of rendering commands, then render your entire scene with a single draw call (assuming that all of your objects live in the same vertex buffers and can use the same shader and other state). Or at the very least, a large portion of the scene.
Furthermore, this index is considered dynamically uniform, so you can also use it (or values derived from it and other dynamically uniform values) to index into arrays of textures, fetch a texture from an array of bindless textures, or the like.
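As a rough CPU-side sketch of the multi-draw indirect setup described above: the struct below follows the field order of the DrawElementsIndirectCommand layout read by glMultiDrawElementsIndirect, while MeshRange and buildCommands are hypothetical names introduced only for illustration. The i-th command is the one that sees gl_DrawID == i, so per-object matrices would have to be stored in the same order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Field order follows the indirect command layout that
// glMultiDrawElementsIndirect reads from the indirect buffer.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices for this draw
    std::uint32_t instanceCount;  // 1 = draw the mesh once
    std::uint32_t firstIndex;     // offset into the shared index buffer
    std::int32_t  baseVertex;     // added to every index of this draw
    std::uint32_t baseInstance;   // unused in this sketch
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "commands must be tightly packed");

// Hypothetical description of where one mesh lives in the shared buffers.
struct MeshRange {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// Builds one command per mesh; command i corresponds to gl_DrawID == i.
std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<MeshRange>& meshes, std::uint32_t indexBufferLength) {
    std::vector<DrawElementsIndirectCommand> commands;
    commands.reserve(meshes.size());
    for (const MeshRange& mesh : meshes) {
        // Sanity check: the range must lie inside the shared index buffer.
        assert(std::uint64_t{mesh.firstIndex} + mesh.indexCount <= indexBufferLength);
        commands.push_back({mesh.indexCount, 1u, mesh.firstIndex, mesh.baseVertex, 0u});
    }
    return commands;
}
```

The resulting array would be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER before the multi-draw call; that upload is omitted here.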
After some searching, I found the advice that separate VAOs which share exactly the same shader attribute layout can be merged into one VAO, with all of their data in one VBO, so that I can draw these objects with only one draw call.
This makes perfect sense, but what about uniform variables? Say that I want to draw a tree and a ball. They have different vertex counts and different transforms, but share exactly the same shader program.
Until now, my program looked like this:
// generate VAOs, called only once
glGenVertexArrays(1, &treeVaoId);
// generate VBO and bind tree vertices
// enable vertex attributes and set them up for the shader program
glGenVertexArrays(1, &ballVaoId);
// repeat for ball
// draw, called each frame
// give the tree's transform to shader as uniform
glDrawArrays(...) // first draw call
// repeat for ball
glDrawArrays(...) // second draw call
And with these vertices merged into one VAO and VBO, like:
glGenVertexArrays(1, &treeBallId);
// generate a VBO that is large enough and store the tree vertices followed by the ball vertices in it
// enable vertex attributes and set them up for the shader program
// to draw, I have to give the tree and the ball their own transform uniform separately, but how?
glDrawArrays(...)
Sorry for the poor example, but the point is: is there a way to give different uniform values while drawing from one VAO? Or is my approach totally wrong?
The purpose of batching is to improve performance by minimizing state changes between draw calls (batching reduces them to 0, since there is nothing between draw calls). However, there are degrees of performance improvement, and not all state changes are equal.
On the scale of the costs of state changes, changing program uniforms is the least expensive state change. That's not to say that it's meaningless, but you should consider how much effort you really want to spend compared to the results you get out of it. Especially if you're not pushing hardware as fast as possible.
VAO changes (non-buffer-only changes, that is) are among the more expensive state changes, so you gained a lot by eliminating them.
As the name suggests, uniform variables cannot be changed from one shader instance to another within a draw call. Even a multi-draw call. But that doesn't mean that there's nothing that can be done.
Multi-draw functionality allows you to issue multiple draw calls in a single function call. These individual draws can get their vertex data from different parts of the vertex and index buffers. What you need is a way to communicate to your vertex shader which draw call it is taking part in, so that it can index an array of some sort to extract that draw call's per-object data.
Semi-recent hardware has access to gl_DrawID, which is the index into a multi-draw command of the particular draw call being executed. The ARB_shader_draw_parameters extension is fairly widely implemented. You can use that index to fetch per-object data from a UBO or SSBO.
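For the tree-and-ball case, the CPU side of a multi-draw could look roughly like the sketch below. PackedBatch and packMeshes are hypothetical names; the first and count arrays are meant to match what glMultiDrawArrays expects (GLint and GLsizei are assumed to be 32-bit integers here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for several meshes packed into one vertex buffer.
struct PackedBatch {
    std::vector<float> vertices;      // all meshes stored back to back
    std::vector<std::int32_t> first;  // first vertex of each mesh
    std::vector<std::int32_t> count;  // vertex count of each mesh
};

// Appends every mesh to one vertex array and records where each one starts.
PackedBatch packMeshes(const std::vector<std::vector<float>>& meshes,
                       std::size_t floatsPerVertex) {
    assert(floatsPerVertex > 0);
    PackedBatch batch;
    for (const std::vector<float>& mesh : meshes) {
        // Each mesh must contain a whole number of vertices.
        assert(mesh.size() % floatsPerVertex == 0);
        batch.first.push_back(
            static_cast<std::int32_t>(batch.vertices.size() / floatsPerVertex));
        batch.count.push_back(
            static_cast<std::int32_t>(mesh.size() / floatsPerVertex));
        batch.vertices.insert(batch.vertices.end(), mesh.begin(), mesh.end());
    }
    return batch;
}

// With a GL context and the batch uploaded, the draw would be along the lines of:
//   glMultiDrawArrays(GL_TRIANGLES, batch.first.data(), batch.count.data(),
//                     static_cast<GLsizei>(batch.first.size()));
// Draw i then sees gl_DrawID == i and can fetch transform i from a UBO or SSBO.
```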
I built a 2D graphics engine and created a batching system for it, so if I have 1000 sprites with the same texture, I can draw them with a single call to OpenGL.
This is achieved by putting all the vertices of all the sprites that share a texture into a single VBO vertex array.
Instead of "print these vertices, print these vertices, print these vertices", I do "put all the vertices together, print", just to be very clear.
Easy enough, but now I'm trying to achieve the same thing in 3D, and I'm having a big problem.
The problem is that I'm using a Model View Projection matrix to place and render my models, which is the common approach to render a model in 3D space.
For each model on screen, I need to pass the MVP matrix to the shader, so that I can use it to transform each vertex to the correct position.
If I did the transformation outside the shader, it would be executed by the CPU, which is not a good idea, for obvious reasons.
But the problem lies there. I need to pass the matrix to the shader, but for each model the matrix is different.
So I cannot do what I did with the 2D sprites, because every change to a shader uniform requires its own draw call.
I hope I've been clear. Maybe you have a good idea I didn't have, or you have already had the same problem. I know for a fact that there is a solution somewhere, because in engines like Unity you can use the same shader for multiple models and get away with one draw call.
There exists a feature exactly like what you're looking for, and it's called instancing. With instancing, you store n matrices (or whatever else you need) in a Uniform Buffer and call glDrawElementsInstanced to draw n copies. In the shader, you get an extra input gl_InstanceID, with which you index into the Uniform Buffer to fetch the matrix you need for that particular instance.
You can read more about instancing here: https://www.opengl.org/wiki/Vertex_Rendering#Instancing
The answer depends on whether the vertex data for each item is identical or not. If it is, you can use instancing as in @orost's answer, using glDrawElementsInstanced and gl_InstanceID within the vertex shader, and that method should be preferred.
However, if each 3D model requires different vertex data (which is frequently the case), you can still render them using a single draw call. To do this, you would add another stream into your vertex data with glVertexAttribPointer (and glEnableVertexAttribArray). This extra stream would contain the index of the matrix within the uniform buffer that vertex should use when rendering - so each mesh within the VBO would have an identical index in the extra stream. The uniform buffer contains the same data as in the instancing setup.
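A small sketch of how that extra stream could be generated on the CPU, assuming the meshes are stored back to back in the VBO. buildObjectIndexStream is a hypothetical helper; if the index is uploaded as an integer attribute, the integer variant glVertexAttribIPointer would be the call to set it up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Produces one matrix index per vertex: every vertex of mesh k gets the value k,
// so the vertex shader can look up matrix k in the uniform buffer.
std::vector<std::uint32_t> buildObjectIndexStream(
        const std::vector<std::size_t>& vertexCounts) {
    std::size_t total = 0;
    for (std::size_t n : vertexCounts) total += n;

    std::vector<std::uint32_t> stream;
    stream.reserve(total);
    for (std::size_t object = 0; object < vertexCounts.size(); ++object) {
        // Append vertexCounts[object] copies of this object's index.
        stream.insert(stream.end(), vertexCounts[object],
                      static_cast<std::uint32_t>(object));
    }
    // Sanity check: exactly one index per vertex in the batch.
    assert(stream.size() == total);
    return stream;
}
```

Removing an object from the batch means rebuilding both the vertex data and this stream, which is the extra CPU work mentioned in the answer.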
Note this method may require some extra CPU processing, if you need to redo the batching - for example, an object within a batch should not be rendered anymore. If this process is required frequently, it should be determined whether batching items is actually beneficial or not.
Besides instancing and adding another vertex attribute as some object ID, I'd like to also mention another strategy (which requires modern OpenGL, though):
The extension ARB_multi_draw_indirect (in core since GL 4.3) adds indirect drawing commands. These commands do source their parameters (number of vertices, starting index and so on) directly from another buffer object. With these functions, many different objects can be drawn with a single draw call.
However, as you still want some per-object state like transformation matrices, that feature alone is not enough. But in combination with ARB_shader_draw_parameters (not in core GL yet), you get the gl_DrawID parameter, which is incremented by one for each single object in one multi-draw indirect call. That way, you can index into some UBO, or TBO, or SSBO (or whatever) where you store per-object data.
I am currently working on a new renderer using DX11. To batch multiple meshes I would like to use geometry instancing with Texture2DArrays to avoid texture atlases.
This would be the pseudo code for rendering:
foreach effect in effects
    foreach batch in batches
        SetTexture2DArray()
        SetInstanceBuffer() // Transform & Material (cbuffer)
        SetVertexBuffer()
        SetIndexBuffer()
        DrawIndexed()
Each mesh consists of 3 textures and geometry. Meshes with the same input layout would be combined into one batch. One batch can hold up to ~300 meshes, giving a texture array of 900 textures per batch. Is it possible to combine textures of different sizes into one texture array? If not, I could only combine meshes with the same input layout and texture sizes. Do you think this is a good system in general?
About texture arrays: each slice needs to be the same size.
About merging models: fewer draw calls are generally better, but having one single buffer hold different subsets, each with a varying number of primitives, can lead to some overdraw, and frustum culling will be harder to apply in that case if it's needed (depending on the amount of geometry you might just send it anyway; modern cards can eat up geometry rather easily). If all your geometry is visible at all times, then merging is definitely a good option.
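Since every slice of a texture array has to share one size, batches could be formed by grouping textures by their dimensions first. The following is only a sketch with hypothetical names (TextureInfo, groupBySize); it keys on width and height alone, whereas a real array also needs a matching format and mip count.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical description of a loaded texture.
struct TextureInfo {
    std::uint32_t id;
    std::uint32_t width;
    std::uint32_t height;
};

using SizeKey = std::pair<std::uint32_t, std::uint32_t>;  // (width, height)

// Groups texture ids by size; each group is a candidate for one texture array.
std::map<SizeKey, std::vector<std::uint32_t>> groupBySize(
        const std::vector<TextureInfo>& textures) {
    std::map<SizeKey, std::vector<std::uint32_t>> groups;
    for (const TextureInfo& texture : textures) {
        assert(texture.width > 0 && texture.height > 0);
        groups[SizeKey{texture.width, texture.height}].push_back(texture.id);
    }
    return groups;
}
```

The position of an id inside its group can then serve as the slice index that the shader uses to sample the array.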
I read in this Apple documentation (under the header "Avoid Storing Constants in Attribute Arrays") that if a model's vertices all have the same colour, then colour shouldn't be a vertex attribute. What do they mean by "OpenGL ES 2.0 applications can either set a constant vertex attribute…"?
My question is: is it better to use a uniform value for colour, and have a uniform call and a draw call for every object? Or to have the vertex attribute anyway, but draw everything in one fell swoop? (Or a constant vertex attribute, if that's better.)
Basically, is the advantage of drawing everything at once only the lack of overhead of multiple function calls?
Just to get a sense of it, say I were drawing 1000 circles every frame, each a different colour and having 40 vertices. Which would be better in that case?
The answer depends on how much stuff you are drawing in a single draw call. If you have an object of 30,000 vertices, where all of them have the same color, then you're wasting a lot of per-vertex reads (assuming that the color data makes your per-vertex data bigger; it may not). However, if you're talking about quad rendering, where each quad has a different color, then the uniform update overhead and the multiple draw calls are going to kill your performance.
Note that there are methods for instancing under OpenGL, which allow you to have per-instance data as well as per-vertex data. But this generally doesn't buy much until you have multiple thousands of instances, and more than 100 vertices in the model.
For your specific example, there's no way to know which would be faster. You'd have to benchmark it.
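To at least size the trade-off for the 1000-circle case, the arithmetic can be written down as below. This does not say which option is faster; it only counts draw calls and extra vertex memory. The 4 bytes per colour (one RGBA8 value) is an assumption, and ColorCost and estimateColorCost are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Rough bookkeeping for the two options discussed above.
struct ColorCost {
    std::size_t drawCallsWithUniform;  // one uniform update and draw per object
    std::size_t drawCallsBatched;      // everything drawn in a single call
    std::size_t extraColorBytes;       // memory added by a per-vertex colour
};

ColorCost estimateColorCost(std::size_t objects,
                            std::size_t verticesPerObject,
                            std::size_t bytesPerColor) {
    assert(objects > 0 && verticesPerObject > 0);
    return ColorCost{objects, 1, objects * verticesPerObject * bytesPerColor};
}
```

For 1000 circles of 40 vertices each, estimateColorCost(1000, 40, 4) gives 1000 draw calls for the uniform approach versus 1 batched draw that carries 160,000 extra bytes of colour data.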