I have read on many sites that glMultiDrawElements is "equivalent" to doing this:
for (int i = 0; i < drawCount; ++i) {
    glDrawElements(mode, counts[i], type, indices[i]);
}
But it is that word "equivalent" that I do not understand. How equivalent are they? If both are exactly the same, what is the point of the glMultiDrawElements function if I can do the same thing with glDrawElements and a for loop?
Does the glMultiDrawElements function give me some performance advantage?
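For reference, here is a minimal sketch of what the single equivalent call looks like. The counts and byte offsets are made up for illustration; it assumes an index buffer is already bound to GL_ELEMENT_ARRAY_BUFFER:

```c
/* Hypothetical setup: three sub-draws sharing one bound element buffer.
   counts[i] and indices[i] describe each sub-draw, exactly as in the
   glDrawElements loop. */
GLsizei counts[3] = { 36, 60, 24 };
const GLvoid *indices[3] = {
    (GLvoid *)(0  * sizeof(GLuint)),  /* byte offsets into the bound */
    (GLvoid *)(36 * sizeof(GLuint)),  /* GL_ELEMENT_ARRAY_BUFFER     */
    (GLvoid *)(96 * sizeof(GLuint)),
};

/* One call replaces the whole loop of glDrawElements calls. */
glMultiDrawElements(GL_TRIANGLES, counts, GL_UNSIGNED_INT, indices, 3);
```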
glMultiDrawElements behaves as if you did those draw calls. Performance is not, and never has been, part of "behavior". The OpenGL specification defines behavior; it does not and cannot specify performance.
It should also be noted that, as of OpenGL 4.6/ARB_shader_draw_parameters, these are no longer exactly equivalent. Multi-draw functions can set the gl_DrawID input to the vertex shader, thus allowing the shader pipeline to know which sub-draw of a multi-draw operation is currently being executed.
You can't get that with glDrawElements.
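A sketch of what that looks like on the shader side; the uniform array and its size are assumptions for illustration:

```glsl
// Sketch: requires GL 4.6 (or ARB_shader_draw_parameters, see note below).
#version 460 core

layout(location = 0) in vec3 position;

// Hypothetical per-draw data, indexed by the sub-draw number.
uniform mat4 modelMatrices[64];

void main()
{
    // gl_DrawID is 0 for the first sub-draw of a multi-draw command,
    // 1 for the second, and so on. For glDrawElements it is always 0.
    gl_Position = modelMatrices[gl_DrawID] * vec4(position, 1.0);
}
```

With only the extension (pre-4.6 contexts), enable GL_ARB_shader_draw_parameters in the shader and use the gl_DrawIDARB spelling instead.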
When I look at the documentation of glMultiDrawElementsIndirect (or in the Wiki) it says that a single call to glMultiDrawElementsIndirect is equivalent to repeatedly calling glDrawElementsIndirect (just with different parameters).
Does that mean that gl_InstanceID will reset for each of these "internal" calls? And if so, how am I able to tell all these calls apart in my vertex shader?
Background: I'm trying to draw all my different meshes at once. But I need some way to know which mesh the vertex I'm processing in my vertex shader belongs to.
The documentation says "similarly to". "Equivalent" isn't the same thing. It also points to glDrawElementsInstancedBaseVertexBaseInstance, not glDrawElementsInstanced.
But yes, gl_InstanceID for any draw will start at zero, no matter what base instance you provide. That's how gl_InstanceID works, unfortunately.
Besides, that's not the question you want answered. You're not looking to ask which instance you're rendering, since each draw in the multi-draw can be rendering multiple instances. You're asking which draw in the multi-draw you are in. An instance ID isn't going to help.
And if so, how am I able to tell all these calls apart in my vertex shader?
Unless you have OpenGL 4.6 or ARB_shader_draw_parameters, you can't. Well, not directly.
That is, multidraw operations are expected to produce different results based on rendering from different parts of the current buffer objects, not based on computations in the shader. You're rendering with a different base vertex that selects different vertices from the arrays, or you're using different ranges of indices or whatever.
The typical pre-shader_draw_parameters solution would have been to use a unique base instance on each of the individual draws. Of course, since gl_InstanceID doesn't track the base instance (as previously stated), you would need to employ instanced arrays instead. So you'd get the mesh index from that.
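A sketch of that workaround; the attribute slot and the per-draw values are assumptions, and it presumes each sub-draw in the indirect buffer sets its own baseInstance:

```c
/* Hypothetical workaround sketch: a one-int-per-draw "mesh index"
   attribute, advanced via the base instance of each sub-draw. */
GLuint meshIndexBuf;
glGenBuffers(1, &meshIndexBuf);
glBindBuffer(GL_ARRAY_BUFFER, meshIndexBuf);
GLint meshIndices[] = { 0, 1, 2, 3 };   /* one entry per sub-draw */
glBufferData(GL_ARRAY_BUFFER, sizeof(meshIndices), meshIndices,
             GL_STATIC_DRAW);

glEnableVertexAttribArray(3);           /* attribute slot 3: assumed free */
glVertexAttribIPointer(3, 1, GL_INT, 0, 0);
glVertexAttribDivisor(3, 1);            /* advance once per instance */

/* Each sub-draw then uses baseInstance = its own index, so the
   attribute fetches meshIndices[baseInstance] -- the base instance
   affects instanced attribute fetching even though it never shows
   up in gl_InstanceID. */
```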
Of course, 4.6/shader_draw_parameters gives you gl_DrawID, which simply tells you the index of the current sub-draw within the multi-draw command. It's also dynamically uniform, so you can use it to access arrays of opaque types in shaders.
I was thinking about:
Having a main shader which will be applied to every object of my application; it will be used for projection, transformation, positioning, coloring, etc.
And each object could have its own extra shader for extra stuff; for example, a water object definitely needs an extra shader.
But there is a problem: how would I apply two or more shaders to one object? Because I'll need to apply the main shader plus the object's own shader.
It would be really nice if OpenGL (or Direct3D!) allowed you to have multiple shaders at each vertex / fragment / whatever stage, but alas we are stuck with existing systems.
Assume you've written a bunch of GLSL functions. Some are general-purpose for all objects, like applying the modelview transformation and copying texture coords to the next stage. Some are specific to particular classes of object, such as water or rock.
What you then write is the ubershader, a program in which the main() functions at the vertex / fragment / whatever stages do nothing much other than call all these functions. This is a template or prototype from which you generate more specialised programs.
The most common way is to use the preprocessor and lots of #ifdefs around function calls inside main(). Maybe if you compile without any #defines you get the standard transform and Gouraud shading. Add in #define WATER to get the water effect, #define DISTORT for some kind of free form deformation algorithm, both if you want free-form deformed water, #define FOG to add in a fog effect, ...
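A sketch of what such an ubershader fragment stage might look like; WATER and FOG are hypothetical defines, and the two apply* helpers are assumed to be declared elsewhere in the shared source:

```glsl
// Ubershader sketch: behavior is selected by compile-time defines.
#version 330 core

in vec2 texCoord;
out vec4 fragColor;

uniform sampler2D baseTexture;

void main()
{
    vec4 color = texture(baseTexture, texCoord);

#ifdef WATER
    color = applyWaterEffect(color);   // assumed helper function
#endif

#ifdef FOG
    color = applyFog(color);           // assumed helper function
#endif

    fragColor = color;
}
```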
You don't even need to have more than one copy of the ubershader source, since you can generate the #define strings at runtime and pass them as extra source strings to glShaderSource before compiling.
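This works because glShaderSource accepts an array of strings that are concatenated before compilation. A sketch, where ubershaderSource is the single shared copy of the source (assumed loaded elsewhere, with no #version line of its own):

```c
/* Sketch: prepend runtime-generated #defines via glShaderSource's
   multi-string form, then compile once. */
const char *sources[3] = {
    "#version 330 core\n",            /* #version must come first */
    "#define WATER\n#define FOG\n",   /* built at runtime         */
    ubershaderSource,                 /* the shared shader body   */
};
GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(shader, 3, sources, NULL);
glCompileShader(shader);
```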
What you end up with is a lot of shader programs, one for each type of rendering. If for any reason you'd rather have just one program throughout, you can do something similar on newer systems with GLSL subroutines.
These are basically function pointers in GLSL which you can set much like uniforms. Now your ubershader has 1, 2, ... function pointer calls in the main() functions. Your program just sets up #1 to be standard transform, #2 to be rock/water/whatever, #3 to be fog, ... If you don't want to use a stage, just have a NOP function that you can assign.
While this has the advantage of only using one program, it is not as flexible as the #define approach because any given pointer has to use the same function prototype. It's also more work if say WATER needs processing in multiple shaders, because you have to remember to set the function pointers in every one rather than just a single #define.
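A minimal sketch of the subroutine approach (GL 4.0+); all the names here are illustrative:

```glsl
// Subroutine sketch: one function pointer, several candidate bodies.
#version 400 core

subroutine vec4 SurfaceFunc(vec4 color);

subroutine(SurfaceFunc) vec4 surfaceRock(vec4 color)  { return color * vec4(0.6, 0.5, 0.4, 1.0); }
subroutine(SurfaceFunc) vec4 surfaceWater(vec4 color) { return color * vec4(0.2, 0.4, 0.8, 1.0); }
subroutine(SurfaceFunc) vec4 surfaceNop(vec4 color)   { return color; }  // the NOP fallback

subroutine uniform SurfaceFunc surface;  // selected from the API

in vec4 baseColor;
out vec4 fragColor;

void main()
{
    fragColor = surface(baseColor);
}
```

On the API side you look up an implementation with glGetSubroutineIndex(program, GL_FRAGMENT_SHADER, "surfaceWater") and select it with glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, 1, &index), much like setting a uniform.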
Hope this helps.
I'm drawing several alpha-blended triangles that overlap with a single glDrawElements call.
The indices list the triangles back to front and this order is important for the correct visualization.
Can I rely on the result of this operation being exactly the same as when drawing the triangles in the same order with distinct draw calls?
I'm asking this because I'm not sure whether some hardware would make some kind of an optimization and use the indices only for the information about the primitives that are drawn and disregard the actual primitive order.
To second GuyRT's answer, I looked through the GL4.4 core spec:
glDrawElements is described as follows (emphasis mine):
This command constructs a sequence of geometric primitives by
successively transferring elements for count vertices to the GL.
In section 2.1, one can find the following statement (emphasis mine):
Commands are always processed in the order in which they are received,
[...] This means, for example, that one primitive must be drawn
completely before any subsequent one can affect the framebuffer.
One might read this as only valid for primitives rendered through different draw calls (commands); however, in section 7.12.1, there is further confirmation of the more general reading of that statement (again, my emphasis):
The relative order of invocations of the same shader type are
undefined. A store issued by a shader when working on primitive B
might complete prior to a store for primitive A, even if primitive A
is specified prior to primitive B. This applies even to fragment
shaders; while fragment shader outputs are written to the framebuffer
in primitive order, stores executed by fragment shader invocations are
not.
Yes, you can rely on the order being the same as specified in the index array, and that fragments will be correctly blended with the results of triangles specified earlier in the array.
I cannot find a reference for this, but my UI rendering code relies on this behaviour (and I think it is a common technique).
To my knowledge OpenGL makes no statement about the order of triangles rendered within a single draw call of any kind. It would be counterproductive of it to do so, because it would place undesirable constraints on implementations.
Consider that modern rendering hardware is almost always multi-processor, so the individual triangles from a draw call are almost certainly being rendered in parallel. If you need to render in a particular order for alpha blending purposes, you need to break up your geometry. Alternatively you could investigate the variety of order independent transparency algorithms out there.
I've been reading through the OpenGL specification trying to find an answer to this question, without luck. I'm trying to figure out whether OpenGL guarantees that draw calls such as glDrawElements or glDrawArrays will draw elements in precisely the order they appear in the VBO, or whether it is free to process the fragments of those primitives in any order.
For example, suppose I have a vertex buffer with 30 vertices representing 10 triangles, each with the same coordinates. Will it always be the case that the triangle corresponding to vertices 0, 1 and 2 will be rendered first (and therefore on the bottom), and the triangle corresponding to vertices 27, 28 and 29 always be rendered last (and therefore on top)?
The specification is very careful to define an order for the rendering of everything. Arrays of vertex data are processed in order, which results in the generation of primitives in a specific order. Each primitive is said to be rasterized in order, and later primitives cannot be rasterized until prior ones have finished.
Of course, this is all how OpenGL says it should behave. Implementations can (and do) cheat by rasterizing and processing multiple primitives at once. However, they will still obey the "as if" rule. So they cheat internally, but will still write the results as if it had all executed sequentially.
So yes, there's a specific order you can rely upon. Unless you're using shaders that perform incoherent memory accesses; then all bets are off for shader writes.
Although the triangles may actually be drawn in a different order and finish at different times, at the final raster operation pipeline stage, any blending (or depth/stencil/alpha test, for that matter) will be done in the order that the triangles were issued.
You can confirm this by rendering some object using a blending equation that doesn't commute, for example:
glBlendFunc(GL_ONE, GL_DST_COLOR);
If the final framebuffer contents were written in the same arbitrary order in which the primitives may be drawn, then in such an example you would see an effect similar to Z-fighting.
This is also why it's called a fragment shader (as opposed to a pixel shader): it's not a pixel yet, because after the fragment stage it isn't written to the framebuffer just yet; that happens only after the raster operation stage.
So, let's say that I have two vertex buffers: one that describes the actual shape I want to draw, and another one that is able to influence the first one.
So, what I actually want to be able to do is something like this:
uniform VBO second_one;  // pseudocode, not valid GLSL

void main()
{
    for (int i = 0; i < size_of_array(second_one); ++i)
        // do things with second_one[i] to alter the values
    // create the output information
}
One thing I might want to do is gravity: each point in second_one drags the point being processed a bit closer to it, and then, after the point is adjusted, the matrices are applied to get its actual location.
I would be really surprised if exactly this were possible, but maybe something close to it is. The whole point is to be able to use a second VBO, or to make it a uniform array of, say, vec3, so I can access it.
For what you're wanting, you have three options.
An array of uniforms. GLSL lets you declare uniform vec3 stuff[50];, and arrays in GLSL have a .length() method, so you can find out how big they are. Of course, there are limits on the number of uniforms you can use, but you shouldn't need more than 20-30 of these. Anything more than that and you'll really feel the performance drain.
Uniform buffer objects. These can store a bit more data than non-block uniforms, but they still have limits. And the storage comes from a buffer object. But accesses to them are, depending on hardware, slightly slower than accesses to direct uniforms.
Buffer textures. This is a way to attach a buffer object to a texture. With this, you can access vast amounts of memory from within a shader. But be warned: they're not fast to access. If you can make do with one of the above methods, do so.
Note that #2 and #3 will only be found on hardware capable of supporting GL 3.x and above. So DX10-class hardware.
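The GLSL side of all three options can be sketched in one vertex shader; the names, array sizes, and index 0 lookups are purely illustrative:

```glsl
// Sketch of the three storage options (GL 3.1+ for the buffer texture).
#version 330 core

layout(location = 0) in vec3 position;

// 1. Plain uniform array: small amounts of data, fastest access.
uniform vec3 influencePoints[32];

// 2. Uniform buffer object: more data, backed by a buffer object.
layout(std140) uniform InfluenceBlock {
    vec4 blockPoints[256];   // std140: vec4 avoids layout padding surprises
};

// 3. Buffer texture: very large data, slower per-access.
uniform samplerBuffer influenceTex;

void main()
{
    vec3 p0 = influencePoints[0];
    vec3 p1 = blockPoints[0].xyz;
    vec3 p2 = texelFetch(influenceTex, 0).xyz;

    // Use whichever source of influence points fits your data size;
    // the sum here just demonstrates that all three are readable.
    gl_Position = vec4(position + p0 + p1 + p2, 1.0);
}
```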