Is there any impact of leaving an attribute enabled but unused? - c++

Should I disable shader attributes when switching to a shader program that uses fewer attributes (or attributes at different locations)?
I enable and disable these attributes with glEnableVertexAttribArray()/glDisableVertexAttribArray().
Is there any performance impact? Could it cause bugs? Or would enabling/disabling be slower than enabling all attributes once and leaving them enabled?

The OP most likely understands the first part already, but let me just reiterate some points on vertex attributes to set the basis for the more interesting part. I'll assume that all vertex data comes from buffers, and not talk about the case where calls like glVertexAttrib3f() are used to set "constant" values for attributes.
The glEnableVertexAttribArray() and glVertexAttribPointer() calls specify which vertex attributes are enabled, and describe how the GPU should retrieve their values. This includes their location in memory, how many components they have, their type, stride, etc. I'll call the collected state specified by these calls "vertex attribute state" in the rest of this answer.
The vertex attribute state is not part of the shader program state. It lives in Vertex Array Objects (VAOs), together with some other related state. Therefore, binding a different program changes nothing about the vertex attribute state. Only binding a different VAO does, or of course making one of the calls above.
Vertex attributes are tied to attribute/in variables in the vertex shader by setting the location of the in variables. This specifies which vertex attribute the value of each in variable should come from. The location value is part of the program state.
Based on this, when binding a different program, it is necessary that the locations of the in variables are properly set to refer to the desired attribute. As long as the same attribute is always used for the shader, this has to be done only once while building the shader. Beyond that, all the attributes used by the shader have to be enabled with glEnableVertexAttribArray(), or by binding a VAO that contains the state.
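For illustration, a minimal sketch of both steps, assuming an attribute named "position" at location 0 (the name and index are assumptions):

// Once, while building the program (or use layout(location = 0) in the GLSL):
glBindAttribLocation(program, 0, "position");
glLinkProgram(program);
// While setting up vertex attribute state (often recorded into a VAO):
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);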
Now, finally coming to the core of the question: What happens if attributes that are not used by the program are enabled?
I believe that having unused attributes enabled is completely legal. At least I've never seen anything in the spec that says otherwise. I just checked again, and still found nothing. Therefore, there will be no bugs resulting from having unused attributes enabled.
Does it hurt performance? The short answer is that it possibly could. Let's look at two hypothetical hardware architectures:
Architecture A has reading of vertex attribute values baked into the vertex shader code.
Architecture B has a fixed function unit that reads vertex attribute values. This fixed function unit is controlled by the vertex attribute state, and writes the values into on-chip memory, where vertex shader instances pick them up.
With architecture A, having unused attributes enabled would have no effect at all. They would simply never be read.
With architecture B, the fixed function unit might read the unused attributes. The vertex shader would end up not using them, but they could still be read from main/video memory into on-chip memory. The driver could avoid that by checking which attributes are used by the current shader, and set up the fixed function unit with only those attributes. The downside is that the state setup for the fixed function unit has to be checked/updated every time a new shader is bound, which is otherwise unnecessary. But it prevents reading unused attributes from memory.
Going one step further, let's say we do end up reading unused attributes from memory. Whether, and how much, this hurts is impossible to answer in general. Intuitively, I would expect it to matter very little if the attributes are interleaved, and the unused attributes are in the same cache lines as used attributes. On the other hand, if reading unused attributes causes extra cache misses, it would at least use memory bandwidth, and consume power.
In summary, I don't believe there's a clear and simple answer. Chances are that having unused attributes enabled will not hurt at all, or very little. But I would personally disable them anyway. There is a potential that it might make a difference, and it's very easy to do. Particularly if you use VAOs, you can generally set up the whole vertex attribute state with a single glBindVertexArray() call, so enabling/disabling exactly the needed attributes does not require additional API calls.
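To make that last point concrete, here is a minimal sketch; the buffer and count names are assumptions:

// Recorded once at setup time:
glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
glBindVertexArray(0);
// Per draw: exactly the needed attributes restored with one call.
glBindVertexArray(vao);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);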

Related

Why do I need to enable a vertex attribute when using a VAO (DSA version)?

After calling glVertexAttribPointer(GLuint index, ...), the vertex attribute is still disabled by default, as the docs say:
By default, all client-side capabilities are disabled, including all generic vertex attribute arrays.
Why must we enable it using an extra function? Can someone name a case, where this is useful?
When researching I learned the following:
By using the layout(location = x) qualifier in GLSL, or glBindAttribLocation, we can set the location explicitly rather than letting OpenGL generate it. But this is not the point of my question.
glEnableVertexAttribArray cannot be used to draw one VAO with multiple shaders. Since attribute locations are queried using a program object, one would assume that locations are shader-specific; one would then expect to be able to enable the right attribute locations before running each shader. But when testing this, I noticed that the same location value can occur in more than one shader; furthermore, the output looked wrong. If you wish to see the code, just ask.
Attribute locations are stored in the VAO.
The setting makes complete sense. There are very valid use cases for both having it enabled and disabled.
The name of the entry point already gives a strong hint why that is. Note the Array part in glEnableVertexAttribArray(). This call does not "enable the attribute". It enables using vertex attribute values from an array, meaning:
If it's enabled, a separate value from an array is used for each vertex.
If it's disabled, the current value of the attribute is used for all vertices.
The current value of an attribute is set with calls of the glVertexAttrib[1234]f() family. A typical code sequence for the use case where you want to use the same attribute value for all vertices in the next draw call is:
glDisableVertexAttribArray(loc);                // stop sourcing this attribute from an array
glVertexAttrib4f(loc, colR, colG, colB, colA);  // constant value used for every vertex
Compared to the case where each vertex gets its own attribute value from an array:
glEnableVertexAttribArray(loc);   // source a per-vertex value from an array
glVertexAttribPointer(loc, ...);  // describe where and how to fetch it
Now, it is certainly much more common to source attributes from an array. So you could argue that the default is unfortunate for modern OpenGL use. But the setting, and the calls to change it, are definitely still very useful.
Remember GL has evolved from an underlying API which is over 20 years old, and a huge amount of stuff is kept for backwards compatibility, including a programming style which involves binding and state enables.
The hardware today is totally different from the original hardware the API was designed for, so in many cases there isn't a sensible "why" - that's just how the API works. Hence the move to the new Vulkan API, which drops all of the legacy support and has a very different programming model ...
Why must we enable it using an extra function?
... because that is how the API works.
Can someone name a case, where this is useful?
... if you don't enable it, it doesn't work, so I suspect that counts as useful.
Attribute locations are stored in the VAO.
VAOs didn't exist in the original API; they came along later, and really they just cache a set of existing attribute-array settings for VBOs, so you still need this API to set up what the VAO references.
If you ask "why" a lot with OpenGL you'll go insane - it's not a very "clean" API from a programmer's model point of view, and has evolved over multiple iterations while maintaining backwards compatibility in many cases. There are multiple ways of doing things, and many things which don't make sense if you try and use both at the same time. In most cases it's impossible to answer "why" accurately without finding out what someone was thinking 20 years ago when the original API was designed.
However you could imagine a theoretical use case where separate enables are useful. For example, imagine a case where you are rendering a model with 5 attribute arrays, and then a different model with 4 attribute arrays. For that second model, what does the hardware do with the 5th attribute? Naively it might copy it into the GPU, so software needs to tell hardware not to do that. You could have an API where you write a special attribute (e.g. a NULL pointer with zero length), or you could have an API with an enable setting which simply tells the hardware not to read something.
Given an enable is probably just a bitmask in a register, then the enables are actually more efficient for the driver than having to decode a special case vertex attribute.

Occlusion Queries and Instanced Rendering

I'm facing a problem where the use of an occlusion query in combination with instanced rendering would be desirable.
As far as I understood, something like
glBeginQuery(GL_ANY_SAMPLES_PASSED, occlusionQuery);
glDrawArraysInstanced(mode, i, j, countInstances);
glEndQuery(GL_ANY_SAMPLES_PASSED);
will only tell me whether any of the instances were drawn.
What I would need to know is which set of instances has been drawn (giving me the IDs of all visible instances). Drawing each instance in its own call is not an option for me.
An alternative would be to color-code the instances and detect the visible instances manually.
But is there really no way to solve this problem with a query command and why would it not be possible?
It's not possible for several reasons.
Query objects only contain a single counter value. What you want would require a separate sample passed count for each instance.
Even if query objects stored arrays of sample counts, you can issue more than one draw call in the begin/end scope of a query. So how would OpenGL know which part of which draw call belonged to which query value in the array? You can even change other state within the query scope; uniform bindings, programs, pretty much anything.
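For illustration, everything below is legal inside a single query scope (the program, count, and index names are placeholders), so there is no per-draw or per-instance structure the single counter could be mapped back to:

glBeginQuery(GL_ANY_SAMPLES_PASSED, occlusionQuery);
glUseProgram(progA);
glDrawArraysInstanced(GL_TRIANGLES, 0, vertexCount, instanceCount);
glUseProgram(progB);  // state changes mid-query are allowed
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, nullptr);
glEndQuery(GL_ANY_SAMPLES_PASSED);  // one counter accumulated across all of the above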
The samples-passed count is determined entirely by the rasterizer hardware on the GPU. And the rasterizer neither knows nor cares which instance generated a triangle.
Instancing is a function of the vertex processing and/or vertex specification stages; by the time the rasterizer sees it, that information is gone. Notice that fragment shaders don't even get an instance ID as input, unless you explicitly create one by passing it from your vertex processing stage(s).
However, if you truly want to do this you could use image load/store and its atomic operations. That is, pass the fragment shader the instance in question (as an int data type, with flat interpolation). This FS also uses a uimageBuffer buffer texture, which uses the GL_R32UI format (or you can use an SSBO unbounded array). It then performs an imageAtomicAdd, using the instance value passed in as the index to the buffer. Oh, and you'll need to have the FS explicitly require early tests, so that samples which fail the fragment tests will not execute.
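A hedged GLSL sketch of such a fragment shader; the variable names, and the instanceID passed down from the vertex shader, are assumptions:

#version 420
layout(early_fragment_tests) in;  // samples failing depth/stencil never reach this shader
flat in int instanceID;           // written by the vertex shader from gl_InstanceID
layout(r32ui) coherent uniform uimageBuffer visible;  // one uint per instance
layout(location = 0) out vec4 color;

void main()
{
    imageAtomicAdd(visible, instanceID, 1u);  // mark this instance as seen
    color = vec4(0.0);                        // actual shading omitted
}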
Then use a compute shader to build up a list of rendering commands for the instances which have non-zero values in the array. Then use an indirect rendering call to draw the results of this computation. Now obviously, you will need to properly synchronize access between these various operations. So you'll need to use appropriate glMemoryBarrier calls between each one.
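A hedged sketch of that ordering, assuming the compute shader writes its draw commands into cmdBuf (all names here are assumptions):

glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);  // make FS image writes visible to the CS
glDispatchCompute(groupCount, 1, 1);                  // build the per-instance command list
glMemoryBarrier(GL_COMMAND_BARRIER_BIT);              // make CS-written commands visible to the draw
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, cmdBuf);
glMultiDrawArraysIndirect(GL_TRIANGLES, nullptr, maxInstances, 0);  // unused slots left with instanceCount 0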
Even if queries worked the way you want them to, this approach would overall be far preferable to using a query object. Unless you're reading the query into a buffer object, reading a query object requires some form of GPU/CPU synchronization. The above requires some synchronization and barrier operations, but they are all on-GPU operations rather than synchronization with the CPU.
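As a side note, the buffer-object path for query results mentioned above could look roughly like this (GL 4.4 / ARB_query_buffer_object; resultBuf is an assumption):

glBindBuffer(GL_QUERY_BUFFER, resultBuf);
// With a query buffer bound, the "pointer" argument is an offset into that
// buffer, and the result is written GPU-side with no CPU round trip.
glGetQueryObjectuiv(occlusionQuery, GL_QUERY_RESULT, (GLuint*)0);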

How can I get automatic unique atomic counter binding points (no hard coded binding=)?

Many articles describe using atomic counters by specifying a fixed binding point:
//Shader:
layout(binding = 0, offset = 0) uniform atomic_uint myAtomicCounter;
//App code
glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, myBufferHandle);
Here, the hard coded binding point binding = 0 is specified in both the shader and application code. I guess these articles do it this way because,
Atomic counters are not assigned a location and may not be modified using the Uniform* commands. The bindings, offsets, and strides belonging to atomic counters of a program object are invalidated and new ones assigned after each successful re-link. [shader_atomic_counters]
The above is fine until you want some more modular shader code. For example I have two shader include files, each needing an atomic counter that I've written as pluggable code that doesn't know about the other. Clearly I don't want to specify hard coded binding points and I'd like the application to handle it automatically. I don't really care which binding points they use, just that they aren't the same.
Vertex shader attributes appear similar. I can force a binding location at runtime (glBindAttribLocation) before shader linking or alternatively let OpenGL choose them for me and then query the result (glGetAttribLocation). I can also search through all attributes (glGetActiveAttrib).
How can I implement automatic unique binding points for atomic counters, so they're not hard coded and I can mix shader code?
I can see a few ways this might be possible, still with the limitation of not changing them after linking:
1. Don't specify the binding point in the shader and let OpenGL pick one when linking. I don't know if OpenGL will do this. If it does, how do you query to find the binding point used?
2. Query the shader objects before linking to find atomic counters. Then give them unique binding locations just like glBindAttribLocation for attributes. Is there a mechanism to assign binding points?
3. Parse all the shader code, looking for atomic counters, and replace the binding points with unique indices in the shader code itself, perhaps using #define macros. A last resort. I really don't want to have to do this.
On point 1: The binding layout parameter for atomic counters is not optional.
On point 2: Because binding is not optional, there's no point in being able to set it from OpenGL code.
So your only recourse is #3.
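For what it's worth, a minimal sketch of option 3 that sidesteps full parsing: each include file names its binding via a macro, and the application injects the definitions. The macro and variable names here are assumptions:

// GLSL include files would use e.g.:
//   layout(binding = MODULE_A_BINDING, offset = 0) uniform atomic_uint counterA;
#include <string>
std::string prelude =
    "#version 420\n"
    "#define MODULE_A_BINDING 0\n"
    "#define MODULE_B_BINDING 1\n";
std::string source = prelude + body;  // 'body' holds the shader text, minus its own #version line
const char* src = source.c_str();
glShaderSource(shader, 1, &src, nullptr);
glCompileShader(shader);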
You can find the counter's uniform index (e.g. with glGetUniformIndices, or by enumerating with glGetActiveUniform). Then get the buffer index and finally the binding point:
GLuint counterIndex;  // uniform index of "myAtomicCounter" (name from the question)
const GLchar* name = "myAtomicCounter";
glGetUniformIndices(programId, 1, &name, &counterIndex);
GLint bufferIndex = -1;   // which atomic counter buffer the counter lives in
glGetActiveUniformsiv(programId, 1, &counterIndex,
    GL_UNIFORM_ATOMIC_COUNTER_BUFFER_INDEX, &bufferIndex);
GLint bindingPoint = -1;  // the binding point the linker assigned
glGetActiveAtomicCounterBufferiv(programId, bufferIndex,
    GL_ATOMIC_COUNTER_BUFFER_BINDING, &bindingPoint);
This works since OpenGL 4.2, as stated in the docs.

GLSL coherent imageBuffer access in single-stage (fragment shader), single-pass scenario

I have a single fragment shader that performs processing on an imageBuffer using image load/store operations.
I am exclusively concerned about the following scenario:
I have a single fragment shader (no multi-stage considerations, e.g. vertex-then-fragment shader interactions, and no multi-pass rendering)
imageBuffer variables are declared as coherent; I am exclusively interested in coherent imageBuffers.
To make things perfectly clear, my scenario is the following:
// Source code of my sole and unique fragment shader:
// (the spec's "1x32" size is spelled r32ui in GLSL layout qualifiers)
layout(r32ui) coherent uniform uimageBuffer data;
void main()
{
    ...
    // various calls to imageLoad(data, ..., ...);
    ...
    // various calls to imageStore(data, ..., ...);
    ...
}
I have looked at the spec at length,
ARB_shader_image_load_store
especially this very paragraph:
"Using variables declared as "coherent" guarantees that the results of
stores will be immediately visible to shader invocations using
similarly-declared variables; calling MemoryBarrier is required to
ensure that the stores are visible to other operations."
Note: my "coherent uniform imageBuffer data;" declaration precisely is a "similarly-declared" variable. My scenario is single-pass, single-stage (fragment shader).
Now, I have looked at various web sites and stumbled (like most people I think) upon this thread on stackoverflow.com:
How exactly is GLSL's "coherent" memory qualifier interpreted by GPU drivers for multi-pass rendering?
and more specifically, this paragraph:
"Your shaders cannot even make the assumption that issuing a load
right after a store will get the memory that was just stored in this
very shader (yes really. You have to put a memoryBarrier in to pull
that one off)."
My question is the following:
With the coherent qualifier specified, in my single-shader, single-pass processing scenario, can I (yes or no) be sure that imageStore() results will be immediately visible to ALL invocations of my fragment shader (i.e. the current invocation as well as other concurrent invocations)?
By reading the ARB_shader_image_load_store spec, it seems to me that:
the answer to this question is yes,
I don't need any kind of memoryBarrier(),
the quoted sentence in the above referenced thread in stackoverflow may indeed be misleading and wrong.
Thanks for your insight.
Use that memory barrier.
For one thing, the GPU may optimize by fetching whole blocks of memory to read from, while keeping separate memory to write to.
In other words, if your shader always modifies a single location just once, then it's OK; but if it relies on neighbors' values after some computation has been applied, then you need a memory barrier.
With the coherent qualifier specified, in my single-shader, single-pass processing scenario, can I (yes or no) be sure that imageStore() results will be immediately visible to ALL invocations of my fragment shader (i.e. the current invocation as well as other concurrent invocations)?
If each fragment shader writes to separate locations in the image, and each fragment shader only reads the locations that it wrote, then you don't even need coherent. However, if a fragment shader instance wants to read data written by other fragment shader instances, you're SOL. There's nothing you can do for that one.
If it were a compute shader, you could issue the barrier call to synchronize operations within a work group. That would ensure that the writes you want to read happen (you still need the memoryBarrier call to make them visible). But that would only ensure that writes from instances within this work group have happened. Writes from other instances are still undefined.
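A hedged GLSL sketch of that compute-shader case, where invocations within one work group share data through a coherent image (the group size and access pattern are assumptions):

#version 430
layout(local_size_x = 64) in;
layout(r32ui) coherent uniform uimageBuffer data;

void main()
{
    int i = int(gl_GlobalInvocationID.x);
    imageStore(data, i, uvec4(uint(i)));  // each invocation writes its own slot
    memoryBarrier();  // make the stores visible to other invocations
    barrier();        // wait until everyone in the work group has stored
    uint left = imageLoad(data, max(i - 1, 0)).x;  // neighbor's value is now defined (within this group)
}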
and more specifically, this paragraph:
BTW, that paragraph was wrong. Very very wrong. Too bad the person who wrote that paragraph will never ever be identified ;)

Should the same VAO be used with multiple programs?

Say we bind vertex attribute locations to the same values on two programs. Is it correct to use the same vertex array object to draw with these two programs?
Define "correct."
If two program objects use compatible attribute locations, then they use the same attribute locations. VAOs work off of attribute locations, so a VAO that works with one will work with another. So this will work.
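A minimal sketch of the reuse, assuming both programs agree that location 0 is the position and location 1 is the normal (the names are assumptions):

glBindAttribLocation(progA, 0, "position");
glBindAttribLocation(progA, 1, "normal");
glLinkProgram(progA);
glBindAttribLocation(progB, 0, "position");
glBindAttribLocation(progB, 1, "normal");
glLinkProgram(progB);

glBindVertexArray(vao);  // same vertex array state for both draws
glUseProgram(progA);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);
glUseProgram(progB);
glDrawArrays(GL_TRIANGLES, 0, vertexCount);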
In general, it is a matter of performance whether you actually take advantage of this. It's generally a good idea to avoid changing vertex array state, but it's not clear how important this is relative to other state changes. You're changing programs anyway, so not changing VAOs when you change programs will at worst be no slower and can lead to significant performance increases.
However, it is not clear how much work you should do to minimize vertex array state changes. If you can pack models into the same buffer objects with the same format, you can render all of them without changing the VAO, using functions like glDrawArrays or glDrawElementsBaseVertex.
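A hedged sketch of that packing, with two models sharing one VAO and one index/vertex buffer pair; the modelA/modelB fields are assumptions:

glBindVertexArray(sharedVao);
glDrawElementsBaseVertex(GL_TRIANGLES, modelA.indexCount, GL_UNSIGNED_INT,
    (void*)(modelA.firstIndex * sizeof(GLuint)), modelA.baseVertex);
glDrawElementsBaseVertex(GL_TRIANGLES, modelB.indexCount, GL_UNSIGNED_INT,
    (void*)(modelB.firstIndex * sizeof(GLuint)), modelB.baseVertex);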
I've tried using the same VAO with different shaders and saw visual artifacts (the attribute locations did match).
The solution for me was to use a new VAO for every single program.