OpenGL GLSL uniform branching vs. multiple shaders

I've been reading many articles on uniform if statements that deal with branching to change the behavior of large "uber shaders". I started on an uber shader (OpenGL, LWJGL), but then I realized that the simple act of adding an if statement controlled by a uniform to a fragment shader doing simple calculations decreased my fps by 5 compared to separate shaders without uniform if statements. I haven't capped my fps; it's just refreshing as fast as possible. I'm about to add normal mapping and parallax mapping, and I can see two routes:
Uber vertex shader:
#version 400 core

layout(location = 0) in vec3 position;
layout(location = 1) in vec2 textureCoords;
layout(location = 2) in vec3 normal;

uniform mat4 MVPmatrix;
uniform float RenderFlag;

void main(void) {
    if (RenderFlag == 0) {
        // calculate out variables for normal mapping for the fragment shader
    }
    if (RenderFlag == 1) {
        // calculate out variables for parallax mapping for the fragment shader
    }
    gl_Position = MVPmatrix * vec4(position, 1.0);
}
Uber fragment shader:
#version 400 core

in vec3 position;
in vec2 textureCoords;
in vec3 normal;
in vec3 ReflectDirR;

uniform samplerCube cube_texture;
uniform float RenderFlag;
uniform float reflectionFlag; // if set, either of the 2 render modes will have some
                              // reflection of the skybox added to it, like a
                              // reflective surface

out vec4 out_Color;

void main(void) {
    vec4 finalColor = vec4(0.0);
    if (RenderFlag == 0) {
        // display normal mapping
        if (reflectionFlag == 1) {
            vec4 reflectColor = texture(cube_texture, ReflectDirR);
            // add reflection color to final color and output
        }
    }
    if (RenderFlag == 1) {
        // display parallax mapping
        if (reflectionFlag == 1) {
            vec4 reflectColor = texture(cube_texture, ReflectDirR);
            // add reflection color to final color and output
        }
    }
    out_Color = finalColor;
}
The benefit of this (for me) is simplicity in the flow, but it makes the overall program more complex and I'm faced with ugly nested if statements. Also, if I wanted to completely avoid if statements I would need 4 separate shaders, one to handle each possible branch (normal w/o reflection, normal with reflection, parallax w/o reflection, parallax with reflection), just for one feature, reflection.
1: Does GLSL execute both branches (and any nested branches), calculate BOTH results, and then output the correct one?
2: Instead of a uniform flag for the reflection, should I remove the if statement in favor of calculating the reflection color regardless and adding it to the final color (if it is a relatively small operation), with something like
finalColor = finalColor + reflectionColor * X
where X is a uniform variable: X == 0 if there is no reflection, X == some amount if reflection is enabled.

Right off the bat, let me point out that GL4 has added subroutines, which are sort of a combination of both things you discussed. However, unless you are using a massive number of permutations of a single basic shader that gets swapped out multiple times during a frame (as you might if you had some dynamic material system in a forward rendering engine), subroutines really are not a performance win. I've put some time and effort into this in my own work and I get worthwhile improvements on one particular hardware/driver combination, and no appreciable change (good or bad) on most others.
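For reference, a fragment-shader subroutine setup looks roughly like this; it's a minimal sketch, and the type and function names here are made up rather than taken from the question:

#version 400 core

in vec3 normal;
in vec2 textureCoords;
out vec4 out_Color;

// Declare a subroutine type and two implementations of it.
subroutine vec3 SurfaceMode(vec3 n, vec2 uv);

subroutine(SurfaceMode) vec3 normalMapped(vec3 n, vec2 uv)   { return n; /* normal mapping here */ }
subroutine(SurfaceMode) vec3 parallaxMapped(vec3 n, vec2 uv) { return n; /* parallax mapping here */ }

// The application selects the active implementation with
// glUniformSubroutinesuiv before the draw call, instead of
// branching on a uniform flag inside the shader.
subroutine uniform SurfaceMode surfaceMode;

void main(void) {
    vec3 n = surfaceMode(normal, textureCoords);
    out_Color = vec4(n, 1.0);
}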
Why did I bring up subroutines? Mostly because you're discussing what amounts to micro optimization, and subroutines are a really good example of why it doesn't pay to invest a whole lot of time thinking about that until the very end of development. If you're struggling to meet some performance number and you've crossed every high-level optimization strategy off the list, then you can worry about this stuff.
That said, it's almost impossible to answer how GLSL executes your shader. It's just a high-level language; the underlying hardware architectures have changed several times over since GLSL was created. The latest generation of hardware has actual branch predication and some pretty complicated threading engines that GLSL 1.10 class hardware never had, some of which is actually exposed directly through compute shaders now.
You could run the numbers to see which strategy works best on your hardware, but I think you'll find it's the old micro-optimization dilemma: you may not even measure a large enough difference in performance to guess which approach to take. Keep in mind that "uber shaders" are attractive for multiple reasons (not all of them performance related), not the least of which is that you may have fewer and less complicated draw commands to batch. If there's no appreciable difference in performance, consider the design that's simpler and easier to implement / maintain instead.

Related

How to allow users' custom shaders in OpenGL software

In software like Unity or Unreal, for example, how do they allow users to add their own custom shaders to an object?
Is this custom shader just a normal fragment shader or is it another kind of shader? And if it is just a fragment shader, how do they deal with the lights?
I'm not going to post the code here because it's big and would pollute the page, but I'm learning from here: https://github.com/opentk/LearnOpenTK/blob/master/Chapter2/6-MultipleLights/Shaders/lighting.frag (it's a series of tutorials; this is the shader from the last one), and they say we should put the light types in functions inside the fragment shader to calculate the color of each fragment.
For example, this function to calculate a directional light, extracted from the code I sent above:
vec3 CalcDirLight(DirLight light, vec3 normal, vec3 viewDir)
{
    vec3 lightDir = normalize(-light.direction);
    // diffuse shading
    float diff = max(dot(normal, lightDir), 0.0);
    // specular shading
    vec3 reflectDir = reflect(-lightDir, normal);
    float spec = pow(max(dot(viewDir, reflectDir), 0.0), material.shininess);
    // combine results
    vec3 ambient  = light.ambient  * vec3(texture(material.diffuse, TexCoords));
    vec3 diffuse  = light.diffuse  * diff * vec3(texture(material.diffuse, TexCoords));
    vec3 specular = light.specular * spec * vec3(texture(material.specular, TexCoords));
    return (ambient + diffuse + specular);
}
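The fragment shader's main function in that tutorial then sums the contribution of each light, roughly like this (a sketch based on the same tutorial series; names such as NR_POINT_LIGHTS and CalcPointLight may differ slightly from the linked file):

void main()
{
    vec3 norm = normalize(Normal);
    vec3 viewDir = normalize(viewPos - FragPos);

    // directional light
    vec3 result = CalcDirLight(dirLight, norm, viewDir);
    // point lights
    for (int i = 0; i < NR_POINT_LIGHTS; i++)
        result += CalcPointLight(pointLights[i], norm, FragPos, viewDir);

    FragColor = vec4(result, 1.0);
}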
But I've never seen people adding lights in their shaders in Unity, for example; people just add textures and mess with colors, unless they really want to mess with the lights specifically.
Is there a way of making just one fragment shader that computes all the types of light, and the user could then apply another shader, just for the object's material, on top of that?
If you don't know how to answer but have some good reading material, or a place where I could learn more about OpenGL and GLSL, that would be of great value as well.
There are several different ways to structure shader files, each with different pros and cons.
As individual programs. You make each file its own shader program. It is simple to add new programs, and it would allow your users to just write a program in GLSL, HLSL, or an engine's custom shader language. You will have to provide some way for the user to express what kind of data the program expects, unless you query it from the engine, but it might get complicated to make something that's generic enough.
Über Shader! Put all desired functionality in one shader and let the behavior be controlled by control flow or preprocessor macros, such as #ifdef. The user would then only have to write the main function (which the application adds to the rest of the shader); see the sketch after this list. This allows the user to use all the predefined variables and functions. The obvious downside is that it can get big and hard to handle, and small changes might break many shaders.
Micro Shaders. Each program contains a small piece of common functionality, and the application concatenates them all into a functioning shader. The user just writes the main function and tells the program which functionality to add. The problem is that it's easy to get naming conflicts unless you're careful, and it is harder to implement than the über shader.
Effect files. Provided by Microsoft’s effect framework or NVIDIA’s CgFX framework (deprecated).
Abstract Shade Trees. I don't actually know what this is, but it's supposed to be a solution.
You can also combine some of these techniques or try to invent your own solution based on your needs. These solutions are discussed further in section 2.3.3, Existing Solutions.
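As an illustration of the über shader option, the application can prepend its predefined uniforms and helper functions to the main function the user writes; here is a minimal sketch in which every name is made up:

// ---- part prepended by the application ----
#version 330 core

in vec3 v_normal;
in vec2 v_uv;
out vec4 fragColor;

uniform sampler2D u_albedo;
uniform vec3 u_lightDir;

// predefined helper available to every user shader
vec3 applyLighting(vec3 baseColor, vec3 normal)
{
    float ndotl = max(dot(normalize(normal), -normalize(u_lightDir)), 0.0);
    return baseColor * ndotl;
}

// ---- part written by the user ----
void main()
{
    vec3 base = texture(u_albedo, v_uv).rgb;
    fragColor = vec4(applyLighting(base, v_normal), 1.0);
}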

GLSL only render fragments with Z-position between uniforms minZ and maxZ

I have a mesh with many thousands of vertices. The model represents a complex structure that needs to be visualized both in its entirety but also in part. The user of my application should be able to define a minimum and maximum value for the Z-Axis. Only fragments with a position between these limits should be rendered.
My naive solution would be to write a Fragment shader somewhat like this:
#extension GL_EXT_texture_array : enable

uniform sampler2DArray m_ColorMap;
uniform float m_minZ;
uniform float m_maxZ;

in vec4 fragPosition;
in vec3 texCoord;

void main() {
    if (fragPosition.z < m_minZ || fragPosition.z > m_maxZ) {
        discard;
    }
    gl_FragColor = texture2DArray(m_ColorMap, texCoord);
}
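For that to work, the vertex shader has to pass the position along; a minimal sketch of the counterpart vertex shader, with the matrix and attribute names assumed:

uniform mat4 m_WorldViewProjectionMatrix;

in vec3 inPosition;
in vec3 inTexCoord;

out vec4 fragPosition;
out vec3 texCoord;

void main() {
    // pass the untransformed (model-space) position on, so the fragment
    // shader can compare its Z against m_minZ / m_maxZ
    fragPosition = vec4(inPosition, 1.0);
    texCoord = inTexCoord;
    gl_Position = m_WorldViewProjectionMatrix * vec4(inPosition, 1.0);
}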
Alternatively, I could try to somehow filter out vertices in the vertex shader, perhaps by setting their position values to (0,0,0,0) if they fall out of range.
I am fairly certain both of these approaches can work, but I would like to know if there is some better way of doing this that I am not aware of: some kind of standard approach for slicing models along an axis.
Please keep in mind that I do not want to use separate VBOs for each slice, since the limits can be set dynamically by the user.
Thank you very much.

Write to some fragment locations but not others

Currently, I have two shaders that are intended to process the same type of objects, but produce different output: one color for the screen, the other selection info.
Output of draw shader:
layout(location = 0) out vec4 outColor;
Output of selection shader:
layout(location = 0) out vec4 selectionInfo0;
layout(location = 1) out ivec4 selectionInfo1;
I am considering combining the shaders together (these two and others in my application) for clarity and ease of maintenance (why edit two shaders when you can edit one?).
Output of unified shader:
layout(location = 0) out vec4 outColor;
layout(location = 1) out vec4 selectionInfo0;
layout(location = 2) out ivec4 selectionInfo1;
Under this scheme, I would set a uniform that determines which fragments need to be written to.
Can I write to some fragment locations and not others?
void main()
{
    if (Mode == 1) {
        outColor = vec4(1, 0, 0, 1);
    }
    else {
        selectionInfo0 = vec4(0.1, 0.2, 0.3, 0.4);
        selectionInfo1 = ivec4(1, 2, 3, 4);
    }
}
Is this a legitimate approach? Is there anything I should be concerned about?
Is this a legitimate approach?
That depends on how you define "legitimate". It can be made to function.
A fragment is either discarded in its entirety or it is not. If it is discarded, then the fragment has (mostly) no effect. If it is not discarded, then all of its outputs either have defined values (ie: you wrote to them), or they have undefined values.
However, undefined values can be OK, depending on other state. For example, the framebuffer's draw buffer state routes FS output colors to actual color attachments. It can also route them to GL_NONE, which throws them away. Similarly, you can use color write masks on a per-attachment basis, turning off writes to attachments you don't want to write to.
But this means that you cannot decide it on a per-fragment basis; you can only decide it using state external to the shader. The FS can't make this happen or not happen; it has to be done between draw calls with state changes.
If Mode is some kind of uniform value, then that should be OK. But if it is something derived on a per-vertex or per-fragment basis, then this will not work effectively.
As for branching performance, again, that depends on Mode. If it's a uniform, you shouldn't be concerned at all. Modern GPUs can handle that just fine. If it is something else... well, your entire scheme stops working for reasons that have already been detailed, so there's no reason to worry about it ;)
That all being said, I would advise against this sort of complexity. It is a confusing way of handling things from the driver's perspective. Also, because you're relying on a lot of things that other applications are not relying on, you open yourself up to driver bugs. Your idea is different from a traditional Ubershader, because your options fundamentally change the nature of the render targets and outputs.
So I would suggest you try to do things in as conventional a way as possible. If you really want to minimize the number of separate files you work with, employ #ifdefs, and simply patch the shader string with a #define, based on the reason you're loading it. So you have one shader file, but 2 programs built from it.
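A minimal sketch of that #ifdef approach (the macro name SELECTION_PASS is arbitrary): the application compiles the same source twice, once with a #define SELECTION_PASS line patched in after the #version directive and once without.

#version 330 core

#ifdef SELECTION_PASS
layout(location = 0) out vec4 selectionInfo0;
layout(location = 1) out ivec4 selectionInfo1;
#else
layout(location = 0) out vec4 outColor;
#endif

void main()
{
#ifdef SELECTION_PASS
    selectionInfo0 = vec4(0.1, 0.2, 0.3, 0.4);
    selectionInfo1 = ivec4(1, 2, 3, 4);
#else
    outColor = vec4(1, 0, 0, 1);
#endif
}

Each compiled program then declares only the outputs that its render targets actually consume.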

Fragment shader color interpolation: details and hardware support

I know that using a very simple vertex shader like
attribute vec3 aVertexPosition;
attribute vec4 aVertexColor;
uniform mat4 uMVMatrix;
uniform mat4 uPMatrix;
varying vec4 vColor;
void main(void) {
    gl_Position = uPMatrix * uMVMatrix * vec4(aVertexPosition, 1.0);
    vColor = aVertexColor;
}
and a very simple fragment shader like
precision mediump float;
varying vec4 vColor;
void main(void) {
    gl_FragColor = vColor;
}
to draw a triangle with red, blue, and green vertices will end up producing a triangle with the three colors smoothly interpolated across it.
My questions are:
Do the calculations for interpolating fragment colors belonging to one triangle (or a primitive) happen in parallel on the GPU?
What are the algorithm and also hardware support for interpolating fragment colors inside the triangle?
The interpolation is done at the fragment-processing step.
The algorithm is very simple: the color is just interpolated according to the fragment's coordinates (its position within the triangle).
Yes, absolutely.
Triangle color interpolation is part of the fixed-function pipeline (it's actually part of the rasterization step, which happens before fragment processing), so it is carried out entirely in hardware on probably all video cards. The equations for interpolating vertex data can be found, for example, in the OpenGL 4.3 specification, section 14.6.1 (pp. 405-406). The algorithm defines barycentric coordinates for the triangle and uses them to interpolate between the vertices.
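Paraphrasing that section of the spec: if a, b and c are the barycentric coordinates of the fragment within the triangle, f_a, f_b and f_c the attribute values at the three vertices, and w_a, w_b and w_c their clip-space w components, the perspective-correct interpolated value is approximately

f = (a*f_a/w_a + b*f_b/w_b + c*f_c/w_c) / (a/w_a + b/w_b + c/w_c)

Attributes declared noperspective are interpolated without the divisions by w.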
Besides the answers given here, I wanted to add that there doesn't have to be dedicated fixed-function hardware for the interpolation. Modern GPUs tend to use "pull-model interpolation", where the interpolation is actually done by the shader units.
I recommend reading Fabian Giesen's blog articles about the topic (and the whole series about the graphics pipeline in general).
On the first question: though there are parallel units on the GPU, it depends on the size of the triangle under consideration. On most GPUs, drawing happens on a tile-by-tile basis; if the triangle's screen-space footprint falls completely within one tile, it will be processed entirely by one tile processor. If it spans several tiles, it can be processed in parallel by different units.
The second question is answered by other posters before me.

Why is the tessellation control shader invoked many times?

My question is: if all tessellation control shader invocations produce the same result, why does OpenGL have to call this shader many times for each patch?
For example, my tessellation control shader calculates control points for a Bézier surface. It takes an array of three vertices, which was aggregated earlier from the vertex shader:
// attributes of the input CPs
in vec3 WorldPos_CS_in[];
My patch size is 3, so the tessellation control shader is called three times with the same input (apart from gl_InvocationID), and all of the invocations produce the same control points:
struct OutputPatch
{
    vec3 WorldPos_B030;
    vec3 WorldPos_B021;
    vec3 WorldPos_B012;
    vec3 WorldPos_B003;
    vec3 WorldPos_B102;
    vec3 WorldPos_B201;
    vec3 WorldPos_B300;
    vec3 WorldPos_B210;
    vec3 WorldPos_B120;
    vec3 WorldPos_B111;
    vec3 Normal[3];
    vec2 TexCoord[3];
};

// attributes of the output CPs
out patch OutputPatch oPatch;
And they also all produce the same information that tells OpenGL how to divide this patch into tessellation coordinates:
// Calculate the tessellation levels
gl_TessLevelOuter[0] = gTessellationLevel;
gl_TessLevelOuter[1] = gTessellationLevel;
gl_TessLevelOuter[2] = gTessellationLevel;
gl_TessLevelInner[0] = gTessellationLevel;
It is clear that all of the tessellation control shader invocations do the same job. Doesn't that waste resources? Why isn't the tessellation control shader just called once per patch?
Well, the control shader invocations don't produce exactly the same result, because the control point output is obviously different for each. But that's being pedantic.
In your program, and in all of mine so far, yes, the control shader does exactly the same thing for every control point and the tessellation level doesn't change.
But suppose you have the shader generating new attributes for each control point, a texture normal or something? Then the shader would generate different results for each. It's nice to have the extra flexibility if you need it.
Modern GPUs try to do as much as possible in parallel. The older geometry shaders have one invocation generating multiple outputs. It's more efficient, not less, on modern GPUs to have multiple invocations each generating one output.
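To illustrate that per-invocation pattern, a tessellation control shader typically writes one output control point per invocation and lets a single invocation set the per-patch levels; a minimal sketch (the output names here are assumed, not the OutputPatch layout from the question):

#version 400 core

layout(vertices = 3) out;

in vec3 WorldPos_CS_in[];
out vec3 WorldPos_ES_in[];

uniform float gTessellationLevel;

void main()
{
    // Each invocation writes exactly one output control point.
    WorldPos_ES_in[gl_InvocationID] = WorldPos_CS_in[gl_InvocationID];

    // Only one invocation needs to write the per-patch tessellation levels.
    if (gl_InvocationID == 0) {
        gl_TessLevelOuter[0] = gTessellationLevel;
        gl_TessLevelOuter[1] = gTessellationLevel;
        gl_TessLevelOuter[2] = gTessellationLevel;
        gl_TessLevelInner[0] = gTessellationLevel;
    }
}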