I'm looking at the source of an OpenGL application that uses shaders. One particular shader looks like this:
uniform float someConstantValue;
void main()
{
// Use someConstantValue
}
The uniform is set once from code and never changes throughout the application run-time.
In what cases would I want to declare someConstantValue as a uniform and not as const float?
Edit:
Just to clarify, the constant value is a physical constant.
Huge reason:
Error: Loop index cannot be compared with non-constant expression.
If I use:
uniform float myfloat;
...
for (float i = 0.0; i < myfloat; i++)
I get an error because myfloat isn't a constant expression.
However this is perfectly valid:
const float myfloat = 10.0;
...
for (float i = 0.0; i < myfloat; i++)
Why?
When GLSL (and HLSL, for that matter) is compiled to GPU instructions, loops are often unrolled into straight-line code. That means the value of myfloat must be known at compile time to unroll the loop; if that value is a uniform (i.e. it can change on every render call), the loop cannot be unrolled until run time, and GPUs don't do that kind of just-in-time recompilation (at least not in WebGL).
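To make the unrolling concrete, here is a minimal sketch (the array name and bound are invented for illustration):
const int COUNT = 4;              // compile-time constant, so the loop can unroll
uniform float data[COUNT];
float sumData() {
    float sum = 0.0;
    for (int i = 0; i < COUNT; i++) {
        sum += data[i];
    }
    // The compiler can expand this into straight-line code:
    //   sum += data[0]; sum += data[1]; sum += data[2]; sum += data[3];
    // With a uniform bound there is nothing to expand at compile time.
    return sum;
}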
First off, the performance difference between using a uniform and a constant is probably negligible. Secondly, just because a value is constant in nature doesn't mean that you will always want it to be constant in your program. Programmers often tweak physical values to produce the best-looking result, even when that doesn't match reality. For instance, the acceleration due to gravity is often increased in certain types of games to make them feel more fast-paced.
If you don't want to have to set the uniform from your code, you can give it a default value in the shader (desktop GLSL allows initializers on uniforms; GLSL ES does not):
uniform float someConstantValue = 12.5;
That said, there is no reason not to use const for something like pi, where there would be little value in modifying it.
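For example, a value that genuinely never changes can simply be declared const:
const float PI = 3.14159265358979;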
I can think of two reasons:
The developer reuses a library of shaders in several applications. So instead of customizing each shader for every app, the developer tries to keep them general.
The developer anticipates this variable will later be a user-controlled setting. So declaring it as uniform is preparation for that upcoming feature.
If I were the developer and none of the above applied, I would declare it as const instead, because that can give a performance benefit and I wouldn't have to set the uniform from my code.
Related
In my GLSL code, I have large functions that are called multiple times.
As an example, inside my Perlin noise function I run a hash function many times.
float perlinNoise (...) {
...
float x0y0z0 = pcgUnit3(seedminX,seedminY,seedminZ);
float x0y0z1 = pcgUnit3(seedminX,seedminY,seedmaxZ);
float x0y1z0 = pcgUnit3(seedminX,seedmaxY,seedminZ);
float x0y1z1 = pcgUnit3(seedminX,seedmaxY,seedmaxZ);
float x1y0z0 = pcgUnit3(seedmaxX,seedminY,seedminZ);
float x1y0z1 = pcgUnit3(seedmaxX,seedminY,seedmaxZ);
float x1y1z0 = pcgUnit3(seedmaxX,seedmaxY,seedminZ);
float x1y1z1 = pcgUnit3(seedmaxX,seedmaxY,seedmaxZ);
...
}
In my procedural generation algorithm, I call the Perlin noise function multiple times.
I notice that every time I add a call to the Perlin noise function, the shaders take longer to compile. I think the problem is that GLSL inlines the function calls, so the shader code becomes very large.
Am I correct about the inlining, and if so, how do I prevent it from happening?
Page 13 of the GLSL_ES_Specification_1.00 describes how to set pragma directives:
#pragma optimize(off)
#pragma debug(on)
That would be the "inlining" case. As for "Am I correct about the inlining...": to my knowledge, no. I'm also not quite sure what is happening in the background. On Windows the shader gets translated by ANGLE; having a look at its source could help.
In general: try to lower the precision; consider that three decimal digits (1.000) can be "nice" enough, and a single-precision float only stores about 6-7 significant decimal digits anyway. Avoid branching. That should be the first aid kit.
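As a minimal sketch of lowering precision in a GLSL ES 1.00 fragment shader (the names are made up):
precision mediump float;       // default float precision for the whole shader
uniform sampler2D uTex;
varying vec2 vUv;
void main() {
    lowp vec4 c = texture2D(uTex, vUv);  // lowp is plenty for color data
    gl_FragColor = c;
}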
Some debugging tools out in the wild (please do your own research):
http://glsl-debugger.github.io/
http://shader-playground.timjones.io/
Optimizer
https://zz85.github.io/glsl-optimizer/
I'm using OpenGL 3.3 with GLSL 1.50 (compatibility profile). I'm getting a strange problem with my vertex data: I'm trying to pass an index value to the fragment shader, but the value seems to change based on my camera position.
This should be simple: I pass a GLfloat through the vertex shader to the fragment shader and then convert this value to an unsigned integer. The value is correct the majority of the time, except at the edges of the primitive. No matter what I do, the same distortion appears. Why does my camera position change this value? Even in the ridiculous example below, tI erratically equals something other than 1.0:
uint i;
if (tI == 1.0) i = 1;
else i = 0;
vec4 color = texture2D(tex[i], t) ;
If I send integer data instead of float data, I get the exact same problem. It does not seem to matter what I enter as vertex data; the value is not consistent across the fragment, and the distortion looks exactly the same each time.
What you are doing here is invalid in OpenGL/GLSL 3.30.
Let me quote the GLSL 3.30 specification, section 4.1.7 "Samplers" (emphasis mine):
Samplers aggregated into arrays within a shader (using square brackets
[ ]) can only be indexed with integral constant expressions (see
section 4.3.3 “Constant Expressions”).
Using a varying as an index into a sampler array is not a constant expression as defined by the spec.
Beginning with GL 4.0, this was somewhat relaxed. The GLSL 4.00 specification now states the following (still my emphasis):
Samplers aggregated into arrays within a shader (using square brackets
[ ]) can only be indexed with a dynamically uniform integral
expression, otherwise results are undefined.
With dynamically uniform being defined as follows:
A fragment-shader expression is dynamically uniform if all fragments
evaluating it get the same resulting value. When loops are involved,
this refers to the expression's value for the same loop iteration.
When functions are involved, this refers to calls from the same call
point.
So now this is a bit tricky. If all fragment shader invocations actually get the same value for that varying, it would be allowed, I guess. But it is unclear that your code guarantees that. You should also take into account that fragments might even be sampled outside of the primitive.
However, you should never check floats for equality; there will be numerical issues. I don't know what exactly you are trying to achieve here, but you might use some simple rounding behavior, or use an integer varying. You should also disable interpolation of the value in any case, using the flat qualifier (which is required for the integer case anyway); that should greatly improve the chances of that construct becoming dynamically uniform.
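A minimal sketch of the flat-qualified integer approach, assuming GL 4.0+ so that a dynamically uniform index into the sampler array is defined behavior (the vertex attribute names are guesses based on the question):
// Vertex shader
#version 400
in vec4 position;
in vec2 texCoord;
in float texIndex;
out vec2 t;
flat out int tI;               // flat: no interpolation across the primitive
void main() {
    t = texCoord;
    tI = int(texIndex + 0.5);  // round once here; never compare floats with ==
    gl_Position = position;
}

// Fragment shader
#version 400
in vec2 t;
flat in int tI;                // arrives bit-exact at every fragment
uniform sampler2D tex[2];
out vec4 color;
void main() {
    color = texture(tex[tI], t);
}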
I've recently written my first couple of GLSL programs for Processing (a visual language similar to Java that can load shaders); they draw fractals. In the loop that handles the fractal code, I have an escape conditional that breaks if a point would tend to infinity.
It works fine, and it is similar to how I would generally write the code outside GLSL. However, someone told me that both paths are calculated every time a conditional is executed. I've had a hard time finding out exactly how much of a penalty conditionals cause in GLSL.
Edit: To the best of my understanding, in non-GLSL code, when an if is encountered a path is assumed. If the "correct" path was assumed, everything is great. If the "wrong" path was assumed, the "bad" work is discarded and execution continues along the "correct" path. The penalty might be, say, 3 (or whatever number) instructions. I want to know whether there is some fixed number of instructions as the penalty, or whether both paths are calculated all the way through.
Here is the code if the explanation is not clear enough:
// Mandelbrot Set code
int i = 0;
float zr = x;
float zi = y;
for (; i < maxIterations; i++) {
    float sqZr = zr*zr;
    float sqZi = zi*zi;
    float twoZri = 2.0*zr*zi;
    zr = sqZr - sqZi + x;
    zi = twoZri + y;
    if (sqZr + sqZi > 16.0) break;
}
On old GPUs, both sides of an if() clause were executed and the correct result chosen at the end. On newer ones, this is only the case if the compiler thinks it would be more efficient. if() clauses are not free: the generic rule of thumb I have used for some time is: "if() costs 14 clock cycles" though the latest GPUs may be cheaper.
Why is this so? Because GPUs are stream processors, they want to have identical data-loading profiles for all pixels (especially for gradient values like texture colors or values from vertex registers). The principle of SIMD -- even when the devices are not strictly SIMD -- is usually the way to get the most performance from such devices.
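Where a conditional merely selects between two values, it can often be expressed with built-ins that compile to a compare plus conditional move rather than a jump. A hypothetical sketch (not the asker's fractal code; the names are made up):
// Pick between two colors without a branch:
// step() yields 0.0 or 1.0, and mix() selects between its inputs.
vec3 pickColor(float value, float threshold, vec3 cold, vec3 hot) {
    return mix(cold, hot, step(threshold, value));
}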
When in doubt, see if you can use one of the NVIDIA perf analysis tools on your code, or just try writing the code (it's short!) a few different ways and comparing your performance for your specific GPU.
(BTW Processing is not Java-like: it's Java)
My 9600GT hates me.
Fragment shader:
#version 130
uint aa[33] = uint[33](
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0
);
void main() {
    int i = 0;
    int a = 26;
    for (i = 0; i < a; i++) aa[i] = aa[i+1];
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
}
If a=25 program runs at 3000 fps.
If a=26 program runs at 20 fps.
If the size of aa is <= 32, the issue doesn't appear.
Viewport size is 1000x1000.
Problem occurs only when the size of aa is >32.
The threshold value of a varies with how the array is accessed inside the loop (aa[i]=aa[i+1]+aa[i-1] gives a different threshold).
I know gl_FragColor is deprecated. But that's not the issue.
My guess is that GLSL doesn't automatically unroll the loop if a>25 and the size of aa is >32. But why? Why it depends on the size of the array is unknown to mankind.
A quite similar behavior explained here:
http://www.gamedev.net/topic/519511-glsl-for-loops/
Unwinding the loop manually does solve the issue (3000 fps), even if aa size is >32:
aa[0]=aa[1];
aa[1]=aa[2];
aa[2]=aa[3];
aa[3]=aa[4];
aa[4]=aa[5];
aa[5]=aa[6];
aa[6]=aa[7];
aa[7]=aa[8];
aa[8]=aa[9];
aa[9]=aa[10];
aa[10]=aa[11];
aa[11]=aa[12];
aa[12]=aa[13];
aa[13]=aa[14];
aa[14]=aa[15];
aa[15]=aa[16];
aa[16]=aa[17];
aa[17]=aa[18];
aa[18]=aa[19];
aa[19]=aa[20];
aa[20]=aa[21];
aa[21]=aa[22];
aa[22]=aa[23];
aa[23]=aa[24];
aa[24]=aa[25];
aa[25]=aa[26];
aa[26]=aa[27];
aa[27]=aa[28];
aa[28]=aa[29];
aa[29]=aa[30];
aa[30]=aa[31];
aa[31]=aa[32];
I am just putting in a summary of the comments here so this does not show up as unanswered anymore.
"#pragma optionNV (unroll all)"
fixes the immediate issue on nvidia.
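For reference, the pragma goes near the top of the shader source, after the version directive. A minimal sketch (the pragma is an NVIDIA-specific hint; other vendors' compilers may ignore or reject it):
#version 130
#pragma optionNV (unroll all)  // NVIDIA-specific unrolling hint

out vec4 fragColor;
void main() {
    float s = 0.0;
    for (int i = 0; i < 26; i++) s += float(i);  // unrolled thanks to the hint
    fragColor = vec4(s / 325.0, 0.0, 0.0, 1.0);  // 0+1+...+25 = 325
}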
In general, though, GLSL compilers are very implementation dependent. The drop-off at exactly 32 is easily explained by hitting a compiler heuristic like "don't unroll loops longer than 32 iterations". The huge speed difference might also come from an unrolled loop using constant indices, while a dynamic loop requires addressable array memory. Another possibility is that once the loop is unrolled, dead code elimination and constant folding kick in and reduce the entire loop to nothing.
The most portable way to fix this is really manual unrolling, or better yet, manual constant folding. It is always questionable to compute constants in a fragment shader when they could be computed outside it. Some drivers may catch this in some cases, but it is better not to rely on that.
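A tiny sketch of what manual constant folding means here (the numbers are made up):
// Computed per fragment (at best the compiler folds it, but don't rely on that):
float kRuntime() { return sin(0.5) * 3.0; }

// Folded by hand instead: no per-fragment work at all.
const float K = 1.4382766;  // sin(0.5) * 3.0, precomputed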
Currently I am learning how to create shaders in GLSL for a game engine I am working on, and I have a question regarding the language which puzzles me. I have learned that on shader models lower than 3.0 you cannot use uniform variables in the condition of a loop. For example, the following code would not work on shader models older than 3.0:
for (int i = 0; i < uNumLights; i++)
{
...............
}
But isn't it possible to replace this with a loop that has a fixed number of iterations, containing a conditional statement that breaks out of the loop once i reaches uNumLights? For example:
for (int i = 0; i < MAX_LIGHTS; i++)
{
    if (i >= uNumLights)
        break;
    ..............
}
Aren't these equivalent? Shouldn't the latter work in older versions of GLSL? And if so, isn't this more efficient and easier to implement than other techniques I have read about, like using a different version of the shader for different numbers of lights?
I know this might be a silly question, but I am a beginner and I cannot find a reason why this shouldn't work.
GLSL can be confusing insofar as for() suggests that there must be conditional branching, even when there isn't, because the hardware may be unable to branch at all (the same applies to if()).
What really happens on pre-SM3 hardware is that the HAL inside your OpenGL implementation will completely unroll your loop, so there is no jump any more. This also explains why it has difficulties doing so with non-constant bounds.
While it would be technically possible to do this with non-constants anyway, the implementation would have to recompile the shader every time you change that uniform, and it might run into the maximum instruction count if you were allowed to supply any haphazard number.
And that is the real problem: what should happen then? If you supply a too-large constant, you get a "too many instructions" compiler error when you build the shader. But if you supply a silly number in a uniform, so that the HAL has to produce new code and runs into this limit, what can OpenGL do?
You most probably validated your program after compiling and linking, and you most probably queried the shader info log, and OpenGL kept telling you that everything was fine. That is, in some way, a binding promise; it cannot just decide otherwise all of a sudden. Therefore the implementation must make sure this situation cannot arise, and the only workable solution is to not allow uniforms in loop conditions on hardware generations that don't support dynamic branching.
Otherwise, there would need to be some form of validation inside glUniform that rejects bad values. However, since that would depend on successful (or unsuccessful) shader recompilation, it would have to run synchronously, which makes it a "no go" approach. Also consider that GL_ARB_uniform_buffer_object is exposed on some SM2 hardware (for example the GeForce FX), which means you could throw a buffer object with unpredictable content at OpenGL and still expect it to work somehow. The implementation would have to scan the buffer's memory for invalid values after you unmap it, which is insane.
Similar to a loop, an if() statement does not branch on SM2 hardware, even though it looks like it. Instead, it will calculate both branches and do a conditional move.
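In GLSL terms, that transformation looks roughly like this (an illustration of the idea only, not actual compiler output; the function and its inputs are invented):
// Source:   if (cond) r = sin(x); else r = cos(x);
// What SM2-class hardware effectively executes:
float selectSM2(bool cond, float x) {
    float a = sin(x);     // "then" side: always evaluated
    float b = cos(x);     // "else" side: always evaluated
    return cond ? a : b;  // one compare-and-select, no jump
}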
(I'm assuming you are talking about pixel shaders).
The second variant will only work on GPUs that support Shader Model 3 or higher, because dynamic branching (such as putting the variable uNumLights into an if condition) is likewise not supported below Shader Model 3.
Here you can compare what is and isn't supported between different shader models.
There is a fun workaround I just figured out. It seems stupid and I can't promise you that it's a healthy choice, but it appears to work for me right now: set the for loop bound to the maximum you allow, and put a condition inside the loop that skips the heavy routines once the counter goes beyond your uniform value.
uniform int iterations;
for (int i = 0; i < 10; i++) {
    if (i < iterations) {
        // do your thing...
    }
}